Training set size for neural networks considering curse of dimensionality
Published: 2019-05-25


I'm learning the ropes of neural networks. Recently, I read about the curse of dimensionality and how it can lead to overfitting.

If I understand correctly, the number of features (dimensions) d of a given dataset with n data points is very important when considering the size t of the training set.

QUESTIONS

(...not sure if all my questions are really connected to the curse of dimensionality)

  1. How do I choose the correct training size t considering d and n? Is t a function of d and n?
  2. Do I have to consider d for regularization?

 

One rule of thumb is to have at least 10x as many data points as dimensions. Using some intelligent prior information (e.g., a good kernel in an SVM), you might even learn a good model with fewer data points than dimensions.

The lecture about VC dimension by Yaser Abu-Mostafa motivates this 10x rule with some nice charts. If you are not familiar with the concept, the VC dimension measures the capacity of a learning model: the higher the VC dimension, the more complex the models it can fit, and the more data it needs. For example, the classical perceptron on d inputs has VC dimension d+1. Some hypothesis classes have infinite VC dimension; such problems are impossible to learn in this framework.
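For reference, the VC generalization bound behind that lecture (a rough sketch following Abu-Mostafa's Learning From Data; the exact constants vary between statements of the theorem) says that with probability at least 1 - δ,

\[
  E_{\text{out}}(g) \;\le\; E_{\text{in}}(g)
  + \sqrt{\frac{8}{N}\,\ln\frac{4\big((2N)^{d_{\mathrm{VC}}}+1\big)}{\delta}}
\]

and the 10x rule is the practical reading of it: the complexity term becomes tolerably small once N is roughly an order of magnitude larger than d_VC.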

A neural net is a linear model in derived variables. Take the regression case, because it is a little bit simpler:

\[
  \hat{y} = \beta^{\top}\,\sigma\!\big(\Gamma_{2}\,\sigma(\Gamma_{1}X + \gamma_{1}) + \gamma_{2}\big)
\]

where X is your data (i.e., your features), the Γ are matrices of weights, the γ are "biases", and β are your weights connecting the topmost hidden layer to the output. You can see that this is nothing more than a linear model, but in nonlinear functions of X.
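As a minimal sketch of that view (NumPy, hypothetical names, and a single hidden layer to keep it short), the prediction really is a linear model in the derived features σ(XΓ + γ):

    import numpy as np

    def forward(X, Gamma, gamma, beta):
        """Regression net with one hidden layer, viewed as a linear model
        in the derived variables Z = sigma(X @ Gamma + gamma)."""
        Z = np.tanh(X @ Gamma + gamma)   # nonlinear derived features
        return Z @ beta                  # linear model in those features

    rng = np.random.default_rng(0)
    n, d, h = 200, 10, 32                # n points, d raw features, h hidden units
    X = rng.normal(size=(n, d))
    Gamma = rng.normal(size=(d, h))
    gamma = rng.normal(size=h)
    beta = rng.normal(size=h)
    y_hat = forward(X, Gamma, gamma, beta)   # predictions, shape (n,)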

Just like in a linear model, you can overfit when you have too many parameters. A typical strategy for avoiding overfitting is regularization. Rather than solving

\[
  \min_{\beta}\; \lVert y - X\beta \rVert^{2}
\]

you solve

\[
  \min_{\beta}\; \lVert y - X\beta \rVert^{2} + \lambda \lVert \beta \rVert^{2}
\]

in ridge regression, for example. By selecting λ through cross-validation, you're effectively letting the data tell you how much to use your many dimensions.
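This is easy to do concretely in the ridge case; here is a minimal sketch with scikit-learn's RidgeCV on made-up data (the grid of penalties is just an illustrative choice), which picks λ (called alpha there) by cross-validation:

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))                    # n = 100 points, d = 50 features
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)  # toy target

    # Cross-validation over a grid of penalties lets the data choose lambda.
    model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
    print(model.alpha_)                               # the selected regularization strength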

This generalizes directly to neural nets, except that there is no closed-form solution to the minimization problem, as there is in ridge regression. You'll overfit if you only solve

\[
  \min_{\theta}\; \lVert y - f_{\theta}(X) \rVert^{2}
\]

so instead you solve

\[
  \min_{\theta}\; \lVert y - f_{\theta}(X) \rVert^{2} + \lambda \lVert \theta \rVert^{2}
\]

where θ is a concatenated vector of all of your weights and f_θ denotes the network.

Note that the quadratic penalty here isn't the only form of regularization. You could also do L1 regularization, or dropout regularization.

But the idea is the same: build a model that will overfit the data, and then find a regularization parameter (by some variant of cross validation) that will constrain the variability such that you don't overfit.
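As a sketch of that workflow (scikit-learn, made-up data, and an illustrative penalty grid; any framework with weight decay and cross-validation would do), fit a network big enough to overfit and let cross-validation pick the L2 penalty, which MLPRegressor calls alpha:

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))                                   # n = 300 points, d = 20 features
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

    # A network large enough to overfit this data; the L2 penalty (alpha)
    # is chosen by cross-validation rather than fixed in advance.
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    search = GridSearchCV(net, {"alpha": np.logspace(-5, 1, 7)}, cv=5)
    search.fit(X, y)
    print(search.best_params_)                                       # the penalty the data chose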

 

