Andrew Ng
Hyperparameter Tuning
Tuning process
- $\alpha$ (the learning rate) is the most important
- then $\beta$ (momentum), the number of hidden units, and the mini-batch size
- last: the number of layers and learning rate decay
Some advice
- Try random values; don't use a grid, just sample randomly
- Coarse-to-fine search: sample coarsely first, then zoom in on promising regions and sample more densely
Using an appropriate scale to pick hyperparameters
- For the number of layers and hidden units, picking uniformly at random within a range works fine
- For $\alpha$, sample on a log scale (pick the exponent uniformly at random), as sketched below
- For $\beta$, sample $1-\beta$ on a log scale instead
- because results are very sensitive to tiny changes in $\beta$ when it is close to 1
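A minimal NumPy sketch of log-scale sampling (the ranges below are illustrative assumptions, not values from the course):

```python
import numpy as np

# Sample alpha log-uniformly in [1e-4, 1]: pick the exponent, not the value.
r = np.random.uniform(-4, 0)
alpha = 10 ** r

# For beta, sample 1 - beta log-uniformly in [1e-3, 1e-1],
# so beta lands in [0.9, 0.999] with more resolution near 1.
r = np.random.uniform(-3, -1)
beta = 1 - 10 ** r
```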
Hyperparameters tuning in practice: Pandas vs. Caviar
- Re-evaluate occasionally
- Babysitting one model (panda)
- Training many models in parallel (caviar)
- Choose between the two approaches based on available computational resources
Batch Normalization
Normalizing activations in a network
- Normalize the outputs of hidden units as well, not just the network inputs
- This can be done on $A^{[l]}$ or on $Z^{[l]}$ (of a hidden layer)
- $Z_{\text{norm}} = \dfrac{Z - \mu}{\sqrt{\sigma^2 + \epsilon}}$ turns the values into a distribution with mean 0 and variance 1 ($\epsilon$ is a tiny constant that keeps the denominator from being 0)
- $\tilde{Z} = \gamma Z_{\text{norm}} + \beta$, where $\gamma$ and $\beta$ are learnable parameters that change the distribution; see the sketch after this list
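As a rough illustration, here is a NumPy sketch of the two formulas above; the `(units, batch)` shape convention and the function name are assumptions for this example:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z over the mini-batch, then shift/scale with gamma and beta."""
    mu = Z.mean(axis=1, keepdims=True)        # per-unit mean over the batch
    var = Z.var(axis=1, keepdims=True)        # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)    # mean 0, variance 1
    return gamma * Z_norm + beta              # Z_tilde with a learned distribution
```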
Adding batch norm to a network
- `tf.nn.batch_normalization()` in TensorFlow
- After computing a layer's output $Z$, normalize it before applying the activation
- Speeds up learning
- Batch norm handles data one mini-batch at a time; a usage sketch follows this list
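A hedged sketch of the TensorFlow call mentioned above (the layer shape and the surrounding code are assumptions for illustration):

```python
import tensorflow as tf

z = tf.random.normal([64, 128])          # dummy pre-activations, (batch, units)
gamma = tf.Variable(tf.ones([128]))      # learnable scale
beta = tf.Variable(tf.zeros([128]))      # learnable shift
mu, var = tf.nn.moments(z, axes=[0])     # statistics of this mini-batch only
z_tilde = tf.nn.batch_normalization(z, mu, var, beta, gamma,
                                     variance_epsilon=1e-5)
a = tf.nn.relu(z_tilde)                  # activation is applied after batch norm
```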
Why does batch norm work
- Covariate shift: when the input distribution changes, a learned mapping may stop fitting
- Batch norm reduces how much the distribution of hidden-unit values shifts as earlier layers update
- It also has a slight regularization effect, since per-mini-batch statistics add noise to each layer's values
Softmax Regression
- Multi-class classification
- A generalization of logistic regression to more than two classes
- Maps $Z$ to a probability distribution: $a_i = e^{z_i} / \sum_j e^{z_j}$
- $dZ = \hat{y} - y$ (backpropagation with cross-entropy loss); a sketch follows this list
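A small NumPy sketch of the softmax mapping and the gradient noted above (the example logits and label are made up):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)      # stabilize: avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)   # columns sum to 1

y_hat = softmax(np.array([[2.0], [1.0], [0.1]]))   # predicted probabilities
y = np.array([[1.0], [0.0], [0.0]])                # one-hot true label
dZ = y_hat - y                                     # gradient for cross-entropy loss
```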
Deep learning frameworks
Caffe, CNTK, DL4J, Keras, Lasagne, mxnet, PaddlePaddle, TensorFlow, Theano, Torch
Reposted from: http://hudii.baihongyu.com/