Andrew Ng
Hyperparameter Tuning
Tuning process
- $\alpha$ (the learning rate) is the most important
- then $\beta$ (momentum), the number of hidden units, and the mini-batch size
- last: the number of layers and learning rate decay
Some advice
- Try random values; don't use a grid, just sample randomly
- Coarse-to-fine search: sample coarsely first, then zoom in on promising regions and sample more densely
Using an appropriate scale to pick hyperparameters
- For the number of layers and hidden units, picking uniformly at random within a range works fine
- For $\alpha$, sample on a log scale (pick the exponent uniformly at random), as sketched below
- For $\beta$, sample $1-\beta$ on a log scale instead
- because results are very sensitive to tiny changes in $\beta$ when it is close to 1
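A minimal NumPy sketch of log-scale sampling (the ranges below are illustrative assumptions, not values from the course):

```python
import numpy as np

# Sample alpha log-uniformly in [1e-4, 1]: pick the exponent, not the value.
r = np.random.uniform(-4, 0)
alpha = 10 ** r

# For beta, sample 1 - beta log-uniformly in [1e-3, 1e-1],
# so beta lands in [0.9, 0.999] with more resolution near 1.
r = np.random.uniform(-3, -1)
beta = 1 - 10 ** r
```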
Hyperparameters tuning in practice: Pandas vs. Caviar
- Re-evaluate occasionally
- Babysitting one model (panda)
- Training many models in parallel (caviar)
- Choose between the two approaches based on available computational resources
Batch Normalization
Normalizing activations in a network
- Normalize the outputs of hidden units as well, not just the network inputs
- This can be done on $A^{[l]}$ or on $Z^{[l]}$ (of a hidden layer)
- $Z_{\text{norm}} = \dfrac{Z - \mu}{\sqrt{\sigma^2 + \epsilon}}$ turns the values into a distribution with mean 0 and variance 1 ($\epsilon$ is a tiny constant that keeps the denominator from being 0)
- $\tilde{Z} = \gamma Z_{\text{norm}} + \beta$, where $\gamma$ and $\beta$ are learnable parameters that change the distribution; see the sketch after this list
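As a rough illustration, here is a NumPy sketch of the two formulas above; the `(units, batch)` shape convention and the function name are assumptions for this example:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize Z over the mini-batch, then shift/scale with gamma and beta."""
    mu = Z.mean(axis=1, keepdims=True)        # per-unit mean over the batch
    var = Z.var(axis=1, keepdims=True)        # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)    # mean 0, variance 1
    return gamma * Z_norm + beta              # Z_tilde with a learned distribution
```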
Adding batch norm to a network
- `tf.nn.batch_normalization()` in TensorFlow
- After computing a layer's output $Z$, normalize it before applying the activation
- Speeds up learning
- Batch norm handles data one mini-batch at a time; a usage sketch follows this list
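A hedged sketch of the TensorFlow call mentioned above (the layer shape and the surrounding code are assumptions for illustration):

```python
import tensorflow as tf

z = tf.random.normal([64, 128])          # dummy pre-activations, (batch, units)
gamma = tf.Variable(tf.ones([128]))      # learnable scale
beta = tf.Variable(tf.zeros([128]))      # learnable shift
mu, var = tf.nn.moments(z, axes=[0])     # statistics of this mini-batch only
z_tilde = tf.nn.batch_normalization(z, mu, var, beta, gamma,
                                     variance_epsilon=1e-5)
a = tf.nn.relu(z_tilde)                  # activation is applied after batch norm
```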
Why does batch norm work
- Covariate shift: when the input distribution changes, a learned mapping may stop fitting
- Batch norm reduces how much the distribution of hidden-unit values shifts as earlier layers update
- It also has a slight regularization effect, since per-mini-batch statistics add noise to each layer's values
Softmax Regression
- Multi-class classification
- A generalization of logistic regression to more than two classes
- Maps $Z$ to a probability distribution: $a_i = e^{z_i} / \sum_j e^{z_j}$
- $dZ = \hat{y} - y$ (backpropagation with cross-entropy loss); a sketch follows this list
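A small NumPy sketch of the softmax mapping and the gradient noted above (the example logits and label are made up):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=0, keepdims=True)      # stabilize: avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)   # columns sum to 1

y_hat = softmax(np.array([[2.0], [1.0], [0.1]]))   # predicted probabilities
y = np.array([[1.0], [0.0], [0.0]])                # one-hot true label
dZ = y_hat - y                                     # gradient for cross-entropy loss
```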
Deep learning frameworks
Caffe, CNTK, DL4J, Keras, Lasagne, mxnet, PaddlePaddle, TensorFlow, Theano, Torch
Reposted from: http://hudii.baihongyu.com/