This blog post contains a summary of Andrew Ng’s advice regarding choosing the mini-batch size for gradient descent while training a deep learning model. Fortunately, this hint is not complicated, so the blog post is going to be extremely short ;)

Andrew Ng recommends not using mini-batches if the number of observations is smaller than 2000; in that case, train on the whole data set in a single batch. In all other cases, he suggests using a power of 2 as the mini-batch size, so the mini-batch should contain 64, 128, 256, 512, or 1024 elements.
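The rule of thumb above can be written down in a few lines. This is only a minimal sketch: the function names `choose_batch_size` and `iter_minibatches` and the `preferred` parameter are made up for illustration, and the advice itself does not say which of the listed sizes to pick.

```python
import random

FULL_BATCH_THRESHOLD = 2000                    # threshold from the rule of thumb
SUGGESTED_SIZES = (64, 128, 256, 512, 1024)    # powers of 2 listed in the post


def choose_batch_size(n_observations, preferred=256):
    """Return a batch size that follows the rule of thumb.

    `preferred` is a hypothetical knob for this sketch; the advice does not
    say which of the suggested sizes to choose.
    """
    if n_observations < FULL_BATCH_THRESHOLD:
        # Small data set: use every observation in a single batch.
        return n_observations
    if preferred not in SUGGESTED_SIZES:
        raise ValueError(f"preferred must be one of {SUGGESTED_SIZES}")
    return preferred


def iter_minibatches(indices, batch_size, seed=0):
    """Yield shuffled lists of indices, each at most `batch_size` long."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


batch_size = choose_batch_size(10_000, preferred=256)
batches = list(iter_minibatches(range(10_000), batch_size))
print(batch_size, len(batches), len(batches[-1]))  # 256 40 16
```

The last mini-batch is smaller than the others whenever the number of observations is not a multiple of the batch size; here it holds the remaining 16 indices.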

The most important aspect of the advice is making sure that the mini-batch fits in the CPU/GPU memory! If the whole mini-batch fits in that memory, we can leverage the speed of the processor cache, which significantly reduces the time required to train a model!
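To get a feeling for whether a mini-batch fits in memory, a back-of-the-envelope estimate is enough to start with. The sketch below counts only the input values of one mini-batch, assuming 4-byte float32 numbers; the helper names, the image size, and the 256 MiB budget are all hypothetical. A real model also needs room for parameters, activations, and gradients, so treat the result as an upper bound on what could fit.

```python
def batch_bytes(batch_size, features_per_example, bytes_per_value=4):
    """Rough size in bytes of the inputs of one mini-batch (float32 by default)."""
    return batch_size * features_per_example * bytes_per_value


def largest_fitting_batch(features_per_example, memory_budget_bytes,
                          candidates=(64, 128, 256, 512, 1024)):
    """Return the largest candidate whose inputs fit the budget, or None."""
    fitting = [b for b in candidates
               if batch_bytes(b, features_per_example) <= memory_budget_bytes]
    return max(fitting) if fitting else None


# Hypothetical numbers: 224x224 RGB images and a 256 MiB budget for the inputs.
features = 224 * 224 * 3
budget = 256 * 1024 ** 2
print(largest_fitting_batch(features, budget))  # 256
```

With these assumed numbers, 512 images would need about 294 MiB for the inputs alone, so 256 is the largest of the suggested sizes that stays within the budget.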
