Using keras-tuner to tune hyperparameters of a TensorFlow model

In this article, I am going to show how to use the random search hyperparameter tuning method with Keras. I decided to use the keras-tuner project, which at the time of writing has not been officially released yet, so I have to install it directly from its GitHub repository.

# remove the ! if you are not running this in a Jupyter Notebook
!git clone https://github.com/keras-team/keras-tuner.git
!pip install ./keras-tuner

As an example, I will use the Fashion-MNIST dataset, so the goal is to perform a multiclass classification of images. First, I have to load the training and test dataset. Fashion-MNIST is available as one of the Keras built-in datasets, so the following code downloads everything I need.

import tensorflow as tf
from tensorflow import keras
import numpy as np

fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

The images have already been preprocessed, so the dataset contains a single channel (gray-scale) with color values in the range 0-255. I want to scale the values to the range between 0 and 1, so I divide them by 255.

train_images = train_images / 255.0
test_images = test_images / 255.0

I am going to reshape the dataset to add the channel dimension, so I can use it as input to the convolutional layer.

train_images = train_images.reshape(len(train_images), 28, 28, 1)
test_images = test_images.reshape(len(test_images), 28, 28, 1)
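
A quick shape check confirms the new channel dimension (Fashion-MNIST contains 60,000 training images and 10,000 test images).

print(train_images.shape)  # (60000, 28, 28, 1)
print(test_images.shape)   # (10000, 28, 28, 1)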

Parameters

Keras-tuner needs a function that accepts a set of hyperparameters and returns a compiled model, so I have to define such a function.

There are four kinds of parameters available: range, linear, choice, and fixed.

Range

The range returns integer values between the given minimum and maximum. The values are incremented by the step parameter.

hp.Range('conv_1_filter', min_value=64, max_value=128, step=16)

Linear

The linear parameter is similar to the range but works with floating-point numbers. In this case, the step is called resolution.

hp.Linear('learning_rate', min_value=0.01, max_value=0.1, resolution=0.1)

Choice

The choice parameter is much simpler. We give it a list of values, and it returns one of them.

hp.Choice('learning_rate', values=[1e-2, 1e-3])

Fixed

Finally, we can set a constant as the parameter value. It is useful when we want to let keras-tuner tune all parameters except one. The fixed parameter works only with the predefined models: Xception and ResNet.

hp.Fixed('learning_rate', value=1e-4)

How to define the model

Here is my function that builds a neural network using the parameters given by keras-tuner. Even though it is not necessary in this case, I will parameterize all layers and the learning rate, to show that it is possible.

def build_model(hp):
  model = keras.Sequential([
    keras.layers.Conv2D(
        filters=hp.Range('conv_1_filter', min_value=64, max_value=128, step=16),
        kernel_size=hp.Choice('conv_1_kernel', values=[3, 5]),
        activation='relu',
        input_shape=(28,28,1)
    ),
    keras.layers.Conv2D(
        filters=hp.Range('conv_2_filter', min_value=32, max_value=64, step=16),
        kernel_size=hp.Choice('conv_2_kernel', values=[3, 5]),
        activation='relu'
    ),
    keras.layers.Flatten(),
    keras.layers.Dense(
        units=hp.Range('dense_1_units', min_value=32, max_value=128, step=16),
        activation='relu'
    ),
    keras.layers.Dense(10, activation='softmax')
  ])

  model.compile(optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', values=[1e-2, 1e-3])),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

  return model
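
Before configuring the tuner, I can sanity-check that the function builds a valid model. This is a minimal sketch, assuming the HyperParameters class is exposed at the package root and that every parameter falls back to its default value when it is used outside of a trial:

from kerastuner import HyperParameters

# an empty container, so every parameter uses its default value
hp = HyperParameters()
model = build_model(hp)
# print the architecture produced by the default values
model.summary()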

Configure the tuner

When the function is ready, I have to configure the tuner. I need to specify the objective, which is the metric used to compare models. In this case, I want to use the validation set accuracy.

The other important parameter is the number of trials. That parameter tells the tuner how many hyperparameter combinations it has to test.

I must also specify the project name and the output directory. They tell the tuner where it should store the debugging data.

Note that I passed the function defined above as the first parameter!

from kerastuner.tuners import RandomSearch

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    directory='output',
    project_name='FashionMNIST')
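
Before starting the search, I can print a summary of the search space to verify the parameter definitions. The sketch below assumes the search_space_summary method is available in this pre-release version:

# prints the name, type, and range of every registered hyperparameter
tuner.search_space_summary()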

Hyperparameter tuning

Now, I have a configured tuner, and it is time to run it. I need the training dataset and the number of epochs for every trial. I must also specify the validation dataset or the percentage of the training dataset that will be used for validation.

I call the search function, and eventually, I will get the results of the tuning.

tuner.search(train_images, train_labels, epochs=2, validation_split=0.1)
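
After the search finishes, keras-tuner can print a short report of the completed trials. Again, this assumes the results_summary method is available in this pre-release version:

# shows the tried hyperparameter combinations and their scores
tuner.results_summary()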

Using the model

When the search is done, I can get the best model and either start using it or continue training.

model = tuner.get_best_models(num_models=1)[0]
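
If I only need the winning hyperparameter values rather than the trained model, I can retrieve them too. This is a sketch assuming the get_best_hyperparameters method is available in this pre-release version:

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
# a dictionary mapping hyperparameter names to the winning values
print(best_hp.values)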

In this example, I trained the model for only two epochs, so I will continue training it, starting from the third epoch.

model.fit(train_images, train_labels, epochs=10, validation_split=0.1, initial_epoch=2)
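
Finally, I can evaluate the fully trained model on the test set that was loaded at the beginning, using the standard Keras evaluate method.

test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_accuracy)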