Using Hyperband for TensorFlow hyperparameter tuning with keras-tuner

In the previous article, I showed how to use keras-tuner to tune a model's hyperparameters with random search. Fortunately, there is a much better method of searching for hyperparameters.

Hyperband

The method is called Hyperband. It is based on the idea that when a set of hyperparameters gives poor results, we can spot it quickly, so it makes no sense to keep training that model.

Because of that, the Hyperband implementation trains multiple models for a small number of epochs, then picks the best-performing models and continues training them for a few more epochs. The cycle of picking the best models and training them a little longer repeats until we are left with a single best model.

Keras-tuner

This algorithm is one of the tuners available in the keras-tuner library. In the previous article, I described how to install the library (I had to install it directly from the GitHub repository because, at the time of writing, it was still in a pre-alpha version).

We must define a function that takes the hyperparameters as an argument and returns a compiled model. Later, we will pass that function to the tuner.

from tensorflow import keras

def build_model(hp):
  # Search over the number of filters, the kernel size, and the number of
  # units in the dense layer. Note: the pre-alpha version used here calls
  # the integer range method hp.Range (later versions renamed it hp.Int).
  model = keras.Sequential([
    keras.layers.Conv2D(
        filters=hp.Range('conv_1_filter', min_value=64, max_value=128, step=16),
        kernel_size=hp.Choice('conv_1_kernel', values=[3, 5]),
        activation='relu',
        input_shape=(28, 28, 1)
    ),
    keras.layers.Flatten(),
    keras.layers.Dense(
        units=hp.Range('dense_1_units', min_value=32, max_value=128, step=16),
        activation='relu'
    ),
    keras.layers.Dense(10, activation='softmax')
  ])

  model.compile(optimizer=keras.optimizers.Adam(),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

  return model
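
Before passing the function to the tuner, it is worth checking that it builds a model at all. Here is a minimal sanity check, assuming the HyperParameters class can be imported as below (the exact import path may differ between versions of the library):

from kerastuner.engine.hyperparameters import HyperParameters

# Build one model with the default value of every hyperparameter
# (assumed import path; adjust if your version exposes it elsewhere).
model = build_model(HyperParameters())
model.summary()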

When the function is ready, I can define the tuner. Note that I have to set four parameters:

  • max_trials - the maximal number of model training sessions
  • min_epochs - the minimal number of epochs a model is trained for before its performance is compared with the other models
  • max_epochs - the maximal number of epochs a single model is trained for
  • factor - the reduction factor: in every iteration of the training loop, the number of models that continue training is divided by this value.

For example, if the tuner trains 9 models in the first round and the factor is 3, 9 gets divided by 3, and the tuner continues training the 3 best models for a few more epochs. After that, it divides the number of models again and picks 1 model for additional training.
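
To make the schedule concrete, here is a toy loop (plain Python, not keras-tuner code) that prints how many models are trained in every round when we start with 9 models and a factor of 3:

# Toy illustration of the Hyperband schedule described above.
num_models = 9
factor = 3
round_number = 1
while True:
    print(f'round {round_number}: training {num_models} model(s)')
    if num_models == 1:
        break
    num_models //= factor
    round_number += 1

# Output:
# round 1: training 9 model(s)
# round 2: training 3 model(s)
# round 3: training 1 model(s)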

from kerastuner.tuners import Hyperband

tuner = Hyperband(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    factor=3,
    min_epochs=2,
    max_epochs=5,
    directory='output',
    project_name='FashionMNIST')

After configuring the tuner, we can call the search function and start searching for the best hyperparameters.
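
The search call below assumes that train_images and train_labels already exist. They are not defined in this article, but since the project is called FashionMNIST, preparing them could look like this (my assumption, using the standard Keras dataset loader):

from tensorflow import keras

# Load Fashion-MNIST, reshape the images to the (28, 28, 1) input shape
# that build_model expects, and scale pixel values to [0, 1].
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0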

tuner.search(train_images, train_labels, validation_split=0.1)

TypeError: '<' not supported between instances of 'NoneType' and 'float'

However, there is one little problem. If this bug has not been fixed yet, the code fails with a TypeError. It happens because the current implementation of Hyperband appends a None value to an array that is supposed to contain only numbers.
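
The failure is easy to reproduce in isolation: sorting a list that mixes numbers with None raises exactly this error.

# Minimal reproduction of the bug: None is not comparable with floats.
sorted([0.91, None, 0.87])
# TypeError: '<' not supported between instances of 'NoneType' and 'float'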

There are two ways to fix the problem. The first one (probably the better one): don't use a library that is still in a pre-alpha version ;)

The second one: if you still want to use keras-tuner, do a little bit of “monkey-patching.” The problematic code is in the _select_candidates function of the HyperbandOracle class, which Hyperband uses internally.

We can copy the code from GitHub, paste it into our Jupyter Notebook, and fix the bug. I define a new type whose comparison rules make it always end up at the end of a sorted array, and I modify the lambda function used for sorting. The result is this code:

from functools import total_ordering

# A sentinel value that compares as greater than everything else,
# so it always sorts to the end of a list.
@total_ordering
class MaxType(object):
    def __le__(self, other):
        return False

    def __eq__(self, other):
        return (self is other)

Max = MaxType()

def _select_candidates(self):
    # Sort the candidate indices by score; candidates whose score is None
    # are mapped to Max, so they land at the end instead of crashing the sort.
    sorted_candidates = sorted(
        list(range(len(self._candidates))),
        key=lambda i: Max if self._candidate_score[i] is None
                      else self._candidate_score[i])
    num_selected_candidates = self._model_sequence[self._bracket_index]
    for index in sorted_candidates[:num_selected_candidates]:
        self._queue.put(index)
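
A quick check confirms that the sentinel behaves as intended:

# Max always sorts to the end, no matter what it is compared with.
print(sorted([0.5, Max, 0.2]))  # -> [0.2, 0.5, <MaxType object at ...>]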

In Python, we can replace object methods at runtime (even though it is a terrible practice) by wrapping a function in types.MethodType and assigning it to the method attribute of the instance.

import types

tuner.oracle._select_candidates = types.MethodType(_select_candidates, tuner.oracle)

After doing that ugly hack, I can rerun the search function. This time everything works fine.

When the function finishes, I can retrieve the model with the best hyperparameters from the tuner and either start using it or continue training it.

best_model = tuner.get_best_models(num_models=1)[0]
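
The retrieved model is a regular Keras model, so continuing the training or evaluating it uses the standard API (the arrays come from the Fashion-MNIST loading step above):

# Train the best model for a few more epochs and check the test accuracy.
best_model.fit(train_images, train_labels, validation_split=0.1, epochs=3)
test_loss, test_accuracy = best_model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_accuracy:.4f}')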