This lesson is still being designed and assembled (Pre-Alpha version)

Tuning Neural Network Models for Better Accuracy

Overview

Teaching: 15 min
Exercises: 50 min
Questions
  • What is model tuning in Deep Learning?

  • What are the different types of tuning applicable to a neural network model?

  • What are the effects of tuning a particular hyperparameter on the performance of a model?

  • Is a Jupyter notebook the best platform for such experiments?

Objectives
  • Tweak and tune deep learning models to obtain optimal performance.

  • Understand the effect of tuning the different hyperparameters.

  • Acquire the art and common sense of hyperparameter tuning.

  • Convert Python code in a Jupyter notebook to a script and submit the script to an HPC system using a job scheduler.

Introduction

In the previous episode, we successfully built and trained a few neural network (NN) models to distinguish 18 running apps on an Android phone. We tested a model without a hidden layer, as well as a model with one hidden layer, and saw a significant improvement in accuracy from adding just one hidden layer. This poses an interesting question: what is the limit of NN models in achieving the highest accuracy (or a similar performance metric)? We can intuitively speculate that adding more layers to the model would result in better and better accuracy. We could continue this refinement by constructing models with two, three, four hidden layers, and so on. The number of combinations explodes quickly, as each layer may also be varied in its number of hidden neurons. Every modified model must be retrained, which makes the entire process prohibitively expensive, so we inevitably have to stop the refinement at a certain point.

All this testing is part of model tuning, an iterative process of refining an NN model so that it yields the best performance for the given task (such as smartphone app classification, in our case). In the model tuning process, we iteratively modify the NN model's hyperparameters to find the model with the best performance.

In this episode, we will present a typical scenario for tuning an NN model. Consider the 18-app classification task again: all the models have 19 input nodes and 18 output nodes, but the hidden layers can be varied greatly, for example in their number and in the number of neurons each layer contains.

The following hyperparameters in a model can be adjusted to find the best-performing NN model:

  1. the number of hidden layers (i.e. the depth of the network)
  2. the number of neurons per hidden layer (the width of the layer)

Collectively, the size of network inputs and outputs, plus the number of hidden layers and the number of neurons on each hidden layer, determine the architecture of an NN model.

The learning rate and batch size can also be adjusted. Although they are not part of the network architecture per se, they may affect the final accuracy of the model, so it is important to find optimal values for these hyperparameters as well.

Basic Procedure of Neural Network Model Tuning

The tuning process involves scanning the hyperparameter space, re-training each newly modified network, and evaluating the model performance. A basic recipe for NN model tuning involves the following steps:

  1. First, define the hyperparameter space we want to scan (e.g. the number of hidden layers = 1, 2, 3, …; the number of hidden neurons = 25, 50, 75, …).

  2. Define (build) a new NN model with a specified hyperparameter setting (number of hidden layers, number of neurons in each layer, learning rate, batch size, …).

  3. Train and validate the new model. From this process, we compute and save the performance metrics of this model (one or more of: accuracy, precision, recall, etc.).

  4. Repeat steps 2 and 3 until all the configurations we want to test have been tested. As you may anticipate, this will require a lot of training runs (at least one per model).

  5. Once we have the performance metrics from every model, we analyze these results to decide which hyperparameter setting yields the best performance.

The following diagram shows the cycle of NN model tuning:

Typical diagram of tuning for Machine Learning/Neural Network models

The optimal configuration is determined by a trade-off between the maximum achievable accuracy and the computational cost of training ever more complex NN models.
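To make the recipe concrete, here is a minimal sketch of such a tuning loop. It is written in terms of the NN_Model_1H builder function and the training/testing arrays that we will define later in this episode; the scanned values are merely examples.

# Sketch of the tuning loop; assumes NN_Model_1H and the train/test arrays
# defined later in this episode.
results = {}
for n_hidden in (25, 50, 75):                 # step 1: the space to scan
    model = NN_Model_1H(n_hidden, 0.0003)     # step 2: build a new model
    hist = model.fit(train_features, train_L_onehot,
                     epochs=10, batch_size=32,
                     validation_data=(test_features, test_L_onehot),
                     verbose=0)               # step 3: train & validate
    results[n_hidden] = hist.history['val_accuracy'][-1]
    # the loop itself is step 4: repeat for every configuration
# step 5: pick the setting with the best validation accuracy
best_n = max(results, key=results.get)
print(results, 'best:', best_n)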

Preparing Python Environment & the Dataset

Before diving into the model tuning experiments, let us prepare our Python environment in the same way as in the previous episode, then load and preprocess the sherlock_18apps data. (If you have just completed the previous episode on NN modeling with the sherlock_18apps dataset, in the same interactive Python/Jupyter session that you will use for this hands-on activity, you do not need to repeat this step.)

Loading Libraries & sherlock_18apps Dataset

First, load the Python libraries and the sherlock_18apps dataset by running the commands in the Prep_ML.py script. In your current Jupyter session, use the %load magic command to bring the data preparation script into a cell:

%load Prep_ML.py

Press Shift+Enter to execute this command; the contents of the script will be loaded into the active cell. Once loaded, press Shift+Enter once more to execute all the Python commands read from Prep_ML.py and get your environment ready. Please refer to the “Data Preprocessing and Cleaning: A Review” section of the previous episode for the contents of Prep_ML.py and the expected outcome.

As the last step, remember to one-hot encode the labels (Prep_ML.py already takes care of one-hot encoding within the feature matrix):

train_L_onehot = pd.get_dummies(train_labels)
test_L_onehot = pd.get_dummies(test_labels)

Next, load the TensorFlow, Keras, and visualization libraries:

# Import libraries
import tensorflow as tf
import tensorflow.keras as keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import save_model, load_model

import matplotlib.pyplot as plt

Python Library: Gathering Useful Tools into a Toolbox

Programming Challenge: Writing a Function for Data Preprocessing

From this point on, we will program in Python more intensively, as we need to repeat many computations that are very similar (or identical) in nature. The first such case is the data preparation above, which repeats many parts of the program written in the previous episode.

Instead of loading the Prep_ML.py file every time we want to preprocess the dataset, we can create a function that performs the preprocessing for us. We can save this function to a file called ML_toolbox.py, so it can be imported for easy reuse.

def prep_ml(df):
    # ToDo: Summarize the dataset

    # ToDo: Delete irrelevant features and missing or bad data

    # ToDo: Separate labels from features

    # ToDo: Perform one-hot encoding for *all* categorical features

    # ToDo: Perform feature scaling using StandardScaler

    # ToDo: Perform the train-test split on the master dataset

    return train_features, test_features, train_labels, test_labels
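For reference, here is one possible way to fill in this skeleton. This is only a minimal sketch, not this lesson's official solution: the label column name ApplicationName, the cleaning steps, and the split parameters are assumptions that you should adapt to match Prep_ML.py from the previous episode.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def prep_ml(df):
    # Summarize the dataset
    print(df.describe())
    # Delete missing or bad data (assumed cleaning step)
    df = df.dropna()
    # Separate labels from features ('ApplicationName' is a hypothetical column name)
    labels = df['ApplicationName']
    features = df.drop(columns=['ApplicationName'])
    # One-hot encode all categorical features
    features = pd.get_dummies(features)
    # Feature scaling using StandardScaler
    features = pd.DataFrame(StandardScaler().fit_transform(features),
                            columns=features.columns, index=features.index)
    # Train-test split (the test fraction and seed are arbitrary choices)
    train_features, test_features, train_labels, test_labels = \
        train_test_split(features, labels, test_size=0.2, random_state=34)
    return train_features, test_features, train_labels, test_labels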

To test the function, create a new script that imports ML_toolbox.py, runs the function, and then prints out the return values.
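Such a test script could look like the following minimal sketch (the CSV filename below is a placeholder; point it at wherever your copy of the sherlock_18apps data lives):

# test_toolbox.py -- a sketch of the test driver; the CSV path is hypothetical
import pandas as pd
from ML_toolbox import prep_ml

df = pd.read_csv('sherlock_18apps.csv')   # placeholder path to the dataset
train_F, test_F, train_L, test_L = prep_ml(df)
print('train features:', train_F.shape, ' test features:', test_F.shape)
print(train_L.value_counts())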

The Baseline Model

Let us start by building a simple neural network model with one hidden layer. This will serve as a baseline model, which we will attempt to improve through the tuning process below:

def NN_Model_1H(hidden_neurons, learning_rate):
    """Definition of deep learning model with one dense hidden layer"""
    model = Sequential([
        # More hidden layers can be added here
        Dense(hidden_neurons, activation='relu', input_shape=(19,),
              kernel_initializer='random_normal'), # Hidden Layer
        Dense(18, activation='softmax',
              kernel_initializer='random_normal')  # Output Layer
    ])
    adam_opt = Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam_opt,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

Reasoning for the Baseline Model

Why do we use a model with one hidden layer as a baseline, instead of the model with no hidden layer? Discuss this with your peers.

Solution

We usually want to start with a fairly reasonable model as the baseline for tuning. The no-hidden-layer model has no hidden neurons by definition, so it lacks this important hyperparameter, which limits its usefulness as a baseline. We therefore use the one-hidden-layer model as our baseline.

More specifically, the baseline neural network model will have 18 neurons in its hidden layer. It will be trained with the Adam optimizer, a learning rate of 0.0003, and a batch size of 32, for 10 epochs. Let us construct and train this model:

model_1H = NN_Model_1H(18, 0.0003)
model_1H_history = model_1H.fit(train_features,
                                train_L_onehot,
                                epochs=10, batch_size=32,
                                validation_data=(test_features, test_L_onehot),
                                verbose=2)
Epoch 1/10
6827/6827 - 10s - loss: 1.1037 - accuracy: 0.6752 - val_loss: 0.5488 - val_accuracy: 0.8702
Epoch 2/10
6827/6827 - 9s - loss: 0.4071 - accuracy: 0.9047 - val_loss: 0.3205 - val_accuracy: 0.9245
Epoch 3/10
6827/6827 - 9s - loss: 0.2743 - accuracy: 0.9319 - val_loss: 0.2425 - val_accuracy: 0.9385
Epoch 4/10
6827/6827 - 9s - loss: 0.2177 - accuracy: 0.9468 - val_loss: 0.1990 - val_accuracy: 0.9509
Epoch 5/10
6827/6827 - 9s - loss: 0.1818 - accuracy: 0.9592 - val_loss: 0.1692 - val_accuracy: 0.9628
Epoch 6/10
6827/6827 - 7s - loss: 0.1561 - accuracy: 0.9664 - val_loss: 0.1470 - val_accuracy: 0.9671
Epoch 7/10
6827/6827 - 9s - loss: 0.1363 - accuracy: 0.9703 - val_loss: 0.1296 - val_accuracy: 0.9708
Epoch 8/10
6827/6827 - 9s - loss: 0.1209 - accuracy: 0.9740 - val_loss: 0.1171 - val_accuracy: 0.9739
Epoch 9/10
6827/6827 - 9s - loss: 0.1089 - accuracy: 0.9769 - val_loss: 0.1058 - val_accuracy: 0.9770
Epoch 10/10
6827/6827 - 7s - loss: 0.0995 - accuracy: 0.9786 - val_loss: 0.0970 - val_accuracy: 0.9792

Let us visualize the model training history by borrowing the following functions from the previous episode:

def plot_loss(model_history):
    # summarize history for loss
    plt.plot(model_history.history['loss'])
    plt.plot(model_history.history['val_loss'])
    plt.title('Model Loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper right')
    plt.show()

def plot_acc(model_history):
    # summarize history for accuracy
    plt.plot(model_history.history['accuracy'])
    plt.plot(model_history.history['val_accuracy'])
    plt.title('Model Accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()
plot_loss(model_1H_history)
plot_acc(model_1H_history)

Loss function of the baseline (1H18N) model as a function of training iteration

Accuracy of the baseline (1H18N) model as a function of training iteration

History of Model Training in Keras

The History object returned by the model_1H.fit() function in Keras is a record of the training process (see the fit method in https://keras.io/api/models/model_training_apis/). It contains various metrics recorded at each epoch during training, as well as validation metrics if validation data was provided.

The History.history attribute of the History object is a dictionary whose keys represent different metrics, such as ‘loss’, ‘accuracy’, ‘val_loss’, and ‘val_accuracy’. The corresponding values are lists containing the values of these metrics at successive epochs. For easier understanding, we use print(model_1H_history.history) to print out the History.history attribute, as shown below:

{'loss': [1.1036714315414429, 0.4070623219013214, 0.2742951810359955, 0.2177169770002365, 0.18181894719600677, 0.15613007545471191, 0.13633036613464355, 0.12089456617832184, 0.10898750275373459, 0.09969013184309006], 
'accuracy': [0.6751502752304077, 0.9047427177429199, 0.9318596720695496, 0.9468188881874084, 0.9591963887214661, 0.9663967490196228, 0.9702372550964355, 0.9738854765892029, 0.9767922163009644, 0.9783851504325867], 
'val_loss': [0.5488188862800598, 0.32053589820861816, 0.24251176416873932, 0.19900383055210114, 0.16920670866966248, 0.14694736897945404, 0.12959441542625427, 0.11719824373722076, 0.10589829087257385, 0.0972539409995079],
'val_accuracy': [0.8701662421226501, 0.9244909882545471, 0.9384795427322388, 0.950728714466095, 0.9627581834793091, 0.96715247631073, 0.9707595109939575, 0.9740735292434692, 0.9768748879432678, 0.9789988398551941]}
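Because every value in History.history is a list with one entry per epoch, the dictionary converts naturally into a pandas DataFrame for easier inspection; for example:

# View the training history as a table, one row per epoch
hist_df = pd.DataFrame(model_1H_history.history)
print(hist_df.tail())    # metrics of the last few epochs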

Before we attempt more sophisticated improvements, it is imperative to verify that our model training has converged. Here are several common methods for determining whether a model has converged:

  1. Keep training until the change in loss (i.e., the loss value computed with the training set) or accuracy (i.e., the proportion of correctly classified instances) between epochs N and N+1 is less than a predetermined threshold (see the sketch after this list).
  2. Monitor the convergence curve until the curve depicting loss or accuracy levels off or plateaus. This indicates that the model is no longer making significant improvements and has likely converged.
  3. Set a suitable number of training epochs based on experience; the model will typically have converged after this many epochs.
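As a sketch of the first method, the epoch-to-epoch changes can be computed directly from the History object; the threshold below is an arbitrary example, not a recommendation:

import numpy as np

# Convergence check #1 (sketch): compare the last epoch-to-epoch change
# in validation loss against a predetermined threshold.
val_loss = np.array(model_1H_history.history['val_loss'])
deltas = np.diff(val_loss)       # changes between consecutive epochs
threshold = 1e-3                 # example tolerance
print('last change in val_loss: %+.4f' % deltas[-1],
      '| below threshold:', abs(deltas[-1]) < threshold)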

Discuss the Training Convergence

In the training of model_1H above, observe the changes in loss, val_loss, accuracy, and val_accuracy as more epochs unfold.

  • What are the changes in the earliest iterations (e.g. between epochs 1 and 2) and the latter iterations (e.g. between epochs 9 and 10)?
  • How different are the values (and the changes in values) between those estimated from the training set (loss and accuracy) and those from the validation set (val_loss and val_accuracy)?
  • Has the model converged well enough with 10 epochs?

Solutions

The changes in the loss and accuracy at the beginning and end of the training loop are as follows:

Between epochs       change in loss   change in accuracy   change in val_loss   change in val_accuracy
1 –> 2 (beginning)   -0.6966          0.2295               -0.2283              0.0543
9 –> 10 (ending)     -0.0094          0.0017               -0.0088              0.0022

The changes in the loss and accuracy, for both the training and validation datasets, are dramatic at the beginning of the training. By the 10th epoch, the changes have somewhat leveled off: the loss still decreases by about 0.009 per epoch, and the accuracy still increases by about 0.2% per epoch.

Has this model converged? This is difficult to answer a priori (i.e., without prior knowledge). To determine whether our training has achieved good convergence, we should also consider this complementary question: what do you think is the limit of the accuracy of this model, before any tuning? Is this model capable only of 98% accuracy (which it already achieved within 10 epochs)? Can we reach 99%? How about 99.9%? Remember that in cybersecurity, we really want to achieve as high an accuracy as possible to minimize wrong predictions. The best way to judge the convergence of the model is to run a few more training iterations, each time continuing from the previously trained model.

Checking Model Convergence: Train with More Epochs!

To continue the training of a model (model_1H in this case), we simply call the fit function again on the previously trained model. You can decide the number of additional epochs to try; for the second round below, we will train for 15 more epochs:

model_1H_history_p2 = model_1H.fit(train_features,
                                   train_L_onehot,
                                   epochs=15, batch_size=32,
                                   validation_data=(test_features, test_L_onehot),
                                   verbose=2)
Epoch 1/15
6827/6827 - 9s - loss: 0.0921 - accuracy: 0.9799 - val_loss: 0.0906 - val_accuracy: 0.9806
Epoch 2/15
6827/6827 - 9s - loss: 0.0860 - accuracy: 0.9811 - val_loss: 0.0854 - val_accuracy: 0.9821
Epoch 3/15
6827/6827 - 9s - loss: 0.0807 - accuracy: 0.9823 - val_loss: 0.0808 - val_accuracy: 0.9835
Epoch 4/15
6827/6827 - 9s - loss: 0.0761 - accuracy: 0.9840 - val_loss: 0.0760 - val_accuracy: 0.9838
Epoch 5/15
6827/6827 - 9s - loss: 0.0721 - accuracy: 0.9854 - val_loss: 0.0726 - val_accuracy: 0.9850
Epoch 6/15
6827/6827 - 9s - loss: 0.0688 - accuracy: 0.9866 - val_loss: 0.0694 - val_accuracy: 0.9849
Epoch 7/15
6827/6827 - 9s - loss: 0.0659 - accuracy: 0.9874 - val_loss: 0.0666 - val_accuracy: 0.9873
Epoch 8/15
6827/6827 - 9s - loss: 0.0633 - accuracy: 0.9880 - val_loss: 0.0650 - val_accuracy: 0.9867
Epoch 9/15
6827/6827 - 9s - loss: 0.0609 - accuracy: 0.9884 - val_loss: 0.0622 - val_accuracy: 0.9881
Epoch 10/15
6827/6827 - 9s - loss: 0.0588 - accuracy: 0.9887 - val_loss: 0.0597 - val_accuracy: 0.9886
Epoch 11/15
6827/6827 - 9s - loss: 0.0569 - accuracy: 0.9891 - val_loss: 0.0579 - val_accuracy: 0.9888
Epoch 12/15
6827/6827 - 9s - loss: 0.0551 - accuracy: 0.9891 - val_loss: 0.0563 - val_accuracy: 0.9890
Epoch 13/15
6827/6827 - 9s - loss: 0.0535 - accuracy: 0.9894 - val_loss: 0.0552 - val_accuracy: 0.9883
Epoch 14/15
6827/6827 - 9s - loss: 0.0519 - accuracy: 0.9895 - val_loss: 0.0539 - val_accuracy: 0.9888
Epoch 15/15
6827/6827 - 9s - loss: 0.0506 - accuracy: 0.9897 - val_loss: 0.0525 - val_accuracy: 0.9896

Since we are not tuning yet, we must not change the values of any hyperparameters other than epochs. The history from the second round of training is stored in the model_1H_history_p2 object (where p2 stands for “part 2”), which we will need for plotting and analysis.
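If you would like to see the full 25-epoch history as a single curve, the lists stored in the two history objects can simply be concatenated before plotting; a minimal sketch:

# Stitch the two training rounds into one 25-epoch loss curve (sketch)
full_loss = (model_1H_history.history['loss']
             + model_1H_history_p2.history['loss'])
full_val_loss = (model_1H_history.history['val_loss']
                 + model_1H_history_p2.history['val_loss'])
plt.plot(full_loss)
plt.plot(full_val_loss)
plt.title('Model Loss (rounds 1 and 2)')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend(['train', 'val'], loc='upper right')
plt.show()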

Reviewing the Second Training Round

Let us discuss the outcome of the second round of the training by answering the following questions:

  1. At the end of the second call to the fit() function, how many total epochs has this model been trained for?

  2. Compare the loss and accuracy of the model at the end of the first and second rounds of training. Additionally, what are the changes in the loss and accuracy at the end of the second round?

  3. What do you estimate would happen if we continued training with even more epochs?

Solutions

  1. In our case above, after the second round of training, we have trained the model for a total of (10+15) = 25 epochs.

  2. At the end of the first round of training, we got an accuracy of nearly 98%, still increasing by about 0.2% per epoch. After the second round, the accuracy increased to almost 99% (a one-percent improvement, which is not negligible!), and it was still increasing, now by less than 0.1% per epoch. Compare the following two outputs:
    # the end of first fit() training:
    Epoch 10/10
    6827/6827 - 12s - loss: 0.0996 - accuracy: 0.9785 - val_loss: 0.0970 - val_accuracy: 0.9792
    
    # the end of second fit() training:
    Epoch 15/15
    6827/6827 - 13s - loss: 0.0507 - accuracy: 0.9897 - val_loss: 0.0524 - val_accuracy: 0.9896
    
  3. With further training, the accuracy can continue to improve, but at slower and slower rates.

Plot the Progress of the Second Training Round

As a simple exercise, plot the progression of the loss function and accuracy in the second training round (hint: use the values in model_1H_history_p2).

Solution

plot_loss(model_1H_history_p2)
plot_acc(model_1H_history_p2)

Loss function of the baseline (1H18N) model as a function of training iteration (2nd round)

Accuracy of the baseline (1H18N) model as a function of training iteration (2nd round)


For 3rd and 4th Rounds

Hint: You can run more training iterations as you see fit! In other words, model_1H.fit can be called many times; each call starts from the previously optimized model and further refines its parameters.

Solutions

# Try to re-run:

model_1H_history_p3 = model_1H.fit(train_features,
            train_L_onehot,
            epochs=25, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2)
    Epoch 1/25
    6827/6827 - 9s - loss: 0.0492 - accuracy: 0.9899 - val_loss: 0.0510 - val_accuracy: 0.9898
    Epoch 2/25
    6827/6827 - 9s - loss: 0.0480 - accuracy: 0.9902 - val_loss: 0.0506 - val_accuracy: 0.9897
    Epoch 3/25
    6827/6827 - 9s - loss: 0.0467 - accuracy: 0.9904 - val_loss: 0.0493 - val_accuracy: 0.9896
    Epoch 4/25
    6827/6827 - 9s - loss: 0.0456 - accuracy: 0.9905 - val_loss: 0.0478 - val_accuracy: 0.9903
    Epoch 5/25
    6827/6827 - 9s - loss: 0.0446 - accuracy: 0.9906 - val_loss: 0.0470 - val_accuracy: 0.9905
    Epoch 6/25
    6827/6827 - 9s - loss: 0.0436 - accuracy: 0.9907 - val_loss: 0.0460 - val_accuracy: 0.9906
    Epoch 7/25
    6827/6827 - 9s - loss: 0.0427 - accuracy: 0.9908 - val_loss: 0.0455 - val_accuracy: 0.9905
    Epoch 8/25
    6827/6827 - 9s - loss: 0.0419 - accuracy: 0.9908 - val_loss: 0.0449 - val_accuracy: 0.9900
    Epoch 9/25
    6827/6827 - 9s - loss: 0.0412 - accuracy: 0.9909 - val_loss: 0.0441 - val_accuracy: 0.9906
    Epoch 10/25
    6827/6827 - 9s - loss: 0.0404 - accuracy: 0.9910 - val_loss: 0.0428 - val_accuracy: 0.9904
    Epoch 11/25
    6827/6827 - 9s - loss: 0.0396 - accuracy: 0.9911 - val_loss: 0.0423 - val_accuracy: 0.9908
    Epoch 12/25
    6827/6827 - 9s - loss: 0.0389 - accuracy: 0.9912 - val_loss: 0.0417 - val_accuracy: 0.9912
    Epoch 13/25
    6827/6827 - 9s - loss: 0.0383 - accuracy: 0.9913 - val_loss: 0.0412 - val_accuracy: 0.9907
    Epoch 14/25
    6827/6827 - 9s - loss: 0.0375 - accuracy: 0.9913 - val_loss: 0.0408 - val_accuracy: 0.9907
    Epoch 15/25
    6827/6827 - 9s - loss: 0.0370 - accuracy: 0.9915 - val_loss: 0.0398 - val_accuracy: 0.9913
    Epoch 16/25
    6827/6827 - 9s - loss: 0.0361 - accuracy: 0.9918 - val_loss: 0.0391 - val_accuracy: 0.9926
    Epoch 17/25
    6827/6827 - 9s - loss: 0.0355 - accuracy: 0.9920 - val_loss: 0.0384 - val_accuracy: 0.9917
    Epoch 18/25
    6827/6827 - 9s - loss: 0.0347 - accuracy: 0.9923 - val_loss: 0.0382 - val_accuracy: 0.9920
    Epoch 19/25
    6827/6827 - 9s - loss: 0.0340 - accuracy: 0.9925 - val_loss: 0.0372 - val_accuracy: 0.9920
    Epoch 20/25
    6827/6827 - 9s - loss: 0.0333 - accuracy: 0.9925 - val_loss: 0.0367 - val_accuracy: 0.9921
    Epoch 21/25
    6827/6827 - 9s - loss: 0.0327 - accuracy: 0.9928 - val_loss: 0.0360 - val_accuracy: 0.9928
    Epoch 22/25
    6827/6827 - 9s - loss: 0.0320 - accuracy: 0.9931 - val_loss: 0.0354 - val_accuracy: 0.9925
    Epoch 23/25
    6827/6827 - 9s - loss: 0.0313 - accuracy: 0.9932 - val_loss: 0.0350 - val_accuracy: 0.9930
    Epoch 24/25
    6827/6827 - 9s - loss: 0.0306 - accuracy: 0.9935 - val_loss: 0.0336 - val_accuracy: 0.9933
    Epoch 25/25
    6827/6827 - 9s - loss: 0.0299 - accuracy: 0.9936 - val_loss: 0.0340 - val_accuracy: 0.9941
plot_loss(model_1H_history_p3)
plot_acc(model_1H_history_p3)

Loss function of the baseline (1H18N) model as a function of training iteration (3rd round)

Accuracy of the baseline (1H18N) model as a function of training iteration (3rd round)

# Try to re-run:

model_1H_history_p4 = model_1H.fit(train_features,
            train_L_onehot,
            epochs=25, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2)
    Epoch 1/25
    6827/6827 - 9s - loss: 0.0294 - accuracy: 0.9937 - val_loss: 0.0326 - val_accuracy: 0.9935
    Epoch 2/25
    6827/6827 - 9s - loss: 0.0288 - accuracy: 0.9939 - val_loss: 0.0330 - val_accuracy: 0.9930
    Epoch 3/25
    6827/6827 - 9s - loss: 0.0283 - accuracy: 0.9940 - val_loss: 0.0328 - val_accuracy: 0.9936
    Epoch 4/25
    6827/6827 - 9s - loss: 0.0278 - accuracy: 0.9942 - val_loss: 0.0314 - val_accuracy: 0.9938
    Epoch 5/25
    6827/6827 - 9s - loss: 0.0273 - accuracy: 0.9943 - val_loss: 0.0311 - val_accuracy: 0.9939
    Epoch 6/25
    6827/6827 - 9s - loss: 0.0268 - accuracy: 0.9944 - val_loss: 0.0306 - val_accuracy: 0.9939
    Epoch 7/25
    6827/6827 - 9s - loss: 0.0264 - accuracy: 0.9944 - val_loss: 0.0304 - val_accuracy: 0.9944
    Epoch 8/25
    6827/6827 - 9s - loss: 0.0260 - accuracy: 0.9945 - val_loss: 0.0297 - val_accuracy: 0.9943
    Epoch 9/25
    6827/6827 - 9s - loss: 0.0256 - accuracy: 0.9946 - val_loss: 0.0300 - val_accuracy: 0.9940
    Epoch 10/25
    6827/6827 - 9s - loss: 0.0253 - accuracy: 0.9946 - val_loss: 0.0291 - val_accuracy: 0.9950
    Epoch 11/25
    6827/6827 - 9s - loss: 0.0249 - accuracy: 0.9946 - val_loss: 0.0283 - val_accuracy: 0.9952
    Epoch 12/25
    6827/6827 - 9s - loss: 0.0245 - accuracy: 0.9947 - val_loss: 0.0282 - val_accuracy: 0.9945
    Epoch 13/25
    6827/6827 - 9s - loss: 0.0244 - accuracy: 0.9948 - val_loss: 0.0282 - val_accuracy: 0.9942
    Epoch 14/25
    6827/6827 - 9s - loss: 0.0239 - accuracy: 0.9948 - val_loss: 0.0280 - val_accuracy: 0.9940
    Epoch 15/25
    6827/6827 - 9s - loss: 0.0235 - accuracy: 0.9949 - val_loss: 0.0276 - val_accuracy: 0.9945
    Epoch 16/25
    6827/6827 - 9s - loss: 0.0234 - accuracy: 0.9949 - val_loss: 0.0270 - val_accuracy: 0.9953
    Epoch 17/25
    6827/6827 - 9s - loss: 0.0230 - accuracy: 0.9950 - val_loss: 0.0285 - val_accuracy: 0.9945
    Epoch 18/25
    6827/6827 - 9s - loss: 0.0227 - accuracy: 0.9951 - val_loss: 0.0269 - val_accuracy: 0.9948
    Epoch 19/25
    6827/6827 - 9s - loss: 0.0226 - accuracy: 0.9951 - val_loss: 0.0270 - val_accuracy: 0.9946
    Epoch 20/25
    6827/6827 - 9s - loss: 0.0222 - accuracy: 0.9953 - val_loss: 0.0269 - val_accuracy: 0.9949
    Epoch 21/25
    6827/6827 - 9s - loss: 0.0220 - accuracy: 0.9952 - val_loss: 0.0261 - val_accuracy: 0.9949
    Epoch 22/25
    6827/6827 - 9s - loss: 0.0216 - accuracy: 0.9953 - val_loss: 0.0258 - val_accuracy: 0.9949
    Epoch 23/25
    6827/6827 - 9s - loss: 0.0215 - accuracy: 0.9954 - val_loss: 0.0261 - val_accuracy: 0.9952
    Epoch 24/25
    6827/6827 - 9s - loss: 0.0215 - accuracy: 0.9953 - val_loss: 0.0255 - val_accuracy: 0.9948
    Epoch 25/25
    6827/6827 - 9s - loss: 0.0210 - accuracy: 0.9954 - val_loss: 0.0250 - val_accuracy: 0.9956

plot_loss(model_1H_history_p4)
plot_acc(model_1H_history_p4)

Loss function of the baseline (1H18N) model as a function of training iteration (4th round)

Accuracy of the baseline (1H18N) model as a function of training iteration (4th round)


Interesting Observations (for developers)

In the second re-run of fit(), the loss curves for the training and validation data have crossed over. The accuracies on the training and test data still track each other, but with a growing apparent discrepancy; keep in mind, though, that the changes in accuracy between successive epochs are becoming smaller and smaller.

QUESTION: What are other adjustable hyperparameters in this model?


hidden_neurons (the number of neurons in the hidden layer), epochs, and batch_size are three important hyperparameters. The activation function can also be considered a hyperparameter that affects the architecture of the model.

Model Tuning Experiments

Now that we have built and trained the baseline neural network model, we will run a variety of experiments using different combinations of hyperparameters in order to find the best-performing model. Below is a list of hyperparameters that could be interesting to explore; feel free to experiment with your own ideas as well.

We will use NN_Model_1H with 18 neurons in the hidden layer as the baseline. Starting from this model, we will vary one hyperparameter at a time: first the number of hidden neurons, then the learning rate, and so on.

NOTE: The easiest way to do this exploration is simply to copy the code cell where we constructed and trained the baseline model and paste it into a new cell below, since most of the parameters (hidden_neurons, learning_rate, batch_size, etc.) can be changed when calling the NN_Model_1H function or when fitting the model. However, to change the number of hidden layers (which we will do much later), the original NN_Model_1H function must be duplicated and modified, as sketched below.
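For instance, a two-hidden-layer variant could be obtained by duplicating NN_Model_1H and inserting one more Dense layer. The sketch below is our illustration; the function name and its two width parameters are our own choices, not part of the lesson's code:

def NN_Model_2H(hidden_neurons_1, hidden_neurons_2, learning_rate):
    """Sketch: NN_Model_1H duplicated and modified to have two hidden layers"""
    model = Sequential([
        Dense(hidden_neurons_1, activation='relu', input_shape=(19,),
              kernel_initializer='random_normal'),  # Hidden Layer 1
        Dense(hidden_neurons_2, activation='relu',
              kernel_initializer='random_normal'),  # Hidden Layer 2 (new)
        Dense(18, activation='softmax',
              kernel_initializer='random_normal')   # Output Layer
    ])
    adam_opt = Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam_opt,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model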

Tuning Experiments, Part 1: Varying Number of Neurons in Hidden Layers

In this round of experiments, we create several variants of the NN_Model_1H model by varying the hidden_neurons hyperparameter, i.e. the number of neurons in the hidden layer. The loss and accuracy of each model will be assessed as a function of hidden_neurons. All the other hyperparameters (e.g. learning rate, epochs, batch_size, number of hidden layers) will be kept constant; they will be varied later. Not every number of hidden neurons is tested, so feel free to create new code cells with different numbers of neurons as your curiosity leads you.

Going toward FEWER hidden neurons (fewer than the input/output widths)

Model “1H12N”: 12 neurons in the hidden layer
# the model with 12 neurons in the hidden layer 
model_1H12N = NN_Model_1H(12,0.0003)
model_1H12N_history = model_1H12N.fit(train_features,
                                      train_L_onehot,
                                      epochs=10, batch_size=32,
                                      validation_data=(test_features, test_L_onehot),
                                      verbose=2)
plot_loss(model_1H12N_history)
plot_acc(model_1H12N_history)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")
Epoch 1/10
6827/6827 - 8s - loss: 1.1864 - accuracy: 0.6581 - val_loss: 0.6118 - val_accuracy: 0.8622
Epoch 2/10
6827/6827 - 9s - loss: 0.4592 - accuracy: 0.8992 - val_loss: 0.3700 - val_accuracy: 0.9217
Epoch 3/10
6827/6827 - 7s - loss: 0.3286 - accuracy: 0.9277 - val_loss: 0.2997 - val_accuracy: 0.9331
Epoch 4/10
6827/6827 - 7s - loss: 0.2803 - accuracy: 0.9349 - val_loss: 0.2659 - val_accuracy: 0.9381
Epoch 5/10
6827/6827 - 7s - loss: 0.2531 - accuracy: 0.9381 - val_loss: 0.2437 - val_accuracy: 0.9407
Epoch 6/10
6827/6827 - 9s - loss: 0.2328 - accuracy: 0.9413 - val_loss: 0.2253 - val_accuracy: 0.9448
Epoch 7/10
6827/6827 - 7s - loss: 0.2161 - accuracy: 0.9455 - val_loss: 0.2105 - val_accuracy: 0.9507
Epoch 8/10
6827/6827 - 7s - loss: 0.2026 - accuracy: 0.9496 - val_loss: 0.1983 - val_accuracy: 0.9549
Epoch 9/10
6827/6827 - 7s - loss: 0.1908 - accuracy: 0.9540 - val_loss: 0.1871 - val_accuracy: 0.9556
Epoch 10/10
6827/6827 - 9s - loss: 0.1777 - accuracy: 0.9565 - val_loss: 0.1731 - val_accuracy: 0.9593

Loss function of the 1H12N model as a function of training iteration

Accuracy of the 1H12N model as a function of training iteration

Model “1H8N”: 8 neurons in the hidden layer
model_1H8N = NN_Model_1H(8,0.0003)
model_1H8N_history = model_1H8N.fit(train_features,
                                    train_L_onehot,
                                    epochs=10, batch_size=32,
                                    validation_data=(test_features, test_L_onehot),
                                    verbose=2)
plot_loss(model_1H8N_history)
plot_acc(model_1H8N_history)
Epoch 1/10
6827/6827 - 8s - loss: 1.4420 - accuracy: 0.5361 - val_loss: 0.9016 - val_accuracy: 0.7646
Epoch 2/10
6827/6827 - 7s - loss: 0.6984 - accuracy: 0.8269 - val_loss: 0.5523 - val_accuracy: 0.8674
Epoch 3/10
6827/6827 - 8s - loss: 0.4725 - accuracy: 0.8843 - val_loss: 0.4205 - val_accuracy: 0.8979
Epoch 4/10
6827/6827 - 7s - loss: 0.3848 - accuracy: 0.9112 - val_loss: 0.3640 - val_accuracy: 0.9167
Epoch 5/10
6827/6827 - 7s - loss: 0.3445 - accuracy: 0.9195 - val_loss: 0.3347 - val_accuracy: 0.9224
Epoch 6/10
6827/6827 - 7s - loss: 0.3217 - accuracy: 0.9235 - val_loss: 0.3158 - val_accuracy: 0.9258
Epoch 7/10
6827/6827 - 8s - loss: 0.3057 - accuracy: 0.9269 - val_loss: 0.3015 - val_accuracy: 0.9272
Epoch 8/10
6827/6827 - 7s - loss: 0.2935 - accuracy: 0.9302 - val_loss: 0.2909 - val_accuracy: 0.9323
Epoch 9/10
6827/6827 - 7s - loss: 0.2833 - accuracy: 0.9319 - val_loss: 0.2822 - val_accuracy: 0.9339
Epoch 10/10
6827/6827 - 7s - loss: 0.2747 - accuracy: 0.9341 - val_loss: 0.2734 - val_accuracy: 0.9362

Loss function of the 1H8N model as a function of training iteration

Accuracy of the 1H8N model as a function of training iteration


Tips & Tricks for Experimental Runs

Do you see the systematic names of the model and history variables? The variable called model_1H12N means “a model with one hidden layer (1H) that has 12 neurons (12N)”. The use of systematic names, albeit verbose, is very helpful in keeping track of different experiments. For example, further below we will have models with two hidden layers; such a model can be denoted by a variable name such as model_2H18N12N.

DISCUSSION QUESTION: Why don’t we just name the variables model1, model2, model3, …? What are the advantages and disadvantages of naming them with this schema?

Keeping track of experimental results: At this stage, it may be helpful to keep track of the final validation accuracy (after 10 epochs) for each model with a distinct hidden_neurons value. You can use pen and paper, or build a spreadsheet with the following values:

hidden_neurons   val_accuracy
1                ....
...              ...
18               0.9792 (example)
...              ...
80               ....

EXERCISES: Create additional code cells to run models with 4, 2, and 1 neurons in the hidden layer.

Model “1H4N”: 4 neurons in the hidden layer
model_1H4N = NN_Model_1H(4,0.0003)
model_1H4N_history = model_1H4N.fit(train_features,
                                    train_L_onehot,
                                    epochs=10, batch_size=32,
                                    validation_data=(test_features, test_L_onehot),
                                    verbose=2)
plot_loss(model_1H4N_history)
plot_acc(model_1H4N_history)
Epoch 1/10
6827/6827 - 8s - loss: 1.6787 - accuracy: 0.4199 - val_loss: 1.2247 - val_accuracy: 0.5768
Epoch 2/10
6827/6827 - 7s - loss: 1.0693 - accuracy: 0.6315 - val_loss: 0.9432 - val_accuracy: 0.6934
Epoch 3/10
6827/6827 - 8s - loss: 0.8440 - accuracy: 0.7524 - val_loss: 0.7699 - val_accuracy: 0.7884
Epoch 4/10
6827/6827 - 8s - loss: 0.7248 - accuracy: 0.7993 - val_loss: 0.6817 - val_accuracy: 0.8122
Epoch 5/10
6827/6827 - 8s - loss: 0.6571 - accuracy: 0.8325 - val_loss: 0.6291 - val_accuracy: 0.8449
Epoch 6/10
6827/6827 - 8s - loss: 0.6146 - accuracy: 0.8495 - val_loss: 0.5927 - val_accuracy: 0.8491
Epoch 7/10
6827/6827 - 8s - loss: 0.5846 - accuracy: 0.8541 - val_loss: 0.5679 - val_accuracy: 0.8601
Epoch 8/10
6827/6827 - 8s - loss: 0.5640 - accuracy: 0.8572 - val_loss: 0.5498 - val_accuracy: 0.8783
Epoch 9/10
6827/6827 - 8s - loss: 0.5483 - accuracy: 0.8659 - val_loss: 0.5352 - val_accuracy: 0.8762
Epoch 10/10
6827/6827 - 8s - loss: 0.5347 - accuracy: 0.8701 - val_loss: 0.5208 - val_accuracy: 0.8687

Loss function of the 1H4N model as a function of training iteration

Accuracy of the 1H4N model as a function of training iteration

Model “1H2N”: 2 neurons in the hidden layer
model_1H2N = NN_Model_1H(2,0.0003)
model_1H2N_history = model_1H2N.fit(train_features,
                                    train_L_onehot,
                                    epochs=10, batch_size=32,
                                    validation_data=(test_features, test_L_onehot),
                                    verbose=2)
plot_loss(model_1H2N_history)
plot_acc(model_1H2N_history)
Epoch 1/10
6827/6827 - 9s - loss: 2.1385 - accuracy: 0.2973 - val_loss: 1.8072 - val_accuracy: 0.3491
Epoch 2/10
6827/6827 - 7s - loss: 1.6945 - accuracy: 0.3901 - val_loss: 1.6147 - val_accuracy: 0.4122
Epoch 3/10
6827/6827 - 7s - loss: 1.5603 - accuracy: 0.4286 - val_loss: 1.5186 - val_accuracy: 0.4345
Epoch 4/10
6827/6827 - 7s - loss: 1.4834 - accuracy: 0.4416 - val_loss: 1.4552 - val_accuracy: 0.4462
Epoch 5/10
6827/6827 - 7s - loss: 1.4283 - accuracy: 0.4535 - val_loss: 1.4069 - val_accuracy: 0.4610
Epoch 6/10
6827/6827 - 7s - loss: 1.3843 - accuracy: 0.4677 - val_loss: 1.3668 - val_accuracy: 0.4711
Epoch 7/10
6827/6827 - 7s - loss: 1.3467 - accuracy: 0.4811 - val_loss: 1.3322 - val_accuracy: 0.4803
Epoch 8/10
6827/6827 - 7s - loss: 1.3153 - accuracy: 0.4931 - val_loss: 1.3039 - val_accuracy: 0.4937
Epoch 9/10
6827/6827 - 8s - loss: 1.2892 - accuracy: 0.5089 - val_loss: 1.2802 - val_accuracy: 0.5120
Epoch 10/10
6827/6827 - 7s - loss: 1.2678 - accuracy: 0.5205 - val_loss: 1.2608 - val_accuracy: 0.5241

Loss function of the 1H2N model as a function of training iteration

Accuracy of the 1H2N model as a function of training iteration

Model “1H1N”: 1 neuron in the hidden layer
model_1H1N = NN_Model_1H(1,0.0003)
model_1H1N_history = model_1H1N.fit(train_features,
                                    train_L_onehot,
                                    epochs=10, batch_size=32,
                                    validation_data=(test_features, test_L_onehot),
                                    verbose=2)
plot_loss(model_1H1N_history)
plot_acc(model_1H1N_history)
Epoch 1/10
6827/6827 - 8s - loss: 2.3351 - accuracy: 0.2485 - val_loss: 2.1355 - val_accuracy: 0.2752
Epoch 2/10
6827/6827 - 7s - loss: 2.0610 - accuracy: 0.2724 - val_loss: 2.0034 - val_accuracy: 0.2723
Epoch 3/10
6827/6827 - 7s - loss: 1.9741 - accuracy: 0.2745 - val_loss: 1.9494 - val_accuracy: 0.2825
Epoch 4/10
6827/6827 - 7s - loss: 1.9346 - accuracy: 0.2829 - val_loss: 1.9205 - val_accuracy: 0.2857
Epoch 5/10
6827/6827 - 7s - loss: 1.9118 - accuracy: 0.2885 - val_loss: 1.9036 - val_accuracy: 0.2934
Epoch 6/10
6827/6827 - 8s - loss: 1.8976 - accuracy: 0.2937 - val_loss: 1.8930 - val_accuracy: 0.3008
Epoch 7/10
6827/6827 - 7s - loss: 1.8883 - accuracy: 0.3000 - val_loss: 1.8856 - val_accuracy: 0.3009
Epoch 8/10
6827/6827 - 7s - loss: 1.8819 - accuracy: 0.3048 - val_loss: 1.8808 - val_accuracy: 0.3149
Epoch 9/10
6827/6827 - 7s - loss: 1.8774 - accuracy: 0.3097 - val_loss: 1.8772 - val_accuracy: 0.3114
Epoch 10/10
6827/6827 - 8s - loss: 1.8737 - accuracy: 0.3112 - val_loss: 1.8742 - val_accuracy: 0.3104

Loss function of the 1H1N model as a function of training iteration

Accuracy of the 1H1N model as a function of training iteration


Going in the direction of MORE hidden neurons

Models “1H40N” & “1H80N”: 40 & 80 neurons in the hidden layer

EXERCISES: Create more code cells to run models with 40 and 80 neurons in the hidden layer. You are welcome to explore even higher numbers of hidden neurons. Observe carefully what is happening!

model_1H40N = NN_Model_1H(40,0.0003)
model_1H40N_history = model_1H40N.fit(train_features,
                                      train_L_onehot,
                                      epochs=10, batch_size=32,
                                      validation_data=(test_features, test_L_onehot),
                                      verbose=2)
plot_loss(model_1H40N_history)
plot_acc(model_1H40N_history)
Epoch 1/10
6827/6827 - 9s - loss: 0.8427 - accuracy: 0.7706 - val_loss: 0.3632 - val_accuracy: 0.9180
Epoch 2/10
6827/6827 - 9s - loss: 0.2798 - accuracy: 0.9339 - val_loss: 0.2265 - val_accuracy: 0.9456
Epoch 3/10
6827/6827 - 9s - loss: 0.1958 - accuracy: 0.9533 - val_loss: 0.1706 - val_accuracy: 0.9637
Epoch 4/10
6827/6827 - 9s - loss: 0.1519 - accuracy: 0.9658 - val_loss: 0.1364 - val_accuracy: 0.9689
Epoch 5/10
6827/6827 - 9s - loss: 0.1226 - accuracy: 0.9718 - val_loss: 0.1113 - val_accuracy: 0.9733
Epoch 6/10
6827/6827 - 9s - loss: 0.1014 - accuracy: 0.9770 - val_loss: 0.0931 - val_accuracy: 0.9796
Epoch 7/10
6827/6827 - 9s - loss: 0.0864 - accuracy: 0.9805 - val_loss: 0.0810 - val_accuracy: 0.9815
Epoch 8/10
6827/6827 - 9s - loss: 0.0755 - accuracy: 0.9825 - val_loss: 0.0704 - val_accuracy: 0.9822
Epoch 9/10
6827/6827 - 9s - loss: 0.0667 - accuracy: 0.9848 - val_loss: 0.0632 - val_accuracy: 0.9874
Epoch 10/10
6827/6827 - 9s - loss: 0.0596 - accuracy: 0.9875 - val_loss: 0.0570 - val_accuracy: 0.9884

Loss function of the 1H40N model as a function of training iteration

Accuracy of the 1H40N model as a function of training iteration

model_1H80N = NN_Model_1H(80,0.0003)
model_1H80N_history = model_1H80N.fit(train_features,
                                      train_L_onehot,
                                      epochs=10, batch_size=32,
                                      validation_data=(test_features, test_L_onehot),
                                      verbose=2)
plot_loss(model_1H80N_history)
plot_acc(model_1H80N_history)
Epoch 1/10
6827/6827 - 9s - loss: 0.6815 - accuracy: 0.8244 - val_loss: 0.2710 - val_accuracy: 0.9327
Epoch 2/10
6827/6827 - 9s - loss: 0.2048 - accuracy: 0.9492 - val_loss: 0.1580 - val_accuracy: 0.9629
Epoch 3/10
6827/6827 - 9s - loss: 0.1291 - accuracy: 0.9708 - val_loss: 0.1058 - val_accuracy: 0.9786
Epoch 4/10
6827/6827 - 9s - loss: 0.0900 - accuracy: 0.9808 - val_loss: 0.0764 - val_accuracy: 0.9829
Epoch 5/10
6827/6827 - 9s - loss: 0.0669 - accuracy: 0.9865 - val_loss: 0.0587 - val_accuracy: 0.9888
Epoch 6/10
6827/6827 - 9s - loss: 0.0525 - accuracy: 0.9902 - val_loss: 0.0463 - val_accuracy: 0.9915
Epoch 7/10
6827/6827 - 9s - loss: 0.0424 - accuracy: 0.9919 - val_loss: 0.0377 - val_accuracy: 0.9925
Epoch 8/10
6827/6827 - 9s - loss: 0.0351 - accuracy: 0.9931 - val_loss: 0.0326 - val_accuracy: 0.9933
Epoch 9/10
6827/6827 - 9s - loss: 0.0299 - accuracy: 0.9939 - val_loss: 0.0282 - val_accuracy: 0.9943
Epoch 10/10
6827/6827 - 9s - loss: 0.0258 - accuracy: 0.9945 - val_loss: 0.0244 - val_accuracy: 0.9948

Loss function of the 1H80N model as a function of training iteration

Accuracy of the 1H80N model as a function of training iteration

Takeaways from Tuning Experiment #1

In the first experiment above, we tuned the NN_Model_1H model by varying the hidden_neurons hyperparameter.

CHALLENGE QUESTION: Please plot the final model accuracies against the number of hidden neurons.

Hint: you can do this in many ways! If you have kept track of the accuracy vs. hidden_neurons table elsewhere, you can plot the results in spreadsheet software (Google Sheets, Microsoft Excel, etc.). In this Python session, the final model accuracy can be found in the model history objects returned by the fit() calls above. For example, the final accuracy of the model with 12 hidden neurons is found in model_1H12N_history.history['val_accuracy'][-1] (which should be close to 0.96).


"""(Optional) Use this cell to generate the plot of val_acc vs. hidden_neurons:"""

## Example:

# expt1_acc = [
#    (1, model_1H1N_history.history['val_accuracy'][-1]),
#    # ... fill in the other values here
#    (12, model_1H12N_history.history['val_accuracy'][-1]),
#    # ... fill in the other values here
# ]

## Construct a dataframe from expt1_acc

# df_expt1_acc = pd.DataFrame(#TODO)

## Plot the data as an x-y line plot

# df_expt1_acc.plot.line(#TODO)

def get_val_acc(hist):
    return hist.history['val_accuracy'][-1]
model_1H12N_history.history['val_accuracy'][-1]

0.9592793583869934

expt1_acc = [
    (1, get_val_acc(model_1H1N_history)),
    (2, get_val_acc(model_1H2N_history)),
    (4, get_val_acc(model_1H4N_history)),
    (8, get_val_acc(model_1H8N_history)),
    (12, get_val_acc(model_1H12N_history)),
    (18, get_val_acc(model_1H_history)),
    (40, get_val_acc(model_1H40N_history)),
    (80, get_val_acc(model_1H80N_history)),
]
df_expt1_acc = pd.DataFrame(expt1_acc, columns=['hidden_neurons', 'val_accuracy'])
df_expt1_acc
hidden_neurons val_accuracy
0 1 0.310367
1 2 0.524095
2 4 0.868701
3 8 0.936154
4 12 0.959279
5 18 0.979219
6 40 0.988447
7 80 0.994800
df_expt1_acc.plot.line(x='hidden_neurons', y='val_accuracy', style='o-')
plt.title("Tuning Expt #1: Accuracy vs num of hidden neurons")
Text(0.5, 1.0, 'Tuning Expt #1: Accuracy vs num of hidden neurons')

Final validation accuracy vs. the number of hidden neurons (Tuning Experiment #1)

[ d for d in globals() if d.startswith('model_1H') and d.endswith('_history') ]
['model_1H_history',
 'model_1H12N_history',
 'model_1H8N_history',
 'model_1H4N_history',
 'model_1H2N_history',
 'model_1H1N_history',
 'model_1H40N_history',
 'model_1H80N_history']

QUESTIONS: Let us recap what we learned from these experiments by answering the following questions:

In conclusion: In order to improve the accuracy of the model, we should use _____ (more or less?) hidden neurons.

DISCUSSION: What is the optimal value of hidden_neurons that will yield the desired level of accuracy? For example, what value of hidden_neurons will yield 99% model accuracy? How about 99.5%? Can we reach 99.9% accuracy? Keep in mind that neural network model training is very expensive; increasing this hyperparameter may not improve the model significantly!
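One way to approach this question with the table built above: look up the smallest tested hidden_neurons whose final validation accuracy reaches a target. A sketch, using the df_expt1_acc dataframe from the challenge solution:

# Find the smallest tested hidden_neurons reaching a target accuracy (sketch)
target = 0.99
reached = df_expt1_acc[df_expt1_acc['val_accuracy'] >= target]
if reached.empty:
    print('no tested model reached', target)
else:
    print(reached.iloc[0])   # rows are sorted by hidden_neurons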


Deciding an Optimal Hyperparameter

The example above shows a common theme in model tuning: the more neurons we train, the higher the accuracy we can achieve (subject to the risk of overfitting, see below). You should also have observed that, at a large enough hidden_neurons, the model accuracy starts to level off, i.e. adding more neurons no longer gives a significant gain in accuracy.

Since training a neural network model is very expensive, we often have to make a trade-off between doing more training (which can be very costly, so may not be possible) and conserving effort by stopping at the “point of diminishing returns”, i.e. the point where improving the model no longer yields a significant benefit in accuracy.

Where is the point of diminishing returns?

This depends on the application. In some applications we may really want to get as close as possible to 100% accuracy; then we have no choice but to bite the bullet and train more.

Tuning Experiment #2: Varying Learning Rate

In this batch of experiments, the accuracy and loss function of each model will be compared while changing the learning rate. For simplicity, all the other hyperparameters (e.g. the number of neurons, epochs, batch_size, number of hidden layers) will be kept constant. The model with one hidden layer of 18 neurons will be used. Not every learning rate is tested, so feel free to create new code cells with different learning rates.

"""Construct & train a NN_Model_1H with 18 neurons in the hidden layer & learning rate=0.0003""";

#model_1H18N_LR0_0003 = NN_Model_1H(#TODO...)
#model_1H18N_LR0_0003_history = #TODO

# Also plot the loss & accuracy (optional)


… (create additional code cells to run models (1H18N) with larger learning rates: 0.001, 0.01, 0.1)
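A compact way to run this scan is sketched below, with the learning rates suggested above; all other hyperparameters stay at their baseline values (training output is omitted):

# Scan several learning rates for the 1H18N architecture (sketch)
lr_results = {}
for lr in (0.0003, 0.001, 0.01, 0.1):
    m = NN_Model_1H(18, lr)
    h = m.fit(train_features, train_L_onehot,
              epochs=10, batch_size=32,
              validation_data=(test_features, test_L_onehot),
              verbose=0)
    lr_results[lr] = h.history['val_accuracy'][-1]
print(lr_results)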


Model “1H18N” With Learning Rate 0.0003


# Reproducibility hacks!

np.random.seed(968172)
tf.random.set_seed(83018241)
def save_history(model_history, fileName):
    df = pd.DataFrame(model_history.history)
    df.to_csv(fileName)
    return df

def plot_loss(model_history):
    # summarize history for loss
    plt.plot(model_history.history['loss'])
    plt.plot(model_history.history['val_loss'])
    plt.title('Model Loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper right')
    plt.show()

def plot_acc(model_history):
    # summarize history for accuracy
    plt.plot(model_history.history['acc'])
    plt.plot(model_history.history['val_acc'])
    plt.title('Model Accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    
def plot_ML_metrics(model_history, figName):
    """Plots only ML metrics in a panel format (loss func, accuracy)"""
    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,4))
    ax[0].plot(model_history.history['loss'])
    ax[0].plot(model_history.history['val_loss'])
    ax[0].set_title('Model Loss')
    ax[0].set_xlabel('epoch')
    ax[0].set_ylabel('loss')
    ax[0].legend(['train', 'test'], loc='upper left')
    ax[1].plot(model_history.history['acc'])
    ax[1].plot(model_history.history['val_acc'])
    ax[1].set_title('Model Accuracy')
    ax[1].set_xlabel('epoch')
    ax[1].set_ylabel('accuracy')
    ax[1].legend(['train', 'test'], loc='upper left')
    #ax[2].plot(model_history.history['timing'])
    #ax[2].set_title('Model Timing')
    #ax[2].set_xlabel('epoch')
    #ax[2].set_ylabel('timing')
    fig.savefig(figName)
    fig.show()

def plot_all3(model_history, figName):
    """Plots all 3 data in a panel format (loss func, accuracy, and time)"""
    fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15,4))
    ax[0].plot(model_history.history['loss'])
    ax[0].plot(model_history.history['val_loss'])
    ax[0].set_title('Model Loss')
    ax[0].set_xlabel('epoch')
    ax[0].set_ylabel('loss')
    ax[0].legend(['train', 'test'], loc='upper left')
    ax[1].plot(model_history.history['acc'])
    ax[1].plot(model_history.history['val_acc'])
    ax[1].set_title('Model Accuracy')
    ax[1].set_xlabel('epoch')
    ax[1].set_ylabel('accuracy')
    ax[1].legend(['train', 'test'], loc='upper left')
    ax[2].plot(model_history.history['timing'])
    ax[2].set_title('Model Timing')
    ax[2].set_xlabel('epoch')
    ax[2].set_ylabel('timing')
    fig.savefig(figName)
    fig.show()

The class definition below is a custom callback object that stores the timing of each epoch into the model history, since it is not stored there by default.

import time

# Idea from https://stackoverflow.com/questions/54527760/using-tensorflow-how-do-i-find-the-time-taken-for-an-epoch-during-fitting
class epochTiming(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.timings = []
        self.start = 0
        self.stop = 0
    def on_epoch_begin(self, epoch, logs=None):
        # record the wall-clock time at the start of the epoch
        self.start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        # compute the elapsed time, print it, and store it in the epoch's
        # logs so that it appears in the History object under 'timing'
        self.stop = time.time()
        self.timings.append(self.stop - self.start)
        print(self.timings[epoch])
        logs['timing'] = self.timings[epoch]

Model Tuning Methods

1. Varying Number of Neurons

The accuracy and loss of each model will be compared while changing the number of neurons. For simplicity, all other hyperparameters (e.g. learning rate, epochs, batch_size, number of hidden layers) will be kept constant. The one-hidden-layer model will be used, since the zero-hidden-layer model has no hidden neurons by definition. Not every number of hidden neurons is tested, so feel free to create new code cells with different numbers of neurons. The variable file_base is defined for later use when saving image and CSV files.

hidden_neurons = 25 # Change as we go
learning_rate = 0.0003 # Keep constant for now
file_base = 'model_%slr_%shn' % (learning_rate, hidden_neurons) # base file name
model_1H_25N = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_25N_history=model_1H_25N.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Epoch 1/10
11.662319660186768
 - 12s - loss: 0.9649 - acc: 0.7347 - val_loss: 0.4313 - val_acc: 0.8968
Epoch 2/10
10.582706212997437
 - 11s - loss: 0.3147 - acc: 0.9261 - val_loss: 0.2461 - val_acc: 0.9372
Epoch 3/10
10.509607553482056
 - 11s - loss: 0.2122 - acc: 0.9420 - val_loss: 0.1877 - val_acc: 0.9489
Epoch 4/10
10.520312309265137
 - 11s - loss: 0.1683 - acc: 0.9540 - val_loss: 0.1522 - val_acc: 0.9600
Epoch 5/10
10.50895071029663
 - 11s - loss: 0.1383 - acc: 0.9657 - val_loss: 0.1277 - val_acc: 0.9693
Epoch 6/10
10.5454740524292
 - 11s - loss: 0.1175 - acc: 0.9718 - val_loss: 0.1103 - val_acc: 0.9732
Epoch 7/10
10.639808654785156
 - 11s - loss: 0.1020 - acc: 0.9757 - val_loss: 0.0968 - val_acc: 0.9759
Epoch 8/10
10.566517114639282
 - 11s - loss: 0.0897 - acc: 0.9795 - val_loss: 0.0858 - val_acc: 0.9791
Epoch 9/10
10.566912651062012
 - 11s - loss: 0.0796 - acc: 0.9823 - val_loss: 0.0766 - val_acc: 0.9825
Epoch 10/10
10.639428615570068
 - 11s - loss: 0.0714 - acc: 0.9852 - val_loss: 0.0689 - val_acc: 0.9852
type(model_1H_25N_history)
tensorflow.python.keras.callbacks.History
model_1H_25N_history.history
{'loss': [0.964901643646793,
  0.3147217525445136,
  0.21224113260668728,
  0.1682881320809661,
  0.1382946920054715,
  0.11748499731191367,
  0.10201239747299856,
  0.08974006067498123,
  0.0796067609282777,
  0.07139323706390785],
 'acc': [0.73471695,
  0.9261012,
  0.94203085,
  0.9539918,
  0.9656506,
  0.9718165,
  0.97571194,
  0.9794837,
  0.9823493,
  0.985201],
 'val_loss': [0.431267748679461,
  0.24607867231886926,
  0.18767869461481124,
  0.1521837008702025,
  0.12770225922573672,
  0.11027415678389638,
  0.09683312227947044,
  0.08576859160630224,
  0.07656934054796773,
  0.06891448614564732],
 'val_acc': [0.8967519,
  0.93719786,
  0.9489161,
  0.9599751,
  0.969313,
  0.97315806,
  0.9759411,
  0.9790904,
  0.98245937,
  0.9852424],
 'timing': [11.662319660186768,
  10.582706212997437,
  10.509607553482056,
  10.520312309265137,
  10.50895071029663,
  10.5454740524292,
  10.639808654785156,
  10.566517114639282,
  10.566912651062012,
  10.639428615570068]}

The training results can be graphed using one of the plotting functions defined earlier. Using plot_ML_metrics, the loss and accuracy curves will be graphed and saved to a file.

plot_ML_metrics(model_1H_25N_history, file_base + 'metrics.png')

Loss and accuracy of the 1H25N model as functions of training epoch

The model loss and model accuracy graphs mirror each other: the model loss started at a value of about 1 and rapidly decreased with each epoch, whereas the model accuracy started below 0.75, jumped after the first epoch, and then improved at a steadier rate. Lastly, the per-epoch timing stayed roughly the same throughout the training, with the exception of the first epoch.

The last step of this process is to save the history to a CSV file for later use, via the save_history function.

df_1H_25N = save_history(model_1H_25N_history, file_base + '.csv')
df_1H_25N
loss acc val_loss val_acc timing
0 0.964902 0.734717 0.431268 0.896752 11.662320
1 0.314722 0.926101 0.246079 0.937198 10.582706
2 0.212241 0.942031 0.187679 0.948916 10.509608
3 0.168288 0.953992 0.152184 0.959975 10.520312
4 0.138295 0.965651 0.127702 0.969313 10.508951
5 0.117485 0.971816 0.110274 0.973158 10.545474
6 0.102012 0.975712 0.096833 0.975941 10.639809
7 0.089740 0.979484 0.085769 0.979090 10.566517
8 0.079607 0.982349 0.076569 0.982459 10.566913
9 0.071393 0.985201 0.068914 0.985242 10.639429
val_acc_1H_25N = df_1H_25N['val_acc']
val_acc_1H_25N[1:] - np.array(val_acc_1H_25N[:-1])
1    0.040446
2    0.011718
3    0.011059
4    0.009338
5    0.003845
6    0.002783
7    0.003149
8    0.003369
9    0.002783
Name: val_acc, dtype: float64
model_1H_25N_20e = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_25N_20e_history = model_1H_25N_20e.fit(train_features,
            train_L_onehot,
            epochs=20, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/20
12.817866086959839
 - 13s - loss: 0.9649 - acc: 0.7347 - val_loss: 0.4313 - val_acc: 0.8968
Epoch 2/20
12.30324649810791
 - 12s - loss: 0.3147 - acc: 0.9261 - val_loss: 0.2461 - val_acc: 0.9372
Epoch 3/20
12.322084903717041
 - 12s - loss: 0.2122 - acc: 0.9420 - val_loss: 0.1877 - val_acc: 0.9489
Epoch 4/20
12.295423984527588
 - 12s - loss: 0.1683 - acc: 0.9540 - val_loss: 0.1522 - val_acc: 0.9600
Epoch 5/20
12.374478578567505
 - 12s - loss: 0.1383 - acc: 0.9657 - val_loss: 0.1277 - val_acc: 0.9693
Epoch 6/20
12.800549507141113
 - 13s - loss: 0.1175 - acc: 0.9718 - val_loss: 0.1103 - val_acc: 0.9732
Epoch 7/20
12.686912775039673
 - 13s - loss: 0.1020 - acc: 0.9757 - val_loss: 0.0968 - val_acc: 0.9759
Epoch 8/20
12.365178108215332
 - 12s - loss: 0.0897 - acc: 0.9795 - val_loss: 0.0858 - val_acc: 0.9791
Epoch 9/20
12.500024557113647
 - 13s - loss: 0.0796 - acc: 0.9823 - val_loss: 0.0766 - val_acc: 0.9825
Epoch 10/20
12.848876953125
 - 13s - loss: 0.0714 - acc: 0.9852 - val_loss: 0.0689 - val_acc: 0.9852
Epoch 11/20
12.811977624893188
 - 13s - loss: 0.0648 - acc: 0.9869 - val_loss: 0.0630 - val_acc: 0.9864
Epoch 12/20
12.866056442260742
 - 13s - loss: 0.0594 - acc: 0.9880 - val_loss: 0.0581 - val_acc: 0.9877
Epoch 13/20
12.79757571220398
 - 13s - loss: 0.0548 - acc: 0.9887 - val_loss: 0.0539 - val_acc: 0.9881
Epoch 14/20
12.82905125617981
 - 13s - loss: 0.0508 - acc: 0.9894 - val_loss: 0.0504 - val_acc: 0.9890
Epoch 15/20
13.115238428115845
 - 13s - loss: 0.0475 - acc: 0.9900 - val_loss: 0.0474 - val_acc: 0.9895
Epoch 16/20
13.115918397903442
 - 13s - loss: 0.0447 - acc: 0.9902 - val_loss: 0.0446 - val_acc: 0.9901
Epoch 17/20
13.50842833518982
 - 14s - loss: 0.0421 - acc: 0.9906 - val_loss: 0.0422 - val_acc: 0.9904
Epoch 18/20
13.225265264511108
 - 13s - loss: 0.0398 - acc: 0.9911 - val_loss: 0.0401 - val_acc: 0.9909
Epoch 19/20
12.915255784988403
 - 13s - loss: 0.0377 - acc: 0.9916 - val_loss: 0.0380 - val_acc: 0.9912
Epoch 20/20
13.49785828590393
 - 13s - loss: 0.0358 - acc: 0.9921 - val_loss: 0.0363 - val_acc: 0.9918
df_1H_25N_20e = save_history(model_1H_25N_20e_history, file_base + '_20e.csv')
df_1H_25N_20e
loss acc val_loss val_acc timing
0 0.964902 0.734717 0.431268 0.896752 12.817866
1 0.314722 0.926101 0.246079 0.937198 12.303246
2 0.212241 0.942031 0.187679 0.948916 12.322085
3 0.168288 0.953992 0.152184 0.959975 12.295424
4 0.138295 0.965651 0.127702 0.969313 12.374479
5 0.117485 0.971816 0.110274 0.973158 12.800550
6 0.102012 0.975712 0.096833 0.975941 12.686913
7 0.089740 0.979484 0.085769 0.979090 12.365178
8 0.079607 0.982349 0.076569 0.982459 12.500025
9 0.071393 0.985201 0.068914 0.985242 12.848877
10 0.064801 0.986895 0.063006 0.986396 12.811978
11 0.059403 0.987957 0.058100 0.987659 12.866056
12 0.054757 0.988652 0.053907 0.988135 12.797576
13 0.050837 0.989376 0.050407 0.988959 12.829051
14 0.047506 0.989962 0.047355 0.989545 13.115238
15 0.044734 0.990236 0.044609 0.990076 13.115918
16 0.042087 0.990639 0.042161 0.990351 13.508428
17 0.039794 0.991078 0.040074 0.990882 13.225265
18 0.037724 0.991564 0.038004 0.991175 12.915256
19 0.035765 0.992067 0.036256 0.991834 13.497858
val_acc_1H_25N_20e = df_1H_25N_20e['val_acc']
val_acc_1H_25N_20e[1:] - np.array(val_acc_1H_25N_20e[:-1])
1     0.040446
2     0.011718
3     0.011059
4     0.009338
5     0.003845
6     0.002783
7     0.003149
8     0.003369
9     0.002783
10    0.001154
11    0.001263
12    0.000476
13    0.000824
14    0.000586
15    0.000531
16    0.000275
17    0.000531
18    0.000293
19    0.000659
Name: val_acc, dtype: float64

What do we learn from here?

1) The more we train, the more accuracy we can gain (subject to the risk of overfitting; see below).
2) Sometimes we have to make a trade-off between doing more training (which can be very costly, and may not always be possible) and conserving effort once we pass the point of diminishing returns.

Where is the point of diminishing returns? That depends on the application. In some applications we may really want to get as close as possible to 100% accuracy; then we have no choice but to train more (bite the bullet). A quick way to locate the plateau is sketched below.
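
For example, we can flag the first epoch at which the per-epoch gain in validation accuracy drops below a chosen threshold. This is only a minimal sketch: the 0.001 cutoff is an arbitrary illustrative choice, and it reuses the df_1H_25N_20e DataFrame computed above.

# Per-epoch improvement in validation accuracy (first entry is NaN)
gains = df_1H_25N_20e['val_acc'].diff()
threshold = 0.001   # illustrative cutoff: 0.1% improvement per epoch
# First epoch whose improvement falls below the threshold
plateau = gains[gains < threshold].index.min()
print('Improvement first drops below', threshold, 'at epoch', plateau)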

Now let's change the number of hidden neurons to 36 and observe what happens.

hidden_neurons = 36 # Change as we go
learning_rate = 0.0003 # Keep constant for now

model_1H_36N = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_36N_history=model_1H_36N.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
9.579867124557495
 - 10s - loss: 0.8719 - acc: 0.7563 - val_loss: 0.3802 - val_acc: 0.8987
Epoch 2/10
9.064791202545166
 - 9s - loss: 0.2906 - acc: 0.9280 - val_loss: 0.2362 - val_acc: 0.9448
Epoch 3/10
9.14033031463623
 - 9s - loss: 0.2027 - acc: 0.9514 - val_loss: 0.1776 - val_acc: 0.9596
Epoch 4/10
9.1494140625
 - 9s - loss: 0.1582 - acc: 0.9637 - val_loss: 0.1413 - val_acc: 0.9656
Epoch 5/10
9.101559162139893
 - 9s - loss: 0.1281 - acc: 0.9703 - val_loss: 0.1162 - val_acc: 0.9693
Epoch 6/10
9.147216081619263
 - 9s - loss: 0.1063 - acc: 0.9737 - val_loss: 0.0976 - val_acc: 0.9724
Epoch 7/10
9.136689901351929
 - 9s - loss: 0.0907 - acc: 0.9770 - val_loss: 0.0844 - val_acc: 0.9784
Epoch 8/10
9.14473295211792
 - 9s - loss: 0.0790 - acc: 0.9799 - val_loss: 0.0742 - val_acc: 0.9800
Epoch 9/10
9.155896186828613
 - 9s - loss: 0.0698 - acc: 0.9829 - val_loss: 0.0662 - val_acc: 0.9832
Epoch 10/10
9.205716133117676
 - 9s - loss: 0.0624 - acc: 0.9852 - val_loss: 0.0595 - val_acc: 0.9876
plot_all(model_1H_36N_history, file_base + '.png')

(figure: training history plot for model_1H_36N)

Comparing the 25-neuron and 36-neuron models, what has changed?

Now, as an exercise, let's change the number of hidden neurons to 64, 80, and 150 and compare the results. Beyond a certain point, does increasing the number of hidden neurons still make the model more accurate? What changes do you see along the way? A sketch of such a scan follows.
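
A minimal sketch of the scan (assuming NN_Model_1H and the training/test arrays from the earlier cells are still in scope; verbose=0 suppresses the per-epoch log):

# Scan the hidden-layer width and report the final validation accuracy
for hn in [64, 80, 150]:
    m = NN_Model_1H(hn, 0.0003)
    h = m.fit(train_features, train_L_onehot,
              epochs=10, batch_size=32,
              validation_data=(test_features, test_L_onehot),
              verbose=0, shuffle=False)
    print(hn, 'neurons -> final val_acc:', h.history['val_acc'][-1])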

Flexible Neural Network Model Generator

The flexible neural network model generator below allows one to customize the number of layers, the number of neurons per layer, and the activation function to use (except for the last layer, which always uses softmax).

# Multiple hidden-layer model (flexible number of layers)
def NN_Model_generate_seq(hidden_neurons, learning_rate, activation='relu', k_init_seeds=range(1001, 2000)):
    """Deep learning model with a flexible number of dense layers.

    `hidden_neurons` lists the width of every layer, including the output
    layer (18 for our classifier), which always uses softmax activation.
    """
    layers = []
    kseeds = list(k_init_seeds)
    first = True  # the first layer additionally needs the input shape
    for (i, HN) in enumerate(hidden_neurons):
        k_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=kseeds[i])
        act_f = activation if (i+1 < len(hidden_neurons)) else 'softmax'  # output activation is always softmax
        if first:
            L = keras.layers.Dense(HN, activation=act_f, kernel_initializer=k_init, input_shape=(19,))
            first = False
        else:
            L = keras.layers.Dense(HN, activation=act_f, kernel_initializer=k_init)
        layers.append(L)  # collect the layer for the Sequential model

    model = keras.models.Sequential(layers)
    adam = keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
    return model
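
For example, a model with two 25-neuron hidden layers plus the 18-way softmax output layer can be generated as follows (the variable name model_2H_flex is purely illustrative):

# Two hidden layers of 25 neurons; the trailing 18 is the output layer
model_2H_flex = NN_Model_generate_seq([25, 25, 18], learning_rate=0.0003)
model_2H_flex.summary()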

2. Varying Learning Rate

Next, the learning rate is changed while all other hyperparameters are kept constant. A one-hidden-layer configuration with 25 hidden neurons will be used.

hidden_neurons = 25 # Keep constant
learning_rate = 0.0003 # Change
model_1H_25N_00003LR = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_25N_00003LR_history=model_1H_25N_00003LR.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
9.745660305023193
 - 10s - loss: 0.9649 - acc: 0.7347 - val_loss: 0.4313 - val_acc: 0.8968
Epoch 2/10
9.350954294204712
 - 9s - loss: 0.3147 - acc: 0.9261 - val_loss: 0.2461 - val_acc: 0.9372
Epoch 3/10
9.2421555519104
 - 9s - loss: 0.2122 - acc: 0.9420 - val_loss: 0.1877 - val_acc: 0.9489
Epoch 4/10
9.254833936691284
 - 9s - loss: 0.1683 - acc: 0.9540 - val_loss: 0.1522 - val_acc: 0.9600
Epoch 5/10
9.265074968338013
 - 9s - loss: 0.1383 - acc: 0.9657 - val_loss: 0.1277 - val_acc: 0.9693
Epoch 6/10
9.288970232009888
 - 9s - loss: 0.1175 - acc: 0.9718 - val_loss: 0.1103 - val_acc: 0.9732
Epoch 7/10
9.288819074630737
 - 9s - loss: 0.1020 - acc: 0.9757 - val_loss: 0.0968 - val_acc: 0.9759
Epoch 8/10
9.35093355178833
 - 9s - loss: 0.0897 - acc: 0.9795 - val_loss: 0.0858 - val_acc: 0.9791
Epoch 9/10
9.327398777008057
 - 9s - loss: 0.0796 - acc: 0.9823 - val_loss: 0.0766 - val_acc: 0.9825
Epoch 10/10
9.357542037963867
 - 9s - loss: 0.0714 - acc: 0.9852 - val_loss: 0.0689 - val_acc: 0.9852
plot_all(model_1H_25N_00003LR_history, file_base + '.png')

(figure: training history plot for model_1H_25N_00003LR)

hidden_neurons = 25 # Keep constant
learning_rate = 0.03 # Change

model_1H_25N_003LR = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_25N_003LR_history=model_1H_25N_003LR.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
10.399159669876099
 - 10s - loss: 0.1542 - acc: 0.9642 - val_loss: 0.1222 - val_acc: 0.9770
Epoch 2/10
10.096969842910767
 - 10s - loss: 0.1069 - acc: 0.9822 - val_loss: 0.1217 - val_acc: 0.9788
Epoch 3/10
10.215007543563843
 - 10s - loss: 0.1069 - acc: 0.9849 - val_loss: 0.0950 - val_acc: 0.9902
Epoch 4/10
10.269595861434937
 - 10s - loss: 0.0978 - acc: 0.9871 - val_loss: 0.1637 - val_acc: 0.9821
Epoch 5/10
10.294110298156738
 - 10s - loss: 0.0918 - acc: 0.9882 - val_loss: 0.0891 - val_acc: 0.9856
Epoch 6/10
10.300595998764038
 - 10s - loss: 0.0986 - acc: 0.9883 - val_loss: 0.0595 - val_acc: 0.9933
Epoch 7/10
10.320054292678833
 - 10s - loss: 0.1034 - acc: 0.9880 - val_loss: 0.0710 - val_acc: 0.9899
Epoch 8/10
10.337472677230835
 - 10s - loss: 0.1041 - acc: 0.9885 - val_loss: 0.0779 - val_acc: 0.9918
Epoch 9/10
10.33635425567627
 - 10s - loss: 0.0986 - acc: 0.9896 - val_loss: 0.1064 - val_acc: 0.9854
Epoch 10/10
10.342047691345215
 - 10s - loss: 0.1024 - acc: 0.9897 - val_loss: 0.0934 - val_acc: 0.9901
plot_all(model_1H_25N_003LR_history, file_base + '.png')

(figure: training history plot for model_1H_25N_003LR)

As an experiment, run the model with other learning rates such as 0.003, 0.3, and 3.

Did you notice something odd about the last two experiments? Why did the model behave erratically?
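
The instability can also be quantified: the epoch-to-epoch jumps in validation loss are far larger for the high learning rate. A minimal sketch, assuming the two history objects from the cells above:

import numpy as np  # already imported earlier in the notebook

# Standard deviation of epoch-to-epoch changes in val_loss per run
for name, hist in [('lr=0.0003', model_1H_25N_00003LR_history),
                   ('lr=0.03', model_1H_25N_003LR_history)]:
    val_loss = np.array(hist.history['val_loss'])
    print(name, 'val_loss fluctuation:', np.diff(val_loss).std())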

3. Varying Number of Hidden Layers

Now we can put the previously defined model functions with different numbers of hidden layers to good use. The number of hidden neurons and the learning rate will be kept constant at 25 and 0.0003, respectively.

hidden_neurons = 25 # Constant
learning_rate = 0.0003 # Constant

No Hidden Layer

model_0H_25N = NN_Model_0H(learning_rate)
model_0H_25N_history=model_0H_25N.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
9.44406008720398
 - 9s - loss: 1.6992 - acc: 0.5481 - val_loss: 1.2650 - val_acc: 0.6864
Epoch 2/10
8.840239524841309
 - 9s - loss: 1.1140 - acc: 0.7376 - val_loss: 1.0039 - val_acc: 0.7621
Epoch 3/10
8.877403497695923
 - 9s - loss: 0.9271 - acc: 0.7866 - val_loss: 0.8685 - val_acc: 0.7893
Epoch 4/10
8.862443447113037
 - 9s - loss: 0.8179 - acc: 0.8051 - val_loss: 0.7801 - val_acc: 0.8058
Epoch 5/10
8.84777545928955
 - 9s - loss: 0.7428 - acc: 0.8176 - val_loss: 0.7165 - val_acc: 0.8167
Epoch 6/10
8.850785970687866
 - 9s - loss: 0.6875 - acc: 0.8271 - val_loss: 0.6684 - val_acc: 0.8253
Epoch 7/10
8.828944683074951
 - 9s - loss: 0.6446 - acc: 0.8380 - val_loss: 0.6301 - val_acc: 0.8417
Epoch 8/10
9.015966653823853
 - 9s - loss: 0.6101 - acc: 0.8573 - val_loss: 0.5990 - val_acc: 0.8584
Epoch 9/10
8.916113376617432
 - 9s - loss: 0.5818 - acc: 0.8699 - val_loss: 0.5732 - val_acc: 0.8722
Epoch 10/10
8.908658266067505
 - 9s - loss: 0.5581 - acc: 0.8775 - val_loss: 0.5515 - val_acc: 0.8759
plot_all(model_0H_25N_history, file_base + '.png')

(figure: training history plot for model_0H_25N)

One Hidden Layer

model_1H_25N = NN_Model_1H(hidden_neurons, learning_rate)
model_1H_25N_history=model_1H_25N.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
10.340590953826904
 - 10s - loss: 0.9649 - acc: 0.7347 - val_loss: 0.4313 - val_acc: 0.8968
Epoch 2/10
9.678043127059937
 - 10s - loss: 0.3147 - acc: 0.9261 - val_loss: 0.2461 - val_acc: 0.9372
Epoch 3/10
9.683821678161621
 - 10s - loss: 0.2122 - acc: 0.9420 - val_loss: 0.1877 - val_acc: 0.9489
Epoch 4/10
9.708028078079224
 - 10s - loss: 0.1683 - acc: 0.9540 - val_loss: 0.1522 - val_acc: 0.9600
Epoch 5/10
9.712745428085327
 - 10s - loss: 0.1383 - acc: 0.9657 - val_loss: 0.1277 - val_acc: 0.9693
Epoch 6/10
9.731047630310059
 - 10s - loss: 0.1175 - acc: 0.9718 - val_loss: 0.1103 - val_acc: 0.9732
Epoch 7/10
9.759651899337769
 - 10s - loss: 0.1020 - acc: 0.9757 - val_loss: 0.0968 - val_acc: 0.9759
Epoch 8/10
9.786976337432861
 - 10s - loss: 0.0897 - acc: 0.9795 - val_loss: 0.0858 - val_acc: 0.9791
Epoch 9/10
9.790604829788208
 - 10s - loss: 0.0796 - acc: 0.9823 - val_loss: 0.0766 - val_acc: 0.9825
Epoch 10/10
9.79500699043274
 - 10s - loss: 0.0714 - acc: 0.9852 - val_loss: 0.0689 - val_acc: 0.9852
plot_all(model_1H_25N_history, file_base + '.png')

(figure: training history plot for model_1H_25N)

Two Hidden Layers

model_2H_25N = NN_Model_2H(hidden_neurons, learning_rate)
model_2H_25N_history=model_2H_25N.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, shuffle=False, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
11.330268859863281
 - 11s - loss: 0.8048 - acc: 0.7636 - val_loss: 0.3162 - val_acc: 0.9346
Epoch 2/10
10.720322132110596
 - 11s - loss: 0.2359 - acc: 0.9482 - val_loss: 0.1851 - val_acc: 0.9607
Epoch 3/10
10.687806129455566
 - 11s - loss: 0.1547 - acc: 0.9649 - val_loss: 0.1288 - val_acc: 0.9703
Epoch 4/10
10.726382970809937
 - 11s - loss: 0.1116 - acc: 0.9742 - val_loss: 0.0951 - val_acc: 0.9763
Epoch 5/10
10.76809549331665
 - 11s - loss: 0.0856 - acc: 0.9812 - val_loss: 0.0742 - val_acc: 0.9813
Epoch 6/10
10.811238765716553
 - 11s - loss: 0.0692 - acc: 0.9861 - val_loss: 0.0614 - val_acc: 0.9857
Epoch 7/10
10.848239421844482
 - 11s - loss: 0.0582 - acc: 0.9885 - val_loss: 0.0527 - val_acc: 0.9886
Epoch 8/10
10.884867429733276
 - 11s - loss: 0.0503 - acc: 0.9898 - val_loss: 0.0459 - val_acc: 0.9895
Epoch 9/10
10.871060132980347
 - 11s - loss: 0.0439 - acc: 0.9910 - val_loss: 0.0408 - val_acc: 0.9904
Epoch 10/10
11.004343748092651
 - 11s - loss: 0.0392 - acc: 0.9920 - val_loss: 0.0369 - val_acc: 0.9918
plot_all(model_2H_25N_history, file_base + '.png')

(figure: training history plot for model_2H_25N)
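
To compare the three depths side by side, we can pull the final validation accuracy out of each history object (a sketch reusing the three histories above):

# Final validation accuracy for 0, 1, and 2 hidden layers
for name, hist in [('0 hidden layers', model_0H_25N_history),
                   ('1 hidden layer',  model_1H_25N_history),
                   ('2 hidden layers', model_2H_25N_history)]:
    print(name, ':', hist.history['val_acc'][-1])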

Other hyperparameters for tuning the model

1. Activation Functions

As introduced in the neural network lesson “2. Overview of Deep Neural Network Concepts”, an activation function is a nonlinear function applied to a neuron’s output during the forward propagation of each epoch. Some common activation functions are softmax, ReLU, and sigmoid.

To get familiar with layer properties, let’s examine the Dense layer to see its properties and the role of the activation function.

Try Dense?

Now, let’s change our one-hidden-layer function so that the user can pass a different activation function as an input argument:

# One hidden layer model, now with a selectable activation function
def NN_Model_1H(hidden_neurons, learning_rate, activ_func):
    """Definition of deep learning model with one dense hidden layer"""
    model = Sequential([
        # More hidden layers can be added here
        Dense(hidden_neurons, activation=activ_func, input_shape=(19,),
              kernel_initializer='random_normal'),  # Hidden Layer
        Dense(18, activation='softmax')  # Output Layer
    ])
    adam = tf.keras.optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
    return model

hidden_neurons = 25
learning_rate = 0.0003
activ_func = 'sigmoid'

model_1H_25N_sigmoid = NN_Model_1H(hidden_neurons, learning_rate, activ_func)
model_1H_25N_sigmoid_hist=model_1H_25N_sigmoid.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, callbacks=[epochTiming()])
Train on 218461 samples, validate on 54616 samples
Epoch 1/10
10.653699398040771
 - 11s - loss: 1.7104 - acc: 0.4627 - val_loss: 1.1281 - val_acc: 0.6716
Epoch 2/10
10.089034080505371
 - 10s - loss: 0.8643 - acc: 0.7637 - val_loss: 0.6650 - val_acc: 0.8316
Epoch 3/10
10.155736207962036
 - 10s - loss: 0.5434 - acc: 0.8769 - val_loss: 0.4538 - val_acc: 0.9048
Epoch 4/10
10.080292224884033
 - 10s - loss: 0.3940 - acc: 0.9147 - val_loss: 0.3504 - val_acc: 0.9208
Epoch 5/10
10.075173139572144
 - 10s - loss: 0.3178 - acc: 0.9263 - val_loss: 0.2944 - val_acc: 0.9272
Epoch 6/10
10.07681941986084
 - 10s - loss: 0.2742 - acc: 0.9321 - val_loss: 0.2596 - val_acc: 0.9360
Epoch 7/10
10.05233883857727
 - 10s - loss: 0.2454 - acc: 0.9370 - val_loss: 0.2352 - val_acc: 0.9374
Epoch 8/10
10.064870119094849
 - 10s - loss: 0.2241 - acc: 0.9404 - val_loss: 0.2166 - val_acc: 0.9418
Epoch 9/10
10.0713951587677
 - 10s - loss: 0.2073 - acc: 0.9444 - val_loss: 0.2014 - val_acc: 0.9505
Epoch 10/10
10.076664447784424
 - 10s - loss: 0.1931 - acc: 0.9496 - val_loss: 0.1877 - val_acc: 0.9505
df_sig = save_history(model_1H_25N_sigmoid_hist, 'model_1H_25N_sigmoid.csv')
df_sig
loss acc val_loss val_acc timing
0 1.710428 0.462728 1.128063 0.671598 10.653699
1 0.864262 0.763720 0.664978 0.831588 10.089034
2 0.543418 0.876921 0.453834 0.904826 10.155736
3 0.394032 0.914722 0.350401 0.920792 10.080292
4 0.317782 0.926289 0.294365 0.927201 10.075173
5 0.274197 0.932075 0.259596 0.935989 10.076819
6 0.245356 0.936959 0.235241 0.937418 10.052339
7 0.224126 0.940442 0.216603 0.941812 10.064870
8 0.207300 0.944448 0.201437 0.950546 10.071395
9 0.193108 0.949616 0.187718 0.950509 10.076664

What will happen if we change the activation function in the above experiment to other functions such as 'softmax' or 'relu'?
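
For example, the ReLU variant uses the Keras identifier 'relu' (the variable names below are purely illustrative):

# Same experiment, but with ReLU in the hidden layer
model_1H_25N_relu = NN_Model_1H(25, 0.0003, 'relu')
model_1H_25N_relu_hist = model_1H_25N_relu.fit(train_features,
            train_L_onehot,
            epochs=10, batch_size=32,
            validation_data=(test_features, test_L_onehot),
            verbose=2, callbacks=[epochTiming()])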

2. Optimizers

In neural networks, one key component is the adjustment of the weights through a process called backpropagation, which relies on a gradient descent algorithm. Different optimizers refine how these weight updates are performed, and the choice of optimizer shapes how the network learns. Let’s see how many optimizers Keras provides with the following command:

tf.keras.optimizers??

The Adam optimizer is one of the most highly recommended algorithms for the backpropagation process in neural networks, but this does not mean other algorithms are not good. It is recommended to choose an optimizer based on the properties of each algorithm and the goal of the neural network. Swapping the optimizer only requires changing the compile step, as sketched below.
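
A minimal sketch of the same one-hidden-layer architecture compiled with the standard Keras SGD optimizer instead of Adam (the learning rate and momentum values are illustrative choices, not tuned settings):

# Compile the model with SGD instead of Adam
sgd = tf.keras.optimizers.SGD(lr=0.01, momentum=0.9)
model_1H_25N_sgd = Sequential([
    Dense(25, activation='relu', input_shape=(19,),
          kernel_initializer='random_normal'),
    Dense(18, activation='softmax')
])
model_1H_25N_sgd.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])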

Underfitting and Overfitting

Overfitting occurs when the model performs well on the training data but poorly on the validation data. Underfitting, on the other hand, occurs when the model performs poorly on both the training and validation data. The goal is to “fit” the model in between underfitting and overfitting. Generally, overfitting is much more common than underfitting: underfitting is usually just a result of too few training epochs, whereas overfitting can creep in quietly as the model starts memorizing the training samples instead of learning patterns that generalize.

More on overfitting and underfitting: Overfitting and Underfitting
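
One common safeguard against overfitting is early stopping: halt training once the validation loss stops improving. A minimal sketch using the standard Keras EarlyStopping callback (the patience value is an illustrative choice):

# Stop training if val_loss has not improved for 3 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
# ...then pass it to training: model.fit(..., callbacks=[early_stop])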

Summary

By going through this notebook, it becomes obvious that running a deep learning experiment in a Jupyter notebook gets messy and laborious. By the end of this lesson you will learn how to use scripting, together with the power of HPC, to alleviate the pain of executing a Jupyter notebook block by block and to make these experiments faster.

How was the experiment of model tuning so far? Frustrating? Confusing? Let’s be honest: any machine learning experiment carries some level of uncertainty when dealing with unseen data, especially when you don’t know how your model will respond to your features. This is what makes model tuning a must. In this notebook, we learned about different neural network hyperparameters, tuned some of them (the number of hidden layers, the number of hidden neurons, the learning rate, and so on), and monitored their effects on timing and accuracy. These timing and accuracy results were also visualized graphically and saved in CSV format for later use.

Now, let’s shift our attention from the model tuning process to the platform on which we ran our experiments. The Jupyter notebook is an excellent platform for writing code, experimenting, and inspecting results, but the cell-by-cell execution of commands quickly becomes tedious. As noted above, experiments like the ones we ran can get messy in a Jupyter notebook, so scripting is highly recommended for this kind of work.

In the next lesson, we will show how to convert an existing Jupyter notebook into a standalone Python script (for example, with jupyter nbconvert) that can be executed on HPC without constant interaction from the user.


Key Points

  • Neural network models are tuned by tweaking the architecture and tuning the training hyperparameters.

  • Converting a notebook to a Python script lets you submit long tuning experiments to an HPC job scheduler.