DeapSECURE module 4: Deep Learning
Welcome to the DeapSECURE online training program! This is a Jupyter notebook for the hands-on learning activities of the "Deep Learning" module, episode 4. Please visit the DeapSECURE website to learn more about our training program.
In this notebook, we will learn how to use the Keras framework to build a very simple "binary classification model". We will build a one-neuron model to perform the "application classification task" using the SherLock "2-apps" dataset introduced in the "Machine Learning" module. A single neuron is the simplest neural network model for this classification task, because only one output is needed to distinguish the two different apps.
QUICK LINKS
If you are opening this notebook from the Wahab OnDemand interface, you're all set.
If you see this notebook elsewhere and want to perform the exercises on the Wahab cluster, please follow the steps outlined in our setup procedure.
Get the necessary files using commands below within Jupyter:
mkdir -p ~/CItraining/module-nn
cp -pr /shared/DeapSECURE/module-nn/. ~/CItraining/module-nn
cd ~/CItraining/module-nn
The file name of this notebook is NN-session-1.ipynb.
Throughout this notebook, #TODO is used as a placeholder where you need to fill in something appropriate.
To run the code in a cell, press Shift+Enter.
Summary table of the commonly used indexing syntax from our own lesson.
We recommend you open these in separate tabs or print them; they are a handy reference when writing your own code.
Next, we need to import the required libraries into this Jupyter Notebook: pandas, numpy, matplotlib.pyplot, sklearn, and tensorflow.
For the Wahab cluster only: before importing these libraries, we have to load the DeapSECURE environment module:
# Run to load environment modules on HPC
module("load", "DeapSECURE")
A few additional modules need to be loaded:
* cuda module, for computations on the GPU
* py-tensorflow module, for TensorFlow and Keras
module("load", "cuda")
module("load", "py-tensorflow")
module("list")
Now we can import all the required modules into Python:
"""Import the necessary Python modules""";
import os
import sys
import pandas
import numpy
import seaborn
from matplotlib import pyplot
import sklearn
# tools for machine learning:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
# for evaluating model performance
from sklearn.metrics import accuracy_score, confusion_matrix
# classic machine learning models:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
# TensorFlow
import tensorflow
import tensorflow.keras as keras
# KERAS objects
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers
%matplotlib inline
# Some advanced learners may like to use shortcuts,
# so we give them here:
pd = pandas
np = numpy
plt = pyplot
sns = seaborn
import tensorflow as tf
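As an optional sanity check that the environment loaded correctly, we can print the versions of the key libraries:
# Optional sanity check: print the versions of the key libraries
print("pandas      :", pd.__version__)
print("numpy       :", np.__version__)
print("scikit-learn:", sklearn.__version__)
print("tensorflow  :", tf.__version__)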
First, we load the SherLock's "2-apps" preprocessed features and labels into DataFrames. We use the reduced set of features saved at the end of the "Machine Learning" module.
df2_features = pd.read_csv('sherlock/2apps_4f/sherlock_2apps_features.csv')
df2_labels = pd.read_csv('sherlock/2apps_4f/sherlock_2apps_labels.csv')
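It is a good habit to peek at freshly loaded data; for example:
# Quick look at the loaded features (output will vary with your copy of the dataset)
print("features:", df2_features.shape)
df2_features.head()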
After preprocessing and feature selection, we only have four features, namely: cutime, num_threads, otherPrivateDirty, priority.
The label has two values: 0 representing Facebook, and 1 representing WhatsApp.
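Before training, it is worth confirming the class balance. Here is a quick sketch; we use iloc so that it works regardless of the label column's name:
# Count samples per class (0 = Facebook, 1 = WhatsApp);
# iloc[:, 0] selects the first (label) column by position
df2_labels.iloc[:, 0].value_counts()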
As we did in the ML module, we first split the data into training and testing sets.
train_F, test_F, train_L, test_L = train_test_split(df2_features, df2_labels, test_size=0.2)
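To verify the 80/20 split went as expected, we can inspect the shapes of the resulting sets:
# Verify the 80/20 split: shapes of features and labels should be consistent
print("train_F:", train_F.shape, " train_L:", train_L.shape)
print("test_F :", test_F.shape, " test_L :", test_L.shape)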
Keras is a powerful, high-level framework to develop and deploy neural network models in Python. Keras is intuitive to use, allowing rapid prototyping, experimentation, as well as deployment of deep learning models for real-world problems. Keras began as a high-level interface to several lower-level software frameworks such as Theano and TensorFlow; however, newer versions are built exclusively for TensorFlow. In this notebook, we show how easy it is to define, train, evaluate, and deploy neural networks with Keras.
The steps involved in deep learning are very similar to the steps of traditional machine learning: (1) prepare the data, (2) define the model, (3) train the model, (4) evaluate the model's performance, and (5) deploy the model to make predictions. Of these steps, the second and third will require Keras-specific objects and functions.
Keras model object
There are mainly two ways to build models in Keras: the Sequential model and the functional model.
As the name suggests, the Sequential model builds a model layer by layer, where the outputs of one layer simply connect to the inputs of the next layer. Please refer to the Keras documentation on the Sequential model to learn more.
Limitation of a sequential model: it cannot describe networks with shared layers, nor layers with multiple inputs and/or outputs.
The functional model provides a way to create arbitrarily complicated models, including those with shared layers or with multiple inputs and/or outputs. In this series of notebooks, we will focus on the Keras Sequential model. Once we understand how to build a network with the Sequential model, it is straightforward to learn the functional model.
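For contrast, here is a small sketch of how the same one-neuron network would look in the functional style; we include it only for illustration and will not use it in this lesson:
# The same one-neuron network, expressed with the functional API
from tensorflow.keras import Input, Model

inputs = Input(shape=(4,))                         # four input features
outputs = Dense(1, activation='sigmoid')(inputs)   # one sigmoid neuron
functional_model = Model(inputs=inputs, outputs=outputs)
functional_model.summary()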
Let us create a function to construct a neural network with Keras.
This model has four inputs defined by the SherLock "2-apps" dataset and one output to distinguish between the two applications: Facebook and WhatsApp.
This function will be called NN_binary_clf (clf is short for "classifier"):
def NN_binary_clf(learning_rate):
"""Create a one-neuron binary classifier using Keras"""
model = Sequential([
Dense(1, activation='sigmoid', input_shape=(4,))
])
adam = tf.keras.optimizers.Adam(learning_rate=learning_rate,
beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam,
loss='binary_crossentropy',
metrics=['accuracy'])
return model
This function builds a Sequential model (full object name: tensorflow.keras.models.Sequential).
The model has only one layer defined by this declaration:
Dense(1, activation='sigmoid', input_shape=(4,))
The Dense function declares a regular fully-connected neural layer, which can be a hidden layer or an output layer.
The arguments have the following meaning:
* 1: the number of outputs from this layer, which also defines the number of fully connected neurons in this layer.
* activation='sigmoid': defines the (nonlinear) activation function used to transform the weighted sum of the input values to the output value.
* input_shape=(4,): defines that this layer connects to the input layer, which has four inputs.
Please see Keras' documentation for the Dense layer for more information and additional parameters.
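To make the computation concrete, here is a minimal numpy sketch of what this one-neuron sigmoid layer does for a single sample; the weights, bias, and inputs below are made-up values for illustration only:
def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights (one per input feature) and bias -- made up for illustration
w = np.array([0.5, -0.2, 0.1, 0.7])
b = 0.1
x = np.array([1.0, 2.0, 0.5, -1.0])    # one sample with four features

# The single neuron outputs the sigmoid of the weighted sum plus bias
y = sigmoid(np.dot(w, x) + b)
print(y)    # a value between 0 and 1, interpreted as P(label == 1)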
In the NN_binary_clf function, this dense layer is the first and last layer in the model.
The next line in the function above,
adam = tf.keras.optimizers.Adam(learning_rate=learning_rate,
                                beta_1=0.9, beta_2=0.999, amsgrad=False)
defines the optimizer used to train the model, i.e. to minimize the loss function. We use the Adam optimizer, which is a "stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments" (ref). This is the go-to optimizer for many deep learning practitioners. The critical parameter here is the learning rate, which determines how fast the model "learns" based on the feedback from the previous iteration.
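To illustrate the role of the learning rate, here is a bare-bones sketch of a plain gradient-descent update (Adam adds adaptive, per-parameter scaling on top of this basic idea); the numbers are made up:
learning_rate = 0.0003
w = 0.8          # a current weight value (made up)
grad = 1.5       # gradient of the loss with respect to w (made up)

# One plain gradient-descent step; the learning rate scales the update
w = w - learning_rate * grad
print(w)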
The last line compiles the model, by integrating it with the other key component of a network, which is the loss function:
model.compile(optimizer=adam,
loss='binary_crossentropy',
metrics=['accuracy'])
The loss function is one of the most important components of a neural network. The loss is simply the prediction error of the network, and the method used to calculate it is called the loss function. The loss is used to compute the gradients, and the gradients are used to update the weights of the network; this is how a neural network is trained. The following loss functions suffice for most models (Towards Data Science):
* Mean Squared Error (MSE)
* Binary Crossentropy (BCE)
* Categorical Crossentropy (CC)
* Sparse Categorical Crossentropy (SCC)
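Since our model uses binary crossentropy, BCE = -mean(y log p + (1 - y) log(1 - p)), we can sketch the computation by hand and compare it with Keras' built-in function; the labels and probabilities below are made up:
y_true = np.array([1.0, 0.0, 1.0, 1.0])    # made-up true labels
y_pred = np.array([0.9, 0.2, 0.7, 0.6])    # made-up predicted probabilities

# BCE = -mean( y*log(p) + (1-y)*log(1-p) )
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print("manual BCE:", bce)

# Compare with Keras' built-in implementation
print("keras BCE :", tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy())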
Then, we call the NN_binary_clf function to create a model object and start the fitting process:
epochs: The number of epochs is a hyperparameter that defines the number of times the learning algorithm will work through the entire training dataset.
batch size: The number of training examples used in each estimate of the error gradient; it is an important hyperparameter that influences the dynamics of the learning algorithm.
Loss function used: binary_crossentropy
Optimizer used: Adam optimizer.
Because validation is part of this call (via the validation_data argument), there is no need to write separate validation code.
model = NN_binary_clf(0.0003)
model_history = model.fit(train_F, train_L,
epochs=5, batch_size=32,
validation_data=(test_F, test_L),
verbose=2)
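One convenient way to inspect training progress is to plot the history object returned by fit(). A sketch (note: depending on the TensorFlow version, the accuracy keys may be 'acc'/'val_acc' or 'accuracy'/'val_accuracy'):
# Plot training vs. validation loss per epoch
hist = model_history.history
plt.plot(hist['loss'], label='loss')
plt.plot(hist['val_loss'], label='val_loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()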
Our model has only an input layer and an output layer, with no hidden layer in between; in this scenario it behaves much like logistic regression. Therefore, we should expect a fairly low accuracy. The power of neural networks comes from adding hidden layers, which allows the model to capture more complex patterns and achieve better results.
This output shows 5 iterations, or epochs; in each epoch the model went through the entire training data once. As the result shows, our first epoch took 26 seconds to complete; the loss is 0.3464 and the accuracy is 0.8501. What is important here in terms of validation are the 'val_loss' and 'val_acc' values.
Question: Why are val_loss and val_acc important?
The answer: 'loss' and 'acc' are the loss and accuracy computed on the training data, whereas 'val_loss' and 'val_acc' are the loss and accuracy computed on the validation data.
So far, we have imported the necessary libraries, loaded our preprocessed dataset, and fitted our model using Keras. Comparing the validation results with those of the previous machine learning models, we see that our neural network with no hidden layer did worse than the decision tree and performed about the same as logistic regression. Thus, consider the following questions (a code sketch for exploring them follows the list):
Which model performed better so far; decision tree, logistic regression or neural networks?
Which model trained faster?
Why in this example neural networks performed worse?
How can we improve the performance of neural networks?
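To explore these questions yourself, one possible sketch is to train the two classic models (already imported above) on the same split and compare their test accuracies with the neural network's val_acc; results may differ slightly from the Machine Learning module because of default solver settings:
# Train the two classic models on the same training split;
# .values.ravel() flattens the one-column label DataFrame to a 1-D array
logreg = LogisticRegression()
logreg.fit(train_F, train_L.values.ravel())

dtree = DecisionTreeClassifier()
dtree.fit(train_F, train_L.values.ravel())

# Compare test-set accuracies with the neural network's val_acc above
print("logistic regression:", accuracy_score(test_L.values.ravel(), logreg.predict(test_F)))
print("decision tree      :", accuracy_score(test_L.values.ravel(), dtree.predict(test_F)))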