
Overview of Deep Neural Network Concepts

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What is a neuron?

  • How does a neuron work?

  • What is a neural network?

  • How is a neural network trained?

Objectives
  • Know the basic components of a neural network.

  • Understand the general idea of model construction.

From Neuron to Neural Networks

Deep Neural Networks (or simply Neural Networks for brevity) are inspired by the structure of our brain. The human brain is an object of wonder: it contains roughly 100 billion neurons and over 100 trillion synapses (specialized connections between neurons). It is capable of high-level cognitive functions such as recognizing the objects that our eyes see, comprehending the sounds that our ears receive, and even reading and speaking!

Figure: Artist's rendition of a biological neural network (credit: Max Pixel)

In a nutshell, a neuron contains one or more “input terminals” (dendrites is the biological term), and one “output terminal” (axon). Signals received from the dendrites will excite the neuron to produce an output signal. Not all inputs are equal in importance; some inputs may more strongly influence the neuron compared to others.

Figure: An illustration of a single neuron (credit: Wikipedia user BruceBlaus)

An Artificial Neural Network (often simply called a Neural Network) is essentially a mathematical function with one or more inputs and one or more outputs. It comprises many artificial neurons that are interconnected in a particular way, with the goal of producing a correct set of responses upon receiving a set of inputs. Each neuron is essentially a mathematical object whose behavior is inspired by how biological neurons work. Neural networks are widely used for classification tasks. For example, a properly trained Convolutional Neural Network can take an image of a dog and classify it as a dog (as opposed to, say, a cow). In this case, the inputs are the image's pixel values and the output is the predicted class label.

A Model for One Neuron: Input, Layer, Activation Function, Output

Here is a neuron model that is widely used in Neural Network applications.

For illustrative purposes, let’s consider a neuron that takes three inputs i(1), i(2), i(3) and produces one output o(1). The three inputs and one output respectively form the input layer and output layer. Generally, a layer is the highest-level building block in deep learning. It is a container that usually has a set of elements either sending values to another layer or receiving values from another layer. The “wires” that connect the input signals to the neuron have different strength factors (usually called weights), i.e. w(1), w(2), w(3). Diagrammatically, the neuron looks like this:

Figure: A neuron with three inputs and one output

The inputs are combined in a linear fashion at the neuron input point:

z = i(1)*w(1) + i(2)*w(2) + i(3)*w(3)

You may recognize that this is just a dot product of the weight and input vectors. (There is also a bias term i(0) that is omitted in this illustration, but this does not compromise the point illustrated here.) The output is obtained by applying a nonlinear function to this combined input. (This function is commonly called an activation function in neural networks.) This nonlinearity is the crucial feature of a Neural Network: it allows complicated features of different categories to be separated and distinguished. (This blog article by Chris Olah offers some illustrative examples.) A classic choice for the nonlinear function is the Sigmoid function:

g(z) = 1 / (1 + exp(-z))
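
For example, g(0) = 0.5, and g(z) approaches 1 for large positive z and 0 for large negative z, so the neuron's output always lies between 0 and 1.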

This illustration shows the shape of the Sigmoid function (with z on the horizontal axis):

Figure: Shape of the sigmoid function

Activation Functions

There are several other activation functions that are widely used in neural networks:

Figure: Shapes of other common activation functions
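
For concreteness, here is a minimal numpy sketch of three common activation functions: the sigmoid from above, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU). The definitions are standard; the function names below are just illustrative:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)             # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)     # zero for negative z, identity for positive z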

If the parameters w(1)...w(3) are known, then computing the output is as simple as the following Python expression:

import numpy
out = 1 / (1 + numpy.exp(-numpy.dot(w, i)))   # sigmoid of the weighted sum z
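
For instance, with hypothetical weights and inputs (the values below are arbitrary and chosen purely for illustration), the complete computation looks like this:

import numpy

w = numpy.array([0.5, -0.2, 0.1])             # hypothetical weights
i = numpy.array([1, 0, 1])                    # hypothetical inputs
out = 1 / (1 + numpy.exp(-numpy.dot(w, i)))   # z = 0.6, so out is about 0.65
print(out)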

How can we make this neuron predict the output o(1) given a set of inputs i(1)...i(3)? We need to train it according to some examples of inputs and outputs. Here, we follow a simple illustrative example (I’m drawing this from the Medium articles by Milo Spencer-Harper and Andrew Trask). For simplicity, each of the inputs and outputs is binary (either one or zero). Here is an example of input sets and the expected corresponding outputs (“outcomes”).

Case number   i(1)   i(2)   i(3)   o(1)
          1     0      0      1      0
          2     1      1      1      1
          3     1      0      1      1
          4     0      1      1      0

Here’s the machine learning question: What should the output be for the following input?

Case number   i(1)   i(2)   i(3)   o(1)
          5     1      0      0      ?

Training Phase of a Neural Network: Loss Function and Backpropagation

What is done in the training phase? In the example above, training simply involves an iterative procedure to adjust the weights w(1), w(2), and w(3). An achievable goal for training the neuron is, therefore, to reproduce the outputs of the training data above as closely as possible.

The overall dataflow of the training phase is shown in the following figure.

Figure: Dataflow diagram of the training phase

In general, the training phase of a neural network includes the following procedures:

  1. Start with an initial guess for the network weights or parameters. (Sometimes the weights of a well-trained network with a similar structure can be used as a starting point, which tremendously speeds up the training process.)

  2. Compute the predicted outcome (akin to o(1) in the single-neuron case) for every training input.

  3. Compute the loss of the training data by comparing the predicted outcomes against the labels of the training data. To quantify this, a loss function provides a collective measure of how far off the predictions o(1) are from the expected outputs (the labels of the training data). A large value of the loss function means a large discrepancy between the predicted and expected label values. The goal of the training phase is therefore to minimize the loss function of the network.

  4. Apply a correction procedure to update the weights so as to bring the predicted outcome closer to the expected outcome (the training label). For neural networks, an algorithm called backpropagation is used to compute the necessary corrections to the weights.

  5. Repeat steps 2-4 until certain criteria are met (e.g. after a fixed number of iterations, or once the loss function falls below a certain threshold). A minimal worked example of this loop follows this list.
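
In the spirit of the Trask article mentioned earlier, here is a minimal numpy sketch of this training loop for the single three-input neuron and the four training cases above. The assumptions here (a squared-error-style weight update, a learning rate of 1, 10,000 iterations as the stopping criterion, and a fixed random seed) are choices made for illustration, not the only options:

import numpy as np

# Step 1: training data from the table above, plus random initial weights
inputs = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1]])
labels = np.array([[0, 1, 1, 0]]).T
rng = np.random.default_rng(seed=42)
w = 2 * rng.random((3, 1)) - 1                   # weights in [-1, 1)

for _ in range(10000):                           # step 5: fixed iteration count
    out = 1 / (1 + np.exp(-inputs @ w))          # step 2: predicted outcomes
    error = labels - out                         # step 3: discrepancy vs. labels
    w += inputs.T @ (error * out * (1 - out))    # step 4: weight correction

# Inference for case 5: input (1, 0, 0)
print(1 / (1 + np.exp(-np.array([1, 0, 0]) @ w)))   # close to 1

After training, the neuron predicts a value close to 1 for case 5, which matches the pattern hidden in the training data: the output simply follows the first input i(1).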

Optimization for Training: Backpropagation Optimizer

In order to obtain the adjusted weights (or parameters) efficiently, many optimizers have been proposed, such as stochastic gradient descent (SGD) and Adam.

Since the topic of neural networks is quite extensive, we will not discuss in much detail technical matters such as the loss function (point 3 above), weight correction procedure (point 4), or convergence criteria (point 5). Interested readers are encouraged to pursue these matters on their own. The programs provided for hands-on activities will provide some reasonable starting points.

Underfitting and Overfitting: Variance vs Bias

We mentioned earlier that losses are an inherent part of machine learning. Unless we have perfect data and the perfect model to describe that data, we are unlikely to describe our data perfectly. The best model obtained via machine learning must strike a balance between two extremes: bias and variance. A model with too much bias is too simple and underfits, missing real patterns in the data; a model with too much variance is too flexible and overfits, memorizing the training data, including its noise. This balance is very important if we want our model to generalize to new cases, i.e. to input data it has not seen before.

Either of these extremes will prevent the model from giving good predictions.

Inference (Prediction)

After a network is trained, it is ready to perform its job (such as recognizing images, differentiating between good and harmful network traffic, etc.). This process is often called inference. Compared to training, inference is computationally much cheaper. The overall dataflow of inference is shown in the following figure.

Figure: Dataflow diagram of the inference phase

Deep(er) Neural Network: Putting More Neurons Together

There is not much that a single neuron can do. One obvious limitation is that a single neuron can only produce a binary prediction (i.e. a “yes” or a “no”). Much greater predictive power is obtained by connecting many neurons into a network. Here is an illustration of how one can construct a network comprising many neurons:

Figure: An illustration of a simple neural network with one hidden layer (credit: Wikimedia user Glosser.ca, with modification)

This is called a fully connected dense network. The network in this illustration has three layers:

  1. the input layer, colored yellow;

  2. one hidden layer, colored green;

  3. the output layer, colored orange-red.

The input layer does not perform any computation; hence, this network really has only two neuron layers, consisting of a total of six neurons (four in the hidden layer plus two in the output layer). Neuron layers are those that have the “thinking” capability in the network. A similar training flow can be applied to a deeper neural network.
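
To make this concrete, here is a hypothetical sketch of the network in the figure using Keras (part of TensorFlow), assuming three inputs feeding a hidden layer of four neurons and an output layer of two neurons. The layer sizes, activations, optimizer, and loss below are illustrative choices, not prescriptions:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(3,)),                      # input layer: no computation
    keras.layers.Dense(4, activation="sigmoid"),  # hidden layer: 4 neurons
    keras.layers.Dense(2, activation="sigmoid"),  # output layer: 2 neurons
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()

Here “Dense” means each neuron receives input from every element of the previous layer, which is exactly the fully connected structure shown in the figure.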

Hardware Optimization for Deeper Neural Networks

The process of training is computationally very intensive, as we will experience in this workshop. For this reason, people often use highly parallel computing hardware such as supercomputers or graphics processing units (GPUs).

Key Points

  • Deep neural networks combine linear parts (weighted sums of inputs) and nonlinear parts (activation functions).

  • Backpropagation is used to adjust the network parameters.

  • High-performance computing (HPC) resources can be used to speed up the training and inference processes.