Deep Learning in the Real World
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What kinds of layers can be built into a neural network?
For different kinds of problems, what kinds of layers should we use?
Objectives
Learn about more advanced layer designs
Neural Networks in Realistic Applications
Today, deep learning is applied in many sophisticated applications that were previously thought impossible for computers. These include:
- Face recognition, such as that used in smartphone face unlock;
- Object recognition, such as that used in computer vision systems and self-driving cars;
- Text recognition, used in document scanning technologies;
- Voice recognition (e.g. Google Home, Alexa, smartphone voice input, and similar voice command systems);
- Conversational systems (think of: chat bots, ChatGPT);
- Tumor and other abnormality detection from medical imaging (MRI, CAT scan, X-ray…).
In cybersecurity, neural-network models have been deployed for spam email filtering, network intrusion and threat detection, and malware detection on computer systems, among other uses. Some real-world examples are described in the following articles: “Five Amazing Applications of Deep Learning in Cybersecurity” and “Google uses machine learning to catch spam missed by Gmail filters”.
These sophisticated applications call for more complex network architectures, which include additional types of layers that help the networks identify patterns in spatial data (2-D or 3-D images) and learn from sequences of data points (event correlation analysis, voice and video analysis). In this episode, we will briefly touch on various layers (beyond traditional dense neuron layers) that are used in state-of-the-art neural network models. We will also present an overview of network architectures and their applications.
Types of Layers in Neural Networks
In order to build a neural network, one needs to know the different kinds of layers that make up the building blocks of a neural network. In addition to the fully connected neuron layer (described in the previous episode), which constitutes the “thinking” part of the network, there are other types of layers that can provide spatial or temporal perception, “memory”, etc. A brief code sketch combining several of these layer types is shown after the list below.
- Dense layer (fully connected layer): The classic neural network layer, in which every neuron connects to every point in the previous layer and to every point in the next layer. The previous or subsequent layer could be another neuron layer, or another kind of layer.
- Convolution layer: A layer designed to capture local correlations in data that has spatial, temporal, or sequential order. This layer is frequently used in image recognition, speech recognition, etc. Convolution is able to learn statistical features of an object or pattern that may appear anywhere in the input (think of a dog that may appear on the left side, the right side, or anywhere else in the input picture). Here is an illustration of convolution from a tutorial by Stanford University.
- Batch normalization: Normalizes the values of its input toward zero mean and unit variance. When learning patterns in the data, relative differences among data points are more important than absolute values. Further, normalization is important to make the training algorithm behave well (i.e. to avoid numerical instability). This layer first calculates the mean and variance of the data, then normalizes each value by subtracting the mean and dividing by the standard deviation, keeping the features within a well-behaved range.
- Pooling: Commonly inserted between convolution layers in a ConvNet architecture. A pooling layer operates independently on every depth slice of the input and resizes it spatially, most commonly using the MAX operation. The goal is to down-sample the input representation. It is used in both the training and evaluation phases.
- Dropout: Randomly removes a fraction of the neurons and their corresponding connections. The goal is to reduce unnecessary feature dependencies and avoid overfitting in a neural network. This random removal is only done during the training phase; during inference, all neurons and connections are turned on as usual.
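To make these layer types concrete, here is a minimal sketch of a small image-classification model, assuming the Keras API from TensorFlow; the 32×32 RGB input and 10 output classes are illustrative assumptions, not part of this lesson's hands-on:

```python
# Illustrative only: a small model combining the layer types above.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution layer: learns local spatial patterns that may appear
    # anywhere in the image
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    # Batch normalization: rescales activations toward zero mean / unit variance
    layers.BatchNormalization(),
    # Pooling: down-samples each feature map using the MAX operation
    layers.MaxPooling2D((2, 2)),
    # Dropout: randomly drops 25% of the activations, during training only
    layers.Dropout(0.25),
    # Dense (fully connected) layers: the "thinking" part of the network
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```

The ordering shown (convolution, then batch normalization, then pooling, then dropout) is one common pattern, not the only valid one.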
Activation Functions
There are several types of activation functions that can be used in modeling the neurons.
(Figure: common activation functions; credit: Medium user @krishnakalyan3)
The sigmoid function was the classic choice in the early days of neural networks. More recently, the ReLU (rectified linear unit) function has gained popularity because it is much cheaper to compute. Even more recently, the ELU (exponential linear unit) has gained popularity as it has been shown to perform best on some benchmark data. We will be using ELU in our hands-on exercises.
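As a rough sketch of their shapes, the three functions can be written in a few lines of NumPy (the ELU's alpha parameter is assumed to be 1.0 here):

```python
import numpy as np

def sigmoid(x):
    # Classic activation: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Exponential linear unit: smooth negative branch, identity for x >= 0
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3, 3, 7)
print(sigmoid(x))
print(relu(x))
print(elu(x))
```

In Keras, the same choices are made by passing, e.g., activation="elu" to a layer.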
Building a Neural Network
Building an appropriate network for a given task requires intuition and much experimentation to pick the best design. We are not going to build a network in this training, as doing so requires more knowledge of how each layer works and how the combinations work together.
There are many tutorials and courses on deep learning that cover this topic (see the References section). A brief article by Jason Brownlee covers some of the approaches, with pointers for further reading.
In reality, it takes a lot of trial and error to build a network that performs best for a certain task. An article by Naoki Shibuya, titled Pipelines, Mind Maps and Convolutional Neural Networks, demonstrates the discipline a data scientist had to exert over himself and his impulses in order to reach his end goal more efficiently.
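As a hedged illustration of such trial and error, the sketch below loops over a few candidate hidden-layer widths and keeps the one with the best validation accuracy; the synthetic data and the binary-classification setup are placeholders, not part of this lesson:

```python
# Illustrative only: a tiny trial-and-error loop over a few hidden-layer widths.
import numpy as np
from tensorflow.keras import layers, models

# Synthetic placeholder data: 20 features per sample, binary labels
rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
x_val, y_val = rng.normal(size=(100, 20)), rng.integers(0, 2, size=100)

results = {}
for width in (16, 64, 256):
    model = models.Sequential([
        layers.Dense(width, activation="elu", input_shape=(20,)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5,
                        validation_data=(x_val, y_val), verbose=0)
    results[width] = max(history.history["val_accuracy"])

best = max(results, key=results.get)
print("Best hidden-layer width tried:", best)
```

In practice the search would cover many more choices (layer types, depths, learning rates) and use proper cross-validation, which is exactly why the process demands patience and discipline.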
Other neural network architectures
There are some more advanced layers and architectures which we will mention because they will be used in some exercises:
- Residual neural network (ResNet): Uses skip connections, or shortcuts, to jump over some layers. It is used to avoid the vanishing gradient problem, and it can serve as a backbone for computer vision tasks. Here we show a building block of residual learning (see the code sketch after this list).
- Recurrent Neural Network (RNN): Neurons are fed information not just from the previous layer but also from their own output on the previous pass. Each hidden cell receives its own output with a fixed delay of one or more iterations. This layer is mainly used when context is important, such as in unsegmented, connected handwriting recognition or speech recognition tasks. An RNN is good at processing sequence data for predictions but suffers from short-term memory. Here is an illustration of an RNN.
- Long Short-Term Memory (LSTM): A modified version of the recurrent neural network that introduces gates and an explicitly defined memory cell, which solves the vanishing gradient problem and makes it easier to remember past data. In a nutshell, this layer gives a neural network some “remembering and forgetting” capability, safeguarding information by stopping or allowing its flow. LSTM is well-suited to classifying, processing, and predicting time series given time lags of unknown duration. It is used for time-dependent problems such as speech recognition, natural language processing, etc. (a minimal LSTM sketch also follows this list).
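As a rough sketch of the residual building block mentioned above, the snippet below wires a shortcut connection around two convolution layers using the Keras functional API; the input shape and filter counts are illustrative assumptions, not taken from any specific ResNet:

```python
# Illustrative only: a minimal residual ("skip connection") building block.
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(32, 32, 64))
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, (3, 3), padding="same")(x)
# The shortcut: add the block's input to its output, then apply the activation
x = layers.Add()([x, inputs])
outputs = layers.Activation("relu")(x)

residual_block = Model(inputs, outputs)
residual_block.summary()
```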
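And a minimal LSTM sketch for sequence data, again assuming Keras; the sequence length (50 steps) and features per step (8) are illustrative:

```python
# Illustrative only: a tiny LSTM model for sequence data.
from tensorflow.keras import layers, models

model = models.Sequential([
    # LSTM layer: maintains a gated memory cell across the 50 time steps
    layers.LSTM(32, input_shape=(50, 8)),
    # Output layer, e.g. one prediction per sequence
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```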
A Zoo of Neural Networks
There is an interesting chart by Fjodor Van Veen of well-known neural network types. A brief explanation of each of them can be found in this article.
Key Points
Different layer types (dense, convolution, batch normalization, pooling, dropout) and architectures (ResNet, RNN, LSTM) serve different roles in a network's design.