This lesson is still being designed and assembled (Pre-Alpha version)

DeapSECURE module 3: Machine Learning: Setup and Hands-on Files

Workshop Resources

The machine learning and neural network lessons share the same set of research problems, datasets, and hands-on files. The notebooks linked below are for this machine learning lesson, but the ZIP files contain everything needed for both lessons.

Obtaining Hands-on Materials

If you are taking this training on ODU’s Wahab cluster, please read through the instructions on launching a Jupyter session via Open OnDemand and copying the hands-on files, so that you can set up your own copy of the files in your home directory on the cluster.

The downloadable resources below are made available here for the general public to use on their own computers. These were taken from the online workshop series in the Summer of 2021 (a.k.a. “WS-2020-2021”).

To download the notebooks and the hands-on files, please right-click on the links below and select “Save Link As…” or a similar menu.

Resources: Jupyter Notebooks

(The HTML files were provided for convenient web viewing.)

2020-2021 Workshop Notes

The list of notebooks above reflects the actual sequence of notebooks used during the ML session of the 2020-2021 workshop series. The first notebook in the list is actually a notebook from the Big Data lesson on Data Wrangling and Visualization.

Resources: Hands-on Package

The hands-on files are packed in ZIP format. The three ZIP files above are mandatory. To reconstitute: Unzip all the files, preserving the paths, into the same destination directory.
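
As a minimal sketch of the unpacking step, assuming the `unzip` command-line tool is installed and the downloaded ZIP files sit together in one directory, a small helper can extract every archive into a single destination. The function name `unpack_all` and the directory names in the usage comment are hypothetical and not part of the DeapSECURE materials.

```shell
# Hypothetical helper: extract every ZIP file found in ZIPDIR into DEST.
# unzip keeps the directory paths stored in each archive by default,
# so all archives merge into one tree under DEST.
unpack_all() {
    zipdir=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for z in "$zipdir"/*.zip; do
        # Skip the unexpanded pattern when no ZIP file is present.
        [ -e "$z" ] || continue
        # -o: overwrite existing files without prompting
        # -q: quiet output; -d: extraction directory
        unzip -o -q "$z" -d "$dest" || return 1
    done
    return 0
}

# Example usage (both directory names are placeholders):
#   unpack_all ~/Downloads/deapsecure-zips ~/CItraining/module-ml
```

Extracting all archives into the same destination is what lets files from the separate packages land in one shared directory tree.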

Setting Up Hands-On Files

Taking Both ML and NN Lessons?

The hands-on ZIP packages above contain exercise files for both DeapSECURE’s ML and NN lessons.

The DeapSECURE hands-on exercises can be run on many platforms. They were initially created and tested for ODU’s Wahab cluster, but they can be adapted to other HPC clusters. They can also be run on a sufficiently powerful local computer (desktop or laptop) with a standalone Python distribution such as Anaconda. Please find below the instructions for the platform you will be using. Your instructor or mentor should have told you which platform to use.

Preparing Hands-on Files on ODU Wahab Cluster

To prepare for the exercises on Wahab, please run the following commands in the shell. (This can be done in a terminal session over SSH, or in a terminal session within Jupyter.)

The hands-on files are located on Wahab in this directory:

/shared/DeapSECURE/module-ml/

(For Turing, the location is /scratch-lustre/DeapSECURE/module-ml/Exercises).

Create a directory ~/CItraining/module-ml:

$ mkdir -p ~/CItraining/module-ml

Copy the entire directory tree to your ~/CItraining/module-ml:

$ cp -a /shared/DeapSECURE/module-ml/. ~/CItraining/module-ml/

Be careful! Every character matters (even the period must not be omitted). Do NOT insert whitespace where there is none in the command above!

Now change directory to ~/CItraining/module-ml,

$ cd ~/CItraining/module-ml

and you are ready to learn! If you are using the Jupyter notebooks (see the resources near the top of this page), navigate your Jupyter’s file browser to this directory and select the appropriate notebook to open.
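
Why the trailing `/.` in the copy command matters can be seen in a throwaway sandbox. The sketch below uses temporary directories (all file and directory names are made up for illustration, not the real Wahab paths) to show that `cp -a SRC/. DEST/` copies the contents of SRC, hidden files included, directly into DEST rather than creating a nested SRC directory inside it.

```shell
# Build a small source tree in a temporary sandbox (hypothetical names).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source/data" "$sandbox/target"
echo "notebook" > "$sandbox/source/lesson.ipynb"
echo "hidden"   > "$sandbox/source/.config"
echo "1,2,3"    > "$sandbox/source/data/sample.csv"

# Same form as the Wahab command: the "/." copies the directory's contents.
cp -a "$sandbox/source/." "$sandbox/target/"

# List everything that arrived, including hidden files.
ls -A "$sandbox/target"
```

The listing should show the files and the `data` subdirectory at the top level of the target, with no extra `source` directory in between. Remove the sandbox afterwards with `rm -rf "$sandbox"`.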

Obtaining Compute Resource (Non-Jupyter)

DeapSECURE lessons can also be carried out without the Jupyter platform. While it is possible to use the plain python interface for learning, we recommend that learners use at least ipython, which offers autocompletion, command history, and shell-like facilities.

In this workshop, we will begin by training neural networks interactively, which is a computationally intensive process. For this reason, we must do our hands-on activities on a compute node. (We intentionally limit our session time to a maximum of one day. You can increase or decrease the session time as needed.)

For the Wahab cluster:

$ salloc -c 1 -t 1-0

For the Turing cluster:

$ salloc -c 1 -C AVX2 -t 1-0

(We request a compute node on the cluster that supports AVX2. AVX2 is a set of vector instructions that can significantly speed up machine learning computations.)

For all other clusters with a SLURM job scheduler, the following command may work (check with your instructor or local cluster documentation):

$ srun --pty --preserve-env -c 1 -t 1-0 /bin/bash

Notice that the host name printed on the shell prompt will change; this indicates that we are now logged in to a compute node.
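
Besides watching the prompt, the session can be checked from the shell itself. The following is a minimal sketch, assuming the scheduler exports the standard `SLURM_JOB_ID` variable into the allocated session; the function name `in_slurm_job` is hypothetical.

```shell
# Report whether the current shell is running inside a SLURM allocation.
# SLURM sets SLURM_JOB_ID in the environment of the jobs it starts.
# "uname -n" prints the name of the host we are currently on.
in_slurm_job() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "Inside SLURM job $SLURM_JOB_ID on host $(uname -n)"
        return 0
    fi
    echo "No SLURM job detected on host $(uname -n)"
    return 1
}

# Example usage:
#   in_slurm_job || echo "Request a compute node first."
```

Depending on how a cluster configures `salloc`, the variable may be set while the shell still runs on the login node, so comparing the reported host name with the login node's name remains a useful second check.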

Using Multiple CPU Cores

If you want to use multiple cores for expensive calculations later, you may reserve several cores for yourself (for example: 2, 4, or 8 cores). For this training, 4 cores are sufficient (e.g. for Turing):

$ salloc -c 4 -t 1-0
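
Once inside the allocation, it can help to confirm how many cores were actually granted. This is a small sketch, assuming SLURM exports `SLURM_CPUS_PER_TASK` when the `-c` option is given; the function name `reserved_cores` is hypothetical.

```shell
# Print the number of CPU cores reserved for this session.
# SLURM_CPUS_PER_TASK is expected to be set when -c (--cpus-per-task)
# was requested; fall back to 1 (a single core) when it is unset.
reserved_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Example usage:
#   echo "Cores available: $(reserved_cores)"
```

The printed value can then be passed to whichever parallelism setting your software provides, so that it does not try to use more cores than were reserved.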

Setting Up Software Environment

Packages such as sklearn and ThunderSVM have a lot of dependencies. After you obtain the compute resource, you will need to load a number of modules.

Wahab Cluster

On Wahab, we have prepared a custom environment module called “DeapSECURE” that will load all the other necessary modules:

module load DeapSECURE

Turing Cluster

The following is an example for the Turing cluster:

enable_lmod
module load python/3.6
module load numpy scipy
module load pandas
module load scikit-learn
module load ipython
module load matplotlib
module load gcc/6

We created a shell include file named sklearn-env-py3 in your hands-on directory to ease reloading of these modules later on. To take advantage of this, please issue this command in the same directory as before:

$ source sklearn-env-py3

Do this only once per shell session, right after you obtain the compute resource.
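
After loading the modules, a quick sanity check can confirm that the expected tools are reachable. The sketch below is one possible approach, not part of the DeapSECURE files: the helper name `require_commands` is hypothetical, and it only checks that each command is on the `PATH`, not that the underlying libraries import correctly.

```shell
# Hypothetical helper: verify that each named command is on the PATH.
# Returns 0 when all commands are found, 1 when at least one is missing.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "MISSING: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage after loading the modules:
#   require_commands python ipython
```

If a command is reported as missing, reloading the modules in a fresh session on the compute node is a reasonable first step.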

Other Clusters

The following software and libraries are prerequisites for the hands-on activities of the ML lesson:

We recommend that you install the newest version of each package. Should you encounter problems with the software, please file an issue on GitLab.