This lesson is in the early stages of development (Alpha version)

DeapSECURE module 1: Introduction to HPC

This is the first module of ODU’s DeapSECURE cyberinfrastructure training. It introduces High-Performance Computing (HPC) clusters and is complemented by many hands-on exercises. First, we introduce what an HPC system is, then how to access one (the ODU Wahab cluster). Next, we present a crash course, or refresher, on the UNIX shell. We then use that shell knowledge to write a simple pipeline that processes a very large number of spam emails and gathers some statistics about them. Our final goal in this module is to perform basic analysis on a massive set of spam emails.

Prerequisites

  • Basic computer interaction. Students should know how to interact with a computer using a keyboard.
  • Basic concepts such as directories, files, and paths.
  • Basic text editing skills. Students should know how to input text, issue commands…

Schedule

Setup Download files required for the lesson
00:00 1. Introduction to High-Performance Computing What is a High-Performance Computing (HPC) system?
Who uses HPC systems?
Why HPC?
00:10 2. Spam: Everyone's Cybersecurity Issue What is spam?
What are the different types of spam?
What are the problems posed by spam?
How does spam indicate cybersecurity problems?
Why do we need a powerful supercomputer to analyze a massive collection of spam emails?
00:20 3. Accessing HPC How do we access a modern HPC system?
How do we interact with a basic HPC interface?
00:30 4. Basic Shell Interaction How do we interact with a UNIX shell, the basic HPC interface?
What is a file system?
How do we navigate around files and directories from a UNIX shell?
How do we manage files and directories from a UNIX shell?
How do we work with text files from a UNIX shell?
How do we get help on UNIX shell commands?
01:20 5. Text Processing Tools & Pipeline How do we process text-based information using UNIX tools?
How do we build a processing pipeline by combining UNIX tools?
01:50 6. Task Automation with Scripts How can we repeat the same or similar set of commands over and over?
02:15 7. Investigating the Origin of Spam Emails What is a spam database?
What are the uses of a spam database?
How do we trace the origin of an email based on its header information?
02:30 8. Running Spam Analyzer on a High-Performance Computer System How do we run computational jobs on a modern HPC system?
03:00 9. Using HPC for Parallel Processing How do we run computational jobs in parallel on a modern HPC system?
What are the key issues to watch out for to achieve efficient parallel execution on HPC?
03:30 10. Using GNU Parallel on HPC What is GNU Parallel?
What are the characteristics of suitable jobs for GNU Parallel?
How do we run multiple jobs simultaneously using GNU Parallel?
04:00 11. Analyzing and Summarizing the Distribution of Spam Origins How do we analyze and summarize HPC results?
What is the distribution of the origins of spam emails in the Spam Archive?
04:45 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.