Introduction to High-Performance Computing
|
A supercomputer is a collection of smaller computers combined into one large unit.
HPC is used in many domains, essentially anywhere a computer can be used to solve a problem.
HPC helps get results faster than traditional computing.
|
Spam: Everyone's Cybersecurity Issue
|
Spam is unsolicited email that contains unwanted advertisements, requests, or enticements.
Types of spam email include unsolicited advertisements, scams, phishing, and emails carrying a malicious payload.
Spam poses cybersecurity risks by stealing personal information, delivering malicious software, and enabling system break-ins.
Through parallel processing, powerful supercomputers can greatly reduce the time needed to process massive amounts of data.
|
Accessing HPC
|
|
Basic Shell Interaction
|
The UNIX shell provides a basic means to interact with HPC systems.
pwd, cd, and ls provide the essential means to navigate files and directories.
Directories and files are addressed by their paths, which can be relative or absolute.
Basic file management tools: mkdir, cp, mv, rm, rmdir
Basic text viewing and editing tools: cat, less, nano
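The commands above can be tried safely in a throwaway directory. The following is a minimal sketch, not part of the original lesson; the file and directory names (notes.txt, draft.txt, results) are hypothetical and chosen only for illustration.

```shell
# Work inside a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"
pwd                         # print the absolute path of the current directory

mkdir results               # create a directory
echo "hello" > notes.txt    # create a small text file
cp notes.txt results/       # copy it, addressing the target by a relative path
mv notes.txt draft.txt      # rename the original
ls                          # list the contents of the current directory
cat results/notes.txt       # view the copied file

rm results/notes.txt        # delete the copy
rmdir results               # rmdir only removes a directory that is empty
ls "$workdir"               # the same listing, addressed by an absolute path
```

Note that rmdir succeeds here only because the file inside results was removed first.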
|
Text Processing Tools & Pipeline
|
echo prints a message.
wc counts the number of lines, words, and bytes in a file.
head prints the first few lines of a text file.
tail prints the last few lines of a text file.
cut selects a particular column or columns of text data from a text file.
sort sorts lines of text.
uniq prints the unique lines of text by collapsing adjacent duplicates, so it is usually applied to sorted input.
grep filters lines of text matching a given text pattern.
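These tools become most useful when chained with pipes. The sketch below is illustrative only: the sample records (a country code and a subject keyword separated by a comma) are made up for the example and do not come from any real data set.

```shell
# Hypothetical sample data: one "country,keyword" record per line.
data=$(mktemp)
printf '%s\n' 'US,offer' 'CN,prize' 'US,loan' 'BR,offer' 'US,prize' > "$data"

wc -l < "$data"         # count the lines in the file
head -n 2 "$data"       # first two lines
tail -n 1 "$data"       # last line
grep 'offer' "$data"    # only the lines containing "offer"

# Pipeline: take column 1, sort it so duplicates are adjacent,
# count each distinct value, then sort the counts in descending order.
top=$(cut -d, -f1 "$data" | sort | uniq -c | sort -rn | head -n 1)
echo "$top"             # the most frequent country with its count
```

The first sort matters: uniq only merges duplicate lines that sit next to each other.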
|
Task Automation with Scripts
|
A script is a text file containing a sequence of commands.
The for statement takes a list and runs a set of commands once for each element in the list.
The if statement executes commands based on given conditions.
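A short script can combine both statements. This is a sketch under assumed inputs: the two files a.txt and b.txt are hypothetical and are created by the script itself so that it runs anywhere.

```shell
#!/bin/bash
# Create two hypothetical input files: one with content, one empty.
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/a.txt"
: > "$workdir/b.txt"

nonempty=0
# for: iterate over every .txt file in the directory.
for f in "$workdir"/*.txt; do
    # if: -s is true when the file exists and is not empty.
    if [ -s "$f" ]; then
        echo "$(basename "$f"): $(wc -l < "$f") lines"
        nonempty=$((nonempty + 1))
    else
        echo "$(basename "$f"): empty, skipping"
    fi
done
echo "non-empty files: $nonempty"
```

Saved to a file, such a script could be run with bash followed by the file name.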
|
Investigating the Origin of Spam Emails
|
A spam database is a collection of spam emails that have been gathered over many years to provide a representation of spam circulating on the Internet.
A spam database such as the SPAM Archive is helpful to study the characteristics of spam emails, including their origins.
The origin of an email can be determined from the IP address recorded in the tracking information in the email’s header.
An IP address can be mapped to a geographic location using an appropriate database.
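One possible way to pull a candidate origin address out of a header is sketched below. It is not the lesson's analyzer: the sample message is fabricated, its addresses come from ranges reserved for documentation, and it assumes the address appears on the first line of each Received field. Mail servers prepend their Received field, so the bottom-most one records the earliest hop; such fields can be forged, so the result is only a candidate. Mapping the address to a location requires a separate geolocation database and is not shown.

```shell
# Fabricated sample message (assumption: IPs sit on the "Received:" lines).
email=$(mktemp)
cat > "$email" <<'EOF'
Received: from mx.example.net (mx.example.net [198.51.100.7])
    by inbox.example.org; Mon, 1 Jan 2024 10:00:05 +0000
Received: from unknown (HELO sender.example.com) [203.0.113.45]
    by mx.example.net; Mon, 1 Jan 2024 10:00:01 +0000
From: someone@example.com
Subject: You have won

Body text mentioning 192.0.2.1 should be ignored.
EOF

# 1. sed stops at the first blank line, keeping only the header block.
# 2. grep keeps the Received lines, then extracts IPv4-looking strings.
# 3. tail picks the last match, i.e. the earliest recorded hop.
origin_ip=$(sed '/^$/q' "$email" \
    | grep -i '^Received:' \
    | grep -o -E '[0-9]{1,3}(\.[0-9]{1,3}){3}' \
    | tail -n 1)
echo "$origin_ip"
```

For this sample the pipeline selects 203.0.113.45, the address in the bottom-most Received field.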
|
Running Spam Analyzer on a High-Performance Computer System
|
|
Using HPC for Parallel Processing
|
|
Using GNU Parallel on HPC
|
|
Analyzing and Summarizing the Distribution of Spam Origins
|
|