This lesson is in the early stages of development (Alpha version)

DeapSECURE module 2: Dealing with Big Data

The Big Data lesson module introduces efficient ways to handle, process, and analyze large amounts of data using pandas, Matplotlib, and Seaborn. The pandas library is the de facto data analysis and manipulation tool for the Python programming language. Matplotlib and Seaborn are visualization packages for data analysis in Python. The data handling skills introduced in this lesson form the foundation for the subsequent two lessons on machine learning and neural networks.

Prerequisites

  • Learners should have basic Python programming skills to follow this lesson effectively. Learners who are new to Python are encouraged to take a brief tutorial on Python, such as the Plotting and Programming in Python lesson by Software Carpentry. The DeapSECURE project also maintains a list of Python crash courses.

  • This lesson module requires Python, Pandas, Matplotlib, and Seaborn. For the best teaching and learning experience, please use Jupyter Notebook or JupyterLab.
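
Before the lesson, it can help to confirm that the required packages are installed. The following is a minimal sketch, not part of the official lesson materials; the helper name missing_packages is hypothetical, and the list assumes the packages are importable under the names pandas, matplotlib, and seaborn.

```python
import importlib.util

# Import names of the packages this lesson requires (assumed import names).
REQUIRED = ["pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names in `names` that cannot be found in this environment.

    Hypothetical helper for illustration; it only looks the packages up
    and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

The check can be run in a Jupyter Notebook cell or as a standalone script. Any package it reports as missing needs to be installed before starting the first episode.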

Schedule

Setup
  • Download files required for the lesson

00:00  1. Introduction to Big Data Analytics and Pandas
  • What is big data?
  • What is big data analytics?
  • What are the uses of big data analytics in cybersecurity?
  • What is Pandas?
  • What are the appropriate use cases of Pandas?

00:00  2. Big Data Challenge: Detecting Malicious Activities on Smartphones
  • What are the security challenges related to smartphones?
  • What are the goals of big data analytics in cybersecurity?
  • What is the Sherlock dataset?
  • What are the potential uses of the Sherlock dataset?

00:10  3. Fundamentals of Pandas
  • What are Series and DataFrame in Pandas?
  • How do we create a Series object?
  • How do we create a DataFrame object?
  • How do we retrieve and modify data in a Series or DataFrame object?

00:55  4. Analytics of Sherlock Data with Pandas
  • What are the basic data manipulation operations (building blocks) in pandas?
  • What insights can we obtain from common combinations of these building blocks?

01:45  5. Data Wrangling and Visualization
  • How do I check data distribution and find inner relationships between different features?
  • How do I analyze features by using visualization tools?

02:15  6. Outro: Big Data Analytics in Real-World Applications
  • How does big data processing look in the real world?
  • What other tools and frameworks are available for big data processing?

02:20  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.