This lesson is in the early stages of development (Alpha version)

DeapSECURE module 2: Dealing with Big Data: Spark---Scalable Framework for Big Data

This document contains additional introductory material on Apache Spark, an alternative big data analytics framework.

Which One to Choose? Pandas vs. Spark

pandas and Spark are comparable in many ways. Each library offers a DataFrame, an object that represents a dataset in tabular form. Here are some similarities:

But there are several important differences:

In terms of data operations, there is also an important difference:
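
One well-known difference of this kind is that pandas executes each operation as soon as it is called, whereas Spark only records transformations and defers the actual computation until an action requests a result. The toy sketch below mimics that contrast in plain Python, without requiring pandas or Spark. The `LazyTable` class, its `filter`, `with_column`, and `collect` methods, and the sample rows are all hypothetical, invented for illustration only. They are not the API of either library.

```python
# Toy illustration of eager vs. lazy evaluation.
# LazyTable is a hypothetical class, NOT part of pandas or Spark.

class LazyTable:
    """Records transformations and applies them only when collect() is called."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # source data: a list of dicts
        self._steps = tuple(steps)   # pending transformations, not yet executed

    def filter(self, predicate):
        # Return a new plan with one more step; no row is examined here.
        return LazyTable(self._rows, self._steps + (("filter", predicate),))

    def with_column(self, name, func):
        # Likewise, only remember that a column should be added later.
        return LazyTable(self._rows, self._steps + (("with_column", (name, func)),))

    def collect(self):
        # The "action": only now are the recorded steps applied, in order.
        rows = [dict(r) for r in self._rows]
        for kind, arg in self._steps:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                name, func = arg
                rows = [{**r, name: func(r)} for r in rows]
        return rows


# Made-up sample data for the illustration.
rows = [
    {"app": "mail", "bytes": 1200},
    {"app": "chat", "bytes": 300},
    {"app": "video", "bytes": 90000},
]

# Eager style (pandas-like): the filtered result exists immediately.
eager_big = [r for r in rows if r["bytes"] > 1000]

# Lazy style (Spark-like): build a plan first, then trigger it with an action.
plan = (
    LazyTable(rows)
    .filter(lambda r: r["bytes"] > 1000)
    .with_column("kb", lambda r: r["bytes"] / 1000)
)
result = plan.collect()   # computation happens here, not above
print(result)
```

In this sketch, `plan` holds two pending steps and touches no data until `collect()` is called. Deferring the work in this way gives a framework the chance to examine the whole chain of operations before running any of it.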