
Introduction to Parallel Programming

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • What is parallel programming and why use it?

  • What are the different parallel programming models?

  • What are some parallel programming standards?

Objectives
  • Get an overview of parallel programming.

  • Distinguish between different parallel programming models.

  • Get an idea of the challenges of writing a parallel program.

Introduction

In this module, we will learn to write parallel programs in C/C++ and Python using OpenMP and MPI, respectively. In all other modules we use Python as our programming language, but here we will also use C/C++ for OpenMP. How does this workshop differ from the previous ones? In the first workshop, our program was a serial program and we used a script to launch processing in parallel; the “workers” were all independent, separate processes. In the other workshops, the parallel implementation was handled by the platform we used, such as ThunderSVM or Spark. In this workshop, our program will itself be a parallel program: we will implement the parallel aspects of the code ourselves. We will cover the basic concepts of OpenMP with C/C++ and of MPI with Python using the mpi4py module, which is built on top of one of the MPI libraries.

Why Parallel Programming?

Parallel programming allows the user to greatly reduce time-to-solution by splitting the problem among a number of worker processes. In other words, it allows the user to solve a computational problem using multiple computing resources, taking advantage of modern architectures composed of multiple processors. Workers can process different chunks of data simultaneously. Parallel programming is done on parallel computer architectures such as a cluster computer like Turing, and it can also be done on today's laptops and desktop computers with multi-core CPUs. Parallel programming is used when a problem is too big or takes too long to solve; it minimizes the runtime, or time to solution, for problems that involve big data and/or big computations.

Parallel platform architecture.

Figure: A simplified illustration of a parallel computer as a set of networked computers.

In a single-processor computer, the user is limited to the memory and computing power of that machine. In a parallel computer, many computing resources are put together, multiplying the computing power while also increasing memory capacity, thus enabling the processing of larger and more complex problems. Today's parallel computers still have networked architectures, but they feature faster networks and faster CPUs. Within each unit of the networked computer, also known as a compute node, resides one or more multi-core processor(s). Each multi-core processor has its own memory, independent of other multi-core processors and of other compute nodes. This architecture provides the user with a powerful computing platform that, when utilized optimally, will solve larger and more complex problems faster.

Power of Parallel Computing Example (Scaling)

Scaling oversimplified example.

1 worker completes a task in 6 hours. 2 workers complete it in 3 hours. 4 workers complete it in 1.5 hours. Given N workers (and assuming this pattern of speedup continues), how quickly can we complete the task if we implement parallelism?

How quickly can 400 workers finish the task?

Hints

In this overly simplified example, assume linear speedup continues. Then N workers complete the task in 6/N hours, so 400 workers will finish in 54 seconds: (6/400) hours × 60 × 60 = 54 seconds. Note that this assumes ideal linear speedup, which is not achievable in practice due to constraints explained later in the lessons.
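The same calculation can be written as a few lines of Python; the 6-hour serial time and the assumption of perfectly linear speedup come straight from the example above.

    # Ideal (linear) speedup: N workers finish in serial_time / N.
    serial_hours = 6

    for n_workers in (1, 2, 4, 400):
        hours = serial_hours / n_workers      # ideal time with n_workers
        seconds = hours * 60 * 60             # convert hours to seconds
        print(f"{n_workers:4d} workers -> {hours:g} hours ({seconds:.0f} seconds)")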

High-Level Strategy for Parallel Programming

Consider the following problem: calculate the weather for the US. Calculating the weather for any region involves very computationally expensive calculations. This problem requires parallel programming because it is a time-sensitive computation: the weather must be calculated quickly so that the report is current. Calculating the weather requires the program to learn from past weather data and use it to predict future weather values. The learning (and subsequent predicting) is independent for each region.

In serial, calculating the weather of the US uses one CPU to make all the necessary computations. The program performs the learning and predicting tasks for each of the regions one by one.

This can be solved in parallel by mapping a computing resource/core to a particular region of the US, which makes solving the weather for the US faster. As explained later in the module, this is a data-parallel problem that utilizes a shared-memory architecture.

Weather Serial.

Figure: On the left is an example of solving the weather problem in serial and on the right is an example of solving it in parallel. Figures from Chen, Shaohao. "Introduction to High Performance Computing." Boston University. https://www.bu.edu/tech/files/2017/09/Intro_to_HPC.pdf.

The high-level strategy is to divide a spatial problem into regions, map each region to a computing resource, and then figure out how to handle the interactions between regions/resources.
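A minimal mpi4py sketch of this strategy is shown below; the region names and the predict_weather function are hypothetical placeholders, not part of the actual weather example.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # this worker's ID
    size = comm.Get_size()          # total number of workers

    # Hypothetical list of regions; a real program would read these from data.
    regions = ["northeast", "southeast", "midwest", "southwest", "west", "northwest"]

    # Map regions to workers: each worker takes every size-th region.
    my_regions = regions[rank::size]

    def predict_weather(region):
        # Placeholder for the expensive learn-and-predict step for one region.
        return f"forecast for {region}"

    local_forecasts = [predict_weather(r) for r in my_regions]

    # Interactions between regions/resources: gather all forecasts on worker 0.
    all_forecasts = comm.gather(local_forecasts, root=0)
    if rank == 0:
        print(all_forecasts)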

What Is a Parallel Program?

A parallel program can be defined as a computer code that carries out computations simultaneously using more than one process or thread (both known as workers). This means that a single launch of the program will spawn several processes (as many as specified) to solve a given problem. So far, most of the programs we have dealt with were run on a problem-specific platform that handled the parallel processing, but the program codes themselves were serial. What is the difference between a parallel program and a traditional serial program? A serial program has a single process executing instructions one after the other, sequentially; at any given time, there is only one running worker solving the problem. A parallel program, on the other hand, spawns multiple workers which, depending on the situation, can be concurrently and independently working on different aspects of the problem, or performing the exact same operation as their peer workers but each on its own set of data. Multiple processes (workers) are active at the same time. More on the differences between serial and parallel programs next.
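As a small illustration (not part of the module's exercises), a single launch of the following mpi4py program with, say, mpirun -n 4 python hello_mpi.py spawns four workers, each executing the same code with its own rank.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Every spawned process runs this same line; only the rank differs.
    print(f"Hello from worker {comm.Get_rank()} of {comm.Get_size()}")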

Type of Parallel Problems - Loosely vs. Tightly Coupled Problems

Loosely coupled

In Module 1, we explored a “loosely coupled problem”: processing the 1998 spam set using many independent workers working in parallel on independent tasks (with no communication except for the initial division of labor and the final summary). Loosely coupled problems distribute independent tasks to many workers, with little communication beyond the beginning and end of all the tasks. In principle, no parallel programming framework is required; such problems are trivial to parallelize. A minimal sketch follows the figure below.

Loosely Coupled

Figure: Example of a loosely coupled parallel problem.
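Because the tasks are independent, a loosely coupled problem can be handled without a parallel programming framework, for example with Python's built-in multiprocessing module. The classify function and the messages below are hypothetical stand-ins, not the actual 1998 spam set.

    from multiprocessing import Pool

    def classify(message):
        # Hypothetical stand-in for labeling one message as spam or not.
        return "spam" if "winner" in message.lower() else "ham"

    messages = ["You are a WINNER!", "Meeting at 3 pm",
                "Claim your prize, winner", "Lunch tomorrow?"]

    if __name__ == "__main__":
        # Each message is processed independently; workers never talk to each other.
        with Pool(processes=4) as pool:
            labels = pool.map(classify, messages)
        print(labels)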

Tightly coupled

In this module, we will focus on “tightly coupled problems”: a single problem solved by many (tightly coupled) workers. For tightly coupled problems, parts of the task can be executed in parallel, but there is a strong interdependence among the tasks. We cannot just “farm out” the work and gather the results at the end; communication and coordination among workers are required throughout, from “somewhat frequently” to “very frequently”. This communication and coordination is managed through a parallel programming framework (e.g., MPI). A minimal sketch of such ongoing communication follows the figure below.

Tightly Coupled

Figure: Tightly Coupled Parallel Problems.
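The sketch below, written with mpi4py, hints at what “communication throughout” looks like: at every step each worker exchanges its current value with a neighbouring worker before it can continue. The update rule is made up purely for illustration.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    value = float(rank)  # each worker starts with its own piece of the data

    # Repeated exchange with a neighbour: the hallmark of a tightly coupled problem.
    for step in range(3):
        dest = (rank + 1) % size
        source = (rank - 1) % size
        received = comm.sendrecv(value, dest=dest, source=source)
        value = (value + received) / 2.0   # this update depends on the neighbour's data

    print(f"worker {rank} finished with value {value}")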

Parallelism: Loosely and Tightly Coupled Problems

Consider the following problems. Classify them as loosely or tightly coupled problems.

  1. Simulation of motions of drug molecules in a biological system (cell, or tissue).
  2. Counting votes in a national election.
  3. Automotive factory - building a car.
  4. Weather problem as stated above.

Hints

  1. Motion of drug molecules: Tightly coupled, because the motion of one drug molecule can influence the motion of another, for example when drug molecules collide.

  2. Counting votes: Loosely coupled. There is coordination in assigning a different section of the country to each vote-counting resource. Each vote-counting resource counts only its assigned region. Coordination is needed at the end to collect the sum of all the votes from each vote-counting entity.

  3. Building a car: Tightly coupled. Several tasks can be run in parallel, but communication and sharing of parts are necessary to complete the assembly. For example, the following tasks can be run in parallel: Parallel Task I: assemble parts of the main frame, engine, doors, etc.; Parallel Task II: install the doors, seats, dashboard, wheels, tires, etc. on the main frame; Parallel Task III: install the wiring for door components and paint the car.

  4. Weather: Loosely coupled. Based on how the problem was described above (at a very high level), mentioning only the learning and predicting phases, it is loosely coupled. It requires only the initial mapping of computing resources and the final reporting.

There are three main key components of (tightly coupled) parallel programming. Component 1 is problem decomposition: split the problem into parts that can be processed independently (to some extent); this is usually done by the user. Component 2 is concurrency: the idea that multiple workers can work on independent tasks simultaneously, so parts of the work can be done at the same time. Component 3 is communication and coordination: a data exchange among workers is necessary to solve the problem, and this data sharing is handled through parallel frameworks (such as OpenMP and MPI). A minimal sketch putting the three components together is shown after the figure below.

Key Components

Figure: Key components of parallel programming.
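The following mpi4py sketch touches all three components; the data and its decomposition are invented here purely for illustration.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Component 1, problem decomposition: split the data into one chunk per worker.
    data = list(range(100))
    my_chunk = data[rank::size]

    # Component 2, concurrency: every worker sums its own chunk at the same time.
    local_sum = sum(my_chunk)

    # Component 3, communication and coordination: combine the partial results.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
    if rank == 0:
        print("total =", total)   # 0 + 1 + ... + 99 = 4950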

Consider the car assembly problem. The problem can be decomposed into Parallel Task I: assemble parts of the main frame, engine, doors, etc.; Parallel Task II: install the doors, seats, dashboard, wheels, tires, etc. on the main frame; and Parallel Task III: install the wiring for door components and paint the car.

Each of the sub-tasks within each of the three parallel tasks can be done concurrently, since they are independent tasks that require no coordination or communication. For example, assembling the main frame, the engine, and the doors can be done concurrently.

Parallel Task II has to wait on Parallel Task I, since the doors, seats, dashboard, wheels, tires, etc. cannot be installed on the main frame without the main frame first being assembled. This requires coordination and communication between the tasks: Parallel Task I needs to share the assembled main frame. Likewise, the wiring for the doors can only be done in conjunction with the doors installed in Parallel Task II.

Key Points

  • Parallel programming reduces time-to-solution by splitting a problem among multiple workers (processes or threads).

  • Loosely coupled problems need little communication among workers and are trivial to parallelize; tightly coupled problems require communication and coordination throughout, managed by a parallel programming framework such as OpenMP or MPI.

  • Tightly coupled parallel programming has three key components: problem decomposition, concurrency, and communication/coordination.