Parallel Computation with Homomorphic Encryption
Overview
Teaching: 0 min
Exercises: 0 min
Questions
Hands-on 3: How can we use parallel computing to speed up HE computations?
Objectives
Use MPI to parallelize the Paillier encryption of an image across many processes.
Computational Cost of Homomorphic Encryption
Encrypted numbers are much more expensive to create and compute with than regular (unencrypted) numbers. Our experiments on Turing (Intel Xeon CPUs with frequencies between 2.1–2.4 GHz) showed that encrypting a number takes more than 40 times as long as performing a single operation on encrypted numbers. In one experiment summing 100 encrypted numbers, encrypting them all took 1.8 seconds, but summing them took only 0.04 seconds. Encryption is therefore the most expensive part, exceeding everything else by a large factor.
Parallelization to Speed Up Homomorphic Operations
In this section, we will demonstrate how one can use parallel computing resources to speed up homomorphic operations. Because we have identified that the most expensive part is the encryption, we will simply demonstrate the parallelization of the encryption.
Task: Encrypting ODU Crown Image
In this hands-on session, we want to encrypt Old Dominion University’s Crown image so that we can securely transfer it to a potentially hostile cloud environment for further processing. Here’s the image in question:
The image is in PNG format and measures 182 by 146 pixels, for a total of 26,572 pixels. We want a faster way to encrypt it using the Paillier scheme and then save the encrypted image to a file.
How long will it take?
Knowing that it takes 1.8 seconds to encrypt 100 numbers, how long should it take to encrypt 26,572 pixels? Assume one pixel is represented by one number (which is true here, because we converted the image to grayscale).
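One way to estimate (this is essentially the solution), assuming the rate measured above scales linearly with the number of pixels:

```python
# Back-of-the-envelope estimate, assuming the measured rate of
# 1.8 s per 100 encryptions scales linearly with the pixel count.
n_pixels = 182 * 146            # 26,572 pixels
seconds_per_number = 1.8 / 100  # measured on Turing
estimate = n_pixels * seconds_per_number
print(f"{estimate:.0f} seconds, i.e. about {estimate / 60:.0f} minutes")
# → 478 seconds, i.e. about 8 minutes
```

Eight minutes to encrypt a tiny 182-by-146 image is a long time, which is exactly why we turn to parallel computing.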
Message Passing Interface for parallel computing
We will be using the industry-standard approach to parallelizing our calculation, called the “Message Passing Interface” (MPI). We will use MPI to divide the task of encrypting the numbers among many processors, which is analogous to a restaurant employing five people to manually wash dishes at five separate sinks. MPI is implemented as a software library and is available in many programming languages, including C, C++, Fortran, Java, and even Python. Python’s interface to MPI is called mpi4py. On Turing, mpi4py leverages Intel’s high-performance MPI library.
MPI is a library that allows a programmer to split a single problem among a number of processes (call this N) running on N CPU cores (or more, as we will learn in the next module).
What is a process?
A process is an independently running program on a computer: your UNIX shell is a process. If you execute `python encrypt_phe.py`, it launches yet another process by running the Python interpreter (which in turn runs `encrypt_phe.py`). Each process is essentially a “worker” that carries out a specified task. Traditional programming is centered on having a single worker do all the tasks specified in the computer program (or script).
We will reserve more details about MPI for the upcoming module;
it is sufficient for now to know that MPI is used to employ one or more
“workers” in order to solve a problem in a concerted way.
With MPI, there will be N independent processes that can communicate with one another through the functionality of the MPI library. Each process has a *rank*, a unique ID that distinguishes it from the other processes. The rank IDs run 0, 1, 2, … N-1. Each process has its own computing capability, and processes can exchange information by sending and receiving messages.
Usually an MPI program in Python will initialize itself in this way:
```python
import mpi4py.MPI as MPI
# ... import other libraries as well

# MPI stuff
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
```
The `comm` variable is a communicator, essentially an object used to communicate with all the other processes in this MPI “world” (i.e., the set of processes launched together as a single MPI run). `COMM_WORLD` is the largest universe of all these processes, which is sufficient for most MPI programs. `Get_rank` returns the rank ID, as described above. `Get_size` returns the number of processes in this MPI world.
Before we get too far, we need to be aware of a few things:

- Each process has to work on its own chunk of the image, not overlapping with any other process.
- All processes have to use the exact same key so that the image can be recovered by one process.
How will we encrypt the image shown at the beginning of this episode?

1. Each process reads the image to work on. (Because it is such a small image, we let every process read the whole image. In other cases, the input is so large that it must be read in pieces, i.e., not all of the information is available in every process.)
2. Each process identifies what chunk of the image to work on. This is called *domain decomposition*.
3. Each process encrypts the pixels contained in its own chunk of the image.
4. All the processes return the encrypted form of their respective chunks to the master process, which is the process with rank 0.
Step number three above is what yields the time savings, because now we have many workers encrypting the same image.
Domain decomposition
Let us delve deeper into the domain decomposition of this problem. To do this, we need to know how the image is represented. First, we convert the image to grayscale in order to reduce all the color channels per pixel to a single value. This gives us a 146-by-182 matrix. (Remember that the image is a two-dimensional object with 182 by 146 pixels. Image sizes are usually expressed in x-by-y format, whereas matrices are in row-by-column format, i.e., y-by-x. So there are 146 rows in the matrix.)
Next, we divide the image by rows, giving each process an (almost) equal number of rows. If we have 5 processes, 146 / 5 = 29.2 rows per process. We will not work on partial rows; we need full rows. Therefore, we give some processes 30 rows to process. We could do this:
| Process | proc 0 | proc 1 | proc 2 | proc 3 | proc 4 |
|---------|--------|--------|--------|--------|--------|
| # rows  | 29 + 1 | 29     | 29     | 29     | 29     |
The rows must be assigned in order so that the chunks do not overlap: process 0 works on rows 0 through 29 (in 0-based array indexing); process 1 works on rows 30 through 58; and so on.
```python
# Domain decomposition
chunk = n_row // size
if n_row % size > 0:
    if rank < (n_row % size):
        chunk = chunk + 1
        offset = rank * chunk
    else:
        offset = rank * chunk + (n_row % size)
else:
    offset = rank * chunk
```
(Later on, convince yourself that the domain decomposition is right by printing out values such as `chunk` and `offset` for every process.)
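You can do this check without MPI at all: the plain-Python loop below stands in for the 5 processes, runs the same decomposition logic for each rank, and verifies that the chunks cover all 146 rows exactly once with no gaps or overlaps.

```python
n_row, size = 146, 5  # image rows, number of processes

rows_covered = []
for rank in range(size):
    # Same decomposition logic as in the lesson's code
    chunk = n_row // size
    if rank < n_row % size:
        chunk = chunk + 1
        offset = rank * chunk
    else:
        offset = rank * chunk + (n_row % size)
    print(f"rank {rank}: chunk={chunk}, offset={offset}")
    rows_covered.extend(range(offset, offset + chunk))

# Every row appears exactly once, in rank order
assert rows_covered == list(range(n_row))
```

Running this shows rank 0 taking 30 rows starting at row 0, and ranks 1–4 taking 29 rows each, matching the table above.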
To ensure all processes use the same public key for encryption, only process 0 generates the key pair and then broadcasts the public key (`comm.bcast`) to all the other processes, as in the example below.
```python
# Public and private keys
if rank == 0:
    pub_key, pri_key = paillier.generate_paillier_keypair()
else:
    pub_key = None

# Broadcast the public key to all processes
pub_key = comm.bcast(pub_key, root=0)
```
Each process works on its own share of the task and then sends its result to process 0, which combines everything and writes the final result to a single file.
Please turn to the subdirectory `secure-image` in your hands-on directory, then inspect the program called `encrypt_image_mpi.py`. There is no need to understand everything, but please take time to get a sense of what is being done in the different parts of the program.
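To see how the pieces fit together, here is a pure-Python sketch of the whole flow, with no MPI and no `phe` library: the loop over “ranks” stands in for the MPI processes, and `fake_encrypt()` is a placeholder for the real `pub_key.encrypt()` call. (The actual `encrypt_image_mpi.py` does this with real MPI processes and real Paillier encryption.)

```python
def decompose(n_row, size, rank):
    """Row decomposition, same logic as in the lesson."""
    chunk = n_row // size
    if rank < n_row % size:
        chunk += 1
        offset = rank * chunk
    else:
        offset = rank * chunk + (n_row % size)
    return chunk, offset

def fake_encrypt(row):
    # Placeholder: the real program calls pub_key.encrypt(pixel)
    return [pixel + 1 for pixel in row]

n_row, n_col, size = 146, 182, 5
image = [[0] * n_col for _ in range(n_row)]  # stand-in grayscale image

# Steps 1-3: each "rank" reads the image, finds its chunk, encrypts it
pieces = []
for rank in range(size):
    chunk, offset = decompose(n_row, size, rank)
    pieces.append([fake_encrypt(r) for r in image[offset:offset + chunk]])

# Step 4: "rank 0" combines the gathered pieces, in rank order
encrypted = [row for piece in pieces for row in piece]
assert len(encrypted) == n_row and len(encrypted[0]) == n_col
```

In the real program the final combining step uses MPI communication (the chunks are sent to rank 0) rather than a Python list, but the row bookkeeping is the same.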
Executing the MPI program
Now you can run the program in parallel mode (here we choose 24 processes) or in serial mode (1 process). We use a SLURM job script to define the resources needed for this run:
```bash
#!/bin/bash -l
#SBATCH --job-name=MPI_encrypt_24core
#SBATCH --output=MPI_encrypt_24core
#SBATCH --ntasks=24
#SBATCH --nodes=1

source crypto-env-py3

mpirun -n 24 python3 encrypt_image_mpi.py
```
A SLURM script is a shell script (in this case, a bash script). The four lines near the top of the script are special comments with the `#SBATCH` prefix, which are parsed by the job scheduler (SLURM):
- `--job-name` sets the name of the job
- `--output` sets the name of the file containing the output of the job
- `--ntasks` sets the number of MPI tasks to be launched
- `--nodes` sets the number of compute nodes requested
The actual “program” in the script is simple: it sources the environment definition we saw in the setup phase, then launches the 24 processes using the `mpirun` command. The `-n` argument sets the number of processes. With `mpirun`, these processes are coordinated and able to communicate with each other through the communication channels established by the MPI library when the program starts.
To run in 24-process mode, invoke on the shell:

```bash
$ sbatch job_24core.slurm
```
To run in serial mode, invoke on the shell:

```bash
$ sbatch job_01core.slurm
```
Notice that at the end of the output file you will see how much (wallclock) time was spent running the program.
Parallel decryption
Modify the encryption program so that it decrypts the encrypted image (`ODU_Crown_enc.json`) and reconstructs the original image (in grayscale).

Hints

- Use the `mpimg.imsave` function to save the reconstructed image.
Key Points
Encryption is by far the most expensive step in homomorphic computation, so it is the natural target for parallelization.
MPI lets many processes share a task; each process has a unique rank and communicates through a communicator such as `COMM_WORLD`.
Domain decomposition assigns each process its own non-overlapping chunk of the data (here, rows of the image).