Homomorphic Encryption of an Image in Parallel
Overview
Teaching: 0 min
Exercises: 0 minQuestions
Hands-on: How to use parallel computing to speed up HE computations?
Objectives
Describe parallelization strategy.
Parallelize encrypting image.
Parallelize decrypting an image.
Parallelization to Speed Up Homomorphic Operations
Encrypted numbers are much more expensive to create and compute than regular (unencrypted) numbers. Our experiments on Turing (Intel Xeon with CPU frequencies between 2.1–2.4 GHz) showed that encrypting a number takes more than 40 times the amount of time needed to perform a single operation. Therefore, the encryption is the most expensive part, exceeding anything else by a large factor. Because we have identified that the most expensive part is the encryption, we will simply demonstrate the parallelization of the encryption.
Review of Serial Implementation of Encrypting an Image
Review of Sections of Serial Encryption
-
Loading an image
-
Looping over the pixels and encrypting one at a time
-
Save encrypted 2-D array to a file
Review of Problem Introduction: Encrypting an Image
Eencrypt an image so that we can securely transfer it for further processing. Here’s the image in question:
The image’s size is 256 by 385 pixels. There are a total of 98,560 pixels. We want a faster way to encrypt it using Paillier scheme, then save the encrypted image to a file. Note, this larger image takes longer than the tiny image.
Before we get too far, we need to be aware of few things:
-
Each process has to work on its own chunk of the image, not overlapping with any other process
-
All processes have to use the exact same key so that the image will be recoverable by one process
How long it will take?
Knowing that it takes 1.8 seconds to encrypt 100 numbers, how long it should take to encrypt 98,560 pixels? Assume one pixel is represented by one number (which is true, because we converted the numbers to grayscale).
Message Passing Interface for Parallel Computing
We will use mpi4py to divvy up the task to encrypt the numbers to many processors.
Short review of a template for a simple parallel program
Recall a Template for a Simple Parallel Program.
MPI skeleton (size, rank, etc)
- Initialization:
- Get the file name from the
encrypt_par.slurm
file- Rank 0 reads input image (imread)
- Convert image to grayscale (converts_to_grayscale)
- Load in public key from file (pubkey_load_jwk)
- Broadcast common parameters & data (comm.bcast)
- Domain/Problem decomposition:
- Each process identifies what chunk of the image to work on.
- See problem decomposition
- Do (encryption) work in parallel:
- Each process will encrypt the pixels contained in its own chunk of the image (pixel-wise encryption).
- Gather (partial) output data
- All the processes will return the encrypted form of their respective chunk to the master process, which is identified as process with rank 0.
- Output encrypted image
- Save output encrypted image to a file
Step 3 is what yields the most time savings, because now we have many workers to encrypt the same image.
Encrypt the image in parallel by implementing each section
Use
/phe_image_par/solutions/encrypt_par_04procs.slurm
to encryptlion14graytiny.jpg
.> #!/usr/bin/env python3 > > """ > Image encryption using Paillier scheme > > This scripts performs the following: > * loads an image > * converts to grayscale if needed > * encrypts the image > * writes encrypted image to a disk file (in JSON format). > > """ > import datetime > import json > import re > import sys > import time > > import numpy as np > import matplotlib.image as mpimg > > from phe import paillier > from phe.util import int_to_base64 > from paillier_tools import * > > from mpi4py import MPI
Step 1: Implement MPI skeleton
comm = MPI.COMM_WORLD size = comm.Get_size() rank = comm.Get_rank()
Step 2: Input of data
MYSELF = sys.argv[0] def usage(): print("Usage: ", MYSELF, " IMAGE_FILE") if rank == 0: # Get the file name from arg 1 try: file_name = sys.argv[1] except IndexError: usage() sys.exit(1) # Load the image image_orig = mpimg.imread(file_name) print("Image input file: ", file_name) print("Shape of original image: ", image_orig.shape) # Transform to grayscale to reduce the computational workload GRAY = convert_to_grayscale(image_orig) N_ROWS = GRAY.shape[0] N_COLS = GRAY.shape[1] print("Shape of data to be encrypted: ", GRAY.shape) # Public key for encryption PUB_KEY = pubkey_load_jwk(read_file("phe_key.pub")) else: # Placeholders for image and public key GRAY = "Empty" PUB_KEY = "Empty" # Broadcast common parameters & data # NOTE: Since the image is a small object (don't expect something like # 16 MP image for this exercise), we simply broadcast it. # You can always fix it to suite your use case. gray_g = comm.bcast(GRAY, root=0) pub_key_g = comm.bcast(PUB_KEY, root=0)
Step 3: Domain Decomposition
if rank == 0: for i in range(1, size): L = N_ROWS*i // size U = N_ROWS*(i+1) // size comm.send(L, dest=i) comm.send(U, dest=i) # Give assignment of work to rank 0 itself: L_w = 0 U_w = N_ROWS // size else: L_w = comm.recv(source=0) U_w = comm.recv(source=0) print("Rank", rank, ": Received L =", L_w, "and U =", U_w) # Remember the work will be rows L_w, L_w+1, ..., U_w-1 # Preallocate array to contain encrypted pixels: # Each element contains a single row of pixels. gray_enc_w = [None] * (U_w - L_w)
Step 4: Encryption work in parallel
# Do work: encrypt the pixels one by one start = time.time() n_enc_w = 0 # Not absolutely necessary comm.barrier() if rank == 0: print("MASTER: Encrypting the pixels...") for i in range(L_w, U_w): # Messages will be printed only on rank 0 if rank == 0: print(" ", i, end="", flush=True) # Encrypt row values row_enc = [ pub_key_g.encrypt(float(x)) for x in gray_g[i] ] # Serialize row values for writing to JSON file gray_enc_w[i - L_w] = [(int_to_base64(x.ciphertext()), x.exponent) for x in row_enc] n_enc_w += len(gray_enc_w[i - L_w])
Step 5: Gathering of parital results
if rank == 0: print(flush=True) # collect the number of encrypted pixels for each rank: N_ENC = comm.gather(n_enc_w) if rank == 0: print("Encryption completed. Total", sum(N_ENC), "pixels were processed.") # start with rank 0's encrypted rows GRAY_ENC = gray_enc_w[:] # Since we know the structure of the partition, i.e. the bands of rows # are ordered sequentially from rank to rank, we can simply lengthen the # resulting array (GRAY_ENC) upon receiving the extra rows from ranks # 1, 2, ...rank-1. for R in range(1, size): # Note: We don't even need to read back L and U from the other ranks #L, U = comm.recv(source=R, tag=1) gray_enc_recv = comm.recv(source=R, tag=0) GRAY_ENC += gray_enc_recv else: #comm.send((L_w, U_w), dest=0, tag=1) comm.send(gray_enc_w, dest=0, tag=0)
Step 6: Output Encrypted Image
if rank == 0: date = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S') # Compute output filename output_name = file_name.rsplit(".", 1)[0] + ".pcrypt" print("Image output file: ", output_name) comment = 'Encrypted on ' + date # It is good to add some checks to catch bugs assert len(GRAY_ENC) == len(GRAY) # Store a lot of metadata so we can comprehend the file later on enimg_dump_json(PUB_KEY, GRAY_ENC, GRAY.shape, image_orig.shape, comment, output_name) end = time.time() # Print total execution time print("Rank", rank, ": Timing: Completed in ", end - start, "seconds.")
Output for tiny lion:
# Encryption of image: ../lion14graytiny.jpg (PARALLEL VERSION with 4 processes) Image input file: ../lion14graytiny.jpg Shape of original image: (26, 39) Note: image is in grayscale already. Shape of data to be encrypted: (26, 39) Rank 0 : Received L = 0 and U = 6 MASTER: Encrypting the pixels... 0 1 2 3 4 5 Rank 1 : Received L = 6 and U = 13 Rank 1 : Timing: Completed in 35.139321088790894 seconds. Rank 2 : Received L = 13 and U = 19 Rank 2 : Timing: Completed in 35.140297412872314 seconds. Encryption completed. Total 1014 pixels were processed. Image output file: ../lion14graytiny.pcrypt Rank 0 : Timing: Completed in 35.167746782302856 seconds. Rank 3 : Received L = 19 and U = 26 Rank 3 : Timing: Completed in 35.14179515838623 seconds.
Output for larger lion image:
Encryption of image: ../lion14gray.jpg (PARALLEL VERSION with 4 processes) Image input file: ../lion14gray.jpg Shape of original image: (256, 385) Note: image is in grayscale already. Shape of data to be encrypted: (256, 385) Rank 0 : Received L = 0 and U = 64 MASTER: Encrypting the pixels... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 Rank 1 : Received L = 64 and U = 128 Rank 1 : Timing: Completed in 3170.8454174995422 seconds. Rank 2 : Received L = 128 and U = 192 Rank 2 : Timing: Completed in 3170.878129005432 seconds. Rank 3 : Received L = 192 and U = 256 Rank 3 : Timing: Completed in 3170.907501935959 seconds. Encryption completed. Total 98560 pixels were processed. Image output file: ../lion14gray.pcrypt Rank 0 : Timing: Completed in 3172.2461047172546 seconds.
Domain Decomposition
Let us delve deeper into the domain decomposition of this problem. Consider the case when the image representation is a 146 by 182 matrix. (Remember that the image is a two-dimensional object with 182 by 146 pixels. Image sizes are usually expressed in the x-by-y format, whereas matrices are in row-by-column format, i.e., y-by-x. So there are 146 rows in the matrix format.)
Next, we divide the image by the rows, giving a process an (almost) equal number of rows to process. If we have 5 processes, 146 / 5 = 29.2 rows per process. We will not work on a partial row; we need full rows. Therefore, we could do this:
Process | proc 0 | proc 1 | proc 2 | proc 3 | proc 4 |
---|---|---|---|---|---|
# rows | 29 + 1 | 29 | 29 | 29 | 29 |
The order of the rows matters so that they do not overlap. Process 0 works on rows 0 through row 29 (in the 0-based array index); process 1 works on row 30 through row 58; and so on…
#Domain decomposition
chunk = int(int(n_row) / int(size))
if n_row % size > 0:
if rank < ( n_row % size):
chunk = chunk + 1
offset = rank * chunk
else:
offset = rank * chunk + (n_row % size)
else:
offset = rank * chunk
To ensure all processes use the same public key for encryption, only
process 0 generates the key and then broadcast (comm.bcast
) it to
all the other processes.
Each process works on its own share of the task and then sends its result to process 0 which will combine the final result into one single file.
Different Parallel Implementations
Case 1: only rank 0 does initialization (above case)
Case 2: all ranks do initialization
Case 3: If initialization and/or reporting is heavy, parallelize them
Key Points
Utilize problem decomposition to parallelize, but do not allow chunks of data to overlap
Utilizing the same key among all processes will allow decrypting by one process.