This lesson is still being designed and assembled (Pre-Alpha version)

Homomorphic Encryption of an Image in Parallel

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Hands-on: How to use parallel computing to speed up HE computations?

Objectives
  • Describe parallelization strategy.

  • Parallelize encrypting image.

  • Parallelize decrypting an image.

Parallelization to Speed Up Homomorphic Operations

Encrypted numbers are much more expensive to create and compute than regular (unencrypted) numbers. Our experiments on Turing (Intel Xeon with CPU frequencies between 2.1–2.4 GHz) showed that encrypting a number takes more than 40 times the amount of time needed to perform a single operation. Therefore, the encryption is the most expensive part, exceeding anything else by a large factor. Because we have identified that the most expensive part is the encryption, we will simply demonstrate the parallelization of the encryption.

Review of Serial Implementation of Encrypting an Image

Review of Sections of Serial Encryption

  1. Loading an image

  2. Looping over the pixels and encrypting one at a time

  3. Save encrypted 2-D array to a file

Review of Problem Introduction: Encrypting an Image

Eencrypt an image so that we can securely transfer it for further processing. Here’s the image in question:

Image To Encrypt

The image’s size is 256 by 385 pixels. There are a total of 98,560 pixels. We want a faster way to encrypt it using Paillier scheme, then save the encrypted image to a file. Note, this larger image takes longer than the tiny image.

Before we get too far, we need to be aware of few things:

How long it will take?

Knowing that it takes 1.8 seconds to encrypt 100 numbers, how long it should take to encrypt 98,560 pixels? Assume one pixel is represented by one number (which is true, because we converted the numbers to grayscale).

Message Passing Interface for Parallel Computing

We will use mpi4py to divvy up the task to encrypt the numbers to many processors.

Short review of a template for a simple parallel program

Recall a Template for a Simple Parallel Program.

  1. MPI skeleton (size, rank, etc)

  2. Initialization:
    • Get the file name from the encrypt_par.slurm file
    • Rank 0 reads input image (imread)
    • Convert image to grayscale (converts_to_grayscale)
    • Load in public key from file (pubkey_load_jwk)
    • Broadcast common parameters & data (comm.bcast)
  3. Domain/Problem decomposition:
  4. Do (encryption) work in parallel:
    • Each process will encrypt the pixels contained in its own chunk of the image (pixel-wise encryption).
  5. Gather (partial) output data
    • All the processes will return the encrypted form of their respective chunk to the master process, which is identified as process with rank 0.
  6. Output encrypted image
    • Save output encrypted image to a file

Step 3 is what yields the most time savings, because now we have many workers to encrypt the same image.

Encrypt the image in parallel by implementing each section

Use /phe_image_par/solutions/encrypt_par_04procs.slurm to encrypt lion14graytiny.jpg.

> #!/usr/bin/env python3
> 
> """
> Image encryption using Paillier scheme
> 
> This scripts performs the following:
> * loads an image
> * converts to grayscale if needed
> * encrypts the image
> * writes encrypted image to a disk file (in JSON format).
> 
> """
> import datetime
> import json
> import re
> import sys
> import time
> 
> import numpy as np
> import matplotlib.image as mpimg
> 
> from phe import paillier
> from phe.util import int_to_base64
> from paillier_tools import *
> 
> from mpi4py import MPI

Step 1: Implement MPI skeleton

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

Step 2: Input of data

MYSELF = sys.argv[0]

def usage():
    print("Usage: ", MYSELF, " IMAGE_FILE")

if rank == 0:
    # Get the file name from arg 1
    try:
        file_name = sys.argv[1]
    except IndexError:
        usage()
        sys.exit(1)

    # Load the image
    image_orig = mpimg.imread(file_name)

    print("Image input file:        ", file_name)
    print("Shape of original image: ", image_orig.shape)

    # Transform to grayscale to reduce the computational workload
    GRAY = convert_to_grayscale(image_orig)

    N_ROWS = GRAY.shape[0]
    N_COLS = GRAY.shape[1]

    print("Shape of data to be encrypted: ", GRAY.shape)

    # Public key for encryption
    PUB_KEY = pubkey_load_jwk(read_file("phe_key.pub"))

else:
    # Placeholders for image and public key
    GRAY = "Empty"
    PUB_KEY = "Empty"
# Broadcast common parameters & data
# NOTE: Since the image is a small object (don't expect something like
# 16 MP image for this exercise), we simply broadcast it.
# You can always fix it to suite your use case.
gray_g = comm.bcast(GRAY, root=0)
pub_key_g = comm.bcast(PUB_KEY, root=0)

Step 3: Domain Decomposition

if rank == 0:
    for i in range(1, size):
        L = N_ROWS*i // size
        U = N_ROWS*(i+1) // size
        comm.send(L, dest=i)
        comm.send(U, dest=i)

    # Give assignment of work to rank 0 itself:
    L_w = 0
    U_w = N_ROWS // size
else:
    L_w = comm.recv(source=0)
    U_w = comm.recv(source=0)

print("Rank", rank, ": Received L =", L_w, "and U =", U_w)
# Remember the work will be rows L_w, L_w+1, ..., U_w-1

# Preallocate array to contain encrypted pixels:
# Each element contains a single row of pixels.
gray_enc_w = [None] * (U_w - L_w)

Step 4: Encryption work in parallel

# Do work: encrypt the pixels one by one
start = time.time()
n_enc_w = 0

# Not absolutely necessary
comm.barrier()

if rank == 0:
    print("MASTER: Encrypting the pixels...")

for i in range(L_w, U_w):
    # Messages will be printed only on rank 0
    if rank == 0:
        print(" ", i, end="", flush=True)
    # Encrypt row values
    row_enc = [ pub_key_g.encrypt(float(x)) for x in gray_g[i] ]
    # Serialize row values for writing to JSON file
    gray_enc_w[i - L_w] = [(int_to_base64(x.ciphertext()), x.exponent) for x in row_enc]
    n_enc_w += len(gray_enc_w[i - L_w])

Step 5: Gathering of parital results

if rank == 0:
    print(flush=True)

# collect the number of encrypted pixels for each rank:
N_ENC = comm.gather(n_enc_w)
if rank == 0:
    print("Encryption completed. Total", sum(N_ENC), "pixels were processed.")

    # start with rank 0's encrypted rows
    GRAY_ENC = gray_enc_w[:]
    # Since we know the structure of the partition, i.e. the bands of rows
    # are ordered sequentially from rank to rank, we can simply lengthen the
    # resulting array (GRAY_ENC) upon receiving the extra rows from ranks
    # 1, 2, ...rank-1.
    for R in range(1, size):
        # Note: We don't even need to read back L and U from the other ranks
        #L, U = comm.recv(source=R, tag=1)
        gray_enc_recv = comm.recv(source=R, tag=0)
        GRAY_ENC += gray_enc_recv

else:
    #comm.send((L_w, U_w), dest=0, tag=1)
    comm.send(gray_enc_w, dest=0, tag=0)

Step 6: Output Encrypted Image

if rank == 0:
    date = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S')

    # Compute output filename
    output_name = file_name.rsplit(".", 1)[0] + ".pcrypt"
    print("Image output file:       ", output_name)
    comment = 'Encrypted on ' + date

    # It is good to add some checks to catch bugs
    assert len(GRAY_ENC) == len(GRAY)

    # Store a lot of metadata so we can comprehend the file later on
    enimg_dump_json(PUB_KEY, GRAY_ENC, GRAY.shape, image_orig.shape, comment, output_name)

end = time.time()

# Print total execution time
print("Rank", rank, ": Timing: Completed in ", end - start, "seconds.")

Output for tiny lion:

# Encryption of image: ../lion14graytiny.jpg  (PARALLEL VERSION with 4 processes)
Image input file:         ../lion14graytiny.jpg
Shape of original image:  (26, 39)
Note: image is in grayscale already.
Shape of data to be encrypted:  (26, 39)
Rank 0 : Received L = 0 and U = 6
MASTER: Encrypting the pixels...
  0  1  2  3  4  5
Rank 1 : Received L = 6 and U = 13
Rank 1 : Timing: Completed in  35.139321088790894 seconds.
Rank 2 : Received L = 13 and U = 19
Rank 2 : Timing: Completed in  35.140297412872314 seconds.
Encryption completed. Total 1014 pixels were processed.
Image output file:        ../lion14graytiny.pcrypt
Rank 0 : Timing: Completed in  35.167746782302856 seconds.
Rank 3 : Received L = 19 and U = 26
Rank 3 : Timing: Completed in  35.14179515838623 seconds.

Output for larger lion image:

Encryption of image: ../lion14gray.jpg  (PARALLEL VERSION with 4 processes)
Image input file:         ../lion14gray.jpg
Shape of original image:  (256, 385)
Note: image is in grayscale already.
Shape of data to be encrypted:  (256, 385)
Rank 0 : Received L = 0 and U = 64
MASTER: Encrypting the pixels...
  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
Rank 1 : Received L = 64 and U = 128
Rank 1 : Timing: Completed in  3170.8454174995422 seconds.
Rank 2 : Received L = 128 and U = 192
Rank 2 : Timing: Completed in  3170.878129005432 seconds.
Rank 3 : Received L = 192 and U = 256
Rank 3 : Timing: Completed in  3170.907501935959 seconds.
Encryption completed. Total 98560 pixels were processed.
Image output file:        ../lion14gray.pcrypt
Rank 0 : Timing: Completed in  3172.2461047172546 seconds.

Domain Decomposition

Let us delve deeper into the domain decomposition of this problem. Consider the case when the image representation is a 146 by 182 matrix. (Remember that the image is a two-dimensional object with 182 by 146 pixels. Image sizes are usually expressed in the x-by-y format, whereas matrices are in row-by-column format, i.e., y-by-x. So there are 146 rows in the matrix format.)

Next, we divide the image by the rows, giving a process an (almost) equal number of rows to process. If we have 5 processes, 146 / 5 = 29.2 rows per process. We will not work on a partial row; we need full rows. Therefore, we could do this:

Process proc 0 proc 1 proc 2 proc 3 proc 4
# rows 29 + 1 29 29 29 29

The order of the rows matters so that they do not overlap. Process 0 works on rows 0 through row 29 (in the 0-based array index); process 1 works on row 30 through row 58; and so on…

#Domain decomposition
chunk = int(int(n_row) / int(size))
if n_row % size > 0:
    if rank < ( n_row % size):
        chunk = chunk + 1
        offset = rank * chunk
    else:
        offset = rank * chunk + (n_row % size)
else:
    offset = rank * chunk

To ensure all processes use the same public key for encryption, only process 0 generates the key and then broadcast (comm.bcast) it to all the other processes.

Each process works on its own share of the task and then sends its result to process 0 which will combine the final result into one single file.

Different Parallel Implementations

Case 1: only rank 0 does initialization (above case)

Case 2: all ranks do initialization

Case 3: If initialization and/or reporting is heavy, parallelize them

Key Points

  • Utilize problem decomposition to parallelize, but do not allow chunks of data to overlap

  • Utilizing the same key among all processes will allow decrypting by one process.