This lesson is still being designed and assembled (Pre-Alpha version)

Paillier Cryptosystem for Homomorphic Encryption

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What is Paillier encryption scheme?

  • What can be encrypted and calculated using Paillier HE?

  • How to save encrypted numbers?

Objectives
  • First learning objective. (FIXME)

Introduction

In this hands-on episode, we will focus on an implementation of HE that is available for Python programming language, that is using Paillier encryption scheme. We will be using the python-paillier library.

Paillier encryption scheme is similar to the popular RSA algorithm in that it involves exponentiation of a very large base with a very large exponent. The basic unit of plaintext (m) in Paillier cryptosystem is an integer. Real numbers can be approximated as very large integers.

Using Paillier encryption scheme, the following operations can be done on the ciphertext (e.g. M1, M2, …):

With these basic components, one can still do subtraction (with plain/ciphertext) and division with plaintext.

In this episode, we will learn how to:

Preparation: Key generation

Paillier encryption starts with the generation of a keypair:

# import additional stuff needed later on:
import json

# import Paillier library
import phe
from phe import paillier

pubkey, privkey = paillier.generate_paillier_keypair(n_length=2048)

What’s in a key?

A cryptography key is essentially a set of one or more numbers. Usually these are extremely large numbers that are difficult to compute, like prime numbers and the products thereof.

The private key consists of two prime numbers p and q. (Some articles such as the Wikipedia’s Paillier Cryptosystem alternatively use the pair of greek symbols lambda and mu as the private key—they are just derived quantities from p and q.)

Formally, the public key consists of n and g. By definition,

n := p * q

and most Paillier implementation (such as python-paillier we will be using later) uses a simple choice

g := n + 1

so effectively the public key only has n.

What’s in your key?

Check your own public and private keys and examine the values of p, q, and n. Use the privkey.p, privkey.q, and pubkey.n attributes.

Compare your values with your peer’s values. How large (order of magnitude) are these values? Hint: to measure the string length of an integer like those, use the str and len functions combined.

Example solution

>>> privkey.p
165801263340632983976560662438308425856376892415968944445942078522128733590037545491114495778463344034457791215860577561908717660960279116282832632108138114113215470481000160667178582327571748667307748221469642828898728723202606759475506355383541570049534956671732505254974281600674365388829556194125478439663

>>> privkey.q
173692986048859584131292662978616306818667799173886008136009019862903330127902404198243622829355376568051859747081316805127608698230348868838407907757905356268206709017850161836762180321535363252843130114261965137350004486033503058254303012135765203279703631680903239188556052848628643153099498478342809602171

>>> pubkey.n
28798516520307858892245242752031788221019505752401640907847555469080110989819460404887769769114996691178956366902424297847922947959745336265127613942423489628418392856716544509021179099995101598581024448922988985325716248080094092790457195629831492960414027793811488320980935077935214695348015743458064652434646259278623218413099569743293632244873602245153101169163714018053846361736912463943370196927861624340029295752670877587921787341114737650043775604469578472260873497172429754696151563073203245718398299476346582842129611188422523870025644927967778458063530248617514979015051481160778109441927444069910857308373

>>> len(str(priv.p))
309

The p and q are over 300 digits long, whereas n is over 600 digits!

The number of bits are relatively easy to compute using the “%x” C-like string formatting and noting that each byte comprises of 8 bits:

# The count of hexadecimal digits in this number:
>>> len("%x" % priv.p)
256

# The count of bits in this number (remember: a byte = 2 hexadecimal digits):
>>> len("%x" % priv.p) * 4
1024

In real application, of course you won’t share your p and q values. The n_length=2048 in the construction of the keypair sets the length of the public key to 2048 bits. This makes the length of each p and q to 1024 bits each.

Basic operations

Let us now encrypt some data and do some operations:

Encrypting numbers

Python-paillier supports the encryption of integers and real numbers. The encryption will cryptically represent a plaintext as another number that is very large in magnitude. It is advisable not to use numbers that are extreme in the magnitude (for example, 10**100 [ten raised to the one-hundredth power] or beyond) because the extreme values can cause overflow in the representation of the ciphertext.

Let us begin with encrypting two numbers m1 and m2 below:

m1 = 3.14159
m2 = 10.01

M1 = pubkey.encrypt(m1)
M2 = pubkey.encrypt(m2)

According to Paillier’s scheme, encryption of m is essentially raising the value of (n+1) to the power of m, then multiplied with a specially random integer that will obfuscate the M value further. We can examine the garbled values of M1 and M2:

# What is this object?
>>> M1
<phe.paillier.EncryptedNumber at 0x1071fd048>

# What is the numerical value of the garbled message?
>>> M1.ciphertext()
358042382770970270871524800057088993221249913569455679842042704805236196798901110138194625868064580648366378045126433145589847592042155363527717057481962442927634034228963964441449764296013360397445353630883402829038700969195047512245998315288904243095699545878859254325699578614910293987106279004516694723811771678355852903333263976670616317424950250139809342812336448835863147832115469392284379745141185442407804051322178178067222135101866778737234526964369704209798907917843985713752348919395564785629545112505111864804352424749406544201602358442027344200150631027335061665436620838770976547147216884719120406406990135182420301787378701807752625145514333784471858890922709189677324828050487528956994674637706742097263253068935111408501343290249421788540787705161699618174447299562550175340316069533657736832883926749539295329978528335318595004290370344167608315106573144494125367974775195050854204929161874473987196886524872998925329629342955885015929491823940042109455343304878081339271308043826293205311947989826903150983651291506365361391467422114815705184215214751061898354179774360927559552458033988283761950762737874945446251124416375925915879200362054684310251908696240048698646030868448074744342597878483402868383817342731

>>> M2.ciphertext()
251500510465933159444129963041891475735958637022077320726753988995782543605875276064355046382636476224156927573410788265913428536281362746678685886477342515747725346205724908277948589850498950113032857048973248166230503904414015080108483413222096602346345325763118683239416195596913352139016432488199232132566385839598965555332124257078399090417720309411774421097191437214204796576044983316327834833222105146237548316050866369699850876429255645296741309710225470794517666162685190673394137156087317381360574331350929674031046372889099732764768604092375642787380395168571666222170013934385236219940414879886409254086394673631604203605436108581499762932943318959134010040793568875436218443906828584820850410509803672150295495821202471827088903655699694039407477528126302536535755178913446557295082494305811588939339905625856529970642483000600509412103276093981197711165525459371031282679106443571999891342788487025728031054793474962502015189375063908990710504205059139340048441494846749585770682660763741858638982000978845126174168249264753262086981396109082398653554959250888229072524248786310589381771608267693735086616061149065114993335429575126741915070311764384862452182088480319731960530717752449479079558471598704820992912826091

For a 2048-bit public key (n), these ciphertexts are 1024-bit values.

There is a surprising feature with Paillier encryption: the ciphertext above are not unique. Suppose we want to re-encrypt m1 into another encrypted number for the second time, then the ciphertext will not come back the same:

>>> M1b = pubkey.encrypt(m1)

>>> M1b.ciphertext()
215320743737696792505629417916091495555024649807936320684353999140834571484779962921512277916909414641309636112370294503228090979147962780269526673009406021881896793720689090071312153650626495787508248861034327242415603410146990342661619334510416764998889942132905802824724239120675557183424815361330704267772947690005229292906897925217332204148870783543479608682054196054545195592919684891849750764136444683823057851495872857773215322251877541513716609320457821781879940992643784545715595574202450603214394953691090145907141824134262849406978878239026886759001237077626562486714003717466071035536213885602577090940590577178665013822391659484716541142683844179550123522737256279019202767987360008603243973448284620325533478542036736751743971556195842146128863327582513693071968646235738170809604622656200900953991868485445863364714104339124619095140215161163009357556551996754076825186466841820534524743827225220418936217000161402407525456696209429025630673991941894032978204250155747283840025096489965601737665129716201572068989600305650659617843415841667211599052292050537888713743676090424391401102401343332605608284942470259004441154791461641050713624527451566473359673037755770138383260106548157593847792994342362880470720049308

The garbled form is different, yet the underlying plaintext is the same! This is an intended feature of the encryption scheme: by just looking at the ciphertext, one will not know whether two values are identical. In fact, it is impossible to perform numerical comparison involving one or more ciphertexts.

Now let us perform operation:

>>> M3 = M1 + M2

# Create some constants
>>> n1 = 2
>>> n2 = 0.125

>>> M4 = M1 * n1

>>> M5 = M1 + n2

# Multiplication of two ciphertexts are not supported
>>> M6 = M1 * M2
... (some stack trace)
NotImplementedError: Good luck with that...

The results of the mathematical operations above (M3, M4, M5) are also encrypted numbers which are jibberish to those without the decryption key:

>>> M3.ciphertext()
346271150828615488181846097847395851154922255082407573156807157423...

>>> M4.ciphertext()
217806911011334857004539935049659502210592047599395846344934617502...

>>> M5.ciphertext()
105199549075824746345972983983003901376943492867585255733113741110...

We can get back the actual numbers after numerical operations:

>>> m3 = privkey.decrypt(M3)
>>> m4 = privkey.decrypt(M4)
>>> m5 = privkey.decrypt(M5)

>>> print(m3)
13.151589999999999

>>> print(m4)
6.28318

>>> print(m5)
3.26659

How about cross-checking against the result of plaintext arithmetics?

>>> m1 + m2
13.151589999999999

>>> n1 * m1
6.28318

>>> m1 + n2
3.26659

Let’s play around with some numbers and make sure you are comfortable with the allowable operations!

Saving and loading encrypted data

Saving keypair

Every time generate_paillier_keypair() is called, a different keypair is generated, and the key values are random. This randomness is part of computer security. Therefore, there has to be a way to save and restore a keypair.

For this module, we will follow the approach suggested used by the authors of python-paillier (refer to python module phe.command_line in the source code for the origin):

def keypair_dump_jwk(pub, priv, date=None):
    """Serializer for public-private keypair, to JWK format."""
    from datetime import datetime
    if date is None:
        date = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')

    rec_pub = {
        'kty': 'DAJ',
        'alg': 'PAI-GN1',
        'key_ops': ['encrypt'],
        'n': phe.util.int_to_base64(pub.n),
        'kid': 'Paillier public key generated by phe on {}'.format(date)
    }

    rec_priv = {
        'kty': 'DAJ',
        'key_ops': ['decrypt'],
        'p': phe.util.int_to_base64(priv.p),
        'q': phe.util.int_to_base64(priv.q),
        'kid': 'Paillier private key generated by phe on {}'.format(date)
    }

    priv_jwk = json.dumps(rec_priv)
    pub_jwk = json.dumps(rec_pub)
    return pub_jwk, priv_jwk

The function above returns two json objects (which are basically two strings) from the public and private keys, respectively.

The public key can be written easily to a text file:

with open("phe_key.pub", "w") as F:
    F.write(pub_jwk + "\n")
    print("Written public key to {}".format(F.name))
    print("n={}".format(pubkey.n))

Saving private key to file

Please follow the same pattern as the code snippet above to save your private key to a file called phe_key.priv.

Real-world challenge: Securely saving private key to file

We don’t use the simple open function to create a file to which we write the private key, because this is a security critical piece. Instead, we have to (1) delete the existing file (if it exists already), then (2) create the file with the exact permissions from the get-go (i.e. no group and world read-write permission bits). These steps are necessary so that no bystander can steal the private key information due to race condition. Challenge yourself to come up with a secure way to accomplish this.

Solution

Please look at file Paillier/create_save_keypair.py in your exercise directory for a possible solution.

Loading keypair

The routine to recreate the public/private keypairs is given as follows:

def keypair_load_jwk(pub_jwk, priv_jwk):
    """Deserializer for public-private keypair, from JWK format."""
    rec_pub = json.loads(pub_jwk)
    rec_priv = json.loads(priv_jwk)
    # Do some basic checks                                                                                                                                                      
    assert rec_pub['kty'] == "DAJ", "Invalid public key type"
    assert rec_pub['alg'] == "PAI-GN1", "Invalid public key algorithm"
    assert rec_priv['kty'] == "DAJ", "Invalid private key type"
    pub_n = phe.util.base64_to_int(rec_pub['n'])
    pub = paillier.PaillierPublicKey(pub_n)
    priv_p = phe.util.base64_to_int(rec_priv['p'])
    priv_q = phe.util.base64_to_int(rec_priv['q'])
    priv = paillier.PaillierPrivateKey(pub, priv_p, priv_q)
    return pub, priv

Key re-creation is straightforward:

with open("phe_key.pub", "r") as F:
     pub_jwk = F.read()

with open("phe_key.priv", "r") as F:
     priv_jwk = F.read()

pubkey, privkey = keypair_load_jwk(pub_jwk, priv_jwk)

Saving encrypted number

An encrypted number needs to be accompanied by its associated public key for it to be operable mathematically. Therefore, when saving one or more encrypted numbers (for example to a disk file), one must include the public key, or have a way to re-associate the public key when reloading the data. In the routines below we include the public key in the dump and load functions. The envec_ prefix stands for “encrypted vector”, where vector just means a list of one or more numbers.

def envec_dump_json(pubkey, enc_vals, indent=None):
    """Serializes a vector of encrypted numbers into a simple JSON format."""
    from phe.util import int_to_base64
    R = {}
    R['public_key'] = {
        'n': int_to_base64(pubkey.n),
    }
    R['values'] = [
        (int_to_base64(x.ciphertext()), x.exponent) for x in enc_vals
    ]
    return json.dumps(R, indent=indent)


def envec_load_json(R_json):
    """Deserializes a vector of encrypted numbers."""
    from phe.util import base64_to_int
    R = json.loads(R_json)
    R_pubkey = R['public_key']
    R_values = R['values']

    # deserialized values:
    pubkey_d = paillier.PaillierPublicKey(n=base64_to_int(R_pubkey['n']))
    values_d = [
        paillier.EncryptedNumber(pubkey_d, ciphertext=base64_to_int(v[0]), exponent=int(v[1]))
        for v in R_values
    ]
    return pubkey_d, values_d

All these routines are collected in a Python module called paillier_tools, available in your hands-on directory (file Paillier/paillier_tools.py).

Please take a second to look into this file and understand the purpose of each functions contained therein. There are two additional files, read_file and write_file, which are meant to read or write text data (represented as a single Python string).

Creating your own keypair and saving it to disk

As an exercise, please write a script called mykeys.py that creates a keypair, then save the public and private keys to two files: phe_key.pub and phe_key.priv, respectively.

Hint: this is the beginning part of the program:

import json
import os
import phe
from phe import paillier
from paillier_tools import *

pubkey, privkey = ...  # complete on your own

Hint: use the paillier.generate_paillier_keypair, keypair_dump_jwk, and write_file functions.

Solution

A possible solution is provided in create_save_keypair.py file in your hands-on directory.

Note: this script employs a technique to store the private key securely, which will be necessary when programming real applications! But we don’t need to worry for this in our hands-on activities; simply use write_file.

Now we are ready to perform some fun operations with the Paillier homomorphic encryption!

Key Points

  • Paillier cryptosystem permits addition of two ciphertexts, or one ciphertext and one plaintext, or multiplication of one ciphertext and one plaintext.