HPC for Advanced Cryptography
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What are the main issues with security and trust in computing?
Objectives
First learning objective. (FIXME)
Motivation
In this lesson we will introduce a special class of encryption techniques called “homomorphic encryption” that allow computation on encrypted data while protecting the privacy of the data owners.
Basic Encryption
Encryption has long been a pillar of protecting data from unauthorized access. Encryption makes data look like random, incomprehensible bits to parties that do not have the key to decrypt the message.
Let us introduce some notation to facilitate the later discussions. We will not delve into the deep math required to understand the details of the encryption techniques, so do not worry.
- Let the message to be encrypted be denoted by a lowercase letter such as m. This m can be a single numerical value (an integer, a real number), a string, or an image, among others.
- E is the encryption operator;
- D is the decryption operator.
Using the notations above,
- M := E(m) refers to the encrypted form of m;
- D(E(m)) refers to the recovered value of m after decryption.
The cybersecurity community often refers to m as plaintext, while E(m) is called ciphertext, regardless of whether the message is actually text or another type of data. In this lesson, we will use capital letters (other than D and E) to refer to the encrypted form of the plaintext symbolized by the same letter in lower case.
A particular cryptosystem consists of a matched pair of E and D operators: one cannot use D from one cryptosystem to decipher ciphertext generated by another cryptosystem.
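The notation above can be made concrete with a minimal sketch. It assumes a toy XOR cipher with a random key as long as the message; the helper name `keygen` and the sample message are hypothetical, and this is only an illustration of the pair E and D, not a cryptosystem to use in practice.

```python
import secrets

def keygen(length: int) -> bytes:
    """Return a fresh random key of the given length (hypothetical helper)."""
    return secrets.token_bytes(length)

def E(m: bytes, key: bytes) -> bytes:
    """Toy encryption operator: XOR each message byte with a key byte."""
    if len(key) != len(m):
        raise ValueError("key must be exactly as long as the message")
    return bytes(a ^ b for a, b in zip(m, key))

def D(M: bytes, key: bytes) -> bytes:
    """Toy decryption operator: XOR with the same key undoes the encryption."""
    return E(M, key)

m = b"blood pressure: 120/80"   # plaintext (made-up sample data)
key = keygen(len(m))
M = E(m, key)                   # ciphertext, M := E(m)
recovered = D(M, key)           # D(E(m))

print(M.hex())
print(recovered == m)
```

Without the key, M looks like random bytes; with the matching D and key, the original m is recovered exactly.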
Encryption always involves a secret key and possibly a second, public key:
- Symmetric cryptography involves only one key, used by both E and D to encrypt and decrypt the messages. In this case, the key must be kept secret at all times.
- Asymmetric cryptography involves a pair of keys, used respectively by E and D to encrypt and decrypt the messages. One is the public key, which can be shared with other people, and the other is the secret key, usually called the private key. In this way, anyone holding the public key can encrypt a message, but only the holder of the private key can decrypt it.
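As a rough illustration of the asymmetric case, the sketch below builds a textbook RSA key pair from two tiny primes. The primes, the exponent, and the message value are assumptions chosen only to keep the numbers readable; real keys are thousands of bits long and use padding, so this is not a usable implementation.

```python
# Toy RSA key pair from tiny primes (insecure; for illustration only).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

public_key = (n, e)          # may be shared with anyone
private_key = (n, d)         # must be kept secret

def E(m, key):
    """Encrypt integer m (0 <= m < n) with the public key."""
    modulus, exponent = key
    return pow(m, exponent, modulus)

def D(M, key):
    """Decrypt ciphertext M with the private key."""
    modulus, exponent = key
    return pow(M, exponent, modulus)

m = 65
M = E(m, public_key)
recovered = D(M, private_key)
print(M, recovered)
```

Here E only needs the public key, while D needs the private exponent d, which cannot be derived from (n, e) without factoring n.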
Due to the nature of encryption, it is generally impossible to perform transformation, manipulation, or arithmetic on the ciphertext E(m). Once a message is encrypted, it generally becomes inoperable. In particular, arithmetic operations cannot be applied to encrypted numbers: adding or multiplying two ciphertexts does not, in general, produce a ciphertext of the sum or product of the plaintexts.
Homomorphic Encryption
In this lesson, we will present a brief introduction to a class of relatively recent, advanced encryption techniques called homomorphic encryption (HE). Unlike common cryptographic techniques, HE scrambles information but still permits limited transformations and operations to be performed on the encrypted data. This is an important feature for privacy preservation in data-driven computational techniques such as machine learning (ML).
HE allows one to perform some arithmetic on the encrypted data without having to decrypt it. The kinds of operations possible depend on the exact details of the scheme. HE is still a very active research topic, and much more needs to be worked out before its use becomes widespread.
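To see how the permitted operations depend on the scheme, here is a small sketch using unpadded textbook RSA, which happens to be homomorphic for multiplication but not for addition. The parameters (n = 3233, e = 17, d = 2753) and the values of a and b are assumptions picked for readability; this is a toy demonstration, not one of the HE schemes used in practice.

```python
# Toy textbook RSA parameters (insecure; for illustration only).
n, e, d = 3233, 17, 2753

def E(m: int) -> int:
    """Encrypt integer m (0 <= m < n)."""
    return pow(m, e, n)

def D(M: int) -> int:
    """Decrypt ciphertext M."""
    return pow(M, d, n)

a, b = 7, 12
A, B = E(a), E(b)

# Multiplying ciphertexts corresponds to multiplying plaintexts (mod n).
product_plain = D((A * B) % n)
multiplication_works = product_plain == (a * b) % n

# Adding ciphertexts does NOT correspond to adding plaintexts in this scheme.
sum_plain = D((A + B) % n)
addition_works = sum_plain == a + b

print(multiplication_works)   # True
print(addition_works)         # False
```

The party doing the multiplication only ever handles A and B; it never needs d and never sees a or b.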
HE for Privacy Preservation
One major motivation for inventing and deploying homomorphic encryption is to preserve the privacy of the data while still being able to learn something about the data owner(s).
Here are some examples where HE can help:
- Research involving personal health data: Researchers want to learn about prevailing health issues among a certain population in society. They need access to people’s detailed health records, which, given the level of detail required, carries the risk of identifying the individuals concerned, even when names and other personally identifying information have been stripped. Yet the researchers only need statistics about the population, not the health data of any individual person.
- Secure voting: How can one create an electronic voting system that preserves the confidentiality of each individual voter’s choice while still being able to aggregate the votes to yield the election result? In some countries, having your party alignment known can be dangerous.
- Mobile computing, internet-of-things, and machine learning: Mobile computing devices have been recognized as a golden source of data of all sorts: people’s preferences, physical activities, pictures, etc. Add “internet of things” devices to the mix, and there is no shortage of data sources to answer many questions that previously could not be answered due to lack of data. Many people want to leverage the data for various good reasons, but without additional protection, users’ privacy is at risk. Imagine what happens if the server where the data are collected and stored gets hacked and all the private data leak out! Well-intentioned analysts are not interested in tracking an individual person’s data; what they want is the aggregate information (from statistical analysis) as well as models (obtained from machine learning) that are good for making predictions.
There are many other use cases envisioned. See the following 2017 whitepaper: “Applications of Homomorphic Encryption” for a representative list of potential applications for HE.
Current data security practices include (1) the removal of irrelevant personal information and (2) setting up a strong security perimeter so that only authorized parties can access the servers where the data reside. However, recent security breaches attest that these measures have weak points. Often, people are the weak link: just one cunningly crafted phishing message can put the admin password into the wrong hands. In other cases, an unaddressed security vulnerability in the computer or network system leaves a hole for hackers to break in.
In this module, we will learn that HE can be utilized to harden data security when used on top of the standard security practices mentioned above. Even though individual bits of information are encrypted and never revealed, one can still do operations like statistical aggregation, machine learning, etc. using these sensitive data.
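The statistical aggregation mentioned above can be sketched with the Paillier cryptosystem, an additively homomorphic scheme in which multiplying ciphertexts adds the underlying plaintexts. The tiny primes and the list of readings below are assumptions for illustration only; this is a toy under those assumptions, not a production implementation.

```python
import math
import secrets

# Toy Paillier key generation with tiny primes (insecure; illustration only).
p, q = 1009, 1013
n = p * q
n_sq = n * n
g = n + 1                                            # common choice of generator
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1), private
mu = pow(lam, -1, n)                                 # private (needs Python 3.8+)

def E(m: int) -> int:
    """Encrypt 0 <= m < n with the public key (n, g) and fresh randomness r."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def D(c: int) -> int:
    """Decrypt with the private key (lam, mu)."""
    x = pow(c, lam, n_sq)
    return ((x - 1) // n * mu) % n

# Each data owner encrypts a private value (made-up readings).
readings = [120, 135, 110, 142]
ciphertexts = [E(m) for m in readings]

# The server multiplies ciphertexts; it never sees any individual reading.
encrypted_total = 1
for c in ciphertexts:
    encrypted_total = (encrypted_total * c) % n_sq

# Only the private-key holder decrypts, and only the aggregate.
total = D(encrypted_total)
mean = total / len(readings)
print(total, mean)
```

In this arrangement the server holding the ciphertexts can compute the encrypted sum, but only the holder of the private key learns the total, and nobody needs to decrypt an individual record. The sum must stay below n for the result to be correct.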
Security in Programming
Before embarking on our lessons, let us stress that security is paramount in any computing environment. Security practices are not the main thrust of this lesson series, but a reminder is appropriate. Even in parallel computing, machine learning, big data, etc., we still have to be cognizant of the security implications of our programs to avoid having any part of our computation or data jeopardized or compromised. For example, much of the data used today in analytics and modeling is sensitive in nature (e.g. personal data).
Cryptography Is a Serious Business
Unlike many other facets of programming, cryptographic programming is not something that we can approach casually. There are many things that can go wrong with cryptography, any of which can compromise the security of computing systems. The simplest example is using a weak password (or a weak key, for that matter). Another example is using a predictable random number generator.
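The random number generator pitfall can be shown with a short sketch. It assumes a key derived from Python's general-purpose `random` module seeded with a guessable value; the helper `weak_key` and the seed are hypothetical, chosen only to show the failure mode.

```python
import random
import secrets

def weak_key(seed: int, nbytes: int = 16) -> bytes:
    """Derive a 'key' from the non-cryptographic Mersenne Twister generator."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))

# Anyone who guesses the seed (e.g. a timestamp) regenerates the same key.
victim_key = weak_key(seed=1700000000)
attacker_key = weak_key(seed=1700000000)
keys_match = victim_key == attacker_key
print(keys_match)

# Preferred: draw key material from the operating system's secure generator.
strong_key = secrets.token_bytes(16)
print(len(strong_key))
```

The `secrets` module is intended for security-sensitive values such as keys and tokens, whereas `random` is designed for simulation and modeling and is reproducible by design.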
Key Points
First key point. Brief Answer to questions. (FIXME)