Nabeel's Blog

Tuesday, February 22, 2011

Encryption in the cloud

Good read on the $subject

There are solutions for data-at-rest, but they are far from perfect for cloud scenarios. In my opinion, the most challenging one is the data-in-process. There are some basic solutions to perform computation over encrypted data, but they are quite limited. This is mainly due to the fact that we still don't have efficient fully homomorphic encryption schemes. I think in the next decade, we will see better techniques to perform computation over encrypted data without decrypting them.

Monday, February 21, 2011

Homomorphism in Shamir's secret sharing

The homomorphism property in the Shamir's secret sharing scheme is a simple but a very powerful idea that could be useful to reduce the trust we have to put on the dealer to correctly distribute secret shares to users. (See my previous post for an introduction on verifiable secret sharing).

Let's look a simple example to see what this property means.

Assume that we have two degree two polynomials g and f which we can use to create two 3-out-of-n secret sharing schemes where at least 3 out of n users should get together to obtain the secret.

f(x) = 2 + x + x²
g(x) = 3 + 2x + 3x²

We have three users with the following secret shares given from the two secret sharing polynomials where the secrets are 2 and 3 for f and g respectively. Let's call them sub-secrets.

User	f	g
u₁	(1,4)	(1,8)
u₂	(2,8)	(2,19)
u₃	(3,14)	(1,36)

Let's call the combination of the two sub-secrets as the super-secret, which is equal to 5. The problem we have at hand is "how can the users construct the super-secret revealing as little information about the sub-secrets as possible?".

It is not difficult to see that if the three users releases the sum of their secret shares, that is (1, 12), (2, 27) and (3, 50), and do the polynomial interpolation on in it, the combined function gives the super-secret 5. So, the sum of the shares of the secrets are the shares of the sum of the secrets. This is what the homomorphism property is.

The above relationship is defined as the (+, +)-homomorphism. It could be two different operators. For example, Shamir's secret sharing scheme can be converted to (*, +)-homomorphism by using discrete log of secret shares.

References:
Josh Cohen Benaloh, Secret Sharing Homomorphisms: Keeping Shares of a Secret, Proceedings on Advances in cryptology---CRYPTO, 1986

Friday, February 18, 2011

A simple construction of a veribable secret sharing scheme

In a SS (Secret Sharing) scheme a secret is split among many people and some of them need to corporate to build the secret from their shares of the secret. In in it is basic form, a dealer who possesses the secret s splits the secret and gives the secret share s_i to each party P_i, i = 1, 2, ..., n. The protocol makes sure that there should be at least t+1 parties to derive the master secret s. In other words, t or less number of parties cannot derive the secret.

In a normal SS scheme, the dealer is assumed to be honest. What if the dealer can be malicious? That's where VSS (Verifiable Secret Sharing) comes to play. VSS is similar to SS except that it allows each party to verify that the share given to them, s_i, is a valid one.

Here's a simple VSS scheme using Shamir's secret sharing and additive homomorphic encryption.

An additive homomorphic encryption scheme (e.g. ElGamal cryptosystem) allows us to perform addition over encrypted data. In particular, we use the following properties. Let E be the encryption algorithm and m₁, m₂ two numbers.

E(m₁)E(m₂) = E(m₁ + m₂)
E(m₁)^m₂ = E(m₁m₂)

The dealer generates secret shares just like in the Shamir's SS scheme. For (n, t+1) (i.e. out of n parties, there should be at least t + 1 parties to construct the secret, the dealer selects a random degree t polynomial f with f(0) = s:

f(x) = s + a₁x + a₂x² + ... + a_tx^t

The dealer sends the secret share (i, f(i)) (f(i) = s_i) to each party P_i through a private communication channel. Now t + 1 parties can perform a polynomial interpolation and obtain a unique polynomial which is equal to f and get the secret f(o).

In order to verify their secret shares, the dealer distribute an additional piece of information. The dealer generates public and private key pair for the additive homomorphic encryption E and broadcasts the encrypted coefficients of f, that is, E(s), E(a₁), E(a₂), ..., E(a_t). A party P_i, first encrypts it share and compare if the following holds. If it is true, it concludes that the dealer has given a valid secret share.

E(s_i) ==? E(s)E(a₁)ⁱE(a₂)^i² ... E(a_t)^{i^t}References:
How to Share a Secret, 1979, Adi Shamir
A Practical Scheme for Non-interactive Verifiable Secret Sharing, 1987, Paul Feldman

Tuesday, February 15, 2011

Flat table based group key management

Flat table (FT) key management is a simple way to manage group key. In it is basic form, it is quite efficient, but it is susceptible to collusion attacks.

First the notations:
GC - Group Controller - the centralized entity that manages the group key
GM - Group Member - a participant who wants to obtains the group key
K - the group key, it is also called DEK (Data Encryption Key) and is used to encrypt broadcast messages to the group.
KEK - Key Encryption Key - a key used to encrypt K.

Each GM in the group is assiged an identifier, ID. ID is an n-bit binary number which is denoted as X_n-1X_n-2...X₀, where X_i is either 0 or 1. The maximum size of the group N = 2ⁿ.

GC maintains 2n KEKs, {k_i,b | i ∈ Z_n, b ∈ Z₂}. That is for each bit position, there are two keys, one for 0 and one for 1.

Each GM is given n + 1 keys. Half of the KEKs (i.e. n) which correspond to the n bits in its ID and K.

Join:
Let's assume that the ID of the joing GM is X_n-1X_n-2...X₀. In order to provide backward secrecy (i.e. not allow the joining GM to access previous keys), we need to update K and n KEKs {k_{i,X_i} | i ∈ Z_n} corresponding to this ID. New K and KEKs are encrtypted with their old K and KEKs respectively and broadcast to the group. GC gives the new K and KEKs to the joining GC though a secure private communication channel.

Leave:
Let's assume that the ID of the leaving GM is X_n-1X_n-2...X₀. In order to provide to provide forward secrecy (i.e. not allowing the leaving GM to access new keys), again K and the n KEKs held by the leaving GM are updated.
However, we need to use a different approach to update the keys since we cannot reuse the old keys to do the job. GC decides a new group key K' and encrypts with the remaining n keys that the leaving GM does not possess and broadcasts the encrypted messages. An existing group member can decrypt at least one of those messages and obtain the new group key K'. Then GC encrypts each new KEK corresponding to those of the leaving GM twice. First GC encrypts with K' and then encrypts with the old KEK, and then broadcast to the group. It make sure only the existing users can access new KEKs who already had old KEKs. Notice that the leaving GM cannot access since it does not have K' to remove one layer of encryption.

There are proposals for optimizing for multiple leaves. In one work, the use Boolean function minimization techniques to reduce the complexity.

Collusion attacks:
When multiple users are removed at once, the FT key management approach is not resistant to collusion attacks.

References:
Waldvogel et. al. The VersaKey framework: versatile group key management, 1999

Chang et. al. Key management for secure internet multicast using Boolean function minimization techniques, INFOCOM 1999

Wednesday, February 9, 2011

What should we have in a security definition/protocol?

There are three key components that should be specified in any security definition/protocol.

1. Adversarial power under which the protocol is secure
2. Network model in which the protocol operates
3. Security guarantees that the protocol provides

Security is not something absolute. We may not be able to protect a protocol against any type of adversary. A security protocol is meaningless if it does not specify the capabilities of the adversaries under which the protocol is still secure. The capabilities of adversaries are specified in the adversarial model. For example, if the adversary is computationally bounded or have unbounded computational power, if the adversary is honest, semi-honest or malicious, etc.

Your protocol may need a trusted third party to work. It may not be secure without a trusted third party. Your protocols may need certain parties to online at all times. Such information is specified in the network model. Like the adversarial power, your protocol is meaningless without a proper network model.

As I mentioned earlier, security is moving line; your protocol may solve only certain problems. It is crucial to specify what problems it solves up front as part of the security guarantees.

In short, you need to specify what security guarantees your security protocol provides under which adversarial and network model. Otherwise, a security protocol is as useless as zero security protocol.

A big mistake people, especially in the database community, make is to say "our protocol is secure for this data set and it does not reveal any useful information to the adversary". This is very wrong. You cannot define security based on a input data set. What happens when you have a different data set? Will your protocol still be secure? Probably not. Security should never be analyzed by a set of examples. Rather, it should be based on the above 3 objective measures.

Friday, January 28, 2011

Broadcast encryption vs. DRM

In the last post, I started looking into the problem of broadcast encryption (BE). In this short post, I am comparing the BE problem with DRM (Digital Rights Management) problem.

BE and DRM has the common goal of preventing unauthorized users from accessing the content. However, IMO, DRM is more challenging since the adversary could be a privileged user. A privileged user who has legitimate access could decide to copy or convert the content into a different format and share it with non-privileged users. Hence, DRM requires additional mechanisms to prevent copying or conversion or at least mechanism to identify a traitor if DRM is violated.

Friday, January 21, 2011

Broadcast encryption

Broadcast encryption (BE) deals with encryption schemes designed for broadcast transmission systems such as pay-TV systems, content dissemination in an organization, the distribution of copyright protected material on disks, audio/video streaming systems and so on. The goal is to allow only an arbitrary subset of users (sometimes called privileged users) from the universe of users to access the content while minimizing key management overheads. In this series of blog post, I will be looking into how this field of BE evolved from research point of view, the current state of the art schemes available to address this research problem and future directions.

(Figure: broadcast encryption)

In a high level, BE works as follows: Each user in the universe is given a set of symmetric keys initially. A set of messages are sent to establish a common key among the set of privileged users so that only the privileged users can decrypt the broadcast messages using the common key.

Naive approach 1
Each user is given a unique symmetric key.
When a privileged set needs to be establish, the controller selects a group key K and encrypts number of times equal to the size of the privilege set using the symmetric keys of the privileged set and sends to the users.
Subsequent broadcast messages are encrypted with the key K.

It works but it requires a very long transmission to establish the common key (number of users in the privileged set into the size of the message).

Naive approach 2
Each possible subset of users is assigned a unique symmetric key.
When broadcasting a message, encrypt it with the correct symmetric key corresponding to the privileged set.

It also works, but users need to store prohibitively many keys and revocation is also difficult.

It should be clear that BE is an optimization problem which tries to optimize the following parameters.
1. The number of keys given to each user.
2. The number of transmissions used by the controller to establish the common key.
3. The computation effort involved in retrieving the common key by the users of the privileged set.

References:
1. Broadcast encryption, Amos Fait and Moni Naor, 1998