This week, I presented this talk to a database security class I am taking. It provides some background details about FGAC and then moves on to focus on a SIGMOD 2006 paper "Redundancy and Information Leakage in FGAC" by Kabra et. al. Up until this work, researchers were mainly focusing on the problem "How to prevent information leakage from the query result when FGAC is in action". In other words, research focused on how to make sure that users get only those data they are authorized to access, no more and no less (no less -> completeness is a difficult thing to achieve in FGAC - Can we make sure that user gets all the data that she is entitled to receive?) The paper investigates information leakage through channels other than the query result. Specifically through UDFs (User Defined Functions), exceptions and error messages. A trivial solution to prevent potential information leakage through UDFs is to pull UDFs all the way up in the query plan so that these functions get executed only on authorized relations. However, it could be very inefficient if these UDFs are selective. The novelty of the approach comes from the fact that UDFs are placed to achieve both efficiency and security at the same time. The authors mainly compare improvement in the execution time only. It could mislead readers as they do not include the optimization time in it (Obviously they require more time to optimize as they need to take into account potentially malicious UDFs). Ideally, execution time should be equal to query optimization time + query execution time.

What other approaches can we take to prevent such leakages? A trivial solution is to restrict who can define UDFs. If we allow any user to define UDFs, can we correctly infer if they are safe or not? It looks to be a difficult problem - depending on the context, one UDF may be safe in one situation but not in another. Even more difficult issue is, some UDFs might not actually leak visible data, they may exploit subtle timing variations based on the values passed these functions (For example, a UDF takes a longer time to process for certain types of values - which effectively allow the user to infer some information about the data she does not have access to).

Information leakage is not restricted to FGAC. It's been a widely researched topic in many areas including web applications. In fact, information leakage is one of the top 10 security concerns of ad-hoc web application development.

## Friday, February 27, 2009

## Sunday, February 22, 2009

### NTL library for doing number theory

I have been working on an implementation for a new key management scheme. It involves a lot number theory and linear algebra over a finite field. I initially tried to use the GP/Pari C library (and a couple of more), but it has a very low-level programming API and documentation is not satisfactory either. However, it is really easy use as a command-line tool, but I wanted a library to embed in my C++ code. Then, a colleague of mine suggested using NTL. NTL C++ library is really helpful for such work. This library is well documented and easy to use.

## Thursday, February 19, 2009

### How to pick a research topic in 10 seconds?

No, I am not an expert to answer this question, but Prof. Douglas Cormer (Distinguished prof. of CS Purdue, currently VP at Research collaboration at Cisco) has a way to answer the question :-)

## Sunday, February 15, 2009

### How large can Mersenne primes get?

Long time ago I mentioned about the discovery of 43rd Mersenne Prime number. (Any number of the form 2^n - 1 where n is a prime is a mersenne number. If that number is prime, it is called a mersenne prime) Recently (in Aug/Sep last year) UCLA mathematicians discovered the 45th and a week after the 46th by Germans.

Why do people bother finding them? It is a grand computational challenge! (thus considered to be a great discovery which brings enormous recognition from peers) and can advance the field of computing by dealing with the enormous computational complexity. Since it is so hard to prove whether a Mersenne number is a prime number (primality testing is computationally expensive for large numbers), researchers have been developing various distributed/parallel architectures to attack the problem. As a result, it has led to the advancement of our understanding on such architectures. These numbers are also useful for cryptographic protocols. For example, Mersenne primes are used in PRNG's.

Can there be larger Mersenne primes than the 46th one? It is still uncertain whether there are infinitely many of them. But scientists believe there are.

I know a Mersenne prime uniquely identify a perfect number. But, apart from the applications where prime numbers are useful, are there other practical applications of Mersenne primes?

Why do people bother finding them? It is a grand computational challenge! (thus considered to be a great discovery which brings enormous recognition from peers) and can advance the field of computing by dealing with the enormous computational complexity. Since it is so hard to prove whether a Mersenne number is a prime number (primality testing is computationally expensive for large numbers), researchers have been developing various distributed/parallel architectures to attack the problem. As a result, it has led to the advancement of our understanding on such architectures. These numbers are also useful for cryptographic protocols. For example, Mersenne primes are used in PRNG's.

Can there be larger Mersenne primes than the 46th one? It is still uncertain whether there are infinitely many of them. But scientists believe there are.

I know a Mersenne prime uniquely identify a perfect number. But, apart from the applications where prime numbers are useful, are there other practical applications of Mersenne primes?

## Friday, February 13, 2009

### Talk: From Indentity to Attribute Based Cryptosystems

Today for the group meeting, I did an informative presentation on the $subject. The slides are here. It uses existing materials from several public available presentations. When I was preparing this talk, some of the related issues from applied cryptography perspective came to my mind. (It is possible some of these issues has already been addressed and I may not be aware of it. Please do let me know if you have any pointers)

If we use constant values as attributes, how do we efficiently revoke a key (i.e. ideally without affecting other users while preventing access to the revoked user)?

Can we say that IND-CA-CCA provides stronger security than IND-CCA?

There is a huge trust placed on the PKG (Private Key Generator) server. How do we increase the trust? (one possibility is to use a threshold scheme to make a collective decision)

How can we be sure that user authentication to PKG is perfect? (It gets 100% right that no one can impersonate others) The security of the whole system depends on how strong the authentication process is.

We need to send attributes in the clear. This may allow malicious parties to infer information about the message being sent. Can we come up with a scheme that hides even the attributes? (We don't want to increase the load on the recipient)

If we use constant values as attributes, how do we efficiently revoke a key (i.e. ideally without affecting other users while preventing access to the revoked user)?

Can we say that IND-CA-CCA provides stronger security than IND-CCA?

There is a huge trust placed on the PKG (Private Key Generator) server. How do we increase the trust? (one possibility is to use a threshold scheme to make a collective decision)

How can we be sure that user authentication to PKG is perfect? (It gets 100% right that no one can impersonate others) The security of the whole system depends on how strong the authentication process is.

We need to send attributes in the clear. This may allow malicious parties to infer information about the message being sent. Can we come up with a scheme that hides even the attributes? (We don't want to increase the load on the recipient)

## Thursday, February 5, 2009

### [Crypto] Elliptic Curves in Figures

Lately, I got interested in elliptic curve cryptography and have been reading materials on it. All of the cryptographic algorithms rely on the fact that computers are computationally bounded. While this assumption still holds, the bound keeps on pushing up with the advancements in technology. IMO, eventually conventional cryptographic algorithms will be replaced by elliptic curve cryptographic algorithms. I hope the following pictorial tour will give some pointers as to why it will happen and some basics of ECC.

Why does an asymmetric cryptographic system work?

As we increase the key length, the inverse operation exponentially becomes hard while the complexity of the forward operation increases only slowly. It is the hardness of the inverse operation (in conventional systems, the discrete logarithm problem: given y (which is = g^x mod p), g, and p (a large prime), find x) that makes asymmetric cryptography possible. To keep up with the ever increasing computing power, it forces to use longer key lengths.

For RSA 1024 or above. Can we do the job with smaller key while making the inverse operation hard? We need a curve which has a faster exponential growth for the inverse operation.

ECC (Elliptic Curve Cryptography) is the solution.

How is it possible? Both CC (Conventional Cryptography) and ECC use groups. The difference is how the group and its operation for ECC is defined. In CC, we use multiplicative groups of integer modulo p. In ECC, we actually use elliptic curves!

How does an EC look like? It has the form y^2 = x^3 + ax + b.

The points on the EC are the members of the group. Actually it has one extra member; it i the identity element and denoted by infinity. Addition of points is the group operation. But how exactly do we add points?

If P and Q are two on the EC, P + Q = -R where -R is the third point on the curve provided that the line passing through P and Q is not vertical. If it is vertical, there can be at most two points on the EC. Those two points have the form R and -R. For example R = (x, y) and -R = (x, -y).

What if we add a point to itself?

P + P = -R (draw a tangent, if it is not vertical, it will intersect the EC at one more point (which is R). The point of reflection is -R)

Point multiplication is what we use to build a asymmetric cryto system: kP = P + P + ... + P (k times). Even though it is difficult to visualize graphically, there are efficient algorithms to calculate this. What is the inverse here? Given the points P and Q on E, find k such that Q = kP. It is a hard problem and is the equivalent of discrete logarithm problem in ECC.

For example, here is how ElGamal public key cryptosystem works.

Fix an elliptic curve E modulo p and select a point Q of large order on E (Like fixing a large prime p and selecting a primitive root 1 < g < p in Classical ElGamal (CE for short)). (Just like g, Q is also public)

Each user has a private key. Alice selects a secret 0 < a_A < p (same as in CE).

Alice publishes her public key b_A = a_A.Q (Just like b_A = g^a_A in CE).

Now Bob wants to send a secret message M to Alice (this message is actually embedded to a point in E). Bob selects a random k such that 0 < k < p and sends Alice the following pair.

C = (kQ, kb_A + M) (Like C = (g^k, M.b_A^k) in CE)

Alice finds M from C.

(k.b_A + M) - (k.Q)a_A = (k.a_A.Q + M) - k.a_A.Q = M (Like M.b_A^k/g^k^b_A = M in CE).

If an eavesdropper to find M without knowing a_A (Alice's secret), she needs to solve the discrete logarithm problem!

References: here and Wagstaff's Cryptanalysis book

Why does an asymmetric cryptographic system work?

(Source: here)

As we increase the key length, the inverse operation exponentially becomes hard while the complexity of the forward operation increases only slowly. It is the hardness of the inverse operation (in conventional systems, the discrete logarithm problem: given y (which is = g^x mod p), g, and p (a large prime), find x) that makes asymmetric cryptography possible. To keep up with the ever increasing computing power, it forces to use longer key lengths.

For RSA 1024 or above. Can we do the job with smaller key while making the inverse operation hard? We need a curve which has a faster exponential growth for the inverse operation.

ECC (Elliptic Curve Cryptography) is the solution.

How is it possible? Both CC (Conventional Cryptography) and ECC use groups. The difference is how the group and its operation for ECC is defined. In CC, we use multiplicative groups of integer modulo p. In ECC, we actually use elliptic curves!

How does an EC look like? It has the form y^2 = x^3 + ax + b.

The points on the EC are the members of the group. Actually it has one extra member; it i the identity element and denoted by infinity. Addition of points is the group operation. But how exactly do we add points?

If P and Q are two on the EC, P + Q = -R where -R is the third point on the curve provided that the line passing through P and Q is not vertical. If it is vertical, there can be at most two points on the EC. Those two points have the form R and -R. For example R = (x, y) and -R = (x, -y).

What if we add a point to itself?

P + P = -R (draw a tangent, if it is not vertical, it will intersect the EC at one more point (which is R). The point of reflection is -R)

Point multiplication is what we use to build a asymmetric cryto system: kP = P + P + ... + P (k times). Even though it is difficult to visualize graphically, there are efficient algorithms to calculate this. What is the inverse here? Given the points P and Q on E, find k such that Q = kP. It is a hard problem and is the equivalent of discrete logarithm problem in ECC.

For example, here is how ElGamal public key cryptosystem works.

Fix an elliptic curve E modulo p and select a point Q of large order on E (Like fixing a large prime p and selecting a primitive root 1 < g < p in Classical ElGamal (CE for short)). (Just like g, Q is also public)

Each user has a private key. Alice selects a secret 0 < a_A < p (same as in CE).

Alice publishes her public key b_A = a_A.Q (Just like b_A = g^a_A in CE).

Now Bob wants to send a secret message M to Alice (this message is actually embedded to a point in E). Bob selects a random k such that 0 < k < p and sends Alice the following pair.

C = (kQ, kb_A + M) (Like C = (g^k, M.b_A^k) in CE)

Alice finds M from C.

(k.b_A + M) - (k.Q)a_A = (k.a_A.Q + M) - k.a_A.Q = M (Like M.b_A^k/g^k^b_A = M in CE).

If an eavesdropper to find M without knowing a_A (Alice's secret), she needs to solve the discrete logarithm problem!

References: here and Wagstaff's Cryptanalysis book

### Tennis..

I was a bit disappointed when the Women's final of Australian Open 2009 lasted only a little less than one hour. Serena was too good for Dinara. However, the Men's final was a classic battle between Federer and Nadal. The game was so intensely fought that the first set took as much time as the duration of the women's final!

Hardcourt was Federer's territory but his strength was not enough to earn the title this time. It was more than the title up for grab. He had the opportunity to equal Sampras's record of 14 grand slam titles. Now it is put on hold until French open. I think Federer had the upper hand in the first four sets. However, when it went down to the 5th set, Federer was kind of mentally down and gave up the set.

During the award ceremony his emotions said the story. He broke down and couldn't even speak in the first attempt.

Even Nadal seemed to be in a difficult situation and was a bit shaky about how to respond to the situation, but tried to calm down his opponent.

Now Nadal has won back to back major titles in clay, grass and hardcourt (for the first time). It was simply fascinating to see how he produced winners on the run. (During the first four sets I thought Federer was equally/more impressive than Nadal. I particularly loved the winner he produced by a back-hand shot.)

Now Federer has lost to Nadal in three major finals in a row. He needs to do something different to avoid being the runners-up again:

Nadal is kind of unstoppable now. I think both are equally talented and have their own strengths (there were a lot of "what a shot" moments during the game) - more than the ability it all boils down to the belief that one can beat the other. Only if and when Federer regroups to believe in himself, will he equal/overtake Sampras's record..

Hardcourt was Federer's territory but his strength was not enough to earn the title this time. It was more than the title up for grab. He had the opportunity to equal Sampras's record of 14 grand slam titles. Now it is put on hold until French open. I think Federer had the upper hand in the first four sets. However, when it went down to the 5th set, Federer was kind of mentally down and gave up the set.

During the award ceremony his emotions said the story. He broke down and couldn't even speak in the first attempt.

(Source: Australian Open)

Even Nadal seemed to be in a difficult situation and was a bit shaky about how to respond to the situation, but tried to calm down his opponent.

Now Nadal has won back to back major titles in clay, grass and hardcourt (for the first time). It was simply fascinating to see how he produced winners on the run. (During the first four sets I thought Federer was equally/more impressive than Nadal. I particularly loved the winner he produced by a back-hand shot.)

Now Federer has lost to Nadal in three major finals in a row. He needs to do something different to avoid being the runners-up again:

Nadal is kind of unstoppable now. I think both are equally talented and have their own strengths (there were a lot of "what a shot" moments during the game) - more than the ability it all boils down to the belief that one can beat the other. Only if and when Federer regroups to believe in himself, will he equal/overtake Sampras's record..

Subscribe to:
Posts (Atom)