A problem with k-anonymity:
Table 1The above table shows a list of patient records. The following table shows the 3-anonymous version of the above table.
Table 2Disease attribute is sensitive here. If a user knows some background information, she can infer sensitive attributes of patients. Suppose Alice knows that Bob is 25 years old and lives in ZIP 47625. She knows that Bob belongs to one of the first three records in table 2. Since all of them have the same disease, Alice can deduce that Bob has heart disease. l-diversity was introduced to overcome this problem where there are at least l distinct sensitive attributes for each equivalent class. Had the table 2 had the 2-diversity property, Alice would only be able to guess Bob's illness with only 0.5 probability. (Note: even l-diversity has issues. t-closeness was introduced to address one of such issues)
Table 4 shows the 3-diversity version of table 3. Here both salary and disease are sensitive attributes. Notice that each equivalent class has 3 distinct sensitive attributes.
Even when the sensitive attributes are different, if they are semantically similar (low salary range, some specific types of diseases, etc.), one can perform a similarity attack. For example, if one knows that Bob is one of the first three records, one can deduce that Bob's in low income range of 3K-5K and has a stomach related disease. To account for the semantically closeness of values (i.e. to prevent similarity attacks), t-closeness was introduced. (I leave it for you to read)
A problem with replacing non-disclosed values with NULL:
(a) show a customer table with attributes ID, name, age and phone number. (Y) and (N) indicate the users' content about disclosing those attribute values. For example, Linda and Mary don't mind disclosing everything, but Nick does not want the organization to disclose his age and phone number to any third party. It is common practice to replace (N) values with NULL before executing the query.
Q1: “SELECT name, phone FROM Customer”
Q2: “SELECT name, phone FROM Customer WHERE age >= 25"
Q = Q1 - Q2
Intuitively Q should return only those records with age < 25 as in (e). But to the contrary, it returns two results as in (d). Therefore, using NULL is not a sound solution. A variable based approach was introduced to solve this problem. (I leave it for you to read)
Zero-Knowledge Proof of Knowledge (ZKPK):
Discrete Logarithm Problem is hard to solve. We can use this hardness assumption to hide a secret.
(DLP: Given a generator g of a group of order p and c (which is calculated from g^x), find the value (i.e. x) that gives c)
If you can solve the problem, you can convince another party you possess a secret. How do you actually convince the party that you know the secret without revealing the secret?
Alice: I know the value x corresponding to c. (but I don't want to tell you x)
Bob: Show me.
Alice: Chooses a random y selected from Z_p and send g^y to Bob.
Bob: Send the challenge r selected from Z_p to Alice.
Alice: Computes s = y + r.x and sends s to Bob
Bob: Checks if g^s = g^y.c^r and is convinced that Alice knows x if those are equal.