Thursday, December 3, 2009

Why I don't have the DoB listed in FB?

If you noticed my FB profile, I only list the birthday (not the DoB). I am not listing the DoB, not because I don't want to reveal my age (of course, someone who is knowledgeable about the contextual information about the FB groups I am in (e.g. Saint Aloysius College - 1998 batch, UoM 1999/2000 batch, etc.) could infer my year of birth), but because of security reasons. DoB - Security??? do you see the connection?

The other day I went to my bank to withdraw some money, but I forgot to carry my bank card which has the bank account number. So I had to show/prove the possession of other credentials. After getting a photo ID of mine, the next thing the operator asked from me is my DoB. Imagine now someone faking my photo ID and having my DoB listed online, walk to the counter. What does it prevent that someone from withdrawing money from my account?

There are even greater consequences; There have been research suggesting correlation of public information to your SSN number. SSN number is confidential information; only certain service providers such as bank, rental office, insurance company, utility providers, know it; leaking it could be a disaster.

Recently, researchers from CMU have cleverly shown that one can narrow down the possible SSN numbers an individual can have using public information available. If the attacker knows your DoB and location, they can guess your SSN with high probability:

Information about an individual’s place and date of birth can be exploited to predict his or her Social Security number (SSN). Using only publicly available information, we observed a correlation between individuals’ SSNs and their birth data and found that for younger cohorts the correlation allows statistical inference of private SSNs. The inferences are made possible by the public availability of the Social Security Administration’s Death Master File and the widespread accessibility of personal information from multiple sources, such as data brokers or profiles on social networking sites. Our results highlight the unexpected privacy consequences of the complex interactions among multiple data sources in modern information economies and quantify privacy risks associated with information revelation in public forums.

SSN has the format xxx-xx-xxxx (Area Number (AN) - Group Number (GN) - Serial Number (SN)); AN and GN are location specific and anyone can get the list of numbers assigned in each state. This reduces the entropy as some level of randomness is sacrificed. You might wonder why AN and GN are predictable? Well, one reason might be that it could be helpful to identify impostors by checking based on the location of the person that she provides.

Here's a pattern that they have discovered (from the paper):

The above figure shows the distribution of AN, GN and SN number (y-axis) in Oregon in 1996 for some publicly available records. The x-axis is the 365 days which represent the DoB's of those people in the records. Notice the correlations.

The paper also mentions about the dramatic reduction of entropy of the SSN numbers. (For better security you need to have higher entropy. If you can inject more randomness, you get higher entropy and it becomes less predictable.)

Without auxiliary knowledge, the theoretical entropy of an SSN can be estimated at 30 bits (in log2).
The more granular knowledge of the assignment scheme that we have shown to be inferrable significantly decreases that entropy (for some states, down to 11 bits).

The point is that one piece of information alone may not be an interesting asset to an attacker and hence you may think it is safe. However, it is the correlation of different pieces that makes attackers life easier. With social networks and other online web sites booming, different pieces of our personal information are available through different locations. These public information could be used by attackers to engineer attacks which we may not imagine in our wildest possible dreams. (In short, the more public information available about us, the less entropy for our confidential information.)

Note: It would be much better for me from the security PoV to completely hide my birthday in FB. You might be wondering why I am not doing so; privacy is a strange thing; it is an economic good; it has trade-offs. I currently perceive that the benefits over-weighs the threats :-)
(Maybe I'll write about the privacy economy in a future blog; there has been a lot of research in this topic for quite some time.)


Eran Chinthaka said...


Nice post. But IIRC, banks need a "valid" id like a passport or driver's license. If some one can forge that then you will have many more things than your bank account to worry about.
About the social security issue, now that appears really scary. We should not forget that black hackers are capable of breaking in to federal networks also. So for them inferring SSNs, just like described in research paper will be a piece of cake. Even though the data in the paper seems promising in Oregon, I wonder whether it is true everywhere. If that is the case, who will protect Americans?

Nabeel Yoosuf said...

Agree with your point. An outside bank may ask for passport or driver's license. I went to the university bank located inside the university union. They were okay with me showing only the Purdue ID. Agree that if one can forge ID's they can do much bigger damages.

The paper has another figure for PA. The figure for PA looks more random than that for OR. The reason they mention is that the birth rate in OR is less than that in PA and also OR is less populated. So, with higher population growth, it won't look so predictable. (On a lighter note, to protect America, according to this paper produce more babies :D) The paper also mentions that there are some steps already to further randomize certain digits in SSNs.