Wednesday, March 7, 2007

Transferring Terabytes of data from point A to point B

I read an interesting BBC news post. Chris DiBona, open source program manager at Google, talks about Google’s open source effort to overcome the problem of sending huge amounts of data across a network. The idea was inspired by the work on Microsoft TerraServer done by Jim Gray et al.; Gray is regarded as the father of satellite mapping on the web (I am not sure whether they have found Jim, who went missing at sea).

Here’s the abstract of “Microsoft TerraServer: a spatial data warehouse”, published in the Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.

Microsoft® TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining eight terabytes of image data from the United States Geological Survey (USGS) and SPIN-2. Internet browsers provide intuitive spatial and text interfaces to the data. Users need no special hardware, software, or knowledge to locate and browse imagery. This paper describes how terabytes of “Internet unfriendly” geo-spatial images were scrubbed and edited into hundreds of millions of “Internet friendly” image tiles and loaded into a SQL data warehouse. All meta-data and imagery are stored in the SQL database.

TerraServer demonstrates that general-purpose relational database technology can manage large scale image repositories, and shows that web browsers can be a good geo-spatial image presentation system.
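
To get a feel for what "image tiles loaded into a SQL database" might look like, here is a toy sketch of my own. It is not TerraServer's actual schema or code: the `tiles` table, its columns, the 4-pixel tile size, and the helper names are all made up for illustration, and I use Python's built-in sqlite3 module with a fake 8x8 grayscale image instead of a real SQL data warehouse.

```python
import sqlite3

TILE = 4  # tile edge in pixels; an arbitrary size chosen for this sketch


def cut_tiles(pixels, width, height, tile=TILE):
    """Yield (x, y, tile_bytes) for a grayscale image stored row-major in `pixels`."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Collect the slice of each image row that falls inside this tile.
            rows = [
                pixels[row * width + tx : row * width + min(tx + tile, width)]
                for row in range(ty, min(ty + tile, height))
            ]
            yield tx // tile, ty // tile, b"".join(rows)


def load_tiles(conn, theme, pixels, width, height):
    """Create the (hypothetical) tiles table if needed and insert every tile."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        " theme TEXT, x INTEGER, y INTEGER, data BLOB,"
        " PRIMARY KEY (theme, x, y))"
    )
    conn.executemany(
        "INSERT INTO tiles (theme, x, y, data) VALUES (?, ?, ?, ?)",
        ((theme, x, y, data) for x, y, data in cut_tiles(pixels, width, height)),
    )
    conn.commit()


def fetch_tile(conn, theme, x, y):
    """Return the tile's bytes, or None if no such tile exists."""
    row = conn.execute(
        "SELECT data FROM tiles WHERE theme = ? AND x = ? AND y = ?",
        (theme, x, y),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
image = bytes(range(64))  # a fake 8x8 grayscale image, one byte per pixel
load_tiles(conn, "aerial", image, 8, 8)
print(len(fetch_tile(conn, "aerial", 1, 0)))  # size in bytes of one 4x4 tile
```

The point of keying each tile by theme and grid position is that a browser request for one small square of the map becomes a single primary-key lookup, which is roughly the kind of work a general-purpose relational database is good at.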
