Amazon: One Terabyte of public data now available to developers

Posted by Samantha Rose Hunt and Rick C. Hodgin

Chicago (IL) - Amazon.com, best known as an online retailer, is now offering more than 1 terabyte (1,000 GB) of public data to developers through its newest project, Public Data Sets on Amazon Web Services. The company announced the availability of four new data sets last night in a blog post. The data can be accessed through Amazon’s cloud computing service.





The collection includes all publicly available DNA sequences, US census data, the entire English section of Wikipedia in machine-readable format, chemistry data, data from the US Department of Transportation and more, roughly a terabyte in all.



The company claims that delivering this data to developers is necessary and "trivial", though it has yet to be revealed what developers are doing with it. The data sets could become the foundation of a gigantic library of data, but for that we’ll have to wait and see.

Several other websites offer free web services that let remote applications access their internal server data via protocols like SOAP. However, this new set of data is not part of Amazon's own server data, nor is it related to the company's core business. It is simply a new tool offered to the public.


The following is taken from Amazon.com's website:

How It Works
Select public data sets are hosted on Amazon EC2 for free as Amazon Elastic Block Store (Amazon EBS) snapshots. Amazon EC2 customers can access this data by creating their own personal Amazon EBS volumes, using the public data set snapshots as a starting point. They can then access, modify and perform computation on these volumes directly using their Amazon EC2 instances and just pay for the compute and storage resources that they use. If available, researchers can also use pre-configured Amazon Machine Images (AMIs) with tools like Inquiry by BioTeam to perform their analysis.

To get started using the Public Data Sets on AWS, simply perform these three easy steps:
1) Sign up for an Amazon EC2 account.
2) Launch an Amazon EC2 instance.
3) Create an Amazon EBS volume using the Snapshot ID listed in the catalog above for your chosen snapshot.
The ElasticFox Getting Started Guide provides a simple walkthrough of how to launch an instance and create an Amazon EBS volume using ElasticFox, a convenient Firefox plug-in. Or, see the Amazon EC2 Getting Started Guide.
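
For readers who would rather script step 3 than click through a browser plug-in, here is a minimal sketch in Python. It is not taken from Amazon's documentation. It assumes the boto3 SDK (a present-day AWS library that the article does not mention), configured AWS credentials and default region, and an instance that is already running. The snapshot ID, instance ID, availability zone and device name are hypothetical placeholders. The real snapshot ID must be copied from Amazon's catalog. The one firm constraint shown is that an EBS volume has to be created in the same availability zone as the instance it will be attached to.

```python
from dataclasses import dataclass


@dataclass
class VolumePlan:
    """Parameters for turning a public data set snapshot into an attached volume."""
    snapshot_id: str        # hypothetical placeholder; use the ID from the catalog
    availability_zone: str  # must be the zone the EC2 instance runs in
    instance_id: str        # hypothetical placeholder for your running instance
    device: str = "/dev/sdf"  # assumed device name; adjust for your instance

    def create_volume_args(self) -> dict:
        # Keyword arguments for the EC2 CreateVolume call.
        return {
            "SnapshotId": self.snapshot_id,
            "AvailabilityZone": self.availability_zone,
        }

    def attach_volume_args(self, volume_id: str) -> dict:
        # Keyword arguments for the EC2 AttachVolume call.
        return {
            "VolumeId": volume_id,
            "InstanceId": self.instance_id,
            "Device": self.device,
        }


def create_and_attach(plan: VolumePlan) -> str:
    """Create the volume, wait until it is available, then attach it.

    Needs boto3 plus AWS credentials and a default region, and incurs
    storage charges.
    """
    import boto3  # imported lazily so the planning code above runs without it

    ec2 = boto3.client("ec2")
    volume_id = ec2.create_volume(**plan.create_volume_args())["VolumeId"]
    # A new volume cannot be attached until it reaches the "available" state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(**plan.attach_volume_args(volume_id))
    return volume_id
```

Building a VolumePlan and inspecting create_volume_args() touches no AWS resources, so the parameters can be checked before anything billable happens. Calling create_and_attach(plan) performs the real requests, after which the volume still has to be mounted from inside the instance.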