Digital ark aims to preserve dead data formats

Posted by Emma Woollacott

A group of European institutions have created what they call a 'digital genome', designed to allow future generations to read data stored on obsolete computer systems.

Research by Planets, a project co-funded by the European Commission, suggests that as hardware and software go out of date, the EU alone is losing more than three billion euros' worth of digital information every year.

"Anyone using a relatively modern PC who has ever gone back and tried to read material stored on a floppy disc will instantly recognise the frustration of trying to access obsolete formats," says project coordinator Adam Farquhar of the British Library.

"Yet the death of the floppy disc is just the tip of the iceberg. Even if you possess the necessary hardware to access a particular storage format and the files haven’t become corrupt, without the supporting software and compatible operating systems, knowing what is on the disc, let alone reading the files in question, will be impossible."

But this week, a time capsule containing a record of the 'digital genome' was deposited inside the 'Swiss Fort Knox', a high-security digital storage facility in the Swiss Alps. The capsule contains the information and tools needed to reconstruct data long after the supporting technology has disappeared.

Inside are five formats that the group regards as endangered species: JPEG images, Java source code, .mov video files, HTML websites and PDF documents. Versions of these files are also stored in archival-standard formats (JPEG 2000, PDF/A, TIFF and MPEG-4) to extend their lifespan for as long as possible.

A further 2,500 pieces of data map the 'genetic code' needed to describe how to access these file formats, and this code has been translated into several languages to improve the chances that it can still be interpreted in the future.

Copies of all the information are stored on a range of media, from CDs, DVDs, USB sticks, Blu-ray discs, floppy discs and solid-state hard drives to audio tape, microfilm and even paper printouts.

"Unlike hieroglyphics carved in stone or ink on parchment, digital data has a shelf life of years, not millennia," says Andreas Rauber of the Vienna University of Technology.

"With technology continuing to develop at a blistering pace and the rate of data creation showing no signs of slowing down, failure to implement adequate digital preservation measures now could cost us billions in the future."