Wednesday, August 20, 2014

Data Archeology

I asked a colleague what he thought about this article and he thought that data archeology would be a better name for the painstaking work that we do.  For instance, the National Archives just sent over 11 boxes of magnetic tape, originally from the National Hurricane Center.
Remember 8-track tapes?
These are 9-track.
He keeps a tape cleaner in his office for just this purpose.
The tape readers sit across the hall in another office.  The data will be read by a computer, and then pored over by several people with MS and PhD after their names.  We figure out what the tapes contain and rate their value, uniqueness and cleanliness/preparedness.  We can then estimate how much effort it will take to make them reusable and decide if it is worthwhile to do so.

Data that gets the highest rating may even be staged on a fast disk for instant global web access.  Otherwise, it will go into a tape robot like this one at the San Diego Supercomputer Center.  Researchers who request data from tape will have to wait a few minutes to a few hours to access the data.  (If there is a great deal of interest in a particular dataset, we will find disk space for it.)
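For the programmers reading, the staging rule above can be sketched in a few lines of Python. This is only an illustration, not the archive's actual software: the `Dataset` fields, the 1-to-5 rating scale, and both thresholds are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the post does not give real numbers.
TOP_RATING = 5                 # assumed top of a 1-to-5 rating scale
HIGH_INTEREST_REQUESTS = 100   # assumed cutoff for "a great deal of interest"


@dataclass
class Dataset:
    name: str
    rating: int           # assumed scale: 1 (lowest) to 5 (highest)
    recent_requests: int  # assumed measure of researcher interest


def choose_storage_tier(ds: Dataset) -> str:
    """Return "disk" or "tape" for a dataset."""
    # Top-rated data is staged on fast disk for instant web access.
    if ds.rating >= TOP_RATING:
        return "disk"
    # Heavily requested data also gets disk space, whatever its rating.
    if ds.recent_requests >= HIGH_INTEREST_REQUESTS:
        return "disk"
    # Everything else goes to the tape robot (minutes to hours to retrieve).
    return "tape"
```

With these made-up numbers, a dataset rated 3 with 250 recent requests would land on disk, while the same dataset with 4 requests would stay on tape.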

As a data archive, we preserve everything for future generations by converting the data to modern storage media.  If the data is unique or might have current or future value, we will spend many, many hours preparing it for research by bringing the data and metadata up to modern standards.

This part is labor intensive and also requires people with expert-level knowledge.  That's why everyone in our department has a graduate degree in atmospheric science, oceanography, statistics and/or information science.  It is hard to imagine private industry putting this much effort into data without a sure commercial payoff*.  Some things need to be done by the government, or else they won't be done at all.

The sky when I left work tonight.

* Private industry does use this archive, and we don't charge them for it. This type of data work is infrastructure and paid for by taxes levied on a broad population.

I am speaking only for myself, and not for my department or our larger organization.

2 comments:

  1. that's great.. I haven't touched a data tape since 1988.. at the university, we used to get tapes from the provincial school systems with exam results, had to process them asap in order to get scholarship offers out to the best students before the other universities did - apparently not a few students would just take the first offer they got.

    This was interesting, on 'modern storage media' -
    http://spectrum.ieee.org/consumer-electronics/standards/will-todays-digital-movies-exist-in-100-years

    Preserving data can only be done with constant vigilance and continuing effort. Thank you and the government for yours ;-)

    Replies
    1. Thanks for the link.
      And you are welcome. This is the perfect job for someone who is deeply interested in data and history--namely me!
