Lessons on Data Preservation From the Audio Industry
Friday, March 17 2006 @ 01:22 AM EST

One of Groklaw's resources is people. Our readers work in a broad variety of professions. I got an email from a guy who is an audio specialist, and when he read about the ODF story, about the need for long-term storage of documents, it immediately resonated with him, because part of his job for many years has been preserving audio. Here's a bit of that email:
In the past I have seen what happens as audio formats come and go, and how difficult it can sometimes be to get audio from older formats. The lesson that has been learned for both audio and video is that if you want to be able to get your material back in the future you have to be able to maintain a viable playback system.

Many archives have stocks of older obsolete equipment (particularly for video formats) that are cannibalized to keep their main playback machines going. Digitization has not really been an option for much of this material as it is still not possible to fully digitize absolutely everything that is captured on audio tape or on film. It is getting closer though.

I see the same problem with the documents that get saved with the audio or video. How many of these were saved on old 5 1/4 inch floppy discs and filed with the audio tapes? It is pretty hard to find one of those in a PC nowadays.

The same could be said for software. You have some stuff in an old version of a word processor from the old Commodore computer. Not only can you not read the 5 1/4 inch disc, but even if you could you cannot read the proprietary binary format file with the information you want.

And that has only happened in my working life (and I still have a good 20 years to go).

The risk in storing information in undocumented binary formats is that there really is no guarantee that a) the hardware will still be around; b) the software can be found -- a) because hardly anyone keeps old hardware, and b) because many licences do not allow you to run old versions of the software you have upgraded. On that point, I recall watching boxes of old 'obsolete' software being dumped some years ago.

So you can see that from my perspective as an audio specialist, storing things in formats where the hardware or software to access the material may not be available or may be uneconomic to maintain is a risky proposition.

I asked if he'd be willing to elaborate for Groklaw, and he graciously agreed.

************************************

Lessons Learned From the Audio Industry

~ by The Sound Man

My world is audio, and I have worked in it for 25 years. The audio industry's experience with audio formats may, I believe, serve as an object lesson for people who are undecided about privately documented storage formats.

From the invention of the phonograph in 1877 the audio industry has had publicly documented standards. As a result it has always been possible to play early domestic 78s on any model of gramophone since (although not all companies used quite the same speed or same groove pitch). Modern disc lathes use the same physical principles to create LPs and 45s. You can play virtually any 78, 45 or 33rpm disc on just about any turntable that has the correct speed settings and the right stylus for the size of the groove. Finding a turntable might be a problem though -- I have 78s that I inherited from my father which I cannot play, but it is easy to find information on how to play these discs.

The evolution of professional formats has been similar. It is possible to take early tape recordings (made in the 40s and 50s) and play them back on today's equipment (with some electrical adjustments) and hear pretty much what was recorded back then. Reel to Reel tape recorders are getting harder to buy too, and I think only one or two companies make them now. They all used the same standards for speed and tape width.

As a result of easy access to past formats the CD market has been flooded with compilations of classic material recorded on both 78 and early tape.

The market for professional recorders is now full of computer based solutions, almost all of them closed source applications. The first system I used was ProTools on the Macintosh IIfx (just like the one Douglas Adams owned). The inputs to the system were via balanced audio or AES (Audio Engineering Society) digital, both publicly documented AES standards. The file format used for audio was AIFF, another publicly documented standard.

Today I use SADiE on a PC running Windows. It uses the same AES standards for input and output as well as MADI, another public standard. It uses WAVE format for files on the PC platform, but it can read and write AIFF (even off Mac discs mounted on a PC), and some private formats too, all in the name of allowing people to exchange audio and use their tool of choice.

Even the most simple free software editors use these same formats. Open Source audio systems use the same standards. A person with a home studio can send me a WAV or AIFF file made on any system they choose for mastering on our SADiE system. It reads both formats. I can send either format to the record company rep (using a Mac) for approval. The same software is not needed.
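To make the point about publicly documented formats concrete, here is a minimal sketch in Python, assuming nothing beyond the standard library's wave module. The function names write_tone and describe, the 440 Hz tone and the 441-frame length are illustrative choices only, not anything taken from SADiE or ProTools. It writes a short mono 16-bit PCM file and then reads the header back, which is roughly what any editor has to do before it can open someone else's WAV.

```python
import io
import math
import struct
import wave


def write_tone(buffer, n_frames=441, rate=44100, freq=440.0):
    """Write a mono 16-bit PCM sine tone into a file-like object as WAV."""
    frames = b"".join(
        # Each sample is a little-endian signed 16-bit integer at half scale.
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)      # mono
        out.setsampwidth(2)      # 2 bytes = 16 bits per sample
        out.setframerate(rate)
        out.writeframes(frames)
    return n_frames


def describe(buffer):
    """Read the WAV header back: (channels, bytes per sample, rate, frames)."""
    buffer.seek(0)
    with wave.open(buffer, "rb") as src:
        return (src.getnchannels(), src.getsampwidth(),
                src.getframerate(), src.getnframes())


if __name__ == "__main__":
    buf = io.BytesIO()
    write_tone(buf)
    print(describe(buf))
```

The writer and the reader share no code beyond the format itself, so the reader could just as well be a different program on a different platform.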

There are now portable recorders based on computer technology. I can buy a compact flash card from anyone, record onto it in any recorder that uses that type of media, plug the card into any computer with a card reader, and import the audio files into virtually any audio application you care to name.

It is interesting that standards were often not kept a secret - they were submitted to standards bodies for everyone to benefit from.

Different companies have even agreed on standards to exchange edit decision list information. This allows you to complete work on one system, and import both the audio and the bulk of your editing work onto another.

When There Is No Happy Ending

Not all professional audio stories have a happy ending though.

DAT (Digital Audio Tape) was widely adopted by the professional community in the early 1990s, after failing as a domestic format (probably due to the Audio Home Recording Act of 1992).

DAT machines were very cheap compared with the professional machines in use at the time, and companies like Sony and Tascam turned their efforts to making 'professional' DAT units. Many of the original proprietary (one format per vendor) recording systems fell into disuse and were dropped, both by manufacturers and studios.

There were standards for the recording format and the tape, and it was possible (unless someone's machine was out of alignment) to interchange tapes and machines regardless of who made them.

Here begins the strife. Sony announced in November 2005 that they would stop shipping DAT machines at the end of that year. DAT is now obsolete. Cheaper and better ways had been found to do the job, ways that improve the flow of material through the use of open standards. Unlike records, where the format was around for a very long time and was adopted by all, the DAT was only adopted by a small section of the community, and now you cannot buy new machines.

In order to play back DATs from our archives, we purchased a few of the last machines. We have to maintain these as long as we still have DATs to transfer to some other format. It might be hard in the future to make a DAT machine, but at least there is documentation so it could be done. So here we have a recent, openly documented standard that is now defunct, and in a short time many folks will no longer be able to access the content on those tapes because they no longer possess the hardware to play them or have the ability to make the hardware to play them.

Strangely, DAT has another life for data backup, using the ISO DDS public standard.

I should note that most DAT audio players can output a signal that complies with an AES standard, and this can be plugged into most digital mixers or recorded back into virtually any piece of gear today, whether it be a high end professional (proprietary) hard disk recorder, or a Pentium 4 in someone's bedroom with a semi-pro sound card.

In my world standards have allowed professionals to be creative using the tools they choose, to share material, and to pass work from person to person without much fuss. Manufacturers have worked together to allow this. The audio community has demanded it.

No one wants to be incompatible in our industry. Investments in existing equipment can run into millions of dollars. Who would dare use a non standard format? Such a machine would be of no use in our studio. It would not work with the equipment we already have.

I should note that there are exceptions where the product is in a small niche, or the functionality provided is unique. Sometimes competitors adopt the standard set by the first to market. These standards are usually made available to all. If the industry finds the new standards useful then everyone wins. One good example is the ADAT lightpipe optical format for transferring multiple channels of digital audio. Developed by Alesis, it turned out to be such a useful format that many other manufacturers used it as a secondary or even primary format on their hardware.

I can safely say that many audio professionals would rather own a system that supported publicly documented standards. Companies can come and go, software applications can change with the passage of time and fashion. But the work created on these systems must remain. It is a creative legacy for the future. The future livelihood of thousands of companies depends on continued easy access to material produced today and in the past. As mentioned already, there is a thriving market for re-releases of old recordings.

Thinking back to the DAT experience, I have audio files that were recorded 10 years ago on a Macintosh. They have been moved between systems and storage formats many times. Over the years I have been able to open the audio files in many different applications, some of which are now defunct. I can copy the files as many times as I want and send them to whoever I want.

What about the software we use to store information? These days most audio ends up on a hard disc or data tape of some sort. I like to store any documentation with the audio. I used to print it out and put it in the tape box. Anyone can access a printed page (I was reminded of this after seeing in London an exhibition of fragments of the Bible dating from the first few centuries after Christ's death. Anyone can still read them.)

If I use DLT I know I can get the audio back for two generations of new hardware. I can decide to use this format or choose another. The manufacturer tells me in advance what I need to know.

What About Our Documents?

But what about my documents? The word processor I use saves files in a binary format which is not publicly documented. It was not always the dominant software in this class (remember WordPerfect?). It may not be in the future. What will be the future, like the DAT example above, of files saved in these formats when they become obsolete? Someone could build a DAT player in 10 years' time given time and money -- they just follow the standard, but what about an obsolete, privately documented file format?

I have on my hard disk archived files created in Lotus 123. I can open them today, but for how much longer? I could convert them of course, but what if they were filed away on a 5¼ inch floppy in our basement, long forgotten, and then suddenly were needed?

I've said that it can be impractical to recover audio from obsolete formats, even with public documentation of how to do so. In the future it may be impractical to recover documents as well. It is just a matter of time. Imagine if all the literature of antiquity was encrypted in some way.

Privately documented standards put my creative work at risk. They could stop me retrieving work from the past, they limit my creativity now, they limit my freedom to work or collaborate with others (or make it harder), and they put at risk the recovery of my work in the future. They lock me in to software and hardware that may not be appropriate next week, let alone in five years.

As well as helping industry, standards also benefit the market because they provide security to the end consumer. A good example of consumer security is the Compact Disc.

Red Book audio CDs all use the same format (developed by Philips and Sony). Any disc that meets the Red Book standard will play in any player that also meets the standard. I can buy a disc in London that was made in Europe, and play it back in my home country on a player made in China. It just works. Some copy protected discs will not play in some new gear. They do not meet the standard. I avoid these if I can.

As an aside, current copy protection stops people ripping CDs at faster than real time using a computer. However the protection schemes only address today's technology -- they do not stop people playing CDs in a normal player and re-recording them in real time to another format, which is just how it used to be done in the days of LPs and cassettes.

Every DVD player on the planet (as far as I am aware) will play Red Book compact discs, and so will nearly every PC with a CD-ROM drive. My Sony DVD player bought last year will play a CD I bought in 1985 (Dire Straits' Brothers in Arms). My new laptop's DVD writer will play it too.

I, as a consumer, can buy an audio CD secure in the knowledge that it won't be obsolete in the near future, and that my investment is protected.

In my professional world, open standards ensure that I am free to provide creative services to my clients, and collaborate with anyone I choose.

What are you free to do in your world? Today? Tomorrow? For how long?

Please think about it.


Note: Links to commercial sites in this article and references to specific products and manufacturers are for illustrative purposes only, and are not an indication of endorsement of these products or sites.

Copyright (c) 2006 Groklaw/The Sound Man. All rights reserved.


  


Lessons on Data Preservation From the Audio Industry | 184 comments
Corrections Here!
Authored by: Anonymous on Friday, March 17 2006 @ 01:31 AM EST
Not that there are ever very many.


Corrections here please.
Authored by: Fractalman on Friday, March 17 2006 @ 01:37 AM EST
So PJ can find them. :)


Off topic here please.
Authored by: Fractalman on Friday, March 17 2006 @ 01:39 AM EST
And remember to make the links clickable using html format setting. Thank you.


Lessons on Data Preservation From the Audio Industry
Authored by: Anonymous on Friday, March 17 2006 @ 03:10 AM EST
When storing in an undocumented binary format, one doesn't just need the hardware to read the disks, but also the hardware that can execute the software (if available) that can read the binary format.

Although all of today's PCs are still compatible with the old 8088-based IBM compatibles, this may well change sometime.


WordPerfect ...
Authored by: Wol on Friday, March 17 2006 @ 03:12 AM EST
Switch off if you don't want a plug ...

I've just bought WP Home Office 12 - which uses the SAME DOCUMENTED format that
WP6 used back in 1994.

And - something I didn't know until relatively recently - guess who was
responsible for the change in file formats between v5 and v6? Yes, you guessed
it, MICROSOFT! I don't understand it, but it seems like Win95 was designed in
such a way as to deliberately be incompatible with the v5 file format (I said I
don't understand it ...)

So, if it weren't for our favourite whipping boy, it seems likely that WP would
still be on the same file format it first introduced maybe 20 years ago ...

Cheers,
Wol


As long as it's only proprietary...
Authored by: Anonymous on Friday, March 17 2006 @ 03:16 AM EST
...an old Lotus123 file poses not that much of a problem.

I might be able to find an application which still reads
Lotus123. Worst case, it would take the work to write a
converter on my own.

Trouble sets in with all things DRM. It is not only
proprietary, it is *designed* to protect its contents
against you (the customer). To allow access only under
certain restrictive conditions.

All DRM music you buy can very well be totally useless
in 5 years. For example, because the music provider goes
out of business. And upgrades for your DRM-hardened music
player are no longer available. As soon as it doesn't run on
your new hardware, all of your files are toast.

Therefore I will never accept to buy DRM-encoded stuff.
MP3 "plays for sure" without having a sticker on it saying
so.


Isn't this what '1984' was all about
Authored by: Winter on Friday, March 17 2006 @ 03:55 AM EST

George Orwell's book was about controlling thinking by controlling what you can and cannot communicate.

Changing history by destroying (rewriting) history was a large part of that.

And in real life. Remember the famous photograph of Stalin and his comrades. (link)

The aim of the current drive for DRM too is destroying history. Not so much for political aims, but for making "content" scarce. There was a side-bar link on GL about the curious difference between Fashion and Music/Film. The fashion industry survived by exuberance. The more designs, the better. The music and film industry try to survive on artificial scarcity.

Music is not scarce. Idols has shown to those who didn't know it already that there are thousands of musicians that are willing to perform. The Music industry scouts and contracts as many of them as possible. And then uses the contracts to forbid them to perform. Only a few willing artists are allowed to produce records and be broadcast.

This strategy still leaves a "history" loophole out of scarcity. People can still play old music instead of buying new tracks. DRM closes that loophole by denying people the right to play old music.

I know I am paranoid. But am I paranoid enough?

---
Revenge, Justice, Security, and Revenge, chose any two.


Data Preservation - the costs
Authored by: The Mad Hatter r on Friday, March 17 2006 @ 04:31 AM EST


As has already been pointed out, binary file formats can be decoded, in some
cases quite easily. However it takes time, and money, and equipment in the case
of obsolete storage technologies. DRM technologies can be reverse engineered
too.

And then there's the legal issues. I don't live in the States so the DMCA
doesn't apply to things I do at home. However by engaging in reverse engineering
a file format that has DRM (and note that I may not even know it was DRM'd) I
could find myself in legal trouble if I ever wanted to visit the States.

Legislation got us into this - we should use legislation to get us out. What if
every "DRM'd" file was required to be stored in the Library of
Congress in an open format without DRM? The file would be held in bond, for a
number of years to be determined by the owning company's willingness to pay.

We want to set the payment high enough that keeping a non-performing product
protected would not be economical, but low enough so that it would allow even
garage bands to take part.

Assume that the archive copy does not have to be filed until 1 year after
release of the product (but immediately in case of bankruptcy or Chapter 11
filing by the filer). This is enough time for the filer to determine whether to
buy 5, 10, or 20 year lock in. Extensions could also be purchased. During that
period the non-drm copy would be in escrow (and note that one of the purposes of
the payments is to cover translation into future formats, and storage on better
(longer lasting) media as they become available).

Note that this does NOT affect copyright law. It only assures that the Library of
Congress (which according to my all too fallible memory retains EVERYTHING) has
an unencumbered copy of the work.

If something like this isn't done, we stand a chance of losing large amounts of
cultural heritage. Some of that heritage may not be spectacularly wonderful.
Some of it may be fantastic. The future will have to be the judge of that.


---
Wayne

http://urbanterrorist.blogspot.com/


Lessons on Data Preservation From the Audio Industry
Authored by: PeterBlue on Friday, March 17 2006 @ 04:40 AM EST
We are starting a web site that will allow people to upload / download their audio files (that THEY hold the copyrights to !!) and I'm wondering which format is best. I guess it would be better to use an open format to store audio files. The candidates I have in mind are:
  • MP3: OK for general stuff
  • Ogg Vorbis: I haven't used this much
  • FLAC: Free Lossless Audio Codec
What are the benefits of each format? Does anyone know?


BBC Micro Domesday Project - nearly unusable after 20 years
Authored by: TiddlyPom on Friday, March 17 2006 @ 05:02 AM EST
The BBC Micro Domesday Project is a good example of why not to use proprietary software or especially hardware. This was a project involving schools all over the United Kingdom to 'preserve' a snapshot of what life was like in 1986 (which was the 900th anniversary of the original Domesday Book - a census done in the reign of William the Conqueror in 1086).

Of course the original Domesday Book is available for everybody to view thanks to the above website whereas the more recent version is very difficult to view (requires BBC Micro emulation software and appropriate archaic software licences).

What should be done (long term) of course is to convert this archive into well documented public domain formats such as ODF, PDF, OGG (audio) and OGG or Dirac (video) so that we have a fighting chance of being able to read it in the dim and distant future.

The data should also (of course) be in the public domain as it was provided by the British public!

The most important thing of all is that there is no DRM on this data or associated software. How short sighted the movie and audio industries are :(

---
"There is no spoon?"
"Then you will see that it is not the spoon that bends, it is only yourself."


A little OT: reading old vinyl discs
Authored by: Anonymous on Friday, March 17 2006 @ 05:33 AM EST
1. Scan them with a high resolution scanner

2. Run image through some "groove edge" detection program

3. The edges of the groove can be transformed into sound by software.

Problem solved.

And yes, I know it sounds easier than it actually is.

Leif Nielsen


Emulators.
Authored by: Anonymous on Friday, March 17 2006 @ 05:53 AM EST
I'm pretty happy with emulation.

In the case of open source apps etc, pretty much everything can be made portable
(not only did I run many gpl apps when I had windows, but now I'm running 99% of
those apps in 64-bit under ubuntu dapper).

For things that are proprietary, especially with someone sitting on the
"intellectual property", it will likely never be opened.

The options are reverse engineering, or running it via an emulator.

Many of my favorite DOS games (the oldest being from 1982, and guess what... I
think Vivendi is sitting on the rights...whoever has what Sierra Online once
had...).

Anyway, without it, many of these games would be doomed to disappearing... As for
the media... Well, let's just say I'm glad some people took the time to put
online... I only copied a few of my games from 5 1/4 to 3.5 floppies...

I guess a picture is worth a thousand words...

Here is a little screencap, to illustrate how the software lives on ;)

http://img211.imageshack.us/my.php?image=screenshotkq5zr.png

Works as I remember, besides the PC speaker being emulated via real speakers...


Rosetta Stone- any similarities?
Authored by: Anonymous on Friday, March 17 2006 @ 07:59 AM EST
We (well, scholars anyway) can only read ancient Egyptian hieroglyphs because of
the existence of the stone and the passages being written in Greek too. How
many ancient languages died completely because of the absence of such a key?

I can see any other similarities to the current debates from history where power
has been maintained by an elite by restricting access to learning
(reading/writing).
----------------------------------------------------
All we learn from history is that no-one learns from history and humans repeat
the same mistakes. Don't know who said it first, and I've probably mis-quoted.


Lessons on Data Preservation
Authored by: elronxenu on Friday, March 17 2006 @ 08:00 AM EST
Very well written article, thank you!

Although I do think that the first step in preserving analogue audio should be to digitise it. Once digitised you can prevent generational degradation by using a lossless audio format for your master.
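One way to check the "no generational degradation" claim for digital copies is to compare checksums across generations. The sketch below is only an illustration under simple assumptions: the names sha256_hex and copy_generations are made up for this example, the "master" is just a block of bytes standing in for a lossless audio file, and SHA-256 is one reasonable choice of checksum among several.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def copy_generations(master, generations):
    """Copy the data repeatedly, each copy made from the previous copy.

    Returns the digest of the master followed by the digest of every
    generation, so the caller can confirm nothing drifted along the way.
    """
    current = master
    digests = [sha256_hex(current)]
    for _ in range(generations):
        current = bytes(bytearray(current))  # a fresh byte-for-byte copy
        digests.append(sha256_hex(current))
    return digests


if __name__ == "__main__":
    master = bytes(range(256)) * 4           # stand-in for a lossless master
    digests = copy_generations(master, 5)
    print("all generations identical:", len(set(digests)) == 1)
```

With analogue tape every generation adds noise. Here a single differing digest would show that a copy had been damaged, which is why it is worth keeping the checksum alongside the master.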

I waited too many years before preserving my old TRS-80 source code from the late 1980s. Over the years I had used 5.25" TRS-80 diskettes, then 5.25" MINIX (and DOS) diskettes and quarter-inch tape (as well as large numbers of 3.5" diskettes) before CDR became widespread. By the time I got around to thinking about recovering all that 1980s source code I had gone through essentially three new generations of hardware.

I got it done in the end, well 98% of it anyway, but I spent hundreds of hours working on it. The main breakthrough was that I kept some authentic TRS-80 disk drives and I was able to plug them into my Linux computer and read them using the XTRS TRS-80 emulator. From there I could transfer them to Linux. Now I keep them on spinning storage - RAID 1 - and I released them to the world on my website www.nick-andrew.net and I also sent them to Ira Goldklang who runs the wonderful resource www.trs-80.com.

So I have saved my oldest code, and I was able to extract all the files from my MINIX and DOS diskettes very easily by comparison, but I still have quite a number of 3.5" diskettes which have not been copied to hard disk.


This issue plagues electronic design
Authored by: Anonymous on Friday, March 17 2006 @ 09:03 AM EST
In electronic design (not necessarily chip design, although that has its own
issues), there are four major processes (in terms of the actual design - there
are more for a complete system)

Note - EDA = Electronic Design Automation

1. Definition. This uses, of course, a good old fashioned word processor, so the
same issues as others face are here.

2. Schematic capture. Although there are open alternatives out there (KiCad
comes to mind), they are not yet robust enough for dense challenging boards. Now
we get into a real mess. The various competing schematic capture players all
have their own proprietary formats.
This becomes a real issue for old designs. Indeed, I have floppies with old
designs on them along with the executables I used to create them, because
nothing now can read those older designs (and they aren't that old - the oldest
is 17 years old in electronic format). Up to the 80s, most design was done by
hand drawings - oh for a scanning tool with device recognition to scan those old
designs in.
So we are stuck with old electronic format designs we can't print (unless one
has the original executable and sometimes dongle) and newer ones that I can not
migrate to newer/better packages.
If you think standard document lockin is bad, look at EDA.

3. PCB layout
Again, we have a number of players with proprietary and non-transferable
formats. Older designs were laid out by hand on Mylar with black tape of varying
widths - I would love to be able to scan those in too.

4. Board fabrication. Here we are in much better shape. Everybody that
fabricates PCBs uses 'gerbers' (RS-274X format), which is public. Why? Because
board fabricators insisted on a standard so they could subcontract jobs between
themselves when loading is high (quite a common occurrence).

There are working groups trying to specify a reasonable set of document formats
for layout and schematics, but there's no real agreement yet, and designers are
loath to change their tools, for very good reasons.

Design tools such as schematic capture and layout have *very high* learning
curves, simply because of the features available, which helps EDA vendors lock
existing customers into their tools.

Although some things are not too bad (there are standard netlist formats for
example), the specifics of a rendered circuit board in a tool that can update it
are most commonly proprietary. When the board in question may have literally
thousands of tracks and tens of thousands of nodes with tight constraints, it is
not feasible to redo unless you have a couple of months to spare.
(The densest board I did had only connectors, but over 5000 *extremely* high
speed nets (5Gb/s), 28 layers and a host of other connectivity. Don't try that
without high end tools).

I do not object to paying for high end tools, but I do object to my data being
held hostage for lockin. Of course, the original vendors simply used a data
format that worked for them, but now the data format is seen as a way to
maintain sales (well, duh).

So when we get a usable data format and high end tools support it, we'll be able
to migrate across tools *to the best tool for the job* - there is no perfect
tool in this area (as is true everywhere, really) so it would be nice. It would
also ease data extraction for things such as SPICE simulations (signal
analysis).

If we had the equivalent of ODF, then the gorillas of the market would feel
devastated, but the users would be happier.

PeteS
[not logged in]


Data/information storage.
Authored by: rfrazier on Friday, March 17 2006 @ 09:47 AM EST

Scholars, especially ones who have had dealings with ancient texts, have long worried about archiving and maintaining data and information. I never used Wordstar, Wordperfect, MSWord, or any word processing program, for this very reason. (Not that I'm all that confident that folks in 100 years time will want to read anything I've written.)

I moved from vi and roff to emacs and LaTeX in the late 80s (switching back to vi(m) in the late 90s), and have used LaTeX ever since. My dissertation written using LaTeX, submitted in 1990, formats now exactly as it did then.

If you want to see how some scholars are dealing with this problem, take a look at TEI.

Best wishes,
Bob


All rights reserved?
Authored by: Anonymous on Friday, March 17 2006 @ 10:24 AM EST
That's an excellent essay. I notice it ends with a copyright notice and the phrase "All rights reserved." It's not that I covet the content, but this story needs to get out as widely as possible. One of the CC share-alike licenses would have been nice.

Lessons on Data Preservation From the Audio Industry
Authored by: erikm on Friday, March 17 2006 @ 11:06 AM EST

Interesting story and oh so true. I work for a data recovery company and we see these kinds of things every day:

A customer brings in a couple of old backup tapes and wants them converted to CDROM. If they're lucky it's QIC, DDS, DLT or Exabyte tape. Those drives are still supported with modern hardware and operating systems, all thanks to the open SCSI standard for interfacing tape devices. Next is the file format used on the tape. Proprietary formats take some time to reverse engineer (a couple of days to several weeks), but if the data is compressed using a proprietary compression scheme or (worse) encrypted, the chances of successful recovery get very low. Again, for best results, the customer should have used open standards like tar, cpio, or dump. We have successfully recovered data from 20 year old QIC tapes written with Unix tar (minus errors due to worn out tapes), but failed with a younger system that used both proprietary hardware and a proprietary tape format.
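
To illustrate why an open format like tar matters here, the following is a minimal sketch, not anything taken from the poster's own tooling. It assumes the tape contents have already been read into memory as raw bytes; the helper names `list_tar_members` and `make_sample_tar` and the file name `notes.txt` are made up for the example. It uses only Python's standard `tarfile` module.

```python
import io
import tarfile


def list_tar_members(data: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for every regular file in a tar archive held in memory."""
    # mode "r:*" lets tarfile detect plain, gzip, bzip2 or xz archives on its own.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


def make_sample_tar() -> bytes:
    """Build a tiny tar archive in memory (a stand-in for an image read off a tape)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"hello from 1986\n"
        info = tarfile.TarInfo(name="notes.txt")  # hypothetical file name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    for name, size in list_tar_members(make_sample_tar()):
        print(f"{name}: {size} bytes")
```

Because the format is documented, a stock library can list and extract the contents without any reverse engineering, which is the point being made about tar, cpio and dump.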

The same holds for file formats: we can try to reverse engineer them, but with proprietary formats we only have limited success. Recovering a mail server using the de facto standard mbox or maildir format is relatively easy, but recovering a mail server that uses some kind of proprietary format for storing messages gets quite hairy.
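
As a rough illustration of how little is needed to read an mbox file, here is a small sketch using Python's standard `mailbox` module. The sample messages, addresses and the helper names `write_sample` and `list_subjects` are invented for the example; a real recovery job would of course start from the damaged server's own files.

```python
import mailbox

# Two made-up messages in mbox layout: each one starts with a "From " line.
SAMPLE_MBOX = (
    "From alice@example.org Fri Mar 17 11:06:00 2006\n"
    "From: alice@example.org\n"
    "Subject: first message\n"
    "\n"
    "Body of the first message.\n"
    "\n"
    "From bob@example.org Fri Mar 17 11:07:00 2006\n"
    "From: bob@example.org\n"
    "Subject: second message\n"
    "\n"
    "Body of the second message.\n"
)


def write_sample(path: str) -> None:
    """Write the sample mailbox to disk so it can be opened like a recovered file."""
    with open(path, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)


def list_subjects(path: str) -> list[str]:
    """Return the Subject header of every message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()
```

Calling `write_sample("recovered.mbox")` followed by `list_subjects("recovered.mbox")` walks the mailbox message by message. Nothing comparable exists off the shelf for an undocumented proprietary message store.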

I'm also a photographer. One of the main reasons I still use film (mostly slide film, but also negatives) is that I know I can still view the pictures I take 40 years from now (one of the other reasons is that I love Fuji Velvia ;-) ). My dad has quite an extensive slide archive and the pictures he took during his trip to Norway in 1965 still look very good. I don't think I will be able to view the pictures I take today with a digital camera in the year 2047. Not only will the media (cdrom, compact flash, IDE drives) we know today be obsolete, but it is also questionable if those media can hold the information for such a long time.

If you're using a digital camera, be sure that you store the images in an open format so you are future proof. JPEG is just fine, but due to the lossy compression you lose image information. RAW images contain the image data as recorded by the image sensor, but unfortunately every camera has its own proprietary RAW format. Luckily photographers also realise the drawbacks of proprietary formats and started the OpenRAW Working Group to motivate camera makers to openly document their individual RAW formats. Join them if you care about your photos.

Erik

PS: If you care about your photos make sure you check your media every two years and copy all of your pictures to a new CDROM (or DVD or whatever you use). The long term quality of backup media is still unknown and in this way you make sure you can enjoy your pictures in the future. This continuous copying of photos makes digital photography more expensive than camera manufacturers want you to believe. It's an often overlooked cost of digital photography.
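
One way to make that periodic check less of a chore is to keep a checksum manifest alongside the pictures and compare it after each copy. The sketch below is only one possible approach, assuming the archive is an ordinary directory tree; the function names `sha256_of`, `build_manifest` and `find_problems` are made up for the example, and SHA-256 is just a convenient choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_problems(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return the files from the old manifest that are missing or changed in the new one."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

Build a manifest when the disc is first written, build another from the fresh copy two years later, and pass both to `find_problems`. An empty list means every file came across bit for bit.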

Standards aside...
Authored by: philc on Friday, March 17 2006 @ 11:46 AM EST
... are there not copyright issues involved? The author mentioned preserving
copyrighted creative work by copying from one medium to another. There is much
discussion on the legality of making a backup copy. How does this fit in with
moving a copyrighted media collection to a new format? Seems like there is a
major push to make this illegal.

Standards are the way to go in virtually everything that large groups of people
do and use. We humans, by nature, come up with "standard" ways of
living. Standards evolve out of how we move forward as a society.

Monopolies (especially Microsoft) are again blocking normal evolution of
standards. Remind me again why society feels the necessity to establish and
protect monopolies through the patent law? And also permit monopolies such as
Microsoft to exist?

I would love to see a law that would force all standards to be open and
available to the public for free forever. I would also like to see a law that
requires everything that can be stored in multiple formats to be stored in
standard formats. Even though some individuals benefit from proprietary formats,
society as a whole does not. Society should protect itself over sanctioning the
activity of a few individuals.

DIVX and Resumes
Authored by: lordshipmayhem on Friday, March 17 2006 @ 12:31 PM EST
I am reminded of two incidents:

1) A friend of mine needed to have his resume reprinted. He was no computer
genius - a fact he was quite well aware of - so he asked me if I could help.

It turned out the resume was stored by his (then) wife on an 8-inch diskette and
had been done on a Wang word processor. My largest floppy drive was five and a
quarter.

He had to recreate his 15-year-old resume completely.

2) DIVX - anybody remember this proprietary videodisk format, a competitor to
early DVDs? Died a painful death, despite Circuit City's attempts to get us
all to use it. A few months back I recall hearing that the computer that the
DIVX player had to "phone home" to was finally shut down. Without
this computer connection, DIVX players will not play any DIVX disks. As a
result all DIVX disks are now nice, shiny coasters, ideal for resting your
coffee cup on while you surf the net. Any movies on them alone would be lost to
history until someone hacked a solution to the problem. (Fortunately the titles
on DIVX (a) were mostly forgettable and (b) are mostly available on other
formats.)

The Social Issues
Authored by: Anonymous on Friday, March 17 2006 @ 12:32 PM EST
First, thanks for the article, it is clear and well written.

I see two sound reasons for interoperable standards in the article: published
and open standards, and new equipment must be compatible with existing equipment
in the studio. There is also an expectation by the user to just replace single
parts, not the entire equipment suite.

In the computer document arena, users do not have an articulated expectation of
being able to use old documents. That is being cured, slowly, by this
discussion and ODF in general.

The real problem is that software manufacturers have convinced us that the
entire suite must be replaced, at the same time! Without the expectation that
old/open formats need to be respected, and without the need to make parts
compatible with existing applications, there is no customer driven need to
maintain any sort of compatibility.

To fix the problem, we can pound the backwards compatibility table, but that has
limited traction. Professions, like law, already eschew the dominant player's
product, for that very reason. We might start demanding that applications be
upgraded piecemeal. That would open a market for drop in components because
interoperability would then become crucial, and standard.

Can the customer base influence a monopolist to change? Can we get a critical
mass of customers to demand better service? Are we forever trapped by
"Good Enough"?

-- Alma

A lesson in software
Authored by: Anonymous on Friday, March 17 2006 @ 12:44 PM EST
I'm a computer programmer from the 70s, I worked on computers called PDP and
Vax, and even Sentry70, Prime, and Cyber.

I have printouts of programs and things I wrote back in the 70s, they are all
readable now.

Some of the stuff was on magnetic tape. All of that information is lost. There
isn't anyone in the world that can still read it.

Thankfully I did transfer some of it to a computer called the commodore 64 at
one time, and from there to the PC, so generation after generation it got
preserved.

But so what? Nothing can read the code, or use it.
There's no way to translate it or convert it.

There's no computer in the world today that can understand those programming
languages in any form.

So what do I do now?

The language is called "Fortran IV" by the way, and it is totally
incompatible with Fortran/77 or "f2c" gnu-fortran or any modern
system.

Even semi-modern vax systems cannot understand the code, because they all use
the same fortran/77 as modern PCs.

I even have an OLD vax, but it is even too recent to handle it. And all vaxen
(the proper name for plural devices) that are old enough don't exist any more.

So what to do? toss it? After all this effort, it'd be nice to find a way...

Lessons on Data Preservation From the Audio Industry
Authored by: Anonymous on Friday, March 17 2006 @ 02:05 PM EST
8 or so years ago I called on a major record label. I worked for a computer
services vendor. We were trying to sell the digitizing of old masters to
preserve the audio. The coating was literally flaking off masters from Louis
Armstrong and others recorded in the '30s. The value of making a copy that is
in no way inferior to the original (digital) -vs- the losses from analog copies
was lost to the inability to fund the project.

I often wonder what the status of some of the great recordings is?

To me, while there may be angst at the interpretation of a format (MP3 in 50
years), a lost master is a lost master.

Changing Trains at Wigan:
Authored by: Alan(UK) on Friday, March 17 2006 @ 06:16 PM EST
Digital Preservation and the Future of Scholarship.

Interesting reading.

source recordings & lossless copies
Authored by: snowmannishboy on Friday, March 24 2006 @ 02:44 AM EST
hey Sound Man,

i've had quite a bit of experience, like your own, in dealing with
"obsolete" audio media/formats/machines. everything from edison
cylinders to rec-o-cut transcription shellac to dbx compressed quadraphonic
& binaural lp records.

it's awesome (in every sense of the word) that i can listen to sounds stored on
100+ year old media. but it makes me sick to look at my stack of live recordings
on philips digital compact cassette tapes.

but now i'm stuck with no confidence in _any_ digital format or media.

what are you (or anyone else) doing to ensure that you can enjoy the SACD and
DVD-A recordings 50 years from now?

Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License. ( Details )