JUST SOLVE THE PROBLEM
"Just Solve the Problem" is 30 days dedicated to solving a problem that just needs lots of bodies thrown at it. Jason Scott announced this idea in this weblog entry.
This is the 2012 problem, as described in the entry:
In the last couple centuries, we’ve created a number of self-encapsulated data sets, or “files”. Be they letters, programs, tapes, stamped foil, piano rolls, you name it. And while many of those data sets are self-evident, a fuck-ton are not. They’re obscure. They’re weird. And worst of all, many of them are the vital link to scores of historical information.
Everyone knows this problem. It’s why old novelists cry they can’t pull their first novel out of Wordperfect. It’s why someone who used U-matic tapes to record the first meetings of a famous protest group goes “oh well”. It’s why, in all things, someone looks at anything older than five years, and goes “bye”, figuring there’s nothing they can do.
And I’ve had to listen to the mewings about this problem for at least 20 years now, in various forms. A lot. And then the person lights up about maybe solving this problem, and then dims and says “well, we can’t really solve the problem”. Because they know – it’d take an army of people to do it.
Let’s make that goddamned army.
Planning and Discussion
This page is meant to be an initial scratchpad of related discussion and planning for the November project. The goal is that when the people imbued with the urge to be part of the project arrive on November 1st, a clear set of procedures and opportunities to contribute is laid out.
What This Project is Not
- This is not a "sprung from the forehead of Zeus" attempt to completely re-boot the process of enumerating the many formats out there. Much work has been done and there is much to share.
- This will not be on the Archive Team wiki - this page is just being used here because it's convenient.
What Formats We're Talking About
Obviously, "file formats", as in digital files, are what most people think of, but it would be really nice to expand out to every quantifiable self-contained format for information. It might get a little hairy, but in the name of being expansive and going places that politics and mandates might not, having information on everything from punch cards and piano rolls to vinyl records and barcodes would be a nice addition. They also challenge the individual pages of the Wiki to be flexible and comprehensive, without getting held up on things like trying to indicate the storage medium or to call it a "file". Let's include the Golden Record that went with Voyager! Let's include Morse code! Let's include known USB sticks!
For a given item, with "item" being an individual format, the priority should be:
- Enumeration (indicating the format exists)
- Examples of this format in use (either actual files or renderings of the format)
- Documentation about that format or its conversion (with website or wayback links)
- Links to known programs, utilities and source code that interprets this format
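The priority order above could be sketched as a small record type. This is only an illustration, assuming one record per format; the class name `FormatEntry`, its field names, and the `next_priority` helper are all hypothetical and not part of any existing registry or wiki tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormatEntry:
    """Hypothetical record for one format, mirroring the four priorities."""
    name: str                                               # enumeration: the format exists
    examples: List[str] = field(default_factory=list)       # files or renderings of the format
    documentation: List[str] = field(default_factory=list)  # website or Wayback links
    tools: List[str] = field(default_factory=list)          # programs, utilities, source code

    def next_priority(self) -> str:
        """Return the first priority, in list order, that is still unfilled."""
        if not self.name:
            return "enumeration"
        if not self.examples:
            return "examples"
        if not self.documentation:
            return "documentation"
        if not self.tools:
            return "tools"
        return "complete"


# A format that has only been enumerated so far.
piano_roll = FormatEntry(name="Piano roll")
print(piano_roll.next_priority())
```

Keeping the check in the same order as the list means a contributor arriving at a page can see at a glance what is most worth adding next.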
So, one thing to keep in the back of our minds (way, way back, like over near the powdered milk) is the fact that established groups and individuals are going to put up some resistance, questioning the use of such an endeavor. The most efficient way to deal with this is to keep track of what whiners complain we will not prioritize and consider and, where possible, prioritize and consider it. That's it! Action quiets whiners. Response whining does not.
Discussions or Essays about Just Solve the Problem 2012
- http://ascii.textfiles.com/archives/3645 - Jason Scott's Original Post
- http://unsustainableideas.wordpress.com/2012/07/03/solve-file-format-problem/ - A Call to Arms, by Unsustainable Ideas
- http://unsustainableideas.wordpress.com/2012/07/04/the-solution-is-42-what-was-the-problem/ - The Answer is 42 - What was the question?
- http://openplanetsfoundation.org/blogs/2012-07-04-answer-our-digital-preservation-needs - A rant about format registries by Paul Wheatley
- http://openplanetsfoundation.org/blogs/2012-07-06-biodiversity-and-registry-ecosystem - Andy Jackson on how this initiative fits into the broader picture of collecting, managing and utilising file format information
List of Places Keeping Track of Formats
- http://wotsit.org/ (presently broken)
- http://www.nationalarchives.gov.uk/PRONOM/ Version-level format information on hundreds of formats, can be downloaded as XML (ask me for details AndyJackson 15:07, 2 July 2012 (EDT))
- http://gitorious.org/re-lab Tools & specs from re-lab: graphics formats and office formats
- http://www-mmsp.ece.mcgill.ca/documents/AudioFormats/index.html - Documentation, details, specs, and samples for a handful of audio formats
- http://wiki.xentax.com/index.php/Game_File_Format_Central This wiki is the home of the most game (archive) file format knowledge in the world.
Other Useful Materials and Services
- http://www.nsrl.nist.gov/ National Software Reference Library (see also http://blogs.loc.gov/digitalpreservation/2012/05/life-saving-the-national-software-reference-library/)
- http://www.openplanetsfoundation.org/software/fido Format Identification for Digital Objects
- https://github.com/usnationalarchives/File-Analyzer File Analyzer
- http://sk1project.org/modules.php?name=products&product=uniconvertor UniConvertor is a universal vector graphics translator. It is a command line tool which uses sK1 object model to convert one format to another
- File Formats for Popular Personal Computer Software: A Programmer's Reference http://www.amazon.com/Formats-Popular-Personal-Computer-Software/dp/0471836710/ref=la_B001KE1CFC_1_1 - @KevinSavetz owns and can digitize this if needed
- More File Formats for Popular PC Software: A Programmer's Reference http://www.amazon.com/More-File-Formats-Popular-Software/dp/0471850772/ref=la_B001KE1CFC_1_2 - @KevinSavetz owns and can digitize this if needed
- http://wiki.opf-labs.org/display/SPR/Digital+Preservation+Tools Wiki list of digital preservation tool lists
- http://wiki.opf-labs.org/display/REQ/Digital+Preservation+and+Data+Curation+Requirements+and+Solutions Digital Preservation and Data Curation Requirements and Solutions. Not classified by formats, but by datasets, issues and solutions.
- http://fileformats.wordpress.com/ - The File Formats Blog