Archive Team hostname file
Latest revision as of 06:42, 7 March 2013
When the Archive Team is preparing to archive data from a multi-user, multi-hostname site that is about to be shut down (e.g., Posterous), an early step is often to obtain, through automated scripted access, a list of the hostnames used on that site. In a later stage of archiving, that list is used to retrieve the web data served from those hostnames.
The format is simple: plain ASCII text with Unix-style line breaks (a single LF, hex 0A, as the newline character) and one hostname per line. Each line consists of a sequential serial number, followed by a tab character (hex 09) and then the hostname:
2000001 dwellz.posterous.com
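As an illustration of the line format described above, here is a minimal Python sketch of a reader. The function name `parse_hostnames` and the second sample hostname are hypothetical, and the check that serial numbers rise by exactly one per line is an assumption based on the word "sequential"; this is not an official Archive Team tool.

```python
def parse_hostnames(text):
    """Parse .hostnames content into a list of (serial, hostname) pairs."""
    entries = []
    for line in text.split("\n"):
        if not line:
            continue  # tolerate the empty piece after a trailing LF
        # Each line is: serial number, one tab (hex 09), hostname.
        serial_text, hostname = line.split("\t", 1)
        serial = int(serial_text)
        # Assumption: "sequential" means each serial is the previous one plus 1.
        if entries and serial != entries[-1][0] + 1:
            raise ValueError(f"non-sequential serial number: {serial}")
        entries.append((serial, hostname))
    return entries


# The second hostname is made up for the example.
sample = "2000001\tdwellz.posterous.com\n2000002\texample.posterous.com\n"
entries = parse_hostnames(sample)
```

Splitting on the first tab only keeps the parser strict about the separator while leaving the hostname text untouched.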
The file is saved with a .hostnames extension, and its filename is the number one less than the first serial number in the file (e.g., 2000000.hostnames for a file whose first line is numbered 2000001). It is then compressed in gzip format for upload/download.
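The naming and compression steps can be sketched in Python as follows. The helper name `write_hostnames_gz` is hypothetical, and two details are assumptions not stated in the description: that the compressed file carries an extra `.gz` suffix, and that the last line also ends with an LF.

```python
import gzip
import os
import tempfile


def write_hostnames_gz(directory, first_serial, hostnames):
    """Write a gzip-compressed hostname list and return its path."""
    # The base filename is one less than the first serial number in the file.
    # Assumption: gzip output is named by appending ".gz".
    filename = f"{first_serial - 1}.hostnames.gz"
    path = os.path.join(directory, filename)
    # newline="\n" keeps Unix-style LF line breaks on every platform.
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as handle:
        for offset, hostname in enumerate(hostnames):
            handle.write(f"{first_serial + offset}\t{hostname}\n")
    return path


out_dir = tempfile.mkdtemp()
out_path = write_hostnames_gz(out_dir, 2000001, ["dwellz.posterous.com"])

# Read the file back to confirm the decompressed bytes match the format.
with gzip.open(out_path, "rb") as handle:
    raw = handle.read()
```

Encoding as ASCII means a hostname containing non-ASCII characters raises an error instead of silently producing a file that violates the format.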