ZIP
- Not to be confused with Zip disk, an unrelated disk cartridge unit.
ZIP is one of the most popular file compression formats. It was created in 1989 as the native format of the PKZIP program, introduced by Phil Katz in the wake of a lawsuit (which he lost) brought against him by the makers of the then-popular ARC program and file format. The suit alleged copyright and trademark infringement in his earlier program PKARC, which had been file-compatible with ARC. As a result, Katz created a new file format, which rapidly overtook ARC in popularity, to a large extent because BBS sysops, then the primary users of such compression, resented the lawsuit. Many programs have been released for a variety of operating systems to compress and decompress ZIP files, and native support for the format is built into several popular operating systems.
ZIP implementations vary in their support for features in the specification from PKWARE[1], particularly features added since version 2 (1993), some of which are protected by patents and require licensing. Many implementations limit the use of compression to the DEFLATE algorithm, introduced with version 2. Extensions incorporated into the specification that have been widely adopted are: long filenames; large files (using a technique known as ZIP64); and filenames in UTF-8. In 2011 work began on an interoperable subset of the latest APPNOTE.TXT with the intention of publication as ISO/IEC 21320-1, Document Container File -- Part 1: Core. As of November 2012, a discussion draft is available[2]. Designed to promote interoperable implementations, the draft ISO/IEC 21320-1 prohibits three things: compression methods other than DEFLATE, segmentation into multiple volumes, and features that are subject to patents.
While .zip is the usual file extension, ZIP-formatted files can be found with many other extensions, since a number of other file formats are ZIP containers that use their own application-specific extensions. See Category:ZIP based file formats for a list of such formats.
Identification
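The byte sequence 'P' 'K' 0x05 0x06 (the "end of central directory signature") appears near the end of the file. The record it introduces is at least 22 bytes long, so the signature starts exactly 22 bytes before the end of the file when there is no archive comment, and earlier when there is one.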
Most ZIP files begin with 'P' 'K' 0x03 0x04 (and some ZIP-based formats are required to), but self-extracting ZIP files do not.
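The two signature checks above can be sketched in Python as follows. This is only an illustrative sketch, not a reference implementation: the helper names `find_eocd` and `starts_with_local_header` are hypothetical, and the code assumes the whole file fits in memory and that nothing follows the archive comment. Real tools are often more lenient about trailing data.

```python
import struct

EOCD_SIG = b"PK\x05\x06"    # end of central directory signature
LOCAL_SIG = b"PK\x03\x04"   # local file header signature
EOCD_MIN_SIZE = 22          # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF        # the comment length is a 16-bit field


def find_eocd(data: bytes) -> int:
    """Return the offset of the end of central directory record, or -1.

    Assumption: the record must end exactly at the end of the file,
    so each candidate is checked against its own comment length.
    """
    lowest = max(0, len(data) - EOCD_MIN_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, lowest)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(data):
            # The 2-byte little-endian comment length sits at offset 20.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            if pos + EOCD_MIN_SIZE + comment_len == len(data):
                return pos
        # Keep scanning backwards for an earlier candidate.
        pos = data.rfind(EOCD_SIG, lowest, pos + 3)
    return -1


def starts_with_local_header(data: bytes) -> bool:
    """True if the file begins with a local file header signature."""
    return data[:4] == LOCAL_SIG
```

In practice `data` would come from something like `open(path, "rb").read()`. A self-extracting archive is expected to pass the first check but fail the second, because an executable stub precedes the ZIP data.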
Compression
Each file in a ZIP file is compressed using one of a number of compression algorithms. Only compression types 0 (uncompressed) and 8 (DEFLATE) are likely to be seen in modern portable ZIP files. In old ZIP files, types 1 (Shrink) and 6 (Implode) are common.
Code | Compression scheme |
---|---|
0 | Uncompressed |
1 | Shrink |
2–5 | Reduce |
6 | Implode (Shannon–Fano) |
8 | DEFLATE |
9 | Deflate64 |
10 | PKWARE Data Compression Library Imploding (old IBM TERSE) |
12 | Bzip2 |
14 | LZMA (EFS) |
18 | IBM TERSE (new) |
19 | IBM LZ77 z Architecture (PFS) |
97 | WavPack |
98 | PPMd version I, Rev 1 |
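To see which codes from the table a given archive actually uses, the method code of each member can be read with Python's standard `zipfile` module. The sketch below is one possible approach: `METHOD_NAMES` is a hand-copied subset of the table and `list_methods` is a hypothetical helper name, neither being part of any library.

```python
import io
import zipfile

# Hand-copied subset of the method codes in the table above.
METHOD_NAMES = {
    0: "Uncompressed",
    1: "Shrink",
    6: "Implode",
    8: "DEFLATE",
    9: "Deflate64",
    12: "Bzip2",
    14: "LZMA",
}


def list_methods(source):
    """Return (filename, method code, method name) for each member.

    `source` may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        return [
            (info.filename,
             info.compress_type,
             METHOD_NAMES.get(info.compress_type, "other"))
            for info in zf.infolist()
        ]


# Small in-memory demonstration archive with one stored and one
# DEFLATE-compressed member.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("stored.txt", "hello", compress_type=zipfile.ZIP_STORED)
    zf.writestr("deflated.txt", "hello" * 100,
                compress_type=zipfile.ZIP_DEFLATED)
demo.seek(0)

for name, code, method in list_methods(demo):
    print(f"{code:>3}  {method:<12}  {name}")
```

Listing only inspects the archive's directory entries, so it should also work for members whose compression method the `zipfile` module cannot itself decompress.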
Specifications
- APPNOTE from PKWARE (latest version of formal spec)
- APPNOTE Archives from PKWARE (selected versions all the way back to 1.0)
- An early version of APPNOTE (not numbered or dated); perhaps the very first
- IANA registration for application/zip in July 1993 (corresponds to version 2 of APPNOTE.TXT)
- Documentation from Info-ZIP (Includes Info-ZIP variants on APPNOTE.TXT dated from 1996 to 2004, specifications used as the basis for various open-source tools)
- November 2012 working draft of ISO/IEC WD 21320-1, Document Container File -- Part 1: Core. Intended as a restricted subset of APPNOTE 6.3.3 designed to promote interoperability.
- February 2013 committee draft of ISO/IEC CD 21320-1, Document Container File -- Part 1: Core. Essentially the same as the November 2012 working draft except that it mandates use of the UTF-8 indicator.
- Archive format info, including ZIP (from 1989, when ZIP was newly released)
- ZIP file header format (among other archive types)
- TorrentZip
- Note that in general there is no single official filename encoding for ZIP files, and non-ASCII filenames are not generally well supported. The original implementation specified IBM Code Page 437 for filenames, but as many characters cannot be expressed in that encoding, the filename bytes have often been interpreted using the current system code page (implementation-dependent behaviour). The specification defines a flag (bit 11 of the general purpose bit flag, the "language encoding flag") to indicate that the filename is encoded in UTF-8, but it is not supported in all major clients (e.g. Windows Explorer).
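As a rough illustration of the filename encoding note above, the following Python sketch reports whether each member of an archive has the UTF-8 flag set. It relies on the standard `zipfile` module, which exposes the general purpose bit flag as `ZipInfo.flag_bits` and, as far as its documented behaviour goes, decodes unflagged names as CP437. The names `UTF8_FLAG`, `filename_encodings` and `redecode` are hypothetical helpers introduced here.

```python
import io
import zipfile

UTF8_FLAG = 0x0800  # bit 11 of the general purpose bit flag


def filename_encodings(source):
    """Return (filename, declared encoding) for each archive member."""
    with zipfile.ZipFile(source) as zf:
        return [
            (info.filename,
             "UTF-8" if info.flag_bits & UTF8_FLAG else "unspecified")
            for info in zf.infolist()
        ]


def redecode(name: str, codepage: str) -> str:
    """Reinterpret a name that zipfile decoded as CP437.

    Assumption: the entry did not have the UTF-8 flag set, so the
    original bytes can be recovered by encoding back to CP437.
    """
    return name.encode("cp437").decode(codepage)


# Demonstration: Python's zipfile sets the flag only when a name
# cannot be stored as plain ASCII.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("plain.txt", "x")
    zf.writestr("na\u00efve.txt", "x")
demo.seek(0)

for name, encoding in filename_encodings(demo):
    print(f"{encoding:<12} {name}")
```

For an unflagged entry whose name looks garbled, `redecode(name, "cp1252")` (or another candidate code page) can be tried. Which code page is correct is a guess that depends on the system that created the archive.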
Software
- Info-ZIP: Zip, UnZip
- 7-Zip
- zziplib
- Archive::ZZip: Perl bindings for zziplib
- zlib - The zlib library does not support ZIP format, but it is distributed with "minizip" code that supports most ZIP files.
- libzip - Uses zlib.
- libarchive - Uses zlib.
- miniz
References
- [1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
- [2] http://kikaku.itscj.ipsj.or.jp/sc34/open/1855.pdf
Links
- ZIP (file format): Wikipedia
- Zip files all the way down (creating an infinitely-regressed ZIP file)
- ZIP101 an archive walkthrough