BZ2
Dan Tobias
Dexvertbot m (Change telparia.com samples link to template)
{{FormatInfo
|name=bzip2
|formattype=electronic
|subcat=Compression
|extensions={{ext|bz2}}
|mimetypes={{mimetype|application/x-bzip2}}
|pronom={{PRONOM|x-fmt/268}}
|wikidata={{wikidata|Q27866052}}
|released=1997
}}
'''bzip2''' is a data compression algorithm and compressed file format. It was developed by Julian Seward.

== Identification ==
A bzip2 file normally starts with the byte pattern {{magic|42 5a 68 ?? 31 41 59 26 53 59}}.

The first three bytes are ASCII "{{magic|BZh}}". (For signature "{{magic|BZ0}}", refer to the original [[bzip]] format.) The "<code>h</code>" has been said to stand for "Huffman coding", but confirmation is needed.

The byte at offset 3 is a code for the block size, in units of 100 kB. Its possible values range from <code>0x31</code> to <code>0x39</code> (ASCII "<code>1</code>" to "<code>9</code>").

The bytes at offsets 4-9 mark the start of the first compressed block and are derived from the digits of the mathematical constant π ([[Binary-coded decimal|BCD]]-encoded). A file compressed from empty input contains no blocks, so in that case the end-of-file marker follows the four-byte header directly.

The end-of-file marker uses the magic number (hex) {{magic|17 72 45 38 50 90}}, derived from the square root of π. However, it is not byte-aligned: the 48-bit marker is followed by a 32-bit checksum and then by zero to seven padding bits that complete the final byte. As a result, one of the following byte sequences, each being the marker shifted left by zero to seven bits, appears beginning 10 bytes from the end of the file:

 b9 22 9c 28 48
 dc 91 4e 14 24
 ee 48 a7 0a 12
 77 24 53 85 09
 bb 92 29 c2 84
 5d c9 14 e1 42
 2e e4 8a 70 a1
 17 72 45 38 50

== Specifications ==
* [https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf Unofficial specification by Joe Tsai]

== Software ==
* [https://sourceware.org/bzip2/ bzip2 and libbzip2]
* [[7-Zip]]

== Sample files ==
* {{DexvertSamples|archive/bz2}}

== See also ==
* [[Burrows–Wheeler transform]]
* [[bzip]] (predecessor)

== Links ==
* [[Wikipedia:Bzip2|Wikipedia article]]
* [https://sourceware.org/bzip2/ bzip2 and libbzip2 website]
* [https://github.com/corkami/pics/blob/master/binary/BZ2.png Chart of format details]
* [https://lwn.net/Articles/762264/ bzip.org changes hands] (LWN article from August 9, 2018)
* [{{ForensicsWikiURL|bzip2}} ForensicsWiki entry] (also includes more details on the headers)
* [http://www.bzip.org/ bzip.org]
Revision as of 04:55, 28 December 2023