Bzip2

From Just Solve the File Format Problem

Latest revision as of 10:37, 12 April 2024

File Format
Name: bzip2
Extension(s): .bz2
MIME Type(s): application/x-bzip2
PRONOM: x-fmt/268
Wikidata ID: Q27866052
Released: 1997

bzip2 is a data compression algorithm and compressed file format. It was developed by Julian Seward.

Identification

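A bzip2 file that contains at least one compressed block starts with the byte pattern 42 5a 68 ?? 31 41 59 26 53 59, where ?? is the block-size byte described below. (A file compressed from empty input has no block, so the end-of-file marker follows the first four bytes directly.)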
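A minimal Python sketch of checking this signature, treating "??" as a wildcard. The names `BZIP2_SIGNATURE` and `matches_signature` are illustrative only and do not come from any library. A stream compressed from empty input has no block header, so this 10-byte check does not match it.

```python
# Hypothetical helper: match the documented bzip2 signature
# 42 5a 68 ?? 31 41 59 26 53 59, where None stands for the "??" byte.
BZIP2_SIGNATURE = [0x42, 0x5A, 0x68, None, 0x31, 0x41, 0x59, 0x26, 0x53, 0x59]


def matches_signature(data: bytes, pattern=BZIP2_SIGNATURE) -> bool:
    """Return True if data begins with pattern (None matches any byte)."""
    if len(data) < len(pattern):
        return False
    return all(p is None or b == p for b, p in zip(data, pattern))


if __name__ == "__main__":
    import bz2

    print(matches_signature(bz2.compress(b"hello")))
    print(matches_signature(b"PK\x03\x04" + bytes(6)))
```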

The first three bytes are ASCII "BZh". (For signature "BZ0", refer to the original bzip format.) The "h" has been said to stand for "Huffman coding", but confirmation is needed.

The byte at offset 3 is a code for the block size. Its possible values range from 0x31 to 0x39 (ASCII "1" to "9"); the digit N selects a block size of N × 100,000 bytes, so "9" means 900,000-byte blocks.
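A sketch of decoding the block-size byte in Python, assuming the digit N means N × 100,000 bytes (the interpretation used by the reference bzip2 implementation). The function name `block_size_from_header` is hypothetical; the standard `bz2` module is used only to produce sample headers.

```python
def block_size_from_header(data: bytes) -> int:
    """Decode the block-size byte at offset 3 of a bzip2 header.

    Assumes the digit N ('1'..'9') selects blocks of N * 100,000 bytes.
    """
    if len(data) < 4 or data[:3] != b"BZh":
        raise ValueError("not a bzip2 (BZh) header")
    code = data[3]
    if not 0x31 <= code <= 0x39:
        raise ValueError(f"invalid block-size byte 0x{code:02x}")
    return (code - 0x30) * 100_000


if __name__ == "__main__":
    import bz2

    # Python's compresslevel is written into the header as the size digit.
    for level in (1, 9):
        header = bz2.compress(b"example", compresslevel=level)
        print(level, block_size_from_header(header))
```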

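The bytes at offset 4-9, 31 41 59 26 53 59, are the magic number that begins each compressed block. Read as BCD, they give the digits 314159265359, i.e. the mathematical constant π rounded to 3.14159265359.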
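A small Python illustration of the BCD reading: each 4-bit nibble of bytes 4-9 holds one decimal digit, so the hex form of the bytes is already the digit string. `bcd_digits` and `BLOCK_MAGIC` are illustrative names, not part of any bzip2 API.

```python
import math


def bcd_digits(data: bytes) -> str:
    """Read packed BCD: each 4-bit nibble holds one decimal digit."""
    digits = data.hex()
    if not digits.isdigit():
        raise ValueError("byte string is not valid packed BCD")
    return digits


BLOCK_MAGIC = bytes.fromhex("314159265359")

if __name__ == "__main__":
    print(bcd_digits(BLOCK_MAGIC))
    # Compare with pi rounded to 11 decimal places.
    print(f"{math.pi:.11f}")
```

Both lines print the same twelve digits (314159265359 and 3.14159265359).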

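The end-of-file marker uses the magic number (hex) 17 72 45 38 50 90, derived from the square root of π (1.77245385090...). Because bzip2 is a bit stream, the marker is generally not byte-aligned: it is followed by a 32-bit combined CRC and then 0 to 7 padding bits that fill out the last byte. As a result, one of the following byte sequences appears beginning 10 bytes from the end of the file (the first entry corresponds to 7 padding bits, the last to none):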

b9 22 9c 28 48
dc 91 4e 14 24
ee 48 a7 0a 12
77 24 53 85 09
bb 92 29 c2 84
5d c9 14 e1 42
2e e4 8a 70 a1
17 72 45 38 50
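A Python sketch that derives the eight sequences above from the 48-bit marker, assuming the stream ends with the marker, a 32-bit CRC, and 0 to 7 padding bits. `eos_tail_patterns` and `has_eos_tail` are hypothetical helper names. The tail check is only a heuristic: it fails if any other data follows the end of the stream.

```python
import bz2

EOS_MAGIC = 0x177245385090  # 48-bit end-of-stream marker


def eos_tail_patterns() -> list[bytes]:
    """Return the 5-byte sequence seen 10 bytes before EOF for each padding size.

    After the 48-bit marker come a 32-bit CRC and 0-7 padding bits.
    With p padding bits, the byte 10 from the end holds marker bits p..p+7,
    so the visible 40 bits are (marker >> (8 - p)) truncated to 40 bits.
    """
    patterns = []
    for pad in range(7, -1, -1):  # same order as the list above
        value = (EOS_MAGIC >> (8 - pad)) & 0xFF_FFFF_FFFF
        patterns.append(value.to_bytes(5, "big"))
    return patterns


def has_eos_tail(data: bytes) -> bool:
    """Heuristic: does the 5-byte window at len-10 match any expected pattern?"""
    return len(data) >= 10 and data[-10:-5] in eos_tail_patterns()


if __name__ == "__main__":
    for p in eos_tail_patterns():
        print(p.hex(" "))
    print(has_eos_tail(bz2.compress(b"hello world")))
```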

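Specifications

* Unofficial specification by Joe Tsai: https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf

Software

* bzip2 and libbzip2: https://sourceware.org/bzip2/
* 7-Zip
* XAD

Sample files

* Dexvert sample files: archive/bz2

See also

* Burrows–Wheeler transform
* bzip (predecessor)

Links

* Wikipedia article: Bzip2
* bzip2 and libbzip2 website: https://sourceware.org/bzip2/
* Chart of format details: https://github.com/corkami/pics/blob/master/binary/BZ2.png
* bzip.org changes hands: https://lwn.net/Articles/762264/ (LWN article from August 9, 2018)
* ForensicsWiki entry (also includes more details on the headers): https://web.archive.org/web/20190809161013/http://www.forensicswiki.org/wiki/Bzip2
* bzip.org: http://www.bzip.org/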
