Variable-length quantity

From Just Solve the File Format Problem
(Difference between revisions)
Jump to: navigation, search
m
 
(5 intermediate revisions by one user not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|formattype=electronic
 
|formattype=electronic
|subcat=Binary Data
+
|subcat=Elements of File Formats
 
}}
 
}}
'''Variable-length base-128''' refers a family of formats for encoding arbitrarily large integers. It does not seem to have a well-established name, though it's sometimes called '''variable-length quantity''' ('''VLQ'''), and specific forms of it may be given names by the specifications that use them.
+
By '''variable-length quantity''' ('''VLQ'''), we refer to a variety of schemes for encoding an integer in a variable number of [[byte]]s, where the required number of bytes grows as the magnitude of the integer grows. Such formats can encode arbitrarily large integers, though artificial limits are sometimes imposed.
 +
 
 +
As far as we know, there is no widely-used standard for such number formats. Each file format specification that uses them typically uses its own definition, and invents its own name.
  
 
== Details ==
 
== Details ==
The integer to be encoded is written in binary, then partitioned into as many 7-bit blocks as necessary, each of which is stored in a byte. The remaining bit in each byte (usually the most-significant bit) is used as a flag to indicate whether there are any more bytes. Most commonly, the flag bit is 1 in every byte except the last.
+
The most common VLQ formats use a '''variable-length base-128''' scheme in which 7 bits of each byte constitute a base-128 digit, and the 8th bit indicates whether there are any more digits.
  
Both little-[[Endianness|endian]] (where the 7 data bits of the first byte are the least-significant 7 bits of the encoded integer) and big-endian (where they are the most-significant) variants exist.
+
Even this simple case has a combinatorial explosion of variants:
 +
* The flag bit could be the most significant bit, or the least significant bit.
 +
* The flag for end-of-number could be 0, or 1.
 +
* The most-significant block of 7 bits could come first (big-[[Endianness|endian]] style), or last (little-endian style).
 +
* If negative numbers are supported, there are several different ways to do it.
 +
* There could be a defined maximum number of bytes. If so, then for maximal-length codes, the final byte's flag bit could be repurposed.
  
The encoded integer may be unsigned, or it may be a signed integer using, e.g., [[two's complement]] representation.
+
Instead of having one flag bit per byte, there could be one every two bytes, resulting in a base-32768 encoding.
 
+
Many variations are possible. Instead of bytes, 16-bit (base-32768) or larger code units can be used. If there is a defined limit on the number of bytes (or code units), then for maximal-length codes, the final byte's flag bit can be repurposed.
+
  
 
== Related formats ==
 
== Related formats ==
Among the formats that use variable-length base-128 are:
+
Among the formats that use variable-length quantities are:
 
* [[BPG]] ("ue7")
 
* [[BPG]] ("ue7")
 
* [[DWARF]] ("LEB128", "ULEB128", "SLEB128")
 
* [[DWARF]] ("LEB128", "ULEB128", "SLEB128")
* [[HLP (WinHelp)]] and [[Segmented Hypergraphics]] ("compressed short"). The "compressed long" type is a 16-bit variant.
+
* [[HLP (WinHelp)]] and [[Segmented Hypergraphics]] ("compressed short" and "compressed long").
 
* [[MIDI]] ("variable-length quantity")
 
* [[MIDI]] ("variable-length quantity")
 
* [[Protobuf]] ("base 128 varint")
 
* [[Protobuf]] ("base 128 varint")
Line 26: Line 31:
 
* [[Wikipedia:Variable-length quantity|Wikipedia: Variable-length quantity]]
 
* [[Wikipedia:Variable-length quantity|Wikipedia: Variable-length quantity]]
 
* [[Wikipedia:LEB128|Wikipedia: LEB128]]
 
* [[Wikipedia:LEB128|Wikipedia: LEB128]]
 +
* [http://formats.kaitai.io/vlq_base128_be/index.html Kaitai Struct: vlq_base128_be]
 +
* [http://formats.kaitai.io/vlq_base128_le/index.html Kaitai Struct: vlq_base128_le]
 +
 +
[[Category:Number formats]]

Latest revision as of 15:50, 16 September 2017

File Format
Name Variable-length quantity
Ontology

By variable-length quantity (VLQ), we refer to a variety of schemes for encoding an integer in a variable number of bytes, where the required number of bytes grows as the magnitude of the integer grows. Such formats can encode arbitrarily large integers, though artificial limits are sometimes imposed.

As far as we know, there is no widely-used standard for such number formats. Each file format specification that uses them typically uses its own definition, and invents its own name.

[edit] Details

The most common VLQ formats use a variable-length base-128 scheme in which 7 bits of each byte constitute a base-128 digit, and the 8th bit indicates whether there are any more digits.

Even this simple case has a combinatorial explosion of variants:

  • The flag bit could be the most significant bit, or the least significant bit.
  • The flag for end-of-number could be 0, or 1.
  • The most-significant block of 7 bits could come first (big-endian style), or last (little-endian style).
  • If negative numbers are supported, there are several different ways to do it.
  • There could be a defined maximum number of bytes. If so, then for maximal-length codes, the final byte's flag bit could be repurposed.

Instead of having one flag bit per byte, there could be one every two bytes, resulting in a base-32768 encoding.

[edit] Related formats

Among the formats that use variable-length quantities are:

[edit] Links

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox