UTF-32

From Just Solve the File Format Problem
Revision as of 18:06, 17 February 2013

File Format
Name: UTF-32
Ontology: Character Encodings

UCS Transformation Format, 32-bit (UTF-32) is a Unicode character encoding. Unicode code points map one-to-one to 32-bit values, so every code point occupies exactly 4 bytes. Since the largest code point, U+10FFFF, fits in 21 bits, at least 11 bits of every unit are unused and the encoding is inherently wasteful of space; UTF-8 or UTF-16 is more compact in most cases. UTF-32 does offer computational simplicity, and it is more often used for in-memory storage of characters than for stored documents.
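The fixed width described above can be illustrated with a short Python sketch. The sample string and the helper name `code_point_at` are assumptions chosen for illustration; only the standard `str.encode` codecs are used.

```python
# Four code points of increasing size: U+0041, U+00E9, U+20AC, U+1F600.
text = "A\u00e9\u20ac\U0001F600"

# "utf-32-be" writes big-endian units and does not prepend a byte order mark.
utf32 = text.encode("utf-32-be")

# Every code point takes exactly 4 bytes, whatever its value.
assert len(utf32) == 4 * len(text)


def code_point_at(data: bytes, i: int) -> int:
    """Return the i-th code point of big-endian UTF-32 data (hypothetical helper).

    Fixed-width units mean code point i always sits at bytes [4*i, 4*i + 4).
    """
    return int.from_bytes(data[4 * i : 4 * i + 4], "big")


# Encoded size of the same text in three encodings, for comparison.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}
```

For this sample, `sizes` shows UTF-32 using 16 bytes where UTF-8 and UTF-16 each use 10, while `code_point_at` needs no scanning to find a given position.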

As with UTF-16, this format exists in both big-endian and little-endian varieties. Because the code unit is a single 32-bit value (not a pair of 16-bit units, as in the longer sequences of UTF-16), the byte order applies to all 4 bytes at once. The Byte Order Mark, U+FEFF (zero-width no-break space), is therefore encoded as the byte sequence 00 00 FE FF in the big-endian version and FF FE 00 00 in the little-endian one, with all four bytes reversed from one version to the other.
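The two Byte Order Mark sequences can be reproduced and used for detection in a few lines of Python. This is a minimal sketch, not a complete decoder: the function names `detect_utf32_byte_order` and `decode_utf32` are hypothetical, and falling back to big-endian when no mark is present is an assumption made here for illustration.

```python
import struct

BOM = 0xFEFF  # U+FEFF, the Byte Order Mark

# Packing the same 32-bit value in each byte order gives the two sequences.
be = struct.pack(">I", BOM)  # 00 00 FE FF
le = struct.pack("<I", BOM)  # FF FE 00 00


def detect_utf32_byte_order(data: bytes):
    """Return "big", "little", or None when the data has no UTF-32 BOM."""
    if data[:4] == be:
        return "big"
    if data[:4] == le:
        return "little"
    return None


def decode_utf32(data: bytes, default: str = "big") -> str:
    """Decode UTF-32 bytes, honouring a leading BOM if one is present.

    Assumption: without a BOM the data is read in the `default` byte order.
    """
    order = detect_utf32_byte_order(data)
    body = data[4:] if order else data  # drop the BOM itself
    fmt = (">" if (order or default) == "big" else "<") + "I"
    # Each 4-byte unit unpacks directly to one code point.
    return "".join(chr(cp) for (cp,) in struct.iter_unpack(fmt, body))
```

For example, `decode_utf32(le + "hi".encode("utf-32-le"))` returns `"hi"`. Note that FF FE 00 00 begins with the same two bytes as the little-endian UTF-16 mark, so a detector covering both encodings would probably need to test for UTF-32 first.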

Links

* Wikipedia article (Wikipedia:UTF-32)
* UTF-32 FAQ: http://www.unicode.org/faq/utf_bom.html#UTF32
