UTF-8

From Just Solve the File Format Problem
Revision as of 06:05, 10 November 2012 by Gmcgath (Talk | contribs)

File Format
Name UTF-8
Ontology

UCS Transformation Format, 8-bit (UTF-8) is a Unicode character encoding. Bytes 0-127 (00-7F hexadecimal) represent the equivalent ASCII characters, and these byte values never occur in a UTF-8 stream in any other context: every byte of a multi-byte sequence has its high bit set (80 hexadecimal or above). The bytes FE and FF never occur in valid UTF-8. A document may begin with an optional Byte Order Mark (BOM, U+FEFF), which in UTF-8 is encoded as the bytes 0xEF, 0xBB, 0xBF. Since UTF-8 has no "endianness," this is not actually a byte order indicator but can be treated as a signature indicating that the document is UTF-8 encoded.
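As a rough illustration of the byte-level properties described above, the following Python sketch checks for the three-byte signature and inspects the bytes of a short encoded string. The helper name strip_utf8_signature and the sample text are assumptions made for this example; codecs.BOM_UTF8 is the standard library's constant for the bytes EF BB BF.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample text (an assumption for this sketch): 'A', U+00E9, U+20AC.
text = "A\u00e9\u20ac"
encoded = text.encode("utf-8")

# The ASCII letter keeps its ASCII byte value; every other byte has the
# high bit set, so it cannot be mistaken for an ASCII character.
ascii_bytes = [b for b in encoded if b < 0x80]
non_ascii_bytes = [b for b in encoded if b >= 0x80]

# U+FEFF encoded as UTF-8 yields the signature bytes.
signature = "\ufeff".encode("utf-8")
```

Python's built-in "utf-8-sig" codec offers similar behavior when decoding: it skips the signature if one is present.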

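UTF-8 is most compact for text that consists mainly of ASCII characters, such as scripts that make heavy use of the Roman alphabet, since each ASCII character takes one byte. Other characters take two to four bytes; most East Asian characters, for example, take three bytes in UTF-8 but two in UTF-16, so for such scripts UTF-16 may be the more compact encoding. UTF-8 never takes more bytes per character than UTF-32, which always uses four.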
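The size trade-off can be checked directly. The sketch below, using sample strings chosen only for this example, counts the bytes each encoding needs. The "-le" codec variants are used so that Python does not prepend a BOM to the UTF-16 and UTF-32 output.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text for three Unicode encodings."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Sample strings (assumptions for this sketch), one per script.
samples = {
    "Latin": "hello",
    "Greek": "\u03b1\u03b2\u03b3",   # alpha, beta, gamma
    "CJK": "\u65e5\u672c\u8a9e",     # three Han characters
}

sizes = {script: encoded_sizes(text) for script, text in samples.items()}

for script, result in sizes.items():
    print(script, result)
```

For these samples, UTF-8 is the smallest encoding for the Latin text, ties with UTF-16 for the Greek text, and is larger than UTF-16 for the CJK text.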

Specifications

External links
