UTF-8

File Format
Name	UTF-8
Ontology	Electronic File Formats > Character Encodings > UTF-8

Revision as of 01:04, 29 November 2012

UCS Transformation Format, 8-bit (UTF-8) is a variable-width Unicode character encoding that represents each code point as a sequence of one to four bytes. Byte values 0-127 (00-7F hexadecimal) represent the equivalent ASCII characters. These values never appear inside a multi-byte sequence, so any byte below 80 in a UTF-8 stream is always an ASCII character. The byte values FE and FF never occur in valid UTF-8. A document may optionally begin with a Byte Order Mark (U+FEFF), which UTF-8 encodes as the bytes 0xEF, 0xBB, 0xBF. Since UTF-8 has no "endianness," this is not actually a byte order indicator, but it can be treated as a signature indicating that the document is UTF-8 encoded.
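The byte layout behind these rules can be sketched in Python. This is a minimal, illustrative encoder that assumes its input is a single Unicode scalar value (U+0000 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF). The names `encode_code_point`, `strip_bom`, and `UTF8_BOM` are hypothetical; real code would normally call `str.encode("utf-8")`, or decode with Python's `"utf-8-sig"` codec to drop a leading BOM.

```python
# Hypothetical helper names; a sketch of the UTF-8 bit layout, not a library API.
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 0xxxxxxx: one byte, identical to the ASCII code
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF) if one is present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
```

For example, `encode_code_point(0xFEFF)` returns `b'\xef\xbb\xbf'`, the signature described above. Every byte of a multi-byte result is 80 or higher, which is why bytes 00-7F can always be read as ASCII.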

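UTF-8 is most compact for text dominated by the Latin (Roman) alphabet, since ASCII characters take only one byte each. Characters from U+0800 to U+FFFF, which include most CJK ideographs, take three bytes in UTF-8 but only two in UTF-16, so UTF-16 can produce smaller files for such text. UTF-32 uses four bytes for every character and is therefore never more compact than UTF-8.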
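The size trade-off can be checked directly with Python's built-in codecs. This sketch uses the little-endian `"utf-16-le"` and `"utf-32-le"` codecs so that no BOM is counted; the function name `encoded_sizes` is hypothetical.

```python
from __future__ import annotations


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded length in bytes of `text` in UTF-8, UTF-16 and UTF-32.

    The -le codecs are used so that no byte order mark is added to the count.
    """
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }
```

With these definitions, `encoded_sizes("Hello")` gives 5, 10 and 20 bytes, while `encoded_sizes("日本語")` gives 9, 6 and 12 bytes, so UTF-16 wins for the CJK sample and UTF-32 is largest in both cases.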

Specifications

STD 63 / RFC 3629 (2003-11)
- RFC 2279 (1998-01), obsoleted by RFC 3629
- RFC 2044 (1996-10), obsoleted by RFC 2279
Unicode 6.0, Chapter 3 (2011) – §3.9 D92, §3.10 D95
ISO/IEC 10646:2003 Annex D (2003)
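RFC 3629 (section 4) defines the well-formed byte sequences, which also rule out overlong forms, surrogates (U+D800 to U+DFFF), and anything above U+10FFFF. The sketch below is a direct transcription of that table into Python. The function name `is_valid_utf8` is hypothetical; in practice, Python's strict `bytes.decode("utf-8")` already performs the same check.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check `data` against the RFC 3629 well-formed UTF-8 byte sequences."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # UTF8-1: plain ASCII
            i += 1
            continue
        # Pick sequence length and allowed range for the *second* byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates D800-DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # caps the range at U+10FFFF
        else:
            return False                  # 80-C1 and F5-FF (incl. FE, FF) never lead
        if i + length > n:                # truncated sequence
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining bytes must be plain continuation bytes 80-BF.
        if any(not 0x80 <= data[k] <= 0xBF for k in range(i + 2, i + length)):
            return False
        i += length
    return True
```

For example, the UTF-8 BOM `b"\xef\xbb\xbf"` is accepted, while a lone `b"\xfe"` or the overlong `b"\xc0\xaf"` is rejected.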

External links
