Percent-encoding
Percent-encoding is a transfer encoding in which certain "unsafe" bytes are, in most cases, replaced by a 3-byte escape sequence. The escape sequence is a percent sign (%) followed by two (usually uppercase) hex digits. For example, a space (byte 0x20) becomes %20. Sometimes, as a special case, a space character is allowed to be encoded as a single "+" character.
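The following Python sketch encodes and decodes a byte string. Which bytes count as "unsafe" depends on the context; as an assumption, this sketch escapes every byte outside the RFC 3986 "unreserved" set (ASCII letters, digits, "-", ".", "_", "~"). The names percent_encode and percent_decode are hypothetical and not taken from any particular library.

```python
import string

# Assumption for this sketch: the "safe" bytes are the RFC 3986 unreserved
# characters; every other byte is treated as unsafe and escaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
HEX_DIGITS = frozenset(string.hexdigits)


def percent_encode(data: bytes, space_as_plus: bool = False) -> str:
    """Escape each unsafe byte as "%" plus two uppercase hex digits."""
    out = []
    for b in data:
        if b in UNRESERVED:
            out.append(chr(b))
        elif b == 0x20 and space_as_plus:
            out.append("+")  # optional special case: space becomes "+"
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


def percent_decode(text: str, plus_as_space: bool = False) -> bytes:
    """Reverse percent_encode; hex digits may be upper- or lowercase."""
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c == "%":
            digits = text[i + 1 : i + 3]
            if len(digits) != 2 or not all(d in HEX_DIGITS for d in digits):
                raise ValueError(f"malformed escape at position {i}")
            out.append(int(digits, 16))
            i += 3
        elif c == "+" and plus_as_space:
            out.append(0x20)
            i += 1
        else:
            # Non-ASCII input raises UnicodeEncodeError (a ValueError).
            out.extend(c.encode("ascii"))
            i += 1
    return bytes(out)
```

With this sketch, percent_encode(b"a b/c") returns "a%20b%2Fc", and with space_as_plus=True it returns "a+b%2Fc". The decoder leaves "+" alone unless asked, because whether "+" stands for a space depends on the context in which the data was encoded.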
It is primarily used as part of URL encoding and Form URL encoding.
The term "percent-encoding" is somewhat ambiguous, and is often conflated with URL encoding.
Encoding text
Percent-encoding encodes byte-oriented data, and doesn't necessarily suggest a way to encode text. Nowadays, it's normal (but far from universal) for text to be encoded as UTF-8 before being percent-encoded.
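Because percent-encoding operates on bytes, the same text produces different escapes depending on which character encoding is applied first. This Python sketch makes the charset an explicit parameter; the helper names encode_text and decode_text are hypothetical, and treating bytes outside the RFC 3986 unreserved set as unsafe is an assumption. Decoding uses the standard library's urllib.parse.unquote_to_bytes.

```python
from urllib.parse import unquote_to_bytes

# Assumption: bytes outside the RFC 3986 unreserved set are escaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def encode_text(text: str, charset: str = "utf-8") -> str:
    """Convert text to bytes using charset, then percent-encode the bytes."""
    data = text.encode(charset)
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def decode_text(encoded: str, charset: str = "utf-8") -> str:
    """Undo encode_text; the caller must already know which charset was used."""
    return unquote_to_bytes(encoded).decode(charset)
```

For example, "café" becomes "caf%C3%A9" via UTF-8 but "caf%E9" via Latin-1. Nothing in the encoded string records which charset was used, so the decoder has to be told.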
A Microsoft extension of percent-encoding allowed a Unicode character (or UTF-16 code unit?) to be encoded in the form %uXXXX, where XXXX is four hex digits. This is not standard, and is not recommended.
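To show how this form differs from UTF-8-based percent-encoding, here is a Python sketch that writes and reads %uXXXX escapes, intended only for recognizing legacy data. Since it is unclear whether each escape names a Unicode character or a UTF-16 code unit, the sketch deliberately handles only Basic Multilingual Plane characters (excluding surrogates), where the two interpretations coincide. The helper names u_escape and decode_u_escapes are hypothetical.

```python
import re

# Legacy, non-standard form: "%u" followed by exactly four hex digits.
_U_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})")


def u_escape(ch: str) -> str:
    """Hypothetical helper: write one BMP character in the %uXXXX form."""
    cp = ord(ch)
    if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        # Outside the BMP (or for a surrogate), "character" and "UTF-16
        # code unit" would give different answers, so this sketch refuses.
        raise ValueError(f"U+{cp:04X} is not a BMP scalar value")
    return f"%u{cp:04X}"


def decode_u_escapes(text: str) -> str:
    """Hypothetical helper: replace each %uXXXX with the character it names."""

    def replace(m):
        cp = int(m.group(1), 16)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code unit %u{m.group(1)} not handled")
        return chr(cp)

    return _U_ESCAPE.sub(replace, text)
```

With this sketch, "é" becomes %u00E9, whereas percent-encoding its UTF-8 bytes gives %C3%A9.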