ASCII

From Just Solve the File Format Problem
Revision as of 17:49, 3 November 2012 by Dan Tobias (Talk | contribs)


The American Standard Code for Information Interchange (ASCII) is a character encoding designed for English-language information interchange. The first version was published in 1963, but it differed in a number of ways from the later version published in 1967, which received minor revisions in 1986 to produce what is now called US-ASCII when specifying character encodings. ASCII was intended to replace the many proprietary character sets used by various device manufacturers, and it largely succeeded, although IBM continued to use EBCDIC for a number of years.

Because ASCII included only the English alphabet, many so-called "extended ASCII" sets appeared. These placed different characters (accented letters, other alphabets, and special symbols) in positions 128 to 255, which became available when an eighth bit was added to the seven bits needed to encode the 128 ASCII characters. (Some systems, however, used the eighth bit as a parity bit or a flag of some sort, precluding such character set extensions.) Some writing systems, such as Chinese, Japanese, and Korean, were entirely unsuitable for ASCII-based character sets and adopted various multi-byte representations. Thus, there was once again a profusion of proprietary character encodings until Unicode brought some order to the chaos.
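The relationship between the seven ASCII bits and the eighth bit can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and even parity is an assumed convention, since systems that used the eighth bit for error checking or flags varied in what they did with it.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte fits in seven bits (values 0 to 127)."""
    return all(b < 0x80 for b in data)


def add_even_parity(code: int) -> int:
    """Set the eighth bit so that the byte has an even number of 1 bits.

    Even parity is an assumption made for this sketch; odd parity and
    other uses of the high bit also existed.
    """
    if not 0 <= code < 0x80:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, leaving the 7-bit code."""
    return bytes(b & 0x7F for b in data)
```

A byte string that fails `is_ascii` is using positions 128 to 255, so it needs to be interpreted with some specific extended or multi-byte encoding (or has its high bit used for another purpose) before it can be read reliably.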

Early personal computers didn't always implement ASCII consistently. The original version of the Apple II lacked lowercase letters, for instance, and displayed gibberish where those characters were found. A "lower case adaptor" chip could be installed to remedy this, and later computers in the Apple II series (starting with the IIe) came with lowercase support built in. Meanwhile, the Commodore PET, VIC-20, 64, and 128 used an unusual variation sometimes called "PET ASCII" (PETSCII), which could be switched between two modes. One mode had only uppercase letters, with graphical characters in the codes that usually contain lowercase. The other introduced lowercase, but in an odd manner: lowercase letters replaced the character codes normally used for uppercase, and a new set of uppercase letters was added at a completely different position in the set (replacing some graphic characters, but not the ones in the spots usually used by lowercase). This makes the conversion of text files created on or for Commodore computers a challenge.
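As a rough illustration of why Commodore text needs conversion, the sketch below maps bytes from the lowercase-capable mode to ordinary ASCII text. The byte ranges are assumptions not stated above (lowercase at 0x41 to 0x5A, relocated uppercase at 0xC1 to 0xDA, digits and most punctuation shared with ASCII), and the function name is hypothetical. Anything outside those ranges is replaced with "?" rather than guessed at.

```python
def petscii_shifted_to_ascii(data: bytes) -> str:
    """Convert text from the Commodore lowercase/uppercase mode to ASCII.

    Assumed layout (not taken from the article): 0x41-0x5A are lowercase
    letters, 0xC1-0xDA are uppercase letters, 0x20-0x3F match ASCII, and
    0x0D (carriage return) ends a line.
    """
    out = []
    for b in data:
        if 0x41 <= b <= 0x5A:
            out.append(chr(b + 0x20))   # ASCII uppercase slot holds lowercase
        elif 0xC1 <= b <= 0xDA:
            out.append(chr(b - 0x80))   # relocated uppercase letters
        elif 0x20 <= b <= 0x3F:
            out.append(chr(b))          # space, digits, common punctuation
        elif b == 0x0D:
            out.append("\n")            # CR is the line break on Commodore
        else:
            out.append("?")             # graphics and control codes: unmapped
    return "".join(out)
```

A real converter would also need to know which of the two modes a file was written for, since the same bytes display differently in each.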

Control characters

The first 32 characters of the ASCII set (codes 0 to 31) are control characters, which different systems and programs put to various special uses; some platforms also give them a graphic rendition.
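The caret codes such as ^@ and ^G listed in the table below follow a simple rule: each control character is written as a caret plus the printable character 64 positions higher, which is also the key pressed together with Control to type it. A minimal sketch of that rule, using hypothetical function names:

```python
def caret_notation(code: int) -> str:
    """Return the caret form (e.g. '^G') of a control code in 0..31."""
    if not 0 <= code < 0x20:
        raise ValueError("not one of the first 32 control codes")
    return "^" + chr(code + 0x40)


def control_code(key: str) -> int:
    """Return the code produced by holding Control and pressing `key`.

    Keeping only the low five bits maps '@', 'A'..'Z' and the few
    symbols after 'Z' onto codes 0..31.
    """
    return ord(key.upper()) & 0x1F
```

For example, `caret_notation(7)` gives `^G` (BEL), and `control_code("c")` gives 3 (ETX).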

Hex | Dec | Codes | Acronym | Name | Description and uses
00 | 0 | ^@, \0 | NUL | Null character | Marks unused space or padding (e.g., to intentionally slow down terminals or to leave space for added data in memory or storage media). Used in C-based programming languages to mark end of string.
01 | 1 | ^A | SOH | Start of Heading | Marks the beginning of a header in a message or data structure.
02 | 2 | ^B | STX | Start of Text | Marks the beginning of the body text of a message, and/or the end of the header.
03 | 3 | ^C | ETX | End of Text | Marks the end of the body text. Also used as "break character" (Control-C) to terminate a program or process.
04 | 4 | ^D | EOT | End of Transmission | In Unix-style operating systems, signals end-of-file and is used to log out of a terminal. On Apple II, this character signalled that what followed was a DOS command when it was "printed" to standard output.
05 | 5 | ^E | ENQ | Enquiry | Used in transmission protocols to request acknowledgement from the other end to make sure connection is still active.
06 | 6 | ^F | ACK | Acknowledge | Sent as response to ENQ message, or used to positively acknowledge receipt of data or messages (as opposed to NAK).
07 | 7 | ^G, \a | BEL | Bell | On some systems, this causes a bell, buzzer, or beep to sound, or flashes inverse video to alert a system operator. The Apple II had "BELL" on the front side of the "G" key to remind users that Ctrl-G caused this sound effect.
08 | 8 | ^H, \b | BS | Backspace | Moves back one space. Usually deletes last character (e.g., from input string), but on some old terminals it just moved backward without deleting and allowed "overstrike" effects overlaying multiple characters.
09 | 9 | ^I, \t | HT | Horizontal Tab | The typewriter "tab key", usually moving to the next tab stop as defined in the particular software being used.
0A | 10 | ^J, \n | LF | Line Feed | Move down one line. In Unix-style operating systems, it also moves to the beginning of the next line so that it can be used as a line break (newline) character, while in some other systems and terminals it just moves down without moving to the left, requiring the "CR LF" sequence to break a line.
0B | 11 | ^K, \v | VT | Vertical Tab | Moves to vertical tab stops; not used nearly as often as the more-common horizontal tab.
0C | 12 | ^L, \f | FF | Form Feed | Causes page to eject in printers, and may clear the screen in some terminal emulators. Sometimes used as a logical division of sections of a document.
0D | 13 | ^M, \r | CR | Carriage Return | Moves to the beginning of the line. In some systems (e.g., Apple II and Commodore 64), also moves to the next line so that it can be used as a line break character, while in other systems stays on the same line so that it must be accompanied by a LF character to break a line. Thus the three different line-break conventions (LF, CR, and CR+LF) which bedevil modern users of text files arose.
0E | 14 | ^N | SO | Shift Out | Switch to alternate character set (reversed by SI).
0F | 15 | ^O | SI | Shift In | Return to normal character set (reverses operation of SO).
10 | 16 | ^P | DLE | Data Link Escape | Signals the start of a sequence of raw data as opposed to normal printable or control characters.
11 | 17 | ^Q | DC1 | Device Control 1 | One of four device-control codes intended to be system-specific. This one (CTRL-Q, also known as XON) is often used to resume operations of a process, device, or output stream that has been paused with CTRL-S (XOFF).
12 | 18 | ^R | DC2 | Device Control 2 | Another device-control code; not used as much as DC1 and DC3.
13 | 19 | ^S | DC3 | Device Control 3 | The third of the device-control codes; this one (CTRL-S, also known as XOFF) is often used to pause processes, devices, or output streams, with CTRL-Q (XON) resuming them (though in some cases, any keypress causes output to resume).
14 | 20 | ^T | DC4 | Device Control 4 | The fourth device-control code; not used as much as DC1 or DC3.
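The three line-break conventions described in the LF and CR rows above (LF, CR, and CR+LF) are a common practical problem when moving text files between systems. The following is a minimal sketch of how one might count and normalize them; the function names are hypothetical, and converting everything to LF is just one reasonable choice of target.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs, and lone LFs in raw file contents."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,   # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,   # LFs not preceded by CR
    }


def normalize_newlines(text: str) -> str:
    """Convert CR+LF and lone CR line breaks to a single LF.

    CR+LF must be replaced first, otherwise each pair would turn
    into two line breaks.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Counting first can be useful because a file with mixed conventions often indicates that it was edited on more than one platform.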

Specifications

External links
