Plain text

From Just Solve the File Format Problem
Revision as of 12:45, 17 March 2017 by Jsummers (Talk | contribs)

Jump to: navigation, search
File Format
Name Plain text
Ontology
Extension(s) .txt, .doc, (none)
MIME Type(s) text/plain
PRONOM x-fmt/111

Plain text files (also known by the extension TXT) consist of characters encoded sequentially in some particular character encoding. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see Text-based data for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).

Traditionally, ASCII was used much of the time for maximum interoperability, though many platform-specific character sets were also in use. For non-English text an encoding supporting a broader character repertoire is needed, often UTF-8 nowadays. Note that if the file consists only of 7-bit ASCII characters, the bytes of the file are identical in us-ascii, ISO-8859-1, UTF-8, and a number of other encodings, so such a file can be identified as any of these depending on what is most convenient for a particular application. It is only when characters out of this repertoire are used that encoding-specific details need be considered. Some formats, such as HTML and XML, provide some sort of escape sequences (such as ampersands used for character references and entities) allowing special characters to be referenced within the document while leaving the document itself entirely ASCII.

Another point of contention or incompatibility in text-file formats is the conventions for line and paragraph breaks. Depending on what system the file was created on or intended to be viewed on, line breaks may be done as Carriage Return (ASCII 0D hex) and Linefeed (ASCII 0A hex) together (usually in that order, though in rare cases in the opposite order), or just one of those characters alone. Some text viewing or editing programs that are not cross-platform-friendly will really mess up badly in attempting to view/edit files using a different line break convention than the program expects, so you might see lines overwriting one another instead of going to the next line, or peculiar control characters show up within the file, or other strangeness. Files with linefeed alone are often referred to as "UNIX mode" (and the linefeed, in this context, referred to as NL for Newline), while files with carriage return alone are referred to as "Mac mode" (though it's also common in other early platforms such as the Apple II and Commodore 64, and no longer used in current Macs), while the CR+LF format is called "DOS" or "PC" or "Windows" mode (though it was used in various mainframes and network protocols as well).

Files may also use hard line breaks to keep line length within a fixed number of columns (usually 80, but other values such as 40 or 65 are used sometimes), or just have line breaks at the end of paragraphs and expect systems to word-wrap long lines; encountering files of a different convention than you expect may result in lines running way off to the right of the screen and requiring horizontal scrolling, or else short, choppy lines. Many text editors have a "paragraph reformat" command to bring paragraphs into compliance with your desired conventions.

Most operating systems include a simple text editor (e.g., Windows Notepad) which can open text files, but many other text editors exist (and computer people sometimes have "holy wars" over which one is best). Some of the common text editors are EMACS, vi, and UltraEdit. In the earlier days of computing, there was less distinction between text editors and word processors than there is now, as word processors generally used a format that was mostly plain text and could even be completely plain text if you refrained from using special embedded commands and features. However, modern word processors such as Microsoft Word default to using program-specific save formats that have little resemblance to plain text, unless you go out of your way to "Save As" .txt. A common "newbie error" is to attempt to create or edit plain text files in such a program, leaving the files as proprietarily-formatted in a way that messes up the operation of other programs that expect to find plain text.

Creating artwork using text characters is known as ASCII Art, or other variants such as ANSI Art if special control or escape codes are used in addition to the plain text characters.

See also

Software

Links and References

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox