WordStar
Dan Tobias (Talk | contribs) |
Revision as of 06:54, 14 November 2012
WordStar was a word processor originally released in 1978. It was extremely popular in the early 1980s before losing ground to other word processors (particularly WordPerfect). Many professional writers used it in that era and, given their notorious conservatism about writing tools, some still use it to this day. As a result, many original manuscripts are stored in this format.
The original version was for the CP/M operating system, but it was later ported to a number of other systems; the PC/MS-DOS version became the most popular one. Its particular set of control keys for accessing various functions (often requiring multiple keypresses) was widely imitated in other programs at the time, creating a "de facto standard" for editing keys that saw even wider use than WordStar itself.
As with many early word processors, its files were basically plain text, with optional special functions causing control characters to be inserted. Files could be created or edited with any extension, but .ws (sometimes with an appended number to mark versions, like .ws3) was commonly used.
One quirk present in versions prior to 5.0 was the use of the high bit of a byte to mark the last letter of a word. This limited the character set to 7-bit ASCII: every character that was not the last letter of a word had a clear high bit (and thus a value from 00-7F hex matching its ASCII value), while a last letter had the high bit set (giving it a value from 80-FF hex that actually represented the corresponding character from 00-7F). This interfered with internationalization, since it prevented the use of extended character sets beyond ASCII. It also meant that the ends of words in WordStar files looked like gibberish in other programs, which interpreted those bytes via some 8-bit encoding. Eventually this "feature" was dropped, but even in late versions extended characters were marked in saved files by control characters both preceding and following them, so that an 8-bit character took three bytes to store. This was necessary to preserve file compatibility: old WordStar files with high bits set at the end of words still needed to load correctly, so the program couldn't interpret a high-bit byte as an extended character without a special marker.
Extended characters (when they appeared in the special escaped sequence, consisting of character 1B hex, followed by the special character, followed by character 1C hex) were generally taken from one of the MS-DOS encodings, at least if the file was created in a DOS version of WordStar.
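The two conventions described above interact: blindly clearing every high bit would also mangle escaped extended characters. The following Python sketch shows one way a reader might handle both. The function name `decode_wordstar` and the default code page `cp437` are assumptions made for illustration; the actual MS-DOS encoding of a given file may differ.

```python
ESC = 0x1B  # marks that the following byte is an extended character
FS = 0x1C   # marks that the previous byte was an extended character


def decode_wordstar(data: bytes, codepage: str = "cp437") -> str:
    """Decode WordStar bytes to text (illustrative sketch, not a full parser)."""
    out = []
    i = 0
    while i < len(data):
        # An escaped extended character is exactly three bytes: 1B, char, 1C.
        if data[i] == ESC and i + 2 < len(data) and data[i + 2] == FS:
            # Assumption: the middle byte belongs to an MS-DOS code page.
            out.append(bytes([data[i + 1]]).decode(codepage))
            i += 3
        else:
            # Otherwise a set high bit only marks the end of a word; clear it.
            out.append(chr(data[i] & 0x7F))
            i += 1
    return "".join(out)


print(decode_wordstar(b"Hell\xef worl\xe4"))  # Hello world
```

Control characters other than the 1B/1C pair are passed through unchanged here; a real converter would probably want to interpret or drop them.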
There was also a WordStar 2000 program, with its own file format that was not compatible with other WordStar versions. Despite its name, it was released in the 1980s, nowhere near the year 2000. It was intended to be a new-generation word processor to compete with the newer programs that were starting to catch on at the time, but it didn't succeed and actually went out of use earlier than the original WordStar, which continued to be updated through the 1990s.
Converting WordStar files with high bits set
Some other programs have special "WordStar import" features which handle high-bit characters, but if you need to deal with such files without a conversion utility, it's helpful to change high-bit characters to their corresponding 7-bit characters in order to have standard ASCII. This can be done simply in most programming or scripting languages; here's a Perl example, for instance.
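    use strict;
    use warnings;

    # Open both files in raw (binary) mode so no byte is translated on the way through.
    open my $in,  '<:raw', 'in.ws'   or die "Cannot open in.ws: $!";
    open my $out, '>:raw', 'out.txt' or die "Cannot open out.txt: $!";
    while (<$in>) {
        # Map bytes 80-FF hex back to 00-7F by clearing the high bit.
        tr/\200-\377/\000-\177/;
        print {$out} $_;
    }
    close $in;
    close $out or die "Cannot close out.txt: $!";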
Control characters
These are the control characters as stored in WordStar documents, and their meanings. Most of them are program-specific and do not correspond to the standard ASCII control meanings, though a few standard meanings are preserved. The toggle options were used at the start and end of blocks of text intended to be formatted in a particular way (e.g., bold).
Hex | Dec | ASCII Char | Ctrl Key | WordStar Key | WordStar meaning |
---|---|---|---|---|---|
00 | 0 | NUL | ^@ | Control-PZ | In some versions right-aligns text; in others fixes print head to absolute position of character in line |
01 | 1 | SOH | ^A | Control-PA | Toggles alternate font |
02 | 2 | STX | ^B | Control-PB | Toggles Bold mode |
03 | 3 | ETX | ^C | Control-PC | Pause print for user response |
04 | 4 | EOT | ^D | Control-PD | Toggles double-strike mode |
05 | 5 | ENQ | ^E | Control-PE | Custom print control |
06 | 6 | ACK | ^F | Control-PF | Phantom space |
07 | 7 | BEL | ^G | Control-PG | Phantom rubout |
08 | 8 | BS | ^H | Control-PH | Overprint previous character (backspace) |
09 | 9 | HT | ^I | Control-PI | Tab |
0A | 10 | LF | ^J | Control-PJ | Linefeed: follows Carriage Return for line break. (Enter/Return inserts two-character sequence ^M^J) |
0B | 11 | VT | ^K | Control-PK | In some versions, centers text; in others marks text to be indexed (placed both before and after the text sequence) |
0C | 12 | FF | ^L | Control-PL | Form feed (page break) |
0D | 13 | CR | ^M | Control-PM | Carriage Return: precedes Linefeed for line break. (Enter/Return inserts two-character sequence ^M^J) |
0E | 14 | SO | ^N | Control-PN | Return to normal character width |
0F | 15 | SI | ^O | Control-PO | Non-breaking space |
10 | 16 | DLE | ^P | Control-PP | Unused |
11 | 17 | DC1 | ^Q | Control-PQ | Custom print control |
12 | 18 | DC2 | ^R | Control-PR | Custom print control |
13 | 19 | DC3 | ^S | Control-PS | Toggles underline mode |
14 | 20 | DC4 | ^T | Control-PT | Toggles superscript mode |
15 | 21 | NAK | ^U | Control-PU | Unused |
16 | 22 | SYN | ^V | Control-PV | Toggles subscript mode |
17 | 23 | ETB | ^W | Control-PW | Custom print control |
18 | 24 | CAN | ^X | Control-PX | Toggles overstrike mode |
19 | 25 | EM | ^Y | Control-PY | Toggles italic mode |
1A | 26 | SUB | ^Z | | End-of-file character |
1B | 27 | ESC | ^[ | | Marks that following character is extended character |
1C | 28 | FS | ^\ | | Marks that previous character is extended character (you need both 1B and 1C to delimit extended characters) |
1D | 29 | GS | ^] | | Symmetrical sequence start/stop character |
1E | 30 | RS | ^^ | | Inactive Soft Hyphen |
1F | 31 | US | ^_ | | Active Soft Hyphen |
8D | 141 | | | | Soft Carriage Return (inserted, followed by normal linefeed 0A, to mark soft line break at word-wrap) |
A0 | 160 | | | | Soft Space |
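
Because the toggle characters in the table mark both the start and the end of a formatted block, a converter has to track whether each one is currently "on". The Python sketch below illustrates the idea for a handful of toggles. The function name `toggles_to_tags` and the HTML-like tag names are this example's own choices, not anything WordStar defines, and the other control characters and escaped extended characters are deliberately left unhandled.

```python
# Toggle control bytes from the table, mapped to illustrative tag names.
# The tag names are hypothetical; WordStar itself knows nothing about them.
TOGGLES = {0x02: "b", 0x13: "u", 0x19: "i", 0x14: "sup", 0x16: "sub"}


def toggles_to_tags(data: bytes) -> str:
    """Replace toggle control bytes with opening and closing tags."""
    active = set()  # tags for toggles that are currently switched on
    out = []
    for byte in data:
        tag = TOGGLES.get(byte)
        if tag is None:
            # Not a toggle: clear the end-of-word high bit and keep the byte.
            out.append(chr(byte & 0x7F))
        elif tag in active:
            active.remove(tag)
            out.append(f"</{tag}>")
        else:
            active.add(tag)
            out.append(f"<{tag}>")
    return "".join(out)


print(toggles_to_tags(b"a \x02bold\x02 word"))  # a <b>bold</b> word
```

A toggle left open at the end of the input is simply never closed in this sketch; how to treat that case is a design decision for a real converter.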
Dot commands
These commands were intended to be on a line by themselves, and started with a dot (.). This meant that regular text lines couldn't start with dots. Many other early word processors emulated WordStar in their use of "dot lines" for commands, though some of them required a control character to precede the dot in order to allow dots at the start of normal text lines. The specific commands varied a lot between programs, however.
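Since a dot command occupies a whole line and is recognized by its leading dot, separating commands from body text can be sketched in a few lines of Python. The function name `split_dot_lines` is hypothetical, the sketch assumes the text has already been converted to plain 7-bit ASCII, and `.xx 10` in the usage line is a placeholder rather than a real WordStar command.

```python
def split_dot_lines(text: str):
    """Split text into dot commands, comment lines, and ordinary body lines."""
    commands, comments, body = [], [], []
    for line in text.splitlines():
        if line.startswith(".."):
            comments.append(line)   # ".." comment line: not printed
        elif line.startswith("."):
            commands.append(line)   # any other dot line is treated as a command
        else:
            body.append(line)
    return commands, comments, body


commands, comments, body = split_dot_lines(".. draft 2\n.xx 10\nDear reader,")
print(body)  # ['Dear reader,']
```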
- .. Comment line (followed by comment text; not printed)