GW-BASIC tokenized file
Latest revision as of 01:41, 6 May 2024
GW-BASIC tokenized files stored programs in the version of the BASIC programming language used on IBM PC compatibles in the days when interpreted BASIC was regularly included on personal computers as shipped from the factory. The original IBM PC had versions called BASIC and BASICA, the latter being an "advanced" BASIC with a few more features. Part of the interpreter was in ROM, and part was loaded from disk. Other manufacturers' PC compatibles (or "clones") lacked the ROM BASIC and instead used a compatible BASIC from Microsoft, which went by a few manufacturer-specific names but was generically known as GW-BASIC. Claims vary about what "GW" stands for: either the initials of Greg Whitten, a Microsoft employee involved in adapting it from Bill Gates' original CP/M BASIC, or possibly "Gee Whiz".
Like most BASICs of its era, BASIC/BASICA/GW-BASIC saved programs in a tokenized format rather than as plain-text source code. Printable ASCII characters (space through tilde) generally stood for themselves, except when part of a multi-byte sequence; other bytes had different meanings. The "high-bit" bytes 128 to 255 (80 to FF hex) stood for the various BASIC keywords and operators, some as single bytes and others as part of two-byte sequences. Some control characters had special meanings, including marking the start of a binary-encoded numeric constant. Each program line began with four header bytes: two bytes containing the offset of the next line as a little-endian integer, then two bytes containing the line number as a little-endian integer. A null (0) byte marked the end of the line.
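As a rough illustration of the four-byte line header described above, the following Python sketch unpacks the two little-endian integers. The function name `read_line_header` and its return shape are hypothetical, chosen only for this example; it does not attempt to decode the rest of the line.

```python
import struct

def read_line_header(data: bytes, pos: int = 0):
    """Unpack one line header starting at `pos` (hypothetical helper).

    Returns (next_line_offset, line_number, position_after_header).
    Both fields are unsigned 16-bit little-endian integers.
    """
    next_offset, line_number = struct.unpack_from("<HH", data, pos)
    return next_offset, line_number, pos + 4

# Example header bytes: next-line offset 0x120E, line number 10.
header = bytes([0x0E, 0x12, 0x0A, 0x00])
print(read_line_header(header))
```

Scanning forward for the terminating null byte is deliberately left out: a binary-encoded numeric constant inside the line may itself contain zero bytes, so a naive search for the first null is not reliable.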
Files saved to disk began with a single byte indicating whether the program was protected: FE hex if protected, FF hex if not. Files saved to cassette tape omitted this byte, because byte 9 of the cassette header held the protection status. The file was terminated with Ctrl-Z (1A hex).
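A minimal sketch of how a reader might check the leading byte of a disk file and strip the trailing Ctrl-Z, assuming the whole file is already in memory. The name `split_disk_file` is hypothetical, and the sketch applies only to disk files, since cassette files omit the leading byte.

```python
def split_disk_file(raw: bytes):
    """Return (protected, body) for a disk-saved file (hypothetical helper).

    The first byte is FF hex for an unprotected program and FE hex for a
    protected one. A trailing Ctrl-Z (1A hex) is removed if present.
    """
    if not raw:
        raise ValueError("empty file")
    marker = raw[0]
    if marker == 0xFF:
        protected = False
    elif marker == 0xFE:
        protected = True
    else:
        raise ValueError(f"unexpected first byte {marker:#04x}")
    body = raw[1:]
    if body.endswith(b"\x1a"):
        body = body[:-1]
    return protected, body

print(split_disk_file(b"\xff\x01\x02\x1a"))
```

The body is returned unchanged; how a protected program's bytes should be interpreted is outside the scope of this sketch.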
Tokens
Blank entries are unused, or at least their meaning is unknown.
As noted, some tokens are stored with extra bytes before or after them. These bytes represent other symbols and are suppressed when the program is listed, so they are "invisible". They are presumably there to make parsing by the interpreter easier.
Hex | Dec | Token meaning |
---|---|---|
80 | 128 | |
81 | 129 | END |
82 | 130 | FOR |
83 | 131 | NEXT |
84 | 132 | DATA |
85 | 133 | INPUT |
86 | 134 | DIM |
87 | 135 | READ |
88 | 136 | LET |
89 | 137 | GOTO |
8A | 138 | RUN |
8B | 139 | IF |
8C | 140 | RESTORE |
8D | 141 | GOSUB |
8E | 142 | RETURN |
8F | 143 | REM |
90 | 144 | STOP |
91 | 145 | |
92 | 146 | CLEAR |
93 | 147 | LIST |
94 | 148 | NEW |
95 | 149 | ON |
96 | 150 | WAIT |
97 | 151 | DEF |
98 | 152 | POKE |
99 | 153 | CONT |
9A | 154 | |
9B | 155 | |
9C | 156 | OUT |
9D | 157 | LPRINT |
9E | 158 | LLIST |
9F | 159 | |
A0 | 160 | WIDTH |
A1 | 161 | ELSE (stored with invisible colon, 3A, before it) |
A2 | 162 | TRON |
A3 | 163 | TROFF |
A4 | 164 | SWAP |
A5 | 165 | ERASE |
A6 | 166 | EDIT |
A7 | 167 | ERROR |
A8 | 168 | RESUME |
A9 | 169 | DELETE |
AA | 170 | AUTO |
AB | 171 | RENUM |
AC | 172 | DEFSTR |
AD | 173 | DEFINT |
AE | 174 | DEFSNG |
AF | 175 | DEFDBL |
B0 | 176 | LINE |
B1 | 177 | WHILE (stored with invisible plus, E9, after it) |
B2 | 178 | WEND |
B3 | 179 | CALL |
B4 | 180 | |
B5 | 181 | |
B6 | 182 | |
B7 | 183 | WRITE |
B8 | 184 | OPTION |
B9 | 185 | RANDOMIZE |
BA | 186 | OPEN |
BB | 187 | CLOSE |
BC | 188 | LOAD |
BD | 189 | MERGE |
BE | 190 | SAVE |
BF | 191 | COLOR |
C0 | 192 | CLS |
C1 | 193 | MOTOR |
C2 | 194 | BSAVE |
C3 | 195 | BLOAD |
C4 | 196 | SOUND |
C5 | 197 | BEEP |
C6 | 198 | PSET |
C7 | 199 | PRESET |
C8 | 200 | SCREEN |
C9 | 201 | KEY |
CA | 202 | LOCATE |
CB | 203 | |
CC | 204 | TO |
CD | 205 | THEN |
CE | 206 | TAB( |
CF | 207 | STEP |
D0 | 208 | USR |
D1 | 209 | FN |
D2 | 210 | SPC( |
D3 | 211 | NOT |
D4 | 212 | ERL |
D5 | 213 | ERR |
D6 | 214 | STRING$ |
D7 | 215 | USING |
D8 | 216 | INSTR |
D9 | 217 | ' (stored with invisible ":REM", 3A 8F, before it) |
DA | 218 | VARPTR |
DB | 219 | CSRLIN |
DC | 220 | POINT |
DD | 221 | OFF |
DE | 222 | INKEY$ |
DF | 223 | |
E0 | 224 | |
E1 | 225 | |
E2 | 226 | |
E3 | 227 | |
E4 | 228 | |
E5 | 229 | |
E6 | 230 | > |
E7 | 231 | = |
E8 | 232 | < |
E9 | 233 | + |
EA | 234 | - |
EB | 235 | * |
EC | 236 | / |
ED | 237 | ^ |
EE | 238 | AND |
EF | 239 | OR |
F0 | 240 | XOR |
F1 | 241 | EQV |
F2 | 242 | IMP |
F3 | 243 | MOD |
F4 | 244 | \ |
F5 | 245 | |
F6 | 246 | |
F7 | 247 | |
F8 | 248 | |
F9 | 249 | |
FA | 250 | |
FB | 251 | |
FC | 252 | |
FD | 253 | (signals that next byte represents token from List 2) |
FE | 254 | (signals that next byte represents token from List 3) |
FF | 255 | (signals that next byte represents token from List 4) |
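The "invisible" bytes noted in the table (the colon before ELSE, the ":REM" before the apostrophe, and the plus after WHILE) can be handled by matching the longer byte sequences first. The sketch below is an assumption-laden illustration, not a full detokenizer: `TOKENS` is a small excerpt of the table above, `detokenize_simple` is a hypothetical name, and string literals, comment text, numeric constants and the FD/FE/FF prefixes are not handled.

```python
# Excerpt of the single-byte token table above (not the complete list).
TOKENS = {
    0x89: "GOTO", 0x8B: "IF", 0x8F: "REM", 0xA1: "ELSE",
    0xB1: "WHILE", 0xB2: "WEND", 0xCD: "THEN", 0xD9: "'",
    0xE6: ">", 0xE7: "=", 0xE8: "<", 0xE9: "+",
}

def detokenize_simple(body: bytes) -> str:
    """Expand tokens in one line body, hiding the invisible bytes."""
    out = []
    i = 0
    while i < len(body):
        b = body[i]
        if body[i:i + 3] == b"\x3a\x8f\xd9":   # ":REM" + "'" lists as "'"
            out.append("'")
            i += 3
        elif body[i:i + 2] == b"\x3a\xa1":      # ":" + ELSE lists as ELSE
            out.append("ELSE")
            i += 2
        elif body[i:i + 2] == b"\xb1\xe9":      # WHILE + "+" lists as WHILE
            out.append("WHILE")
            i += 2
        elif b in TOKENS:
            out.append(TOKENS[b])
            i += 1
        elif 0x20 <= b <= 0x7E:                 # printable ASCII stands for itself
            out.append(chr(b))
            i += 1
        else:
            out.append(f"<{b:02X}>")            # placeholder for anything unhandled
            i += 1
    return "".join(out)

print(detokenize_simple(b"\x8b A\xe7B \xcd X \x3a\xa1 Y"))
```

Checking the multi-byte patterns before the single-byte lookup keeps an ordinary colon (3A) or plus (E9) visible when it is not attached to one of these keywords.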
List 2: 2nd-byte tokens following FD
These are preceded by an FD (hex) byte.
Hex | Dec | Token meaning |
---|---|---|
81 | 129 | CVI |
82 | 130 | CVS |
83 | 131 | CVD |
84 | 132 | MKI$ |
85 | 133 | MKS$ |
86 | 134 | MKD$ |
87 | 135 | |
88 | 136 | |
89 | 137 | |
8A | 138 | |
8B | 139 | EXTERR |
List 3: 2nd-byte tokens following FE
These are preceded by an FE (hex) byte.
Hex | Dec | Token meaning |
---|---|---|
81 | 129 | FILES |
82 | 130 | FIELD |
83 | 131 | SYSTEM |
84 | 132 | NAME |
85 | 133 | LSET |
86 | 134 | RSET |
87 | 135 | KILL |
88 | 136 | PUT |
89 | 137 | GET |
8A | 138 | RESET |
8B | 139 | COMMON |
8C | 140 | CHAIN |
8D | 141 | DATE$ |
8E | 142 | TIME$ |
8F | 143 | PAINT |
90 | 144 | COM |
91 | 145 | CIRCLE |
92 | 146 | DRAW |
93 | 147 | PLAY |
94 | 148 | TIMER |
95 | 149 | ERDEV |
96 | 150 | IOCTL |
97 | 151 | CHDIR |
98 | 152 | MKDIR |
99 | 153 | RMDIR |
9A | 154 | SHELL |
9B | 155 | ENVIRON |
9C | 156 | VIEW |
9D | 157 | WINDOW |
9E | 158 | PMAP |
9F | 159 | PALETTE |
A0 | 160 | LCOPY |
A1 | 161 | CALLS |
A2 | 162 | |
A3 | 163 | |
A4 | 164 | NOISE (PCjr), DEBUG (Sperry PC) |
A5 | 165 | PCOPY (PCjr, EGA system) |
A6 | 166 | TERM (PCjr) |
A7 | 167 | LOCK |
A8 | 168 | UNLOCK |
List 4: 2nd-byte tokens following FF
These are preceded by an FF (hex) byte.
Hex | Dec | Token meaning |
---|---|---|
81 | 129 | LEFT$ |
82 | 130 | RIGHT$ |
83 | 131 | MID$ |
84 | 132 | SGN |
85 | 133 | INT |
86 | 134 | ABS |
87 | 135 | SQR |
88 | 136 | RND |
89 | 137 | SIN |
8A | 138 | LOG |
8B | 139 | EXP |
8C | 140 | COS |
8D | 141 | TAN |
8E | 142 | ATN |
8F | 143 | FRE |
90 | 144 | INP |
91 | 145 | POS |
92 | 146 | LEN |
93 | 147 | STR$ |
94 | 148 | VAL |
95 | 149 | ASC |
96 | 150 | CHR$ |
97 | 151 | PEEK |
98 | 152 | SPACE$ |
99 | 153 | OCT$ |
9A | 154 | HEX$ |
9B | 155 | LPOS |
9C | 156 | CINT |
9D | 157 | CSNG |
9E | 158 | CDBL |
9F | 159 | FIX |
A0 | 160 | PEN |
A1 | 161 | STICK |
A2 | 162 | STRIG |
A3 | 163 | EOF |
A4 | 164 | LOC |
A5 | 165 | LOF |
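The three prefix bytes can be treated as selectors for a second lookup table. The following sketch shows one possible arrangement, using only a handful of entries from Lists 2 to 4; the names `PREFIXED` and `lookup_prefixed` are hypothetical.

```python
# Excerpt of Lists 2, 3 and 4, keyed by prefix byte (not the complete lists).
PREFIXED = {
    0xFD: {0x81: "CVI", 0x82: "CVS", 0x8B: "EXTERR"},
    0xFE: {0x81: "FILES", 0x8F: "PAINT", 0x94: "TIMER"},
    0xFF: {0x81: "LEFT$", 0x92: "LEN", 0x96: "CHR$"},
}

def lookup_prefixed(data: bytes, pos: int = 0):
    """Return (keyword_or_None, bytes_consumed) for the byte at `pos`.

    If the byte is not FD, FE or FF, or no second byte follows,
    nothing is decoded and one byte is consumed.
    """
    table = PREFIXED.get(data[pos])
    if table is None or pos + 1 >= len(data):
        return None, 1
    return table.get(data[pos + 1]), 2

print(lookup_prefixed(b"\xff\x92"))
```

Returning the number of bytes consumed lets a caller advance past both bytes even when the second byte is one of the blank entries.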
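Format documentation
- GW-BASIC tokenised program format (http://www.chebucto.ns.ca/~af380/GW-BASIC-tokens.html)
Sample files
- Many files on the Between Heaven and Hell Version II CD (http://cd.textfiles.com/bthevhell/), such as /200/104/*.bas (http://cd.textfiles.com/bthevhell/200/104/)
- dexvert samples: document/gwBasic
Software
- Bascat: decode GW-BASIC tokenized files (https://github.com/rwtodd/bascat)
Other links and references
- Wikipedia article: GW-BASIC
- Source code to GW-BASIC from 1983 (https://github.com/historicalsource/GW-BASIC)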