GW-BASIC tokenized file

From Just Solve the File Format Problem
File Format
Name GW-BASIC tokenized file
Ontology
Extension(s) .bas
Released 1981

GW-BASIC tokenized files stored programs written in the dialect of BASIC used on IBM PC compatibles in the days when an interpreted BASIC was routinely included on personal computers as shipped from the factory. The original IBM PC had versions of BASIC called BASIC and BASICA, the latter an "advanced" BASIC with a few more features. Part of the interpreter was in ROM, and part was loaded from disk. Other manufacturers' PC compatibles (or "clones") lacked the ROM BASIC and instead used a compatible BASIC from Microsoft. It went by a few manufacturer-specific names but was generically known as GW-BASIC. Accounts differ on what "GW" stands for: either the initials of Greg Whitten, a Microsoft employee involved in adapting it from Bill Gates' original CP/M BASIC, or possibly "Gee Whiz".

Like most BASICs of its era, BASIC/BASICA/GW-BASIC saved its programs in a tokenized format rather than as plain-text source code. Printable ASCII characters (space through tilde) generally stood for themselves (except when part of a multi-byte sequence), but other bytes had different meanings. The "high-bit" bytes #128-#255 stood for the various BASIC keywords and operators (some as single bytes, others as part of two-byte sequences), while some control characters had special meanings, such as marking the start of a binary-encoded numeric constant. A null (#0) byte marked the end of a program line, and each line began with a four-byte header: two bytes containing the offset of the next line as a little-endian integer, followed by two bytes containing the line number as a little-endian integer.
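The line layout described above can be walked with a short Python sketch. The helper name `split_lines` is hypothetical, and the sketch assumes that `data` begins at the first line header and that a zero next-line offset marks the end of the program. It also naively takes the first 0x00 after the header as the line terminator, so it can mis-split a line whose binary-encoded numeric constant happens to contain a zero byte; a full parser would have to skip over those constants.

```python
import struct
from typing import List, Tuple


def split_lines(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a tokenized program into (line_number, token_bytes) pairs.

    Assumes `data` begins at the first line header. Naively treats the first
    0x00 after the header as the line terminator (see caveat above).
    """
    lines = []
    pos = 0
    while pos + 4 <= len(data):
        # 4-byte header: next-line offset, then line number (both little-endian).
        next_offset, line_no = struct.unpack_from("<HH", data, pos)
        if next_offset == 0:
            break  # assumed end-of-program marker
        end = data.find(b"\x00", pos + 4)
        if end == -1:
            break  # truncated line: no terminator found
        lines.append((line_no, data[pos + 4:end]))
        pos = end + 1
    return lines


# 10 PRINT "HI" / 20 END, with made-up next-line offsets.
sample = (b"\x34\x12\x0a\x00" + b'\x91"HI"' + b"\x00"
          + b"\x40\x12\x14\x00" + b"\x81" + b"\x00"
          + b"\x00\x00")
print(split_lines(sample))  # [(10, b'\x91"HI"'), (20, b'\x81')]
```

The offset values in `sample` are invented; the sketch only reads them to detect the end of the program, not to locate lines.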

Files saved to disk begin with a single byte indicating whether the program was protected: 0FEh if protected, 0FFh if not. Files saved to cassette tape omit this byte, because byte 9 of the cassette header holds the protection status. The file is terminated with Ctrl-Z (1A hex).
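A minimal Python sketch of reading the disk-file framing just described: check the leading protection byte and strip the trailing Ctrl-Z. The function name `read_disk_file` is hypothetical. It only reports the protection flag and makes no attempt to interpret the body of a protected program; it also does not handle cassette files, which lack the leading byte.

```python
from typing import Tuple


def read_disk_file(raw: bytes) -> Tuple[bool, bytes]:
    """Return (is_protected, program_bytes) for a disk-saved tokenized file."""
    if not raw:
        raise ValueError("empty file")
    flag = raw[0]
    if flag == 0xFF:
        protected = False
    elif flag == 0xFE:
        protected = True
    else:
        # Neither 0FFh nor 0FEh: probably not a tokenized disk file.
        raise ValueError(f"unexpected first byte {flag:#04x}")
    body = raw[1:]
    if body.endswith(b"\x1a"):  # drop the trailing Ctrl-Z terminator
        body = body[:-1]
    return protected, body


print(read_disk_file(b"\xff\x00\x00\x1a"))  # (False, b'\x00\x00')
```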


Tokens

Blank entries in the tables below are unused, or at least their meaning is unknown.

As noted in the table, some tokens are stored with extra bytes before or after them that represent other symbols. These bytes are suppressed when the program is listed, so they are "invisible". They are presumably there to make parsing easier for the interpreter.

Hex Dec Token meaning
80 128
81 129 END
82 130 FOR
83 131 NEXT
84 132 DATA
85 133 INPUT
86 134 DIM
87 135 READ
88 136 LET
89 137 GOTO
8A 138 RUN
8B 139 IF
8C 140 RESTORE
8D 141 GOSUB
8E 142 RETURN
8F 143 REM
90 144 STOP
91 145 PRINT
92 146 CLEAR
93 147 LIST
94 148 NEW
95 149 ON
96 150 WAIT
97 151 DEF
98 152 POKE
99 153 CONT
9A 154
9B 155
9C 156 OUT
9D 157 LPRINT
9E 158 LLIST
9F 159
A0 160 WIDTH
A1 161 ELSE (stored with invisible colon, 3A, before it)
A2 162 TRON
A3 163 TROFF
A4 164 SWAP
A5 165 ERASE
A6 166 EDIT
A7 167 ERROR
A8 168 RESUME
A9 169 DELETE
AA 170 AUTO
AB 171 RENUM
AC 172 DEFSTR
AD 173 DEFINT
AE 174 DEFSNG
AF 175 DEFDBL
B0 176 LINE
B1 177 WHILE (stored with invisible plus, E9, after it)
B2 178 WEND
B3 179 CALL
B4 180
B5 181
B6 182
B7 183 WRITE
B8 184 OPTION
B9 185 RANDOMIZE
BA 186 OPEN
BB 187 CLOSE
BC 188 LOAD
BD 189 MERGE
BE 190 SAVE
BF 191 COLOR
C0 192 CLS
C1 193 MOTOR
C2 194 BSAVE
C3 195 BLOAD
C4 196 SOUND
C5 197 BEEP
C6 198 PSET
C7 199 PRESET
C8 200 SCREEN
C9 201 KEY
CA 202 LOCATE
CB 203
CC 204 TO
CD 205 THEN
CE 206 TAB(
CF 207 STEP
D0 208 USR
D1 209 FN
D2 210 SPC(
D3 211 NOT
D4 212 ERL
D5 213 ERR
D6 214 STRING$
D7 215 USING
D8 216 INSTR
D9 217 ' (stored with invisible ":REM", 3A 8F, before it)
DA 218 VARPTR
DB 219 CSRLIN
DC 220 POINT
DD 221 OFF
DE 222 INKEY$
DF 223
E0 224
E1 225
E2 226
E3 227
E4 228
E5 229
E6 230 >
E7 231 =
E8 232 <
E9 233 +
EA 234 -
EB 235 *
EC 236 /
ED 237 ^
EE 238 AND
EF 239 OR
F0 240 XOR
F1 241 EQV
F2 242 IMP
F3 243 MOD
F4 244 \
F5 245
F6 246
F7 247
F8 248
F9 249
FA 250
FB 251
FC 252
FD 253 (signals that next byte represents token from List 2)
FE 254 (signals that next byte represents token from List 3)
FF 255 (signals that next byte represents token from List 4)
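To illustrate how the single-byte tokens and the "invisible" bytes noted in the table fit together, here is a hedged Python sketch of a partial detokenizer for one line's token bytes. `SINGLE` is a hand-copied subset of the table above, and `detokenize` is a hypothetical name. It drops the invisible colon before ELSE, the invisible ":REM" before the apostrophe, and the invisible plus after WHILE. It does not decode numeric constants or the FD/FE/FF two-byte tokens (other non-printable bytes are shown as `<XX>`), and it treats every byte the same way everywhere in the line, so it does not special-case the text of string literals or comments.

```python
# Hand-copied subset of the single-byte token table above.
SINGLE = {
    0x81: "END", 0x82: "FOR", 0x89: "GOTO", 0x8B: "IF", 0x8F: "REM",
    0x91: "PRINT", 0xA1: "ELSE", 0xB2: "WEND", 0xCC: "TO", 0xCD: "THEN",
    0xD9: "'", 0xE6: ">", 0xE7: "=", 0xE8: "<", 0xE9: "+", 0xEA: "-",
}


def detokenize(tokens: bytes) -> str:
    """Turn one line's token bytes back into listing text (partial)."""
    out = []
    i, n = 0, len(tokens)
    while i < n:
        b = tokens[i]
        # Invisible ':' stored before ELSE (3A A1): skip the colon.
        if b == 0x3A and i + 1 < n and tokens[i + 1] == 0xA1:
            i += 1
            continue
        # Invisible ':REM' stored before the apostrophe (3A 8F D9).
        if b == 0x3A and tokens[i + 1:i + 3] == b"\x8f\xd9":
            i += 2
            continue
        if b == 0xB1:  # WHILE, followed by an invisible '+' (E9)
            out.append("WHILE")
            i += 2 if i + 1 < n and tokens[i + 1] == 0xE9 else 1
            continue
        if b in SINGLE:
            out.append(SINGLE[b])
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))  # printable ASCII stands for itself
        else:
            out.append(f"<{b:02X}>")  # not decoded in this sketch
        i += 1
    return "".join(out)


print(detokenize(b'\x8b X\xe61 \xcd \x91 "A" :\xa1 \x91 "B"'))
# IF X>1 THEN PRINT "A" ELSE PRINT "B"
```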

List 2: 2nd-byte tokens following FD

These are preceded by an FD (hex) byte.

Hex Dec Token meaning
81 129 CVI
82 130 CVS
83 131 CVD
84 132 MKI$
85 133 MKS$
86 134 MKD$
87 135
88 136
89 137
8A 138
8B 139 EXTERR

List 3: 2nd-byte tokens following FE

These are preceded by an FE (hex) byte.

Hex Dec Token meaning
81 129 FILES
82 130 FIELD
83 131 SYSTEM
84 132 NAME
85 133 LSET
86 134 RSET
87 135 KILL
88 136 PUT
89 137 GET
8A 138 RESET
8B 139 COMMON
8C 140 CHAIN
8D 141 DATE$
8E 142 TIME$
8F 143 PAINT
90 144 COM
91 145 CIRCLE
92 146 DRAW
93 147 PLAY
94 148 TIMER
95 149 ERDEV
96 150 IOCTL
97 151 CHDIR
98 152 MKDIR
99 153 RMDIR
9A 154 SHELL
9B 155 ENVIRON
9C 156 VIEW
9D 157 WINDOW
9E 158 PMAP
9F 159 PALETTE
A0 160 LCOPY
A1 161 CALLS
A2 162
A3 163
A4 164 NOISE (PCjr), DEBUG (Sperry PC)
A5 165 PCOPY (PCjr, EGA system)
A6 166 TERM (PCjr)
A7 167 LOCK
A8 168 UNLOCK

List 4: 2nd-byte tokens following FF

These are preceded by an FF (hex) byte.

Hex Dec Token meaning
81 129 LEFT$
82 130 RIGHT$
83 131 MID$
84 132 SGN
85 133 INT
86 134 ABS
87 135 SQR
88 136 RND
89 137 SIN
8A 138 LOG
8B 139 EXP
8C 140 COS
8D 141 TAN
8E 142 ATN
8F 143 FRE
90 144 INP
91 145 POS
92 146 LEN
93 147 STR$
94 148 VAL
95 149 ASC
96 150 CHR$
97 151 PEEK
98 152 SPACE$
99 153 OCT$
9A 154 HEX$
9B 155 LPOS
9C 156 CINT
9D 157 CSNG
9E 158 CDBL
9F 159 FIX
A0 160 PEN
A1 161 STICK
A2 162 STRIG
A3 163 EOF
A4 164 LOC
A5 165 LOF
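The three prefixed lists can be looked up with a two-level table keyed first by the prefix byte (FD, FE or FF) and then by the second byte. The Python sketch below assumes this structure; `PREFIXED` holds only a hand-copied subset of Lists 2 to 4, and `read_token` is a hypothetical name. It returns the keyword together with the number of bytes consumed, so a caller scanning a line knows how far to advance.

```python
from typing import Optional, Tuple

# Hand-copied subset of Lists 2-4, keyed by prefix byte, then second byte.
PREFIXED = {
    0xFD: {0x81: "CVI", 0x84: "MKI$", 0x8B: "EXTERR"},
    0xFE: {0x81: "FILES", 0x87: "KILL", 0x94: "TIMER", 0x9A: "SHELL"},
    0xFF: {0x81: "LEFT$", 0x83: "MID$", 0x92: "LEN", 0x96: "CHR$", 0xA5: "LOF"},
}


def read_token(data: bytes, i: int) -> Tuple[Optional[str], int]:
    """Decode a two-byte token starting at data[i].

    Returns (keyword, 2) for a known FD/FE/FF sequence, or (None, 1) if
    data[i] is not a prefix byte or the second byte is not in the table.
    """
    prefix = data[i]
    if prefix in PREFIXED and i + 1 < len(data):
        keyword = PREFIXED[prefix].get(data[i + 1])
        if keyword is not None:
            return keyword, 2
    return None, 1


print(read_token(b"\xff\x96", 0))  # ('CHR$', 2)
```

Returning `(None, 1)` for an unknown pair leaves it to the caller to decide how to show the raw bytes.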

