Sol BASIC tokenized file

From Just Solve the File Format Problem
File Format
Name Sol BASIC tokenized file
Ontology
Released 1976

Sol BASIC screen shot (as simulated in Solace)

Sol was a line of computers from the late 1970s, the most popular of which was the SOL-20. It was one of the S-100 bus computers of that era; with a disk drive added it could run the CP/M operating system, but it was often used with cassette data storage instead. It had a version of the BASIC programming language, which was not in ROM and had to be loaded before use. When saving a BASIC program to tape or disk, you could add a parameter to the SAVE command to store the program as plain text, which was more suitable for transfer to other systems. However, the default save mode was the more compact (but less transferable) tokenized form. On cassette, the low-level format was the Kansas City standard (or possibly CUTS?).

Documenting the format

No documentation of the specific tokenized format seemed to be readily accessible (but see below); however, it is possible to piece it together with the help of the Solace emulator (linked below). It does a great job of imitating a SOL-20 computer in MS-Windows, even down to saving a BASIC program into a file that imitates the form in which it would have been written to cassette on a real SOL-20. Then, with a bit of "geek detective" skill, one can work out how the data is structured. (First you have to figure out how to do anything in the emulator in the first place... it meticulously imitates everything on the SOL, including the things that are a pain in the butt, like the need to enter all commands in uppercase and the need to load BASIC before using BASIC programs.)

If you enter this program:

10 PRINT "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abc"
20 FOR I=1 TO 10
30 PRINT I,I*4
40 NEXT I
50 PRINT "Done."
60 END

then save it to a "virtual cassette" (and use the "File" menu of the "virtual cassette player" window to save that to a real disk file on your computer, which will have a .SVT extension), you get the output below. It is in a format specific to the emulator. To some extent it represents what would be written to a cassette on a real SOL-20, but it can't be entirely relied upon in this regard, since some of its content is emulator-specific:

C 29
H PROG C2 005B 1AD9 0000
D 2E0A0089224142434445464748494A4B
D 4C4D4E4F505152535455565758595A30
D 313233343536373839616263220D0B14
D 008849F5319E31300D0A1E0089492C49
D E2340D06280081490D0C32008922446F
D 6E652E220D053C008D0D0120

C 10

The lines starting with "C" and "H" appear to be part of the filesystem (the "H" line includes the name the program was saved as, "PROG"), so any documentation of their format belongs in the Filesystem section (or would, if they were the true tape format of a SOL-20 rather than an emulated version that might differ). The "D" lines encode the program file data itself as a series of hexadecimal digits; take them in pairs to get the successive bytes of the tokenized BASIC file.
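As a rough illustration of that step, here is a minimal Python sketch that pulls the bytes out of the "D" lines of a .SVT file. It assumes every data line has exactly the shape shown above (a "D", a space, then an even number of hex digits) and simply skips the "C" and "H" lines; the function name svt_data_bytes is made up for this example and is not based on any published description of the Solace file format.

```python
def svt_data_bytes(svt_text: str) -> bytes:
    """Concatenate the payload of every "D" line in a Solace .SVT dump.

    Assumption: data lines look like "D <hex digits>", as in the example;
    "C" and "H" lines (and blank lines) are ignored.
    """
    data = bytearray()
    for line in svt_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "D":
            # Hex digits taken in pairs give the successive file bytes.
            data += bytes.fromhex(parts[1])
    return bytes(data)


example = """C 29
H PROG C2 005B 1AD9 0000
D 2E0A0089224142434445464748494A4B
D 4C4D4E4F505152535455565758595A30
D 313233343536373839616263220D0B14
D 008849F5319E31300D0A1E0089492C49
D E2340D06280081490D0C32008922446F
D 6E652E220D053C008D0D0120

C 10
"""
data = svt_data_bytes(example)
print(len(data), data[:4].hex(" "))  # 92 2e 0a 00 89
```

On the sample above this yields 92 bytes, starting with 2E 0A 00 89 (a length byte, line number 10, and the PRINT token).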

It appears to be a series of program lines, each terminated by the carriage return character (hex 0D). The first byte of each line (each program line, that is; ignore the physical lines in the representation of the file and divide lines only at 0D bytes) gives the number of bytes the line takes up, counting the length byte itself, the line number and the final 0D; you can quickly skip to the next line by going forward that number of bytes. For example, the first line begins with 2E (46), and the 0D that ends it is the 46th byte. The next two bytes are the line number, a 2-byte unsigned little-endian integer (0A 00 is line 10). Then follows the tokenized program code itself.

ASCII printable characters represent themselves (in literal strings, variable names, and numeric constants; numbers are simply stored as the series of ASCII digits instead of being encoded as integer or floating-point values as some other BASICs do). Some symbols, such as the quote marks and the comma, are also represented by their plain ASCII values, though others (such as the equal sign, F5) have different token representations, apparently to signal that they are operators or functions with special meaning. The keywords of BASIC each have a byte (in the high-bit-set range from 80 to FF hex) representing them; for instance, 89 hex is "PRINT". All spaces other than those within quoted strings are stripped, since they are unnecessary to the syntax; they are added back when the program is listed.
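The line structure just described can be walked with a short loop, sketched below in Python. The length byte is treated as counting the whole line, including itself, the line number and the closing 0D, which matches the sample (2E = 46 bytes for line 10). The sample data ends with the bytes 01 20 after the last line; treating any length byte below 4 as "end of program" is an assumption based only on that one example. The function name split_program_lines is hypothetical.

```python
def split_program_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Split tokenized Sol BASIC-80 data into (line_number, body) pairs.

    Assumed layout per line: length byte (counting the whole line),
    2-byte little-endian line number, token bytes, then 0x0D.
    """
    lines = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        # Assumption: a length too small to hold a line marks the end.
        if length < 4 or pos + length > len(data):
            break
        end = pos + length  # index just past this line's 0x0D
        if data[end - 1] != 0x0D:
            raise ValueError(f"no 0x0D at end of line starting at offset {pos}")
        line_number = int.from_bytes(data[pos + 1:pos + 3], "little")
        lines.append((line_number, data[pos + 3:end - 1]))
        pos = end  # jump straight to the next line's length byte
    return lines


# The "D" payload from the example .SVT dump, one string per D line.
program = bytes.fromhex(
    "2E0A0089224142434445464748494A4B"
    "4C4D4E4F505152535455565758595A30"
    "313233343536373839616263220D0B14"
    "008849F5319E31300D0A1E0089492C49"
    "E2340D06280081490D0C32008922446F"
    "6E652E220D053C008D0D0120"
)
for number, body in split_program_lines(program):
    print(number, body.hex(" "))
```

Running it prints line numbers 10 through 60, each followed by its token bytes; line 40, for instance, comes out as 81 49 (NEXT I).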

In this manner, it should be possible to build a list of all the tokens by writing a program that uses all of them and seeing how it ends up when saved.
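The matching step can be sketched in Python as follows, under the assumption that keywords and operators are tokenized in the order you typed them. The sketch collects the high-bit bytes from a line body (the bytes between the line number and the closing 0D), skipping anything inside quotes, and pairs them with the keywords you entered. The body of line 20 from the dump above serves as input; the helper name token_bytes is hypothetical.

```python
def token_bytes(body: bytes) -> list[int]:
    """Return the high-bit (0x80-0xFF) bytes of a line body, in order,
    ignoring anything inside double-quoted strings."""
    found = []
    in_string = False
    for b in body:
        if b == 0x22:  # '"' toggles literal-string mode
            in_string = not in_string
        elif b >= 0x80 and not in_string:
            found.append(b)
    return found


# Body of line 20 ("FOR I=1 TO 10") from the dump: no length, number or 0x0D.
line20 = bytes.fromhex("8849F5319E3130")
keywords = ["FOR", "=", "TO"]  # what was typed, in order
for byte, word in zip(token_bytes(line20), keywords):
    print(f"{byte:02X} -> {word}")  # 88 -> FOR, F5 -> =, 9E -> TO
```

If a keyword produces no token byte (as the comma does), the pairing falls out of step and has to be checked by hand.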

But even better...

It isn't actually necessary to go to all this work to find out the tokens, since the author of the emulator (Jim Battle) has already done the detective work. You can find the token list for BASIC-80 (the standard Sol BASIC) within the code of a Perl script here (you need to unzip it), and a similar list for the tokens of Extended BASIC, also available for the SOL-20, here.

Note that BASIC-80 and Extended BASIC (the two major BASICs used on Sol computers) use entirely different token lists.

BASIC-80 tokens

Blank values indicate either that the token is unused or is used for something unknown.

Hex Dec Token meaning
80 128 LET
81 129 NEXT
82 130 IF
83 131 GOTO
84 132 GOSUB
85 133 RETURN
86 134 READ
87 135 DATA
88 136 FOR
89 137 PRINT
8A 138 INPUT
8B 139 DIM
8C 140 STOP
8D 141 END
8E 142 RESTORE
8F 143 REM
90 144 CLEAR
91 145 SET
92 146 FILE
93 147 CLOSE
94 148 BYE
95 149 :
96 150 ;
97 151
98 152
99 153
9A 154
9B 155
9C 156 TAB
9D 157 THEN
9E 158 TO
9F 159 STEP
A0 160 RUN
A1 161 LIST
A2 162 NEW
A3 163 SAVE
A4 164 GET
A5 165 EDIT
A6 166 XEQ
A7 167
A8 168
A9 169
AA 170
AB 171
AC 172
AD 173
AE 174
AF 175
B0 176
B1 177
B2 178
B3 179
B4 180
B5 181
B6 182
B7 183
B8 184
B9 185
BA 186
BB 187
BC 188
BD 189
BE 190
BF 191
C0 192
C1 193
C2 194
C3 195
C4 196 SQR
C5 197
C6 198 INT
C7 199
C8 200
C9 201
CA 202
CB 203
CC 204 ARG
CD 205 CALL
CE 206 RND
CF 207
D0 208
D1 209
D2 210 SGN
D3 211 SIN
D4 212
D5 213
D6 214
D7 215 TAN
D8 216 COS
D9 217
DA 218
DB 219
DC 220
DD 221
DE 222
DF 223
E0 224 (
E1 225
E2 226 *
E3 227 +
E4 228
E5 229 -
E6 230
E7 231 /
E8 232
E9 233
EA 234
EB 235
EC 236
ED 237
EE 238
EF 239 >=
F0 240 <=
F1 241 <>
F2 242
F3 243
F4 244 <
F5 245 =
F6 246 >
F7 247
F8 248
F9 249
FA 250
FB 251
FC 252
FD 253
FE 254
FF 255
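With the table in hand, a line body can be turned back into text. The Python sketch below is a minimal listing routine, not a reproduction of Sol BASIC's own LIST command: it copies quoted strings and plain ASCII bytes through, looks up other bytes in the table (blank entries are shown as <XX>), and puts spaces around word tokens using a guessed rule. The "<=" token is entered as F0, since its decimal value in the table is 240. The names BASIC80_TOKENS and detokenize_line are hypothetical.

```python
# BASIC-80 tokens transcribed from the table above (blank entries omitted).
BASIC80_TOKENS = {
    0x80: "LET", 0x81: "NEXT", 0x82: "IF", 0x83: "GOTO", 0x84: "GOSUB",
    0x85: "RETURN", 0x86: "READ", 0x87: "DATA", 0x88: "FOR", 0x89: "PRINT",
    0x8A: "INPUT", 0x8B: "DIM", 0x8C: "STOP", 0x8D: "END", 0x8E: "RESTORE",
    0x8F: "REM", 0x90: "CLEAR", 0x91: "SET", 0x92: "FILE", 0x93: "CLOSE",
    0x94: "BYE", 0x95: ":", 0x96: ";", 0x9C: "TAB", 0x9D: "THEN",
    0x9E: "TO", 0x9F: "STEP", 0xA0: "RUN", 0xA1: "LIST", 0xA2: "NEW",
    0xA3: "SAVE", 0xA4: "GET", 0xA5: "EDIT", 0xA6: "XEQ", 0xC4: "SQR",
    0xC6: "INT", 0xCC: "ARG", 0xCD: "CALL", 0xCE: "RND", 0xD2: "SGN",
    0xD3: "SIN", 0xD7: "TAN", 0xD8: "COS", 0xE0: "(", 0xE2: "*",
    0xE3: "+", 0xE5: "-", 0xE7: "/", 0xEF: ">=", 0xF0: "<=",
    0xF1: "<>", 0xF4: "<", 0xF5: "=", 0xF6: ">",
}


def detokenize_line(body: bytes) -> str:
    """Rebuild the text of one BASIC-80 line body (no length, number or 0x0D)."""
    out: list[str] = []
    in_string = False
    for b in body:
        if b == 0x22:  # quote: toggle literal-string mode
            in_string = not in_string
            out.append('"')
        elif in_string or b < 0x80:
            out.append(chr(b))  # ASCII stands for itself
        else:
            word = BASIC80_TOKENS.get(b, f"<{b:02X}>")
            if word.isalpha():
                # Guessed spacing rule: separate keywords from letters/digits.
                if out and out[-1][-1].isalnum():
                    out.append(" ")
                out.append(word + " ")
            else:
                out.append(word)  # operators and punctuation get no spaces
    return "".join(out).rstrip()


for number, hex_body in [(20, "8849F5319E3130"), (30, "89492C49E234"), (50, "8922446F6E652E22")]:
    print(number, detokenize_line(bytes.fromhex(hex_body)))
```

Applied to the bodies of lines 20, 30 and 50 of the example, this prints FOR I=1 TO 10, PRINT I,I*4 and PRINT "Done.".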

Extended BASIC tokens

Blank values indicate either that the token is unused or is used for something unknown.

Hex Dec Token meaning
80 128 STEP
81 129 TO
82 130 ELSE
83 131 THEN
84 132 FN
85 133 TAB
86 134 CHR
87 135 ASC
88 136 ERR
89 137 VAL
8A 138 STR
8B 139 ZER
8C 140 CON
8D 141 IDN
8E 142 INV
8F 143 TRN
90 144 LL=
91 145 ML=
92 146 IP=
93 147 OS=
94 148 DS=
95 149 DB=
96 150 Indicates that the next 2 bytes are a line number (little-endian unsigned integer)
97 151 :
98 152 LET
99 153 FOR
9A 154 PRINT
9B 155 NEXT
9C 156 IF
9D 157 READ
9E 158 INPUT
9F 159 DATA
A0 160 GOTO
A1 161 GOSUB
A2 162 RETURN
A3 163 DIM
A4 164 STOP
A5 165 END
A6 166 RESTORE
A7 167 REM
A8 168 FNEND
A9 169 DEF
AA 170 ON
AB 171 OUT
AC 172 POKE
AD 173 BYE
AE 174 SET
AF 175 SCR
B0 176 CLEAR
B1 177 XEQ
B2 178 FILE
B3 179 REWIND
B4 180 CLOSE
B5 181 CURSOR
B6 182 WAIT
B7 183 SEARCH
B8 184 TUON
B9 185 TUOFF
BA 186 ERRSET
BB 187 ERRCLR
BC 188 MAT
BD 189 PAUSE
BE 190 EXIT
BF 191 RUN
C0 192 LIST
C1 193 CONT
C2 194 EDIT
C3 195 DEL
C4 196 GET
C5 197 APPEND
C6 198 SAVE
C7 199 REN
C8 200 (
C9 201
CA 202 *
CB 203 +
CC 204 -
CD 205 /
CE 206 AND
CF 207 OR
D0 208 >=
D1 209 <=
D2 210 <>
D3 211 <
D4 212 =
D5 213 >
D6 214 NOT
D7 215 ^
D8 216
D9 217 ABS
DA 218 INT
DB 219 LEN
DC 220 CALL
DD 221 RND
DE 222 SGN
DF 223 POS
E0 224 EOF
E1 225 TYP
E2 226 SIN
E3 227 SQR
E4 228 FREE
E5 229 INP
E6 230 PEEK
E7 231 COS
E8 232 EXP
E9 233 TAN
EA 234 ATN
EB 235 LOG10
EC 236 LOG
ED 237
EE 238
EF 239
F0 240
F1 241
F2 242
F3 243
F4 244
F5 245
F6 246
F7 247
F8 248
F9 249
FA 250
FB 251
FC 252
FD 253
FE 254
FF 255
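The main structural difference visible in the Extended BASIC table is token 96, which introduces a binary line number rather than a run of ASCII digits, so a decoder has to consume the two following bytes instead of treating them as characters. The Python sketch below shows only that step. It includes just a handful of entries from the table, assumes other bytes below 80 hex are plain ASCII (not confirmed for Extended BASIC), and does not attempt the surrounding line layout, which this article does not document for Extended BASIC. The input is a hypothetical encoding of GOSUB 300:END built from the table, not a dump from a real file, and the names EXT_TOKENS and detokenize_ext_line are likewise hypothetical.

```python
# A few entries from the Extended BASIC table above (extend as needed).
EXT_TOKENS = {0x97: ":", 0x9A: "PRINT", 0xA0: "GOTO", 0xA1: "GOSUB", 0xA5: "END", 0xD4: "="}
LINE_NUMBER_TOKEN = 0x96  # followed by a 2-byte little-endian line number


def detokenize_ext_line(body: bytes) -> str:
    """Rebuild text from an Extended BASIC token stream (sketch only)."""
    out: list[str] = []
    in_string = False
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0x22:  # quote: toggle literal-string mode
            in_string = not in_string
            out.append('"')
        elif in_string or b < 0x80:
            out.append(chr(b))  # assumption: plain ASCII
        elif b == LINE_NUMBER_TOKEN:
            if i + 2 >= len(body):
                raise ValueError("line-number token at end of data")
            out.append(str(int.from_bytes(body[i + 1:i + 3], "little")))
            i += 2  # skip the two bytes just consumed
        else:
            word = EXT_TOKENS.get(b, f"<{b:02X}>")
            if word.isalpha():
                # Guessed spacing rule, as in the BASIC-80 sketch.
                if out and out[-1][-1].isalnum():
                    out.append(" ")
                out.append(word + " ")
            else:
                out.append(word)
        i += 1
    return "".join(out).rstrip()


# Hypothetical bytes for "GOSUB 300:END": A1, 96 2C 01 (300), 97, A5.
print(detokenize_ext_line(bytes.fromhex("A1962C0197A5")))
```

Without the special case for 96, the bytes 2C 01 would have been misread as a comma followed by a control character.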

Software

Documentation
