Sol BASIC tokenized file

From Just Solve the File Format Problem

Revision as of 18:17, 5 January 2013

File Format
Name Sol BASIC tokenized file
Ontology
Released 1976

Sol BASIC screen shot (as simulated in Solace)


Sol was a line of computers sold in the late 1970s, the most popular of which was the SOL-20. It was one of the S-100 bus computers of that era which, with a disk drive added, could run the CP/M operating system, but it was often used with cassette data storage instead. It had a version of the BASIC programming language (not in ROM; you had to load it). When saving a BASIC program to tape or disk, you could add a parameter to the SAVE command to save the program as plain text, which was more suitable for transfer to other systems. However, the default save mode was the more compact (but less transferable) tokenized form. On cassette, the low-level format was Kansas City standard (or possibly CUTS).

Documenting the format

No documentation of the specific tokenized format appeared to be readily accessible (but see below). However, it is possible to piece it together with the help of the Solace emulator (linked below), which does a great job of imitating a SOL-20 computer in MS-Windows, even down to saving a BASIC program into a file that imitates the form in which it would have been written to cassette on a real SOL-20. With a bit of "geek detective" skill, one can then work out how the data is structured. (First you have to figure out how to do anything in the emulator at all: it meticulously imitates everything on the SOL, including annoyances such as the need to enter all commands in uppercase and the need to load BASIC before using BASIC programs.)

If you enter this program:

10 PRINT "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abc"
20 FOR I=1 TO 10
30 PRINT I,I*4
40 NEXT I
50 PRINT "Done."
60 END

then save it to a "virtual cassette" (and use the "File" menu of the "virtual cassette player" window to save that to a real disk file on your computer; it will have a .SVT extension), you get the file shown below. The format is specific to the emulator. It is to some extent a representation of what would be written to a cassette on a real SOL-20, but it can't be relied upon entirely in this regard, since some of its content is emulator-specific:

C 29
H PROG C2 005B 1AD9 0000
D 2E0A0089224142434445464748494A4B
D 4C4D4E4F505152535455565758595A30
D 313233343536373839616263220D0B14
D 008849F5319E31300D0A1E0089492C49
D E2340D06280081490D0C32008922446F
D 6E652E220D053C008D0D0120

C 10

The lines starting with "C" and "H" appear to be part of the filesystem (which include the name the program was saved as, "PROG"), so any documentation on their format belongs in the Filesystem section (or would if they were the true tape format of a SOL-20 rather than just an emulated version that might differ). The "D" lines encode the program file data itself, as a series of hexadecimal digits. Take them in pairs to get the successive bytes of the tokenized BASIC file.
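As a minimal sketch of the "take them in pairs" step, the following Python collects the hex digits from the "D" lines of the dump above and converts them to bytes. The function name svt_data_bytes is made up for this illustration, and the code assumes that every data line has exactly the form "D" followed by one run of hex digits, as in the sample; nothing is assumed about the "C" and "H" lines beyond skipping them.

```python
# The sample .SVT dump from above, reproduced verbatim.
SVT_TEXT = """\
C 29
H PROG C2 005B 1AD9 0000
D 2E0A0089224142434445464748494A4B
D 4C4D4E4F505152535455565758595A30
D 313233343536373839616263220D0B14
D 008849F5319E31300D0A1E0089492C49
D E2340D06280081490D0C32008922446F
D 6E652E220D053C008D0D0120

C 10
"""


def svt_data_bytes(text: str) -> bytes:
    """Concatenate the payload of every 'D' line, two hex digits per byte."""
    chunks = []
    for line in text.splitlines():
        parts = line.split()
        # Only "D <hexdigits>" lines carry program data; skip everything else.
        if len(parts) == 2 and parts[0] == "D":
            chunks.append(bytes.fromhex(parts[1]))
    return b"".join(chunks)


data = svt_data_bytes(SVT_TEXT)
print(len(data), "bytes, starting with", data[:4].hex())
```

The result is the raw tokenized file, which can then be examined byte by byte.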

It appears to be a series of program lines, each terminated by the carriage return character (hex 0D). The first byte of each line (each program line, that is; ignore the physical lines in the representation of the file and divide lines only at 0D bytes) represents the number of bytes the line takes up, counting the length byte itself and the terminating 0D; you can quickly skip to the next line by going forward that number of bytes. In the sample, the first line starts with 2E hex (46), and the next line's length byte is found exactly 46 bytes later. The next two bytes are the line number, represented as a 2-byte unsigned integer (little-endian). Then follows the tokenized program code itself. ASCII printable characters represent themselves (in literal strings, variable names, and numeric constants, the latter of which are simply stored as the series of ASCII digits instead of being encoded as integer or floating-point numbers as some other BASICs do). Some symbols like the quote marks are also represented as their plain ASCII values, though others (such as the equal sign) have different token representations (apparently to signal that they are operators or functions with special meaning). The keywords of BASIC each have a byte (in the high-bit-set range from 80 to FF hex) representing them; for instance, 89 hex is "PRINT". All spaces other than ones within quoted strings are stripped, as they are unnecessary to the syntax. They are added back when the program is listed.
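The record layout described above can be walked with a short loop. This is only a sketch based on the one sample file: the name split_records is hypothetical, and the stopping rule is an assumption. The sample ends with the bytes 01 20, whose meaning is not documented here, so the loop simply stops at any length byte too small to hold a length byte, a line number, and a carriage return.

```python
import struct

# Bytes of the sample program, taken from the "D" lines of the dump above.
DATA = bytes.fromhex(
    "2E0A0089224142434445464748494A4B"
    "4C4D4E4F505152535455565758595A30"
    "313233343536373839616263220D0B14"
    "008849F5319E31300D0A1E0089492C49"
    "E2340D06280081490D0C32008922446F"
    "6E652E220D053C008D0D0120"
)


def split_records(data: bytes):
    """Return (line_number, tokenized_body) pairs for each program line."""
    records = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        # Assumption: anything shorter than length + line number + CR
        # (4 bytes) is treated as an end marker rather than a program line.
        if length < 4:
            break
        record = data[pos:pos + length]
        if len(record) < length or record[-1] != 0x0D:
            raise ValueError(f"malformed record at offset {pos}")
        # Bytes 1-2 of the record: little-endian unsigned line number.
        (line_number,) = struct.unpack_from("<H", record, 1)
        # Body: everything between the line number and the trailing 0D.
        records.append((line_number, record[3:-1]))
        pos += length
    return records


for number, body in split_records(DATA):
    print(number, body.hex())
```

Using the length byte to advance, rather than searching for 0D, follows the "skip forward that number of bytes" description; the 0D check is there only to catch a misread length.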

In this manner, it should be possible to build a list of all the tokens by writing a program that uses all of them and seeing how it ends up when saved.

But even better...

But it's not necessary to go to all this work to find out the tokens, since the person who created the emulator (Jim Battle) has already done the detective work. The token list for BASIC-80 (the standard Sol BASIC) is in the code of a Perl script at http://sol20.org/utility/basic5.zip (you need to unzip it), and a similar list for the separate token set of the "Extended BASIC" that was also available for the SOL-20 is at http://sol20.org/utility/extbas.zip.

Note that BASIC-80 and Extended BASIC (the two major BASICs used on Sol computers) use entirely different token lists.

BASIC-80 tokens

Blank values indicate either that the token is unused or is used for something unknown.

Hex Dec Token meaning
80 128 LET
81 129 NEXT
82 130 IF
83 131 GOTO
84 132 GOSUB
85 133 RETURN
86 134 READ
87 135 DATA
88 136 FOR
89 137 PRINT
8A 138 INPUT
8B 139 DIM
8C 140 STOP
8D 141 END
8E 142 RESTORE
8F 143 REM
90 144 CLEAR
91 145 SET
92 146 FILE
93 147 CLOSE
94 148 BYE
95 149 :
96 150 ;
97 151
98 152
99 153
9A 154
9B 155
9C 156 TAB
9D 157 THEN
9E 158 TO
9F 159 STEP
A0 160 RUN
A1 161 LIST
A2 162 NEW
A3 163 SAVE
A4 164 GET
A5 165 EDIT
A6 166 XEQ
A7 167
A8 168
A9 169
AA 170
AB 171
AC 172
AD 173
AE 174
AF 175
B0 176
B1 177
B2 178
B3 179
B4 180
B5 181
B6 182
B7 183
B8 184
B9 185
BA 186
BB 187
BC 188
BD 189
BE 190
BF 191
C0 192
C1 193
C2 194
C3 195
C4 196 SQR
C5 197
C6 198 INT
C7 199
C8 200
C9 201
CA 202
CB 203
CC 204 ARG
CD 205 CALL
CE 206 RND
CF 207
D0 208
D1 209
D2 210 SGN
D3 211 SIN
D4 212
D5 213
D6 214
D7 215 TAN
D8 216 COS
D9 217
DA 218
DB 219
DC 220
DD 221
DE 222
DF 223
E0 224 (
E1 225
E2 226 *
E3 227 +
E4 228
E5 229 -
E6 230
E7 231 /
E8 232
E9 233
EA 234
EB 235
EC 236
ED 237
EE 238
EF 239 >=
F0 240 <=
F1 241 <>
F2 242
F3 243
F4 244 <
F5 245 =
F6 246 >
F7 247
F8 248
F9 249
FA 250
FB 251
FC 252
FD 253
FE 254
FF 255
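To illustrate how the table is applied, here is a rough detokenizer sketch in Python. It carries only the handful of tokens needed for the sample program (a real tool would load the full table), and the names TOKENS and detokenize are invented for this example. The spacing rule is an assumption made purely for readability: it puts a space around keyword tokens and none around operators, which may not match exactly what the interpreter's own LIST command prints. Unknown tokens are shown as their hex value in angle brackets.

```python
# Partial token table copied from the list above (sample program only).
TOKENS = {
    0x81: "NEXT",
    0x88: "FOR",
    0x89: "PRINT",
    0x8D: "END",
    0x9E: "TO",
    0xE2: "*",
    0xF5: "=",
}


def detokenize(body: bytes) -> str:
    """Expand one tokenized line body (no length, line number, or CR)."""
    out = []
    in_string = False
    for b in body:
        if b == 0x22:  # double quote toggles string mode
            in_string = not in_string
            out.append('"')
        elif b >= 0x80 and not in_string:
            word = TOKENS.get(b)
            if word is None:
                out.append(f"<{b:02X}>")  # token missing from the table
            elif word.isalpha():
                # Assumed spacing rule: keywords are set off by spaces.
                if out and not out[-1].endswith(" "):
                    out.append(" ")
                out.append(word + " ")
            else:
                out.append(word)  # operators are emitted without spaces
        else:
            out.append(chr(b))  # plain ASCII stands for itself
    return "".join(out).rstrip()


# Line bodies from the sample file (lines 20, 30, and 50).
print(detokenize(bytes.fromhex("8849F5319E3130")))
print(detokenize(bytes.fromhex("89492C49E234")))
print(detokenize(bytes.fromhex("8922446F6E652E22")))
```

With this spacing rule, the three sample bodies come back as the text originally typed in for lines 20, 30, and 50.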

Software

Documentation
