Tandy 200 BASIC tokenized file
Latest revision as of 04:19, 18 July 2024
Tandy 200 BASIC was a version of Microsoft BASIC for the Radio Shack Tandy 200 computer. Tokenized programs use the .BA file extension. The tokenizations for the TRS-80 Model 100 and Tandy 102 appear to be identical. Although the research behind this page was done on a Tandy 200, experiments with emulated systems (https://sourceforge.net/projects/virtualt/) indicate that the tokenized BASIC file format is the same across most computers related to the Kyocera Kyotronic-85; the exceptions are the models made by NEC.
Computer Model | Uses Tandy 200 Tokenization? |
---|---|
Kyocera Kyotronic-85, Olivetti M10, TRS-80 Model 100, Tandy 200, Tandy 102 | Yes |
NEC PC-8201, NEC PC-8201A, NEC PC-8300 | No |
Although the NEC PC-8201/8300 format is not the same, the BASIC those machines run (N82 BASIC) is very close, so the tokenization is expected to be similar.
Tokens
All BASIC tokens on the Tandy 200 are a single byte with a value from 128 to 255.
Hex | Dec | Keyword | Notes |
---|---|---|---|
80 | 128 | END | |
81 | 129 | FOR | |
82 | 130 | NEXT | |
83 | 131 | DATA | |
84 | 132 | INPUT | |
85 | 133 | DIM | |
86 | 134 | READ | |
87 | 135 | LET | |
88 | 136 | GOTO | |
89 | 137 | RUN | |
8A | 138 | IF | |
8B | 139 | RESTORE | |
8C | 140 | GOSUB | |
8D | 141 | RETURN | |
8E | 142 | REM | |
8F | 143 | STOP | |
90 | 144 | WIDTH | |
91 | 145 | ELSE | When tokenizing, the Tandy 200 always adds a colon (':') before the ELSE token (3A 91). If the user actually writes :ELSE it is tokenized as 3A 3A 91. |
92 | 146 | LINE | |
93 | 147 | EDIT | |
94 | 148 | ERROR | |
95 | 149 | RESUME | |
96 | 150 | OUT | |
97 | 151 | ON | |
98 | 152 | DSKO$ | |
99 | 153 | OPEN | |
9A | 154 | CLOSE | |
9B | 155 | LOAD | Note that LOADM is simply the token for LOAD followed by an ASCII 'M'. |
9C | 156 | MERGE | |
9D | 157 | FILES | |
9E | 158 | SAVE | |
9F | 159 | LFILES | |
A0 | 160 | LPRINT | |
A1 | 161 | DEF | |
A2 | 162 | POKE | |
A3 | 163 | PRINT | |
A4 | 164 | CONT | |
A5 | 165 | LIST | |
A6 | 166 | LLIST | |
A7 | 167 | CLEAR | |
A8 | 168 | CLOAD | |
A9 | 169 | CSAVE | |
AA | 170 | TIME$ | |
AB | 171 | DATE$ | |
AC | 172 | DAY$ | |
AD | 173 | COM | |
AE | 174 | MDM | |
AF | 175 | KEY | |
B0 | 176 | CLS | |
B1 | 177 | BEEP | |
B2 | 178 | SOUND | |
B3 | 179 | LCOPY | |
B4 | 180 | PSET | |
B5 | 181 | PRESET | |
B6 | 182 | MOTOR | |
B7 | 183 | MAX | |
B8 | 184 | POWER | |
B9 | 185 | CALL | |
BA | 186 | MENU | |
BB | 187 | IPL | |
BC | 188 | NAME | |
BD | 189 | KILL | |
BE | 190 | SCREEN | |
BF | 191 | NEW | |
C0 | 192 | TAB( | |
C1 | 193 | TO | |
C2 | 194 | USING | |
C3 | 195 | VARPTR | |
C4 | 196 | ERL | |
C5 | 197 | ERR | |
C6 | 198 | STRING$ | |
C7 | 199 | INSTR | |
C8 | 200 | DSKI$ | |
C9 | 201 | INKEY$ | |
CA | 202 | CSRLIN | |
CB | 203 | OFF | |
CC | 204 | HIMEM | |
CD | 205 | THEN | |
CE | 206 | NOT | |
CF | 207 | STEP | |
D0 | 208 | + | |
D1 | 209 | - | |
D2 | 210 | * | |
D3 | 211 | / | |
D4 | 212 | ^ | |
D5 | 213 | AND | |
D6 | 214 | OR | |
D7 | 215 | XOR | |
D8 | 216 | EQV | |
D9 | 217 | IMP | |
DA | 218 | MOD | |
DB | 219 | \ | |
DC | 220 | > | |
DD | 221 | = | |
DE | 222 | < | |
DF | 223 | SGN | |
E0 | 224 | INT | |
E1 | 225 | ABS | |
E2 | 226 | FRE | |
E3 | 227 | INP | |
E4 | 228 | LPOS | |
E5 | 229 | POS | |
E6 | 230 | SQR | |
E7 | 231 | RND | |
E8 | 232 | LOG | |
E9 | 233 | EXP | |
EA | 234 | COS | |
EB | 235 | SIN | |
EC | 236 | TAN | |
ED | 237 | ATN | |
EE | 238 | PEEK | |
EF | 239 | EOF | |
F0 | 240 | LOC | |
F1 | 241 | LOF | |
F2 | 242 | CINT | |
F3 | 243 | CSNG | |
F4 | 244 | CDBL | |
F5 | 245 | FIX | |
F6 | 246 | LEN | |
F7 | 247 | STR$ | |
F8 | 248 | VAL | |
F9 | 249 | ASC | |
FA | 250 | CHR$ | |
FB | 251 | SPACE$ | |
FC | 252 | LEFT$ | |
FD | 253 | RIGHT$ | |
FE | 254 | MID$ | |
FF | 255 | ' (QUOTE) | When tokenizing, the single quote character expands to three bytes: a colon (3A), the token for REM (8E), and then FF. |
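As an illustration of the table above, the following Python sketch turns the bytes of one tokenized line back into text. It is not the ROM's algorithm: the TOKENS dictionary holds only a handful of entries copied from the table, the function name detokenize is made up for this example, and it assumes that LIST simply reverses the two special encodings noted above (3A 91 for ELSE and 3A 8E FF for a single quote).

```python
# Partial token table copied from the table above (illustration only).
TOKENS = {
    0x80: "END", 0x88: "GOTO", 0x8A: "IF", 0x8E: "REM",
    0x91: "ELSE", 0xB1: "BEEP", 0xCD: "THEN", 0xDD: "=",
}


def detokenize(body: bytes) -> str:
    """Convert the bytes of one tokenized line (without PL PH, the line
    number, or the trailing NULL) into readable text."""
    out = []
    i = 0
    while i < len(body):
        if body[i:i + 3] == b"\x3a\x8e\xff":
            # Assumption: colon + REM + FF is shown as a single quote.
            out.append("'")
            i += 3
        elif body[i:i + 2] == b"\x3a\x91":
            # Assumption: the colon added before ELSE is hidden again.
            out.append("ELSE")
            i += 2
        elif body[i] >= 0x80:
            # High bit set: a token. Unknown tokens are shown as <XX>.
            out.append(TOKENS.get(body[i], f"<{body[i]:02X}>"))
            i += 1
        else:
            # High bit clear: a plain ASCII character.
            out.append(chr(body[i]))
            i += 1
    return "".join(out)


line = bytes([0x8A, 0x20, 0x41, 0xDD, 0x31, 0x20, 0xCD, 0x20,
              0xB1, 0x20, 0x3A, 0x91, 0x20, 0x80])
print(detokenize(line))
```

A complete version would need all 128 table entries; the sketch only shows how the high bit and the two special sequences drive the decoding.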
File Format
Tokenized BASIC code is a sequence of tokenized lines. Each tokenized line has the following format:
Name | Length | Description |
---|---|---|
PL PH | 2 bytes | Address in RAM of the next line of BASIC; unsigned 16-bit integer, little-endian. In a file, PL PH is merely a placeholder and the value is never used. In RAM, PL PH == 0 marks the end of the tokenized BASIC listing. (See #Details for PL PH.) |
LL LH | 2 bytes | Line number; unsigned 16-bit integer, little-endian. |
B0 … Bn | Any number of bytes | Each byte is either a token or an ASCII character, depending on its high bit: token values run from 128 to 255, so a byte with the high bit set is a token. |
NULL | 1 byte | NULL byte (0x00) to signal end of line. |
Typically, the list of tokenized lines continues until the end of the file. Once a program is loaded into RAM, the end is signaled by two consecutive NULLs in the location where PL PH would normally start the next line.
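The record layout in the table can be walked with a few lines of Python. This is a minimal sketch, assuming the file holds nothing but line records up to its end; the function name parse_tokenized and the choice to stop quietly at an unterminated final record are assumptions of this example, not documented behavior.

```python
import struct


def parse_tokenized(data: bytes):
    """Split a tokenized BASIC file into (pl_ph, line_number, body) tuples."""
    lines = []
    pos = 0
    while pos + 4 <= len(data):
        # PL PH and LL LH are both unsigned 16-bit little-endian integers.
        pl_ph, line_no = struct.unpack_from("<HH", data, pos)
        # The body runs up to the next NULL byte after the 4-byte header.
        end = data.find(b"\x00", pos + 4)
        if end == -1:
            break  # assumption: ignore an unterminated trailing record
        lines.append((pl_ph, line_no, data[pos + 4:end]))
        pos = end + 1
    return lines


# Two lines: "10 BEEP" and "20 END" (tokens B1 and 80 from the table above).
sample = (struct.pack("<HH", 0xA007, 10) + b"\xb1\x00"
          + struct.pack("<HH", 0xA00D, 20) + b"\x80\x00")
for pl_ph, line_no, body in parse_tokenized(sample):
    print(hex(pl_ph), line_no, body.hex())
```

The search for the terminating NULL starts after the four header bytes because PL PH and the line number may themselves contain zero bytes.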
Limits
- Line numbers can range from 0 to 65529 (but see variances below).
- Tokenized line length is limited to 256 bytes, including the trailing NULL byte. (This limit has not yet been double-checked.)
Acceptable variances
While Tandy BASIC will never generate such files, it has no problem loading a tokenized program that:
- Has line numbers out of order; the numerical value of a line number is ignored. Program lines are always LISTed and RUN in the order they appear in the tokenized file.
- Contains duplicate line numbers; all lines are kept and run in the order found in the file. (GOTO is complicated as search starts at the next line, not the beginning of the program.)
- Contains a line with zero bytes of data; LIST will show the line number with no text afterward. Does not replace or delete a preexisting line with that number.
- Contains arbitrary bytes for PL PH; the value of PL PH is always recalculated at LOAD time.
- Contains line numbers greater than the nominal maximum of 65529; this can be used to store hidden binary data according to John R. Hogerhuis (https://www.mail-archive.com/m100@lists.bitchin100.com/msg16195.html).
Sidenote: while these variances are accepted by actual hardware, emulators may not be able to handle such files. For example, Virtual T (https://sourceforge.net/projects/virtualt/), as of version 1.7 in 2022, refuses to load tokenized BASIC programs that have lines out of order, duplicate lines, or bogus values for PL PH.
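When preparing a file for a strict emulator, it may help to flag these variances before loading. The sketch below is one hypothetical way to do so in Python; the function name find_variances and the wording of the messages are inventions of this example, and it checks only line numbers and empty lines, not PL PH.

```python
NOMINAL_MAX_LINE = 65529  # nominal maximum line number given under Limits


def find_variances(lines):
    """Report variances in a list of (line_number, body) pairs.

    Real hardware loads all of these, but a strict emulator may not.
    """
    problems = []
    seen = set()
    prev = None
    for line_no, body in lines:
        if line_no > NOMINAL_MAX_LINE:
            problems.append(f"line {line_no}: above nominal maximum")
        if line_no in seen:
            problems.append(f"line {line_no}: duplicate line number")
        elif prev is not None and line_no < prev:
            problems.append(f"line {line_no}: out of order")
        if len(body) == 0:
            problems.append(f"line {line_no}: zero bytes of data")
        seen.add(line_no)
        prev = line_no
    return problems


program = [(10, b"\xb1"), (30, b"\x80"), (20, b""), (20, b"\x80"), (65530, b"\x80")]
for message in find_variances(program):
    print(message)
```

An empty result means the line numbers are unique, ascending, and within the nominal range.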
Details for PL PH
In a file, the specific values of the 16-bit integer PL PH addresses are artifacts of where the file happened to be in memory before saving. The file's values are never used as Tandy BASIC always recalculates PL PH upon loading a program into RAM.
Although not technically part of the file format, one may want to calculate PL PH for compatibility with broken or persnickety emulators.
Calculating PL PH
Once in RAM, PL PH represents the absolute memory address of the next line. According to John Hogerhuis, PL PH is positioned before the line's data so the BASIC interpreter can quickly skip ahead to reach the target line of GOTO, GOSUB, THEN, ELSE, RESTORE, RESUME, RUN, LIST, LLIST, or EDIT.[note 1] Additionally, the end of a tokenized BASIC listing is marked by a final PL PH equal to zero (two NULL bytes).
Note that because the addresses are absolute, not relative, their values depend not only on the program code but on an initial offset. That offset is why identical programs may not have identical checksums (but see the bacmp program).
By default on the Tandy 200, the first token of the first line of the BASIC program in RAM is at address 0xA001. For all other Model T computers, including the Model 100, the first token is at 0x8001.
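The calculation can be sketched in Python as follows. Several points are assumptions of this example rather than facts from the page: that the first line record (starting with its PL PH field) sits at the base address, that each PL PH points at the record immediately following it, and the names assign_pl_ph, TANDY_200_BASE, and MODEL_100_BASE.

```python
import struct

TANDY_200_BASE = 0xA001  # default start address on the Tandy 200
MODEL_100_BASE = 0x8001  # default start address on the other models


def assign_pl_ph(lines, base=TANDY_200_BASE):
    """Build a file image from (line_number, body) pairs, filling in PL PH.

    Each record is PL PH (2 bytes) + line number (2 bytes) + body + NULL,
    so the next record begins len(body) + 5 bytes after the current one.
    """
    image = bytearray()
    addr = base
    for line_no, body in lines:
        addr += 2 + 2 + len(body) + 1  # address of the following record
        # Both fields are unsigned 16-bit little-endian integers.
        image += struct.pack("<HH", addr & 0xFFFF, line_no)
        image += body + b"\x00"
    return bytes(image)


image = assign_pl_ph([(10, b"\xb1"), (20, b"\x80")])
print(image.hex(" "))
```

Because the result depends on the base address, the same program built with MODEL_100_BASE yields different PL PH bytes, which is the checksum difference described above.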
- [note 1] Searching begins at the next line in the program. The claim that one can speed up commonly used subroutines by putting them at low numbered lines is plausible as the search would detect immediately that the next line's number was too high and restart the search from the beginning.
Format documentation
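- Discussion of byte format on Bitchin 100 Mailing List: https://m100.bitchin100.narkive.com/eP2uSl4J/tokenized-basic-programs-bytes-of-mystery
- BASIC program to create a table of tokens: http://www.club100.org/library/ups/tokens.do
- Hidden Powers of the TRS 80 Model 100: https://archive.org/details/HiddenPowersOfTheTrs80Model100/page/n229/mode/2up?view=theater. Appendix B lists the BASIC token values for the Model 100. Other than the omission of 255 for a single quote remark, this reference matches what hackerb9 found programmatically for the Tandy 200.
- Ayra Model 100 BASIC Reference: http://web.archive.org/web/20220324135218/https://help.ayra.ch/trs80-reference#trs-80-model-100-basic-reference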