Apple Integer BASIC tokenized file
Apple Integer BASIC, created by Apple co-founder Steve Wozniak, originated as Apple I BASIC for the Apple I hobbyist computer, the first product of the newly founded Apple company. On the Apple I, it had to be loaded from tape. When the Apple II came along the following year, it had a slightly improved version of this BASIC built into its ROM, called "Apple Integer BASIC" because it supported only integer numbers. Not long afterward, Applesoft Floating Point BASIC was licensed from Microsoft and made available to be loaded from tape or disk. Subsequent Apple models, starting with the Apple II+, had Applesoft BASIC in ROM, so Integer BASIC went out of use.
Integer BASIC programs were stored in a tokenized format, in files which were designated in Apple DOS directories as type "I".
Most other BASIC tokenizations preserve literal printable ASCII characters in the 7-bit range and use high-bit characters (#128-#255) for tokens and other special functions, sometimes also using some of the ASCII control characters in #0-#31 for special functions. Integer BASIC tokenization instead stored normal ASCII characters with the high bit set, so that a letter A (ASCII 41 hex) was stored as C1 hex, and used the 7-bit values with the high bit clear for tokens. In addition, the high-bit digit characters (B0-B9 hex, that is, ASCII digits 0-9 with the high bit set) were used as flags to signal that the next two bytes were a little-endian integer constant, except when the B0-B9 byte was preceded by an alphanumeric character (with high bit set), in which case it was considered part of a variable name.
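The byte classification described above can be sketched in Python. This is only an illustration of the rule as stated here, not a complete detokenizer: the function name `decode_line_body` and the item labels are hypothetical, and the sketch ignores any other context the interpreter may have relied on.

```python
def decode_line_body(body: bytes):
    """Classify tokenized bytes as ('char', str), ('int', n) or ('token', byte).

    Hypothetical helper: it applies only the rule described in the text.
    """
    items = []
    prev_alnum = False  # True if the previous byte was a high-bit letter or digit
    i = 0
    while i < len(body):
        b = body[i]
        if b < 0x80:
            # High bit clear: a token value.
            items.append(("token", b))
            prev_alnum = False
            i += 1
        elif 0xB0 <= b <= 0xB9 and not prev_alnum:
            # Flag byte: the next two bytes are a little-endian integer constant.
            if i + 2 >= len(body):
                raise ValueError("truncated integer constant")
            items.append(("int", body[i + 1] | (body[i + 2] << 8)))
            prev_alnum = False
            i += 3
        else:
            # High bit set: an ordinary ASCII character stored with bit 7 on.
            ch = chr(b & 0x7F)
            items.append(("char", ch))
            prev_alnum = ch.isalnum()
            i += 1
    return items


# The token value 0x03 is an arbitrary placeholder for this example.
sample = bytes([0xC1, 0xB1, 0x03, 0xB5, 0x0A, 0x00, 0x01])
decoded = decode_line_body(sample)
```

In the sample, the B1 byte follows the letter A and is therefore read as part of the name "A1", while the B5 byte follows a token and is read as a flag introducing the constant 10.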
Program lines were separated with the byte 01. The null byte 00 was not used; this can help distinguish Integer BASIC programs from S-C Assembler source files, which were also stored with file type "I" but used nulls as line separators. (Note, however, that both bytes 00 and 01 might appear as part of integer constants.)
All BASIC keywords were assigned tokens, including command keywords which were only allowed in immediate mode on command lines, and couldn't actually appear in stored programs. Some keywords and symbols have multiple tokens for them, sometimes a large number of them; this appears to distinguish different contexts and meanings of the symbol for the assistance of the interpreter, but there doesn't seem to be any clear documentation of all of these distinctions. Some of them distinguish unary (one-argument) and binary (two-argument) versions of mathematical functions.
The program file started with a two-byte little-endian integer giving the file length, and each line started with a one-byte line length (thus, lines could not exceed 255 tokenized bytes) and a two-byte little-endian integer for the line number.
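The layout above can be walked with a short Python sketch. Two details are assumptions, not statements from this description: that the file-length field counts the bytes following the two-byte header, and that each line's length byte counts the whole line (the length byte itself, the line number, the body and the terminating 01). The function name `split_lines` is hypothetical.

```python
def split_lines(data: bytes):
    """Return (declared_length, [(line_number, body_bytes), ...]).

    Assumptions (not confirmed by the text): the declared length excludes the
    2-byte header, and each line length byte covers the entire line.
    """
    if len(data) < 2:
        raise ValueError("file too short for a length header")
    declared = data[0] | (data[1] << 8)   # two-byte little-endian file length
    program = data[2:2 + declared]
    lines = []
    pos = 0
    while pos < len(program):
        length = program[pos]             # one-byte line length
        # Under the assumption above, the shortest possible line is 4 bytes:
        # length byte, two line-number bytes, and the 01 terminator.
        if length < 4 or pos + length > len(program):
            break                         # malformed data or trailing padding
        number = program[pos + 1] | (program[pos + 2] << 8)
        body = program[pos + 3:pos + length]
        lines.append((number, body))
        pos += length
    return declared, lines


# Two hand-built lines: line 10 containing "A", and an empty line 20.
example = bytes([0x09, 0x00,
                 0x05, 0x0A, 0x00, 0xC1, 0x01,
                 0x04, 0x14, 0x00, 0x01])
declared, lines = split_lines(example)
```

Because the walk relies on the length bytes and does not scan for 01, a 01 byte inside an integer constant is not mistaken for a line end.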
Blank values indicate either that the token is unused or is used for something unknown.
|00||0||HIMEM: (direct cmd)|