DIET (compression)

Released 1990

DIET is a compression utility for DOS, developed by Teddy Matsumoto, that handles both executables and ordinary files. As an executable compressor, it compresses EXE files (to EXE) and COM files (to EXE or COM).

It can also compress arbitrary data files. Such files can be transparently decompressed by DIET's TSR utility.

Both compressed executables and compressed data files can be decompressed with DIET's -RA option.

Technical notes

Researchers should note that DIET's behavior depends on the cluster size of the filesystem it is working on. Unless this behavior is turned off with the -B option (introduced in v1.10a), DIET will probably decide not to compress most of your files.

Format details

Roughly speaking, the known versions of DIET can be grouped into three format "eras": 1.00-1.00d, 1.02b-1.20, and 1.44-1.45f. Multiplied by the three file types (EXE, COM, data), that makes about 9 different DIET file formats.

Most of the formats contain a common 11-byte header preceding the compressed data:

Offset  Size (bytes)  Description
+0      3             Signature: ASCII "dlz"
+3      1             Flags, and high 4 bits of compressed size
+4      2             Low 16 bits of compressed size
+6      2             CRC-16/ARC of compressed data
+8      1             High 6 bits of original size
+9      2             Low 16 bits of original size
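
As a rough illustration of the table above, the sketch below unpacks the 11-byte header and checks the CRC. Several things here are assumptions rather than facts from this page: multi-byte fields are read as little-endian; the high 4 bits of the compressed size are taken from the low nibble of the byte at +3 (the high nibble being treated as flags); the high 6 bits of the original size are taken from the top six bits of the byte at +8; and the compressed data is assumed to start immediately after the header. The names `crc16_arc`, `parse_dlz_header` and `crc_matches` are hypothetical. If real files disagree, the shifts and masks are the first thing to adjust.

```python
import struct


def crc16_arc(data: bytes) -> int:
    """CRC-16/ARC: reflected polynomial 0xA001, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def parse_dlz_header(buf: bytes, pos: int = 0) -> dict:
    """Decode the 11-byte "dlz" header found at buf[pos:]."""
    hdr = buf[pos:pos + 11]
    if len(hdr) < 11 or hdr[:3] != b"dlz":
        raise ValueError("no 'dlz' header at this position")
    # Assumption: little-endian, as is usual for DOS formats.
    b3, cmpr_lo, crc, b8, orig_lo = struct.unpack("<BHHBH", hdr[3:])
    return {
        # Assumption: high nibble = flags, low nibble = size bits 16..19.
        "flags": b3 & 0xF0,
        "compressed_size": ((b3 & 0x0F) << 16) | cmpr_lo,
        "crc": crc,
        # Assumption: top six bits of this byte = size bits 16..21.
        "original_size": ((b8 >> 2) << 16) | orig_lo,
    }


def crc_matches(buf: bytes, pos: int = 0) -> bool:
    """Check the stored CRC against the bytes that follow the header."""
    info = parse_dlz_header(buf, pos)
    start = pos + 11  # assumption: data begins right after the header
    payload = buf[start:start + info["compressed_size"]]
    return len(payload) == info["compressed_size"] and crc16_arc(payload) == info["crc"]
```

`crc16_arc(b"123456789")` returns 0xBB3D, the standard check value for CRC-16/ARC, which is a convenient way to confirm the CRC routine before pointing it at real files.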

There is also a two-byte signature, 0x9d 0x89, that appears in most of the formats.

Identification

For what it's worth, the newer versions of DIET detect compressed files by searching for the byte sequence 0x9d 0x89, and ASCII "dlz", in the first 126 bytes of the file. Both must appear, in that order. This works for the newer formats, but not for all of the older ones.
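
The check described above can be approximated in a few lines. This is only a sketch of the rule as stated here, not DIET's actual code: the function name `looks_like_diet` is hypothetical, and it assumes that both byte sequences must lie entirely within the first 126 bytes.

```python
def looks_like_diet(data: bytes, window: int = 126) -> bool:
    """Return True if 9d 89 and then "dlz" both occur in the first `window` bytes."""
    head = data[:window]
    sig = head.find(b"\x9d\x89")
    if sig < 0:
        return False
    # "dlz" must come after the two-byte signature. Starting from the first
    # 9d 89 is enough: any later 9d 89 followed by "dlz" implies this one is too.
    return head.find(b"dlz", sig + 2) >= 0
```

As the text notes, a negative result does not rule out DIET: the oldest formats lack one or both markers, so this test is best used as a first pass before the version-specific checks.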

Identification of EXE files

Below are some version-specific characteristics of DIET-compressed EXE files.

Some DIET-compressed EXE files have 9d 89 in the EXE checksum field at offset 18 (see the header structure described in MS-DOS EXE), and some have ASCII "diet" in the unused bytes at offset 28. These signatures might be less reliable than other means of identifying DIET format, as they could be modified.

Also, be aware of LGLZ format, which can be mistaken for DIET.

Let "8e db 8e..." be the byte sequence 8e db 8e c0 33 f6 33 ff b9 08 00 f3 a5 4b 48 4a.

v1.00-1.00d:

  • 03 00 at offset 20 (the initial IP field of the EXE header)
  • 8e db 8e... at offset 55

v1.02b-1.20:

  • 9d 89 at offset 18
  • 8e db 8e... at offset 52
  • "dlz" at offset 87

v1.44:

  • 9d 89 at offset 18
  • "diet" at offset 28
  • 8e db 8e... at offset 72
  • "dlz" at offset 107

v1.45f:

  • 9d 89 at offset 18
  • "diet" at offset 28
  • 8e db 8e... at offset 77
  • "dlz" at offset 108

Identification of COM files

v1.00-1.00d: Files start with bf, and have fd f3 a5 fc 8b f7 bf 00 at offset 17. Note: The CRC field is at offset 35, and the compressed data starts at offset 37.

v1.02b-1.20: Files start with be, have fd f3 a5 fc 8b f7 bf 00 at offset 17, and 'd' 'l' 'z' at offset 35.

v1.44-1.45f: Files start with f9, have 9d 89 at offset 10, and 'd' 'l' 'z' at offset 65.
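
The three COM cases can be checked the same way. This sketch simply transcribes the offsets given above; `COM_RULES` and `classify_diet_com` are hypothetical names. Since COM files have no header of their own, nothing here confirms that the input is a COM file in the first place.

```python
# Code sequence shared by the two older COM formats.
COPY_LOOP = bytes.fromhex("fd f3 a5 fc 8b f7 bf 00")

# (version label, [(offset, expected bytes), ...]) taken from the text above.
COM_RULES = [
    ("1.02b-1.20", [(0, b"\xbe"), (17, COPY_LOOP), (35, b"dlz")]),
    ("1.00-1.00d", [(0, b"\xbf"), (17, COPY_LOOP)]),
    ("1.44-1.45f", [(0, b"\xf9"), (10, b"\x9d\x89"), (65, b"dlz")]),
]


def classify_diet_com(data: bytes):
    """Return a version label for a DIET-compressed COM file, or None."""
    for label, checks in COM_RULES:
        if all(data[off:off + len(pat)] == pat for off, pat in checks):
            return label
    return None
```

The v1.00-1.00d rule is the weakest of the three, since it relies on only nine bytes and has no "dlz" marker; treating such a match as tentative seems prudent.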

Identification of data files

v1.00-1.00d: Files start with bytes b4 4c cd 21 9d 89. Note: The CRC field is at offset 6, and the compressed data starts at offset 8.

v1.02b-1.20: Files start with bytes 9d 89 'd' 'l' 'z'.

v1.44-1.45f: Files start with bytes b4 4c cd 21 9d 89 'd' 'l' 'z'.
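
For data files, a prefix comparison is enough, but the order of the tests matters: the v1.44-1.45f prefix begins with the same six bytes as the v1.00-1.00d prefix, so the longer one has to be tried first. A small sketch, with hypothetical names `DATA_PREFIXES` and `classify_diet_data`:

```python
EXIT_STUB = bytes.fromhex("b4 4c cd 21 9d 89")

# Longest prefix first, so the newest format is not reported as the oldest.
DATA_PREFIXES = [
    ("1.44-1.45f", EXIT_STUB + b"dlz"),
    ("1.02b-1.20", b"\x9d\x89dlz"),
    ("1.00-1.00d", EXIT_STUB),
]


def classify_diet_data(data: bytes):
    """Return a version label for a DIET-compressed data file, or None."""
    for label, prefix in DATA_PREFIXES:
        if data.startswith(prefix):
            return label
    return None
```

One caveat follows from the layout described above: in a v1.00-1.00d file, offsets 6 to 8 hold the CRC and the first byte of compressed data, so a file whose bytes there happen to spell "dlz" would be reported as v1.44-1.45f by this sketch.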

See also

Specifications

  • DIET v1.02b → DIETTECH.DOC [Possibly an unfinished draft -- lots of errors.]

Software

DIET:

Decompression:

Sample files
