MS-DOS EXE

From Just Solve the File Format Problem
(Difference between revisions)
(Deleted some things moved to EXE, and restored some deleted things)
(Extended Header)
 
(17 intermediate revisions by 4 users not shown)
Line 4: Line 4:
 
|extensions={{ext|exe}}
 
|extensions={{ext|exe}}
 
|pronom={{PRONOM|x-fmt/409}}
 
|pronom={{PRONOM|x-fmt/409}}
 +
|kaitai struct=dos_mz
 
}}
 
}}
'''MS-DOS EXE''', also known as '''MZ''' format, is an executable file format used mainly by [[MS-DOS]]. It is the successor of [[DOS executable (.com)|COM]]. A number of other executable formats are extensions of it; see [[EXE]] for those formats.
+
'''MS-DOS EXE''' (or '''DOS EXE'''), also known as '''MZ''' format, is an executable file format used mainly by [[MS-DOS]]. It is the successor of [[DOS executable (.com)|COM]]. A number of other executable formats are extensions or hybrids of it; see [[EXE]] for those formats.
 +
 
 +
== Format details ==
 +
=== Header structure ===
 +
DOS EXE files begin with a fixed 28-byte header.
 +
 
 +
The field names in this table are taken from the IMAGE_DOS_HEADER structure defined in modern Windows SDKs. Byte order is little-endian.
 +
 
 +
{| class="wikitable"
 +
! Offset !! Type !! Name !! Description and remarks
 +
|-
 +
|0 || byte[2] || e_magic || Signature - ASCII "<code>MZ</code>" or "<code>ZM</code>"
 +
|-
 +
|2 || uint16 || e_cblp || If nonzero, the number of bytes in the last page
 +
|-
 +
|4 || uint16 || e_cp || Number of 512-byte pages in the file, not counting the "overlay" segment
 +
|-
 +
|6 || uint16 || e_crlc || Number of relocations
 +
|-
 +
|8 || uint16 || e_cparhdr || Header size, in 16-byte paragraphs
 +
|-
 +
|10 || uint16 || e_minalloc || Minimum allocation
 +
|-
 +
|12 || uint16 || e_maxalloc || Maximum allocation
 +
|-
 +
|14 || int16  || e_ss || Initial SS register
 +
|-
 +
|16 || uint16 || e_sp || Initial SP register
 +
|-
 +
|18 || uint16 || e_csum || Checksum - Usually unused and set to 0
 +
|-
 +
|20 || uint16 || e_ip || Initial IP register
 +
|-
 +
|22 || int16  || e_cs || Initial CS register
 +
|-
 +
|24 || uint16 || e_lfarlc || Relocation table offset, in bytes from the start of the file
 +
|-
 +
|26 || uint16 || e_ovno || Overlay number (or other custom data) - Usually unused
 +
|}
 +
 
 +
The ZM signature was used by very old versions of the Microsoft linker (dating from when DOS 1.0 was still under development). By the time PC-DOS 1.0 shipped, the ZM signature was already considered obsolete. However, DOS 1.0 accepted it for backwards compatibility, and that code was retained by all later DOS versions. Windows, however, rejects ZM and accepts only MZ.
 +
 
 +
Old versions of Microsoft's development tools calculated the checksum correctly, but DOS has always ignored it when loading EXE files. As a result, many third-party tools calculated it using the wrong algorithm, left it at zero, or set it to some other fixed value. Microsoft eventually stopped setting it in its own build tools as well (as of Microsoft LINK 5.3, which corresponds to Microsoft C/C++ 7.0, released in the early 1990s). Hence the field is commonly set in 1980s-era executables, while in executables from the 1990s onwards it is likely to be zero. (See also [https://entropymine.wordpress.com/2023/09/27/the-exe-checksum-field/ blog post with detailed analysis of checksum].)
 +
 
 +
==== Extended Header ====
 +
DOS executables don't always contain these additional fields, but Windows and OS/2 executables always do:
 +
 
 +
{| class="wikitable"
 +
! Offset !! Type !! Name !! Description and remarks
 +
|-
 +
|28 || byte[8] || e_res || Reserved bytes
 +
|-
 +
|36 || uint16 || e_oemid || OEM identifier (rarely used)
 +
|-
 +
|38 || uint16 || e_oeminfo || OEM information (rarely used, meaning depends on OEM identifier)
 +
|-
 +
|40 || byte[20] || e_res2 || Reserved bytes
 +
|-
 +
|60 || uint32 || e_lfanew || File offset of new format executable header (NE, LE, LX or PE)
 +
|}
 +
 
 +
The extended header only exists if <code>e_lfarlc</code> is greater than 28 (0x1C); some MZ executables have the relocation table starting at offset 28 and the extended header is absent. Note however that some tools which handle newer executable formats (NE/LE/LX/PE/etc) ignore <code>e_lfarlc</code> and always test the validity of the <code>e_lfanew</code> offset. In particular, although NT-based Windows requires all executables and DLLs to start with an MZ header, it only actually checks the signature and the <code>e_lfanew</code> field, and the rest can all be zeroed. Such an executable obviously will not work under DOS.
 +
 
 +
There is some conflicting information on the purpose of the start of the <code>e_res</code> field, at offset 28 (0x1C):
 +
* [https://wiki.osdev.org/MZ OSDev Wiki] claims this is "Overlay information" and that "Files sometimes contain extra information for the main's program overlay management"
 +
* The file [https://github.com/microsoft/MS-DOS/blob/main/v2.0/source/DOSSYM.ASM#L585 DOSSYM.ASM] in the open sourced MS-DOS 2.0 source code claims this is a 4 byte field called "exe_sym_tab" with comment "offset of symbol table in file". It also contains a [https://github.com/microsoft/MS-DOS/blob/main/v2.0/source/DOSSYM.ASM#L591 symbol_entry] structure definition which is likely the contents of the table. The same definitions occur in the open source MS-DOS 4.0 source code, moved to a different file ([https://github.com/microsoft/MS-DOS/blob/main/v4.0/src/INC/EXE.INC#L67 EXE.INC]). However, this field is not used anywhere in the open source DOS source code; it is unknown whether any Microsoft tools stored a symbol table offset in this field or if any executables survive with one set. Later DOS era Microsoft tooling stored the symbol table in a "CodeView trailer" at end of the executable, so if this mechanism was ever used at all, it is likely to have only been in the early-to-mid 1980s. 
 +
* [https://www.ctyme.com/intr/rb-2939.htm#Table1594 Ralf Brown Interrupt List Table 1594] reports various uses for this field, including signatures for EXE packers. Under New Executable format, it mentions a 16-bit field "behavior bits" at offset 0x20, but there is limited information available on what that means (possibly used by multitasking MS-DOS 4.0; speculatively, it may have stored some of the information that later ended up in PIF files, until MS worked out there was too much config required to just add it to the EXE). It also reports that Borland TLINK puts the bytes 0x01 0x00 at offset 0x1C; however, some executables shipped with MS-DOS have those bytes there as well, which implies that Microsoft tooling sometimes generates them too, so whatever they mean, they are probably not unique to Borland.
 +
 
 +
=== Special file positions ===
 +
When analyzing DOS EXE files, especially [[Executable envelopes|"envelope" formats]], it can be helpful to calculate certain special file positions. The positions given here are in bytes, from the start of the file.
 +
 
 +
* ''End of relocation table'': e_lfarlc + 4×e_crlc
 +
* ''Start of code image segment'': 16×e_cparhdr
 +
* ''Execution starting point'' (a.k.a. ''entry point''): 16×e_cparhdr + 16×e_cs + e_ip. Note that e_cs may be negative.
 +
* ''Start of overlay segment'' (or ''end of code image segment''): If e_cblp=0, this is 512×e_cp. Otherwise, 512×(e_cp−1) + e_cblp.
  
 
== Identification ==
 
== Identification ==
An MS-DOS EXE file begins with an ASCII signature of "{{magic|MZ}}" (or, rarely, "{{magic|ZM}}"), followed by a series of 16-bit fields. The field at offset 24 (the ''relocation table offset'') is ''usually'' (but apparently not always) less than 64, and at least 28. A value of 64 or more, or 0, suggests the format may not be MS-DOS EXE.
+
See [[EXE#Identification]] for EXE format in general.
 +
 
 +
It's not clear if there is any completely reliable way to identify a file as strictly DOS EXE, except in the negative (i.e., it looks like EXE, and is not a valid [[NE]], [[PE]], etc., file).
 +
 
 +
If the relocation table offset is from 28 to 63, or any segment (relocation table or code image) overlaps the four bytes starting at offset 60, it is pretty certainly DOS EXE.
 +
 
 +
Most non-DOS EXE files set the relocation table offset to 64, but it's probably not safe to rely on that.
  
It's not clear whether there is any completely reliable way to identify an MS-DOS EXE, except in the negative (i.e. it begins with "MZ", and is not a valid [[NE]], [[PE]], etc., file).
+
== Sample files ==
 +
* {{DexvertSamples|executable/exe}}
  
 
== Links ==
 
== Links ==
Line 21: Line 103:
  
 
[[Category:Microsoft]]
 
[[Category:Microsoft]]
 +
[[Category:MS-DOS]]

Latest revision as of 21:49, 4 September 2024

File Format
Name MS-DOS EXE
Ontology
Extension(s) .exe
PRONOM x-fmt/409
Kaitai Struct Spec dos_mz.ksy

MS-DOS EXE (or DOS EXE), also known as MZ format, is an executable file format used mainly by MS-DOS. It is the successor of COM. A number of other executable formats are extensions or hybrids of it; see EXE for those formats.

Format details

Header structure

DOS EXE files begin with a fixed 28-byte header.

The field names in this table are taken from the IMAGE_DOS_HEADER structure defined in modern Windows SDKs. Byte order is little-endian.

Offset Type Name Description and remarks
0 byte[2] e_magic Signature - ASCII "MZ" or "ZM"
2 uint16 e_cblp If nonzero, the number of bytes in the last page
4 uint16 e_cp Number of 512-byte pages in the file, not counting the "overlay" segment
6 uint16 e_crlc Number of relocations
8 uint16 e_cparhdr Header size, in 16-byte paragraphs
10 uint16 e_minalloc Minimum allocation
12 uint16 e_maxalloc Maximum allocation
14 int16 e_ss Initial SS register
16 uint16 e_sp Initial SP register
18 uint16 e_csum Checksum - Usually unused and set to 0
20 uint16 e_ip Initial IP register
22 int16 e_cs Initial CS register
24 uint16 e_lfarlc Relocation table offset, in bytes from the start of the file
26 uint16 e_ovno Overlay number (or other custom data) - Usually unused
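
The table above maps directly onto a fixed-size binary record. The following is a minimal sketch in Python, assuming the whole file has already been read into a bytes object; the names <code>MZ_HEADER</code>, <code>MzHeader</code> and <code>parse_mz_header</code> are hypothetical and are not taken from any existing tool.

```python
import struct
from collections import namedtuple

# Layout follows the header table: "<" selects little-endian, "2s" is the
# 2-byte signature, "H" is uint16 and "h" is int16 (used for e_ss and e_cs).
MZ_HEADER = struct.Struct("<2s6HhHHHhHH")

MzHeader = namedtuple(
    "MzHeader",
    "e_magic e_cblp e_cp e_crlc e_cparhdr e_minalloc e_maxalloc "
    "e_ss e_sp e_csum e_ip e_cs e_lfarlc e_ovno",
)


def parse_mz_header(data):
    """Decode the fixed 28-byte header from the start of `data`."""
    if len(data) < MZ_HEADER.size:
        raise ValueError("file too short for a 28-byte header")
    header = MzHeader._make(MZ_HEADER.unpack_from(data, 0))
    # Both signatures are accepted here, as DOS does; Windows accepts only MZ.
    if header.e_magic not in (b"MZ", b"ZM"):
        raise ValueError("missing MZ/ZM signature")
    return header
```

Because <code>e_ss</code> and <code>e_cs</code> are decoded as signed values, they can be used as-is in offset arithmetic.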

The ZM signature was used by very old versions of the Microsoft linker (dating from when DOS 1.0 was still under development). By the time PC-DOS 1.0 shipped, the ZM signature was already considered obsolete. However, DOS 1.0 accepted it for backwards compatibility, and that code was retained by all later DOS versions. Windows, however, rejects ZM and accepts only MZ.

Old versions of Microsoft's development tools calculated the checksum correctly, but DOS has always ignored it when loading EXE files. As a result, many third-party tools calculated it using the wrong algorithm, left it at zero, or set it to some other fixed value. Microsoft eventually stopped setting it in its own build tools as well (as of Microsoft LINK 5.3, which corresponds to Microsoft C/C++ 7.0, released in the early 1990s). Hence the field is commonly set in 1980s-era executables, while in executables from the 1990s onwards it is likely to be zero. (See also blog post with detailed analysis of checksum.)

Extended Header

DOS executables don't always contain these additional fields, but Windows and OS/2 executables always do:

Offset Type Name Description and remarks
28 byte[8] e_res Reserved bytes
36 uint16 e_oemid OEM identifier (rarely used)
38 uint16 e_oeminfo OEM information (rarely used, meaning depends on OEM identifier)
40 byte[20] e_res2 Reserved bytes
60 uint32 e_lfanew File offset of new format executable header (NE, LE, LX or PE)

The extended header only exists if e_lfarlc is greater than 28 (0x1C); some MZ executables have the relocation table starting at offset 28 and the extended header is absent. Note however that some tools which handle newer executable formats (NE/LE/LX/PE/etc) ignore e_lfarlc and always test the validity of the e_lfanew offset. In particular, although NT-based Windows requires all executables and DLLs to start with an MZ header, it only actually checks the signature and the e_lfanew field, and the rest can all be zeroed. Such an executable obviously will not work under DOS.
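
The two behaviours described above (honouring e_lfarlc versus ignoring it) can be sketched as a single function with a switch. This is an illustration only: the function name, the <code>check_e_lfarlc</code> parameter and the "offset must lie inside the file" plausibility test are assumptions made for this example, not a description of how any particular loader works.

```python
import struct


def read_e_lfanew(data, check_e_lfarlc=True):
    """Return e_lfanew, or None if no usable extended header is found."""
    # The extended header ends at offset 64, so a shorter file cannot have one.
    if len(data) < 64 or data[:2] not in (b"MZ", b"ZM"):
        return None
    if check_e_lfarlc:
        (e_lfarlc,) = struct.unpack_from("<H", data, 24)
        # A relocation table at offset 28 or lower leaves no room
        # for the extended fields.
        if e_lfarlc <= 28:
            return None
    (e_lfanew,) = struct.unpack_from("<I", data, 60)
    # Assumed plausibility test: the offset is nonzero and a 4-byte
    # signature read at that offset would stay inside the file.
    if e_lfanew == 0 or e_lfanew + 4 > len(data):
        return None
    return e_lfanew
```

Passing <code>check_e_lfarlc=False</code> mimics tools that look only at the signature and e_lfanew, so it also accepts headers whose other fields are zeroed.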

There is some conflicting information on the purpose of the start of the e_res field, at offset 28 (0x1C):

  • OSDev Wiki claims this is "Overlay information" and that "Files sometimes contain extra information for the main's program overlay management"
  • The file DOSSYM.ASM in the open sourced MS-DOS 2.0 source code claims this is a 4 byte field called "exe_sym_tab" with comment "offset of symbol table in file". It also contains a symbol_entry structure definition which is likely the contents of the table. The same definitions occur in the open source MS-DOS 4.0 source code, moved to a different file (EXE.INC). However, this field is not used anywhere in the open source DOS source code; it is unknown whether any Microsoft tools stored a symbol table offset in this field or if any executables survive with one set. Later DOS era Microsoft tooling stored the symbol table in a "CodeView trailer" at end of the executable, so if this mechanism was ever used at all, it is likely to have only been in the early-to-mid 1980s.
  • Ralf Brown Interrupt List Table 1594 reports various uses for this field, including signatures for EXE packers. Under New Executable format, it mentions a 16-bit field "behavior bits" at offset 0x20, but there is limited information available on what that means (possibly used by multitasking MS-DOS 4.0; speculatively, it may have stored some of the information that later ended up in PIF files, until MS worked out there was too much config required to just add it to the EXE). It also reports that Borland TLINK puts the bytes 0x01 0x00 at offset 0x1C; however, some executables shipped with MS-DOS have those bytes there as well, which implies that Microsoft tooling sometimes generates them too, so whatever they mean, they are probably not unique to Borland.

Special file positions

When analyzing DOS EXE files, especially "envelope" formats, it can be helpful to calculate certain special file positions. The positions given here are in bytes, from the start of the file.

  • End of relocation table: e_lfarlc + 4×e_crlc
  • Start of code image segment: 16×e_cparhdr
  • Execution starting point (a.k.a. entry point): 16×e_cparhdr + 16×e_cs + e_ip. Note that e_cs may be negative.
  • Start of overlay segment (or end of code image segment): If e_cblp=0, this is 512×e_cp. Otherwise, 512×(e_cp−1) + e_cblp.
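
The formulas in the list above can be written out as a short Python sketch. The function name and the dictionary keys are hypothetical; e_cs is assumed to have been decoded as a signed 16-bit value, as in the header table.

```python
def special_positions(e_cblp, e_cp, e_crlc, e_cparhdr, e_ip, e_cs, e_lfarlc):
    """Return the special file positions, in bytes from the start of the file."""
    code_start = 16 * e_cparhdr
    if e_cblp == 0:
        # The last page is full.
        overlay_start = 512 * e_cp
    else:
        # The last page holds only e_cblp bytes.
        overlay_start = 512 * (e_cp - 1) + e_cblp
    return {
        "reloc_end": e_lfarlc + 4 * e_crlc,
        # e_cs may be negative, which places the entry point before
        # 16*e_cparhdr + e_ip.
        "entry_point": code_start + 16 * e_cs + e_ip,
        "code_start": code_start,
        "overlay_start": overlay_start,
    }
```

For example, with e_cparhdr=2, e_cs=−1 and e_ip=0x20, the entry point is 32 − 16 + 32 = 48.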

Identification

See EXE#Identification for EXE format in general.

It's not clear if there is any completely reliable way to identify a file as strictly DOS EXE, except in the negative (i.e., it looks like EXE, and is not a valid NE, PE, etc., file).

If the relocation table offset is from 28 to 63, or any segment (relocation table or code image) overlaps the four bytes starting at offset 60, it is pretty certainly DOS EXE.

Most non-DOS EXE files set the relocation table offset to 64, but it's probably not safe to rely on that.
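
The positive test described above might be sketched as follows. This is one possible reading of the heuristic, not a definitive detector: the function name is hypothetical, an empty relocation table is assumed not to overlap anything, and a result of False means only "inconclusive", since the file may still be a DOS EXE.

```python
def is_almost_certainly_dos_exe(e_cblp, e_cp, e_crlc, e_cparhdr, e_lfarlc):
    """True if the header leaves no room for a valid e_lfanew field."""

    def overlaps_lfanew(start, end):
        # True if the non-empty half-open range [start, end)
        # covers any of the bytes at offsets 60..63.
        return start < end and start < 64 and end > 60

    if 28 <= e_lfarlc <= 63:
        return True

    reloc_end = e_lfarlc + 4 * e_crlc
    code_start = 16 * e_cparhdr
    code_end = 512 * e_cp if e_cblp == 0 else 512 * (e_cp - 1) + e_cblp
    return (overlaps_lfanew(e_lfarlc, reloc_end)
            or overlaps_lfanew(code_start, code_end))
```

A file that fails this test should still be checked against the newer formats (NE, PE, etc.) before being classified.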

Sample files

Links
