MS-DOS EXE
Latest revision as of 21:49, 4 September 2024
MS-DOS EXE (or DOS EXE), also known as MZ format, is an executable file format used mainly by MS-DOS. It is the successor of COM. A number of other executable formats are extensions or hybrids of it; see EXE for those formats.
Format details
Header structure
DOS EXE files begin with a fixed 28-byte header.
The field names in this table are taken from the IMAGE_DOS_HEADER structure defined in modern Windows SDKs. Byte order is little-endian.
Offset | Type | Name | Description and remarks |
---|---|---|---|
0 | byte[2] | e_magic | Signature - ASCII "MZ" or "ZM" |
2 | uint16 | e_cblp | If nonzero, the number of bytes in the last page |
4 | uint16 | e_cp | Number of 512-byte pages in the file, not counting the "overlay" segment |
6 | uint16 | e_crlc | Number of relocations |
8 | uint16 | e_cparhdr | Header size, in 16-byte paragraphs |
10 | uint16 | e_minalloc | Minimum allocation |
12 | uint16 | e_maxalloc | Maximum allocation |
14 | int16 | e_ss | Initial SS register |
16 | uint16 | e_sp | Initial SP register |
18 | uint16 | e_csum | Checksum - Usually unused and set to 0 |
20 | uint16 | e_ip | Initial IP register |
22 | int16 | e_cs | Initial CS register |
24 | uint16 | e_lfarlc | Relocation table offset, in bytes from the start of the file |
26 | uint16 | e_ovno | Overlay number (or other custom data) - Usually unused |
The ZM signature was used by very old versions of the Microsoft linker, dating from when DOS 1.0 was still under development. By the time PC-DOS 1.0 shipped, the ZM signature was already considered obsolete. However, DOS 1.0 accepted it for backwards compatibility, and that code was retained by all later DOS versions. Windows, by contrast, rejects ZM and accepts only MZ.
Old versions of Microsoft's development tools calculated the checksum correctly, but DOS has always ignored it when loading EXE files. As a result, many third-party tools either calculated it using the wrong algorithm or left it at zero or some other fixed value. Microsoft eventually gave up and stopped setting it in its own build tools too (starting with Microsoft LINK 5.3, which corresponds to Microsoft C/C++ 7.0, released in the early 1990s). Hence the checksum is commonly set in 1980s-era executables, but is likely zero in executables from the 1990s onwards. See also this blog post with a detailed analysis of the checksum: https://entropymine.wordpress.com/2023/09/27/the-exe-checksum-field/
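The header table above maps directly onto a fixed-size binary record. The following is a minimal Python sketch of reading it, not a reference implementation; the names MZ_HEADER, MzHeader and parse_mz_header are made up for this example. It reads e_ss and e_cs as signed values, following the int16 types given in the table, and accepts both signatures because DOS does.

```python
import struct
from collections import namedtuple

# "<" selects little-endian with no padding. Field order follows the header
# table: a 2-byte signature followed by thirteen 16-bit fields (28 bytes).
# e_ss and e_cs use "h" (signed), matching the int16 types in the table.
MZ_HEADER = struct.Struct("<2sHHHHHHhHHHhHH")

MzHeader = namedtuple(
    "MzHeader",
    "e_magic e_cblp e_cp e_crlc e_cparhdr e_minalloc e_maxalloc "
    "e_ss e_sp e_csum e_ip e_cs e_lfarlc e_ovno",
)


def parse_mz_header(data):
    """Parse the fixed 28-byte header at the start of `data` (a bytes object)."""
    if len(data) < MZ_HEADER.size:
        raise ValueError("too short to contain a DOS EXE header")
    header = MzHeader._make(MZ_HEADER.unpack_from(data, 0))
    # DOS accepts both signatures; a Windows-style check would accept only b"MZ".
    if header.e_magic not in (b"MZ", b"ZM"):
        raise ValueError("missing MZ/ZM signature")
    return header
```

A caller would typically pass the first 28 (or more) bytes of the file, for example parse_mz_header(open(path, "rb").read(64)), where path is whatever file is being examined.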
Extended Header
DOS executables don't always contain these additional fields, but Windows and OS/2 executables always do:
Offset | Type | Name | Description and remarks |
---|---|---|---|
28 | byte[8] | e_res | Reserved bytes |
36 | uint16 | e_oemid | OEM identifier (rarely used) |
38 | uint16 | e_oeminfo | OEM information (rarely used, meaning depends on OEM identifier) |
40 | byte[20] | e_res2 | Reserved bytes |
60 | uint32 | e_lfanew | File offset of new format executable header (NE, LE, LX or PE) |
The extended header only exists if e_lfarlc is greater than 28 (0x1C); some MZ executables have the relocation table starting at offset 28 and the extended header is absent. Note however that some tools which handle newer executable formats (NE/LE/LX/PE/etc) ignore e_lfarlc and always test the validity of the e_lfanew offset. In particular, although NT-based Windows requires all executables and DLLs to start with an MZ header, it only actually checks the signature and the e_lfanew field, and the rest can all be zeroed. Such an executable obviously will not work under DOS.
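The two behaviours described above (honouring e_lfarlc, or ignoring it and only testing e_lfanew) can be sketched as a single function with a switch. This is an illustration only: the function name find_new_header and the trust_e_lfarlc parameter are invented here, and "testing the validity of e_lfanew" is assumed to mean checking for one of the NE, LE, LX or PE signatures at that offset, which real tools may do differently.

```python
import struct

# Two-byte signatures of the new-format headers named in the table above.
NEW_FORMAT_SIGNATURES = (b"NE", b"LE", b"LX", b"PE")


def find_new_header(data, trust_e_lfarlc=True):
    """Return the e_lfanew offset if it points at a known signature, else None.

    With trust_e_lfarlc=True, the extended header is treated as absent unless
    e_lfarlc is greater than 28. With trust_e_lfarlc=False, e_lfarlc is ignored
    and only the signature found at e_lfanew is tested.
    """
    if len(data) < 64 or data[:2] not in (b"MZ", b"ZM"):
        return None
    (e_lfarlc,) = struct.unpack_from("<H", data, 24)
    if trust_e_lfarlc and e_lfarlc <= 28:
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 60)
    # Slicing past the end of the data yields b"", which matches nothing.
    if data[e_lfanew:e_lfanew + 2] in NEW_FORMAT_SIGNATURES:
        return e_lfanew
    return None
```

For a file whose MZ header is zeroed apart from the signature and e_lfanew, the default mode returns None while trust_e_lfarlc=False still locates the new header, which mirrors the difference between the two kinds of tools.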
There is some conflicting information on the purpose of the start of the e_res field, at offset 28 (0x1C):
- OSDev Wiki claims this is "Overlay information" and that "Files sometimes contain extra information for the main's program overlay management"
- The file DOSSYM.ASM in the open sourced MS-DOS 2.0 source code claims this is a 4 byte field called "exe_sym_tab" with comment "offset of symbol table in file". It also contains a symbol_entry structure definition which is likely the contents of the table. The same definitions occur in the open source MS-DOS 4.0 source code, moved to a different file (EXE.INC). However, this field is not used anywhere in the open source DOS source code; it is unknown whether any Microsoft tools stored a symbol table offset in this field or if any executables survive with one set. Later DOS era Microsoft tooling stored the symbol table in a "CodeView trailer" at end of the executable, so if this mechanism was ever used at all, it is likely to have only been in the early-to-mid 1980s.
- Ralf Brown Interrupt List Table 1594 reports various uses for this field, including signatures for EXE packers. Under New Executable format, it mentions a 16-bit "behavior bits" field at offset 0x20, but little information is available on its meaning. It was possibly used by multitasking MS-DOS 4.0; speculatively, it may have stored some of the information that later ended up in PIF files, until Microsoft worked out that too much configuration was required to fit in the EXE. The table also reports that Borland TLINK puts the bytes 0x01 0x00 at offset 0x1C. However, some executables shipped with MS-DOS have those bytes there as well, which implies that Microsoft tooling sometimes generates them too, so whatever they mean, they are probably not unique to Borland.
Special file positions
When analyzing DOS EXE files, especially "envelope" formats, it can be helpful to calculate certain special file positions. The positions given here are in bytes, from the start of the file.
- End of relocation table: e_lfarlc + 4×e_crlc
- Start of code image segment: 16×e_cparhdr
- Execution starting point (a.k.a. entry point): 16×e_cparhdr + 16×e_cs + e_ip. Note that e_cs may be negative.
- Start of overlay segment (or end of code image segment): If e_cblp=0, this is 512×e_cp. Otherwise, 512×(e_cp−1) + e_cblp.
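The formulas in the list above can be written out as a short Python sketch. The function name special_positions and the dictionary keys are chosen for this example only; the arithmetic follows the list item by item, with e_cs read as a signed value so that a negative segment is handled.

```python
import struct


def special_positions(data):
    """Compute the special file positions (in bytes) from a DOS EXE header."""
    if len(data) < 28:
        raise ValueError("too short to contain a DOS EXE header")
    # Offsets 2..9: e_cblp, e_cp, e_crlc, e_cparhdr (all unsigned 16-bit).
    e_cblp, e_cp, e_crlc, e_cparhdr = struct.unpack_from("<HHHH", data, 2)
    # Offsets 20..25: e_ip (unsigned), e_cs (signed), e_lfarlc (unsigned).
    e_ip, e_cs, e_lfarlc = struct.unpack_from("<HhH", data, 20)

    code_start = 16 * e_cparhdr
    if e_cblp == 0:
        overlay_start = 512 * e_cp
    else:
        overlay_start = 512 * (e_cp - 1) + e_cblp

    return {
        "relocation_table_end": e_lfarlc + 4 * e_crlc,
        "code_start": code_start,
        "entry_point": code_start + 16 * e_cs + e_ip,
        "overlay_start": overlay_start,
    }
```

As a worked case, a header with e_cparhdr = 4, e_cs = -1 and e_ip = 0x10 gives a code start of 64 and an entry point of 64 - 16 + 16 = 64.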
Identification
See EXE#Identification for EXE format in general.
It's not clear if there is any completely reliable way to identify a file as strictly DOS EXE, except in the negative (i.e., it looks like EXE, and is not a valid NE, PE, etc., file).
If the relocation table offset is from 28 to 63, or any segment (relocation table or code image) overlaps the four bytes starting at offset 60 (where e_lfanew would otherwise be stored), it is almost certainly DOS EXE.
Most non-DOS EXE files set the relocation table offset to 64, but it's probably not safe to rely on that.
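The positive test described above can be sketched in Python as follows. This is a heuristic illustration under the stated rules, not a reliable identifier: the names overlaps_e_lfanew and is_almost_certainly_dos_exe are invented for this example, and a False result only means the test was inconclusive, not that the file is some other EXE format.

```python
import struct


def overlaps_e_lfanew(start, end):
    """True if the non-empty byte range [start, end) touches offsets 60..63."""
    return start < end and start < 64 and end > 60


def is_almost_certainly_dos_exe(data):
    """Apply the layout test: True means plain DOS EXE, False is inconclusive."""
    if len(data) < 28 or data[:2] not in (b"MZ", b"ZM"):
        return False
    e_cblp, e_cp, e_crlc, e_cparhdr = struct.unpack_from("<HHHH", data, 2)
    (e_lfarlc,) = struct.unpack_from("<H", data, 24)

    # Rule 1: relocation table offset from 28 to 63.
    if 28 <= e_lfarlc <= 63:
        return True

    # Rule 2: the relocation table or the code image overlaps offsets 60..63.
    reloc_end = e_lfarlc + 4 * e_crlc
    code_start = 16 * e_cparhdr
    code_end = 512 * e_cp if e_cblp == 0 else 512 * (e_cp - 1) + e_cblp
    return (overlaps_e_lfanew(e_lfarlc, reloc_end)
            or overlaps_e_lfanew(code_start, code_end))
```

Because most non-DOS EXE files set the relocation table offset to 64, they fall through both rules and yield False; a caller would then need to inspect the file for a valid NE, PE or similar header before drawing any conclusion.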
Sample files
Links
- Wikipedia article
- MZ, from the OSDev Wiki
- http://www.delorie.com/djgpp/doc/exe/
- DOS EXE format
- EXE Explorer utility
- Ralf Brown's Interrupt Reference has an extensive list of (mostly older) MZ-based executable formats