DOC
Revision by Dan Tobias: added sample files and moved section (28 intermediate revisions by 6 users not shown)
Latest revision as of 15:18, 28 December 2023
The MS Word DOC format is a family of formats used by older versions of MS Word. Since Office 2007, Word has used DOCX as its default format.
File Types
According to Wikipedia, the following four types exist:
- Word for DOS
- Word 1 & Word 2 for MS Windows, and Word 4 & 5 for Mac
- Word 6 & Word 95 for MS Windows, and Word 6 for Mac
- Word 97, 2000, 2002, 2003, 2007 & 2010 for MS Windows, and Word 98, 2001, X, & 2004 for Mac
There is also the closely-related DOT format used for Word templates.
WordPad, the program that comes with current versions of Windows, saves in Word 6 format.
What's up, DOC?
A DOC extension does not guarantee that a file is an MS Word document, although a reasonably recent file probably is one. Older files, such as those from the 1980s or 1990s, might be something else entirely. Several other word processors of that era used the .DOC extension even though their formats were nothing like MS Word's. It was also fairly common to save plain text files with that extension when DOCumenting something, such as the instructions for a program packed in an ARC or ZIP file for download from a bulletin board system (BBS). It is still worth trying to open such files with Word (which is what normally happens in Windows if Word is installed and you double-click a DOC file), since Word opens plain-text files without trouble, even ancient ones.
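A quick look at the first bytes of a file can help decide what a .DOC file really is before opening it. The following is a minimal sketch, not a complete identifier: it assumes only two well-known signatures (the Compound File header used by the Word 97-2003 binary format, and the ZIP header used by DOCX) plus a rough plain-text heuristic. The function names, the 4096-byte sample size, and the 95% threshold are illustrative choices, and older Word formats or other word processors' files will simply come back as "unknown".

```python
# Signatures assumed here; many legacy formats are deliberately NOT covered.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File header (Word 97-2003 .doc)
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header (e.g. DOCX)


def looks_like_text(sample: bytes) -> bool:
    """Rough heuristic: no NUL bytes and mostly printable ASCII or whitespace."""
    if not sample or b"\x00" in sample:
        return False
    printable = sum(1 for b in sample if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(sample) > 0.95  # threshold is an arbitrary assumption


def guess_doc_kind(path) -> str:
    """Return 'compound-file', 'zip', 'plain-text', or 'unknown' for a file."""
    with open(path, "rb") as fh:
        head = fh.read(4096)  # only the start of the file is inspected
    if head.startswith(CFB_MAGIC):
        return "compound-file"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if looks_like_text(head):
        return "plain-text"
    return "unknown"
```

A result of "compound-file" only identifies the container: other Office formats use the same container, so it does not prove the content is a Word document.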
Opening Word for DOS files in a modern Microsoft Word
Current versions of Microsoft Word can no longer open Word for DOS files natively. However, such files can be imported with an additional converter for Word.
- Download the file ftp://ftp.microsoft.com/Softlib/MSLFILES/WDSUPCNV.EXE, open it (it is a self-extracting zip file), and select a directory to save the files.
- Copy all the resulting *.cnv files (but most importantly Doswrd32.cnv) to C:\Program Files (x86)\Common Files\microsoft shared\TextConv (for users with a 32-bit Windows it is just C:\Program Files\Common Files\microsoft shared\TextConv).
- (Re)Start Microsoft Word.
- Open the old Word file via the Open-dialog within Word.
- Word will show a prompt informing you that a text converter has to be started and that this may pose a security risk, so you should proceed only if you trust the source of the files. If you do, press OK.
- Word will most likely show a prompt like "Style Sheet D:/STANDARD.DFV not found". Press OK. A file selector dialog will then open, asking you to select a style sheet (*.sty) file. If you have a style sheet for the file, select it. Otherwise, create a new empty file in Windows Explorer, rename it "empty.sty", and select it in the file selector. Selecting such an empty file may cause the opened file to lose some general properties, such as print margins.
- Now you should be able to see the Word for DOS file within the modern Microsoft Word.
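The copy step above can be scripted. This is a minimal sketch, assuming the converter archive has already been extracted to a directory of your choice; the function name `copy_converters` and its parameters are hypothetical, and on a real system writing into Program Files will usually require administrator rights.

```python
import shutil
from pathlib import Path


def copy_converters(extracted_dir, textconv_dir):
    """Copy every *.cnv file from extracted_dir into textconv_dir.

    Returns the names of the copied files. Both paths are supplied by the
    caller; nothing here locates Word's TextConv directory automatically.
    """
    src = Path(extracted_dir)
    dest = Path(textconv_dir)
    if not dest.is_dir():
        raise FileNotFoundError(f"TextConv directory not found: {dest}")
    copied = []
    for entry in sorted(src.iterdir()):
        # Match the extension case-insensitively (e.g. Doswrd32.cnv, FOO.CNV).
        if entry.is_file() and entry.suffix.lower() == ".cnv":
            shutil.copy2(entry, dest / entry.name)
            copied.append(entry.name)
    return copied
```

For example, `copy_converters(r"C:\Temp\wdsupcnv", r"C:\Program Files (x86)\Common Files\microsoft shared\TextConv")` would copy the converters on 64-bit Windows, where the first path is a hypothetical extraction directory. Checking that Doswrd32.cnv is in the returned list confirms that the Word for DOS converter was among them.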
Opening earlier Word for Windows files in a modern Microsoft Word
Various Word for Windows formats are still supported (unlike the DOS ones noted above), but some of them are now disabled by default for security reasons, because Microsoft considers its own legacy code for opening them to be a security risk. To open such files, you may therefore need to make registry changes, which are documented in the Microsoft help page on files blocked by registry policy settings, listed under Other links.
Official specs
- Microsoft's specification on the .DOC format
Other format descriptions
- Word 5.0 (for DOS) file format, with notes on earlier versions
- Another site with notes on Word for DOS file format
Software and Program Code
- Microsoft Word for Windows Version 1.1a Source Code
- Tool for finding hidden metadata in Word files, and some of the stuff it found
- Textract: extract text from various document formats including DOC
- Word for DOS 5.5
Sample files
- Mac Word 5.1
- Windows Word 2007, saving in Word 2003 format
- Sample DOC and DOT (template) files from Word 97-2003, with discussion on distinguishing them
- Directory contains 2 Word 6.0 documents
- dexvert samples — document/wordDoc
Commentary
- Why are the Microsoft Office file formats so complicated? (And some workarounds)
- Why Microsoft Word must Die
- LibreOffice import filter for legacy Mac file-formats
- Why Mac Word 6.0 was crappy, from a developer
- Article about Word 1.15
- Retrieving text from Word documents
- Office SP3 and File formats
Other links
- Forensics Wiki article
- MS Office 97-2003 legacy/binary formats security - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries
- Searching for Microsoft Office files containing macros
- Error message in Office when a file is blocked by registry policy settings
- Sustainability of Digital Formats: Planning for Library of Congress Collections - Microsoft Office Word 97-2003 Binary File Format (.doc)