DOC

From Just Solve the File Format Problem
(Difference between revisions)
(References)
(Added sample files and moved section)
 
(31 intermediate revisions by 6 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|subcat=Document
 
|subcat=Document
 +
|subcat2=Word Processor
 +
|subcat3=Microsoft Word
 
|extensions={{ext|doc}}
 
|extensions={{ext|doc}}
 
|mimetypes=
 
|mimetypes=
{{mimetype|application/msword}}
+
{{mimetype|application/msword}}, {{mimetype|application/vnd.ms-word}}, others have been used
 +
|pronom={{PRONOM|fmt/40}}
 +
|locfdd={{LoCFDD|fdd000509}}
 +
|wikidata={{wikidata|Q686498}}, {{wikidata|Q28858035}}
 
}}
 
}}
 
+
'''MS Word Doc''' format is a family of formats used by older versions of [[Microsoft Word|MS Word]] (they now use [[DOCX]] as a default as of [[Microsoft Office|Office]] 2007).  
'''MS Word Doc''' format is a family of formats used by older versions of MS Word (they now use [[DOCX]] as a default as of Office 2007).  
+
  
 
== File Types ==
 
== File Types ==
Line 15: Line 19:
 
* Word 97, 2000, 2002, 2003, 2007 & 2010 for MS Windows, and Word 98, 2001, X, & 2004 for Mac
 
* Word 97, 2000, 2002, 2003, 2007 & 2010 for MS Windows, and Word 98, 2001, X, & 2004 for Mac
  
== What's up, DOC? ==
+
There is also the closely-related [[DOT]] format used for Word templates.
  
But just because the file you stumbled onto has a DOC extension doesn't mean it is necessarily actually an MS Word file, though if it's not that old it probably is. Older files, like from the 1980s or 1990s, might be something else entirely. Several other word processors in that era used .DOC file extensions, even though their format was nothing like MS Word's. Also, it was fairly common for people to save plain text files with that extension when they were DOCumenting something, like the instructions for a program that was packed up in an [[ARC]] or [[ZIP]] file for download on a bulletin board system (BBS). But you might still try opening them with Word (as will normally happen in Windows if you have Word installed and double-click on a DOC file), since it will open plain-text files all right (even ancient ones).
+
Wordpad, the program that comes with current versions of Windows, saves in Word 6 format.
  
== Sample files ==
+
== What's up, DOC? ==
 
+
But just because the file you stumbled onto has a DOC extension doesn't mean it is necessarily actually an MS Word file, though if it's not that old it probably is. Older files, like from the 1980s or 1990s, might be something else entirely. Several other word processors in that era used .DOC file extensions, even though their format was nothing like MS Word's. Also, it was fairly common for people to save plain text files with that extension when they were DOCumenting something, like the instructions for a program that was packed up in an [[ARC (compression format)|ARC]] or [[ZIP]] file for download on a bulletin board system (BBS). But you might still try opening them with Word (as will normally happen in Windows if you have Word installed and double-click on a DOC file), since it will open plain-text files all right (even ancient ones).
* [http://people.freedesktop.org/~fridrich/blogs/Business_Letter Mac Word 5.1]
+
  
 
== Opening Word for DOS files in a modern Microsoft Word ==
 
== Opening Word for DOS files in a modern Microsoft Word ==
Line 33: Line 36:
 
# Now you should be able to see the Word for DOS file within the modern Microsoft Word.
 
# Now you should be able to see the Word for DOS file within the modern Microsoft Word.
  
== References ==
+
== Opening earlier Word for Windows files in a modern Microsoft Word ==
 +
 
 +
While various Word for Windows formats are still supported (unlike the DOS ones noted above), some of them are now disabled by default for security reasons, as Microsoft thinks that their own legacy code to open them is vulnerable to risks. Thus, in order to open such files, you may need to make registry changes, documented in a help page linked below.
 +
 
 +
== Official specs ==
 
* [http://msdn.microsoft.com/en-us/library/cc313153.aspx Microsoft's specification on the .DOC format]
 
* [http://msdn.microsoft.com/en-us/library/cc313153.aspx Microsoft's specification on the .DOC format]
* [http://msxnet.org/word2rtf/formats/dosword5 Word 5.0 (for DOS) file format, with notes on earlier versions]
+
 
* [http://msxnet.org/word2rtf/formats/ffh-dosword5 Another site with notes on Word for DOS file format]
+
== Other format descriptions ==
 +
* [https://web.archive.org/web/20160404194249/http://msxnet.org/word2rtf/formats/dosword5 Word 5.0 (for DOS) file format, with notes on earlier versions]
 +
* [https://web.archive.org/web/20160404192303/http://msxnet.org/word2rtf/formats/ffh-dosword5 Another site with notes on Word for DOS file format]
 +
 
 +
== Software and Program Code ==
 +
* [http://www.computerhistory.org/_static/atchm/microsoft-word-for-windows-1-1a-source-code/ Microsoft Word for Windows Version 1.1a Source Code]
 +
* [http://lcamtuf.coredump.cx/strikeout/ Tool for finding hidden metadata in Word files, and some of the stuff it found]
 +
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats including DOC]
 +
* [http://download.microsoft.com/download/word97win/Wd55_be/97/WIN98/EN-US/Wd55_ben.exe Word for DOS 5.5]
 +
 
 +
== Sample files ==
 +
* [http://people.freedesktop.org/~fridrich/blogs/Business_Letter Mac Word 5.1]
 +
* [https://www.dan.info/sampledata/msword/testing.doc Windows Word 2007, saving in Word 2003 format]
 +
* [https://groups.google.com/forum/#!msg/droid-list/v4CHVddELaM/uaPukLXBGD0J Sample DOC and DOT (template) files from Word 97-2003, with discussion on distinguishing them]
 +
* [https://web.archive.org/web/20020313074855/http://ftp.sunet.se/pub/Internet-documents/isoc/charts/presentations/ Directory contains 2 Word 6.0 documents]
 +
* {{DexvertSamples|document/wordDoc}}
 +
 
 +
== Commentary ==
 
* [http://www.joelonsoftware.com/items/2008/02/19.html Why are the Microsoft Office file formats so complicated? (And some workarounds)]
 
* [http://www.joelonsoftware.com/items/2008/02/19.html Why are the Microsoft Office file formats so complicated? (And some workarounds)]
 
* [http://www.antipope.org/charlie/blog-static/2013/10/why-microsoft-word-must-die.html Why Microsoft Word must Die]
 
* [http://www.antipope.org/charlie/blog-static/2013/10/why-microsoft-word-must-die.html Why Microsoft Word must Die]
 
* [http://fridrich.blogspot.co.uk/2013/06/libreoffice-import-filter-for-legacy.html LibreOffice import filter for legacy Mac file-formats]
 
* [http://fridrich.blogspot.co.uk/2013/06/libreoffice-import-filter-for-legacy.html LibreOffice import filter for legacy Mac file-formats]
 +
* [http://web.archive.org/web/20040630082459/http://weblogs.asp.net/Rick_Schaut/archive/2004/02/26/80193.aspx Why Mac Word 6.0 was crappy, from a developer]
 +
* [http://toastytech.com/guis/word115.html Article about Word 1.15]
 +
* [http://msdn.microsoft.com/en-us/library/dd904907%28v=office.12%29.aspx Retrieving text from Word documents]
 +
* [http://blogs.msdn.com/b/david_leblanc/archive/2008/01/04/office-sp3-and-file-formats.aspx Office SP3 and File formats]
 +
 +
== Other links ==
 +
* [{{ForensicsWikiURL|word_document_%28doc%29}} Forensics Wiki article]
 +
* [http://decalage.info/file_formats_security/office MS Office 97-2003 legacy/binary formats security] - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries
 +
* [http://blog.rootshell.be/2015/01/08/searching-for-microsoft-office-files-containing-macro/ Searching for Microsoft Office files containg macros]
 +
* [http://support2.microsoft.com/?kbid=922850 Error message in Office when a file is blocked by registry policy settings]
 +
* [http://fridrich.blogspot.com/2013/06/libreoffice-import-filter-for-legacy.html LibreOffice import filter for legacy Mac file-formats]
 +
* [https://www.loc.gov/preservation/digital/formats/fdd/fdd000509.shtml  Sustainability of Digital Formats: Planning for Library of Congress Collections - Microsoft Office Word 97-2003 Binary File Format (.doc)]
  
 
[[Category:Microsoft]]
 
[[Category:Microsoft]]
 +
[[Category:Microsoft Compound File]]

Latest revision as of 15:18, 28 December 2023

File Format
Name: DOC
Ontology: Document > Word Processor > Microsoft Word
Extension(s): .doc
MIME Type(s): application/msword, application/vnd.ms-word, others have been used
LoCFDD: fdd000509
PRONOM: fmt/40
Wikidata ID: Q686498, Q28858035

The MS Word DOC format is a family of formats used by older versions of MS Word. Since Office 2007, Word has used DOCX as its default format.

File Types

According to Wikipedia, the following four types exist:

  • Word for DOS
  • Word 1 & Word 2 for MS Windows, and Word 4 & 5 for Mac
  • Word 6 & Word 95 for MS Windows, and Word 6 for Mac
  • Word 97, 2000, 2002, 2003, 2007 & 2010 for MS Windows, and Word 98, 2001, X, & 2004 for Mac

There is also the closely-related DOT format used for Word templates.

WordPad, the program that comes with current versions of Windows, saves in Word 6 format.

What's up, DOC?

A DOC extension does not guarantee that a file is actually an MS Word file, although a reasonably recent one probably is. Older files, such as those from the 1980s or 1990s, might be something else entirely. Several other word processors of that era used the .DOC extension even though their formats were nothing like MS Word's. It was also fairly common for people to save plain text files with that extension when they were DOCumenting something, such as the instructions for a program packed in an ARC or ZIP file for download from a bulletin board system (BBS). It can still be worth trying to open such files with Word (which is what normally happens in Windows if Word is installed and you double-click a DOC file), since Word opens plain-text files without trouble, even ancient ones.
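Before handing an unknown .DOC file to Word, it can help to look at its first bytes. The following is only a rough sketch in Python, not a full format identifier. It assumes the well-known 8-byte signature of the Microsoft Compound File container (the container used by the Word 97-2003 binary format, and by other Office formats as well), and it uses a simple control-character heuristic to guess at plain text. The names `sniff_doc` and `sniff_doc_file` are made up for this illustration.

```python
from pathlib import Path

# Signature of the Microsoft Compound File (OLE2) container.
# A match means "compound file", not necessarily "Word document".
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Control bytes that commonly appear in old plain-text files:
# tab, line feed, form feed, carriage return, and the DOS end-of-file mark.
TEXT_CONTROL = {0x09, 0x0A, 0x0C, 0x0D, 0x1A}


def sniff_doc(data: bytes) -> str:
    """Return a rough guess about what the leading bytes of a .DOC file are."""
    if not data:
        return "empty"
    if data.startswith(CFB_MAGIC):
        return "compound-file"
    # Heuristic: no control bytes other than the usual text ones.
    if all(b >= 0x20 or b in TEXT_CONTROL for b in data):
        return "plain-text"
    return "unknown"


def sniff_doc_file(path: Path, sample_size: int = 4096) -> str:
    """Apply sniff_doc to the first sample_size bytes of a file on disk."""
    with open(path, "rb") as handle:
        return sniff_doc(handle.read(sample_size))
```

A result of "unknown" covers everything else, including the older Word formats and the formats of other word processors that used the .DOC extension; those would need their own checks.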

Opening Word for DOS files in a modern Microsoft Word

Current versions of Microsoft Word can no longer open Word for DOS files natively. However, such files can be imported with an additional converter for Word.

  1. Download the file ftp://ftp.microsoft.com/Softlib/MSLFILES/WDSUPCNV.EXE, open it (it is a self-extracting ZIP file), and select a directory in which to save the files.
  2. Copy all the resulting *.cnv files (most importantly Doswrd32.cnv) to C:\Program Files (x86)\Common Files\microsoft shared\TextConv (on 32-bit Windows, the directory is C:\Program Files\Common Files\microsoft shared\TextConv).
  3. (Re)start Microsoft Word.
  4. Open the old Word file via the Open dialog within Word.
  5. Word will show a prompt informing you that a text converter has to be started and that this might pose a security risk, so you should proceed only if you trust the source of the files. Press OK (if you trust the source of the files).
  6. Word will most likely show a prompt like "Style Sheet D:/STANDARD.DFV not found". Press OK. A file selector dialog will then open, asking you to select a style sheet (*.sty) file. If you have a style sheet for the file, select it. Otherwise, create a new empty file in Windows Explorer, rename it "empty.sty", and select it in the file selector. Selecting such an empty file could cause the opened file to lose some general properties, such as print margins.
  7. You should now be able to see the Word for DOS file in the modern Microsoft Word.
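The file-handling parts of the procedure above (copying the converters in step 2 and creating the placeholder style sheet in step 6) can be sketched in Python as follows. This is a minimal illustration, not an official installer: the function names are invented here, the default target is the 64-bit path quoted in step 2, and writing to that directory normally requires administrator rights.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Target directory from step 2 on 64-bit Windows. On 32-bit Windows, use
# C:\Program Files\Common Files\microsoft shared\TextConv instead.
TEXTCONV_DIR = Path(r"C:\Program Files (x86)\Common Files\microsoft shared\TextConv")


def install_converters(extracted_dir: Path, textconv_dir: Path = TEXTCONV_DIR) -> list[str]:
    """Copy every *.cnv file from extracted_dir into textconv_dir.

    Returns the names of the copied files, sorted alphabetically.
    """
    if not textconv_dir.is_dir():
        raise FileNotFoundError(f"Converter directory not found: {textconv_dir}")
    copied = []
    for src in sorted(extracted_dir.iterdir()):
        # Match the extension case-insensitively (Doswrd32.cnv, DOSWRD32.CNV, ...).
        if src.is_file() and src.suffix.lower() == ".cnv":
            shutil.copy2(src, textconv_dir / src.name)
            copied.append(src.name)
    return copied


def make_empty_style_sheet(folder: Path) -> Path:
    """Create the placeholder 'empty.sty' file described in step 6."""
    style_sheet = folder / "empty.sty"
    style_sheet.touch()
    return style_sheet
```

For example, `install_converters(Path(r"C:\Temp\wdsupcnv"))` would copy the converters from a hypothetical extraction folder; the remaining steps still have to be carried out in Word itself.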

Opening earlier Word for Windows files in a modern Microsoft Word

While various Word for Windows formats are still supported (unlike the DOS ones noted above), some of them are now disabled by default for security reasons, because Microsoft considers its own legacy code for opening them vulnerable. To open such files, you may therefore need to make registry changes, which are documented in a help page linked below.

Official specs

Other format descriptions

Software and Program Code

Sample files

Commentary

Other links
