OpenDocument Text

From Just Solve the File Format Problem
 
|formattype=electronic
|subcat=Document
|subcat2=Word Processor
|extensions={{ext|odt}}, {{ext|fodt}}, {{ext|ott}}
|mimetypes={{mimetype|application/vnd.oasis.opendocument.text}}, {{mimetype|application/vnd.oasis.opendocument.text-template}}
|locfdd={{LoCFDD|fdd000427}}, {{LoCFDD|fdd000428}}
|pronom={{PRONOM|fmt/136}}, {{PRONOM|fmt/290}}, {{PRONOM|fmt/291}}
|released=2005-05-01
}}
The '''OpenDocument Text''' format is one of a number of types of the [[OpenDocument|Open Document Format for Office Applications]] (commonly referred to as OpenDocument), an [[XML]]-based file format defined by the Organization for the Advancement of Structured Information Standards (OASIS) in 2005.

Like all [[OpenDocument]] files, OpenDocument Text can be represented in one of two ways: as a single XML document or as a collection of several sub-documents within a single package (commonly a [[ZIP]] archive). Generally, the extension '''.fodt''' is used for the less common single XML documents and '''.odt''' for packaged sub-documents.
 
  
 
== Image embedding issue ==
Both OpenOffice and LibreOffice have been affected by a long-running bug in which pasting an image into a document inserts, by default, only a ''hyperlink'' to the image (rather than the ''actual image data''). More details can be found [http://wiki.opf-labs.org/display/TR/Images+not+embedded+because+of+paste+as+link+bug+in+OpenOffice+and+LibreOffice here]. The bug was first reported in 2004, and as of 2013 it still had not been resolved. It was apparently fixed in LibreOffice in early 2014 [http://wiki.opf-labs.org/display/TR/Images+not+embedded+because+of+paste+as+link+bug+in+OpenOffice+and+LibreOffice].
  
 
== Information ==
* [http://en.wikipedia.org/wiki/OpenDocument Wikipedia - OpenDocument]
  
== Zipped Archive Structure ==
When stored as a ZIP archive (with an .odt extension), the document contains [[XML]] files describing text and relationships, along with [[JPEG]], [[PNG]], and other graphical files for pictures and other media included in the document.

The layout of a typical ODT file is as follows:
* META-INF
** manifest.xml
* Thumbnails
** thumbnail.png
* content.xml
* manifest.rdf
* meta.xml
* mimetype
* settings.xml
* styles.xml
 
===Inner files description===
====manifest.xml====
Lists the other files that make up the document. For a simple document these are just the XML files, and its contents may be something like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="styles.xml"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="meta.xml"/>
  <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="settings.xml"/>
</manifest:manifest>
 
====content.xml====
This is the file that contains all the text in the document.

The root element is always &lt;office:document-content&gt;. To get the text without metadata, descend through the following hierarchy:
* office:document-content
** office:body
*** office:text

There you will find tags in the ''text'' namespace that either mirror HTML in their names or are mostly self-explanatory. Some examples are:
* text:p - paragraph
* text:list - a list that contains several text:list-item elements
* text:list-item - a single item of the list

Each text tag may have a text:style-name attribute that links it to a style defined in office:document-content > office:automatic-styles > style:style.
 
====manifest.rdf====
[[RDF]] metadata. Most often the contents are just:

  <?xml version="1.0" encoding="utf-8"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  </rdf:RDF>
 
====meta.xml====
This is the metadata that somebody fills in to describe the document or that is automatically recorded by the software. The root element is always office:document-meta. The contents are defined rather loosely; editing software is advised not to delete tags that it doesn't recognise, since other software may be using them. In practice, deleting all the contents of office:document-meta > office:meta will not damage the document, so it can be considered non-essential information.
 
====mimetype====
A text file that consists of:

  application/vnd.oasis.opendocument.text
 
====settings.xml====
Software-specific settings of the document. The root tag is &lt;office:document-settings&gt;. No inner contents are required for a functioning document.
 
====styles.xml====
Non-automatic document styles, which are held in the &lt;office:document-styles&gt; tag.
 
== Microsoft Office 2010 ==

Microsoft Office 2010 seems to have some issues adhering to the OpenDocument standard. See the following link for more information:

* [http://tbullock.comlore.com/2011/04/odf-12-support-in-microsoft-office.html Depth of Knowledge: ODF 1.2 Support in Microsoft Office] April 9, 2011
 
== Microsoft Office 2013 ==

Microsoft Office 2013 supports ODF 1.2. See the following links for more information:

* [http://blogs.msdn.com/b/chrisrae/archive/2014/04/15/odf-1-2-enters-the-iso-standardization-process.aspx ODF 1.2 enters the ISO standardization process] April 15, 2014
* [http://www.zdnet.com/article/microsofts-office-2013-odf-1-2-support-could-be-true-catalyst-for-openoffice-adoption/ Microsoft's Office 2013 ODF 1.2 support could be true catalyst for OpenOffice adoption] August 15, 2012
 
== Sample files ==
* [https://www.dan.info/sampledata/msword/testing.odt Sample document saved from Windows Word 2007]
  
 
== Links ==
* [http://wiki.opf-labs.org/display/TR/OpenDocument+Text OpenDocument Text entries in OPF File Format Risk Registry]
* [https://updegrove.wordpress.com/2014/03/12/odf-vs-ooxml-war-of-the-words/ ODF vs. OOXML: War of the Words]
* [http://askubuntu.com/questions/60778/how-can-i-convert-an-odt-file-to-a-pdf How to convert ODT to PDF]
* [http://wiki.dpconline.org/images/c/c6/ODT_Assessment-v1.pdf File format preservation assessment (British Library)]

[[Category:XML based file formats]]
[[Category:ZIP based file formats]]

Latest revision as of 00:23, 12 February 2020

File Format
Name OpenDocument Text
Ontology
Extension(s) .odt, .fodt, .ott
MIME Type(s) application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.text-template
LoCFDD fdd000427, fdd000428
PRONOM fmt/136, fmt/290, fmt/291
Released 2005-05-01

The OpenDocument Text format is one of a number of types of the Open Document Format for Office Applications (commonly referred to as OpenDocument), an XML-based file format defined by the Organization for the Advancement of Structured Information Standards (OASIS) in 2005.

Like all OpenDocument files, OpenDocument Text can be represented in one of two ways: as a single XML document or as a collection of several sub-documents within a single package (commonly a ZIP archive). Generally, the extension .fodt is used for the less common single XML documents and .odt for packaged sub-documents.
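
The two representations can usually be told apart without relying on the file extension. The following is a minimal sketch in Python, not a definitive detector: the function name `odf_packaging` and the "starts with an XML tag" heuristic for flat documents are assumptions made for illustration.

```python
import zipfile


def odf_packaging(fileobj):
    """Classify an open binary file as 'package', 'flat-xml', or 'unknown'."""
    # A packaged document (.odt) is commonly a ZIP archive.
    if zipfile.is_zipfile(fileobj):
        return "package"
    # Heuristic (assumption): a flat document (.fodt) is plain XML, so after
    # an optional UTF-8 byte order mark and whitespace it starts with "<".
    fileobj.seek(0)
    head = fileobj.read(1024).lstrip(b"\xef\xbb\xbf \t\r\n")
    if head.startswith(b"<"):
        return "flat-xml"
    return "unknown"
```

A caller would open the file in binary mode, for example `with open("example.odt", "rb") as fh: print(odf_packaging(fh))`, where `example.odt` is a hypothetical file name.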

Image embedding issue

Both OpenOffice and LibreOffice have been affected by a long-running bug in which pasting an image into a document inserts, by default, only a hyperlink to the image (rather than the actual image data). More details can be found at http://wiki.opf-labs.org/display/TR/Images+not+embedded+because+of+paste+as+link+bug+in+OpenOffice+and+LibreOffice. The bug was first reported in 2004, and as of 2013 it still had not been resolved. According to the same page, it was apparently fixed in LibreOffice in early 2014.

Information

  • Wikipedia - OpenDocument (http://en.wikipedia.org/wiki/OpenDocument)

Zipped Archive Structure

When stored as a ZIP archive (with an .odt extension), the document contains XML files describing text and relationships, along with JPEG, PNG, and other graphical files for pictures and other media included in the document.

The layout of a typical ODT file is as follows:

  • META-INF
    • manifest.xml
  • Thumbnails
    • thumbnail.png
  • content.xml
  • manifest.rdf
  • meta.xml
  • mimetype
  • settings.xml
  • styles.xml
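
The layout above can be checked against a real file with Python's standard `zipfile` module. This is a sketch under stated assumptions: `inspect_package` and `EXPECTED_ENTRIES` are hypothetical names, and treating the listed files as the "expected" set is a choice made for this example rather than a rule taken from the standard.

```python
import zipfile

# Entry names taken from the layout listed above. Treating them as the
# "expected" set is an assumption of this sketch.
EXPECTED_ENTRIES = (
    "mimetype",
    "content.xml",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
)


def inspect_package(odt):
    """Return (entry names in archive order, expected entries that are absent)."""
    # `odt` may be a path or a seekable binary file object.
    with zipfile.ZipFile(odt) as package:
        names = package.namelist()
    missing = [name for name in EXPECTED_ENTRIES if name not in names]
    return names, missing
```

Pictures and other media show up in the first list as additional entries, so the result is also a quick way to see whether images were actually embedded in the package.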

Inner files description

manifest.xml

Lists the other files that make up the document. For a simple document these are just the XML files, and its contents may be something like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="styles.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="meta.xml"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="settings.xml"/>
</manifest:manifest>
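
A manifest like the one above can be read with the standard library. The sketch below assumes the manifest sits at `META-INF/manifest.xml`, as in the layout shown earlier, and uses the namespace URI from the sample; `read_manifest` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the sample manifest above.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def read_manifest(odt):
    """Map each manifest full-path to its declared media type."""
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries
```

For the sample manifest, the entry with the path `/` carries the media type of the document as a whole.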

content.xml

This is the file that contains all the text in the document.

The root element is always <office:document-content>. To get the text without metadata, descend through the following hierarchy:

  • office:document-content
    • office:body
      • office:text

There you will find tags in the text namespace that either mirror HTML in their names or are mostly self-explanatory. Some examples are:

  • text:p - paragraph
  • text:list - a list that contains several text:list-item elements
  • text:list-item - a single item of the list

Each text tag may have a text:style-name attribute that links it to a style defined in office:document-content > office:automatic-styles > style:style.
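
The hierarchy described above can be walked with `xml.etree.ElementTree` to pull out plain text. This is a minimal sketch: the helper name `paragraphs` is hypothetical, the two namespace URIs are the usual OpenDocument `office` and `text` namespaces (they are not quoted elsewhere on this page), and special elements such as tabs or repeated spaces are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Usual OpenDocument namespace URIs (an assumption of this sketch).
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs(odt):
    """Return (style name, text) for every text:p under office:body/office:text."""
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("content.xml"))
    # office:document-content > office:body > office:text
    body_text = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body_text is None:
        return []
    result = []
    # iter() walks in document order, so paragraphs inside list items are included.
    for p in body_text.iter(f"{{{TEXT_NS}}}p"):
        style = p.get(f"{{{TEXT_NS}}}style-name")  # None when no style is set
        result.append((style, "".join(p.itertext())))
    return result
```

The style name returned for each paragraph is only a key; resolving it means looking up the matching style:style element under office:automatic-styles.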

manifest.rdf

RDF metadata. Most often the contents are just

 <?xml version="1.0" encoding="utf-8"?>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 </rdf:RDF>

meta.xml

This is the metadata that somebody fills in to describe the document or that is automatically recorded by the software. The root element is always office:document-meta. The contents are defined rather loosely; editing software is advised not to delete tags that it doesn't recognise, since other software may be using them. In practice, deleting all the contents of office:document-meta > office:meta will not damage the document, so it can be considered non-essential information.
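
Because the contents are loosely defined, a reader is probably safest listing whatever it finds rather than expecting fixed fields. The sketch below does that; `read_meta` is a hypothetical helper name and the `office` namespace URI is the usual OpenDocument one, which is an assumption here since this page does not quote it.

```python
import zipfile
import xml.etree.ElementTree as ET

# Usual OpenDocument office namespace URI (an assumption of this sketch).
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_meta(odt):
    """Return (qualified tag, text) for each direct child of office:meta."""
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("meta.xml"))
    meta = root.find(f"{{{OFFICE_NS}}}meta")
    if meta is None:
        return []
    # Tags come back in ElementTree's "{namespace-uri}local-name" form, so
    # unknown elements are reported as-is instead of being dropped.
    return [(child.tag, (child.text or "").strip()) for child in meta]
```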

mimetype

A text file that consists of

  application/vnd.oasis.opendocument.text
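
The mimetype entry gives a cheap way to identify a package before parsing any XML. The sketch below reads it and also reports whether it is the first archive entry and stored uncompressed, which is the placement the OpenDocument packaging rules are generally understood to call for; that check and the name `read_mimetype` are assumptions of this example, not something stated on this page.

```python
import zipfile


def read_mimetype(odt):
    """Return (mimetype string, True if it is the first entry and uncompressed)."""
    with zipfile.ZipFile(odt) as package:
        value = package.read("mimetype").decode("ascii").strip()
        first = package.infolist()[0]
        well_placed = (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
        )
    return value, well_placed
```

For a text document the first element of the result should be application/vnd.oasis.opendocument.text; a template would carry the text-template media type listed in the infobox.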

settings.xml

Software-specific settings of the document. The root tag is <office:document-settings>. No inner contents are required for a functioning document.

styles.xml

Non-automatic document styles, which are held in the <office:document-styles> tag.

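Microsoft Office 2010

Microsoft Office 2010 seems to have some issues adhering to the OpenDocument standard. See the following link for more information:

  • Depth of Knowledge: ODF 1.2 Support in Microsoft Office (http://tbullock.comlore.com/2011/04/odf-12-support-in-microsoft-office.html), April 9, 2011

Microsoft Office 2013

Microsoft Office 2013 supports ODF 1.2. See the following links for more information:

  • ODF 1.2 enters the ISO standardization process (http://blogs.msdn.com/b/chrisrae/archive/2014/04/15/odf-1-2-enters-the-iso-standardization-process.aspx), April 15, 2014
  • Microsoft's Office 2013 ODF 1.2 support could be true catalyst for OpenOffice adoption (http://www.zdnet.com/article/microsofts-office-2013-odf-1-2-support-could-be-true-catalyst-for-openoffice-adoption/), August 15, 2012

Sample files

  • Sample document saved from Windows Word 2007 (https://www.dan.info/sampledata/msword/testing.odt)

Links

  • OpenDocument Text entries in OPF File Format Risk Registry (http://wiki.opf-labs.org/display/TR/OpenDocument+Text)
  • ODF vs. OOXML: War of the Words (https://updegrove.wordpress.com/2014/03/12/odf-vs-ooxml-war-of-the-words/)
  • How to convert ODT to PDF (http://askubuntu.com/questions/60778/how-can-i-convert-an-odt-file-to-a-pdf)
  • File format preservation assessment (British Library) (http://wiki.dpconline.org/images/c/c6/ODT_Assessment-v1.pdf)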
