XML
Revision as of 17:10, 20 November 2012
Extensible Markup Language (XML) is a markup language used to encode data.
XML is a language from which languages are made. A body of rules for how an XML document for a specific purpose may be constructed is often called a "language" or a "format" in its own right. These rules may be specified in several different ways, the most common being a Document Type Definition (DTD) and a Schema. A document which follows the syntactic rules of XML is considered "well-formed." A document which is well-formed and also conforms to its DTD or Schema declarations is considered "valid."
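The distinction between "well-formed" and "valid" can be illustrated with a minimal Python sketch. It checks well-formedness only, using the standard library's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample documents are made up for illustration. Checking validity against a DTD or Schema is a separate step that the standard library parser does not perform, and it usually requires a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text obeys XML's syntactic rules, False otherwise.

    This says nothing about validity: no DTD or Schema is consulted.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents.
good = "<note><to>Ann</to><body>Hello</body></note>"
bad = "<note><to>Ann</body></note>"  # closing tag does not match <to>

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document such as `good` could still be invalid if its format's rules required, say, a `from` element; only a validator working from the DTD or Schema can decide that.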
A Document Type Definition may be included in an XML document, kept in an external file, or both approaches may be combined. In every case it is introduced by a Document Type Declaration (the `<!DOCTYPE ...>` construct), which confusingly has the same initials: the declaration either contains the definition's rules directly, refers to the external file that holds them, or does both.
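As a rough illustration of the two forms, the following sketch reads the Document Type Declaration of two small documents with Python's `xml.dom.minidom`. The element name `note`, the file name `note.dtd`, and the helper `doctype_info` are hypothetical. The parser records the declaration but, in this sketch, does not fetch the external file or validate against either DTD.

```python
from xml.dom import minidom

# Declaration that only refers to an external DTD file (hypothetical name).
external = '<!DOCTYPE note SYSTEM "note.dtd"><note>Hi</note>'

# Declaration that carries the definition inline, as an internal subset.
internal = '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>Hi</note>'


def doctype_info(text):
    """Return (root element name, system identifier) from the declaration."""
    doctype = minidom.parseString(text).doctype
    return doctype.name, doctype.systemId


print(doctype_info(external))
print(doctype_info(internal))
```

The system identifier is present only when the declaration points at an external file; for the inline form it is absent.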
A Schema, unlike a DTD, is itself written in XML. A document can have a Schema for each of its namespaces. DTDs have been largely superseded by Schemas for two reasons: a document is limited to one DTD, and Schemas offer namespace support and a greater capacity for describing rules.
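One common way a document associates a Schema with each namespace is the `xsi:schemaLocation` attribute, whose value is a whitespace-separated list of namespace URI and location pairs. The sketch below extracts those pairs with the standard library; the namespaces, file names, and the helper `schema_hints` are invented for illustration, and the locations are only hints that a processor is free to ignore.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema instance attributes.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document using two namespaces, each with its own Schema.
doc = """<order xmlns="http://example.org/order"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.org/order order.xsd
                           http://example.org/item item.xsd"/>"""


def schema_hints(text):
    """Map each namespace URI to the Schema location hinted for it."""
    root = ET.fromstring(text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace URI, location, namespace URI, location...
    return dict(zip(tokens[0::2], tokens[1::2]))


for namespace, location in schema_hints(doc).items():
    print(namespace, "->", location)
```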
XML documents refer to both Schemas and DTDs by a URI. It is crucial to remember that this reference is a Uniform Resource Identifier (URI), not a Uniform Resource Locator (URL). There is no requirement that the URI point to a resource on the Internet, or even that such a resource exist. This is a potential preservation risk with XML documents: they may outlive the DTD and Schema documents that characterize them, or those documents may move and become difficult to locate.
There are variants of HTML which are expressed in XML-compliant syntax, and these are known as XHTML. This syntax requires, for instance, that tag names be consistently lowercase and that elements with no closing tag have a slash before the right angle bracket at the end of the tag, as in <br /> instead of <br>. XHTML may be served under either an XML or an HTML MIME type, and browsers may treat the same document differently depending on which is used.
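Both syntax requirements can be seen by feeding small fragments to an ordinary XML parser, as in the sketch below. It uses Python's `xml.etree.ElementTree`; the sample markup and the helper `count_breaks` are illustrative only. Because XML is case-sensitive, an uppercase `BR` is a different element name from `br`, and an unclosed `<br>` makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# XHTML style: lowercase tag, self-closed with a slash.
xhtml = f'<html xmlns="{XHTML_NS}"><body><p>one<br />two</p></body></html>'

# HTML style: the br element is never closed.
html_style = f'<html xmlns="{XHTML_NS}"><body><p>one<br>two</p></body></html>'


def count_breaks(text):
    """Count elements named exactly 'br' in the XHTML namespace."""
    root = ET.fromstring(text)
    return len(list(root.iter(f"{{{XHTML_NS}}}br")))


print(count_breaks(xhtml))

# The same markup with an uppercase tag parses, but is no longer a 'br'.
print(count_breaks(xhtml.replace("<br />", "<BR />")))

try:
    count_breaks(html_style)
except ET.ParseError as err:
    print("not well-formed:", err)
```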

