<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://fileformats.archiveteam.org/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://fileformats.archiveteam.org/index.php?action=history&amp;feed=atom&amp;title=Plain_text</id>
		<title>Plain text - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://fileformats.archiveteam.org/index.php?action=history&amp;feed=atom&amp;title=Plain_text"/>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;action=history"/>
		<updated>2026-04-08T17:24:11Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.19.2</generator>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=51033&amp;oldid=prev</id>
		<title>265 993 303: Unicode as of 17.0 still does not include U+0A00 or U+0A0D</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=51033&amp;oldid=prev"/>
				<updated>2025-09-13T09:59:20Z</updated>
		
		<summary type="html">&lt;p&gt;Unicode as of 17.0 still does not include U+0A00 or U+0A0D&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 09:59, 13 September 2025&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points and are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points and are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). 
In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are arrays of 16-bit integers representing code units and are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;16&lt;/del&gt;.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;16&lt;/del&gt;.0'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are arrays of 16-bit integers representing code units and are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;17&lt;/ins&gt;.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;17&lt;/ins&gt;.0'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes. &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; are not used in ASCII encoding, and null characters by &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt; are not typically found in plain text; null bytes are much more likely to be in UTF-16 or UTF-32 text.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes. &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; are not used in ASCII encoding, and null characters by &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt; are not typically found in plain text; null bytes are much more likely to be in UTF-16 or UTF-32 text.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>265 993 303</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=49427&amp;oldid=prev</id>
		<title>265 993 303: Unicode as of 16.0 still does not include U+0A00 or U+0A0D, so the heuristic still works</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=49427&amp;oldid=prev"/>
				<updated>2024-09-10T21:22:35Z</updated>
		
		<summary type="html">&lt;p&gt;Unicode as of 16.0 still does not include U+0A00 or U+0A0D, so the heuristic still works&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 21:22, 10 September 2024&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points and are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are arrays of 32-bit integers representing Unicode code points and are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). 
In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are arrays of 16-bit integers representing code units and are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;15&lt;/del&gt;.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;1&lt;/del&gt;'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;15&lt;/del&gt;.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;1&lt;/del&gt;'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are arrays of 16-bit integers representing code units and are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;16&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;0&lt;/ins&gt;'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. 
On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;16&lt;/ins&gt;.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;0&lt;/ins&gt;'' either but it is a common newline in 8-bit encodings. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes. &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; are not used in ASCII encoding, and null characters by &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt; are not typically found in plain text; null bytes are much more likely to be in UTF-16 or UTF-32 text.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes. &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; are not used in ASCII encoding, and null characters by &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt; are not typically found in plain text; null bytes are much more likely to be in UTF-16 or UTF-32 text.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-8]] text files may be detected by presence of any bytes from &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; (to avoid &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;UTF-8 for &lt;/del&gt;ASCII-only files), absence of null bytes (if UTF-16 and UTF-32 haven't been ruled out yet), and verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are &amp;lt;code&amp;gt;0xxxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;110xxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0080&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;1110xxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xD7FF&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;0xE000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;), and &amp;lt;code&amp;gt;11110xxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x10000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x10FFFF&amp;lt;/code&amp;gt;, but not 
&amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0x110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x1FFFFF&amp;lt;/code&amp;gt;). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF), but should still be verified for validity.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-8]] text files may be detected by presence of any bytes from &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; (to avoid &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;processing &lt;/ins&gt;ASCII-only files &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;as UTF-8&lt;/ins&gt;), absence of null bytes (if UTF-16 and UTF-32 haven't been ruled out yet), and verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are &amp;lt;code&amp;gt;0xxxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;110xxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0080&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;1110xxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xD7FF&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;0xE000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt; or 
&amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;), and &amp;lt;code&amp;gt;11110xxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x10000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x10FFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0x110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x1FFFFF&amp;lt;/code&amp;gt;). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF), but should still be verified for validity.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], [[CP852]], etc.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], [[CP852]], etc.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>265 993 303</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=47405&amp;oldid=prev</id>
		<title>Dexvertbot: Added sample files</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=47405&amp;oldid=prev"/>
				<updated>2023-12-28T14:55:35Z</updated>
		
		<summary type="html">&lt;p&gt;Added sample files&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 14:55, 28 December 2023&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 40:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 40:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Software ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Software ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;== Sample files ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;* {{DexvertSamples|text/txt}}&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;* {{DexvertSamples|text/utf16Text}}&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Links and References ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Links and References ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Dexvertbot</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=45390&amp;oldid=prev</id>
		<title>265 993 303 at 18:24, 13 September 2023</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=45390&amp;oldid=prev"/>
				<updated>2023-09-13T18:24:21Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 18:24, 13 September 2023&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;are arrays of 32-bit integers representing Unicode code points and &lt;/ins&gt;are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;). 
In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or FE FF (for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode 15.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0&lt;/del&gt;'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode 15.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0&lt;/del&gt;'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;are arrays of 16-bit integers representing code units and &lt;/ins&gt;are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;FE FF&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(for big endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode 15.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;1&lt;/ins&gt;'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. 
On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode 15.&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;1&lt;/ins&gt;'' either but it is a common newline in 8-bit encodings. The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; are not used in ASCII encoding, and null characters by &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt; are not typically found in plain text; null bytes are much more likely to be in UTF-16 or UTF-32 text&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-8]] text files may be detected by presence of any bytes from &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt;, absence of null bytes (if UTF-16 &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;hasn&lt;/del&gt;'t been ruled out yet), &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;or &lt;/del&gt;verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are &amp;lt;code&amp;gt;0xxxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;110xxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0080&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;1110xxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xD7FF&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;0xE000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;), and &amp;lt;code&amp;gt;11110xxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x10000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x10FFFF&amp;lt;/code&amp;gt;, but not 
&amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0x110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x1FFFFF&amp;lt;/code&amp;gt;). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-8]] text files may be detected by presence of any bytes from &amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;(to avoid UTF-8 for ASCII-only files)&lt;/ins&gt;, absence of null bytes (if UTF-16 &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and UTF-32 haven&lt;/ins&gt;'t been ruled out yet), &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and &lt;/ins&gt;verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are &amp;lt;code&amp;gt;0xxxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;110xxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0080&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;), &amp;lt;code&amp;gt;1110xxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x0800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xD7FF&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;0xE000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt; or 
&amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;), and &amp;lt;code&amp;gt;11110xxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;10xxxxxx&amp;lt;/code&amp;gt; (where x forms &amp;lt;code&amp;gt;0x10000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x10FFFF&amp;lt;/code&amp;gt;, but not &amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0x110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x1FFFFF&amp;lt;/code&amp;gt;). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF)&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, but should still be verified for validity&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], [[CP852]], etc.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but [[UTF-32]], [[UTF-16]], [[ASCII]], and [[UTF-8]] were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as [[JIS|Shift JIS]]) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as [[Windows 1252|CP1252]], [[Windows 1250|CP1250]], [[CP437]], [[CP852]], etc.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>265 993 303</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44911&amp;oldid=prev</id>
		<title>265 993 303: 0xFEFF is byte order mark</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44911&amp;oldid=prev"/>
				<updated>2023-08-23T05:34:39Z</updated>
		
		<summary type="html">&lt;p&gt;0xFEFF is byte order mark&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 05:34, 23 August 2023&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0000FFFE&lt;/del&gt;&amp;lt;/code&amp;gt;). In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-32]] text files are usually detected by starting with the ''Byte Order Mark'' (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE 00 00&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0x0000FEFF&amp;lt;/code&amp;gt;) or &amp;lt;code&amp;gt;00 00 FE FF&amp;lt;/code&amp;gt; (for big endian &amp;lt;code&amp;gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;0x0000FEFF&lt;/ins&gt;&amp;lt;/code&amp;gt;). 
In some cases UTF-32 files may occur without the BOM, however, only &amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; are valid ranges for dwords; &amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; are invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or FE FF (for big endian &amp;lt;code&amp;gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0xFFFE&lt;/del&gt;&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode 15.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode 15.0'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[UTF-16]] text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes &amp;lt;code&amp;gt;FF FE&amp;lt;/code&amp;gt; (for little endian &amp;lt;code&amp;gt;0xFEFF&amp;lt;/code&amp;gt;) or FE FF (for big endian &amp;lt;code&amp;gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;0xFEFF&lt;/ins&gt;&amp;lt;/code&amp;gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&amp;lt;code&amp;gt;0x000A&amp;lt;/code&amp;gt;) in its byte reversal (&amp;lt;code&amp;gt;0x0A00&amp;lt;/code&amp;gt;) is not in ''Unicode 15.0'', and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &amp;lt;code&amp;gt;00 0A&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0A 00&amp;lt;/code&amp;gt; can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes &amp;lt;code&amp;gt;0D 0A&amp;lt;/code&amp;gt; in little endian form &amp;lt;code&amp;gt;U+0A0D&amp;lt;/code&amp;gt; which is not in ''Unicode 15.0'' either but it is a common newline in 8-bit encodings. 
The detection of [[UCS-2]] text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;, with other combinations of &amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; being invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;[[ASCII|ASCII-only]] text files may be detected by verifying that the file has all &amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; bytes.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>265 993 303</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44576&amp;oldid=prev</id>
		<title>Foxtrot: /* Identification */</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44576&amp;oldid=prev"/>
				<updated>2023-06-20T12:25:10Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Identification&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 12:25, 20 June 2023&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 25:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Identification ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;UTF-32 text files are usually detected by starting with the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;byte order mark &lt;/del&gt;(BOM) consisting of the bytes FF FE 00 00 (for little endian 0x0000FEFF) or 00 00 FE FF (for big endian 0x0000FFFE). In some cases UTF-32 files may occur without the BOM, however, only &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x00000000—0x0000D7FF &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0000E000—0x0010FFFF &lt;/del&gt;are valid ranges for dwords; &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0000D800—0x0000DFFF &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x00110000—0xFFFFFFFF &lt;/del&gt;are invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-32&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;text files are usually detected by starting with the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;''Byte Order Mark'' &lt;/ins&gt;(BOM) consisting of the bytes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;FF FE 00 00&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(for little endian &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0x0000FEFF&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;) or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;00 00 FE FF&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(for big endian &lt;ins 
class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0x0000FFFE&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;). In some cases UTF-32 files may occur without the BOM, however, only &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x00000000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000D7FF&amp;lt;/code&amp;gt; &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0000E000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0010FFFF&amp;lt;/code&amp;gt; &lt;/ins&gt;are valid ranges for dwords; &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0000D800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x0000DFFF&amp;lt;/code&amp;gt; &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x00110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFFFFFF&amp;lt;/code&amp;gt; &lt;/ins&gt;are invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;UTF-16 text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes FF FE (for little endian 0xFEFF) or FE FF (for big endian 0xFFFE). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (0x000A) in its byte reversal (0x0A00) is not in Unicode 15.0, and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned 00 0A or 0A 00 can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes 0D 0A in little endian form U+0A0D which is not in Unicode 15.0 either but it is a common newline in 8-bit encodings. The detection of UCS-2 text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0xD800—0xDBFF &lt;/del&gt;followed by &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0xDC00—0xDFFF&lt;/del&gt;, with other combinations of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0xD800—0xDFFF &lt;/del&gt;being invalid.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-16&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;FF FE&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(for little endian &lt;ins class=&quot;diffchange 
diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0xFEFF&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;) or FE FF (for big endian &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0xFFFE&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0x000A&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;) in its byte reversal (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0x0A00&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;) is not in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;''&lt;/ins&gt;Unicode 15.0&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;''&lt;/ins&gt;, and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;00 0A&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0A 00&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. 
On the other hand, the bytes &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0D 0A&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;read as little endian form &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;U+0A0D&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;which is not in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;''&lt;/ins&gt;Unicode 15.0&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;'' &lt;/ins&gt;either, but that byte pair is a common newline (CR LF) in 8-bit encodings. The detection of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UCS-2&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;text works similarly, since UCS-2 is the precursor of UTF-16; UTF-16 introduced surrogate pairs formed by &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDBFF&amp;lt;/code&amp;gt; &lt;/ins&gt;followed by &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0xDC00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;&lt;/ins&gt;, with other combinations of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt; &lt;/ins&gt;being invalid.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;ASCII only text files may be detected by verifying that the file has all &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x01—0x7F &lt;/del&gt;bytes.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;ASCII&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;|ASCII-&lt;/ins&gt;only&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;text files may be detected by verifying that the file has all &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x01&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt; &lt;/ins&gt;bytes.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;UTF-8 text files may be detected by presence of any bytes from &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x80—0xFF&lt;/del&gt;, absence of null bytes (if UTF-16 hasn't been ruled out yet), or verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are 0xxxxxxx (where x forms &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x00—0x7F&lt;/del&gt;), 110xxxxx 10xxxxxx (where x forms &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0080—0x07FF&lt;/del&gt;, but not &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x00—0x7F&lt;/del&gt;), 1110xxxx 10xxxxxx 10xxxxxx (where x forms &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0800—0xD7FF 0xE000—0xFFFF&lt;/del&gt;, but not &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0000—0x07FF &lt;/del&gt;or &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0xD800—0xDFFF&lt;/del&gt;), and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (where x forms &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x10000—0x10FFFF&lt;/del&gt;, but not &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x0000—0xFFFF &lt;/del&gt;or &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;0x110000—0x1FFFFF&lt;/del&gt;). 
UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-8&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;text files may be detected by presence of any bytes from &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x80&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFF&amp;lt;/code&amp;gt;&lt;/ins&gt;, absence of null bytes (if UTF-16 hasn't been ruled out yet), or verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;0xxxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(where x forms &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;&lt;/ins&gt;), &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;110xxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(where x forms &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0080&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt;&lt;/ins&gt;, but not &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x00&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x7F&amp;lt;/code&amp;gt;&lt;/ins&gt;), &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;1110xxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; 
&amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(where x forms &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xD7FF&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;0xE000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt;&lt;/ins&gt;, but not &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x07FF&amp;lt;/code&amp;gt; &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0xD800&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xDFFF&amp;lt;/code&amp;gt;&lt;/ins&gt;), and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;11110xxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &amp;lt;code&amp;gt;&lt;/ins&gt;10xxxxxx&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;(where x forms &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x10000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x10FFFF&amp;lt;/code&amp;gt;&lt;/ins&gt;, but not &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x0000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0xFFFF&amp;lt;/code&amp;gt; &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;&amp;lt;code&amp;gt;0x110000&amp;lt;/code&amp;gt;—&amp;lt;code&amp;gt;0x1FFFFF&amp;lt;/code&amp;gt;&lt;/ins&gt;). 
UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but UTF-32, UTF-16, ASCII, and UTF-8 were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as Shift JIS) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as CP1252, CP1250, CP437, CP852, etc..&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;When a file is known to be a plain text file but &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-32&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-16&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;ASCII&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;UTF-8&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[JIS|&lt;/ins&gt;Shift JIS&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;) remain. 
In this case, the only remaining option (other than applying complex heuristics) is to assume the regional or system text encoding, such as &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Windows 1252|&lt;/ins&gt;CP1252&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Windows 1250|&lt;/ins&gt;CP1250&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;CP437&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;CP852&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, etc.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Foxtrot</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44575&amp;oldid=prev</id>
		<title>265 993 303 at 10:45, 20 June 2023</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=44575&amp;oldid=prev"/>
				<updated>2023-06-20T10:45:33Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 10:45, 20 June 2023&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The traditional extension for text files is &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, &amp;lt;code&amp;gt;.text&amp;lt;/code&amp;gt; has been used, and &amp;lt;code&amp;gt;.asc&amp;lt;/code&amp;gt; for ASCII has also had some use; &amp;lt;code&amp;gt;.doc&amp;lt;/code&amp;gt; has also sometimes been used for files &amp;quot;documenting&amp;quot; something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The traditional extension for text files is &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, &amp;lt;code&amp;gt;.text&amp;lt;/code&amp;gt; has been used, and &amp;lt;code&amp;gt;.asc&amp;lt;/code&amp;gt; for ASCII has also had some use; &amp;lt;code&amp;gt;.doc&amp;lt;/code&amp;gt; has also sometimes been used for files &amp;quot;documenting&amp;quot; something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;== Identification ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;UTF-32 text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes FF FE 00 00 (for little endian 0x0000FEFF) or 00 00 FE FF (for big endian 0x0000FFFE). In some cases UTF-32 files may occur without the BOM, however, only 0x00000000—0x0000D7FF and 0x0000E000—0x0010FFFF are valid ranges for dwords; 0x0000D800—0x0000DFFF and 0x00110000—0xFFFFFFFF are invalid.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;UTF-16 text files are usually detected by starting with the byte order mark (BOM) consisting of the bytes FF FE (for little endian 0xFEFF) or FE FF (for big endian 0xFFFE). However, in some cases UTF-16 files may occur without the BOM, in which case, detection is not guaranteed to be reliable, but the line feed (0x000A) in its byte reversal (0x0A00) is not in Unicode 15.0, and null bytes are unlikely to occur in other text encodings, so the presence of word-aligned 00 0A or 0A 00 can rule out 8-bit encodings and one of the endianness and therefore may be used for UTF-16 detection. On the other hand, the bytes 0D 0A in little endian form U+0A0D which is not in Unicode 15.0 either but it is a common newline in 8-bit encodings. The detection of UCS-2 text works similarly, since UCS-2 is the precursor of UTF-16, as UTF-16 introduced surrogate pairs formed by 0xD800—0xDBFF followed by 0xDC00—0xDFFF, with other combinations of 0xD800—0xDFFF being invalid.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;ASCII only text files may be detected by verifying that the file has all 0x01—0x7F bytes.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;UTF-8 text files may be detected by presence of any bytes from 0x80—0xFF, absence of null bytes (if UTF-16 hasn't been ruled out yet), or verifying that the file is valid UTF-8. UTF-8 has many error cases; the only valid bit patterns are 0xxxxxxx (where x forms 0x00—0x7F), 110xxxxx 10xxxxxx (where x forms 0x0080—0x07FF, but not 0x00—0x7F), 1110xxxx 10xxxxxx 10xxxxxx (where x forms 0x0800—0xD7FF 0xE000—0xFFFF, but not 0x0000—0x07FF or 0xD800—0xDFFF), and 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (where x forms 0x10000—0x10FFFF, but not 0x0000—0xFFFF or 0x110000—0x1FFFFF). UTF-8 text files may also start with the UTF-8 byte order mark (EF BB BF).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;When a file is known to be a plain text file but UTF-32, UTF-16, ASCII, and UTF-8 were already ruled out, only 8-bit encodings or mixed single byte/double byte encodings (such as Shift JIS) remain. In this case, the only thing left (other than applying complex heuristics) is to use the regional or system text encoding, such as CP1252, CP1250, CP437, CP852, etc..&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>265 993 303</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=32052&amp;oldid=prev</id>
		<title>Dan Tobias at 00:24, 2 June 2019</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=32052&amp;oldid=prev"/>
				<updated>2019-06-02T00:24:12Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 00:24, 2 June 2019&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|mimetypes={{mimetype|text/plain}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|mimetypes={{mimetype|text/plain}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|pronom={{PRONOM|x-fmt/111}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|pronom={{PRONOM|x-fmt/111}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;|wikidata={{wikidata|Q1145976}}&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;'''Plain text''' files (also known by the extension TXT) consist of characters encoded sequentially in some particular [[character encoding]]. Plain text files contain no formatting information other than white space characters. Some data formats (usually those intended to be human-readable) are based on plain text; see [[Text-based data]] for some structured formats that are stored in plain text (and hence can be opened in a plain text editor if no more specific program is available).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Dan Tobias</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=29680&amp;oldid=prev</id>
		<title>Jsummers at 15:41, 6 April 2018</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=29680&amp;oldid=prev"/>
				<updated>2018-04-06T15:41:24Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 15:41, 6 April 2018&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|formattype=electronic&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|formattype=electronic&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|subcat=Document&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|subcat=Document&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}} {{noext}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|extensions={{ext|txt}}, {{ext|text}}, {{ext|doc}}, {{ext|asc}}&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/ins&gt;{{noext}}&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, many others&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|mimetypes={{mimetype|text/plain}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|mimetypes={{mimetype|text/plain}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|pronom={{PRONOM|x-fmt/111}}&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;|pronom={{PRONOM|x-fmt/111}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Jsummers</name></author>	</entry>

	<entry>
		<id>http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=27409&amp;oldid=prev</id>
		<title>Dan Tobias: /* Extension */</title>
		<link rel="alternate" type="text/html" href="http://fileformats.archiveteam.org/index.php?title=Plain_text&amp;diff=27409&amp;oldid=prev"/>
				<updated>2017-03-17T23:58:44Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Extension&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 23:58, 17 March 2017&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Extension ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== Extension ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;background: #ffa; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The traditional extension for text files is &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, &amp;lt;code&amp;gt;.text&amp;lt;/code&amp;gt; has been used, and &amp;lt;code&amp;gt;.asc&amp;lt;/code&amp;gt; for ASCII has also had some use&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. &lt;/del&gt;&amp;lt;code&amp;gt;.doc&amp;lt;/code&amp;gt; has also sometimes been used for files &amp;quot;documenting&amp;quot; something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;The traditional extension for text files is &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;, but lots of other extensions have been used. Occasionally on systems permitting extensions longer than three letters, &amp;lt;code&amp;gt;.text&amp;lt;/code&amp;gt; has been used, and &amp;lt;code&amp;gt;.asc&amp;lt;/code&amp;gt; for ASCII has also had some use&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;; &lt;/ins&gt;&amp;lt;code&amp;gt;.doc&amp;lt;/code&amp;gt; has also sometimes been used for files &amp;quot;documenting&amp;quot; something (like the manual accompanying a piece of downloaded software), but that went out of common use once that extension became associated with Microsoft Word's [[DOC]] format.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;== See also ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Dan Tobias</name></author>	</entry>

	</feed>