Identifying Files

From Just Solve the File Format Problem

Latest revision as of 00:55, 21 February 2017

Once you have retrieved a file from its storage medium, you'll need to identify what kind of information you now have access to. In some cases, such as a phonograph record that has lost its label, you may have to find an expert on the recorded material to identify which song you have (see Identifying Physical Media). However, when working with files generated by a computer, there are several clues you can use to begin the process.

External signatures

File Extension

If a file is still stored under its original file name, one of the best places to start is its extension. On many operating systems this is the group of characters at the end of the name, often separated from the rest by a period. For example, OFFICE.DXF has the extension .DXF, which suggests an AutoCAD file, most likely one storing a drafting drawing.
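As a rough illustration of an extension lookup, the sketch below pulls the extension off a file name and checks it against a small table. The table `KNOWN_EXTENSIONS` and the function `guess_from_extension` are hypothetical names made up for this example, and the three entries are only a sample; a practical table would be far larger.

```python
from pathlib import PurePath

# Hypothetical, deliberately tiny lookup table for illustration only.
KNOWN_EXTENSIONS = {
    ".dxf": "AutoCAD drawing (DXF)",
    ".png": "PNG image",
    ".txt": "plain text",
}

def guess_from_extension(filename):
    """Return a best guess based only on the file name, or 'unknown'."""
    # suffix is the last dot-separated part, e.g. ".DXF"; compare case-insensitively.
    ext = PurePath(filename).suffix.lower()
    return KNOWN_EXTENSIONS.get(ext, "unknown")

print(guess_from_extension("OFFICE.DXF"))  # AutoCAD drawing (DXF)
print(guess_from_extension("README"))      # unknown
```

A match here is only a hint: the extension says what the file was named, not what it contains.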

Creator and Type

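The Apple Macintosh did not use file extensions. Instead, each file carried a four-character creator code, naming the application that made it, and a four-character type code, naming the kind of data it holds.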
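When such codes turn up in a metadata dump they often appear as 32-bit numbers rather than text. The sketch below converts between the two forms, assuming the usual convention of four Mac OS Roman characters packed into a big-endian 32-bit integer. The helper names are hypothetical, and the sample codes ('TEXT' as a type, 'ttxt' as a creator) are given only as commonly cited examples.

```python
import struct

def fourcc_to_int(code):
    """Pack a four-character code into a big-endian 32-bit integer."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("creator and type codes are exactly four characters")
    return struct.unpack(">I", raw)[0]

def int_to_fourcc(value):
    """Unpack a 32-bit integer back into its four-character code."""
    return struct.pack(">I", value).decode("mac_roman")

print(hex(fourcc_to_int("TEXT")))  # 0x54455854
print(int_to_fourcc(0x74747874))   # ttxt
```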

MIME Type

Some systems, such as HTTP and MIME, use MIME types to identify a file's data type. However, MIME types are of marginal use to a person trying to identify a rare file type. The type may itself have been guessed from the file's filename extension or magic signature, in which case it provides no new information.
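To see why a MIME type may add nothing, the sketch below uses Python's standard `mimetypes` module, which derives the type from the file name alone and never opens the file. The wrapper `mime_from_name` and both file names are hypothetical, chosen only for this example.

```python
import mimetypes

# A fresh MimeTypes instance starts from Python's built-in extension table.
mime_db = mimetypes.MimeTypes()

def mime_from_name(filename):
    """Guess a MIME type from the name alone; the file is never opened."""
    mime_type, _encoding = mime_db.guess_type(filename)
    return mime_type

print(mime_from_name("holiday.png"))  # image/png, whatever the file really contains
print(mime_from_name("mystery.q7z"))  # None: made-up extension with no table entry
```

A type produced this way is just the extension restated, so it cannot settle a case where the extension is missing or wrong.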

Internal signatures

An internal signature is a distinctive pattern of bytes in the file's contents. Most often, it takes the form of a "magic signature" near the beginning of the file.
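A minimal sketch of signature matching follows: it compares the start of a byte string against a handful of well-known magic signatures. The list `MAGIC_SIGNATURES` and the function `sniff` are hypothetical names for this example, and the list is a small sample, not a complete registry.

```python
# Each entry pairs the bytes expected at offset 0 with a human-readable label.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive or ZIP-based format"),
]

def sniff(data):
    """Return a label if the data starts with a known signature, else None."""
    for magic, label in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return label
    return None

print(sniff(b"%PDF-1.7\n"))    # PDF document
print(sniff(b"just some text"))  # None
```

In practice one would read the first few bytes of the file in binary mode and pass them to `sniff`.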

See File identification software for utilities that can help to identify files using such signatures.

Even without sophisticated software assistance, it may be possible to guess a file's format using a simple text editor or hex editor. For example, the second through fourth bytes of every PNG file spell out "PNG" in ASCII.
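The sketch below imitates one row of a hex editor, hex values on the left and printable ASCII on the right, applied to the eight-byte PNG signature. The function name `hex_dump_line` is hypothetical; non-printable bytes are shown as dots, as most hex editors do.

```python
def hex_dump_line(data, width=16):
    """Format up to `width` bytes as hex values followed by printable ASCII."""
    chunk = data[:width]
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    # Replace anything outside the printable ASCII range with a dot.
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{hex_part}  {ascii_part}"

png_header = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of any PNG file
print(hex_dump_line(png_header))   # 89 50 4e 47 0d 0a 1a 0a  .PNG....
print(png_header[1:4])             # b'PNG' (second through fourth bytes)
```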
