XLS
XLS is a family of spreadsheet formats native to Microsoft Excel.
Microsoft Office Excel 97-2007
The Excel Binary File Format (.xls) is the binary file format used by Microsoft Excel 97, 2000, 2002, and Office Excel 2003. It is also supported by Microsoft Office Excel 2007.
Later Excel versions use XLSX as their native format, though they still support the older format as well.
Handling of date values
Excel stores date values as floating-point numbers that represent the number of days since a given start date. The default start dates differ between Excel for Windows (which uses January 1, 1900) and Excel for Mac (which uses January 1, 1904). On top of this, the 1900 date system erroneously assumes that 1900 was a leap year. This assumption was introduced on purpose to ensure compatibility with a bug in Lotus 1-2-3. In practice, interpreting a file with the wrong date system may lead to dates that are off by 4 years and 1 day (1,462 days), depending on the software that is used to read or process the files.
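The date arithmetic described above can be illustrated with a minimal Python sketch. The function name `excel_serial_to_datetime` and the `date1904` parameter are hypothetical names chosen for this example, not part of any library. The sketch assumes that serial 1 is January 1, 1900 in the 1900 system, that serial 0 is January 1, 1904 in the 1904 system, and that serial 60 in the 1900 system stands for the nonexistent February 29, 1900.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial, date1904=False):
    """Convert an Excel date serial number to a datetime (illustrative sketch).

    `date1904` is a hypothetical flag selecting the 1904 date system.
    """
    if serial < 0:
        raise ValueError("negative serial numbers are not handled in this sketch")
    if date1904:
        # 1904 system: serial 0 corresponds to January 1, 1904.
        epoch = datetime(1904, 1, 1)
    elif serial < 60:
        # 1900 system, before the phantom leap day: serial 1 is January 1, 1900.
        epoch = datetime(1899, 12, 31)
    elif serial < 61:
        # Serial 60 stands for February 29, 1900, which never existed.
        raise ValueError("serial 60 maps to the nonexistent date 1900-02-29")
    else:
        # After the phantom leap day, every serial is shifted by one day.
        epoch = datetime(1899, 12, 30)
    return epoch + timedelta(days=serial)


# The same calendar date has different serials in the two systems.
same_day_1900 = excel_serial_to_datetime(35981)
same_day_1904 = excel_serial_to_datetime(34519, date1904=True)

# Reading one serial with the wrong system shifts the date by 1,462 days.
offset_days = (excel_serial_to_datetime(35981, date1904=True) - same_day_1900).days
```

The fractional part of a serial, if present, is carried through as a time of day, since `timedelta` accepts fractional days.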
Software
- libxls - Library for reading XLS
- xlsLib - Library for writing XLS
- The xlrd Module, a Python module for extracting data from MS Excel spreadsheet files - contains detailed information about Excel's date handling
Sample files
- National Archives (UK) datasets (includes some XLS files)
- https://telparia.com/fileFormatSamples/document/xls/
References
- Binary file format specification *.xls (97-2007) format
- Why are the Microsoft Office file formats so complicated? (And some workarounds)
- Typo in Excel spreadsheet apparently led to erroneous result in economic paper that was influential on government policy
- Abandon all hope, ye who enter dates in Excel
- XL: The 1900 Date System vs. the 1904 Date System
- MS Office 97-2003 legacy/binary formats security - article with lots of resources on MS Office formats, including analysis techniques, tools and parsing libraries