Date and time formats

From Just Solve the File Format Problem
Revision as of 12:21, 12 April 2016 by Dan Tobias (Talk | contribs)


Date and time formats go back to antiquity, to the point when people started referring to when events occurred or were planned to occur in a manner less vague than "many moons ago". Such references could be very culturally specific, like "in the year of the bountiful harvest" or "the third year of the reign of King Egbert". Cultural specificity persists to this day in date formats; one of the many ways Americans and Brits are "divided by a common language" is that they tend to write months and days in opposite orders. The information age has demanded more standardization in storing and transmitting dates and times (which are a frequently encountered element in many of the file formats documented on this site), so there are now some well-documented ways of expressing this information. However, as some wag said, "The great thing about standards is that there are so many to choose from!"; not all file formats express points in time in the same way.


Calendar systems

The first thing that needs to be established is what calendar system the dates are in. Presently, most of the world has standardized on the Gregorian calendar for civil purposes, but this was not always the case. In ancient and medieval times, a wide variety of calendars specific to different nations and religions were used; some persist for ceremonial purposes (such as the Hebrew calendar used to set the dates of Jewish holidays), but eventually much of the Western world settled on the Julian calendar established by Julius Caesar in 46 BC (reforming earlier irregular Roman calendars). This calendar was further reformed by Pope Gregory XIII in 1582 to correct a discrepancy that caused the Julian calendar to drift with respect to the seasons. The new calendar changed the leap year rules: instead of every year that is a multiple of 4 being a leap year, years that are multiples of 100 are not leap years unless they are also multiples of 400. Also, a one-time correction removed 10 days from October 1582 (October 4 was followed directly by October 15). In the 20th and 21st centuries the divergence between the Julian and Gregorian calendars is 13 days due to the accumulated difference in leap days. Confusingly, a number of countries continued to use the Julian calendar after other countries had switched to the Gregorian calendar, so that as late as the 1920s the current date differed depending on what country you were in.
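The two leap-year rules, and the way the gap between the calendars grows, can be sketched in a few lines of Python. The function names are made up for this illustration, and the gap formula is only meant for AD years, counted from March of the given year (the gap changes when a century year is a leap year in one calendar but not the other).

```python
def is_leap_julian(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_gregorian(year: int) -> bool:
    """Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def julian_gregorian_gap(year: int) -> int:
    """Days by which the Gregorian date runs ahead of the Julian date.

    Hypothetical helper: valid for dates from March of `year` onward,
    because the gap grows by one each time a century year is a leap
    year in the Julian calendar but not in the Gregorian one.
    """
    return year // 100 - year // 400 - 2


print(is_leap_julian(1900), is_leap_gregorian(1900))  # True False
print(is_leap_julian(2000), is_leap_gregorian(2000))  # True True
print(julian_gregorian_gap(1582))  # 10
print(julian_gregorian_gap(2016))  # 13
```

The 10 days dropped in 1582 and the 13-day divergence of the 20th and 21st centuries both fall out of the same formula.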

Years are numbered from the purported date of Jesus' birth, though this is now believed to be in error, with the actual birth a few years earlier, something like 5 BC (and not on Christmas Day either). 1 AD directly followed 1 BC; there was no "year zero". (AD and BC years are sometimes referred to as CE and BCE, for Common Era and Before Common Era, by people not wishing to refer to a specific religion in their date reckoning.)

Time of day

Calendars count days, traditionally the length of time it takes for the Earth to rotate and bring the sun back to the same position in the sky (though this gets a bit more complex when considering seasonal variations, and even more complex in modern times as the slight variations in the Earth's rotation speed are considered). To express points in time at higher resolution, a system for timekeeping within the day is needed. Sundial time was established to divide the daylight time into 12 hours, but this caused hours to vary in length by season and not be measured at all at night. Eventually clocks were devised to measure time uniformly, dividing the day into 24 hours, and (as they got more accurate) the hours into 60 minutes and the minutes into 60 seconds. If this process had continued, perhaps seconds would be divided into 60 "thirds", but instead, when sub-second precision became necessary in the scientific community, standard metric prefixes had been established based on powers of ten, so we got "milliseconds" and "microseconds".

Times were originally based on the local solar time where the person keeping track of it was located, so times differed from place to place; the town down the road might be a few seconds or minutes removed from yours. This wasn't convenient for railroad schedules in the 1800s, so a standardized time-zone system was established. In theory it has 24 zones of equal width around the world, but once politics got involved it became a crazy-quilt of zig-zaggy zone boundaries that diverge greatly from their "logical" positions (China stretched a single time zone to cover its entire large country), and some zones use times that are a half-hour (or even 45 minutes) removed from the zones based on integer hours. Daylight saving time (summer time) further complicates things by shifting some of the time zones by one hour for part of the year.
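The arithmetic behind local solar time is simple: the Earth turns 360 degrees in 24 hours, so each degree of longitude is worth 4 minutes, and each idealized zone is 15 degrees wide. A rough sketch follows, with hypothetical function names, treating east longitude as positive and ignoring seasonal effects such as the equation of time.

```python
from datetime import timedelta


def mean_solar_offset(longitude_deg: float) -> timedelta:
    """Offset of local mean solar time from Greenwich (east is positive).

    Illustrative only: 360 degrees / 24 hours gives 4 minutes per degree.
    """
    return timedelta(minutes=4 * longitude_deg)


def nominal_zone_hours(longitude_deg: float) -> int:
    """Whole-hour zone a longitude would get if zones were ideal 15-degree strips."""
    return round(longitude_deg / 15)


# A town one degree of longitude away differs by four minutes.
print(mean_solar_offset(1))    # 0:04:00
# 75 degrees west sits in the middle of the idealized UTC-5 strip.
print(nominal_zone_hours(-75))  # -5
```

Real zone boundaries follow political borders, so `nominal_zone_hours` is only a starting point and will not match the actual zone in many places.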

To standardize global timekeeping, Greenwich Mean Time (the mean solar time at the Greenwich observatory in England) was established as the official base time from which all the other time zones were calculated, and which could be used as a uniform timestamp for global commerce in preference to location-specific times. The Greenwich observatory is no longer a working observatory (it's a museum), but the time standard based on its location has evolved into modern UTC (Coordinated Universal Time).

Eventually, with the creation of atomic clocks, it was seen that the Earth's rotation is not in fact a completely uniform basis for a time standard; the atomic clocks run at a precisely uniform rate ticking off seconds which are now scientifically defined in terms of atomic frequencies, and the Earth's days gradually get out of step with them. To reconcile this, leap seconds are occasionally added to the UTC standard time, meaning that it is not a fully uniform time scale. Other time scales such as TAI and GPS time are strictly atomic and hence have an offset from UTC that changes with time (in addition to fixed offsets between different atomic-based time scales).
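To make the changing offset concrete, here is a small sketch that looks up TAI minus UTC from a deliberately partial table covering only the three most recent steps (35 s from July 2012, 36 s from July 2015, 37 s from January 2017). GPS time is a fixed 19 seconds behind TAI. The table and function names are illustrative; a real application would load the full, current leap-second list from an authoritative source rather than hard-coding it.

```python
from datetime import date

# Partial table: (date the offset took effect, TAI - UTC in seconds).
# Earlier entries are omitted and later leap seconds would need adding.
TAI_MINUS_UTC_TABLE = [
    (date(2012, 7, 1), 35),
    (date(2015, 7, 1), 36),
    (date(2017, 1, 1), 37),
]

# GPS time is a fixed number of seconds behind TAI.
TAI_MINUS_GPS = 19


def tai_minus_utc(day: date) -> int:
    """Look up TAI - UTC for a date covered by the partial table."""
    offset = None
    for effective_from, seconds in TAI_MINUS_UTC_TABLE:
        if day >= effective_from:
            offset = seconds
    if offset is None:
        raise ValueError("date is earlier than the partial table covers")
    return offset


def gps_minus_utc(day: date) -> int:
    """GPS - UTC follows from the two offsets above."""
    return tai_minus_utc(day) - TAI_MINUS_GPS


print(tai_minus_utc(date(2016, 4, 12)))  # 36
print(gps_minus_utc(date(2016, 4, 12)))  # 17
```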

Date and time formats

So with those preliminaries out of the way, one now gets to how to express a date and time that is in the Gregorian calendar and either UTC or a local time zone.

It can be spelled out, like September 11th, 2001, at 8:46 AM Eastern Daylight Time, but this is language-specific.

All-numeric formats have cultural differences; 9/11/2001 could mean September 11 (in American M/D/Y order) or November 9 (in British D/M/Y order). Other orders such as Y/M/D are also found. Delimiters can be slashes, dashes, or dots. Leading zeroes might be present or absent (09/11 versus 9/11). Years might be given in full or only by the last two digits (9/11/01; this practice led to the infamous Y2K problem where two-digit years didn't work well in electronic systems at the turn of the century).
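The ambiguity is easy to reproduce with Python's standard `datetime.strptime`, which happily parses the same string under either field order. The two-digit-year case is shown as well: Python follows the POSIX convention of mapping 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which is one library's guess and not something the text itself can tell you.

```python
from datetime import datetime

text = "9/11/2001"

# The same characters, read with the two competing field orders.
american = datetime.strptime(text, "%m/%d/%Y").date()
british = datetime.strptime(text, "%d/%m/%Y").date()
print(american)  # 2001-09-11
print(british)   # 2001-11-09

# Two-digit years force the parser to guess the century.
guess_2001 = datetime.strptime("9/11/01", "%m/%d/%y").date()
guess_1969 = datetime.strptime("9/11/69", "%m/%d/%y").date()
print(guess_2001.year, guess_1969.year)  # 2001 1969
```

Nothing in the input distinguishes the readings, so a parser has to be told the convention out of band.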

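Times of day can be given as AM/PM or 24-hour time, and the AM/PM system is prone to confusion over whether 12 AM is midnight or noon (by the most common convention 12 AM is midnight and 12 PM is noon, but not everyone reads it that way). The times might also be in UTC or a local time zone, standard or daylight/summer.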
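A converter from 12-hour to 24-hour time has to pick a reading for the 12 o'clock cases. The sketch below assumes the most common convention (12 AM is midnight, 12 PM is noon); the function name is invented for this example.

```python
def to_24_hour(hour12: int, meridiem: str) -> int:
    """Convert a 12-hour clock reading to a 24-hour hour value.

    Assumes the common convention that 12 AM is midnight and 12 PM is noon.
    """
    if not 1 <= hour12 <= 12:
        raise ValueError("hour must be between 1 and 12")
    marker = meridiem.strip().upper().replace(".", "")
    if marker not in ("AM", "PM"):
        raise ValueError("meridiem must be AM or PM")
    # 12 wraps around to 0 before the PM half-day is added.
    return hour12 % 12 + (12 if marker == "PM" else 0)


print(to_24_hour(12, "AM"))  # 0
print(to_24_hour(8, "a.m."))  # 8
print(to_24_hour(12, "PM"))  # 12
print(to_24_hour(11, "pm"))  # 23
```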

To standardize date and time formats, the ISO 8601 standard was established. Its most familiar form is YYYY-MM-DDTHH:MM:SSZ for UTC times (e.g., 2001-09-11T12:46:00Z); the standard calls this punctuated form the "extended format", while its "basic format" omits the hyphens and colons. The Z may be replaced with a local time zone offset such as "-04" or "-04:00" for a 4-hour negative offset (U.S. Eastern Daylight Time). A number of variants are also defined in the standard. This notation has the advantage of sorting correctly in a computer system by simple ASCII string sorting (at least when all times are in the same zone).
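As a rough illustration using Python's standard `datetime` module, the snippet below writes the example instant in UTC and with a -04:00 offset, reads the offset form back, and checks the sorting property. The fixed offset stands in for U.S. Eastern Daylight Time, and `iso_utc` is a helper name invented here.

```python
from datetime import datetime, timedelta, timezone

# Fixed -04:00 offset standing in for U.S. Eastern Daylight Time.
EDT = timezone(timedelta(hours=-4))


def iso_utc(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SSZ (hypothetical helper)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


instant = datetime(2001, 9, 11, 12, 46, 0, tzinfo=timezone.utc)
utc_text = iso_utc(instant)
local_text = instant.astimezone(EDT).isoformat()
print(utc_text)    # 2001-09-11T12:46:00Z
print(local_text)  # 2001-09-11T08:46:00-04:00

# The offset form parses back to the same instant.
parsed = datetime.fromisoformat(local_text)
print(parsed == instant)  # True

# With every stamp in the same zone, plain string sorting is chronological.
stamps = ["2001-09-11T12:46:00Z", "1999-12-31T23:59:59Z", "2001-01-01T00:00:00Z"]
print(sorted(stamps))
```

The sorting property breaks as soon as stamps with different offsets are mixed, so a common approach is to normalize to UTC before comparing strings.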

Computer systems have a number of other ways of representing timestamps, however. A popular method is to use a numeric quantity counting the number of some time unit since some "epoch" time. This practice predates computers; the Julian Day system numbers days from an epoch way back in the BC era (4713 BC in the Julian calendar). Computers have a few commonly used systems of this sort, notably the Unix timestamp, which counts the number of seconds since the start of January 1, 1970 in UTC. Pinning down this definition gets a bit tricky: UTC as it stood in 1970 used variable-length seconds, and it was changed in 1972 to use uniform seconds with occasional leap seconds added. Unix timestamps nevertheless assume exactly 86400 seconds per day, thus omitting leap seconds (which have no consistent Unix timestamp representation). There are proposals to eliminate leap seconds altogether, which would make computer timestamps easier to deal with at the expense of having the time of day gradually drift away from the real world's day-and-night cycle over the centuries.
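A short sketch of the Unix convention, using the standard library's `calendar.timegm`, which does the plain 86400-seconds-per-day arithmetic. The helper names are made up for this example; the Julian Day conversion relies on the Unix epoch falling at Julian Day 2440587.5. The last lines show the leap-second problem: the leap second at the end of 2016 (23:59:60) lands on the same number as the following midnight.

```python
import calendar

SECONDS_PER_DAY = 86400
JULIAN_DAY_OF_UNIX_EPOCH = 2440587.5  # 1970-01-01T00:00:00Z


def unix_timestamp(year, month, day, hour=0, minute=0, second=0) -> int:
    """Seconds since 1970-01-01T00:00:00Z, counting 86400 seconds per day."""
    return calendar.timegm((year, month, day, hour, minute, second))


def unix_to_julian_day(timestamp: float) -> float:
    """Convert a Unix timestamp to a (fractional) Julian Day number."""
    return timestamp / SECONDS_PER_DAY + JULIAN_DAY_OF_UNIX_EPOCH


print(unix_timestamp(2001, 9, 11, 12, 46, 0))  # 1000212360
print(unix_to_julian_day(unix_timestamp(2000, 1, 1, 12)))  # 2451545.0

# The leap second 2016-12-31T23:59:60Z has no number of its own:
leap_second = unix_timestamp(2016, 12, 31, 23, 59, 60)
next_midnight = unix_timestamp(2017, 1, 1, 0, 0, 0)
print(leap_second == next_midnight)  # True
```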

Besides Unix timestamps, there are other epoch-based time stamping systems in use. Some spreadsheet programs (including MS Excel) use a system of day serial numbers in which day 1 is January 1, 1900. However, the system treats 1900 as a leap year (which it was not), so it includes a nonexistent February 29, 1900 as day 60; as a result, serial numbers for dates from March 1, 1900 onward are one higher than a true day count, and effectively count from December 30, 1899. Old versions of Mac OS had a time stamp system that counted from 1904 (sometimes causing a 4-year date discrepancy when documents are improperly converted between the systems), and MS-DOS directory timestamps counted years as an offset from 1980. Currently the Mac OS X Cocoa library uses a reference date of January 1, 2001, 00:00 UTC.
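Converting between these systems is mostly a matter of adding the right constant. The sketch below uses invented function names and several simplifying assumptions: the spreadsheet serial is a whole number in the 1900-based system; the 1904 and 2001 epochs are treated as UTC (classic Mac OS timestamps were often local time in practice); and the MS-DOS date is assumed to use the usual FAT bit layout, with the year offset in the top 7 bits, the month in the next 4, and the day in the low 5.

```python
from datetime import date, datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAC_CLASSIC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)  # assumed UTC
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def excel_serial_to_date(serial: int) -> date:
    """Convert a 1900-system spreadsheet day serial to a calendar date."""
    if serial < 1:
        raise ValueError("serial numbers start at 1")
    if serial == 60:
        raise ValueError("serial 60 is the nonexistent February 29, 1900")
    # Serials after the phantom leap day are one too high, so the
    # effective day zero moves back from December 31 to December 30.
    day_zero = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return day_zero + timedelta(days=serial)


def epoch_seconds_to_unix(seconds: int, epoch: datetime) -> int:
    """Re-base a seconds count from another epoch onto the Unix epoch."""
    return seconds + int((epoch - UNIX_EPOCH).total_seconds())


def dos_date_to_date(packed: int) -> date:
    """Unpack a 16-bit MS-DOS date (assumed FAT layout: YYYYYYYMMMMDDDDD)."""
    return date(1980 + ((packed >> 9) & 0x7F), (packed >> 5) & 0x0F, packed & 0x1F)


print(excel_serial_to_date(1))      # 1900-01-01
print(excel_serial_to_date(61))     # 1900-03-01
print(excel_serial_to_date(25569))  # 1970-01-01
print(epoch_seconds_to_unix(0, MAC_CLASSIC_EPOCH))  # -2082844800
print(epoch_seconds_to_unix(0, COCOA_EPOCH))        # 978307200
print(dos_date_to_date((21 << 9) | (9 << 5) | 11))  # 2001-09-11
```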

Representation of a month within a date

Roman numerals

In some countries (CIS members being the major example) Roman numerals are used to represent the month. This makes a date less ambiguous: the month can no longer be confused with the day or the year, so the only remaining issue is telling the year from the day, which is easily done by writing the year with 4 digits. The separator in this style is usually just a space. For example, the 21st of March 2015 is written as 21 Ⅲ 2015.
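Formatting a date this way is straightforward. The sketch below uses the single-character Unicode Roman numerals U+2160 (Ⅰ) through U+216B (Ⅻ), which conveniently cover exactly 1 to 12; writing the numerals with ordinary Latin letters (III, IV, and so on) is equally common. The function names are invented for this example.

```python
from datetime import date


def roman_month(month: int) -> str:
    """Return the single-character Unicode Roman numeral for a month number."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # U+2160 is ROMAN NUMERAL ONE; the block runs up to TWELVE at U+216B.
    return chr(0x2160 + month - 1)


def format_roman_date(day: date) -> str:
    """Format as day, Roman-numeral month, four-digit year, space separated."""
    return f"{day.day} {roman_month(day.month)} {day.year:04d}"


print(format_roman_date(date(2015, 3, 21)))  # 21 Ⅲ 2015
```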

Telegraph symbols

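Month, day, and hour can each be written as a combination of a numerical value and a Han character. Unicode has a dedicated character (an "ideographic telegraph symbol") for each of these combinations:

Months (U+32C0 to U+32CB)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
㋀ ㋁ ㋂ ㋃ ㋄ ㋅ ㋆ ㋇ ㋈ ㋉ ㋊ ㋋
Days 1 to 31 (U+33E0 to U+33FE)
㏠ ㏡ ㏢ ㏣ ㏤ ㏥ ㏦ ㏧ ㏨ ㏩ ㏪ ㏫ ㏬ ㏭ ㏮ ㏯ ㏰ ㏱ ㏲ ㏳ ㏴ ㏵ ㏶ ㏷ ㏸ ㏹ ㏺ ㏻ ㏼ ㏽ ㏾
Hours 0 to 24 (U+3358 to U+3370)
㍘ ㍙ ㍚ ㍛ ㍜ ㍝ ㍞ ㍟ ㍠ ㍡ ㍢ ㍣ ㍤ ㍥ ㍦ ㍧ ㍨ ㍩ ㍪ ㍫ ㍬ ㍭ ㍮ ㍯ ㍰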
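Because each group of symbols occupies a contiguous run of code points, a symbol can be computed from its number. The sketch below assumes the Unicode ranges U+32C0 (January) onward for months, U+33E0 (day one) onward for days, and U+3358 (hour zero) onward for hours, and uses the standard `unicodedata` module to confirm the character names. The function names are invented for this example.

```python
import unicodedata


def telegraph_month(month: int) -> str:
    """Telegraph symbol for a month, 1 to 12 (U+32C0 to U+32CB)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return chr(0x32C0 + month - 1)


def telegraph_day(day: int) -> str:
    """Telegraph symbol for a day of the month, 1 to 31 (U+33E0 to U+33FE)."""
    if not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    return chr(0x33E0 + day - 1)


def telegraph_hour(hour: int) -> str:
    """Telegraph symbol for an hour, 0 to 24 (U+3358 to U+3370)."""
    if not 0 <= hour <= 24:
        raise ValueError("hour must be between 0 and 24")
    return chr(0x3358 + hour)


stamp = telegraph_month(9) + telegraph_day(11) + telegraph_hour(8)
for symbol in stamp:
    print(f"U+{ord(symbol):04X}", unicodedata.name(symbol))
```

For the example values this prints the names for September, day eleven, and hour eight.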

