Twitter

From Just Solve the File Format Problem
Revision as of 04:30, 15 February 2013

File Format
Name Twitter
Ontology

Twitter is a popular social-networking and messaging service, accessible through the web and mobile device apps, allowing users to write 140-character messages publicly or privately. Often the messages include hyperlinks, which get sent through URL shorteners (so they might suffer linkrot if the shortening services go away). Some of the conventions of the service are discussed in the article on Hashtags, at-signs, retweets, etc.

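Of interest to archivers is the fact that, as of late 2012, Twitter has started rolling out a feature to permit users to save their entire tweet history as an archive file (reported at http://thenextweb.com/twitter/2012/12/16/twitter-has-started-rolling-out-the-option-to-download-all-your-tweets/).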

Downloaded Twitter archive

If your account has been given the option to download your Twitter history (it is being rolled out gradually, so you may not have it yet, but probably will in the future), it appears as a button at the bottom of the "Settings" page in your account. Pressing it queues the generation of an archive of your tweets. When the archive is ready (how long this takes is not known), a link to retrieve it is e-mailed to the address registered for the account. From that link you can download a ZIP archive (tweets.zip) containing this file and directory structure:

  • README.txt: An ASCII text file giving some information about the format. Its lines are long and will scroll far off to the right if your text viewer doesn't wrap them.
  • index.html: HTML file which, when loaded in a browser, lets you view your tweets. The tweets themselves aren't actually in this file; it pulls in JavaScript from the subdirectories, which in turn loads the tweets from data files.
  • css: Subdirectory with Cascading Style Sheets.
    • application.min.css: Stylesheet (formatted in a hard-to-read manner with no line breaks).
  • data: Subdirectory with data files.
    • csv: Subdirectory with CSV files.
      • YYYY_MM.csv: A series of files named by year and month with the tweets in the form of comma-separated values (CSV). The columns are: "tweet_id", "in_reply_to_status_id", "in_reply_to_user_id", "retweeted_status_id", "retweeted_status_user_id", "timestamp", "source", "text", "expanded_urls". The timestamp is in UTC, in the format YYYY-MM-DD HH:MM:SS +0000.
    • js: Subdirectory with JavaScript (user-specific, encoding details about the tweets).
      • payload_details.js
      • tweet_index.js
      • user_details.js
      • tweets
        • YYYY_MM.js: A series of files named by year and month with the tweets in JSON form, with a one-line header turning each file into a JavaScript variable assignment. (Strip that line if using the JSON data elsewhere.)
  • img: Subdirectory with graphics.
    • bg.png: A PNG graphic used as a background.
    • sprite.png: A PNG graphic with sprites used by the scripts.
  • js: Subdirectory with JavaScript.
    • application.min.js: Script used in displaying tweets (formatted in a hard-to-read manner with no line breaks).
  • lib: Subdirectory with various 'library' files used by the scripts.
    • bootstrap: Various JavaScript, CSS, and graphics.
    • hogan: Contains another JavaScript file.
    • jquery: Contains another JavaScript file.
    • twt: Contains some more JavaScript, CSS, and graphics.
    • underscore: Contains another JavaScript file.
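
The two data formats described above (the monthly CSV files and the monthly .js files) could be read along the following lines. This is only a minimal sketch under stated assumptions, not a tested reader for real archives: it assumes each CSV file begins with a header row carrying the column names listed above, and the function names, the sample header line, and the "id"/"text" fields in the sample JSON are illustrative placeholders rather than anything confirmed about Twitter's files.

```python
import csv
import io
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse a CSV timestamp of the form 'YYYY-MM-DD HH:MM:SS +0000'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z")


def read_csv_tweets(text):
    """Read the contents of one YYYY_MM.csv file into a list of dicts.

    Assumption: the first row is a header row with the column names.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["timestamp"] = parse_timestamp(row["timestamp"])
    return rows


def read_js_tweets(text):
    """Strip the one-line JavaScript header, then parse the rest as JSON."""
    _header, _, body = text.partition("\n")
    return json.loads(body)


# Hypothetical sample data, made up for illustration only.
sample_csv = (
    '"tweet_id","in_reply_to_status_id","in_reply_to_user_id",'
    '"retweeted_status_id","retweeted_status_user_id","timestamp",'
    '"source","text","expanded_urls"\n'
    '"1","","","","","2013-02-15 04:30:00 +0000","web","hello, world",""\n'
)
# The header line below is a placeholder, not the archive's real header.
sample_js = 'var placeholder_header =\n[{"id": 1, "text": "hello, world"}]\n'

if __name__ == "__main__":
    for row in read_csv_tweets(sample_csv):
        print(row["tweet_id"], row["timestamp"].isoformat(), row["text"])
    for tweet in read_js_tweets(sample_js):
        print(tweet["text"])
```

With a real archive, the text passed to these functions would come from the files under data/csv and data/js/tweets, for example by reading them out of tweets.zip with Python's zipfile module. Using the csv module rather than splitting on commas matters because tweet text can itself contain commas and quotes.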

Links and references
