US Library of Congress Archiving All Public Tweets – Even Yours

April 15, 2010

Due to an issue with their official blog, the US Library of Congress posted the following note on Facebook earlier this morning, and gave followers of @librarycongress a heads-up tweet. I’ve posted the note here to save you from having to log in to Facebook if you don’t want to, or if your work does not allow it. Speculative thoughts at the bottom.

Have you ever sent out a “tweet” on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.

That’s right. Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress. (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)

We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can possibly conceive.

Just a few examples of important tweets in the past few years include the first-ever tweet from Twitter co-founder Jack Dorsey, President Obama’s tweet about winning the 2008 election, and a set of two tweets from a photojournalist who was arrested in Egypt and then freed because of a series of events set into motion by his use of Twitter.

Twitter plans to make its own announcement today on its blog from “Chirp,” the Official Twitter Developer Conference, in San Francisco.

So if you think the Library of Congress is “just books,” think of this: The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000. Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress.

We also operate the National Digital Information Infrastructure and Preservation Program, which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

In other words, if you want a place where important historical information in digital form should be preserved for the long haul, we’re it!

This raises a few questions, like… who has access to the data? Who will be using it, and to what end? Does it include all the location data? It’s the US Library of Congress, so is the data only available to US citizens? If so, then what about my tweets, can I at least see those from here in Australia? I can imagine market research organisations clawing their eyes out and selling whatever souls they have left in storage to gain access to this wealth of raw opinion and conversation.

So, who has been archiving all the data from 2006? I always assumed Twitter would be keeping it somewhere, but the fact that they only give us access to 3200 of our tweets at a time made it seem less likely. Does that mean the Library of Congress have been keeping track of them all this time? If so, then why just announce it now?

More importantly, as someone who prefers to keep a back-up of my own personal Twitter stream – so that I might look back on it in later years with fondness – will I, the individual, have access to this? Do I even need to bother to keep an archive of my tweets any more? Give us your thoughts, people. Is this benevolent, or kind of scary? What’s the value of what’s essentially a snapshot of the thoughts, emotions, events and opinions of the last four years – as expressed by individuals?


[Source: Original Facebook Note HT @barrysaunders]


Monitor coverage and issues online with Perspctv

November 5, 2008

Perspctv is a web service that shows and compares online coverage of up to five issues across Twitter, news and the blogosphere.

Currently the site is automatically tracking coverage of the US Election going on right now, but users can make their own dashboards by clicking on the link at the top of the page.


That’s the first one I made, comparing “android”, “blackberry” and “iphone”. Coincidentally, that’s the first one that TechCrunch tried.

Check it out at