Moral Outrage
Whew! God help us!

Your Tweets and Emails in the Library of Congress

Tweets, emails and other electronic communications can be considered “government documents” and must be preserved. The National Archives handles official government materials, while the Library of Congress’ mandate is to deal with anything that may have long-term historical interest.

But how much digital information are we talking about? How about all of the tweets from Twitter’s archives?

“We have an agreement with Twitter where they have a bunch of servers with their historic archive of tweets, everything that was sent out and declared to be public,” said Bill Lefurgy, digital initiatives program manager at the Library of Congress’ National Digital Information Infrastructure and Preservation Program. The archives don’t contain tweets that users have protected, but everything else, billions and billions of tweets, is there.

Twitter is transferring this large body of data to the Library using technical processes it developed for the purpose. “They’ve had to do some pretty nifty experimentation and invention to develop the tools and a process to be able to move all of that data over to us,” Lefurgy said.

Researchers would be able to look at the Twitter archive as a complete set of data, which they could then data-mine for interesting information.

Lefurgy said, “We firmly … anticipate that we’ll be bringing in large data sets again into the future. We don’t know specifically what, but certainly there’s no sign of data getting smaller or less complicated or less interesting.”


2 Responses to “Your Tweets and Emails in the Library of Congress”

  1. On the subject, Johannes Scholtes at ZyLab writes:

    I was told once that the U.S. National Archives and Records Administration (NARA) only archives and classifies a small percentage (less than 5% I believe) of what they are offered. The rest is destroyed, and not without a reason.

    In a library one expects to find knowledge and not raw unfiltered data like Tweets. As far as I can tell, 99.9999999% or more of all Tweets have no historical relevance and lack substance, let alone knowledge.

    Archiving Twitter communication as social phenomena is something I can understand, but you do not need a prestigious library to spend its budget, energy and time to do that. A separate web-archive managed by investigators of social behavior should be able to do the job.

  2. Kind of a no-brainer, but anyone want to share any thoughts on why this massive data storage is indeed taking place?
