Christopher Sardegna's Blog

Thoughts on technology, design, data analysis, and data visualization.


Using Your Twitter Archive to Visualize Your Tweets

Using Your Twitter Archive to Visualize Your Tweets

In December 2012, Twitter unveiled its “Archive” functionality, allowing users to download backups of all of their tweets. The downloaded files included a html interface for searching and reading tweets, which becomes very tedious and cumbersome with large amounts of data. The download also includes a .csv file that includes the raw data. We will be using this data for the following project. My data only goes through March 2015 because I used Shame Eraser to delete them all from my account.

The .csv should first be reformatted to an Excel Workbook to speed up the processing of data. The data table contains the following columns:

tweet_id
in_reply_to_status_id
in_reply_to_user_id
timestamp
source
text
retweeted_status_id
retweeted_status_user_id
retweeted_status_timestamp
expanded_urls

We will primarily be using the following, as well as 2 new ones we will be adding later:

in_reply_to_user_id
timestamp
source
text
retweeted_status_id

All of the charts below are made with PivotTables. If you do not have access to Excel 2015 or later for OS X you cannot accomplish the following.

Our first chart will track our tweet density over time.

rxcs Tweet Density.png

Our first pivot will contain 2 items, timestamp and text. Place timestamp in the column section and group how you like. I selected Months and Years. text should go under values and set to summarize as “count.” From here, you can make the chart. It should appear in the PivotTable Builder as follows:

Screen Shot 2015-08-15 at 2.59.28 AM.png

Our next chart is slightly less complicated. It will display the number of normal tweets, retweets, and replies or account has made.

rxcs Tweet Type.png

This pivot will contain 3 items, in_reply_to_user_id, retweeted_status_id, and text, all set to summarize as count. The PivotTable Builder should display the following:

Screen Shot 2015-08-15 at 2.30.18 AM.png

Our third chart is slightly more complex than the first. It will show the distribution of the length of tweets.

rxcs Tweet Length.png

To accomplish this, we must add a new column to our data file. Add a new column next to the text column and make the title something like Count chars. The cells should read =LEN(G2), where G2 is the reference to the cell directly to the left in the text column.

Make a new pivot from this data. Make the rows Count chars and the values Count chars set to summarize as count. It should appear as follows in the PivotTable Builder:

Screen Shot 2015-08-15 at 2.39.21 AM.png

Finally, we will make a chart that shows the number of tweets per source, i.e. the number of tweets that are sent to Twitter using a certain application.

rxcs Tweets Per Source.png Note: This chart excludes apps with a count of < 10 due to the volume of 1-time tweets. It also excludes Tweetbot for iOS, which, for me, summed to 68,175.

We must first sanitize the source column to make it easily readable, since it contains the HTML code for the App URL in addition to the name of the app. To accomplish this, we will add a new column directly next to the source column. I called mine source-clean. The following formula should strip the name out of the source column:

=IF(MID(LEFT(E3,FIND("",E3)-1),FIND(">",E3)+1,LEN(E3))="Tweetbot for iΟS","Tweetbot for iOS",MID(LEFT(E3,FIND("",E3)-1),FIND(">",E3)+1,LEN(E3)))

Technically you only need =MID(LEFT(E3,FIND("</",E3)-1),FIND(">",E3)+1,LEN(E3)), but my archive seemed to have some malformed unicode, displaying “Tweetbot for iOS’ as "Tweetbot for iΟS,” necessitating the =IF. In either instance, E3 represents the source cell directly to the left of the source-clean cell. This will fill down to the bottom and correctly strip the names out of the source cells.

From here, we can make our pivot. Use source-clean as the row and text as the value, summarized a count. The PivotTable Builder should display the following:

Screen Shot 2015-08-15 at 2.56.13 AM.png

And there you have it: 4 very interesting and telling graphs about your history on Twitter, all directly compiled from your Twitter Archive.