Using Your Twitter Archive to Visualize Your Tweets
In December 2012, Twitter unveiled its “Archive” functionality, allowing users to download backups of all of their tweets. The downloaded files included a html interface for searching and reading tweets, which becomes very tedious and cumbersome with large amounts of data. The download also includes a .csv file that includes the raw data. We will be using this data for the following project. My data only goes through March 2015 because I used Shame Eraser to delete them all from my account.
The .csv should first be reformatted to an Excel Workbook to speed up the processing of data. The data table contains the following columns:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls
We will primarily be using the following, as well as 2 new ones we will be adding later:
in_reply_to_user_id timestamp source text retweeted_status_id
All of the charts below are made with PivotTables. If you do not have access to Excel 2015 or later for OS X you cannot accomplish the following.
Our first chart will track our tweet density over time.
Our first pivot will contain 2 items,
timestamp in the column section and group how you like. I selected Months and Years.
text should go under values and set to summarize as “count.” From here, you can make the chart. It should appear in the PivotTable Builder as follows:
Our next chart is slightly less complicated. It will display the number of normal tweets, retweets, and replies or account has made.
This pivot will contain 3 items,
text, all set to summarize as count. The PivotTable Builder should display the following:
Our third chart is slightly more complex than the first. It will show the distribution of the length of tweets.
To accomplish this, we must add a new column to our data file. Add a new column next to the text column and make the title something like
Count chars. The cells should read
=LEN(G2), where G2 is the reference to the cell directly to the left in the
Make a new pivot from this data. Make the rows
Count chars and the values
Count chars set to summarize as count. It should appear as follows in the PivotTable Builder:
Finally, we will make a chart that shows the number of tweets per source, i.e. the number of tweets that are sent to Twitter using a certain application.
Note: This chart excludes apps with a count of < 10 due to the volume of 1-time tweets. It also excludes Tweetbot for iOS, which, for me, summed to 68,175.
We must first sanitize the
source column to make it easily readable, since it contains the HTML code for the App URL in addition to the name of the app. To accomplish this, we will add a new column directly next to the
source column. I called mine
source-clean. The following formula should strip the name out of the
=IF(MID(LEFT(E3,FIND("",E3)-1),FIND(">",E3)+1,LEN(E3))="Tweetbot for iÎŸS","Tweetbot for iOS",MID(LEFT(E3,FIND("",E3)-1),FIND(">",E3)+1,LEN(E3)))
Technically you only need
=MID(LEFT(E3,FIND("</",E3)-1),FIND(">",E3)+1,LEN(E3)), but my archive seemed to have some malformed unicode, displaying “Tweetbot for iOS’ as "Tweetbot for iÎŸS,” necessitating the
=IF. In either instance, E3 represents the
source cell directly to the left of the
source-clean cell. This will fill down to the bottom and correctly strip the names out of the
From here, we can make our pivot. Use
source-clean as the row and
text as the value, summarized a count. The PivotTable Builder should display the following:
And there you have it: 4 very interesting and telling graphs about your history on Twitter, all directly compiled from your Twitter Archive.