Looking For Musicologists on Twitter

For the most part, Twitter is full of garbage. But I’m an optimist and a firm believer in Sturgeon’s Law so by that logic there must be some good on it. That good is academic twitter.

While this isn’t a post advocating for academic Twitter, I did want to
1. see if I could figure out how to write a post with some R code in it and 2. share how I scraped Twitter to find active users in the Musicology and Music Theory community

So here it goes…

The first thing that you have to do is get some tweets. Luckily some packages exist in the #rstats world that can help with this. For this project I used the twitteR package which lets you log into Twitter’s API via R and and search it. There are already some instructions on how to get started with it that you can find here, so I won’t go into tons of detail about setting it up. (Also please note you can’t just copy and paste my code verbatim since it requires credentials from your own Twitter account)

Let’s first load the two packages we’ll need.

library(data.table)
library(twitteR)

Next up, we need to access Twitter’s API by entering in the details from the link above. I find it’s easiest to copy and paste each of my keys and tokens into a nice little character string, assign those to an object, then call those objects in the last command in this block.

consumer_key <- 'YOUR CONSUMER KEY HERE'
consumer_secret <- 'COPY AND PASTE YOUR CONSUMER SECRET HERE'
access_token <- 'THEN PUT YOUR ACCESS TOKEN HERE'
access_secret <- 'AND YOUR ACCESS SECRET HERE'
setup_twitter_oauth(consumer_key, consumer_secret, access_token=NULL, access_secret=NULL)

Running that last line in the chunk should then direct you to your default browser. This will log you into your Twitter account and R will ask for Twitter’s permission to enter through the back door.

The next bit of code won’t run the way I have it set up because Twitter doesn’t let you download tweets older than a week old. So if you want to play with tweets from a conference’s hashtag or some event, make sure to think ahead to download them!!

amsTwitter <- searchTwitter("#smt2017", n = 700)
amsTwitter <- searchTwitter("#amsroc17", n = 1600)

This line above searches Twitter for anything matching the conference hashtags and saves the output of it in a list. You can also include an argument asking for a certain number of tweets, which I’ve also done. Luckily the twitteR package has a function that will take this list and convert it to a data frame.

amsTwitter.df <- twListToDF(amsTwitter)
smtTwitter.df <- twListToDF(smtTwitter)

With these nice data frames, we’ll soon be able to join them together and count up some tweets! In order to do this we can take advantage of the data.table package to join our two tables together. Of course there are other ways, but Ben over at Gorm Analytics sold me on data.table this past summer and since then I have really been loving its easy syntax.

amsTwitter.dt <- data.table(amsTwitter.df)
smtTwitter.dt <- data.table(smtTwitter.df)
amstweets <- amsTwitter.dt[, .(amsTweets = .N), by=screenName][order(-amsTweets)]
smttweets <- smtTwitter.dt[, .(smtTweets = .N), by=screenName][order(-smtTweets)]
totalTweets <- merge(smttweets,amstweets, on ="screenName", all = TRUE)

The first thing the above code does is swap our data frames over to data.tables. Once they are in the data.table format, we can count up the tweets by screen name, then list them from biggest to smallest all in the same line. From there we merge the two together via the shared column, making sure to grab every instance in each table since not every Tweeter tweeted with both hashtags.

We then need to clean up some of the NAs (which as a data.table are characters!) in our bigger dataset with R’s ifelse() function that basically works exactly like an ifelse statement would in Microsoft Excel. It looks over a column in your dataset, checks if a value is an NA, if it is then it gives it a 0, if not, it puts in the value that was there in the first place. After replacing NAs, I then make a new variable that adds together both columns then run our final line that prints out our final dataset from top to bottom.

totalTweets$smtTweets <- ifelse(test = is.na(totalTweets$smtTweets),
                                yes = 0,
                                no = totalTweets$smtTweets) 
totalTweets$amsTweets <- ifelse(test = is.na(totalTweets$amsTweets),
                                yes = 0,
                                no = totalTweets$amsTweets) 

totalTweets[, TotalTweets := smtTweets + amsTweets]
totalTweets[order(-TotalTweets)]

From here it was simply a matter of using an online converter to turn it our final table an html file and then ssh it up to our Music Cognition at LSU server! Since then I’ve also added both the 2017 AMS and SMT datasets that I used to generate the counts in case you want to try this for yourself.

If anyone has any questions on this, please tweet me!

David John Baker
David John Baker
Senior Research Associate

Music, Theory, Sciences.

Related