What to do when you miss conferences because of visas? – Keep an eye on Twitter
Screwed-up visas. Mine arrived last night, just two weeks too late for me to attend the ESA meeting (the Ecological Society of America, which this year celebrates its centenary). I missed the opportunity to present my work and get feedback from the ecology community and, more importantly, to get to know what other ecologists are doing. It’s a pity. Earlier this summer I also missed the International Conference of Computational Social Science and another meeting in the US regarding the Arctic Resilience Assessment. Last year I also missed the European Conference of Complex Systems: my mum got sick and I had to travel home to take care of her. Luckily my mum is better now. But with so many exciting academic events that one cannot attend, whether due to visa restrictions, lack of funding, or unfortunate life events, one has to come up with some alternative way of getting to know what is going on. Here is my solution: I mine Twitter.
Twitter is not a perfect source of data, but at least it is free and gives you a flavour of what the digital conversation is about. At the end of the day, humans are sensors of the reality you’re missing, and they leave traces of what they find interesting in the digital world. Twitter is not a perfect source of data because it’s biased: only people with access to a smartphone or an internet connection tweet; Twitter is mostly used by certain age groups that might not represent what is going on in the whole community (in this case, mostly ecologists); and you never know for certain how well your data is sampled. Anyway, it is free and you don’t need a visa to play with Twitter data, although some restrictions do exist.
At ESA, people were asked not to tweet unless speakers allowed tweeting at the beginning of their talks. Despite the no-tweeting policy of the #ESA100 meeting (that was the official Twitter hashtag), I managed to recover over 18000 tweets from 2589 Twitter users. That’s huge!! Just to put those numbers in perspective, other conferences I’ve observed on Twitter, none of which had a no-tweeting policy, include:
- International Conference of Computational Social Science: #ICCSS2015, 2288 tweets from 570 users.
- Network Science conference 2015: #NetSci15, >2000 tweets, ~550 users (of which I analysed 801 tweets from 195 users)
- European Conference of Complex Systems: #ECCS14, 2330 tweets, 399 users
- EAT Forum Stockholm: #EAT2015, 897 tweets, 560 users (I missed the first day of data)
- Resilience conference 2014: #Resilience2014, 2042 tweets, 442 users
- World Water Week 2014: #wwweek, 1599 tweets, 793 users
So by comparison, not only was #ESA100 huge, it was also full of virtual activity despite its no-tweeting policy. Tweeting activity is nevertheless quite predictable, at least in time. You would expect bursts of activity around the plenary sessions in the mornings and afternoons, and less activity at night and before or after the conference. Here is how it looks:
As you can see from the figure above, I don’t have data for the tweets before the conference. That’s one of the limitations of the Twitter API: you can ask for tweets, but they decide which ones and which time period you get. Previously (last year) there was a window of 4 days that you could look back into the past. Now it allows you to go further back and harvest more data, but it is still not perfect. And since I only do this as a hobby, I’m not up to date with the constant changes to the API terms of use. As expected, there are peaks of activity during the day and valleys at night; on some days you can even see the lunch break between two peaks. It’s good that people mingle and put down the phone from time to time. But who are these people? Who is talking to whom, and about what?
The figure below depicts a mention network: one node connects to another if the first mentions the second in her or his tweet. It is therefore a directed network, in which 16% of the links are reciprocated. Node size is scaled by degree, that is, the number of connections a node has in the network. One could also use the number of followers on Twitter, but since I’m interested in the conversation and in who the interesting people to keep an eye on are while missing the conference, not in how popular they are on Twitter, degree in the mention network is a good proxy for the quality of a user’s tweets. Although it is not depicted in the graph, the links also have weights, because one user can mention another many times across different tweets; thus some links are stronger than others. Beyond its visual appeal, though, the picture is not very informative: a few people have lots of links while most people have few. This could be because some people are simply more active on Twitter, because they tweet more interesting stuff that is worth mentioning or retweeting, or because of some other underlying process that cannot be known from the data alone; for example, that the person tweeting is a very famous ecologist, or that the tweet mentions President Barack Obama, or both. Anyway, extracting the core of nodes that everyone is talking about is useful if you want to filter the information that the network as a whole signals as most important, instead of reading all 18000+ tweets. You can extract them into a list and keep an eye on the most trending stuff.
Who are they? Plotting the names on the graph would just make it messier. So here is the list of the top 50 #ESA100 Twitter users, ranked by the number of times someone mentions them. The number following each name is the number of links they have in the network, that is, the number of people who mention them.
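As an aside on the activity-over-time view: counting tweets per hour is enough to reveal the day and night rhythm. Below is a minimal sketch in Python (the analysis in this post was done in R); the timestamps and the function name `tweets_per_hour` are made up for illustration and are not taken from the real data set.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample: creation times of a handful of tweets (invented values).
timestamps = [
    "2015-08-10 09:05:00",
    "2015-08-10 09:40:00",
    "2015-08-10 12:15:00",
    "2015-08-10 14:30:00",
    "2015-08-10 14:45:00",
    "2015-08-10 14:59:00",
    "2015-08-11 02:10:00",
]

def tweets_per_hour(stamps):
    """Count tweets in each (date, hour) bin."""
    counts = Counter()
    for stamp in stamps:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour so tweets in the same hour share a key.
        counts[t.replace(minute=0, second=0)] += 1
    return counts

hourly = tweets_per_hour(timestamps)

# Crude text histogram: one '#' per tweet in each hour that has any activity.
for hour in sorted(hourly):
    print(hour.strftime("%Y-%m-%d %H:00"), "#" * hourly[hour])
```

With real data the same counts can be plotted as a time series; hours without tweets simply do not appear in the counter, so they would need to be filled with zeros before plotting.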
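The construction of the mention network described above can be sketched as follows. This is a minimal Python illustration, not the R code behind the figure: the users and tweets are invented, and the helper name `mention_edges` is hypothetical. It builds weighted directed links from @-mentions, then computes the in-degree (how many distinct users mention an account) and the share of reciprocated links.

```python
import re
from collections import Counter

# Hypothetical (author, text) pairs; the real data set has over 18000 tweets.
tweets = [
    ("alice", "Great talk by @bob on regime shifts #ESA100"),
    ("bob", "Thanks @alice! Slides are online #ESA100"),
    ("carol", "Agree with @bob and @alice #ESA100"),
    ("carol", "More from @bob at the poster session #ESA100"),
]

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Map each directed link (source, target) to the number of mentions (the link weight)."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges

edges = mention_edges(tweets)

# In-degree: number of distinct users who mention each account.
in_degree = Counter(target for (_, target) in edges)

# Reciprocity: share of directed links whose reverse link also exists.
reciprocated = sum(1 for (a, b) in edges if (b, a) in edges)
reciprocity = reciprocated / len(edges)

print(in_degree.most_common())
print(reciprocity)
```

Sorting `in_degree` and keeping the first entries gives a ranked list of the most mentioned accounts. Twitter handles are case-insensitive, so with real data it would be safer to lowercase them before counting.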
- PLOS 248
- JacquelynGill 230
- leafwarbler 217
- srsupp 193
- DrEmilySKlein 174
- ESA_org 173
- ethanwhite 164
- SPBombaci 162
- ucfagls 154
- katteken 137
- DJPMoore 136
- openscience 134
- matthewgburgess 126
- skmorgane 123
- jhpantel 123
- annamgroves 121
- commnatural 113
- noamross 112
- polesasunder 108
- LeahAWasser 107
- DrNitrogen 106
- sjGoring 105
- sesync 104
- treebiology 102
- algaebarnacle 102
- tpoi 101
- ElenaBennett 98
- NEONInc 90
- jonbkoch 90
- MethodsEcolEvol 88
- PLOSEcology 86
- colindonihue 86
- tewksjj 82
- INNGEcologist 82
- jessicablois 81
- ESAOpenSci 81
- JoshGalperin 79
- elitabaldridge 78
- cjlortie 78
- GrunerDaniel 75
- MorphoFun 74
- JCSvenning 73
- bjenquist 72
- PLNReynolds 70
- fluby 69
- nceas 67
- wildwonderweb 66
- esanathist 66
- RallidaeRule 64
- davidjayharris 64
What were they talking about?
Since it is ESA’s 100th anniversary, here are the 100 most retweeted tweets:
[1] “Theoreticians: stop telling us not to be scared of your equations. I’m not. Explain them well, like I do my methods, then continue #ESA100”
[2] “Watch President Obama wish the Ecological Society of America a happy 1OOth birthday on @Vimeo #ESA100 https://t.co/nhyaYmkt7C”
[3] “#esa100 is a good time to announce that @uofa is looking for 5 new hires in Ecosystem Genomics | global to microbes http://t.co/OZAyoDyO86”
[4] “First speaker to #ESA100 recognises ESA’s contribution to the environment: President Obama. Am impressed! http://t.co/oDSMNXYPg4”
[5] “Know of anyone looking for a PhD in ecology? Fully funded (!) at Wisconsin to work on bats and insects http://t.co/q3rGh9roZr #esa100”
[6] “#ESA100 friends, please read and RT my article on how to live-tweet scientific conferences! http://t.co/fMhDWivy9c #SciComm”
[7] “Exciting news from @ESA_org Council Meeting: all ESA members will get free online access to ESA journals. #ESA100”
[8] “One of the nicest things you can do at meetings is to acknowledge the students trying to catch your eye and introduce themselves. #ESA100”
[9] “Tenure track job in ecological modelling with @JaneElith & the rest of us at @qaecology https://t.co/44jCNxRBiZ #ESA100”
[10] “weird that tweeting talks at #ESA100 with permission only. If you don’t want people discussing your work you should not present it.”
[11] “Scicomm resource guide to eco-communication #ESA100 http://t.co/h6nEbjaq9S http://t.co/Xv5qe0HxIS”
[12] “What were we Tweeting about at #ESA100? (H/T again @fmic_ for Twitter stats code http://t.co/SlyQHL0yDE) http://t.co/eWmUxQJDhP”
[13] “Of course Terry Pratchett already wrote everything I think about science and sci fi, and better than I could #ESA100 http://t.co/fMmM04NpHM”
[14] “A few thoughts on #SciComm at #ESA100: Sharing science, stories & art; and @ESA_org’s social media confusion: http://t.co/7crijzElJ2”
[15] “This is what students see: fewer women speaking. Imagine gender equality for ESA 2016. #ESA100 @ESA_org #WomenInSTEM http://t.co/irs3QmStKD”
[16] “Slides from my #ESA100 talk on comparing different approaches to forecasting diversity. http://t.co/LoHIxgbidc w/links to code + grant”
[17] “Our #ESA100 centennial paper out in Ecospehre: Climate change & microbial-plant interactions @ESA_org http://t.co/fTLU0xtTOL”
[18] “Top tweeps at the #ESA100 meeting. (H/T @fmic_ for Twitter stats code http://t.co/SlyQHL0yDE) Good work, team! http://t.co/zOpkQO4PaS”
[19] “#ESA100 \nThe world is big. Scientists are relatively small. Collaborate.”
[20] “Overheard conversation by a bronycon goer: I think these are ecology people, there are a lot of Hawaiian shirts. #ESA100”
[21] “Test your talk in a simulator like ColorOracle first! http://t.co/PNhsJQsApv #esa100 https://t.co/aVlYUz1TED”
[22] “#ESA100 1.2M publications in ecology (or more). A total of 40% captured by 4 terms: interactions, biodiversity, climate change, & gradients.”
[23] “Happy #ESA100 & #BronyCon! Hasbro, DM me if you want to discuss marketing. #mylittlesturgeon #mylittlestudyspecies http://t.co/GhY4KD56yb”
[24] “Too many talks that I can’t understand because the figures are not colourblind-friendly #ESA100”
[25] “Secrets to successful scientific networks: trust, time and early-career scientists. @e_seabloom @e_borer #ESA100”
[26] “Let’s make ecology in the field safer for all: come to @Drew_Lab and my free workshop on Tues: http://t.co/icSjpaQE06 #ESA100 All welcome!”
[27] “Our Postdoc Fellowship Program is now accepting applications! Pre-screening submissions due October 26: http://t.co/8EkdBzxZjX Attn: #ESA100”
[28] “Yes, that’s @POTUS! RT @LPZ_UWI: Obama celebrates the #ESA100 centennial with us! http://t.co/M1MMpbKolD”
[29] “Hello #ESA100 The @calacademy is hiring new biodiversity scientists! Lots of ’em! Do science, change the world! http://t.co/F9DvM3e1Hp”
[30] “At our blog, you can submit your own “seed” of a Good Anthropocene: http://t.co/rIBLhUGGPF #ESA100”
[31] “A surprise birthday message to @ESA_org from @POTUS \”The health of our nation depends on the health of our environment\” #thanksobama #ESA100″
[32] “Speakers: promote #openscience ! \nDon’t forget to tell your audience if you are OK live tweeting! #ESA100”
[33] “#VirginiaTech is hiring a stream ecologist! Come talk to me at #ESA100 if you have questions: https://t.co/aNOXL7l2yN”
[34] “We’re seeking time-series data for a #biodiversity study. Do you have data to share? http://t.co/6PfCHWZMNI @maadornelas @mioconnor #ESA100”
[35] “Research News at #ESA100 “Increase in red spruce growth tied to the Clean Air Act” @atkinsjeff http://t.co/yDm3w8vxUa http://t.co/rb9HO0ih9u”
[36] “One thing clear at #esa100: The Anthropocene as an idea has won.”
[37] “Dr Erwin: Change is the observable dynamic of the fossil record – there is not empirical evidence for equilibrium. \n\nYes!! #ESA100 #esapl2”
[38] “Loss: Cat predation: 2.4 billion birds killed by cats in the US every year 70% from feral cats #ESA100″
[39] “We might just need better (realistic, detailed, radical) visions of positive future. #GoodAnthropocenes #ESA100”
[40] “Sketching your notes at #ESA100 ideas for creative expression from #ESASciComm @commnatural http://t.co/22kL1reSc7 http://t.co/kPYoVV74wu”
[41] “When Science is Not Enough: Communicating the Scientific Consensus on #ClimateChange @samillingworth #scicomm #ESA100 http://t.co/eLDAzp6sr2”
[42] “Hi #ESA100, please favorite this if you are interested in finding a way to convince the society to give a budget line to support @ESA_SEEDS.”
[43] “\”we use statistics to hide the instability of our arguments\” http://t.co/R7BC2f1UMr #ESA100 #ecology #biology”
[44] “Not sure I understand the no tweet policy at #ESA100. I mean why would you want it? You are already sharing your research w professionals”
[45] “Scientists have a hard time talking about race. We also have a hard time listening. These are uncomfortable but vital conversations #ESA100”
[46] “#ESA100 program change: new COS at 1050AM Fire alarm impacts on ecologist community dynamics http://t.co/hPW60xD55K”
[47] “Beautiful data, carefully curated and presented, made available to the world in multiple formats. Surely this is the future. #ESA100”
[48] “#ESA100 slide makers: allow me to recommend this color scale for your graphs in the future: http://t.co/FoTnVldbGL”
[49] “Strong argument for allowing tweeting of conference talks & posters. #ESA100 #gsa2015 https://t.co/9bxHXga6dJ”
[50] “You never know someone’s personal pronouns unless you ask. Some folks at #ESA100 write them on their badges. It’s always worth checking.”
[51] “Access now! Functional Ecology Special Feature: Urban Ecology: http://t.co/IHEVetopyL #ESA100 #UrbanEcology”
[52] “Could #ESA100 moderators ask speakers if tweeting talk is okay?I bet most are okay with it but don’t know assent is required. @ESA_org”
[53] “Diverse group of people better solve problems. Benefit of diversity to science goes up as problems get harder #esa100 @ESA_SEEDS”
[54] “What happens when the fire alarm goes off during talks at #ESA100 http://t.co/G1zjTLJEAA”
[55] “My take from #ESA100 so far: Ecology is actually a loose collection of disciplinary silos that barely communicate.”
[56] “Ecologists with mad data skills will catapult ecology into its next 100 years! #ESA100 #hackingecology”
[57] “Coding is becoming crucial for #Ecology @MethodsEcolEvol Applications explain new software, equipment & tools #ESA100 http://t.co/FWgSo232hX”
[58] “#ESA100: I’m adding to dataset on who asks ?’s after talks. Want to help? Just note gender of speaker & ppl asking them ?’s in your program.”
[59] “.@KathiJoJo \”China alone is firing up a new coal plant every eight to 10 days\” #ESA100 https://t.co/ywkc5WSi4r”
[60] “Anyone can tweet about my #ESA100 poster if they want: it’s up on @figshare and @github too. \nhttps://t.co/fwHlmI9o5y”
[61] “Slides from my #ESA100 talk on @nceas and @DataONEorg provenance tools in #rstats for reproducibility and #opendata https://t.co/vlsHXsT4YF”
[62] “New blog post: Thoughts about #SciComm, #openscience, sharing, & social media confusion at #ESA100. http://t.co/7crijzElJ2”
[63] “Check out the highlights of my talk on Nitrogen fixation in tropical dry forests #ESA100 featured @PLOSEcology! \nhttps://t.co/i7kU68NcD9”
[64] “From the audience: calculus is the *wrong* math. We’d be better off teaching stats & probability (& computing) #ESA100 #HackingEcology”
[65] “So far very few talks have given permission to tweet. I wonder if bc they actively don’t want them shares or it’s not on their radar #ESA100”
[66] “Conducting ecosystems research? Check out our methods, models, tools, & databases: http://t.co/2ihCI5Db1R\n#ESA100”
[67] “#ESA100 save a postdoc’s self esteem, live tweet a talk.”
[68] “@BarackObama helps celebrate the @ESA_org centennial! #ESA100 #POTUS http://t.co/gz0W91hujg”
[69] “Beginning wk of special #ecology #climatechange coverage for #ESA100; get the rundown at http://t.co/FNpye6kbhm http://t.co/ekg6oi4HNW”
[70] “At #ESA100 @jagephart applies a climate change vulnerability framework to #foodsecurity in @PLOSEcology by @atkinsj http://t.co/rT5Di6yliT”
[71] “The Gund Institute in #Vermont seeks 5 PhD students. Do great work in beautiful #BTV: http://t.co/KvXdx7jVoQ #ESA100 http://t.co/vU14Qy9MT5”
[72] “Science is worthless unless it’s shared with others, yet academics incentivized to focus only on peer rev journals @JulieReynolds88 #ESA100”
[73] “Fascinating ignite talk Rachel Vannette (http://t.co/lWuzTXhE1c): microbial effects on plant-pollinator interactions #ESA100”
[74] “Another reason for #ESA100 talks to be open to live tweeting: we have a global audience unable to attend conference! https://t.co/8kVQXsC3fR”
[75] “Best #ESA100 fundraising #frisbee #secchidisk for @ESA_SEEDS by @duffy_ma @Drew_Lab @ESAAquatic @limnojess! http://t.co/wttjyh9Lm6”
[76] “All materials, slides, sources, code on @github & under CC-BY #openscience #ESA100 #rstats https://t.co/ocffOZsKL5 https://t.co/eJSMR0VmdQ”
[77] “To tweet or not to tweet at conferences? Confusion at #esa100 http://t.co/9PA66JBVTO\n\n@Drew_Lab @ewanbirney @_Jni_ @ta_wheeler @ESA_org”
[78] “Lenore Fahrig: \”All habitat has value, no matter how small\”. Major review shows habitat loss NOT fragmentation hurts biodiversity #ESA100″
[79] “Powerful to hear Susan Harrison tell us her 15-yr field site was consumed by wildfires just 30 minutes ago. Here’s to new directions #ESA100”
[80] “#BrightSpots, seeds of a #GoodAnthropocene: Pockets of a better future \nthat are already in existence today #ESA100 http://t.co/rIBLhUGGPF”
[81] “#ESA100,Pres David Inouye,Scientific Plenary during #POTUS greeting video,Whooa, ESA and US Pres, doesn’t get better! http://t.co/i1ThOuwkyS”
[82] “Can you guys at #ESA100 help me spread the good word on #sciart? https://t.co/opvO8isK70 Thanks! http://t.co/gG5S8gO2Q7”
[83] “How to educate all when we don’t value outreach &esp social justice work? When we pretend the meritocracy works? @RushHolt @ESA_org #ESA100”
[84] “Lovejoy 2 degree Warming target chosen not for its ecological merit; means a world w/out tropical reefs e.g. #esa100 http://t.co/nQsHj8jbK9”
[85] “Climate change shapes drought/flood frequency & severity @allingon on @PLOSONE @PLOSBiology papers & #ESA100 sessions http://t.co/B3EqPAvWx0”
[86] “New @PLOSEcology \”All Eyes on the Oceans: James Hansen & Sea Level Rise http://t.co/vIRFonX1Rb @sashajwright #ESA100 http://t.co/ZWmbW5OsSo”
[87] “Another fun animation of an Am Nat Classic foundations of ecology in rhyme no less! http://t.co/H6dcb2WbXb #esa100″
[88] “.@ESA_org Another good way to keep important secrets is to *not* include them in presentations to groups of strangers #ESA100”
[89] “Best live-tweet advice so far: When tweeting 2+ times per talk, reply to your 1st to create a chain of tweets. Thx @PlantTeaching! #ESA100”
[90] “The BronyCon people have red, yellow, & green tags on their name tags. These are how willing they are to talk. Chat w/ green only #ESA100″
[91] “Ecology from treetop to bedrock: human influence in earth’s critical zone #ESA100 – Ecotone (blog) http://t.co/WFSnddb6un”
[92] “@colindonihue #ESA100 Most favorited users (among users who tweeted 5+ times, excludes retweets). http://t.co/otms4ndCix”
[93] “Ecology in a Changing World: the #ESA100 centennial video http://t.co/ByaaYIhXlB”
[94] “Conservation fuels ecological discovery, not just vice versa says Bill Fagan #ESA100”
[95] “Slides from my #ESA100 ignite talk on \”Hacking ecology: Facilitating data-intensive research in ecology\” http://t.co/TD5BZYy3f0″
[96] “Fot those interested in R: a new R package called cati #ESA100 http://t.co/gVMwoKtReG”
[97] “@Drew_Lab \”We didn’t have a tardis, but we had a museum collection!\” Going back in time to look at fish diversity in Bootless Bay. #ESA100″
[98] “Time matters. Learn about temporal ecology and ecosystems at #ESA100 Thursday morning | http://t.co/dLmN9ZWESR http://t.co/e7c8BiVgsi”
[99] “.@polesasunder created an #rstats package to analyze community time series data: codyn #HackingEcology #ESA100”
[100] “.@ethanwhite on the cultural changes needed to get more scientists creating software tools:\nTrain\nHire\nCollaborate\nReward\n#esa100”
The search produces slightly different results when one looks at tweets that have been previously retweeted. It includes tweets that are not listed above, for example:
“RT @flypod2: Know of anyone looking for a PhD in ecology? Fully funded (!) at Wisconsin to work on bats and insects http://t.co/q3rGh9roZr …”
“RT @PLNReynolds: 50 notable papers in #Ecology, all currently #OpenAccess! #ESA100 #ReadingList http://t.co/Y1WYsYdVKA http://t.co/O6KDr3v…”
As you can see, there were lots of job offers, President Obama was mentioned quite a bit, and gladly I was not the only one doing Twitter analytics 🙂 The first tweet was retweeted 92 times and the last one only 11. One of the topics that got retweeted a lot was the live-tweeting policy; see tweets 6 and 10 as examples, with 53 and 44 retweets respectively. Do you think people were generally happy with the conference despite this tweeting policy?
Reading the >18000 tweets to figure it out is not a pleasurable read, even if, like me, you couldn’t attend the conference. To answer such a question one can use sentiment analysis, a text mining technique that ranks pieces of text (tweets in this case) according to the presence of words that have previously been labelled as common when expressing positive or negative emotions. The labelled lexicon (~6800 words) was developed by Minqing Hu and Bing Liu, two computer scientists affiliated with the University of Illinois and Microsoft. You can download their lexicon and learn more about their work here.
The figure below shows the results of the sentiment analysis for the #ESA100 dataset. If a tweet gets a score of zero, its emotional content is neutral; if the score is positive, the tweet is dominated by positive words; and if the score is negative, by negative ones. The plot shows that the distribution of tweet emotions, as scored with the Hu & Liu lexicon, is skewed towards the positive side. The top 10 positive tweets are:
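To make the idea concrete, here is a minimal sketch of lexicon-based scoring in Python (the analysis in this post was done in R). It assumes the simplest possible rule, the number of positive words minus the number of negative words; the two tiny word sets below are stand-ins invented for illustration, not the actual Hu & Liu lists, and `sentiment_score` is a hypothetical helper name.

```python
import re

# Tiny stand-in lexicons for illustration; the real lexicon has about 6800 words.
POSITIVE = {"great", "good", "excited", "pleasure", "better"}
NEGATIVE = {"problem", "wrong", "alarm", "sorry", "lose"}

def sentiment_score(text):
    """Score a tweet as (# positive words) - (# negative words).

    Assumed scoring rule: 0 means neutral, > 0 mostly positive, < 0 mostly negative.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits - negative_hits

examples = [
    "What a pleasure, #ESA100 keeps getting better and better!",
    "Sorry about the fire alarm folks",
    "Poster session starts at 5pm",
]
for tweet in examples:
    print(sentiment_score(tweet), tweet)
```

A rule this simple misreads negation and sarcasm, and it scores a neutral announcement that happens to contain "alarm" or "problem" as negative, which is worth keeping in mind when reading the ranked lists.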
- Come and see us at Booth 328. Play our game to win an exciting prize and enter our prize draw for $100 worth of books! #ESA100
- Wow! What an absolute pleasure to meet @kwren88! #ESA100 keeps getting better and better!
- Super excited to see that #sketchyourscience happened again at #ESA100! Good work #ESASciComm people! (#WishIWasThere)
- Du is making it easy for us by being super clear about whether results matched his predictions. Good thing b/c it’s late on Day 4 of #ESA100
- RT @srsupp: Scanga: You need to find a strong support network. Family friendly work, backup at home (and money can help). #earlycareer #esa…
- RT @JoshGalperin: .@uedlab – Ecologists look at the way #citites work, but they can work with designers to make cities work better. #ESA100…
- Jackson: interdisciplinary work is tough! Takes time and the right attitude/aptitude – they do work but still a major challenge #ESA100
- SeJin Song’s #ESA100 ignite talk was gorgeous, w/ vivid clear visuals. #ESASciComm would love to talk w/ her re design decisions!
- Big thank you to @leafwarbler for the great #ESA100 live tweeting — SO great for those who can’t be there! (like me 😦 …)
- .@MCFitzpatrick: Realized niches overlap less and less as you go back in time. How well does this work and can it work better? #ESA100
And the top 10 negative tweets, plus one more from further down the ranking (number 15), are:
- @k_a_christopher Buddhism: suffering stems from greed, hatred, and/or delusion. Ecological problems often have same origins. #ESA100
- Comparing areas with american seagrass vs invasive asiatic sand sedge… Invasive areas are NOT more susceptible to erosion #ESA100
- Brown: sustainable development is thermodynamically unsustainable. A catastrophic crash seems almost inevitable #ESA100 very provocative.
- Cause of all environmental problems? Greed, hatred, and/or delusion says @ElBeeddha #ESA100
- #ESA100 poster 188: Alyssa Gehman #OdumSchool-Influences on infection by an invasive castrating parasite, 8:30-10:30 am 8/14 Exhibit Hall
- Scientists have a hard time talking about race. We also have a hard time listening. These are uncomfortable but vital conversations #ESA100
- Jim Brown: risk of a catastrophic earth collapse is >99.99%. I don’t see any way out. Time for ecologists to step up. #ESA100
- I see a problem with this picture: http://t.co/zMGVI42K1n hint: it’s the same problem the #ESA100 plenary suffered from …
- We lose minority STEM students after second univ year at alarming rates. What are we doing wrong? Focus on intro courses #esa100 @ESA_SEEDS
- Sorry about the fire alarm folks. The conv center sprinkler system activated; cause unknown #ESA100
15. RT @LauraEllenDee: Agreed! “@BonnieKeeler: Bummed to miss the #ESA100 Shark Tank. Hoping there will be live tweeting” https://t.co/005rayit…
As you can see (dear ESA organiser), there were no hard feelings against the policy, although tweets on both sides of the distribution point to people sad to be missing the conference and glad to see so much tweeting activity. Ahem, just saying.
Another technique that I’ve used in my work to understand large amounts of unstructured data such as text is topic mining. Again, it is not practical to read all the tweets, but thankfully there are methods out there to simplify noisy data and extract more valuable meaning. In topic mining, one uses the frequency distribution of words across documents to fit the probability of a word belonging to a topic, and the probability of a topic explaining the contents of a document. A common technique for doing so is called Latent Dirichlet Allocation. First I cleaned up the dataset, creating a corpus without stop words, punctuation, the conference hashtags (#ESA100, #ESA2015), the Twitter names of the people mentioned, and links to other webpages. That leaves me with words that hopefully capture the topics of the Twitter conversation. To better capture the variability of words, I also got rid of overly popular words and of extremely rare words that don’t contribute much to differentiating one topic from another.
Although the machine learning algorithm does its job, I’m not completely happy with the result. The word clouds above summarise the most common words of the 30 topics characterising the conversation of the 2589 Twitter users. Each word is scaled according to how frequent it is in its topic. The problem with Twitter data and topic models is that one ends up with more documents than words in them. Once the dataset is clean, many tweets have few words or none at all, so the document-term matrix is too sparse. A way to solve the issue would be to change the unit of analysis, the document, from individual tweets to all the tweets written by a user, assuming that each person has a particular interest in the conference. If I had attended the conference, I’d probably have looked for talks related to regime shifts and methods to study them. Inherently, each attendee has intentions and interests that are captured in their tweeting behaviour. But that’s probably for the next blog post. If you want to play with the topic model data yourself, you can check this interactive visualisation.
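The cleaning steps listed above can be sketched like this. It is a minimal Python illustration rather than the R code used for the post: the stop word list is a tiny stand-in, the sample tweets are invented, and the thresholds `min_docs` and `max_share` are assumptions chosen only to make the example work.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real one has a few hundred entries.
STOP_WORDS = {"the", "a", "of", "on", "in", "and", "to", "is", "at", "for"}
CONFERENCE_TAGS = {"#esa100", "#esa2015"}

def clean(text):
    """Lowercase a tweet and drop links, mentions, conference tags, punctuation and stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links to other webpages
    text = re.sub(r"@\w+", " ", text)          # Twitter names of people mentioned
    for tag in CONFERENCE_TAGS:
        text = text.replace(tag, " ")          # conference hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation, digits and '#'
    return [word for word in text.split() if word not in STOP_WORDS]

def filter_vocabulary(docs, min_docs=2, max_share=0.6):
    """Drop terms found in fewer than min_docs documents or in more than max_share of them."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    keep = {
        term
        for term, freq in doc_freq.items()
        if freq >= min_docs and freq / n_docs <= max_share
    }
    return [[word for word in doc if word in keep] for doc in docs]

# Hypothetical tweets, invented for illustration.
tweets = [
    "Talk on climate change and drought at #ESA100 http://t.co/abc",
    "@someone climate models for drought #ESA100",
    "Climate data for the win #ESA2015",
    "Lunch! #ESA100",
]
docs = [clean(tweet) for tweet in tweets]
filtered = filter_vocabulary(docs)
print(docs)
print(filtered)
```

In this toy corpus "climate" is removed as overly popular and the words that occur only once are removed as too rare, which leaves two of the four documents empty. The filtered token lists are what would then be turned into a document-term matrix and passed to a topic model.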
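Changing the unit of analysis from tweets to users amounts to a simple grouping step. Here is a minimal Python sketch under that assumption; the user names and tokens are made up, and `documents_by_user` is a hypothetical helper, not something from the R packages listed below.

```python
from collections import defaultdict

# Hypothetical cleaned tweets as (user, tokens) pairs; some are empty after cleaning.
cleaned_tweets = [
    ("alice", ["regime", "shifts"]),
    ("bob", ["drought"]),
    ("alice", ["methods", "regime"]),
    ("bob", []),
]

def documents_by_user(tweets):
    """Merge all tokens written by the same user into a single document."""
    docs = defaultdict(list)
    for user, tokens in tweets:
        docs[user].extend(tokens)
    return dict(docs)

user_docs = documents_by_user(cleaned_tweets)
print(user_docs)
```

Each user now contributes one longer document, so the document-term matrix has fewer rows and fewer empty ones. The trade-off is that a user who tweets about several subjects gets them blended into a single document.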
Credits:
All this work was done in R, following blogs by others and also scientific papers. If you are interested in this type of analysis, just drop me a line and I can point you towards some sources. The libraries I used are:
Jeff Gentry (2015). twitteR: R Based Twitter Client. R package version 1.1.9. http://CRAN.R-project.org/package=twitteR
Jeff Gentry and Duncan Temple Lang (2015). ROAuth: R Interface For OAuth. R package version 0.9.6. http://CRAN.R-project.org/package=ROAuth
Ingo Feinerer and Kurt Hornik (2015). tm: Text Mining Package. R package version 0.6-2. http://CRAN.R-project.org/package=tm
Bettina Gruen and Kurt Hornik (2011). topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software, 40(13), 1-30. http://www.jstatsoft.org/v40/i13/
Jonathan Chang (2012). lda: Collapsed Gibbs sampling methods for topic models. R package version 1.3.2. http://CRAN.R-project.org/package=lda
Ian Fellows (2014). wordcloud: Word Clouds. R package version 2.5. http://CRAN.R-project.org/package=wordcloud
Carter T. Butts (2008). network: a Package for Managing Relational Data in R. Journal of Statistical Software, 24(2). http://www.jstatsoft.org/v24/i02/paper
Carter T. Butts (2014). sna: Tools for Social Network Analysis. R package version 2.3-2. http://CRAN.R-project.org/package=sna
Carson Sievert and Kenny Shirley (2015). LDAvis: Interactive Visualization of Topic Models. R package version 0.2. http://CRAN.R-project.org/package=LDAvis
Ramnath Vaidyanathan, Karthik Ram and Scott Chamberlain (no date). gistr: Work with ‘GitHub’ ‘Gists’. R package version 0.3.1.9100. https://github.com/ropensci/gistr
Without them it wouldn’t be as fun to play with Twitter data in R. Thanks guys!