Counting words in Arab Spring tweets – People were really excited about Egypt
April 26, 2012
I’ve started working more with the Arab Spring dataset I mentioned earlier, using the Hadoop cluster that my working group in the School of Journalism and Mass Communication is putting together. I started with what seems to be the “Hello World” of Hadoop: WordCount.
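If you haven't seen WordCount before, the idea is simple. Split the text into tokens (map), bring identical tokens together (shuffle/sort), and count each group (reduce). The function below is a rough single-machine sketch of that computation as a Unix pipeline. It is not the Hadoop job I ran. The name `wordcount` is made up for illustration. It also assumes that tokens are split on whitespace only, which is what the example WordCount shipped with Hadoop does.

```shell
# wordcount: hypothetical helper, a single-machine sketch of the WordCount idea.
# Reads text on stdin and prints "token<TAB>count", one token per line.
# Assumption: tokens are split on whitespace only, as in Hadoop's example
# WordCount, so punctuation stays attached ("Egypt" and "Egypt!" differ).
wordcount() {
  # map:     tr turns every run of whitespace into one newline (one token per line)
  # filter:  grep drops the empty token left by leading whitespace
  # shuffle: sort brings identical tokens together (C locale = plain byte order)
  # reduce:  uniq -c counts each run; awk rewrites it as "token<TAB>count"
  tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

Usage would look like `wordcount < tweets.txt > counts.txt`, where `tweets.txt` is a hypothetical file with one tweet per line. Splitting only on whitespace is the reason every extra exclamation point turns into a separate "word" in the counts.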
Anyhow, looking for mentions of “Egypt” proved to be really funny. In the output below, the first item is the word exactly as it appeared in the tweets, and the second is the number of times it appeared.
```
ahanna@hadoop1 [~/project/jan25] grep -P "^(e|E)gypt" jan25_tweets-wordcount.txt | head -31
Egypt 2068994
Egypts 9
Egypt! 56275
Egypt!! 6254
Egypt!!! 7506
Egypt!!!! 1034
Egypt!!!!! 392
Egypt!!!!!! 211
Egypt!!!!!!! 64
Egypt!!!!!!!! 43
Egypt!!!!!!!!! 27
Egypt!!!!!!!!!! 22
Egypt!!!!!!!!!!! 15
Egypt!!!!!!!!!!!! 18
Egypt!!!!!!!!!!!!! 10
Egypt!!!!!!!!!!!!!! 11
Egypt!!!!!!!!!!!!!!! 9
Egypt!!!!!!!!!!!!!!!! 8
Egypt!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!! 6
Egypt!!!!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!! 5
Egypt!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
```
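Every run of exclamation points gets its own row, so the counts are easier to read once the variants are folded together. Here is a rough sketch. It assumes the wordcount file has one `token count` pair per line, separated by whitespace, as shown. `egypt_excitement` is a hypothetical helper name. Unlike the grep above, it reads the whole file instead of stopping at 31 matches, and it leaves out `Egypts`.

```shell
# egypt_excitement: hypothetical helper that folds "Egypt", "Egypt!", "Egypt!!", ...
# from WordCount output (token, whitespace, count on each line) into two totals.
egypt_excitement() {
  awk '
    # plain mentions, either capitalization
    $1 ~ /^[eE]gypt$/   { plain += $2 }
    # one or more trailing exclamation marks, any number of them
    $1 ~ /^[eE]gypt!+$/ { excited += $2 }
    # %d prints 0 for a total that never received a matching line
    END { printf "plain\t%d\nexcited\t%d\n", plain, excited }
  '
}
```

For example, `egypt_excitement < jan25_tweets-wordcount.txt` would report the plain and exclamation-marked totals side by side.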
More on this as my analysis gets more sophisticated.