Counting words in Arab Spring tweets – People were really excited about Egypt

I’ve started working more with the Arab Spring dataset that I mentioned earlier, using the Hadoop cluster that my working group in the School of Journalism and Mass Communication is putting together. I started with what seems to be the “Hello World” of Hadoop: WordCount.
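For anyone who hasn't seen WordCount, here is a rough single-machine sketch of the idea in Python. It assumes the job splits each line on whitespace and nothing else, which is how the stock Hadoop example behaves as far as I know. The function name and the sample lines are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def word_count(lines):
    """Count whitespace-separated tokens; punctuation stays attached to the word."""
    counts = Counter()
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace.
        counts.update(line.split())
    return counts


# Invented sample lines, only to show the behavior.
sample = [
    "Egypt is free",
    "Congratulations Egypt!",
    "Egypt! Egypt!!!",
]

counts = word_count(sample)
for token, n in sorted(counts.items()):
    print(token, n)
```

Because nothing strips punctuation, "Egypt" and "Egypt!" end up as separate keys with separate counts.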

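Looking for mentions of “Egypt” turned out to be really funny. In the output below, the first item on each line is the string I found, and the second is the number of times it appeared. The counts appear to treat punctuation as part of the word, so each run of exclamation marks gets its own entry.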

ahanna@hadoop1 [~/project/jan25] grep -P "^(e|E)gypt" jan25_tweets-wordcount.txt | head -31
Egypt 2068994
Egypts 9
Egypt! 56275
Egypt!! 6254
Egypt!!! 7506
Egypt!!!! 1034
Egypt!!!!! 392
Egypt!!!!!! 211
Egypt!!!!!!! 64
Egypt!!!!!!!! 43
Egypt!!!!!!!!! 27
Egypt!!!!!!!!!! 22
Egypt!!!!!!!!!!! 15
Egypt!!!!!!!!!!!! 18
Egypt!!!!!!!!!!!!! 10
Egypt!!!!!!!!!!!!!! 11
Egypt!!!!!!!!!!!!!!! 9
Egypt!!!!!!!!!!!!!!!! 8
Egypt!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!! 6
Egypt!!!!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!! 5
Egypt!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
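If you wanted a single total per word, one option is to fold the exclamation-mark variants together after the fact. This is a minimal Python sketch, not something I ran on the cluster. It assumes each line of the word count file is a token followed by whitespace and a count, as in the listing above, and `fold_variants` is a hypothetical helper name.

```python
from collections import Counter


def fold_variants(lines):
    """Sum the counts of tokens that differ only in trailing '!' characters."""
    totals = Counter()
    for line in lines:
        # Split off the count from the right so the token itself is left alone.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        token, count = parts
        totals[token.rstrip("!")] += int(count)
    return totals


# The first few rows of the listing above.
rows = [
    "Egypt 2068994",
    "Egypt! 56275",
    "Egypt!! 6254",
    "Egypt!!! 7506",
]

totals = fold_variants(rows)
print(totals["Egypt"])
```

The same normalization could instead happen before counting, at the cost of losing the enthusiasm data entirely.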

Anyhow, more on this as my analysis gets more sophisticated.
