Counting words in Arab Spring tweets – People were really excited about Egypt

I’ve started working more with the Arab Spring dataset I mentioned earlier, using the Hadoop cluster that my working group in the School of Journalism and Mass Communication is putting together. I started with what seems to be the “Hello World” of Hadoop: WordCount.
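
For readers who haven’t seen WordCount, here is a minimal single-process sketch of its map, shuffle, and reduce steps. This is not the Hadoop job I ran. It is an in-memory imitation, assuming whitespace tokenization like the stock Hadoop example (which, as far as I know, splits lines with Java’s StringTokenizer). That assumption is also why “Egypt”, “Egypt!” and “egypt” end up as separate keys. The function names and sample tweets are hypothetical.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (token, 1) for every whitespace-separated token.
    # Punctuation and case are kept as-is (assumed to match the stock example),
    # so "Egypt", "Egypt!" and "egypt" become three different keys.
    for line in lines:
        for token in line.split():
            yield token, 1

def shuffle_phase(pairs):
    # Stand-in for Hadoop's shuffle/sort step: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the 1s for each key, returning keys in sorted order.
    return {key: sum(values) for key, values in sorted(groups.items())}

def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))

# Hypothetical tweets, for illustration only.
tweets = ["Go Egypt!", "Egypt Egypt!!", "Egypt! #jan25"]
for word, count in word_count(tweets).items():
    print(word, count)
```

Run as-is, this prints `#jan25 1`, `Egypt 1`, `Egypt! 2`, `Egypt!! 1` and `Go 1`, one per line. On the cluster, the same mapper and reducer logic is distributed across nodes, and Hadoop handles the shuffle.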

Anyhow, looking for mentions of “Egypt” turned out to be really funny. In the output below, the first column is the token WordCount found and the second is the number of times it appeared.

ahanna@hadoop1 [~/project/jan25] grep -P "^(e|E)gypt" jan25_tweets-wordcount.txt | head -31
Egypt 2068994
Egypts 9
Egypt! 56275
Egypt!! 6254
Egypt!!! 7506
Egypt!!!! 1034
Egypt!!!!! 392
Egypt!!!!!! 211
Egypt!!!!!!! 64
Egypt!!!!!!!! 43
Egypt!!!!!!!!! 27
Egypt!!!!!!!!!! 22
Egypt!!!!!!!!!!! 15
Egypt!!!!!!!!!!!! 18
Egypt!!!!!!!!!!!!! 10
Egypt!!!!!!!!!!!!!! 11
Egypt!!!!!!!!!!!!!!! 9
Egypt!!!!!!!!!!!!!!!! 8
Egypt!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!! 6
Egypt!!!!!!!!!!!!!!!!!!!! 7
Egypt!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!! 5
Egypt!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1
Egypt!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1

Anyhow, more on this as my analysis gets more sophisticated.

Visualizing the Polarized Discourse of “Why Do They Hate Us?”

Yesterday, the Egyptian-American columnist Mona Eltahawy (@monaeltahawy) published a provocative piece in Foreign Policy entitled Why Do They Hate Us?, a tract on the repression of women’s rights in the Middle East, which she attributes, as the title implies, to a certain hatred of women. The article didn’t fail to provoke: my Twitter timeline was soon filled with responses from many Egyptian and Arab activists and writers, and there were longer responses too, including ones from Dima Khatib (@Dima_Khatib, a journalist for Al-Jazeera), Samia Errazzouki (@charquaouia), Mona Kareem (@monakareem), and Karim Malak. These responses are well worth a read and highlight the intricacies of talking about gender and feminism in the Middle East and North Africa.

Update (2012-04-25): Foreign Policy has posted several more responses to the original article. And there are many more in the works, from what I’ve seen.

Although the substance of the debate itself is highly engaging, I was particularly taken by the polarization and how quickly it emerged. And it wasn’t everyone against one: my Twitter timeline was clearly split between those siding with Eltahawy and those resolutely against her. There wasn’t an obvious rhyme or reason to the division. Some quipped that most American and Western readers lauded the article while Arabs were critical, but there were some important exceptions.

I wondered if network analysis could highlight some of these divisions, so I asked aloud whether Marc Smith (@marc_smith), creator of the network analysis tool NodeXL, could help me out. He quickly produced the network visualizations below, the first based on Eltahawy’s Twitter handle (monaeltahawy) and the second on the original link to the Foreign Policy piece.

This is the visualization around monaeltahawy from about 12:21 UTC to 15:33 UTC on April 24, 2012. Nodes are Twitter users; edges are follow relationships (NodeXL only collects the first couple thousand for each user) and user mentions.
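
NodeXL builds these edges itself. To make the “mention” half concrete, here is a hedged sketch of how directed mention edges could be pulled from tweet text. It is not NodeXL’s implementation, and it does not cover follow edges, which come from the Twitter API. It assumes screen names are 1 to 15 letters, digits, or underscores and are case-insensitive. It also skips self-mentions, which is a design choice rather than a rule. The `(author, text)` input format and the tweets are hypothetical.

```python
import re
from collections import Counter

# A Twitter screen name: 1-15 letters, digits, or underscores after "@".
# The lookbehind stops e-mail addresses like x@example.com from matching.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})\b")

def mention_edges(tweets):
    """Count directed author -> mentioned-user edges from (author, text) pairs.
    Names are lowercased because screen names are case-insensitive."""
    edges = Counter()
    for author, text in tweets:
        for name in MENTION.findall(text):
            if name.lower() != author.lower():  # ignore self-mentions
                edges[(author.lower(), name.lower())] += 1
    return edges

# Hypothetical tweets from hypothetical users.
tweets = [
    ("user_a", "Worth a read, @User_B"),
    ("user_b", "@user_a @user_c I disagree"),
    ("user_c", "@user_c talking to myself, email me at x@example.com"),
]
edges = mention_edges(tweets)
print(edges)
```

Keeping the edge counts (rather than a set) preserves how often one user mentions another, which is useful as an edge weight later.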

The first thing that emerges from this graph is the polarization between two clear groups, G1 and G2. G1 contains 435 nodes and 2402 unique edges. At the center of G1, as ranked by betweenness centrality, are Eltahawy and, curiously, @ShadiHamid, Director of Research at the Brookings Doha Center. Shadi commands a strong following and tweeted only a handful of times, but those tweets had strong reach (e.g. the tweet below). Fanned out around them are many users who do not seem to be connected to each other.
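
Betweenness centrality scores a user by how many shortest paths between other users pass through them. That is why a hub with many otherwise unconnected followers ranks highly. The following is a minimal sketch of Brandes’ algorithm for an unweighted graph. It assumes edges are treated as undirected, which is a simplification, and I don’t know exactly which variant NodeXL uses. The star-shaped example graph is hypothetical.

```python
from collections import defaultdict, deque

def undirected(edges):
    # Build a symmetric adjacency map, ignoring direction and self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def betweenness(adj):
    # Brandes' algorithm for an unweighted, undirected graph.
    # Returns raw (unnormalized) scores: for each node, the summed share of
    # shortest paths between other node pairs that pass through it.
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search from `source`, counting shortest paths.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected pair was counted from both ends, so halve.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical graph: one hub linked to three users who don't link to each other.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
scores = betweenness(undirected(edges))
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy star, the hub sits on the only path between each of the three leaf pairs, so it scores 3 while the leaves score 0. That is roughly the fanned-out shape around G1’s center.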

The second group, G2, is 250 nodes strong with 3047 edges. At its center are @Dima_Khatib (mentioned above for her critique); @Zeinobia, a long-time Egyptian blogger; and @HossamBahgat, Director of the Egyptian Initiative for Personal Rights, who criticized Eltahawy in a long series of tweets. Although the edge metrics I have make it difficult to tell how dense each group is internally, it looks as though G2 has a denser subgraph and many more small but prominent actors.
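
A rough density comparison is possible from the counts above: the number of edges divided by the number of possible edges. This sketch assumes that the reported edge counts fall within each group, that they are directed, and that they exclude self-loops. None of these assumptions is confirmed by the NodeXL output. Also, G1’s figure is “unique edges” while G2’s is just “edges”, so the two may not be strictly comparable. Treat the result as a back-of-the-envelope check, not a measurement.

```python
def density(nodes, edges, directed=True):
    # Fraction of possible node pairs that are actually linked (no self-loops).
    # Directed graphs have n(n-1) possible edges; undirected graphs have half that.
    possible = nodes * (nodes - 1)
    if not directed:
        possible //= 2
    return edges / possible if possible else 0.0

# Node and edge counts reported above for G1 and G2 (assumed within-group, directed).
g1 = density(435, 2402)
g2 = density(250, 3047)
print(f"G1: {g1:.4f}  G2: {g2:.4f}  ratio: {g2 / g1:.1f}")
```

This prints densities of about 0.0127 for G1 and 0.0489 for G2, so G2 comes out nearly four times as dense. That is consistent with the visual impression, if the assumptions hold. Switching to `directed=False` doubles both values but leaves the ratio unchanged.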

This next visualization focuses on the original link itself in its full form, collected from 18:02 UTC on April 23 to 20:21 UTC on April 24. The edges are the same kinds as before, except that edges involving Eltahawy are in red.

The most noticeable thing about this graph is its diffuseness: there are 169 separate groups. Three emerge as the largest, G1, G2, and G5, all of which are about the same size. Eltahawy is at the center of G1, and this cluster connects to G2 and G5. However, there’s also a surprising amount of linkage between G2 and G5. Looking at the most central actors in each: G1 contains, oddly enough, both @monaeltahawy and @Zeinobia, as well as @RuwaydaMustafah, a female writer who speaks on Kurdish rights. G2 contains @AllisonKilkenny, @ShelbyKnox, and @NatlNOW, respectively Allison Kilkenny, a writer on mass mobilizations like Occupy; Shelby Knox, who writes a feminist blog; and the account for the National March for the War on Women. And G5 contains @FP_Magazine, the account for Foreign Policy magazine, and @jricole, Juan Cole, a blogger and specialist on Middle East issues.
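
The post doesn’t say how NodeXL split this graph into 169 groups. NodeXL offers community-detection options such as Clauset-Newman-Moore clustering, which can break one connected piece into several groups. The simplest notion of “separate groups” is connected components, and the following union-find sketch counts those. It is offered only as an illustration of that simpler idea, on hypothetical nodes and edges, and it is not a reconstruction of the grouping used here.

```python
def connected_components(nodes, edges):
    # Union-find: every node starts as its own root.
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the trees of the two endpoints of every edge.
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # Collect nodes by root; return groups from largest to smallest.
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical graph: a 3-node chain, a 2-node pair, and one isolated user.
comps = connected_components(list("abcdef"), [("a", "b"), ("b", "c"), ("d", "e")])
print(len(comps), [len(g) for g in comps])
```

In a link-sharing graph, many of the smallest groups are likely to be isolated users or pairs who shared the link without interacting with anyone else in the collection window.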

I sort of expected the clear polarization in the first graph, but I was blown away by the second one. Its groups divide almost cleanly into those who are very involved in the Egyptian Twitterverse, American female journalists and feminist groups, and more academic/researcher types.

So what can this tell us about polarization on Twitter? For one, it can illuminate who lines up on each side and give clues about how they engage with each other. There’s no sentiment analysis involved here, so we can’t tell who is spitting vitriol at whom. But there are clear patterns of mentioning and replying that indicate, roughly, where conversations are happening. Furthermore, as the second graph indicates, there are some rather obvious divisions in who sees what commentary. From what I’ve read of their tweets, Zeinobia and Dima Khatib were largely critical, while Allison Kilkenny was not (note that I’m inferring this from my own timeline, so you are more than welcome to call bull on me if it’s not true). This lends some credence to the idea that these structural links largely bound the opinions we see on a topic, the so-called “echo chamber” theory of the web. But there’s also a non-trivial number of connections between groups, so the categorization is certainly not absolute. And there’s definitely more work to be done on methods that let us consider content as well as structure.
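
One way to put a number on “a non-trivial number of connections between groups” is to count edges within groups versus edges between them. This is summarized by Krackhardt and Stern’s E-I index: external minus internal edges, divided by all edges, ranging from -1 (all internal) to +1 (all external). The sketch below assumes each user already has a group label, for example from NodeXL. The users, labels, and edges are hypothetical, and `group_mixing` is my own illustrative helper.

```python
from collections import Counter

def group_mixing(edges, group_of):
    # Tally edges inside a group versus edges joining two different groups.
    internal = 0
    between = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu == gv:
            internal += 1
        else:
            # Unordered group pair, so (G2, G5) and (G5, G2) count together.
            between[tuple(sorted((gu, gv)))] += 1
    external = sum(between.values())
    total = internal + external
    # E-I index: -1 means all edges stay inside groups, +1 means all cross.
    ei_index = (external - internal) / total if total else 0.0
    return internal, between, ei_index

# Hypothetical users assigned to groups named like those in the second graph.
group_of = {"a": "G1", "b": "G1", "c": "G2", "d": "G2", "e": "G5"}
edges = [("a", "b"), ("c", "d"), ("a", "c"), ("d", "e"), ("c", "e")]
internal, between, ei = group_mixing(edges, group_of)
print(internal, dict(between), ei)
```

The per-pair `between` counts show which groups talk to each other, such as G2 and G5 in the second graph. A single E-I value hides that detail, so it is worth looking at both.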

