Helen Zille Twitter Analysis

Political figures often have an overwhelming influence on socio-political issues the world over; this is, after all, their ‘role in society’. Presidents and world leaders have a profound impact on issues ranging from health care to the transparency of, and confidence in, national banking systems. To a lesser extent, this also applies to the leaders of influential opposition parties. In South Africa, for example, socio-political issues are not shaped by the ruling party alone; the ‘sphere of influence’ includes key opposition parties and leaders. Whatever our political inclinations, we cannot deny the influence that key opposition figures such as Mmusi Maimane, Julius Sello Malema, Helen Zille and Mosiuoa Lekota have had, and continue to have, on socio-political issues and on the political landscape in its entirety.

On the other end of the spectrum, digital media platforms, particularly Twitter, have acted as a sounding board for new ideas, giving already influential figures a digital ‘loudspeaker’ that amplifies their voices and puts their views under further scrutiny from the general public. For the purposes of this analysis I decided to focus on an interesting and somewhat controversial figure, Helen Zille. The controversy surrounding some of her tweets and her very ‘un-politically correct’ stance, coupled with how active she is on Twitter, were reason enough for me to choose her. I sought to extract as many tweets as I could from Helen Zille’s timeline (Twitter limits this to the 3200 most recent tweets) and:

i.) analyse the tone of her tweets using IBM Watson’s Tone Analyzer, i.e. the emotions and tones used in Helen Zille’s tweets. The Tone Analyzer uses sentiment polarity, n-grams (combinations of adjacent words), linguistic analysis and other factors to classify text into categories of emotion (anger, disgust, fear, joy and sadness) and language style (analytical, confident and tentative). This tool is generally used to help businesses better understand customer and brand sentiment on social media and other digital outlets.

Additional reading: https://console.bluemix.net/docs/services/tone-analyzer/science.html#the-science-behind-the-service

ii.) a word cloud showing Helen Zille’s most frequently used words, to give the reader an idea of the context of her last 3200 tweets.

iii.) the frequency of her tweets, and the days and hours on which she has been most inclined to tweet, over her last 3200 tweets.

Carrying out this analysis required a mixture of Python, the Twitter API and the IBM Watson Tone Analyzer API (I connected to both APIs using Python). For anyone interested in reading the actual code, I have included links to the relevant code in my GitHub repository.
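As a rough illustration of the extraction step, the sketch below pages backwards through a timeline the way the v1.1 `statuses/user_timeline` endpoint expects: at most 200 tweets per request, passing `max_id` as one less than the oldest tweet ID seen so far, and stopping at the 3200-tweet cap. The `fetch_page` callable is a hypothetical wrapper around whatever client actually makes the request (it is not part of any library), so the paging logic can be shown without credentials.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 200      # v1.1 user_timeline returns at most 200 tweets per request
TIMELINE_CAP = 3200  # Twitter only serves the 3200 most recent tweets of a timeline


def fetch_timeline(
    fetch_page: Callable[[Optional[int]], List[dict]],
    cap: int = TIMELINE_CAP,
) -> List[dict]:
    """Collect up to `cap` tweets, newest first, by paging backwards with max_id.

    `fetch_page(max_id)` is a hypothetical wrapper that should return up to
    PAGE_SIZE tweets with IDs <= max_id (or the newest tweets when max_id is None).
    """
    tweets: List[dict] = []
    max_id: Optional[int] = None
    while len(tweets) < cap:
        page = fetch_page(max_id)
        if not page:  # an empty page means the timeline is exhausted
            break
        tweets.extend(page)
        # Subtract 1 so the oldest tweet already seen is not returned again.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets[:cap]
```

With a real client, `fetch_page` would call the timeline endpoint with `screen_name`, `count=200` and `max_id`.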

IBM Watson Tone Analyzer

Clean Up Process

Carrying out this analysis required some text cleaning to remove stray non-English characters. Since the aim was simply to get the tone and writing style of Helen Zille’s tweets, I did not see the need to lemmatize or stem this corpus. Instead, I focused on removing URLs, emojis and other non-English characters. It should be noted, however, that punctuation marks are factored into the IBM Watson Tone Analyzer’s analysis.
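A minimal sketch of this clean-up, assuming ‘non-English characters’ can be approximated as anything outside ASCII (which also drops accented letters, not just emojis). The function name `clean_tweet` is illustrative rather than taken from the author’s repository; punctuation is kept because the Tone Analyzer uses it.

```python
import re

# Matches http/https links, which carry no tone information.
URL_PATTERN = re.compile(r"https?://\S+")
# Collapses runs of whitespace left behind after removals.
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, emojis and other non-ASCII characters, keeping punctuation."""
    text = URL_PATTERN.sub("", text)
    # Assumption: treat anything outside ASCII as "non-English"; this is blunt
    # and also removes accented letters along with emojis.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    return WHITESPACE_PATTERN.sub(" ", text).strip()
```

For example, `clean_tweet("Great turnout today 🎉 https://t.co/abc123 Well done!")` returns `"Great turnout today Well done!"`.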

The number of API calls you are permitted to make was a limitation: I could only analyse around 1000 sentences before hitting my limit. However, this equated to about a third of the tweets extracted from Helen Zille’s timeline.
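One way to work within a cap like this is to select tweets until a sentence budget is used up. The sketch below assumes a budget of 1000 sentences (matching the figure above) and uses a naive regex sentence splitter; the budget constant and helper names are hypothetical, and the Tone Analyzer’s own sentence splitting may count differently.

```python
import re
from typing import List

# Hypothetical budget, based on the roughly 1000 sentences reported above.
SENTENCE_BUDGET = 1000
# Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def count_sentences(text: str) -> int:
    """Roughly count the sentences in a cleaned tweet."""
    return len([s for s in SENTENCE_END.split(text.strip()) if s])


def take_within_budget(tweets: List[str], budget: int = SENTENCE_BUDGET) -> List[str]:
    """Keep tweets, in order, until the next one would exceed the sentence budget."""
    selected: List[str] = []
    used = 0
    for tweet in tweets:
        n = count_sentences(tweet)
        if used + n > budget:
            break
        selected.append(tweet)
        used += n
    return selected
```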


The results of this analysis suggest an analytical writing style, with emotions leaning towards joy. It should be noted that the bulk of these ±3200 tweets (76%) are responses to other people’s comments.
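For readers who want to reproduce summary figures like these, the sketch below averages document-level tone scores across several Tone Analyzer v3 responses and computes the share of tweets that are replies. It assumes the v3 JSON shape (`document_tone.tones`, each entry with `tone_id`, `tone_name` and `score`) and the Tweet object’s `in_reply_to_status_id` field; the helper names are illustrative and no network calls are made. Because undetected tones are simply absent from a response, each average is taken over the responses in which that tone appears.

```python
from collections import defaultdict
from typing import Dict, List


def average_tone_scores(responses: List[dict]) -> Dict[str, float]:
    """Average each tone's score across Tone Analyzer v3 JSON responses."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for response in responses:
        for tone in response.get("document_tone", {}).get("tones", []):
            totals[tone["tone_id"]] += tone["score"]
            counts[tone["tone_id"]] += 1
    return {tone_id: totals[tone_id] / counts[tone_id] for tone_id in totals}


def reply_share(tweets: List[dict]) -> float:
    """Fraction of tweets that are replies (in_reply_to_status_id is set)."""
    if not tweets:
        return 0.0
    replies = sum(1 for t in tweets if t.get("in_reply_to_status_id") is not None)
    return replies / len(tweets)
```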

To get a bit more context on her tweets, I carried out a very simple word frequency analysis, looking at the words she uses most frequently to get an understanding of the types of conversations she is having on Twitter.

Word Cloud

Word clouds are a fast and effective way to show readers the main themes and trends emerging in a given corpus of text. In this instance, I am using the word cloud to visualise the top 400 words Helen Zille has used over her last 3200 tweets, to see what stands out in terms of trends and themes.

Analysing this text required stemming it (a faster option than lemmatization), removing stop words and removing non-English words. I defined non-English words as words that did not feature in an English dictionary I had identified (or its equivalent, i.e. a corpus of commonly used English words).
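The steps above could be sketched roughly as follows. The tiny stop-word list and English word list are hypothetical placeholders for the full lists the author used (which are not specified), and `plural_stem` is a deliberately simple plural-stripping stemmer standing in for a real one such as NLTK’s Porter stemmer. Words are checked against the dictionary before stemming, since stems are not always dictionary words. `Counter.most_common(400)` then gives the top 400 words that would feed the word cloud.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical placeholders: a real run would load a full stop-word list and
# a full English word list (the author's exact sources are not specified).
STOP_WORDS = {"the", "a", "an", "and", "for", "to", "of", "in", "is", "it", "on", "you", "your"}
ENGLISH_WORDS = {"party", "parties", "vote", "votes", "voter", "voters", "water", "city", "cities"}

TOKEN_PATTERN = re.compile(r"[a-z]+")


def plural_stem(word: str) -> str:
    """Very simple plural-stripping stemmer (a stand-in for a real stemmer)."""
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"  # parties -> party
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]        # votes -> vote
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]        # voters -> voter
    return word


def top_words(tweets: List[str], n: int = 400) -> List[Tuple[str, int]]:
    """Count stemmed English, non-stop-word tokens and return the n most common."""
    counts: Counter = Counter()
    for tweet in tweets:
        for token in TOKEN_PATTERN.findall(tweet.lower()):
            # Drop stop words and anything missing from the English word list,
            # checking the dictionary before stemming.
            if token in STOP_WORDS or token not in ENGLISH_WORDS:
                continue
            counts[plural_stem(token)] += 1
    return counts.most_common(n)
```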

In this analysis, no startling themes emerge. The only thing that really stands out is the presence of words related to flowers, plants and moths (e.g. polyandria, pterocera, astilbe, gynandria, smilax). This could simply indicate a plant-related conversation that involved numerous replies from Helen Zille. I would not pay too much attention to individual words in isolation unless similar words also appear in the word cloud.

Tweet frequency by day and hour

The purpose of this analysis was to understand on which days Helen Zille is most inclined to tweet. Although it is based on only 3200 tweets, this appears to be a large enough sample to draw reasonably solid conclusions about Helen Zille’s general tweet frequency by weekday.

The analysis itself involved parsing the creation date of each tweet, converting it to South African time (the Twitter API returns these dates in GMT/UTC) and computing the weekday on which each tweet was made.
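A minimal sketch of the conversion, assuming the v1.1 `created_at` format (e.g. `Wed Oct 10 20:19:24 +0000 2018`). South African Standard Time is UTC+2 all year round with no daylight saving, so a fixed offset is enough here; the helper names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import List

# South African Standard Time: UTC+2 all year, no daylight saving.
SAST = timezone(timedelta(hours=2), name="SAST")
# v1.1 Tweet created_at format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_sast(created_at: str) -> datetime:
    """Parse a UTC created_at string and convert it to South African time."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT).astimezone(SAST)


def tweets_per_weekday(created_ats: List[str]) -> Counter:
    """Count tweets by local (SAST) weekday name."""
    return Counter(to_sast(ts).strftime("%A") for ts in created_ats)
```

Note that a tweet posted late on a Sunday in UTC can fall on Monday in South African time, which is why the conversion happens before the weekday is taken.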

In general, the frequency is relatively consistent across the week; however, Helen Zille does appear to be most active on Twitter on Mondays, with her tweet frequency dropping to its lowest on Thursdays.

From an hourly perspective, Helen Zille appears to be quite a morning person. Her Twitter activity peaks at around 7 to 7:30am, ebbs at around 10am and remains fairly consistent throughout the day, before rallying to a night-time peak at around 8pm and quickly receding by around 9 to 9:30pm. Given how frequently Helen Zille tweets (about 63000 tweets, retweets and replies since 2009, an average of a little over 20 tweets a day), this may reflect her daily routine, possibly allowing you to infer a few insights into her daily online habits.
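The same conversion extends to the hourly view. This sketch buckets tweets into half-hour bins in South African time, which is one way to see peaks such as 7 to 7:30am; the bin width and function names are assumptions rather than the author’s exact approach.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# South African Standard Time: UTC+2 all year, no daylight saving.
SAST = timezone(timedelta(hours=2), name="SAST")
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def half_hour_bin(created_at: str) -> str:
    """Map a UTC created_at string to its local half-hour bin, e.g. '07:30'."""
    local = datetime.strptime(created_at, CREATED_AT_FORMAT).astimezone(SAST)
    minute = 0 if local.minute < 30 else 30
    return f"{local.hour:02d}:{minute:02d}"


def tweets_per_half_hour(created_ats: List[str]) -> List[Tuple[str, int]]:
    """Return (bin, count) pairs sorted by time of day."""
    counts = Counter(half_hour_bin(ts) for ts in created_ats)
    return sorted(counts.items())
```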

While this analysis may not have revealed anything startling, it shows how you can use the same tools to analyse the online behaviour of potential customers and competitors: when they are most likely to be engaged (days and hours), the words they often use when tweeting and the general tone of their tweets. Using the IBM Watson Tone Analyzer, you can also get an idea of the sentiments people express when tweeting about or interacting with your brand.


IBM Tone Analyzer: https://github.com/EmmS21/TwitterAnalysis/blob/master/Analysing%20using%20IBM%20Tone%20Analyzer%20-%20Python


Tweet frequency by weekday: https://github.com/EmmS21/TwitterAnalysis/blob/master/Tweet-frequency%20by%20week%20days-%20Python

Tweet frequency by hour: https://github.com/EmmS21/TwitterAnalysis/blob/master/Tweet%20frequency%20by%20hour%20-%20Python
