Emotion Detection (Iranian Propaganda)

This project was completed as part of the MRes Technology (Computer Science) course at the University of Portsmouth.

Summary

This project automatically labelled Iranian state-sponsored propaganda tweets by emotion and evaluated how accurately five machine learning algorithms could classify the emotions in those tweets.

The Data

This project used tweets from Twitter's election integrity datasets. These datasets consist of tweets from accounts that were permanently suspended from the Twitter platform because they were identified as belonging to state-sponsored actors and had pushed narratives on those actors' behalf.

Data Extraction

This project focused on tweets about the Iranian nuclear deal (JCPOA) because it was an internationally controversial issue that Iran would likely spread propaganda about. To extract tweets about the nuclear deal, several key terms were used (see here).

These key terms were chosen after analysing the most frequent terms per month across the three Iranian releases available at the time. The extracted tweets were also restricted to English-language tweets published between August 2013 and December 2018. Retweets were excluded to prevent the machine learning algorithms from developing a bias towards frequently retweeted tweets.
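The filtering described above can be sketched in Python as follows. This is a minimal sketch, not the project's code: the column names (`tweet_text`, `tweet_language`, `tweet_time`, `is_retweet`), the `"en"` language code and the timestamp format are assumptions about the election integrity CSV files, and `KEYTERMS` holds hypothetical placeholder terms rather than the project's actual list.

```python
from datetime import datetime

# Hypothetical placeholder key terms; the project's real list is not reproduced here.
KEYTERMS = ("jcpoa", "nuclear deal", "iran deal")

# August 2013 up to the end of December 2018 (upper bound is exclusive).
START = datetime(2013, 8, 1)
END = datetime(2019, 1, 1)


def is_relevant(row):
    """Return True if a tweet row passes the language, retweet, date and key-term filters.

    Assumes columns tweet_text, tweet_language, tweet_time ("YYYY-MM-DD HH:MM")
    and is_retweet ("True"/"False" strings); these are assumptions, not confirmed.
    """
    if row["tweet_language"] != "en":
        return False
    # CSV readers yield strings, so compare text rather than relying on truthiness.
    if str(row["is_retweet"]).lower() == "true":
        return False
    posted = datetime.strptime(row["tweet_time"], "%Y-%m-%d %H:%M")
    if not START <= posted < END:
        return False
    text = row["tweet_text"].lower()
    return any(term in text for term in KEYTERMS)


def extract(rows):
    """Keep only the rows that pass every filter."""
    return [row for row in rows if is_relevant(row)]
```

For example, `extract(csv.DictReader(f))` would return the matching rows of an open CSV file `f`. Plain substring matching keeps the sketch short, but it would also match text such as "iran dealing", so word-boundary matching may be preferable in practice.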

Preprocessing

To prepare the tweets for labelling, the text was transformed in several steps: lowercase conversion, normalising accented letters, removing usernames, transforming hashtags, removing URLs, expanding contractions, removing special characters and removing stopwords. These steps improve the chance of matching words in the tweets against the lexicon and speed up processing.
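A minimal Python sketch of the preprocessing steps, applied in the order listed above. `CONTRACTIONS` and `STOPWORDS` are tiny hypothetical stand-ins for full resources, and "transforming hashtags" is assumed here to mean dropping the `#` and keeping the word.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny resources; real runs would use full lists.
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "can't": "cannot",
                "it's": "it is", "won't": "will not"}
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "on", "for"}

_CONTRACTION_RE = re.compile(r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")


def preprocess(tweet):
    """Return the list of cleaned tokens for one tweet."""
    text = tweet.lower()
    # Treat a curly apostrophe as a straight one so contractions still match.
    text = text.replace("\u2019", "'")
    # Normalise accented letters: decompose, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"@\w+", " ", text)                    # remove usernames
    text = re.sub(r"#(\w+)", r"\1", text)                # hashtags: keep word, drop '#'
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # remove special characters
    return [tok for tok in text.split() if tok not in STOPWORDS]
```

For example, `preprocess("@JohnDoe The #IranDeal isn't dead, café talks continue https://t.co/xyz")` yields `["irandeal", "not", "dead", "cafe", "talks", "continue"]`. The toy stopword list keeps "not"; standard lists such as NLTK's English list remove it, which can discard negations that matter for emotion.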

Emotion Labelling

To label each tweet with its emotions, the NRC Emotion Lexicon was used. This lexicon associates words with 8 emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust.
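The README does not state the exact labelling rule, so the following is a sketch under an assumed rule: a tweet is labelled with an emotion if at least one of its preprocessed tokens is associated with that emotion in the lexicon. It also assumes the lexicon's tab-separated word-level format (word, emotion, 0/1 flag); `load_nrc` and `label_tweet` are hypothetical helper names.

```python
from collections import defaultdict

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")


def load_nrc(lines):
    """Build {word: {emotions}} from lines of the form "word<TAB>emotion<TAB>0|1".

    Rows for the lexicon's positive/negative sentiment categories are skipped.
    """
    lexicon = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, emotion, flag = parts
        if emotion in EMOTIONS and flag.strip() == "1":
            lexicon[word].add(emotion)
    return dict(lexicon)


def label_tweet(tokens, lexicon):
    """Assumed rule: an emotion is present (1) if any token is associated with it."""
    found = set()
    for token in tokens:
        found |= lexicon.get(token, set())
    return {emotion: int(emotion in found) for emotion in EMOTIONS}
```

In use, `load_nrc` would be given the open lexicon file and `label_tweet` the token list produced by preprocessing. The result is one 0/1 label per emotion, which maps directly onto per-emotion binary classification.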

Emotion Detection (Machine Learning Task)

The machine learning task of emotion detection was split into a series of binary tasks, one per emotion: each experiment detected the presence or absence of a specific emotion. These tasks were run for every emotion across 5 feature sets: unigrams, bigrams, trigrams, unigrams + bigrams, and unigrams + bigrams + trigrams. The 5 algorithms used were K-Nearest Neighbours, Decision Tree, Naive Bayes, Support Vector Machine (with a linear kernel) and Random Forest.
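A sketch of the experiment grid (5 feature sets × 5 algorithms for one emotion) using scikit-learn. The README does not name a library, so scikit-learn is an assumption, as are `MultinomialNB` as the Naive Bayes variant, default hyperparameters, raw n-gram counts as features, and mean F1 under cross-validation as the metric. `labels` is assumed to be a list of per-tweet dicts mapping each emotion to 0 or 1.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# n-gram ranges for the five feature sets (min_n, max_n).
FEATURE_SETS = {
    "unigrams": (1, 1),
    "bigrams": (2, 2),
    "trigrams": (3, 3),
    "unigrams+bigrams": (1, 2),
    "unigrams+bigrams+trigrams": (1, 3),
}


def make_classifiers():
    """Fresh, default-configured instances of the five algorithms (assumed settings)."""
    return {
        "knn": KNeighborsClassifier(),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "naive_bayes": MultinomialNB(),  # assumed Naive Bayes variant
        "svm_linear": SVC(kernel="linear"),
        "random_forest": RandomForestClassifier(random_state=0),
    }


def run_binary_experiments(texts, labels, emotion, cv=5):
    """Mean cross-validated F1 for detecting one emotion, per (feature set, algorithm)."""
    # Binary target: 1 if the tweet was labelled with this emotion, else 0.
    y = [lab[emotion] for lab in labels]
    results = {}
    for feat_name, ngram_range in FEATURE_SETS.items():
        for clf_name, clf in make_classifiers().items():
            # Vectoriser inside the pipeline so it is fitted on training folds only.
            pipe = make_pipeline(CountVectorizer(ngram_range=ngram_range), clf)
            scores = cross_val_score(pipe, texts, y, cv=cv, scoring="f1")
            results[(feat_name, clf_name)] = scores.mean()
    return results
```

Running `run_binary_experiments` once per emotion reproduces the shape of the study: one binary task per emotion, each evaluated over all 25 feature-set and algorithm combinations. `texts` here would be the preprocessed tweets joined back into strings.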

Results

Not yet available.