Sentiment analysis on Narendra Modi’s tweets using Python

In this post, we will learn how to run sentiment analysis on social and other open data sources. As our example, we will use Mr Narendra Modi’s tweets, fetched through the Twitter API.

DESCRIPTION:
In this article we will:

  1. Extract Twitter data using tweepy and learn how to handle it using pandas.
  2. Do some basic statistics and visualizations with numpy, matplotlib and seaborn.
  3. Do sentiment analysis of extracted (Narendra Modi’s) tweets using textblob.

What will we need?

We will need Python installed on our system. In this post, I will be using Jupyter Notebooks. I highly recommend installing Anaconda, a Python distribution that bundles a package manager and many useful tools.

The requirements that we’ll need to install are:

  • NumPy: This is the fundamental package for scientific computing with Python. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data.
  • Pandas: This is an open source library providing high-performance, easy-to-use data structures and data analysis tools.
  • Tweepy: This is an easy-to-use Python library for accessing the Twitter API.
  • Matplotlib: This is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • Seaborn: This is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • Textblob: This is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks.

All of them are “pip installable”. At the end of this article, you’ll be able to find more references about these Python libraries.

Extracting Twitter data (tweepy + pandas)

Importing our libraries

To install the tweepy package with conda, run conda install -c conda-forge tweepy.

Creating a Twitter App

To extract tweets for later analysis, we need to log in to our Twitter account and create an app. The website to do this is https://apps.twitter.com/.

From the app we create, we will save the following information in a script called credentials.py:

  • Consumer Key (API Key)
  • Consumer Secret (API Secret)
  • Access Token
  • Access Token Secret

An example of this script is the following:

I have replaced a few letters with X; you will get the exact keys after logging in to your application.
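As a minimal sketch, a credentials.py of this kind could look like the following. The four variable names are my own choice for illustration, and every value is a placeholder rather than a real key.

```python
# credentials.py: placeholder values only. Substitute the keys shown on
# your own app's page, and keep this file out of version control.

# Consumer (API) key and secret (hypothetical variable names):
CONSUMER_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXX"
CONSUMER_SECRET = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# Access token and secret (hypothetical variable names):
ACCESS_TOKEN = "XXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
ACCESS_SECRET = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```

Keeping the keys in their own module means the notebook itself never has to contain them.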

Tweets extraction

Now that we’ve created a function to set up the Twitter API, we can use this function to create an “extractor” object. After this, we will use Tweepy’s function extractor.user_timeline(screen_name, count) to extract the count most recent tweets from the user screen_name.

As mentioned in the title, I’ve chosen @narendramodi as the user whose tweets we extract for later analysis.
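A setup function and extraction call of the kind described here might look like the sketch below. The names twitter_setup and fetch_tweets are assumptions of mine, and the keys are passed in as arguments so that the snippet stands on its own. It relies on tweepy's OAuthHandler and API.user_timeline, which target version 1.1 of the Twitter API; whether your app is allowed to call that endpoint depends on your access level.

```python
def twitter_setup(consumer_key, consumer_secret, access_token, access_secret):
    """Return a tweepy API object authenticated with the four app keys."""
    import tweepy  # imported here so the sketch loads even without tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    return tweepy.API(auth)


def fetch_tweets(extractor, screen_name="narendramodi", count=200):
    """Fetch up to `count` recent tweets from `screen_name`'s timeline."""
    return extractor.user_timeline(screen_name=screen_name, count=count)


# Typical use, once real keys are available:
#   extractor = twitter_setup(CONSUMER_KEY, CONSUMER_SECRET,
#                             ACCESS_TOKEN, ACCESS_SECRET)
#   tweets = fetch_tweets(extractor)
#   print(f"Number of tweets extracted: {len(tweets)}")
```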

Creating a (pandas) DataFrame

We now have the initial information to construct a pandas DataFrame, in order to manipulate the info in a very easy way.

IPython’s display function renders output in a friendly way, and the head method of a data frame lets us view its first 5 rows (or the number of rows passed as an argument).

So, using Python’s list comprehension:
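Here is a small sketch of that step. The tweet objects are made-up stand-ins that carry only a text attribute, and the column name Tweets is an assumption.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for the tweet objects returned by the extractor; only the
# .text attribute is used here. The texts are made up for illustration.
tweets = [
    SimpleNamespace(text="First sample tweet"),
    SimpleNamespace(text="Second sample tweet"),
    SimpleNamespace(text="Third sample tweet"),
]

# One row per tweet, built with a list comprehension.
data = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=["Tweets"])

# First rows of the frame. In a notebook, display(data.head(10)) renders
# the same kind of view as a formatted table.
print(data.head(2))
```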

This will create an output similar to this:

The interesting part here is the amount of metadata contained in a single tweet. If we want data such as the creation date or the source of creation, we can read it from the tweet’s attributes. An example is the following:
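As a sketch, the metadata can be copied into extra columns and the average length computed with numpy. The attribute names (id, created_at, source, favorite_count, retweet_count) are the ones tweepy exposes on a status object; the column names and the sample values are assumptions.

```python
from datetime import datetime
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Made-up stand-ins carrying the attributes a tweepy status exposes.
tweets = [
    SimpleNamespace(
        text="Short tweet", id=1, created_at=datetime(2018, 5, 9, 8, 0),
        source="Twitter for iPhone", favorite_count=120, retweet_count=30,
    ),
    SimpleNamespace(
        text="A somewhat longer tweet", id=2, created_at=datetime(2018, 5, 10, 9, 30),
        source="Media Studio", favorite_count=80, retweet_count=45,
    ),
]

data = pd.DataFrame({"Tweets": [t.text for t in tweets]})

# One new column per piece of metadata.
data["len"] = np.array([len(t.text) for t in tweets])
data["ID"] = np.array([t.id for t in tweets])
data["Date"] = [t.created_at for t in tweets]
data["Source"] = [t.source for t in tweets]
data["Likes"] = np.array([t.favorite_count for t in tweets])
data["RTs"] = np.array([t.retweet_count for t in tweets])

# Average tweet length over the whole frame.
mean_length = np.mean(data["len"])
print(f"Average tweet length: {mean_length}")
```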

the length’s average in tweets: 129.065
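The most liked and most retweeted tweets can be located with max and idxmax on the corresponding columns. This is a sketch on made-up numbers, assuming the column names Tweets, len, Likes and RTs.

```python
import pandas as pd

# Made-up sample frame with the assumed column names.
data = pd.DataFrame({
    "Tweets": ["tweet a", "tweet b", "tweet c"],
    "len": [7, 7, 7],
    "Likes": [120, 340, 90],
    "RTs": [30, 25, 60],
})

# Highest like and retweet counts.
fav_max = data["Likes"].max()
rt_max = data["RTs"].max()

# Row labels of the first tweets that reach those maxima.
fav = data["Likes"].idxmax()
rt = data["RTs"].idxmax()

print(f"The tweet with more likes is:\n{data.loc[fav, 'Tweets']}")
print(f"Number of likes: {fav_max}")
print(f"{data.loc[fav, 'len']} characters.")

print(f"The tweet with more retweets is:\n{data.loc[rt, 'Tweets']}")
print(f"Number of retweets: {rt_max}")
print(f"{data.loc[rt, 'len']} characters.")
```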

*** Output ***

the tweet with more likes is:
वीरता, दृढ़ता, साहस और देशभक्ति के प्रतीक महान योद्धा महाराणा प्रताप को उनकी जयंती पर सादर नमन।
Number of likes: 33452
95 character.

The tweet with more retweets is:
Urging my sisters and brothers of Karnataka to vote in large numbers today. I would particularly like to call upon… https://t.co/hQXpnjnoY2
Number of retweets: 8018
139 character.

This creates the following output:

And to plot the likes versus the retweets in the same chart:
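One way to sketch this is to build two pandas Series indexed by creation date and draw them on the same axes. The dates and counts are made up, the column names are assumptions, and the plotting function needs matplotlib installed, so it is only defined here and not called.

```python
import pandas as pd

# Made-up sample frame with the assumed column names.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2018-05-09", "2018-05-10", "2018-05-12"]),
    "Likes": [120, 340, 90],
    "RTs": [30, 25, 60],
})

# Index each metric by creation date to get a time series.
tfav = pd.Series(data["Likes"].values, index=data["Date"])
tret = pd.Series(data["RTs"].values, index=data["Date"])


def plot_likes_vs_retweets(tfav, tret):
    """Draw both series on one set of axes (requires matplotlib)."""
    ax = tfav.plot(figsize=(16, 4), label="Likes", legend=True)
    tret.plot(ax=ax, label="Retweets", legend=True)
    return ax
```

In a notebook, calling plot_likes_vs_retweets(tfav, tret) in a cell shows the chart.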

This will create the following output:

Pie charts of sources

Now we will plot the sources in a pie chart, since we realized that not every tweet is tweeted from the same source (😱🤔). We first clean all the sources:
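A minimal sketch of this step collects each distinct source once. The list of sources is made up here; in the notebook it would come from the source column of the data frame.

```python
# Made-up source labels, one per tweet.
tweet_sources = [
    "Twitter Web Client",
    "Media Studio",
    "Twitter Web Client",
    "Twitter for iPhone",
    "Media Studio",
]

# Keep each source once, in order of first appearance.
sources = []
for source in tweet_sources:
    if source not in sources:
        sources.append(source)

print("Creation of content sources:")
for source in sources:
    print(f"* {source}")
```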

With the following output, we see that this Twitter account posts from only a few sources:
Creation of content sources:
* Twitter for iPhone
* Media Studio

We now count the number of tweets from each source and create a pie chart. You’ll notice that this code cell is not the most optimized one.
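One compact way to do the counting is collections.Counter, which may differ from the cell used in the original notebook. The sample sources are made up, and the pie chart helper needs matplotlib, so it is defined but not called here.

```python
from collections import Counter

# Made-up source labels, one per tweet.
tweet_sources = [
    "Twitter for iPhone",
    "Media Studio",
    "Twitter for iPhone",
    "Twitter Web Client",
    "Twitter for iPhone",
]

# Tweets per source, then each source's share as a percentage.
counts = Counter(tweet_sources)
labels = list(counts)
percent = [100 * counts[label] / len(tweet_sources) for label in labels]


def plot_source_pie(labels, percent):
    """Draw the shares as a pie chart (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(percent, labels=labels, autopct="%.2f")
    return ax
```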

*** Output ***

Creation of content sources:
* Twitter Web Client
* Media Studio
* Twitter for iPhone

With this we obtain an output like this one:

Sentiment analysis

Importing textblob

As we discussed at the beginning of this post, textblob will allow us to do sentiment analysis in a very simple and hassle-free way. We will also use Python’s re library, which is used to work with regular expressions. For this, I’ll provide two utility functions to:

a) clean text (any character that is not alphanumeric is replaced, so that only plain words remain), and

b) create a classifier that analyzes the polarity of each tweet after its text has been cleaned. For the regular expression syntax, please refer to the official re documentation.
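The two utilities could be sketched as follows. The regular expression and the function names are assumptions of mine, not necessarily the ones used in the original notebook, and the classifier relies on TextBlob's sentiment.polarity score, so it needs textblob installed when it is called.

```python
import re


def clean_tweet(tweet):
    """Strip mentions, links and non-alphanumeric characters from a tweet."""
    pattern = r"(@[A-Za-z0-9_]+)|(\w+://\S+)|([^0-9A-Za-z \t])"
    return " ".join(re.sub(pattern, " ", tweet).split())


def analyze_sentiment(tweet):
    """Return 1, 0 or -1 for positive, neutral or negative polarity."""
    from textblob import TextBlob  # imported lazily; needs textblob installed

    polarity = TextBlob(clean_tweet(tweet)).sentiment.polarity
    if polarity > 0:
        return 1
    if polarity == 0:
        return 0
    return -1


print(clean_tweet("Hello @user! Visit https://t.co/abc now"))

# Typical use on the data frame (assumed column names):
#   data["SA"] = [analyze_sentiment(tweet) for tweet in data["Tweets"]]
```

Note that this pattern keeps only Latin letters and digits, so a tweet written entirely in Hindi is reduced to an empty string before it is scored.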

As we can see, the last column contains the sentiment analysis (SA). We now just need to check the results.

Analyzing the results

To have a simple way to verify the results, we will count the number of neutral, positive and negative tweets and extract the percentages.

Now that we have the lists, we just print the percentages:
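As a sketch of the counting and printing, with made-up sentiment values standing in for the SA column:

```python
# Made-up sentiment labels: 1 positive, 0 neutral, -1 negative.
sa = [1, 0, 0, -1, 1, 0, 0, 0, 1, 0]

# Split the labels into three lists.
pos_tweets = [s for s in sa if s > 0]
neu_tweets = [s for s in sa if s == 0]
neg_tweets = [s for s in sa if s < 0]


def percentage(part):
    """Share of `part` among all labels, as a percentage."""
    return 100 * len(part) / len(sa)


print(f"Percentage of positive tweets: {percentage(pos_tweets)}%")
print(f"Percentage of neutral tweets: {percentage(neu_tweets)}%")
print(f"Percentage of negative tweets: {percentage(neg_tweets)}%")
```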

Percentage of positive tweets:    39.5%
Percentage of neutral tweets:     51.0%
Percentage of negative tweets:  9.5%

We have to keep in mind that we’re working only with the 200 most recent tweets from Mr Narendra Modi. For more accurate results, we can consider more tweets. An interesting exercise (an invitation to the readers) is to analyze the polarity of the tweets from each source: it may turn out that the tweets from one source alone are noticeably more positive or more negative. Anyway, I hope you found this interesting.
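As a starting point for that exercise, a pandas groupby can average the sentiment label per source. The data and the column names Source and SA are made up for illustration.

```python
import pandas as pd

# Made-up sample: one row per tweet, with its source and sentiment label.
data = pd.DataFrame({
    "Source": [
        "Twitter for iPhone", "Media Studio", "Twitter for iPhone",
        "Media Studio", "Twitter for iPhone",
    ],
    "SA": [1, 0, 0, -1, 1],
})

# Mean label per source: above 0 leans positive, below 0 leans negative.
mean_sa = data.groupby("Source")["SA"].mean()
print(mean_sa)
```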

Thanks,

Mohammad Sajid

 
