Naive Bayes Algorithm In Python

Despite the major advances in machine learning over the last few years, the Naive Bayes classifier has proved to be a simple, reliable and often accurate algorithm that is widely used in industrial applications. It works particularly well for natural language processing problems. The whole idea of the Naive Bayes algorithm is based on Bayes theorem.

Table of Contents

  • A quick introduction to Bayes Theorem
  • Naive Bayes Introduction
  • Types of Naive Bayes Algorithm
  • How does Naive Bayes work?
  • Pros and Cons
  • Implementing Naive Bayes in Python
  • Applications

A Quick Introduction to Bayes Theorem

Put simply, you can think of it as a theorem about evidence: it tells you how much you should believe a piece of evidence. For example, consider a dog barking in the middle of the night. If the dog often barks for no good reason, you become desensitized to it and may not check whether anything is wrong.

A bark with no intruder is called a false positive. But what if the dog barks only when someone enters your premises? You are then more likely to be alert and check whether anything is wrong, because the barking has become more reliable evidence. So Bayes theorem is a mathematical rule that tells you how much you should trust a piece of evidence.

Bayes theorem is given by:

P(A|B) = P(B|A) * P(A) / P(B)

P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).

P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.

Basically, we are trying to find the probability of event A given that event B is true. Event B is also termed the evidence. In terms of our dataset, Bayes theorem is given as:

P(y|X) = P(X|y) * P(y) / P(X)

where y is the class variable and X is a feature vector of size n:

X = (x1, x2, ..., xn)

Bayesian method of Probability

Let’s look at the commonly used terminology.
A is called the proposition and B is called the evidence.
P(A) is called the prior probability of the proposition.
P(B) is called the prior probability of the evidence.
P(A|B) is called the posterior.
P(B|A) is called the likelihood.
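
To make the terminology concrete, here is a minimal Python sketch that plugs a prior, a likelihood and an evidence term into Bayes theorem. The function name `posterior` and all the numbers for the barking-dog story are hypothetical, chosen only for illustration.

```python
def posterior(prior_a, likelihood_b_given_a, prior_b):
    """Return P(A|B) from P(A), P(B|A) and P(B) using Bayes theorem."""
    if prior_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / prior_b


# Hypothetical numbers for the barking-dog story (not taken from the article).
p_intruder = 0.01                # P(A): prior probability of an intruder
p_bark_given_intruder = 0.9      # P(B|A): likelihood of barking given an intruder
p_bark_given_no_intruder = 0.2   # the dog also barks on some quiet nights

# P(B): overall probability of barking, by the law of total probability.
p_bark = (p_bark_given_intruder * p_intruder
          + p_bark_given_no_intruder * (1 - p_intruder))

p_intruder_given_bark = posterior(p_intruder, p_bark_given_intruder, p_bark)
print(round(p_intruder_given_bark, 4))
```

With these made-up numbers the posterior stays small, because a dog that often barks for no reason is weak evidence. Lowering `p_bark_given_no_intruder` makes the same bark far more convincing.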

Naive Bayes Introduction

The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular group or class. In short, it is a probabilistic classifier. A Naive Bayes model is easy to build and works particularly well for large datasets.

It is a classification technique based on Bayes theorem. It assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, i.e. the features are independent of each other given the class. Hence it is called the “Naive” Bayes algorithm.

Types of Naive Bayes Algorithm

  1. Gaussian Naive Bayes: When attribute values are continuous, the values associated with each class are assumed to follow a normal (Gaussian) distribution. Suppose an attribute x contains continuous data. We first segment the data by class and then compute the mean and variance of x in each class.
  2. Multinomial Naive Bayes: Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.
  3. Bernoulli Naive Bayes: In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. the frequency of a word in the document).
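
The Gaussian variant described in item 1 can be sketched as follows: group a continuous attribute by class, compute the mean and variance per class, and use the normal density as the likelihood P(x | class). The helper names `fit_gaussian` and `gaussian_pdf` and the temperature values are made up for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian(values, labels):
    """Group a continuous attribute by class; return {class: (mean, variance)}."""
    by_class = defaultdict(list)
    for x, y in zip(values, labels):
        by_class[y].append(x)
    stats = {}
    for y, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)  # population variance
        stats[y] = (mean, var)
    return stats


def gaussian_pdf(x, mean, var):
    """Normal density, used as the likelihood P(x | class)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Made-up temperatures and labels (illustrative only).
temps = [30.0, 32.0, 34.0, 20.0, 22.0, 24.0]
plays = ["No", "No", "No", "Yes", "Yes", "Yes"]

stats = fit_gaussian(temps, plays)
likelihoods = {y: gaussian_pdf(23.0, m, v) for y, (m, v) in stats.items()}
print(stats)
print(likelihoods)
```

For a new temperature of 23.0, the likelihood under the "Yes" class is much larger than under "No", because 23.0 lies close to the "Yes" mean.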

How does Naive Bayes work?
Let us take an example to understand how the algorithm works. Consider a weather dataset with the target variable “Play”. We need to predict whether the players will play the game given the weather conditions. The dataset is small: 14 records, of which 9 have Play = Yes and 5 have Play = No, so the class priors are:

P(Yes) = 9/14

P(No) = 5/14

Next, we calculate the conditional probability of each feature value given each class. The four features are:

  1. Outlook
  2. Temperature
  3. Humidity
  4. Wind

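Suppose we want a prediction for the conditions X = (outlook=Sunny, temperature=Cool, humidity=high, wind=strong). The probabilities given that they play the game (play=Yes):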

P(outlook=Sunny | play=Yes) = 2/9
P(temperature=Cool | play=Yes) = 3/9
P(humidity=high | play=Yes) = 3/9
P(wind=strong | play=Yes) = 3/9
P(play=Yes) = 9/14

The probabilities given that they do not play the game (play=No):

P(outlook=Sunny | play=No) = 3/5
P(temperature=Cool | play=No) = 1/5
P(humidity=high | play=No) = 4/5
P(wind=strong | play=No) = 3/5
P(play=No) = 5/14

P(X | play=Yes) * P(play=Yes) = (2/9) * (3/9) * (3/9) * (3/9) * (9/14) = 0.0053
P(X | play=No) * P(play=No) = (3/5) * (1/5) * (4/5) * (3/5) * (5/14) = 0.0206

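Finally, divide both results by the evidence P(X) to normalize. Here P(X) is approximated by treating the features as independent:
P(X) = P(outlook=Sunny) * P(temperature=Cool) * P(humidity=high) * P(wind=strong)
P(X) = (5/14) * (4/14) * (7/14) * (6/14) = 0.02186

Dividing each result by this value:
P(Play=Yes | X) = 0.0053/0.02186 = 0.2424
P(Play=No | X) = 0.0206/0.02186 = 0.9421

Because P(X) is only approximated here, the two values do not add up to exactly 1. The same P(X) divides both results, so only the comparison between them matters.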

So given the probabilities, can you guess whether they can play the game or not?
Comparing the probabilities, you can see that P(Play=No | X) > P(Play=Yes | X). Hence the prediction is that they will not play.
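
The arithmetic above can be checked with a short Python sketch that uses exact fractions. It takes the priors and conditional probabilities exactly as listed in this example. As an alternative to dividing by the approximated P(X), it also normalizes by the sum of the two class scores, which is an assumption of this sketch and makes the posteriors add up to 1.

```python
from fractions import Fraction as F
from math import prod

# Priors and conditional probabilities copied from the worked example.
prior = {"Yes": F(9, 14), "No": F(5, 14)}
likelihood = {
    "Yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],  # Sunny, Cool, high, strong
    "No": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}

# Unnormalized score per class: P(X | class) * P(class).
score = {c: prod(likelihood[c]) * prior[c] for c in prior}

# Normalize by the sum of the scores so the posteriors add up to 1.
total = sum(score.values())
posterior = {c: score[c] / total for c in score}

# The predicted class is the one with the largest score.
prediction = max(score, key=score.get)

for c in score:
    print(c, round(float(score[c]), 4), round(float(posterior[c]), 4))
print("prediction:", prediction)
```

The ranking of the classes is the same with either normalization, so the prediction is still "No".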

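Pros and Cons of Naive Bayes Algorithm

Pros

  • Easy to implement.
  • Can be used for binary and multiclass classification.
  • Can be trained easily on small datasets.
  • Works exceptionally well for text classification problems.
  • Performs well for categorical input variables as compared to numerical variables.

Cons

  • Cannot learn relationships between features, because it assumes they are independent.
  • The strong feature independence assumption rarely holds in real data.
  • Zero frequency problem: a feature value never seen with a class in training gets a probability of zero, which wipes out the whole product unless a smoothing technique such as Laplace smoothing is used.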
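
A common remedy for the zero frequency problem is additive (Laplace) smoothing, which adds a small constant to every count. The sketch below is a minimal illustration. The function name `smoothed_probability` and the counts (a class with 5 training rows and a feature with 3 possible values) are hypothetical.

```python
def smoothed_probability(count, class_total, n_values, alpha=1.0):
    """Estimate P(feature=value | class) with additive (Laplace) smoothing.

    count       -- times the value was seen together with the class
    class_total -- number of training rows in the class
    n_values    -- number of distinct values the feature can take
    alpha       -- smoothing strength (1.0 is classic Laplace smoothing)
    """
    return (count + alpha) / (class_total + alpha * n_values)


# Hypothetical case: the value never occurs with this class in training.
unsmoothed = 0 / 5
smoothed = smoothed_probability(count=0, class_total=5, n_values=3)
print(unsmoothed, smoothed)
```

Without smoothing the estimate is zero and cancels every other factor in the product. With smoothing it becomes a small positive number, so the other features still contribute.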

Implementing Naive Bayes in Python
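
Below is a minimal from-scratch sketch of a categorical Naive Bayes classifier that uses only the Python standard library. The rows are the classic 14-record play-tennis dataset. That choice is an assumption, made because its counts match the probabilities in the worked example. The class `CategoricalNaiveBayes` and its methods are illustrative names, not a library API. No smoothing is applied, so an unseen value and class pair gets a probability of zero.

```python
from collections import Counter

# Classic play-tennis data (assumed; columns: outlook, temperature,
# humidity, wind, play).
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


class CategoricalNaiveBayes:
    """Illustrative classifier for categorical features (no smoothing)."""

    def fit(self, rows):
        # The last column of each row is the class label.
        self.total = len(rows)
        self.class_counts = Counter(row[-1] for row in rows)
        # feature_counts[(feature_index, value, label)] -> count
        self.feature_counts = Counter()
        for row in rows:
            label = row[-1]
            for i, value in enumerate(row[:-1]):
                self.feature_counts[(i, value, label)] += 1
        return self

    def scores(self, x):
        """Return the unnormalized P(x | class) * P(class) for every class."""
        result = {}
        for label, n in self.class_counts.items():
            score = n / self.total  # prior P(class)
            for i, value in enumerate(x):
                score *= self.feature_counts[(i, value, label)] / n
            result[label] = score
        return result

    def predict(self, x):
        s = self.scores(x)
        return max(s, key=s.get)


model = CategoricalNaiveBayes().fit(ROWS)
query = ("Sunny", "Cool", "High", "Strong")
scores = model.scores(query)
print({label: round(s, 4) for label, s in scores.items()})
print(model.predict(query))
```

For real projects, scikit-learn ships ready-made estimators in `sklearn.naive_bayes`, such as `GaussianNB`, `MultinomialNB`, `BernoulliNB` and `CategoricalNB`, which also handle smoothing.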

Applications of Naive Bayes Algorithm

  • Spam filtering: Naive Bayes is used to identify spam e-mails.
  • Text classification: It is a popular algorithm for classifying text. For example, it can be used to build a model that says whether a text is about sports or not.
  • Hybrid recommender systems: Recommender systems apply machine learning algorithms and data mining techniques to filter unseen information and predict whether a user would like a given resource.
  • Online applications: Simple emotion modeling.

I hope you found this article helpful. Please share your suggestions and thoughts in the comment section below.
