# Which Statistical Distribution Should Be Applied When

An experiment may be performed once, twice, or any number of times, and it may involve many individuals to get a more accurate result. When asked which probability distribution describes the outcome, many of us think only of the Normal Distribution or the Binomial Distribution, but the right choice depends on the experiment.

For example, in an opinion poll, we might decide to ask 50 people whether they agree or disagree with a certain issue.

If we record 1 for agree and 0 for disagree, the sample space of this experiment consists of strings of 1’s and 0’s, each of length 50. If the quantity of interest is the number of people who agree, out of 50, then we may assign that quantity to a variable X.

This reduces our original sample space of 2^50 strings to the much smaller sample space of integers {0, 1, 2, ..., 50}. This random quantity X is known as a Random Variable.
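To make the reduction of the sample space concrete, here is a minimal sketch that enumerates every response string for a tiny poll and maps each one to X. The poll size of 4 and the helper name `agree_counts` are assumptions chosen for illustration (50 respondents would give far too many strings to list).

```python
from collections import Counter
from itertools import product

def agree_counts(n):
    """Enumerate all 0/1 response strings of length n and count how many map to each X."""
    outcomes = list(product([0, 1], repeat=n))        # full sample space: 2**n strings
    counts = Counter(sum(outcome) for outcome in outcomes)  # X = number of 1's in a string
    return outcomes, counts

# Hypothetical small poll of 4 people instead of 50.
outcomes, counts = agree_counts(4)
print("number of response strings:", len(outcomes))
print("values X can take:", sorted(counts))
print("strings mapped to each value of X:", dict(sorted(counts.items())))
```

Many different strings collapse onto the same value of X, which is why working with X is so much simpler than working with the raw strings.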

Hence, whenever we perform an experiment and want the probabilities associated with its outcomes, a random variable comes into play.

Let us first understand the very basic idea of a Random Variable because our whole discussion ahead is based on it.

A Random Variable, also called a random quantity or a stochastic variable, is a variable whose possible values are outcomes of a random phenomenon. A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already existing value is uncertain.

Whenever we perform an experiment, its possible outcomes may lie in a certain range or follow a certain pattern when plotted on a graph. The outcomes may take discrete values, or they may lie arbitrarily close to each other (in a continuous manner). This is where discrete and continuous probability distributions come in: they let us make interpretations about the outcomes of our experiment.

For example, suppose we need to find the number of flights taking off and landing at an airport during a given time. The number of flights is a non-negative integer, but it can take many possible values, so we attach to the experiment a discrete random variable X taking the values 0, 1, 2, … This shows how random variables serve us in daily life.

Now, let us first look at some discrete distributions and see how we can use them in our daily life problems.

**DISCRETE TYPE OF DISTRIBUTIONS**

**Bernoulli Distribution**

We have all seen cricket matches start with the toss of a coin. One can get only one of two possible results, a head or a tail. There’s no midway!

While answering a true or false question, we have only two choices: mark the statement as TRUE or as FALSE.

Such data, which has only two possible results, i.e. a true or a false, a success (1) or a failure (0), comes under the Bernoulli Distribution. We may mark either of the two possible outcomes as a Success and the other as a Failure.

Remember that a Bernoulli Distribution has only one trial. If the trials are finite and more than 1, then we get the Binomial Distribution.

The probability mass function (pmf) of a Bernoulli Distribution is given by:

$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$

Here, p is the probability of success and (1-p) is the probability of failure.

The Expected Value of a random variable X following the Bernoulli Distribution is

$$E(X) = p$$

For example, if in a fight between me and my friend the probability of my success is p = 0.40 and the probability of my friend’s success (my failure) is (1-p) = 0.60, then the Bernoulli distribution of our fight puts probability 0.40 on X = 1 and 0.60 on X = 0.

There are many examples of the Bernoulli Distribution in our daily life, such as whether it is going to rain tomorrow (success) or not (failure), whether one is going to pass or fail a test, and many more.

**Binomial Distribution**

Getting back to cricket, suppose my team won the toss today; this counts as a successful event. We toss again tomorrow, but this time we lose. Winning the toss today does not mean that we will win the toss tomorrow.

A distribution where there is more than one trial, only two possible outcomes (success and failure), and each trial is independent is called a Binomial Distribution. Also, the probability of success is the same in each trial. The pmf of a Binomial Distribution, for x successes, is

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$

Here, n is the number of trials and p is the probability of success.

Suppose you’re playing a game of Tic Tac Toe with your friend Jen. Here, you’ll either win or lose (we ignore draws). Suppose the probability of you winning a game is 0.45 and that of losing is 0.55. You play the game 10 times. The probability of you winning 7 out of 10 games is given by the pmf of the Binomial Distribution with n = 10, p = 0.45 and x = 7.

We can use the Binomial Distribution in Biology too. Suppose a certain medicine is given to 10 identical patients and we have to find the probability of success of that medicine, i.e. the probability that the medicine cures the given disease. Here, we can count the number of patients who are cured, equate it to the mean of the Binomial Distribution, np, and solve for p. Remember, the conditions must be identical for each patient to apply the Binomial Distribution.
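The Tic Tac Toe and medicine examples can be worked through with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` and the count of 6 cured patients are assumptions made for illustration, not values from the article.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Tic Tac Toe example: 10 games, win probability 0.45, exactly 7 wins.
p_seven_wins = binomial_pmf(7, 10, 0.45)
print(f"P(7 wins in 10 games) = {p_seven_wins:.4f}")

# With n = 1 the same formula reduces to the Bernoulli pmf (fight example, p = 0.40).
p_win_fight = binomial_pmf(1, 1, 0.40)
p_lose_fight = binomial_pmf(0, 1, 0.40)
print(f"P(win) = {p_win_fight:.2f}, P(lose) = {p_lose_fight:.2f}")

# Medicine example: equate the observed number of cured patients to the mean n * p.
# The value 6 is hypothetical; the article does not give an observed count.
n_patients, cured = 10, 6
p_hat = cured / n_patients
print(f"estimated cure probability = {p_hat:.2f}")
```

Under these numbers, winning exactly 7 of 10 games comes out at about 0.075, so it is a fairly unlikely result for a player who wins 45% of the time.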

**Poisson Distribution**

Suppose you work in a call center. How many calls do you get in a day? It could be any number. The number of calls in a day is modeled by the Poisson Distribution.

The Poisson Distribution is the limit of the Binomial Distribution as the number of trials becomes very large and the probability of success becomes very small. The conditions of independence of trials and homogeneity of the probability of success are the same as for the Binomial Distribution.

The pmf of the Poisson Distribution is given by

$$P(X = x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \dots, \quad \mu = \lambda t$$

Here, λ is the rate at which an event occurs, t is the length of the time interval, and µ is the mean number of events in an interval of length t.

We can use the Poisson distribution to model the number of diners in a restaurant on a certain day. Here, λ is the rate at which the diners come. If the number of diners over seven days is 500, we can estimate λ as 500/7 ≈ 71.4 diners per day and then compute the probability of having more than a given number of customers on a certain day.

Because of this application, Poisson distributions are used by businessmen to make forecasts about the number of customers or sales on certain days or seasons of the year. In business, overstocking will sometimes lead to losses if the goods are not sold. Likewise, understocking leads to lost business opportunities. By using this tool, businessmen are able to estimate when demand is unusually high, so they can purchase more stock and avoid wasting resources.
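The restaurant forecast can be sketched as follows. This assumes the daily number of diners is Poisson with mean 500/7; the threshold of 80 diners and the function names are hypothetical choices for illustration. The pmf is evaluated on the log scale with `math.lgamma` so that large values of x do not overflow.

```python
from math import exp, lgamma, log

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu), computed on the log scale for numerical stability."""
    return exp(x * log(mu) - mu - lgamma(x + 1))

def poisson_tail(threshold, mu):
    """P(X > threshold) = 1 - P(X <= threshold)."""
    return 1.0 - sum(poisson_pmf(x, mu) for x in range(threshold + 1))

# 500 diners over seven days gives the estimated daily mean.
mu_per_day = 500 / 7

# Hypothetical question: how likely is a day with more than 80 diners?
p_busy_day = poisson_tail(80, mu_per_day)
print(f"mean diners per day = {mu_per_day:.1f}")
print(f"P(more than 80 diners) = {p_busy_day:.3f}")
```

A manager could repeat the calculation for several thresholds to decide how much stock covers demand on, say, 95% of days.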

**CONTINUOUS DISTRIBUTIONS**

**Uniform Distribution**

When every possible outcome is equally likely, we apply the Uniform Distribution. For example, consider choosing a real number from 1 to 100. There are infinitely many possible outcomes of such an experiment, and no number is favored over another, so the probability density is the same at every point of the interval.

The probability density function of a Uniform Distribution on the interval [a, b] is

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$

and f(x) = 0 elsewhere. Its graph is a horizontal line of height 1/(b - a) between a and b.

Suppose there are 30 participants in a quiz. Each participant gets 25 seconds to answer a question. The probability of answering the question within a certain time can be approximated using the Uniform Distribution. Or suppose you are told that the train you were going to travel in is delayed by up to 60 minutes. Then you can find the probability of the train arriving between 57 and 60 minutes late using the Uniform distribution again.
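Both examples reduce to the ratio of two interval lengths. The sketch below assumes the train delay is uniform on [0, 60] minutes and the answering time is uniform on [0, 25] seconds; the 10-second cutoff in the quiz example and the function name `uniform_prob` are assumptions for illustration.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b), clipping [lo, hi] to the support [a, b]."""
    if b <= a:
        raise ValueError("need a < b")
    lo, hi = max(lo, a), min(hi, b)      # only the part inside [a, b] carries probability
    return max(hi - lo, 0.0) / (b - a)

# Train example: delay uniform on [0, 60] minutes, arrival between 57 and 60 minutes late.
p_train = uniform_prob(57, 60, 0, 60)
print(f"P(57 <= delay <= 60) = {p_train:.3f}")

# Quiz example: answering time uniform on [0, 25] seconds, answer within 10 seconds.
p_quiz = uniform_prob(0, 10, 0, 25)
print(f"P(answer within 10 s) = {p_quiz:.3f}")
```

Because the density is flat, only the length of the interval matters, not where it sits inside [a, b].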

**Normal Distribution**

This distribution has the widest range of applications. A random variable that follows the Normal Distribution has the following properties:

**a)** The mean, median and mode of the distribution coincide.

**b)** The curve of the distribution is bell-shaped and symmetrical about the line x = µ, where µ is the mean.

**c)** The total area under the curve is 1.

**d)** Exactly half of the values are to the left of the center and the other half to the right.

We can also check if a given dataset follows the Normal Distribution using a Q-Q Plot. It is a probability plot used to check whether a data set follows an assumed distribution. Here, we compute the expected (theoretical) quantile for each data point based on the assumed distribution and plot it against the observed value. If the original data follows the assumed distribution, then the points on a Q-Q plot will fall approximately on a straight line. The given two graphs show that variable 1 is normally distributed whereas variable 2 is not.
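The Q-Q check can be sketched without a plotting library by computing the points a Q-Q plot would show and measuring how close they are to a straight line. Everything here is an assumption for illustration: the two samples are simulated (they are not the article’s variables), the plotting positions (i - 0.5)/n are one common convention, and the correlation coefficient is used as a simple stand-in for eyeballing the plot.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical, observed) quantile pairs against a fitted normal distribution."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    observed = sorted(data)
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def pearson_r(xs, ys):
    """Correlation of the Q-Q points; values near 1 mean they lie close to a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)                                     # fixed seed for repeatability
variable_1 = [rng.gauss(50, 10) for _ in range(500)]       # simulated normal data
variable_2 = [rng.expovariate(0.1) for _ in range(500)]    # simulated skewed data

r1 = pearson_r(*qq_points(variable_1))
r2 = pearson_r(*qq_points(variable_2))
print(f"Q-Q correlation, variable 1: {r1:.4f}")
print(f"Q-Q correlation, variable 2: {r2:.4f}")
```

The normal sample should give a correlation very close to 1, while the skewed sample gives a visibly lower value, mirroring the straight versus curved pattern seen in Q-Q plots.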

The pdf of a variable X following the normal distribution is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

Here µ is the mean of the random variable X and σ² is the variance of X.

**Exponential Distribution**

Let us consider the call center example another time. What about the time interval between the calls? Here, the Exponential Distribution models the time interval between the calls. We can also model the time interval between metro arrivals or the life of an air conditioner using the Exponential Distribution.

Usually, data that records the life of a certain object or the time interval between consecutive events follows an exponential distribution. The Exponential Distribution is used in Survival Analysis, where λ is the failure rate of a device at time t given that it has survived up to t; for the Exponential Distribution this rate is constant.

Also, the greater the rate, the faster the density curve f(t) = λe^(-λt) drops, and the lower the rate, the flatter the curve.

[Figure: the exponential curve illustrating the effect of the rate]
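The effect of the rate can also be checked numerically. This is a minimal sketch assuming the standard exponential density f(t) = λe^(-λt) and survival function P(T > t) = e^(-λt); the rates 0.5 and 2.0 and the call rate of 12 calls per hour are hypothetical values chosen for illustration.

```python
from math import exp

def exponential_pdf(t, rate):
    """Density of the waiting time T ~ Exponential(rate) at time t >= 0."""
    return rate * exp(-rate * t)

def exponential_survival(t, rate):
    """P(T > t): probability that the wait (or lifetime) exceeds t."""
    return exp(-rate * t)

# Compare a low and a high rate (hypothetical values) at a few time points.
for t in (0.0, 0.5, 1.0, 2.0):
    low, high = exponential_pdf(t, 0.5), exponential_pdf(t, 2.0)
    print(f"t = {t:.1f}: f(t; rate 0.5) = {low:.3f}, f(t; rate 2.0) = {high:.3f}")

# Call center example with an assumed rate of 12 calls per hour = 0.2 calls per minute.
calls_per_minute = 12 / 60
p_wait_over_5 = exponential_survival(5, calls_per_minute)
print(f"P(gap between calls > 5 minutes) = {p_wait_over_5:.3f}")
```

The high-rate curve starts higher at t = 0 but falls below the low-rate curve quickly, which is the behaviour described above.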

Hello Ginni,

First – thank you for an insightful article.

Second – I’d greatly appreciate it if you could be so kind and assist me with a side-question regarding the mentioned article: If one was to perform an experiment with two possible outcomes (success and failure). Probability of each is yet unknown. What would be the minimum number of repeats of the same experiment in order to achieve a certain amount of certainty in the results? Is there any way to pronounce that certainty?