Central Tendency or Average

Last time I discussed data types. Today I am going to discuss “Central Tendency”, or “Average”.
The word “Average” denotes a ‘representative’ or ‘typical’ value of a whole set of observations. An average usually occupies a central position within the data, so it is also known as a measure of central tendency.
There are 3 main measures of central tendency: mean, median and mode.
Mean: This is the most commonly used measure of central tendency. It is further divided into three categories, viz. Arithmetic Mean, Geometric Mean and Harmonic Mean.
Arithmetic Mean: The A.M of a set of observations is their sum divided by the number of observations. For example, suppose there are 5 students in a class whose heights are 5 ft, 4.8 ft, 5.2 ft, 4.5 ft and 5.5 ft.
So, the average height or A.M = (5+4.8+5.2+4.5+5.5) ft / 5 = 5 ft.
The arithmetic mean is the most important concept in the “Mean” category, and in analytics we use it at many stages.
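A quick way to check the height example is to compute the A.M in R, once straight from the definition and once with base R's mean(). The variable names below are illustrative.

```r
# Heights (in ft) of the five students in the example
heights <- c(5, 4.8, 5.2, 4.5, 5.5)

# Arithmetic mean straight from the definition: sum / number of observations
am_manual <- sum(heights) / length(heights)

# The same value from base R's mean()
am_builtin <- mean(heights)

print(am_manual)   # [1] 5
print(am_builtin)  # [1] 5
```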
Geometric Mean: Suppose we have n positive values X1, X2, X3, …, Xn. Their G.M is the nth root of the product of those values. For example, for the three values 2, 4 and 8 the G.M is:
G.M = (2 × 4 × 8)^(1/3) = 64^(1/3) = 4
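As far as I know, base R has no dedicated geometric-mean function, so here is a minimal sketch of one. The helper names geo_mean and geo_mean_log are hypothetical; the second version uses the equivalent log form exp(mean(log(x))), a swapped-in technique that avoids overflow when multiplying many values.

```r
# Geometric mean: n-th root of the product of n positive values.
# geo_mean is a hypothetical helper name, not a base R function.
geo_mean <- function(x) {
  stopifnot(all(x > 0))            # the definition assumes positive values
  prod(x)^(1 / length(x))
}

# Equivalent form exp(mean(log(x))); it avoids overflow of prod() on long vectors
geo_mean_log <- function(x) {
  stopifnot(all(x > 0))
  exp(mean(log(x)))
}

geo_mean(c(2, 4, 8))      # 64^(1/3), i.e. about 4
geo_mean_log(c(2, 4, 8))  # same value, up to floating-point rounding
```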
Harmonic Mean: It is the reciprocal of the arithmetic mean of the reciprocals of the values. Suppose we have the four values 1, 5, 8 and 10. The Harmonic Mean is calculated as follows:
The sum of the reciprocals is 1/1 + 1/5 + 1/8 + 1/10 = 1.425, so H.M = 4 / 1.425 ≈ 2.807
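The harmonic-mean calculation translates directly into R. This is a small sketch; harm_mean is a hypothetical helper name, and it is restricted to positive values, where the harmonic mean is usually defined.

```r
# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
# harm_mean is a hypothetical helper name.
harm_mean <- function(x) {
  stopifnot(all(x > 0))   # restricted here to positive values
  1 / mean(1 / x)
}

x <- c(1, 5, 8, 10)
sum(1 / x)      # 1.425, the sum of the reciprocals
harm_mean(x)    # 4 / 1.425, about 2.807
```

Note that the sum of the reciprocals (1.425) is only an intermediate step; the harmonic mean divides the number of values by it.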
Median: So far we have discussed the mean and its various types. Another measure of average that is frequently used in analytics is the median.
The median of a set of observations is the value of the middlemost item when the observations are arranged in order of magnitude. (If the number of observations is even, the median is the arithmetic mean of the two middle values.) Suppose 5 people came to see a doctor, aged 60, 35, 45, 70 and 20 years respectively. To find the average age of these patients using the median, we follow these steps:

  1. First, arrange the values in order of magnitude. In ascending order: 20, 35, 45, 60, 70.
  2. Then pick the middlemost value, which here is 45.
  3. So the median age is 45 years.
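The same steps can be sketched in R and compared with base R's median(). The helper manual_median is hypothetical, and its handling of an even number of values (averaging the two middle ones) is the standard convention rather than something shown in the patient example.

```r
ages <- c(60, 35, 45, 70, 20)

# Median by the steps above: sort, then take the middle value.
# manual_median is a hypothetical helper name.
manual_median <- function(x) {
  s <- sort(x)                       # step 1: arrange in ascending order
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # step 2: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: mean of the two middle values
  }
}

sort(ages)            # 20 35 45 60 70
manual_median(ages)   # 45
median(ages)          # 45, base R's built-in
```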

Mode: The mode is the value that occurs with the maximum frequency. Say a shopkeeper wants to know which shoe size sold the most last month; this is exactly the situation where the mode is used. Suppose shoes come in five sizes, viz. 3, 4, 5, 6 and 7, and 10 shoes were sold last month. Last month's sales data is as follows: 3, 7, 6, 6, 5, 5, 3, 4, 4, 4. The mode is calculated as follows:

  1. First, arrange the data set in ascending or descending order: 3, 3, 4, 4, 4, 5, 5, 6, 6, 7.
  2. Now we can see that 4 occurs most often (three times).
  3. So 4 is the mode, and this is our answer.
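The shoe-size steps can be mirrored in R with base R's table(), which counts how often each value occurs. This is one possible sketch; the variable names are illustrative.

```r
# Shoe sizes sold last month
sales <- c(3, 7, 6, 6, 5, 5, 3, 4, 4, 4)

sort(sales)               # step 1: 3 3 4 4 4 5 5 6 6 7
counts <- table(sales)    # how often each size was sold
counts                    # sizes 3, 4, 5, 6, 7 sold 2, 3, 2, 2, 1 times

# steps 2-3: the size with the highest count is the mode
# (which.max picks the first size if several tie)
mode_size <- as.numeric(names(counts)[which.max(counts)])
mode_size                 # 4
```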

R code:

  • mean() calculates the arithmetic mean of a data set.
  • median() calculates the median of a data set.
  • mode() returns the internal storage mode of an object (for example "numeric"); it does not return the value with the highest frequency. The modeest package can compute the mode, or we can write our own function as follows:

# Returns the most frequent value of x (on a tie, the one that appears first)
Mode <- function(x) { data <- unique(x); data[which.max(tabulate(match(x, data)))] }
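To see why a custom function is needed, here is a short sketch comparing base R's mode() with a Mode() helper written out over several lines, applied to the shoe-sales data. The name Mode (capital M) is user-defined, not part of base R.

```r
# Most frequent value of x; on a tie, the value that appears first in x wins
Mode <- function(x) {
  data <- unique(x)                          # distinct values, in order of first appearance
  data[which.max(tabulate(match(x, data)))]  # count each one and pick the largest count
}

sales <- c(3, 7, 6, 6, 5, 5, 3, 4, 4, 4)
mode(sales)   # "numeric": the storage mode, not the most frequent value
Mode(sales)   # 4
```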

Hope you find the post useful. In my next post I will discuss how the concept of average is applied in outlier detection and missing value treatment. For this, everyone should have a basic knowledge of mean, median and mode. I will be back very soon. Till then, “Keep on Learning, Keep on Practicing”.
 
Author
MOUTRISHA CHAKRABORTY
