# Dispersion


Dispersion is the variability, or spread, in the data. An average gives a single representative value for the data, but that average is more reliable when the dispersion is small. Consider the following example. Suppose there are three screw-manufacturing machines, each of which is supposed to produce screws of length 3 cm. A sample of size 5 is drawn from the screws produced by each of the 3 machines, and the length of each screw is measured. The results are tabulated below.

| Machine no. | Length of screws in each sample (cm) | Sample average (cm) |
|---|---|---|
| 1 | 1.5, 2.5, 3, 3.5, 4.5 | 3 |
| 2 | 1.95, 2.95, 3, 3.05, 4.05 | 3 |
| 3 | 1.99, 2.99, 3, 3.01, 4.01 | 3 |

Each machine produced screws with an average length of 3 cm, but the lengths of the screws produced by machine 1 are widely dispersed, and the variation in the lengths produced by machine 2 is also significant. The example shows why it is important to measure dispersion along with the average.
To compare the variability of two groups measured in different units, we need unitless measures. Measures of dispersion themselves are not unitless, so each measure of dispersion has a corresponding unitless measure for comparison, known as a coefficient of dispersion.
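
The screw example can be checked with a minimal Python sketch. It assumes the sample lengths from the table above; the names `samples`, `mean`, and `value_range` are illustrative and not part of the original text. The range (largest minus smallest) is used here only as a quick indicator of spread.

```python
# Screw lengths (cm) for each machine, copied from the table above.
samples = {
    1: [1.5, 2.5, 3, 3.5, 4.5],
    2: [1.95, 2.95, 3, 3.05, 4.05],
    3: [1.99, 2.99, 3, 3.01, 4.01],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Map each machine to (sample mean, sample range), rounded to 2 decimals.
summary = {
    machine: (round(mean(lengths), 2), round(value_range(lengths), 2))
    for machine, lengths in samples.items()
}

for machine, (avg, spread) in summary.items():
    print(f"Machine {machine}: mean = {avg} cm, range = {spread} cm")
```

All three means come out as 3 cm while the spreads differ, which is the point of the example: the average alone does not describe the data.
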

### Measures of dispersion

#### Range

The simplest measure of dispersion is the range. If L is the largest observation and S is the smallest observation in the data, then the range is defined as
Range = L - S
Although the range is easy to calculate, it depends on only two observations in the data, so it is not a reliable measure of dispersion.
Suppose the marks of students of two divisions in the subject statistics are as follows:

| Division | Marks |
|---|---|
| A | 10, 25, 30, 30, 36, 30, 98 |
| B | 10, 45, 56, 75, 85, 90, 98 |

The range of marks in division A is 98 - 10 = 88, and that in division B is also 98 - 10 = 88. Both divisions have the same range, but they clearly do not show the same pattern of variation. The main drawback of the range is that it does not depend on all the observations.
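
As a small illustration of this drawback, the following Python sketch computes the range for both divisions and then changes one interior mark. The marks are those given above; the variable names and the replacement mark of 60 are illustrative assumptions.

```python
division_a = [10, 25, 30, 30, 36, 30, 98]
division_b = [10, 45, 56, 75, 85, 90, 98]


def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


range_a = value_range(division_a)
range_b = value_range(division_b)
print(range_a, range_b)  # both ranges are 88

# Changing a mark that is neither the largest nor the smallest
# leaves the range unchanged (60 is an arbitrary illustrative value).
modified_a = division_a.copy()
modified_a[2] = 60
print(value_range(modified_a) == range_a)  # True
```
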
a) For ungrouped data,
Range = L - S
Coefficient of range = (L - S)/(L + S)
For example, the heights (in cm) of 10 students in class 10 are
5.35, 6.01, 4.59, 4.98, 4.10, 5.02, 6.08, 5.69, 4.84, 5.31
Here, Range = 6.08 - 4.10 = 1.98
Coefficient of range = (6.08 - 4.10)/(6.08 + 4.10) = 0.1945
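
The calculation above can be reproduced with a short Python sketch. The function name `range_and_coefficient` is a hypothetical helper introduced here for illustration.

```python
heights = [5.35, 6.01, 4.59, 4.98, 4.10, 5.02, 6.08, 5.69, 4.84, 5.31]


def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the largest L and smallest S."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)


data_range, coefficient = range_and_coefficient(heights)
print(round(data_range, 2), round(coefficient, 4))  # 1.98 0.1945
```

Because the coefficient divides one length by another, it carries no unit, which is what makes it usable for comparing data sets measured in different units.
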

b) For a discrete frequency distribution, subtract the first (smallest) observation from the last (largest) observation.

| X | Frequency |
|---|---|
| 10 | 2 |
| 12 | 6 |
| 22 | 8 |
| 26 | 5 |

Range =26-10=16

c) For a continuous frequency distribution, calculate the class marks and subtract the first class mark from the last.

| Class interval | Class mark | Frequency |
|---|---|---|
| 10-20 | 15 | 5 |
| 20-30 | 25 | 8 |
| 30-40 | 35 | 4 |

Range =35-15=20
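
A minimal Python sketch of cases b) and c), assuming the two tables above. It follows the convention used in this section, where the range of a continuous distribution is taken between class marks; the names used are illustrative.

```python
def range_from_values(values):
    """Range as the largest minus the smallest listed value or class mark."""
    return max(values) - min(values)


# b) Discrete distribution: only the X values matter, not their frequencies.
x_values = [10, 12, 22, 26]
discrete_range = range_from_values(x_values)

# c) Continuous distribution: class marks are midpoints of (lower, upper) limits.
classes = [(10, 20), (20, 30), (30, 40)]
class_marks = [(lower + upper) / 2 for lower, upper in classes]
continuous_range = range_from_values(class_marks)

print(discrete_range, continuous_range)  # 16 20.0
```
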

#### Quartile deviation or semi interquartile range

Quartile deviation is based on the middle 50% of the data, which lies between the first quartile (Q1) and the third quartile (Q3). It is given by:
Quartile deviation = (Q3 - Q1)/2
Coefficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1)
A demerit of the quartile deviation (Q.D.) is that it does not depend on all the observations.

a) For ungrouped data,
suppose the numbers of misprints on 11 randomly selected pages of a book are 2, 2, 4, 0, 8, 8, 6, 9, 2, 5, 2.
First arrange these numbers in ascending order: 0, 2, 2, 2, 2, 4, 5, 6, 8, 8, 9.
Now find the first and third quartiles, with n = 11.
Q1 = the value of the ((n+1)/4)th observation in the ordered arrangement = 3rd observation = 2
Q3 = the value of the (3(n+1)/4)th observation in the ordered arrangement = 9th observation = 8
Quartile deviation = (8 - 2)/2 = 3
Coefficient of quartile deviation = (8 - 2)/(8 + 2) = 0.6
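
The positional rule used above can be sketched in Python as follows. The function `quartiles_by_position` is a hypothetical helper, and it deliberately handles only the case where n + 1 is divisible by 4, as in this example; other sample sizes would need interpolation between neighbouring observations, which the text does not cover.

```python
misprints = [2, 2, 4, 0, 8, 8, 6, 9, 2, 5, 2]


def quartiles_by_position(values):
    """Q1 and Q3 as the ((n+1)/4)th and (3(n+1)/4)th ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch only handles n + 1 divisible by 4")
    q1_position = (n + 1) // 4       # 1-based position of Q1
    q3_position = 3 * (n + 1) // 4   # 1-based position of Q3
    return ordered[q1_position - 1], ordered[q3_position - 1]


q1, q3 = quartiles_by_position(misprints)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, quartile_deviation, coefficient_qd)  # 2 8 3.0 0.6
```
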

b) For a discrete frequency distribution, the quartiles can be found in the same way as for ungrouped data, by writing out each observation as many times as its frequency.

| X | Frequency |
|---|---|
| 1 | 8 |
| 2 | 3 |
| 8 | 1 |
| 18 | 3 |
| 12 | 8 |
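
One possible way to carry out this expansion for the table above is sketched below in Python. The variable names are illustrative, and the sketch relies on N + 1 = 24 being divisible by 4, so that the same positional rule as for ungrouped data applies without interpolation.

```python
x_values = [1, 2, 8, 18, 12]
frequencies = [8, 3, 1, 3, 8]

# Write each observation as many times as its frequency, then sort.
expanded = []
for x, f in zip(x_values, frequencies):
    expanded.extend([x] * f)
ordered = sorted(expanded)

n = len(ordered)                      # total frequency N
q1 = ordered[(n + 1) // 4 - 1]        # ((N+1)/4)th ordered observation
q3 = ordered[3 * (n + 1) // 4 - 1]    # (3(N+1)/4)th ordered observation
quartile_deviation = (q3 - q1) / 2

print(n, q1, q3, quartile_deviation)
```

Note that the X values in the table are not listed in ascending order, so sorting the expanded data before picking positions is essential.
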

c) For a continuous frequency distribution, calculate the first and third quartiles using the following procedure.
A) First quartile
1) Find N/4.
2) Find the cumulative frequencies (c.f.).
3) Find the lower quartile class. It is the class in which the (N/4)th observation falls. In other words, it is the class whose c.f. exceeds N/4 for the first time.
4) Apply the formula
Q1 = l + ((N/4 - c.f.)/f) * h
where
l = lower limit of the lower quartile class
N = sum of all frequencies
c.f. = c.f. of the class preceding the lower quartile class
f = frequency of the lower quartile class
h = width of the lower quartile class
B) Third quartile
1) Find 3N/4.
2) The cumulative frequencies have already been calculated above, so there is no need to calculate them again.
3) Find the upper quartile class. It is the class in which the (3N/4)th observation falls. In other words, it is the class whose c.f. exceeds 3N/4 for the first time.
4) Apply the formula
Q3 = l + ((3N/4 - c.f.)/f) * h
where
l = lower limit of the upper quartile class
N = sum of all frequencies
c.f. = c.f. of the class preceding the upper quartile class
f = frequency of the upper quartile class
h = width of the upper quartile class

| Class interval | Frequency | c.f. | Note |
|---|---|---|---|
| 10-20 | 65 | 65 | |
| 20-30 | 56 | 121 | Q1 class |
| 30-40 | 50 | 171 | |
| 40-50 | 93 | 264 | Q3 class |
| 50-60 | 21 | 285 | |

Here N = 285, so N/4 = 71.25 and 3N/4 = 213.75, and each class has width h = 10.
Q1 = 20 + ((71.25 - 65)/56) * 10 = 21.11607
Q3 = 40 + ((213.75 - 171)/93) * 10 = 44.59677
Q.D. = (44.59677 - 21.11607)/2 = 11.74035
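
The grouped-data procedure can be sketched in Python as below. This is an illustrative implementation under the assumptions of this section: the quartile class is the first class whose cumulative frequency exceeds kN/4, and h is taken to be the width of that class. The function name `grouped_quartile` is hypothetical.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [65, 56, 50, 93, 21]


def grouped_quartile(classes, frequencies, k):
    """k-th quartile (k = 1 or 3) via l + ((k*N/4 - c.f.) / f) * h."""
    total = sum(frequencies)          # N
    target = k * total / 4            # N/4 or 3N/4
    cumulative = 0                    # c.f. of the preceding class
    for (lower, upper), f in zip(classes, frequencies):
        # Quartile class: first class whose c.f. exceeds the target.
        if cumulative + f > target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("target position lies beyond the last class")


q1 = grouped_quartile(classes, frequencies, 1)
q3 = grouped_quartile(classes, frequencies, 3)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 5), round(q3, 5), round(quartile_deviation, 5))
```
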

#### Mean deviation

Mean deviation about an average is the arithmetic mean of the absolute deviations from that average. It depends on all the observations. Because it discards the signs of the deviations by taking absolute values, it does not lend itself to further mathematical treatment.
a) For ungrouped data
The arithmetic mean of the absolute deviations from the arithmetic mean is called the mean deviation about the arithmetic mean.
Suppose Xi, i = 1, 2, ..., n are n observations.
Step 1. Calculate the A.M.
Step 2. Find |X - mean| for each observation.
Step 3. Find Σ|X - mean|.
Step 4. Obtain the M.D. about the mean using the formula
M.D. about mean = Σ|X - mean|/n
Example:
Suppose the observations are 6, 10, 29, 20, 25, 20, 14, 17, 26, 21.
Mean = ΣX/n = 188/10 = 18.8

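| X | X - mean | \|X - mean\| |
|---|---|---|
| 6 | -12.8 | 12.8 |
| 10 | -8.8 | 8.8 |
| 29 | 10.2 | 10.2 |
| 20 | 1.2 | 1.2 |
| 25 | 6.2 | 6.2 |
| 20 | 1.2 | 1.2 |
| 14 | -4.8 | 4.8 |
| 17 | -1.8 | 1.8 |
| 26 | 7.2 | 7.2 |
| 21 | 2.2 | 2.2 |
| Total | | 56.4 |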

Thus mean deviation about mean = 56.4/10 = 5.64
Similarly, the mean deviation about the median or the mode, and the respective coefficients, can be obtained by replacing the mean with the corresponding average.
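
A short Python sketch of the steps above, using the standard library's `statistics.mean` and `statistics.median`. The helper `mean_deviation` is a hypothetical name; passing a different centre illustrates how the mean deviation about the median is obtained by the same formula.

```python
from statistics import mean, median

observations = [6, 10, 29, 20, 25, 20, 14, 17, 26, 21]


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


md_about_mean = mean_deviation(observations, mean(observations))
md_about_median = mean_deviation(observations, median(observations))

print(round(md_about_mean, 2))    # 5.64
print(round(md_about_median, 2))  # 5.4
```
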

b) For discrete frequency distribution
Suppose Xi, i = 1, 2, ..., n are the n distinct observations, fi, i = 1, 2, ..., n are their corresponding frequencies, and N = Σf is the total frequency.
Step 1. Calculate the A.M.
Step 2. Find f|X - mean| for each observation.
Step 3. Find Σf|X - mean|.
Step 4. Obtain the M.D. about the mean using the formula
M.D. about mean = Σf|X - mean|/N
Alternatively, the discrete frequency distribution can be converted into ungrouped data by repeating each observation as many times as its frequency, for example for the distribution tabulated below.

| X | Frequency |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 8 | 9 |
| 18 | 5 |
| 12 | 4 |
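
The two routes described above (the frequency-weighted formula and the expansion into ungrouped data) can be compared with a minimal Python sketch. It assumes the table above; the function and variable names are illustrative.

```python
x_values = [1, 2, 8, 18, 12]
frequencies = [2, 3, 9, 5, 4]


def weighted_mean_deviation(values, freqs):
    """M.D. about the mean as sum(f * |x - mean|) / N."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total


md_weighted = weighted_mean_deviation(x_values, frequencies)

# Alternative route: repeat each observation f times and treat as ungrouped.
expanded = [x for x, f in zip(x_values, frequencies) for _ in range(f)]
expanded_mean = sum(expanded) / len(expanded)
md_expanded = sum(abs(x - expanded_mean) for x in expanded) / len(expanded)

print(round(md_weighted, 5), round(md_expanded, 5))
```

Both routes give the same value up to floating-point rounding, since the weighted sum is just the expanded sum with equal terms collected.
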

c) For continuous frequency distribution
Suppose Xi, i = 1, 2, ..., n are the class marks, fi, i = 1, 2, ..., n are the corresponding frequencies, and N = Σf is the total frequency.
Step 1. Calculate the A.M.
Step 2. Find f|X - mean| for each class.
Step 3. Find Σf|X - mean|.
Step 4. Obtain the M.D. about the mean using the formula
M.D. about mean = Σf|X - mean|/N
The data are tabulated below.

| Class interval | Class mark (Xi) = (lower limit + upper limit)/2 | Frequency (fi) | Xi*fi | fi\|Xi - mean\| |
|---|---|---|---|---|
| 10-20 | 15 | 65 | 975 | 1183.68421 |
| 20-30 | 25 | 56 | 1400 | 459.78947 |
| 30-40 | 35 | 50 | 1750 | 89.47368 |
| 40-50 | 45 | 93 | 4185 | 1096.42105 |
| 50-60 | 55 | 21 | 1155 | 457.57895 |
| Total | | N = 285 | 9465 | 3286.947 |

Mean = ΣXf/N = 9465/285 = 33.21053
M.D. about mean = Σf|X - mean|/N = 3286.947/285 = 11.53315
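
The table above can be reproduced with a minimal Python sketch, assuming the class marks are the midpoints of the class limits. The variable names are illustrative.

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [65, 56, 50, 93, 21]

# Step 0: class marks as midpoints of the class limits.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Step 1: arithmetic mean, weighting each class mark by its frequency.
total = sum(frequencies)
mean = sum(x * f for x, f in zip(class_marks, frequencies)) / total

# Steps 2-3: f * |X - mean| for each class, then the sum.
weighted_abs = [f * abs(x - mean) for x, f in zip(class_marks, frequencies)]

# Step 4: mean deviation about the mean.
mean_deviation = sum(weighted_abs) / total

print(round(mean, 5), round(sum(weighted_abs), 3), round(mean_deviation, 5))
```
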

#### Variance (σ²) and coefficient of variation (C.V.)

Variance
Variance is the arithmetic mean of the squares of the deviations taken from the arithmetic mean. Although variance is harder to understand and calculate than the other measures, it satisfies almost all the requisites of an ideal measure of dispersion. Among all measures of dispersion, variance is the least affected by sampling fluctuations. Standard deviation (σ) is the positive square root of the variance.
Coefficient of variation (C.V.)
To compare the variability of two different data sets, we cannot use the measures of dispersion directly, because they carry the units of the quantity being measured (squared units, in the case of variance), whereas comparison needs a unitless measure. In this case one can use the coefficients of the measures of dispersion. One more coefficient of dispersion, based on the standard deviation (sd) and the mean, is the coefficient of variation.
C.V. = (standard deviation)/|mean| * 100
When comparing the variability of two data sets, the one with the smaller C.V. is said to be more consistent.
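
As an illustration of such a comparison, the sketch below computes the C.V. for the marks of divisions A and B from the range example earlier in this article. It uses `statistics.pstdev`, which divides by n and therefore matches the definition of variance used here; the function name `coefficient_of_variation` is a hypothetical helper.

```python
from statistics import mean, pstdev

division_a = [10, 25, 30, 30, 36, 30, 98]
division_b = [10, 45, 56, 75, 85, 90, 98]


def coefficient_of_variation(values):
    """C.V. = (standard deviation) / |mean| * 100, with sd dividing by n."""
    return pstdev(values) / abs(mean(values)) * 100


cv_a = coefficient_of_variation(division_a)
cv_b = coefficient_of_variation(division_b)

print(round(cv_a, 2), round(cv_b, 2))
more_consistent = "A" if cv_a < cv_b else "B"
print("More consistent division:", more_consistent)
```

Although both divisions have the same range, their coefficients of variation differ, so the C.V. separates two data sets that the range cannot.
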
a) For ungrouped data
Suppose Xi, i = 1, 2, ..., n are n observations.
Then,
variance = (Σ(x-mean)^2)/n
Example:
Suppose the observations are 6, 10, 29, 20, 25, 20, 14, 17, 26, 21.
Mean = ΣX/n = 18.8

| X | (x-mean)^2 |
|---|---|
| 6 | 163.84 |
| 10 | 77.44 |
| 29 | 104.04 |
| 20 | 1.44 |
| 25 | 38.44 |
| 20 | 1.44 |
| 14 | 23.04 |
| 17 | 3.24 |
| 26 | 51.84 |
| 21 | 4.84 |
| Total | 469.6 |

Variance = 469.6/10=46.96

Note that the `var` command in R uses the formula
variance = (Σ(x-mean)^2)/(n-1)
so for these data it returns 469.6/9 ≈ 52.18 rather than 46.96.
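
The same distinction exists in Python's standard library, which can serve as a quick check: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1. The sketch below is illustrative and also recomputes the sum of squared deviations by hand.

```python
from statistics import pvariance, variance

observations = [6, 10, 29, 20, 25, 20, 14, 17, 26, 21]

# Sum of squared deviations from the mean, as in the table above.
n = len(observations)
mean = sum(observations) / n
sum_sq = sum((x - mean) ** 2 for x in observations)

population_variance = pvariance(observations)  # divides by n
sample_variance = variance(observations)       # divides by n - 1

print(round(sum_sq, 2))               # 469.6
print(round(population_variance, 2))  # 46.96
print(round(sample_variance, 2))      # 52.18
```
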
b) For discrete frequency distribution
Suppose Xi, i = 1, 2, ..., n are the n distinct observations, fi, i = 1, 2, ..., n are their corresponding frequencies, and N = Σf is the total frequency.
variance = (Σf(x-mean)^2)/N

| X | Frequency (f) | X*f | f(x-mean)^2 |
|---|---|---|---|
| 1 | 2 | 2 | 143.76 |
| 2 | 3 | 6 | 167.77 |
| 8 | 9 | 72 | 19.67 |
| 18 | 5 | 90 | 363.10 |
| 12 | 4 | 48 | 25.44 |
| Total | N = 23 | 218 | 719.74 |

Mean= 218/23=9.4783
Variance= 719.74/23 =31.29304
Sd = √variance = 5.594019
C.V. = sd*100/|mean| = 59.0192
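
A minimal Python sketch of this calculation, assuming the table above. The helper `frequency_variance` is a hypothetical name. The results may differ from the hand-computed figures in the last decimal places, because the table rounds intermediate values.

```python
from math import sqrt

x_values = [1, 2, 8, 18, 12]
frequencies = [2, 3, 9, 5, 4]


def frequency_variance(values, freqs):
    """Return (mean, variance) with variance = sum(f * (x - mean)^2) / N."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total
    return mean, var


mean, var = frequency_variance(x_values, frequencies)
sd = sqrt(var)               # standard deviation
cv = sd / abs(mean) * 100    # coefficient of variation, in percent

print(round(mean, 4), round(var, 3), round(sd, 3), round(cv, 2))
```
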

c) For continuous frequency distribution
For the given class intervals, find the corresponding class marks. The remaining procedure is the same as for a discrete frequency distribution, taking the class marks as the X observations.

variance=(Σf(x-mean)^2)/N

| Class interval | Class mark (Xi) = (lower limit + upper limit)/2 | Frequency (fi) | Xi*fi | f(x-mean)^2 |
|---|---|---|---|---|
| 10-20 | 15 | 65 | 975 | 21555.5212 |
| 20-30 | 25 | 56 | 1400 | 3775.1170 |
| 30-40 | 35 | 50 | 1750 | 160.1101 |
| 40-50 | 45 | 93 | 4185 | 12926.2191 |
| 50-60 | 55 | 21 | 1155 | 9970.4011 |
| Total | | N = 285 | 9465 | 48387.37 |

Mean = (Σx*f)/N = 33.21053
Variance = 48387.37/285 =169.7802
Sd = √variance = 13.02997
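
The continuous case can be sketched in Python in the same way, with the class marks standing in for the observations. The names below are illustrative, and the class marks are assumed to be the midpoints of the class limits.

```python
from math import sqrt

classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [65, 56, 50, 93, 21]

# Class marks take the role of the X observations.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

total = sum(frequencies)
grouped_mean = sum(x * f for x, f in zip(class_marks, frequencies)) / total
grouped_variance = sum(
    f * (x - grouped_mean) ** 2 for x, f in zip(class_marks, frequencies)
) / total
grouped_sd = sqrt(grouped_variance)

print(round(grouped_mean, 5), round(grouped_variance, 4), round(grouped_sd, 5))
```
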
