# Theory of Estimation Or What is Estimation

One of the major applications of statistics is estimating unknown population parameters. For example, a poll may seek to estimate the proportion of adult residents of a city who are unemployed. The process of assigning numerical values to an unknown population parameter is known as estimation. The theory of estimation was founded by Prof. R.A. Fisher in a series of fundamental papers around 1930, and it covers both the characteristics of good estimators and the methods of finding them.

#### Basic Terminologies

Population: A group of individuals under study is called population. The population may be finite or infinite. Eg. All the registered voters in India.

Sample: A finite subset of statistical individuals in a population. Eg. Selecting some voters from all registered voters.

Parameter: A statistical constant of the population, such as the mean (μ) or variance (σ²). Eg. Mean income of all the registered voters.

Statistic: A statistical constant of the sample, such as the mean (X̄) or variance (s²). In other words, any function Tn of the observed random sample x1, x2,…, xn is called a statistic. Eg. Mean income of the selected voters.

Estimator: If a statistic is used to estimate an unknown parameter θ of the distribution, then it is called an estimator. Eg. Sample mean is an estimator of population mean.

Estimate: A particular value of the estimator is called an estimate of an unknown parameter. Eg. Mean income of selected voters is ₹25000 which represents mean income of all the registered voters.

Sampling Distribution: The probability distribution of a statistic over all possible samples of a given size drawn from the population. Eg. If we repeatedly select samples of voters and compute the mean height of each sample, the distribution of those sample means is the sampling distribution of the mean.

Standard Error: The standard deviation of the sampling distribution of a statistic is known as its standard error and is denoted by ‘s.e.’ Eg. If we want to know how much the sample mean height of voters varies from sample to sample, the standard error is used.
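The last two terms can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of voter heights is synthetic, and the sample size and number of repeated samples are arbitrary choices. It draws many samples, records the mean of each, and compares the standard deviation of those means with the familiar value σ/√n.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 voter heights in cm (synthetic data).
population = [random.gauss(165.0, 10.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)   # parameter (mu)
pop_sd = statistics.pstdev(population)    # parameter (sigma)

n = 50               # size of each sample
num_samples = 2_000  # number of repeated samples

# Each sample gives one value of the statistic (the sample mean);
# together these values approximate its sampling distribution.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

empirical_se = statistics.stdev(sample_means)  # s.d. of the sampling distribution
theoretical_se = pop_sd / math.sqrt(n)         # sigma / sqrt(n)

print(f"population mean          : {pop_mean:.2f}")
print(f"mean of the sample means : {statistics.fmean(sample_means):.2f}")
print(f"empirical standard error : {empirical_se:.3f}")
print(f"sigma / sqrt(n)          : {theoretical_se:.3f}")
```

The two standard-error figures should be close, which is the sense in which the standard error measures the variability of a statistic from sample to sample.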

Before discussing the different methods of finding estimates of an unknown population parameter, it is important to know the characteristics of a good estimator. Here, “a good estimator” is one that is as close to the true value of the parameter as possible. The following are some of the criteria that a good estimator should satisfy:

1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency

##### Unbiasedness
This is a desirable property of a good estimator. An estimator Tn is said to be an unbiased estimator of γ(θ), where γ(θ) is a function of the unknown parameter θ, if the expectation of the estimator is equal to γ(θ), i.e.,

E [Tn] = γ (θ)

Example: If X ~ N(μ, σ²), the sample mean is an unbiased estimator of the population mean, i.e., E [X̄] = μ
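Unbiasedness can be checked empirically by averaging an estimator over many repeated samples. The following sketch assumes a normal population with arbitrarily chosen μ and σ. For contrast it also tracks two variance estimators (divisor n and divisor n − 1), which are introduced here purely for illustration.

```python
import random
import statistics

random.seed(1)

mu, sigma = 10.0, 2.0   # true parameters, chosen for illustration
n = 5                   # a small sample makes any bias easy to see
trials = 20_000

means, var_div_n, var_div_n_minus_1 = [], [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)   # sum of squared deviations
    means.append(m)
    var_div_n.append(ss / n)                 # divisor n
    var_div_n_minus_1.append(ss / (n - 1))   # divisor n - 1

avg_mean = statistics.fmean(means)
avg_var_n = statistics.fmean(var_div_n)
avg_var_n1 = statistics.fmean(var_div_n_minus_1)

print(f"average of sample means       : {avg_mean:.3f}  (mu = {mu})")
print(f"average variance, divisor n   : {avg_var_n:.3f}  (sigma^2 = {sigma**2})")
print(f"average variance, divisor n-1 : {avg_var_n1:.3f}")
```

The average of the sample means should settle near μ. The divisor-n variance should average near σ²(n − 1)/n, so it is biased, while the divisor-(n − 1) version should average near σ².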

##### Consistency
An estimator is said to be consistent if it converges in probability to the true value of the parameter as the sample size increases. In other words, as the sample size grows, it becomes almost certain that the value of the statistic will be very close to the true value of the parameter. A sufficient condition is that the estimator's bias and standard error (the standard deviation of its sampling distribution) both tend to 0. Example: The sample mean is a consistent estimator of the population mean, since as the sample size n→∞, the sample mean converges to the population mean in probability and its variability tends to 0.
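The consistency of the sample mean can be seen from a short argument based on Chebyshev's inequality. This sketch assumes independent observations with a common mean μ and a finite variance σ², conditions not stated explicitly in the text. Under these assumptions,

$$
E[\bar{X}] = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
$$

and therefore, for any fixed ε > 0,

$$
P\left(|\bar{X} - \mu| \ge \varepsilon\right) \le \frac{\sigma^2}{n\,\varepsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty
$$

So the probability that the sample mean differs from μ by ε or more vanishes as the sample grows.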

##### Efficiency
A further criterion is needed to choose between estimators that share the property of consistency. Such a criterion, based on the variances of the sampling distributions of the estimators, is known as efficiency.

Efficiency refers to the size of the standard error of the statistic. If two statistics computed from samples of the same size are compared to decide which is the better estimator, the one with the smaller standard error (the standard deviation of its sampling distribution) is selected.

If T1 is the most efficient estimator with variance V1 and T2 is any other estimator with variance V2, then the efficiency E of T2 is given by:

$$
E = \frac{V_1}{V_2}
$$

[∵ Efficiency and variance are inversely proportional]. Since V1 ≤ V2, E cannot exceed 1.
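As a rough numerical illustration, the ratio E = V1 / V2 can be estimated by simulation. The sketch below assumes standard normal data and compares the sample mean (playing the role of T1) with the sample median (T2); the sample size and number of trials are arbitrary. For normal data the large-sample value of this ratio is known to be 2/π ≈ 0.64.

```python
import random
import statistics

random.seed(2)

n = 25           # size of each sample
trials = 10_000  # number of repeated samples

means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

v1 = statistics.variance(means)     # variance of the more efficient estimator
v2 = statistics.variance(medians)   # variance of the competing estimator
efficiency = v1 / v2                # E = V1 / V2

print(f"Var(mean)   = {v1:.4f}")
print(f"Var(median) = {v2:.4f}")
print(f"efficiency of the median relative to the mean = {efficiency:.2f}")
```

An efficiency below 1 means the median needs a larger sample than the mean to reach the same standard error, under the normality assumption made here.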

##### Sufficiency
An estimator is said to be sufficient for a parameter if it contains all the information in the sample regarding the parameter.

If Tn is an estimator of the parameter θ, based on a sample x1, x2,…, xn of size n from a population with density f(x,θ), such that the conditional distribution of x1, x2,…, xn given Tn does not depend on θ, then Tn is a sufficient estimator for θ.
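As a worked illustration (a standard textbook case, assumed here rather than taken from the text above), suppose x1, x2,…, xn are independent Bernoulli observations with success probability θ, and let Tn = x1 + x2 + … + xn be the number of successes. For any sample whose values add up to t,

$$
P(x_1, \ldots, x_n \mid T_n = t) = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\,\theta^{t}(1-\theta)^{n-t}} = \frac{1}{\binom{n}{t}}
$$

The result does not involve θ, so under these assumptions the number of successes is sufficient for θ: once it is known, the order in which the successes occurred carries no further information about θ.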

#### Methods of Point Estimation
So far we have been discussing the requisites of a good estimator. Now we shall briefly outline some of the important methods of obtaining such estimators. Commonly used methods are:

• Method of Moments
• Method of Maximum Likelihood Estimation
• Method of Minimum Variance
• Method of Least Squares

##### Method of Moments (MoM)

The basic principle is to equate population moments (i.e. the means, variances, etc. of the theoretical model) to the corresponding sample moments (i.e. the means, variances, etc. of the sample data observed) and solve for the parameter(s).

Let x1, x2, …, xn be a random sample from any distribution f(x,θ) which has m unknown parameters θ1, θ2, …, θm, where m ≤ n. Then the moment estimators θ̂1, θ̂2, …, θ̂m are obtained by equating the first m sample moments to the corresponding m population moments and then solving for θ1, θ2, …, θm.
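A two-parameter case (m = 2) may make the recipe concrete. The sketch below assumes a gamma distribution with shape k and scale s, chosen purely for illustration, with data simulated by Python's `random.gammavariate`. It equates the first two sample moments to E[X] = k·s and E[X²] = k(k + 1)·s² and solves for the two parameters.

```python
import random
import statistics

random.seed(3)

true_shape, true_scale = 3.0, 2.0   # hypothetical parameter values
sample = [random.gammavariate(true_shape, true_scale) for _ in range(20_000)]

# First two sample moments about the origin.
m1 = statistics.fmean(sample)
m2 = statistics.fmean([x * x for x in sample])

# Population moments of Gamma(shape k, scale s):
#   E[X]   = k * s
#   E[X^2] = k * (k + 1) * s^2
# Subtracting the square of the first from the second gives k * s^2.
central_m2 = m2 - m1 ** 2
shape_hat = m1 ** 2 / central_m2   # moment estimator of k
scale_hat = central_m2 / m1        # moment estimator of s

print(f"shape: true {true_shape}, estimated {shape_hat:.3f}")
print(f"scale: true {true_scale}, estimated {scale_hat:.3f}")
```

With a large sample the two estimates should land close to the values used to generate the data.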

##### Method of Maximum Likelihood Estimation (MLE)
MLE is widely regarded as the best general method of finding estimators. In particular, MLEs usually have easily determined asymptotic properties and perform especially well in large samples. “Asymptotic” here refers to the behaviour as the sample size becomes very large.

Let x1, x2, …, xn be a random sample from a population with density f(x,θ). The likelihood function of the observed sample, viewed as a function of θ, is given by:

$$
L(\theta) = \prod_{i=1}^{n} f(x_i, \theta)
$$

Notice that the likelihood function is a function of the unknown parameter θ. So different values of θ would give different values for the likelihood. The maximum likelihood approach is to find the value of θ that would have been most likely to give us the particular sample we got. In other words, we need to find the value of θ that maximizes the likelihood function. In most cases, taking logs greatly simplifies the determination of the MLE θ̂. Differentiating the likelihood or log likelihood with respect to the parameter and setting the derivative to 0 gives the MLE for the parameter.

It is necessary to check, either formally or through simple logic, that the turning point is a maximum. The formal approach would be to check that the second derivative is negative.
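These steps can be sketched for one concrete model. The example assumes an exponential density f(x, θ) = θ·exp(−θx), which is an illustrative choice and not a model used in the text. The log likelihood is n·log θ − θ·Σxi, its derivative vanishes at θ̂ = n / Σxi, and the second derivative −n/θ² is negative, so the turning point is a maximum. The function name `log_likelihood` and the grid check are likewise illustrative.

```python
import math
import random

random.seed(4)

true_rate = 1.5   # hypothetical value of the parameter
sample = [random.expovariate(true_rate) for _ in range(5_000)]
n, total = len(sample), sum(sample)

def log_likelihood(rate: float) -> float:
    """log L(rate) = n*log(rate) - rate*sum(x) for f(x, rate) = rate*exp(-rate*x)."""
    return n * math.log(rate) - rate * total

# Setting d(log L)/d(rate) = n/rate - sum(x) to 0 gives the closed form:
mle_rate = n / total   # same as 1 / sample mean

# The second derivative, -n / rate**2, is negative, so this is a maximum.
second_derivative = -n / mle_rate ** 2

# Numerical sanity check: the best rate on a coarse grid sits next to the MLE.
grid = [0.01 * k for k in range(1, 501)]   # rates 0.01, 0.02, ..., 5.00
best_on_grid = max(grid, key=log_likelihood)

print(f"closed-form MLE : {mle_rate:.4f}")
print(f"best grid value : {best_on_grid:.2f}")
```

When no closed form exists, the same log likelihood function could be handed to a numerical optimiser instead of being solved by hand.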

##### Method of Minimum Variance
This method yields the Minimum Variance Unbiased Estimator (MVUE). As the name suggests, it is an estimator that is unbiased and has the minimum variance.

If a statistic Tn based on a sample of size n is such that:

• Tn is unbiased for θ
• It has the smallest variance among the class of all unbiased estimators of θ

then Tn is called the MVUE of θ.

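##### Method of Least Squares
The principle of least squares is used to fit a curve of the form:

$$
y = f(x; \theta_1, \theta_2, \ldots, \theta_m)
$$

where θi’s are unknown parameters, to a set of n sample observations (xi, yi); i=1,2,…,n from a bivariate population. It consists of minimizing the sum of squares of residuals,

$$
S = \sum_{i=1}^{n} \left[ y_i - f(x_i; \theta_1, \theta_2, \ldots, \theta_m) \right]^2
$$

subject to variations in θ1, θ2, …, θm. The normal equations for estimating θ1, θ2, …, θm are given by:

$$
\frac{\partial S}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, m
$$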
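For the simplest case, a straight line y = a + b·x with two unknown parameters, the normal equations are a pair of linear equations that can be solved directly. The sketch below assumes synthetic data generated around a known line; the names `a_hat` and `b_hat` and all numerical values are illustrative.

```python
import random

random.seed(5)

# Synthetic data from y = 2 + 0.5 x plus noise (values chosen for illustration).
xs = [float(i) for i in range(1, 21)]
ys = [2.0 + 0.5 * x + random.gauss(0.0, 0.3) for x in xs]
n = len(xs)

# Sums that appear in the two normal equations for y = a + b x:
#   sum(y)   = n * a      + b * sum(x)
#   sum(x y) = a * sum(x) + b * sum(x^2)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 linear system by Cramer's rule.
det = n * sxx - sx * sx
a_hat = (sy * sxx - sx * sxy) / det   # intercept
b_hat = (n * sxy - sx * sy) / det     # slope

residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
residual_ss = sum(r * r for r in residuals)

print(f"intercept = {a_hat:.3f}, slope = {b_hat:.3f}")
print(f"sum of squared residuals = {residual_ss:.3f}")
```

At the solution the residuals add up to zero, and so do the residuals weighted by x. These two conditions are the two normal equations restated.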

#### Confidence Intervals and Confidence Limits
A confidence interval provides an ‘interval estimate’ for an unknown population parameter. It is designed to contain the parameter’s value with some stated probability. The width of the interval provides a measure of the precision of the estimator involved.

Let xi, i = 1, 2, …, n be a random sample of size n from f(x,θ). If T1(x) and T2(x) are two statistics with T1(x) ≤ T2(x) such that

P(T1(x) < θ < T2(x)) = 1 – α

where α is the level of significance, then the random interval (T1(x), T2(x)) is called a 100(1-α)% confidence interval for θ.

Here, T1 is called lower confidence limit and T2 is called upper confidence limit. (1-α) is called the confidence coefficient.

Usually, the value of α is taken as 5%, as in the testing of hypotheses. Thus, if α = 5%, the confidence coefficient is 95%: if the sampling were repeated many times, about 95% of the intervals constructed in this way would contain the true value of θ.
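The repeated-sampling meaning of the confidence coefficient can be checked by simulation. The sketch below assumes a normal population with known σ and uses the interval x̄ ± z·σ/√n, where z is the standard normal critical value; the values of `mu`, `sigma`, `n` and the number of trials are arbitrary choices.

```python
import math
import random
import statistics

random.seed(6)

mu, sigma = 50.0, 8.0   # true mean (unknown in practice) and known s.d.
n = 30
alpha = 0.05
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
margin = z * sigma / math.sqrt(n)

trials = 10_000
covered = 0
for _ in range(trials):
    xbar = statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    # T1 = xbar - margin and T2 = xbar + margin change from sample to sample;
    # mu stays fixed.
    if xbar - margin < mu < xbar + margin:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction should come out close to 1 − α. It is the interval that varies between samples, not the parameter.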

Interval estimate = Point estimate ± Margin of Error

The margin of error is the maximum amount by which the point estimate is expected to differ from the parameter, at the stated confidence level, because of random sampling error. In other words, it is the half-width of the range of values above and below the sample statistic.

Margin of Error = Critical Value × Standard Error of the statistic

Here, the critical value is the point (or points) on the scale of the test statistic beyond which we would reject the null hypothesis; it is determined by the level of significance α and the sampling distribution of the statistic under consideration.
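Putting the pieces together, an interval estimate for a mean might be computed as follows. This is a minimal sketch under stated assumptions: the income figures are made up, and a standard normal critical value is used for simplicity, whereas with a small sample and an estimated standard deviation a Student's t critical value would be slightly larger.

```python
import math
import statistics

# Hypothetical sample of monthly incomes (in thousand rupees).
incomes = [22.5, 27.0, 24.8, 26.1, 23.9, 25.4, 28.2, 21.7, 25.0, 24.3,
           26.8, 23.1, 25.9, 24.6, 27.5, 22.9, 25.2, 26.3, 23.6, 24.9]

alpha = 0.05
n = len(incomes)
point_estimate = statistics.fmean(incomes)                 # sample mean
standard_error = statistics.stdev(incomes) / math.sqrt(n)  # s / sqrt(n)

# Assumption: normal critical value (about 1.96 for alpha = 0.05).
critical_value = statistics.NormalDist().inv_cdf(1 - alpha / 2)

margin_of_error = critical_value * standard_error
lower = point_estimate - margin_of_error   # lower confidence limit
upper = point_estimate + margin_of_error   # upper confidence limit

print(f"point estimate  : {point_estimate:.2f}")
print(f"margin of error : {margin_of_error:.2f}")
print(f"95% interval    : ({lower:.2f}, {upper:.2f})")
```

Halving the margin of error at the same confidence level would require roughly four times as many observations, since the standard error shrinks with √n.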

Confidence intervals are not unique. In general, they should be obtained via the sampling distribution of a good estimator, in particular, the MLE. Even then there is a choice between one-sided and two-sided intervals and between equal-tailed and shortest length intervals although these are often the same.

So, we have learned what estimation is, i.e., the process of assigning a numerical value to an unknown population parameter. To be considered good, an estimator should have the following characteristics:

1. Unbiasedness
2. Consistency
3. Efficiency
4. Sufficiency

There are different methods of finding estimators, such as the method of moments, MLE, minimum variance and least squares. Of these methods, MLE is considered the best general method.

Also, there are two types of estimation: point and interval estimation. Point estimation provides a single value as the estimate, whereas interval estimation provides a confidence interval that is likely to include the unknown population parameter.

You now have a basic understanding of the theory of estimation.