Population or Universe
Before introducing the notion of sampling, its various types (such as stratified sampling) and their applications, let us first define the population. A population is the complete set of all possible units of analysis. It is also sometimes called the universe of observations.
For example, if we want to find the impact of a medicine on tuberculosis patients in an area, our population will include all the tuberculosis patients in that area.
When all the members of the population are explicitly identified, the resulting list is called a sampling frame. The sampling frame is a document that can be used with the different selection procedures described below to create a subset of the population for study. This subset is the sample. For example, a sampling frame for voters in a precinct would be the voter registration listing.
A sample is a collection of certain values chosen from the population. The sample size, usually denoted by n, is the number of these values. If these values are chosen at random, the sample is called a random sample. Each entry on the sampling frame is called a sampling unit.
A census is a study of every unit, everyone, or everything, in a population. It is also known as complete enumeration, which means a complete count.
Suppose you wish to study the impact of corporate image advertising in large corporations. You might define the unit of analysis as the corporation, and the population as “Fortune 500 Corporations” (a listing of the 500 largest corporations in the United States compiled by Fortune magazine). If we actually measure the amount of advertising for each of the 500 corporations, we will be conducting a census of the variable.
If the population is infinite, complete enumeration is not possible. Also, if the units are destroyed in the course of inspection (e.g., inspection of firecrackers, explosive materials, etc.), 100% inspection, though possible, is not at all desirable. Besides these problems, there may be time constraints on completing the research, as well as administrative and financial constraints.
In such cases, sampling is used to study the population. The sample characteristics are used to estimate the corresponding characteristics of the population. The error involved in such approximations is called sampling error; it is inherent in, and usually unavoidable under, any sampling scheme.
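To make sampling error concrete, here is a small Python simulation. It is a sketch under stated assumptions: the population is 10,000 made-up, normally distributed values, the quantity being estimated is the mean, and the seed and sizes are arbitrary choices rather than figures from the text.

```python
import random
import statistics

# Hypothetical population of 10,000 made-up measurements (e.g. weights in grams).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(500, 20) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # the true value we want to estimate

# Estimate the population mean from a simple random sample of 100 units.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

# The sampling error is the gap between the estimate and the true value.
sampling_error = sample_mean - population_mean
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample and therefore a different (usually small, rarely zero) sampling error.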
Sampling is quite often used in everyday life. For example, in a shop, we assess the quality of sugar, wheat or any other commodity by taking a handful of it from the bag and then deciding whether to buy it. A cook normally tastes a dish to find out whether it is properly cooked and contains the right amount of salt.
Simple Random Sampling
Simple random sampling is the technique of drawing a sample in such a way that each unit of the population has an equal and independent chance of being included in the sample.
If a sampling frame is available, drawing a representative probability sample is quite easy. You simply select units from the list by using some truly random process, such as a random number table or a computer program, so that every entry on the list has exactly the same probability of being chosen.
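In Python, the standard library's `random.sample` draws distinct items so that every subset of a given size is equally likely, which matches simple random sampling without replacement. The sketch below assumes the sampling frame is held as a Python list; the helper name `simple_random_sample` and the voter labels are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame so that every subset of
    size n is equally likely (simple random sampling without replacement)."""
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame, e.g. a voter registration listing of 200 entries.
frame = [f"voter_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=7)
print(sample)
```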
A clear example of a simple random probability sample is drawing a name from a hat: the sampling frame (a list of names defining the universe) is cut into equal-sized slips of paper, placed in a hat, mixed up (randomized), and then a name is picked from the hat. All names have an equal probability of being picked, and the mixing process ensures that there is no systematic bias in selecting a name.
There is no way to predict whose name will be drawn. Any name can be chosen, and all names have the same probability of being drawn (1 divided by the number of names in the hat).
Researchers can create a simple random sample in a couple of ways. With the lottery method, each member of the population is assigned a number, after which numbers are selected at random. For example, to choose 25 employees out of 250 by drawing from a hat, each of the 250 employees would be assigned a number between 1 and 250, and 25 of those numbers would then be drawn at random.
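The lottery method can be mimicked in a few lines of Python: number the employees 1 to 250, shuffle the numbers (the equivalent of mixing the slips in the hat), and keep the first 25. This is a sketch; the seed is an arbitrary choice for reproducibility.

```python
import random

N, n = 250, 25
rng = random.Random(2024)

# Step 1: assign each employee a number from 1 to N (the "slips of paper").
slips = list(range(1, N + 1))

# Step 2: mix the slips thoroughly, like shaking the hat.
rng.shuffle(slips)

# Step 3: draw the first n slips; these employee numbers form the sample.
chosen = sorted(slips[:n])
print(chosen)
```

Under this scheme each employee ends up in the sample with probability 25/250 = 0.1.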
For larger populations, a manual lottery method can be quite time-consuming. In such a case we can use a ‘Random Number Table’, which is constructed so that each of the digits 0, 1, 2, …, 9 appears with approximately the same frequency and independently of the others.
The method of drawing the random sample consists of the following steps:
- Identify the N units in the population with the numbers from 1 to N.
- Select any page of the ‘random number table’ at random and read off the numbers in any row, column, or diagonal.
- The population units corresponding to the numbers selected in the previous step constitute the random sample.
For example, to randomly sample 4 people (or labels) from 8, we need 4 random digits drawn without replacement from 1, 2, 3, …, 8.
From Table 1, we simply read off random digits, ignoring those that are out of range or that recur (we are sampling without replacement), until we get four of them. Going from left to right across the top row of Table 1, we get 1 2 4 […] 6 3 5 … (Here, we have placed square brackets around numbers that are repeats of previously appearing numbers or are out of range.) Taking the first four usable numbers, we get 1, 2, 4, 6, and the random sample consists of the individuals with those labels.
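The read-and-skip rule can be sketched as a small Python function. Since Table 1 itself is not reproduced here, the row of digits in the example is made up for illustration, and the function name `read_random_digits` is hypothetical. Reading single digits only works when N ≤ 9; for larger populations one would read groups of two or more digits at a time.

```python
def read_random_digits(digits, N, k):
    """Read random-number-table entries left to right, skipping values that
    are out of range (not in 1..N) or already chosen, until k usable labels
    have been collected (sampling without replacement)."""
    chosen = []
    for d in digits:
        if 1 <= d <= N and d not in chosen:
            chosen.append(d)
            if len(chosen) == k:
                return chosen
    raise ValueError("ran out of digits before collecting k labels")

# Hypothetical row of single digits (made up; not the row of Table 1).
row = [1, 2, 0, 4, 2, 9, 6, 3, 5]
print(read_random_digits(row, N=8, k=4))  # [1, 2, 4, 6]
```

Here 0 and 9 are skipped as out of range and the second 2 is skipped as a repeat.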
Simple Random Sampling With and Without Replacement
Consider a population of potato sacks, each of which has either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the values are equally likely. Suppose that, in this population, there is exactly one sack with each number.
So the whole population has seven sacks. If I sample two with replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I replace it. Then I pick another. Every one of them still has a 1/7 probability of being chosen. And there are exactly 49 different possibilities here (assuming we distinguish between the first and second draws). They are: (12,12), (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,13), (13,14), etc.
Consider the same population of potato sacks. If I sample two without replacement, then I first pick one (say 14). I had a 1/7 probability of choosing that one. Then I pick another. At this point, there are only six possibilities: 12, 13, 15, 16, 17, and 18. So there are only 42 different possibilities here (again assuming that we distinguish between the first and second draws). They are: (12,13), (12,14), (12,15), (12,16), (12,17), (12,18), (13,12), (13,14), (13,15), etc.
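Python's `itertools` module can list both sets of ordered pairs directly: `product` gives the draws with replacement and `permutations` gives the draws without replacement, which confirms the counts of 49 and 42.

```python
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]  # one sack for each potato count

# Ordered pairs (first draw, second draw).
with_replacement = list(product(sacks, repeat=2))
without_replacement = list(permutations(sacks, 2))

print(len(with_replacement))     # 49 = 7 * 7
print(len(without_replacement))  # 42 = 7 * 6
print(with_replacement[:3])      # [(12, 12), (12, 13), (12, 14)]
print(without_replacement[:3])   # [(12, 13), (12, 14), (12, 15)]
```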
When we sample with replacement, the two sample values are independent. Practically, this means that what we get on the first one doesn’t affect what we get on the second. Mathematically, this means that the covariance between the two is zero.
In sampling without replacement, the two sample values aren’t independent. Practically, this means that what we got on the first draw affects what we can get on the second. Mathematically, this means that the covariance between the two isn’t zero, which complicates the computations. In particular, if we draw an SRS (simple random sample) without replacement from a population with variance $\sigma^2$ (computed with divisor $N$), then the covariance of any two different sample values $y_i$ and $y_j$ is

$$\operatorname{Cov}(y_i, y_j) = -\frac{\sigma^2}{N-1},$$

where N is the population size. If the population is very large, this covariance is very close to zero. In that case, sampling with replacement isn’t much different from sampling without replacement.
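Because the potato-sack population is tiny, the claim that sampling without replacement produces a nonzero covariance can be checked exactly by averaging over every equally likely ordered pair. This Python sketch uses `fractions.Fraction` to avoid rounding and compares the result with the standard expression $-\sigma^2/(N-1)$, where $\sigma^2$ is the population variance computed with divisor N. The helper `covariance_of_draws` is a hypothetical name.

```python
from fractions import Fraction
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]
N = len(sacks)

def covariance_of_draws(pairs):
    """Covariance between first and second draw when every ordered pair in
    `pairs` is equally likely (exact arithmetic with Fractions)."""
    m = len(pairs)
    mean1 = Fraction(sum(a for a, _ in pairs), m)
    mean2 = Fraction(sum(b for _, b in pairs), m)
    return sum((a - mean1) * (b - mean2) for a, b in pairs) / m

mu = Fraction(sum(sacks), N)
sigma2 = sum((y - mu) ** 2 for y in sacks) / N  # population variance (divisor N)

cov_with = covariance_of_draws(list(product(sacks, repeat=2)))
cov_without = covariance_of_draws(list(permutations(sacks, 2)))

print(sigma2)            # 4
print(cov_with)          # 0
print(cov_without)       # -2/3
print(-sigma2 / (N - 1)) # -2/3
```

With replacement the covariance is exactly 0; without replacement it is $-4/6 = -2/3$.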
Stratified Random Sampling
Stratified random sampling is a method of sampling that involves the division of a population into smaller groups known as strata. In stratified random sampling or stratification, the strata are formed based on members’ shared attributes or characteristics.
When the sample size of each stratum is proportional to that stratum’s size in the entire population, the method is called proportional stratified random sampling (sometimes simply proportional random sampling). This means that each stratum has the same sampling fraction, n/N.
Stratified random sampling usually gives more precise estimates than a simple random sample of the same size. It divides a population into subgroups, or strata, whose members have similar attributes and characteristics, and then takes random samples from each stratum in proportion to the population. This method is widely used and is especially useful when the target population is heterogeneous. A simple random sample should be taken from each stratum.
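The proportional allocation rule can be written out as a small worked formula. The notation is introduced here for illustration and does not appear in the original text: $N_h$ is the size of stratum $h$, $N = \sum_h N_h$ is the population size, $n_h$ is the number of units drawn from stratum $h$, and $n = \sum_h n_h$ is the total sample size.

$$
n_h = n \cdot \frac{N_h}{N}
\quad\Longrightarrow\quad
\frac{n_h}{N_h} = \frac{n}{N} \quad \text{for every stratum } h.
$$

For instance, with a total sample of $n = 100$ and a stratum that makes up 65% of the population, $n_h = 100 \times 0.65 = 65$, and the remaining 35% stratum gets $100 \times 0.35 = 35$ units. In general the $n_h$ must also be rounded to whole numbers.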
For example, suppose we want to draw a probability sample of 100 undergraduates from a university for a study of the effect of professor-student communication patterns on student grades. Figures from the university records state that 65% of the students are majoring in Liberal Arts. We know, then, that a representative sample should include 65 Liberal Arts majors and 35 majors in other fields.
So we begin by separating the registrar’s student list (the sampling frame) into two strata: the Liberal Arts majors and the non-Liberal Arts majors. We then draw a random sample of 65 from the Liberal Arts stratum and another random sample of 35 from the non-Liberal Arts stratum. The result is an unbiased sample in which there is no sampling error on the stratifying variable (academic major).
The sample has exactly the same proportion of Liberal Arts/non-Liberal Arts majors as does the population. Of course, other unstratified variables in the sample are still subject to sampling error.
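The two-stratum procedure can be sketched in Python as follows. The registrar list is hypothetical (2,000 students, 1,300 of them, i.e. 65%, Liberal Arts majors), as are the helper name `stratified_sample` and the seed. Allocation uses simple rounding of n·N_h/N, which reproduces 65 and 35 here but in general may not add up to exactly n; a real design needs a rule for distributing any leftover units.

```python
import random

def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportional stratified random sampling: group the frame into strata,
    allocate the n units in proportion to stratum size, then draw a simple
    random sample (without replacement) inside each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    N = len(frame)
    sample = {}
    for stratum, units in strata.items():
        n_h = round(n * len(units) / N)  # proportional allocation (simple rounding)
        sample[stratum] = rng.sample(units, n_h)
    return sample

# Hypothetical registrar list: 2,000 students, the first 1,300 (65%) in Liberal Arts.
students = [(f"student_{i:04d}", "Liberal Arts" if i < 1300 else "Other")
            for i in range(2000)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n=100, seed=0)
print({stratum: len(units) for stratum, units in sample.items()})
# {'Liberal Arts': 65, 'Other': 35}
```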
The same method can be applied to election polling, to studying the incomes of different populations, and to comparing incomes for different jobs across a nation.
Systematic Sampling
In school, when selecting captains for sports teams, most of our coaches asked us to count off from 1 to 5, and the students who called out a number chosen at random by the coach (say, 3) became the captains of the different teams.
This was a low-stress selection process for both the coach and the players. Such a method of sampling is called systematic sampling.
Systematic sampling is a probability sampling method where the elements are chosen from a target population by selecting a random starting point and selecting other members after a fixed ‘sampling interval’. The sampling interval is calculated by dividing the entire population size by the desired sample size.
The formula for the sampling interval is i = N/n, where N is the population size and n is the desired sample size.
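A minimal Python sketch of systematic sampling, assuming the frame is held as a list: it computes the interval k = N // n (rounding down when N is not a multiple of n), picks a random start among the first k units, and then takes every k-th unit. The function name and the 100-unit frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: choose a random start within the first interval,
    then take every k-th unit, with k = N // n."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                                # sampling interval, rounded down
    start = random.Random(seed).randrange(k)  # random start in 0 .. k-1
    return [frame[start + j * k] for j in range(n)]

# Hypothetical frame of 100 numbered units; we want a sample of 10 (k = 10).
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed.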
The bias introduced by systematic random sampling is usually small in practical situations, so this procedure is frequently used. However, if it is possible to draw a simple random sample rather than a systematic random sample, one should always do so.
The selection process can interact with a hidden periodic trait in the population. The researcher must therefore make sure that the chosen constant interval between subjects does not reflect a pattern of traits present in the population. If such a pattern exists and coincides with the interval set by the researcher, the sampling technique is no longer random and the representativeness of the sample is compromised.
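The danger of periodicity is easy to demonstrate. The Python sketch below assumes a made-up street of 1,000 houses in which every 10th house is a corner house. With a sampling interval of 10, every possible systematic sample consists entirely of corner houses or entirely of non-corner houses, although the true share of corner houses is 10%.

```python
# Hypothetical street of 1,000 houses in which every 10th house is a
# corner house (value 1); all others are mid-block houses (value 0).
houses = [1 if i % 10 == 0 else 0 for i in range(1000)]
true_share = sum(houses) / len(houses)  # 0.1

k = 10  # sampling interval that happens to equal the period of the pattern
shares = set()
for start in range(k):                  # try every possible random start
    sample = houses[start::k]
    shares.add(sum(sample) / len(sample))

print(true_share)      # 0.1
print(sorted(shares))  # [0.0, 1.0]: every possible sample is badly off
```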
Cluster Sampling
Cluster sampling is a sampling technique that divides the main population into various sections (clusters).
In this sampling technique, the analysis is carried out on the sampled clusters, and the data collected may cover any population attribute that is the focus of the research, such as demographics, habits, or background. Cluster sampling is usually used when the population consists of groups that are similar to one another yet internally diverse. Instead of collecting data from the entire population, cluster sampling lets researchers collect data by dividing the population into smaller, more manageable groups.
We first divide the population under study into recognizable subdivisions, or clusters, and then draw a simple random sample of these clusters.
For example, if we are interested in obtaining income or opinion data in a city, the whole city may be divided into N different blocks or localities (which form the clusters), and a simple random sample of n blocks is drawn. The individuals in the selected blocks make up the cluster sample.
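A sketch of single-stage cluster sampling in Python, assuming the clusters are stored as a dictionary that maps each block to the list of its residents. The city, block names, sizes, and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Single-stage cluster sampling: draw an SRS of clusters and keep every
    individual in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical city split into 20 blocks of 50 residents each.
city = {f"block_{b:02d}": [f"block_{b:02d}_resident_{r:02d}" for r in range(50)]
        for b in range(20)}
sample = cluster_sample(city, n_clusters=4, seed=11)
print(sorted(sample))                        # the 4 selected blocks
print(sum(len(v) for v in sample.values()))  # 200: all residents of those blocks
```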
Cluster sampling is often carried out in stages. A stage is a step taken to arrive at the desired sample, and cluster sampling designs are classified as single-stage, two-stage, or multistage.
In single-stage cluster sampling, we divide the entire sampling frame into clusters, usually based on some naturally occurring geographic grouping (e.g., city, town, village, hospital). Then we sample these clusters and measure every element within the selected clusters.
Two-stage sampling begins like single-stage sampling: selecting the clusters is the first stage of sampling. However, instead of taking all the elements found in the selected clusters, we take a random sample of elements from each selected cluster.
For example, in single-stage sampling, we might take an SRS of cities and, within each selected city, measure characteristics of all hospitals. In a two-stage sampling plan, we would take an SRS of cities, list every hospital within each selected city, and then take an SRS of those hospitals.
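The two-stage city/hospital plan might look like this in Python. The data are hypothetical (10 cities with 8 hospitals each), and `two_stage_sample` is an illustrative name, not a library function.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: draw an SRS of clusters (here, cities).
    Stage 2: draw an SRS of elements (here, hospitals) within each selected cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {city: rng.sample(clusters[city], n_per_cluster) for city in stage1}

# Hypothetical frame: 10 cities, each with a list of 8 hospitals.
cities = {f"city_{c}": [f"city_{c}_hospital_{h}" for h in range(8)] for c in range(10)}
sample = two_stage_sample(cities, n_clusters=3, n_per_cluster=2, seed=1)
for city, hospitals in sample.items():
    print(city, hospitals)
```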
If we are interested in obtaining a sample of, say, n households from a particular State, the first-stage units may be districts, the second-stage units may be villages in those districts, and the third-stage units will be households in the villages. Each stage thus further reduces the sample size. This is called multi-stage sampling. In this design, a sample of first-stage units is drawn by a suitable method of sampling. From among the selected first-stage units, a sub-sample of second-stage units is drawn. Further stages may be added to arrive at a sample of the desired sampling units.
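The multi-stage procedure just described can be sketched recursively in Python: at each stage, draw an SRS of units, then descend into each selected unit. The sketch assumes the frame is stored as nested dictionaries (district → village → list of households) with one sample size per stage; the state, its sizes, and the helper `multistage_sample` are hypothetical.

```python
import random

def multistage_sample(tree, sizes, rng):
    """Multistage sampling over a nested frame.
    `tree` is a dict (units of the current stage -> their sub-units) or, at the
    last stage, a list of final sampling units. `sizes[0]` is how many units to
    draw at the current stage; the remaining sizes apply to later stages."""
    n = sizes[0]
    if isinstance(tree, dict):
        chosen = rng.sample(sorted(tree), min(n, len(tree)))
        return {unit: multistage_sample(tree[unit], sizes[1:], rng) for unit in chosen}
    return rng.sample(tree, min(n, len(tree)))

# Hypothetical State: 5 districts x 4 villages x 30 households.
state = {f"district_{d}": {f"village_{d}_{v}": [f"hh_{d}_{v}_{h:02d}" for h in range(30)]
                           for v in range(4)}
         for d in range(5)}
sample = multistage_sample(state, [2, 3, 5], random.Random(5))
total = sum(len(hh) for villages in sample.values() for hh in villages.values())
print(total)  # 30 households = 2 districts x 3 villages x 5 households
```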
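Cluster sampling is preferred when it is hard or expensive to visit every group or stratum (as is required in stratified sampling). However, for a given sample size, cluster sampling usually yields estimates with a higher variance than simple random sampling, because units within the same cluster tend to resemble one another.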