We use the abbreviation i.i.d., which stands for independent and identically distributed.
Consider a sequence $X_1, X_2, … $ of i.i.d. random variables with mean $\mu $ and variance $\sigma^2 $, and let $S_n = X_1 + … + X_n $ be the sum of the first $n $ of them.
The sample mean $M_n $ is defined by
$$M_n = \frac{ X_1 + X_2 + … + X_n }{ n } = \frac{ S_n }{ n }$$
It follows that
$$E[M_n] = \mu, var(M_n) = \frac{ \sigma^2 }{ n } $$
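Indeed, by linearity of expectation and the independence of the $X_i $,
$$E[M_n] = \frac{ E[X_1] + … + E[X_n] }{ n } = \frac{ n \mu }{ n } = \mu, \quad var(M_n) = \frac{ var(X_1) + … + var(X_n) }{ n^2 } = \frac{ n \sigma^2 }{ n^2 } = \frac{ \sigma^2 }{ n }$$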
Notice that the variance of $M_n $ decreases to zero as $n $ increases, so for large $n $ the bulk of the distribution of $M_n $ must be concentrated very close to the mean $\mu $.
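As a quick illustration (a minimal simulation sketch, not part of the original text, assuming NumPy is available), the empirical variance of $M_n $ can be compared with $\sigma^2 / n $ for a few sample sizes:

```python
# Minimal sketch: estimate var(M_n) for i.i.d. Uniform(0, 1) samples, whose
# variance is sigma^2 = 1/12, and compare it with the theoretical sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0 / 12.0   # variance of a Uniform(0, 1) random variable
num_trials = 10_000   # independent realizations of M_n per sample size

for n in (10, 100, 1000):
    samples = rng.uniform(0.0, 1.0, size=(num_trials, n))
    M_n = samples.mean(axis=1)   # one sample mean per realization
    print(f"n={n:5d}  empirical var(M_n)={M_n.var():.6f}  sigma^2/n={sigma2 / n:.6f}")
```

The empirical variance should shrink roughly as $1/n $, which is the concentration effect described above.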
For the same sequence $X_1, X_2, … $ of i.i.d. random variables and its sum $S_n $, we now consider the standardized random variable $Z_n $, defined by
$$Z_n = \frac{ S_n - n \mu }{ \sigma \sqrt{ n }} $$
It can be seen that, for every $n $,
$$E[Z_n] = 0, \quad var(Z_n) = 1$$
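Indeed, since $E[S_n] = n \mu $ and $var(S_n) = n \sigma^2 $,
$$E[Z_n] = \frac{ E[S_n] - n \mu }{ \sigma \sqrt{ n }} = 0, \quad var(Z_n) = \frac{ var(S_n) }{ \sigma^2 n } = \frac{ n \sigma^2 }{ \sigma^2 n } = 1$$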
As $n $ grows, the distribution of $Z_n $ neither spreads out nor shrinks to a point.
The central limit theorem is concerned with the asymptotic shape of the distribution of $Z_n $ and asserts that it converges to the standard normal distribution.
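As an illustration (a minimal simulation sketch, not part of the original text, assuming NumPy is available), one can standardize sums of i.i.d. Exponential(1) random variables, for which $\mu = \sigma = 1 $, and compare the empirical value of $P(Z_n \le 1) $ with the standard normal value $\Phi(1) \approx 0.8413 $:

```python
# Minimal sketch: standardize S_n for i.i.d. Exponential(1) summands and check
# that P(Z_n <= 1) approaches Phi(1) ~= 0.8413, as asserted by the CLT.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0    # mean and standard deviation of Exponential(1)
num_trials = 20_000     # independent realizations of Z_n per sample size

for n in (5, 50, 500):
    S_n = rng.exponential(scale=1.0, size=(num_trials, n)).sum(axis=1)
    Z_n = (S_n - n * mu) / (sigma * np.sqrt(n))   # standardized sum
    print(f"n={n:4d}  P(Z_n <= 1) ~= {(Z_n <= 1.0).mean():.4f}  (Phi(1) ~= 0.8413)")
```

Even for moderately large $n $, the empirical probability should be close to the normal approximation.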
Limit theorems are useful for several reasons:
- Conceptually, they provide an interpretation of expectations in terms of a long sequence of identical independent experiments.
- They allow for an approximate analysis of the properties of random variables such as $S_n $. This is to be contrasted with an exact analysis, which would require a formula for the PMF or PDF of $S_n $, a complicated and tedious task when $n $ is large.
- They play a major role in inference and statistics in the presence of large data sets.