Let $X_1, X_2, … $ be a sequence of i.i.d. random variables with mean $\mu $ and variance $\sigma^2 $, and define
$$Z_n = \frac{ X_1 + X_2 + \cdots + X_n - n \mu }{ \sigma \sqrt{ n }} $$
Then, the CDF of $Z_n $ converges to the standard normal CDF
$$\Phi(z) = \frac{ 1 }{ \sqrt{ 2 \pi }} \int_{ - \infty }^{ z } e^{- x^2 /2} \, dx $$
in the sense that
$$\lim_{ n \to \infty } P(Z_n \leq z) = \Phi(z) \quad \text{for all } z$$
The central limit theorem is surprisingly general.
Besides independence and the implicit assumption that the mean and variance are finite, there are really no other restrictions on the distribution of $X_i$.
$X_i $ can be discrete, continuous, or mixed.
On the conceptual side, the central limit theorem indicates that the sum of a large number of independent random variables is approximately normal.
On the practical side, the central limit theorem eliminates the need for detailed probabilistic models.
Let $S_n = X_1 + X_2 + \cdots + X_n $, where the $X_i $ are i.i.d. random variables with mean $\mu $ and variance $\sigma^2 $. If $n $ is large, the probability $P(S_n \leq c) $ can be approximated by treating $S_n $ as if it were normal, according to the following procedure.
- Calculate the mean $n \mu $ and the variance $n \sigma^2 $ of $S_n $.
- Calculate the normalized value $z = (c - n \mu) / (\sigma \sqrt{ n }) $.
- Use the approximation
$$P(S_n \leq c) \approx \Phi(z) $$
where $\Phi(z) $ is available from standard normal CDF tables.
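The three-step procedure above can be sketched as a short Python helper; the function names are illustrative, and the standard normal CDF is computed from the error function rather than read off a table.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF, expressed through the error function:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def clt_approx(n, mu, sigma2, c):
    # Approximate P(S_n <= c) for S_n = X_1 + ... + X_n,
    # where the X_i are i.i.d. with mean mu and variance sigma2.
    z = (c - n * mu) / sqrt(n * sigma2)  # normalized value
    return phi(z)
```

For example, `clt_approx(100, 3, 4/3, 320)` reproduces the value $\Phi(1.73) \approx 0.958$ used later in this section.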
We load on a plane $100 $ packages whose weights are independent random variables that are uniformly distributed between $5 $ and $50 $ pounds. What is the probability that the total weight will exceed $3000$ pounds?
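A sketch of the calculation for this example: each weight is uniform on $[5, 50]$, so $\mu = 27.5$ and $\sigma^2 = 45^2/12 = 168.75$, and we want the probability that the total $S_{100}$ exceeds $3000$.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n = 100
mu = (5 + 50) / 2             # mean of Uniform(5, 50): 27.5
sigma2 = (50 - 5) ** 2 / 12   # variance of Uniform(5, 50): 168.75
z = (3000 - n * mu) / sqrt(n * sigma2)  # normalized value, about 1.92
p_exceed = 1 - phi(z)         # P(S_100 > 3000), about 0.027
```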
A machine processes parts, one at a time. The processing times of different parts are independent random variables, uniformly distributed between $1$ and $5$ time units. We wish to approximate the probability that the number of parts processed within $320$ time units, denoted by $N_{320}$, is at least $100$.
Let $S_{100} = X_1 + X_2 + \cdots + X_{100} $ be the total processing time of the first $100$ parts. The event $\{ N_{320} \geq 100 \}$ is the same as the event $\{ S_{100} \leq 320 \}$, and we can now use a normal approximation to the distribution of $S_{100} $. Note that
$$\mu = E[X_i] = 3, \qquad \sigma^2 = \mathrm{var}(X_i) = \frac{ 4 }{ 3 }$$
We calculate the normalized value
$$z = \frac{ 320 - n \mu }{ \sigma \sqrt[ ]{ n }} = \frac{ 320 - 300 }{ \sqrt[ ]{ 100 \cdot 4 / 3 }} = 1.73$$
and use the approximation
$$P(S_{100} \leq 320) \approx \Phi(1.73) = 0.9582$$
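As a sanity check, the normal approximation can be compared against a Monte Carlo estimate. This sketch assumes processing times uniform on $[1, 5]$, which matches the stated mean $3$ and variance $4/3$; the seed and trial count are arbitrary choices.

```python
import random

random.seed(0)  # reproducible runs
trials = 100_000
hits = 0
for _ in range(trials):
    # Total processing time of 100 parts, each uniform on [1, 5].
    s = sum(random.uniform(1, 5) for _ in range(100))
    if s <= 320:
        hits += 1
estimate = hits / trials  # should be close to Phi(1.73), about 0.958
```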
We poll $n$ voters and record the fraction $M_n$ of those polled who are in favor of a particular candidate.
If $p$ is the fraction of the entire voter population that supports this candidate, then
$$M_n = \frac{ X_1 + X_2 + \cdots + X_n }{ n }$$
where the $X_i $ are i.i.d. Bernoulli random variables with parameter $p$. In particular, $M_n $ has mean $p$ and variance $p(1- p)/ n $.
By the normal approximation, $X_1 + X_2 + \cdots + X_n $ is approximately normal, and therefore $M_n $ is also approximately normal.
We are interested in the probability $P(\lvert M_n - p \rvert \geq \epsilon) $ that the polling error is larger than some desired accuracy $\epsilon $.
Because of the symmetry of the normal PDF around the mean, we have
$$P(\lvert M_n - p \rvert \geq \epsilon) \approx 2P(M_n - p \geq \epsilon) $$
The variance $p(1 - p) / n$ of $M_n - p$ depends on $p $ and is therefore unknown. We note that the probability of a large deviation from the mean increases with the variance. Thus, we can obtain an upper bound on $P(M_n - p \geq \epsilon) $ by assuming that $M_n - p $ has the largest possible variance, namely $\frac{ 1 }{ 4n } $ which corresponds to $p = \frac{ 1 }{ 2 } $. To calculate this upper bound, we evaluate the standardized value
$$z = \frac{ \epsilon }{ 1/ (2 \sqrt{ n })} = 2 \epsilon \sqrt{ n }$$
and use the normal approximation
$$P(M_n - p \geq \epsilon) \leq 1 - \Phi(z) = 1 - \Phi(2 \epsilon \sqrt[ ]{ n }) $$
For instance, consider the case where $n = 100$ and $\epsilon= 0.1$. Assuming the worst-case variance, and treating $M_n$ as if it were normal, we obtain
$$\begin{align*} P(\lvert M_{100} - p \rvert \geq 0.1) &\approx 2P(M_{100} - p \geq 0.1) \\ & \leq 2 - 2 \Phi(2 \cdot 0.1 \cdot \sqrt{ 100 }) = 2 - 2 \Phi(2) = 0.046 \end{align*}$$
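The figure $0.046$ can be reproduced directly; here $\Phi$ is computed from the error function rather than a table, so the result comes out as roughly $0.0455$ before rounding.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, eps = 100, 0.1
# Worst-case two-sided bound: 2 - 2*Phi(2*eps*sqrt(n)) = 2 - 2*Phi(2)
bound = 2 - 2 * phi(2 * eps * sqrt(n))
```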
This is much smaller (and more accurate) than the estimate of $0.25$ that was obtained using the Chebyshev inequality.
We now consider a reverse problem. How large a sample size $n $ is needed if we wish our estimate $M_n $ to be within $0.01 $ of $p $ with probability at least $0.95 $?
Assuming again the worst possible variance, we are led to the condition
$$ 2- 2 \Phi(2 \cdot 0.01 \cdot \sqrt[ ]{ n }) \leq 0.05 $$
or
$$\Phi(2 \cdot 0.01 \cdot \sqrt[ ]{ n }) \geq 0.975 $$
From the normal tables, we see that $\Phi(1.96) = 0.975 $, which leads to
$$ 2 \cdot 0.01 \cdot \sqrt[ ]{ n } \geq 1.96 $$
or
$$ n \geq \frac{(1.96)^2 }{ 4 \cdot (0.01)^2 } = 9604 $$
This is significantly better than the sample size of $ 50,000 $ that we found using Chebyshev’s inequality.
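The sample-size calculation can be checked numerically; `z_star = 1.96` is the table value satisfying $\Phi(1.96) \approx 0.975$, and the variable names are illustrative.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

eps = 0.01
z_star = 1.96                   # from tables: Phi(1.96) ~ 0.975
n = (z_star / (2 * eps)) ** 2   # condition 2*eps*sqrt(n) >= z_star gives n >= 9604
# The resulting worst-case two-sided error bound should not exceed 0.05.
bound = 2 - 2 * phi(2 * eps * sqrt(n))
```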