$\newcommand{\br}{\\}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\Q}{\mathbb{Q}}$ $\newcommand{\Z}{\mathbb{Z}}$ $\newcommand{\N}{\mathbb{N}}$ $\newcommand{\C}{\mathbb{C}}$ $\newcommand{\P}{\mathbb{P}}$ $\newcommand{\F}{\mathbb{F}}$ $\newcommand{\L}{\mathcal{L}}$ $\newcommand{\spa}[1]{\text{span}(#1)}$ $\newcommand{\dist}[1]{\text{dist}(#1)}$ $\newcommand{\max}[1]{\text{max}(#1)}$ $\newcommand{\min}[1]{\text{min}(#1)}$ $\newcommand{\supr}[0]{\text{sup}}$ $\newcommand{\infi}[0]{\text{inf}}$ $\newcommand{\ite}[1]{\text{int}(#1)}$ $\newcommand{\ext}[1]{\text{ext}(#1)}$ $\newcommand{\bdry}[1]{\partial #1}$ $\newcommand{\argmax}[1]{\underset{#1}{\text{argmax }}}$ $\newcommand{\argmin}[1]{\underset{#1}{\text{argmin }}}$ $\newcommand{\set}[1]{\left\{#1\right\}}$ $\newcommand{\emptyset}{\varnothing}$ $\newcommand{\tilde}{\text{~}}$ $\newcommand{\otherwise}{\text{ otherwise }}$ $\newcommand{\if}{\text{ if }}$ $\newcommand{\proj}{\text{proj}}$ $\newcommand{\union}{\cup}$ $\newcommand{\intercept}{\cap}$ $\newcommand{\abs}[1]{\left| #1 \right|}$ $\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ $\newcommand{\pare}[1]{\left(#1\right)}$ $\newcommand{\brac}[1]{\left[#1\right]}$ $\newcommand{\t}[1]{\text{ #1 }}$ $\newcommand{\head}{\text H}$ $\newcommand{\tail}{\text T}$ $\newcommand{\d}{\text d}$ $\newcommand{\limu}[2]{\underset{#1 \to #2}\lim}$ $\newcommand{\limd}[3]{\underset{#1 \to #2; #3}\lim}$ $\newcommand{\der}[2]{\frac{\d #1}{\d #2}}$ $\newcommand{\derw}[2]{\frac{\d #1^2}{\d^2 #2}}$ $\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}$ $\newcommand{\pderw}[2]{\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\pderws}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}$ $\newcommand{\inv}[1]{{#1}^{-1}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\nullity}[1]{\text{nullity}(#1)}$ $\newcommand{\rank}[1]{\text{rank }#1}$ $\newcommand{\nullspace}[1]{\mathcal{N}\pare{#1}}$ $\newcommand{\range}[1]{\mathcal{R}\pare{#1}}$ $\newcommand{\var}[1]{\text{var}\pare{#1}}$ $\newcommand{\cov}[2]{\text{cov}(#1, #2)}$ $\newcommand{\tr}[1]{\text{tr}(#1)}$ $\newcommand{\oto}{\text{ one-to-one }}$ $\newcommand{\ot}{\text{ onto }}$ $\newcommand{\ceil}[1]{\lceil#1\rceil}$ $\newcommand{\floor}[1]{\lfloor#1\rfloor}$ $\newcommand{\Re}[1]{\text{Re}(#1)}$ $\newcommand{\Im}[1]{\text{Im}(#1)}$ $\newcommand{\dom}[1]{\text{dom}(#1)}$ $\newcommand{\fnext}[1]{\overset{\sim}{#1}}$ $\newcommand{\transpose}[1]{{#1}^{\text{T}}}$ $\newcommand{\b}[1]{\boldsymbol{#1}}$ $\newcommand{\None}[1]{}$ $\newcommand{\Vcw}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Vce}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Vcr}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Vct}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Vcy}[6]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{bmatrix}}$ $\newcommand{\Vcu}[7]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{bmatrix}}$ $\newcommand{\vcw}[2]{\begin{matrix} #1 \br #2 \end{matrix}}$ $\newcommand{\vce}[3]{\begin{matrix} #1 \br #2 \br #3 \end{matrix}}$ $\newcommand{\vcr}[4]{\begin{matrix} #1 \br #2 \br #3 \br #4 \end{matrix}}$ $\newcommand{\vct}[5]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \end{matrix}}$ $\newcommand{\vcy}[6]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{matrix}}$ $\newcommand{\vcu}[7]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{matrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ 
$\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Mqr}[4]{\begin{bmatrix} #1 & #2 & #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqt}[5]{\begin{bmatrix} #1 & #2 & #3 & #4 & #5 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mrq}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Mtq}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Mww}[4]{\begin{bmatrix} #1 & #2 \br #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mwe}[6]{\begin{bmatrix} #1 & #2 & #3\br #4 & #5 & #6 \end{bmatrix}}$ $\newcommand{\Mew}[6]{\begin{bmatrix} #1 & #2 \br #3 & #4 \br #5 & #6 \end{bmatrix}}$ $\newcommand{\Mee}[9]{\begin{bmatrix} #1 & #2 & #3 \br #4 & #5 & #6 \br #7 & #8 & #9 \end{bmatrix}}$

In this section we derive some important inequalities that use the mean, and sometimes the variance, of a random variable to draw conclusions about the probabilities of certain events.

They are primarily useful in situations where exact values or bounds for the mean and variance of a random variable $X $ are easily computable, but the distribution of $X $ is either unavailable or hard to calculate. Essentially, they give us a guaranteed bound on a probability rather than its exact value.

Definition: Markov's Inequality

The Markov inequality asserts that if a nonnegative random variable has a small mean, then the probability that it takes a large value must also be small.

If a random variable $X $ can only take nonnegative values, then

$$P(X \geq a) \leq \frac{ E [X] }{ a }, \forall a > 0$$

Proof

Fix a positive number $a$ and consider the random variable $Y_a$ defined by

$$Y_a = \begin{cases} 0, &\text{ if } X < a \br a, &\text{ if } X \geq a \end{cases}$$

It is seen that the relation

$$Y_a \leq X $$

always holds and therefore,

$$E[Y_a] \leq E[X]$$

Note that, by definition,

$P(Y_a = 0) = P(X < a)$

$P(Y_a = a) = P(X \geq a)$

Therefore,

$$E [ Y_a ] = 0P(Y_a = 0) + aP(Y_a = a) = aP(Y_a = a) = a P(X \geq a)$$

from which we obtain

$$aP(X \geq a) \leq E[X] $$
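
As a quick numerical sanity check, here is a minimal Monte Carlo sketch (assuming NumPy; the choice of an exponentially distributed $X $ with $E[X] = 1 $ and the sample size are arbitrary) comparing the empirical tail probability with the Markov bound $E[X]/a $:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonnegative random variable: Exp(1), so E[X] = 1.
samples = rng.exponential(scale=1.0, size=1_000_000)
mean = samples.mean()

for a in [0.5, 1.0, 2.0, 4.0]:
    empirical = (samples >= a).mean()  # Monte Carlo estimate of P(X >= a)
    bound = mean / a                   # Markov bound: E[X] / a
    print(f"a={a}: P(X >= a) ~= {empirical:.4f} <= {bound:.4f}")
```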

Corollary

Let $X$ be a random variable. Then for all $a > 0$ and every positive integer $n$,

$$\begin{align*} P(\abs{ X } \geq a) &\leq \frac{ E [ \abs{ X } ] }{ a } \br P(\abs{ X } \geq a) &\leq \frac{ E [ \abs{ X }^n ] }{ a^n } \end{align*}$$

Proof

  1. The first inequality follows directly from applying Markov's inequality to the nonnegative random variable $\abs{X}$.

  2. For the second, apply Markov's inequality to the nonnegative random variable $\abs{ X }^n $ with threshold $a^n $:

$$P(\abs{ X } \geq a) = P(\abs{ X }^n \geq a^n) \leq \frac{ E [ \abs{ X }^n ] }{ a^n }$$
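
The freedom to choose $n $ matters in practice. A short sketch (assuming NumPy; the standard normal $X $ and threshold $a = 3 $ are arbitrary illustrative choices) estimates the moments by simulation and shows larger $n $ tightening the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(1_000_000))
a = 3.0

tail = (x >= a).mean()                    # empirical P(|X| >= a)
for n in [1, 2, 4, 6]:
    bound = (x ** n).mean() / a ** n      # moment bound: E[|X|^n] / a^n
    print(f"n={n}: bound {bound:.5f}   (empirical tail {tail:.5f})")
```

For a fixed threshold, higher moments can give much tighter bounds, at the price of requiring more information about the distribution of $X $.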

Definition: Chebyshev Inequality

If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then

$$\forall c > 0, P(\abs{ X - \mu } \geq c) \leq \frac{ \sigma^2 }{ c^2 }$$

Proof

Apply part 2 of the corollary to Markov's inequality with $Y = X - \mu $ and $n = 2 $:

$$P(\abs{ X - E [ X ] } \geq c) \leq \frac{E[(X-E[X])^2]}{c^2} = \frac{ \sigma^2 }{ c^2 }$$

Corollary

If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then

$$\forall s>0, P(\abs{X- \mu} \geq s \sigma) \leq \frac{1}{s^2}$$

Proof

Set $c = s \sigma $ in the Chebyshev inequality.

Note

The Chebyshev inequality tends to be more powerful than the Markov inequality, since it also uses information about the variance of $X$ to provide more accurate bounds.

Still, the mean and the variance of a random variable are only a rough summary of its properties, so we cannot expect the bounds to be close approximations of the exact probabilities.

Example: Uninformative Chebyshev Bounds

Let $X$ be uniformly distributed on $[0, 4] $, so that $\mu = 2 $ and $\sigma^2 = \frac{(4 - 0)^2 }{ 12 } = \frac{ 4 }{ 3 } $. Using the Chebyshev inequality with $c = 1 $ to bound the probability that $\abs{ X - 2 } \geq 1 $, we have

$$P(\abs{ X - 2 } \geq 1) \leq \frac{ 4 }{ 3 } $$

This example shows that the Chebyshev inequality sometimes provides bounds so loose that they carry no information at all: here the bound exceeds $1 $, while the exact probability is $\frac{ 1 }{ 4 } + \frac{ 1 }{ 4 } = \frac{ 1 }{ 2 } $.
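
A tiny Monte Carlo sketch (assuming NumPy; the sample size is arbitrary) confirms both numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 4.0, size=1_000_000)

exact = (np.abs(x - 2.0) >= 1.0).mean()  # true value is 1/2
bound = x.var() / 1.0 ** 2               # sigma^2 / c^2 = (4/3) / 1
print(f"P(|X - 2| >= 1) ~= {exact:.3f}, Chebyshev bound {bound:.3f}")
```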

Example

Let $X $ be exponentially distributed with parameter $\lambda = 1 $, so that $E[X] = \var{ X } = 1 $. For $c > 1 $, using the Chebyshev inequality, we obtain

$$P(X \geq c) = P(X - 1 \geq c - 1) \leq P(\abs{ X - 1 } \geq c - 1) \leq \frac{ 1 }{(c - 1)^2 }$$

Comparing with the exact answer $P(X \geq c) = e^{-c} $, we see that this is a very conservative estimate.
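
Since all three quantities have closed forms here, a small sketch (assuming NumPy) can tabulate the exact tail next to the Markov bound $\frac{ 1 }{ c } $ and the Chebyshev bound $\frac{ 1 }{(c - 1)^2 } $:

```python
import numpy as np

# X ~ Exp(1): E[X] = var(X) = 1, and P(X >= c) = e^{-c} exactly.
for c in [2.0, 3.0, 5.0]:
    exact = np.exp(-c)
    markov = 1.0 / c                  # E[X] / c
    chebyshev = 1.0 / (c - 1.0) ** 2  # var(X) / (c - 1)^2
    print(f"c={c}: exact {exact:.4f}, Markov {markov:.4f}, Chebyshev {chebyshev:.4f}")
```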

Corollary: Upper Bounds in the Chebyshev Inequality

When $X $ is known to take values in a range $[a, b] $, we claim that

$$\sigma^2 \leq (b-a)^2 / 4$$

Thus if $\sigma^2$ is unknown, we may use the bound $(b -a)^2 /4 $ in place of $\sigma^2 $ in the Chebyshev inequality, and obtain

$$P(\abs{ X - \mu } \geq c) \leq \frac{(b - a)^2 }{ 4 c^2 }, \forall c > 0 $$

Proof

Note that for any constant $\gamma $ we have

$$E[(X - \gamma)^2] = E[X^2] -2 E[X] \gamma + \gamma^2 $$

The quadratic is minimized when $\gamma = E[X]$. It follows that

$$\sigma^2 = E[(X - E[X])^2] \leq E[(X - \gamma)^2], \forall \gamma$$

By letting $\gamma = (a + b) / 2 $, we obtain

$$\begin{align*} \sigma^2 &\leq E[(X - \frac{ a+ b }{ 2 })^2] \br &= E[X^2 - (a+b)X + \frac{(a+b)^2 }{ 4 }] \br &= E[X^2 - (a+b)X + ab] + \frac{(b-a)^2 }{ 4 } \br &= E[(X -a)(X -b)] + \frac{(b-a)^2 }{ 4 } \br &\leq \frac{(b-a)^2 }{ 4 } \end{align*}$$

The last inequality holds because $X \in [a, b] $ implies $(X - a)(X - b) \leq 0 $, and hence $E[(X - a)(X - b)] \leq 0 $.

The bound $\sigma^2 \leq (b -a)^2 /4 $ may be quite conservative, but in the absence of further information about $X$, it cannot be improved.

When $X $ is the random variable that takes the values $a $ and $b $ each with probability $\frac{ 1 }{ 2 } $, we have $\sigma^2 = \frac{(b - a)^2 }{ 4 } $, so the bound is attained.
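
A short simulation sketch (assuming NumPy; the interval $[0, 4] $ is an arbitrary choice) illustrates both claims: the uniform distribution stays below the cap, while the two-point distribution attains it:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.0, 4.0
cap = (b - a) ** 2 / 4                          # claimed upper bound on sigma^2

uniform = rng.uniform(a, b, size=1_000_000)
two_point = rng.choice([a, b], size=1_000_000)  # mass 1/2 on each endpoint

print(f"cap          : {cap:.3f}")
print(f"uniform var  : {uniform.var():.3f}")    # 16/12 ~= 1.333, below the cap
print(f"two-point var: {two_point.var():.3f}")  # ~= 4.0, attains the cap
```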

Definition: The Chernoff Bound

Let $X $ be a random variable with moment generating function $M_X(s) = E[e^{sX}] $. Then, $\forall s > 0, \forall a \in \R $,

$$P(X \geq a) \leq e^{ - sa } M_X(s)$$

Proof

Apply the Markov inequality to the nonnegative random variable $e^{sX} $ with threshold $e^{sa} $; since $s > 0 $, the map $x \mapsto e^{sx} $ is strictly increasing, so the events $\set{X \geq a} $ and $\set{e^{sX} \geq e^{sa}} $ coincide:

$$P(X \geq a) = P(e^{sX} \geq e^{sa}) \leq \frac{ E [ e^{sX} ] }{ e^{sa}} = \frac{ M_X(s)}{ e^{sa}} $$
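
As an illustration, for $X $ exponentially distributed with $\lambda = 1 $ the moment generating function is $M_X(s) = \frac{ 1 }{ 1 - s } $ for $s < 1 $, so the bound can be minimized over $s $. A small sketch (assuming NumPy; the grid search is a simple stand-in for the closed-form minimizer) compares it with the exact tail and the Chebyshev bound:

```python
import numpy as np

def chernoff_exp1(a, s_grid=np.linspace(0.01, 0.99, 99)):
    # For X ~ Exp(1), M_X(s) = 1 / (1 - s) for s < 1; minimize the
    # Chernoff bound e^{-sa} M_X(s) over a grid of s values in (0, 1).
    return (np.exp(-s_grid * a) / (1.0 - s_grid)).min()

for a in [2.0, 5.0, 10.0]:
    exact = np.exp(-a)
    chernoff = chernoff_exp1(a)
    chebyshev = 1.0 / (a - 1.0) ** 2
    print(f"a={a}: exact {exact:.5f}, Chernoff {chernoff:.5f}, Chebyshev {chebyshev:.5f}")
```

Minimizing analytically gives $s = 1 - \frac{ 1 }{ a } $ and the bound $a e^{1 - a} $, which decays exponentially in $a $, whereas the Chebyshev bound decays only polynomially.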