$\newcommand{\br}{\\}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\Q}{\mathbb{Q}}$ $\newcommand{\Z}{\mathbb{Z}}$ $\newcommand{\N}{\mathbb{N}}$ $\newcommand{\C}{\mathbb{C}}$ $\newcommand{\P}{\mathbb{P}}$ $\newcommand{\F}{\mathbb{F}}$ $\newcommand{\L}{\mathcal{L}}$ $\newcommand{\spa}[1]{\text{span}(#1)}$ $\newcommand{\dist}[1]{\text{dist}(#1)}$ $\newcommand{\max}[1]{\text{max}(#1)}$ $\newcommand{\min}[1]{\text{min}(#1)}$ $\newcommand{\supr}[1]{\text{sup}(#1)}$ $\newcommand{\infi}[1]{\text{inf}(#1)}$ $\newcommand{\ite}[1]{\text{int}(#1)}$ $\newcommand{\ext}[1]{\text{ext}(#1)}$ $\newcommand{\bdry}[1]{\partial #1}$ $\newcommand{\argmax}[1]{\underset{#1}{\text{argmax }}}$ $\newcommand{\argmin}[1]{\underset{#1}{\text{argmin }}}$ $\newcommand{\set}[1]{\left\{#1\right\}}$ $\newcommand{\emptyset}{\varnothing}$ $\newcommand{\otherwise}{\text{ otherwise }}$ $\newcommand{\if}{\text{ if }}$ $\newcommand{\proj}{\text{proj}}$ $\newcommand{\union}{\cup}$ $\newcommand{\intercept}{\cap}$ $\newcommand{\abs}[1]{\left| #1 \right|}$ $\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ $\newcommand{\pare}[1]{\left(#1\right)}$ $\newcommand{\brac}[1]{\left[#1\right]}$ $\newcommand{\t}[1]{\text{ #1 }}$ $\newcommand{\head}{\text H}$ $\newcommand{\tail}{\text T}$ $\newcommand{\d}{\text d}$ $\newcommand{\limu}[2]{\underset{#1 \to #2}\lim}$ $\newcommand{\der}[2]{\frac{\d #1}{\d #2}}$ $\newcommand{\derw}[2]{\frac{\d #1^2}{\d^2 #2}}$ $\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}$ $\newcommand{\pderw}[2]{\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\pderws}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}$ $\newcommand{\inv}[1]{{#1}^{-1}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\nullity}[1]{\text{nullity}(#1)}$ $\newcommand{\rank}[1]{\text{rank }#1}$ $\newcommand{\nullspace}[1]{\mathcal{N}\pare{#1}}$ $\newcommand{\range}[1]{\mathcal{R}\pare{#1}}$ $\newcommand{\var}[1]{\text{var}(#1)}$ $\newcommand{\cov}[2]{\text{cov}(#1, #2)}$ 
$\newcommand{\tr}[1]{\text{tr}(#1)}$ $\newcommand{\oto}{\text{ one-to-one }}$ $\newcommand{\ot}{\text{ onto }}$ $\newcommand{\ceil}[1]{\lceil#1\rceil}$ $\newcommand{\floor}[1]{\lfloor#1\rfloor}$ $\newcommand{\Re}[1]{\text{Re}(#1)}$ $\newcommand{\Im}[1]{\text{Im}(#1)}$ $\newcommand{\dom}[1]{\text{dom}(#1)}$ $\newcommand{\fnext}[1]{\overset{\sim}{#1}}$ $\newcommand{\transpose}[1]{{#1}^{\text{T}}}$ $\newcommand{\b}[1]{\boldsymbol{#1}}$ $\newcommand{\None}[1]{}$ $\newcommand{\Vcw}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Vce}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Vcr}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Vct}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Vcy}[6]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{bmatrix}}$ $\newcommand{\Vcu}[7]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{bmatrix}}$ $\newcommand{\vcw}[2]{\begin{matrix} #1 \br #2 \end{matrix}}$ $\newcommand{\vce}[3]{\begin{matrix} #1 \br #2 \br #3 \end{matrix}}$ $\newcommand{\vcr}[4]{\begin{matrix} #1 \br #2 \br #3 \br #4 \end{matrix}}$ $\newcommand{\vct}[5]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \end{matrix}}$ $\newcommand{\vcy}[6]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{matrix}}$ $\newcommand{\vcu}[7]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{matrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Mqr}[4]{\begin{bmatrix} #1 & #2 & #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqt}[5]{\begin{bmatrix} #1 & #2 & #3 & #4 & #5 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mrq}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Mtq}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 
\end{bmatrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Mww}[4]{\begin{bmatrix} #1 & #2 \br #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mwe}[6]{\begin{bmatrix} #1 & #2 & #3\br #4 & #5 & #6 \end{bmatrix}}$ $\newcommand{\Mew}[6]{\begin{bmatrix} #1 & #2 \br #3 & #4 \br #5 & #6 \end{bmatrix}}$ $\newcommand{\Mee}[9]{\begin{bmatrix} #1 & #2 & #3 \br #4 & #5 & #6 \br #7 & #8 & #9 \end{bmatrix}}$
Definition: The Expected Value of $X$ Given $Y=y$

Let $X$ and $Y$ be two random variables. $E[ X | Y = y]$ is an ordinary function of the number $y$, while $E[ X|Y ]$ is a random variable: a function of $Y$, whose distribution is determined by the distribution of $Y$.

Theorem : Law of Iterated Expectation

This is essentially a reformulation of the total expectation theorem.

Since $E[ X | Y ]$ is a random variable, it has an expectation $E[ E[ X|Y ]] $ of its own, which can be calculated using the expected value rule:

$$E[ E[ X | Y ]] = \begin{cases} \sum_{ y } E[ X | Y = y ] p_Y(y), & Y\text{ discrete} \br \int_{ -\infty }^{ +\infty } E[ X | Y = y ] f_Y(y) \d y, &Y\text{ continuous} \end{cases}$$

which leads us to the law of iterated expectations:

$$E[ E[ X | Y ]] = E[ X ] $$
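The discrete case of this identity can be checked by direct enumeration. The joint PMF below is an illustrative example chosen for this sketch, not taken from the notes:

```python
# Numerical check of the law of iterated expectations, E[E[X|Y]] = E[X],
# for a small hand-picked discrete joint PMF (illustrative example).
from collections import defaultdict

# joint PMF p(x, y) over a few support points
p = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (2, 1): 0.4}

# marginal p_Y(y) and the numerator sum_x x * p(x, y)
p_Y = defaultdict(float)
num = defaultdict(float)
for (x, y), prob in p.items():
    p_Y[y] += prob
    num[y] += x * prob

# conditional expectation E[X | Y = y] for each y in the support of Y
E_X_given_Y = {y: num[y] / p_Y[y] for y in p_Y}

# E[E[X|Y]] = sum_y E[X | Y = y] * p_Y(y), per the case formula above
lhs = sum(E_X_given_Y[y] * p_Y[y] for y in p_Y)
# E[X] computed directly from the joint PMF
rhs = sum(x * prob for (x, y), prob in p.items())

assert abs(lhs - rhs) < 1e-12  # both equal 1.0 for this PMF
```

The outer sum weights each conditional expectation by how likely that value of $Y$ is, which is exactly the total expectation theorem.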

Lemma

Let $X$ and $Y$ be two random variables. For any function $g$, we have

$$E[ X g(Y) | Y ] = g(Y) E[ X | Y ]$$
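The intuition is that, conditioned on $Y = y$, the factor $g(Y)$ is the constant $g(y)$ and can be pulled out of the expectation. A pointwise check on a small discrete example (both the PMF and the choice $g(y) = y^2 + 1$ are illustrative assumptions):

```python
# Checking the lemma E[X g(Y) | Y = y] = g(y) E[X | Y = y] for every y
# in the support, on an illustrative discrete joint PMF.
from collections import defaultdict

p = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (2, 1): 0.4}

def g(y):
    return y ** 2 + 1  # arbitrary function of Y for the check

p_Y = defaultdict(float)
sum_x = defaultdict(float)    # sum_x x * p(x, y)
sum_xg = defaultdict(float)   # sum_x x * g(y) * p(x, y)
for (x, y), prob in p.items():
    p_Y[y] += prob
    sum_x[y] += x * prob
    sum_xg[y] += x * g(y) * prob

for y in p_Y:
    lhs = sum_xg[y] / p_Y[y]        # E[X g(Y) | Y = y]
    rhs = g(y) * sum_x[y] / p_Y[y]  # g(y) E[X | Y = y]
    assert abs(lhs - rhs) < 1e-12
```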

Definition: Estimator and Estimation Error

Let $X$, $Y$ be two random variables.

$$\hat X = E[X|Y] $$

is called the estimator of $X$ given $Y$. As noted before, the estimator is a random variable: a function of $Y$.

The estimation error is a random variable defined as

$$\tilde X = \hat X - X $$

Properties: Properties of the Estimation Error

$\tilde X$ is a random variable satisfying

$$E[ \tilde X | Y ] = 0$$

That is,

$$\forall y, E[ \tilde X | Y = y ] = 0 $$

Proof

$$E[ \tilde X | Y ] = E[ \hat X - X | Y ] = E[ \hat X | Y ] - E[ X | Y ] = \hat X - \hat X = 0$$

Using the law of iterated expectations, we also have

$$E[ \tilde X ] = E[ E[ \tilde X | Y ]] = 0 $$
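Both facts can be verified by enumeration on a small example: conditioned on each $y$, the error averages to zero by construction, and the overall mean then vanishes by iterated expectations. The joint PMF is again an illustrative assumption:

```python
# Checking E[Xtilde | Y = y] = 0 for every y, and E[Xtilde] = 0,
# where Xhat = E[X|Y] and Xtilde = Xhat - X (illustrative PMF).
from collections import defaultdict

p = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (2, 1): 0.4}

p_Y = defaultdict(float)
num = defaultdict(float)
for (x, y), prob in p.items():
    p_Y[y] += prob
    num[y] += x * prob
xhat = {y: num[y] / p_Y[y] for y in p_Y}  # Xhat as a function of y

E_err = 0.0
for y in p_Y:
    # E[Xtilde | Y = y] = xhat[y] - E[X | Y = y], zero by construction
    cond_err = sum((xhat[y] - x) * prob / p_Y[y]
                   for (x, yy), prob in p.items() if yy == y)
    assert abs(cond_err) < 1e-12
    E_err += cond_err * p_Y[y]  # iterated expectations

assert abs(E_err) < 1e-12       # E[Xtilde] = 0
```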

The estimation error is uncorrelated with the estimator $\hat X$:

$$E[ \hat X \tilde X ] = 0 $$

$$\cov{ \hat X} { \tilde X } = 0$$

It follows that

$$\var{ X } = \var{ \tilde X } + \var{ \hat X } $$

Proof

Using the law of iterated expectations, we have

$$E[ \hat X \tilde X ] = E[ E[ \hat X \tilde X | Y ]] = E[ \hat X E[ \tilde X | Y ]] = 0$$

It follows that

$$\cov{ \hat X} { \tilde X } = E [ \hat X \tilde X ] - E [ \hat X ] E [ \tilde X ] = 0 - E [ X ] \cdot 0 = 0$$

and $\hat X $ and $\tilde X $ are uncorrelated.
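The key step $E[ \hat X \tilde X | Y ] = \hat X E[ \tilde X | Y ]$ uses the lemma, since $\hat X$ is a function of $Y$. A direct numerical check of $E[ \hat X \tilde X ] = 0$ on an illustrative discrete PMF:

```python
# Checking E[Xhat * Xtilde] = 0, where Xhat = E[X|Y] and
# Xtilde = Xhat - X, on an illustrative discrete joint PMF.
from collections import defaultdict

p = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (2, 1): 0.4}

p_Y = defaultdict(float)
num = defaultdict(float)
for (x, y), prob in p.items():
    p_Y[y] += prob
    num[y] += x * prob
xhat = {y: num[y] / p_Y[y] for y in p_Y}

# E[Xhat * Xtilde] as a sum over the joint PMF
E_cross = sum(xhat[y] * (xhat[y] - x) * prob for (x, y), prob in p.items())
assert abs(E_cross) < 1e-12
```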


Definition: The Conditional Variance

We introduce the random variable

$$\var{X|Y} = E [ (X - E [ X|Y ])^2 | Y ] = E [ \tilde X^2 | Y ] $$

This is a function of $Y$; when $Y$ takes the value $y$, its value is the conditional variance of $X$ given $Y = y$:

$$\var{ X|Y = y } = E [ \tilde X^2 | Y = y ] $$

Using the fact $E [ \tilde X ] = 0 $ and the law of iterated expectations, we can write the variance of the estimation error as

$$\var{ \tilde X } = E [ \tilde X ^2 ] = E [ E [ \tilde X^2 | Y ] ] = E [ \var{ X|Y } ] $$

and rewrite the equation $\var{ X } = \var{ \tilde X } + \var{ \hat X } $ as follows.

Theorem : Law of Total Variance

Let $X$ and $Y$ be two random variables, we have

$$\var{X} = E [ \var{X|Y} ] + \var{ E [X|Y] } $$
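The decomposition can be verified end to end on a small discrete example: compute $\var{X}$ directly, then compare it against $E[\var{X|Y}] + \var{E[X|Y]}$. The joint PMF is an illustrative assumption:

```python
# Numerical check of the law of total variance,
# var(X) = E[var(X|Y)] + var(E[X|Y]), on an illustrative discrete PMF.
from collections import defaultdict

p = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (2, 1): 0.4}

p_Y = defaultdict(float)
m1 = defaultdict(float)  # sum_x x   * p(x, y)
m2 = defaultdict(float)  # sum_x x^2 * p(x, y)
for (x, y), prob in p.items():
    p_Y[y] += prob
    m1[y] += x * prob
    m2[y] += x * x * prob

# unconditional moments and variance of X
E_X = sum(m1.values())
E_X2 = sum(m2.values())
var_X = E_X2 - E_X ** 2

# conditional mean E[X|Y=y] and conditional variance var(X|Y=y)
cond_mean = {y: m1[y] / p_Y[y] for y in p_Y}
cond_var = {y: m2[y] / p_Y[y] - cond_mean[y] ** 2 for y in p_Y}

E_cond_var = sum(cond_var[y] * p_Y[y] for y in p_Y)              # E[var(X|Y)]
var_cond_mean = sum((cond_mean[y] - E_X) ** 2 * p_Y[y]
                    for y in p_Y)                                # var(E[X|Y])

assert abs(var_X - (E_cond_var + var_cond_mean)) < 1e-12
```

The first term is the variance left after observing $Y$ (the variance of $\tilde X$), and the second is the variance explained by $Y$ (the variance of $\hat X$), matching the decomposition $\var{X} = \var{\tilde X} + \var{\hat X}$ derived above.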