Probabilistic Models - Jim Zenn

$\newcommand{\br}{\\}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\Q}{\mathbb{Q}}$ $\newcommand{\Z}{\mathbb{Z}}$ $\newcommand{\N}{\mathbb{N}}$ $\newcommand{\C}{\mathbb{C}}$ $\newcommand{\P}{\mathbb{P}}$ $\newcommand{\F}{\mathbb{F}}$ $\newcommand{\L}{\mathcal{L}}$ $\newcommand{\spa}[1]{\text{span}(#1)}$ $\newcommand{\dist}[1]{\text{dist}(#1)}$ $\newcommand{\max}[1]{\text{max}(#1)}$ $\newcommand{\min}[1]{\text{min}(#1)}$ $\newcommand{\supr}[1]{\text{sup}(#1)}$ $\newcommand{\infi}[1]{\text{inf}(#1)}$ $\newcommand{\argmax}[1]{\underset{#1}{\text{argmax }}}$ $\newcommand{\argmin}[1]{\underset{#1}{\text{argmin }}}$ $\newcommand{\set}[1]{\left\{#1\right\}}$ $\newcommand{\emptyset}{\varnothing}$ $\newcommand{\otherwise}{\text{ otherwise }}$ $\newcommand{\if}{\text{ if }}$ $\newcommand{\proj}{\text{proj}}$ $\newcommand{\union}{\cup}$ $\newcommand{\intercept}{\cap}$ $\newcommand{\abs}[1]{\left| #1 \right|}$ $\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ $\newcommand{\pare}[1]{\left(#1\right)}$ $\newcommand{\t}[1]{\text{ #1 }}$ $\newcommand{\head}{\text H}$ $\newcommand{\tail}{\text T}$ $\newcommand{\d}{\text d}$ $\newcommand{\limu}[2]{\underset{#1 \to #2}\lim}$ $\newcommand{\der}[2]{\frac{\d #1}{\d #2}}$ $\newcommand{\derw}[2]{\frac{\d #1^2}{\d^2 #2}}$ $\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}$ $\newcommand{\pderw}[2]{\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\pderws}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}$ $\newcommand{\inv}[1]{{#1}^{-1}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\nullity}[1]{\text{nullity}(#1)}$ $\newcommand{\rank}[1]{\text{rank }#1}$ $\newcommand{\nullspace}[1]{\mathcal{N}\pare{#1}}$ $\newcommand{\range}[1]{\mathcal{R}\pare{#1}}$ $\newcommand{\var}[1]{\text{var}(#1)}$ $\newcommand{\tr}[1]{\text{tr}(#1)}$ $\newcommand{\oto}{\text{ one-to-one }}$ $\newcommand{\ot}{\text{ onto }}$ $\newcommand{\ceil}[1]{\lceil#1\rceil}$ $\newcommand{\floor}[1]{\lfloor#1\rfloor}$ $\newcommand{\Re}[1]{\text{Re}(#1)}$ 
$\newcommand{\Im}[1]{\text{Im}(#1)}$ $\newcommand{\dom}[1]{\text{dom}(#1)}$ $\newcommand{\fnext}[1]{\overset{\sim}{#1}}$ $\newcommand{\transpose}[1]{{#1}^{\text{T}}}$ $\newcommand{\b}[1]{\boldsymbol{#1}}$ $\newcommand{\None}[1]{}$ $\newcommand{\Vcw}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Vce}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Vcr}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Vct}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Vcy}[6]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{bmatrix}}$ $\newcommand{\Vcu}[7]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{bmatrix}}$ $\newcommand{\vcw}[2]{\begin{matrix} #1 \br #2 \end{matrix}}$ $\newcommand{\vce}[3]{\begin{matrix} #1 \br #2 \br #3 \end{matrix}}$ $\newcommand{\vcr}[4]{\begin{matrix} #1 \br #2 \br #3 \br #4 \end{matrix}}$ $\newcommand{\vct}[5]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \end{matrix}}$ $\newcommand{\vcy}[6]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{matrix}}$ $\newcommand{\vcu}[7]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{matrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Mqr}[4]{\begin{bmatrix} #1 & #2 & #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqt}[5]{\begin{bmatrix} #1 & #2 & #3 & #4 & #5 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mrq}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Mtq}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Mww}[4]{\begin{bmatrix} #1 & #2 \br #3 & #4 \end{bmatrix}}$ 
$\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mwe}[6]{\begin{bmatrix} #1 & #2 & #3\br #4 & #5 & #6 \end{bmatrix}}$ $\newcommand{\Mew}[6]{\begin{bmatrix} #1 & #2 \br #3 & #4 \br #5 & #6 \end{bmatrix}}$ $\newcommand{\Mee}[9]{\begin{bmatrix} #1 & #2 & #3 \br #4 & #5 & #6 \br #7 & #8 & #9 \end{bmatrix}}$

Definition: Probabilistic Model

A probabilistic model is a mathematical description of an uncertain situation.

A probabilistic model has two main ingredients: a sample space and a probability law.

Definition: Experiment and Outcomes

Every probabilistic model involves an underlying process, called the experiment, that produces exactly one of several possible outcomes.

Definition: Sample Space

A sample space $\Omega$ is the set of all possible outcomes of an experiment.

A collection of possible outcomes is called an event, which is a subset of the sample space.
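As a minimal sketch of these two definitions, a sample space and its events can be modeled as Python sets. The die experiment and the names `omega`, `even`, `at_least_five`, and `is_event` are illustrative assumptions, not part of the definitions above.

```python
# Assumed experiment: one roll of a six-sided die.
omega = frozenset(range(1, 7))

# An event is any subset of the sample space.
even = frozenset(s for s in omega if s % 2 == 0)
at_least_five = frozenset(s for s in omega if s >= 5)


def is_event(candidate, sample_space):
    """Return True if candidate is a subset of sample_space."""
    return set(candidate) <= set(sample_space)


print(is_event(even, omega))   # True: {2, 4, 6} is a subset of omega
print(is_event({7}, omega))    # False: 7 is not a possible outcome
```

Representing events as sets means unions, intersections, and complements of events map directly onto the built-in set operators.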

Definition: Probability Law

The probability law assigns to a set $A$ of possible outcomes (also called an event) a nonnegative number $\b{P}(A)$ (called the probability of $A$).

Axioms: Probability Axioms

Any probability law must satisfy the following axioms:

(Nonnegativity) $\b{P}(A)\geq 0$, for every event $A$.
(Additivity) If $A_1,A_2,…$ is a sequence of disjoint events, then the probability of their union satisfies $\b{P}(A_1\cup A_2 \cup …)=\b{P}(A_1) + \b{P}(A_2)+…$.
(Normalization) $\b{P}(\Omega)=1$.
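The axioms can be checked by brute force on a small finite sample space. The sketch below assumes a hypothetical three-outcome experiment with made-up probabilities (`point_mass`), and it only tests additivity for pairs of disjoint events, which is enough on a finite space, since additivity over any finite sequence follows by induction.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probabilities of the single-outcome events (an assumption).
point_mass = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(point_mass)


def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum((point_mass[s] for s in event), Fraction(0))


def all_events(sample_space):
    """Yield every subset of a finite sample space."""
    items = sorted(sample_space)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def satisfies_axioms():
    events = list(all_events(omega))
    nonnegative = all(prob(a) >= 0 for a in events)
    additive = all(
        prob(a | b) == prob(a) + prob(b)
        for a in events
        for b in events
        if not (a & b)  # only disjoint pairs
    )
    normalized = prob(omega) == 1
    return nonnegative and additive and normalized


print(satisfies_axioms())  # True
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding when comparing the two sides of the additivity axiom.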

Definition: Collectively Exhaustive

To represent the experiment well, $\Omega$ must be collectively exhaustive. In other words, no matter what happens in the experiment, the outcome is an element of $\Omega$, and $\Omega$ consists only of possible outcomes.

Definition: Probability Law

A probability law $\b{P}$ assigns to each event $A$ a number $\b{P}(A)$, called the probability of $A$.

Definition: Disjoint

Two sets $A,B$ are disjoint if $A\cap B=\emptyset$.

Note

We denote the number of elements in a set $A$ as $\abs{A}$.

Properties: Discrete Probability Law

If a sample space consists of a finite or countably infinite number of outcomes, then the probability law is specified by the probabilities of the events that consist of a single element.

In particular, the probability of any event $\set{x_1,x_2,…,x_n}$ is the sum of the probabilities of its elements:

$$\b{P}(\set{x_1, x_2, …, x_n})=\b{P}(\set{x_1})+\b{P}(\set{x_2})+…+\b{P}(\set{x_n})$$
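To illustrate the countable case, here is a small sketch with an assumed law on $\set{1, 2, 3, …}$ in which the single-element event $\set{k}$ has probability $2^{-k}$ (for instance, the toss on which a fair coin first lands heads, assuming independent tosses). The function names are hypothetical.

```python
from fractions import Fraction


def point_prob(k):
    """Assumed law: P({k}) = 2**-k for k = 1, 2, 3, ..."""
    return Fraction(1, 2 ** k)


def prob(event):
    """P of a finite event: sum of the probabilities of its elements."""
    return sum((point_prob(k) for k in event), Fraction(0))


first_head_within_three = {1, 2, 3}
print(prob(first_head_within_three))  # 7/8
```

Only the single-element probabilities are specified; the probability of every finite event then follows from additivity.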

Properties: Discrete Uniform Probability Law

If a sample space consists of $n$ possible outcomes which are equally likely, then for every event $A$, $\b{P}(A) = \abs{A} \cdot \frac{1}{n}=\frac{\abs{A}}{\abs{\Omega}}$.
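Under the uniform law, computing a probability reduces to counting. A minimal sketch, assuming an experiment of rolling two fair six-sided dice (the experiment and the names below are illustrative):

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, so 36 equally likely outcomes.
omega = set(product(range(1, 7), repeat=2))


def uniform_prob(event):
    """P(A) = |A| / |Omega| under the discrete uniform law."""
    return Fraction(len(event), len(omega))


sum_is_seven = {(a, b) for (a, b) in omega if a + b == 7}
print(uniform_prob(sum_is_seven))  # 1/6
```

Six of the 36 outcomes sum to seven, which gives $\frac{6}{36} = \frac{1}{6}$.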

Properties: Properties of Probability Laws

Consider a probability law, and let $A, B$, and $C$ be events.

If $A \subset B$, then $\b{P}(A) \leq \b{P}(B)$.
$\b{P}(\emptyset)=0$
$\b{P}(A^c) = 1 - \b{P}(A)$
$\b{P}(A\cup B) = \b{P}(A) + \b{P}(B) - \b{P}(A\cap B)$
$A_1 \subset A_2 \subset … \subset \Omega \implies \b{P}\pare{\bigcup_{i = 1}^{\infty} A_i} = \limu{ i }{ \infty } \b{P}(A_i)$
$\Omega \supset A_1 \supset A_2 \supset … \implies \b{P}\pare{\bigcap_{i = 1}^{\infty} A_i} = \limu{ i }{ \infty } \b{P}(A_i)$
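The first four properties can be spot-checked numerically on a finite sample space. The sketch below assumes the uniform law on two fair dice and two arbitrarily chosen events; it is a sanity check on one example, not a proof, and it does not cover the two limit properties.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice with the uniform law.
omega = frozenset(product(range(1, 7), repeat=2))


def prob(event):
    """Uniform law: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


first_even = frozenset(o for o in omega if o[0] % 2 == 0)   # event A
sum_at_least_ten = frozenset(o for o in omega if sum(o) >= 10)  # event B
both = first_even & sum_at_least_ten                        # A intersect B

# Monotonicity: (A intersect B) is a subset of A.
print(prob(both) <= prob(first_even))                       # True
# Empty set.
print(prob(frozenset()) == 0)                               # True
# Complement rule.
print(prob(omega - first_even) == 1 - prob(first_even))     # True
# Inclusion-exclusion for two events.
lhs = prob(first_even | sum_at_least_ten)
rhs = prob(first_even) + prob(sum_at_least_ten) - prob(both)
print(lhs == rhs, lhs)                                      # True 5/9
```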
