$\newcommand{\br}{\\}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\Q}{\mathbb{Q}}$ $\newcommand{\Z}{\mathbb{Z}}$ $\newcommand{\N}{\mathbb{N}}$ $\newcommand{\C}{\mathbb{C}}$ $\newcommand{\P}{\mathbb{P}}$ $\newcommand{\F}{\mathbb{F}}$ $\newcommand{\L}{\mathcal{L}}$ $\newcommand{\spa}[1]{\text{span}(#1)}$ $\newcommand{\dist}[1]{\text{dist}(#1)}$ $\newcommand{\max}[1]{\text{max}(#1)}$ $\newcommand{\min}[1]{\text{min}(#1)}$ $\newcommand{\supr}{\text{sup}}$ $\newcommand{\infi}{\text{inf}}$ $\newcommand{\ite}[1]{\text{int}(#1)}$ $\newcommand{\ext}[1]{\text{ext}(#1)}$ $\newcommand{\bdry}[1]{\partial #1}$ $\newcommand{\argmax}[1]{\underset{#1}{\text{argmax }}}$ $\newcommand{\argmin}[1]{\underset{#1}{\text{argmin }}}$ $\newcommand{\set}[1]{\left\{#1\right\}}$ $\newcommand{\emptyset}{\varnothing}$ $\newcommand{\tilde}{\text{~}}$ $\newcommand{\otherwise}{\text{ otherwise }}$ $\newcommand{\if}{\text{ if }}$ $\newcommand{\proj}{\text{proj}}$ $\newcommand{\union}{\cup}$ $\newcommand{\intercept}{\cap}$ $\newcommand{\abs}[1]{\left| #1 \right|}$ $\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ $\newcommand{\pare}[1]{\left(#1\right)}$ $\newcommand{\brac}[1]{\left[#1\right]}$ $\newcommand{\t}[1]{\text{ #1 }}$ $\newcommand{\head}{\text H}$ $\newcommand{\tail}{\text T}$ $\newcommand{\d}{\text d}$ $\newcommand{\limu}[2]{\underset{#1 \to #2}\lim}$ $\newcommand{\limsupu}[2]{\underset{#1 \to #2}{\lim\supr}}$ $\newcommand{\liminfu}[2]{\underset{#1 \to #2}{\lim\infi}}$ $\newcommand{\limd}[3]{\underset{#1 \to #2; #3}\lim}$ $\newcommand{\der}[2]{\frac{\d #1}{\d #2}}$ $\newcommand{\derw}[2]{\frac{\d #1^2}{\d^2 #2}}$ $\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}$ $\newcommand{\pderw}[2]{\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\pderws}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}$ $\newcommand{\inv}[1]{{#1}^{-1}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\nullity}[1]{\text{nullity}(#1)}$ $\newcommand{\rank}[1]{\text{rank }#1}$ $\newcommand{\nullspace}[1]{\mathcal{N}\pare{#1}}$ $\newcommand{\range}[1]{\mathcal{R}\pare{#1}}$ $\newcommand{\var}[1]{\text{var}\pare{#1}}$ $\newcommand{\cov}[2]{\text{cov}(#1, #2)}$ $\newcommand{\tr}[1]{\text{tr}(#1)}$ $\newcommand{\oto}{\text{ one-to-one }}$ $\newcommand{\ot}{\text{ onto }}$ $\newcommand{\ipto}{\overset{\text{i.p.}}\longrightarrow}$ $\newcommand{\asto}{\overset{\text{a.s.}}\longrightarrow}$ $\newcommand{\expdist}[1]{\text{ ~ Exp}(#1)}$ $\newcommand{\binomdist}[2]{\text{ ~ B}(#1, #2)}$ $\newcommand{\unifdist}[1]{\text{ ~ Unif}(#1)}$ $\newcommand{\normdist}[2]{\text{ ~ N}(#1,#2)}$ $\newcommand{\berndist}[2]{\text{ ~ Bernoulli}(#1,#2)}$ $\newcommand{\geodist}[1]{\text{ ~ Geometric}(#1)}$ $\newcommand{\poissondist}[1]{\text{ ~ Poisson}(#1)}$ $\newcommand{\ceil}[1]{\lceil#1\rceil}$ $\newcommand{\floor}[1]{\lfloor#1\rfloor}$ $\newcommand{\Re}[1]{\text{Re}(#1)}$ $\newcommand{\Im}[1]{\text{Im}(#1)}$ $\newcommand{\dom}[1]{\text{dom}(#1)}$ $\newcommand{\fnext}[1]{\overset{\sim}{#1}}$ $\newcommand{\transpose}[1]{{#1}^{\text{T}}}$ $\newcommand{\b}[1]{\boldsymbol{#1}}$ $\newcommand{\None}[1]{}$ $\newcommand{\Vcw}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Vce}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Vcr}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Vct}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Vcy}[6]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{bmatrix}}$ $\newcommand{\Vcu}[7]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br 
#5 \br #6 \br #7 \end{bmatrix}}$ $\newcommand{\vcw}[2]{\begin{matrix} #1 \br #2 \end{matrix}}$ $\newcommand{\vce}[3]{\begin{matrix} #1 \br #2 \br #3 \end{matrix}}$ $\newcommand{\vcr}[4]{\begin{matrix} #1 \br #2 \br #3 \br #4 \end{matrix}}$ $\newcommand{\vct}[5]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \end{matrix}}$ $\newcommand{\vcy}[6]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{matrix}}$ $\newcommand{\vcu}[7]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{matrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Mqr}[4]{\begin{bmatrix} #1 & #2 & #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqt}[5]{\begin{bmatrix} #1 & #2 & #3 & #4 & #5 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mrq}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Mtq}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Mww}[4]{\begin{bmatrix} #1 & #2 \br #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mwe}[6]{\begin{bmatrix} #1 & #2 & #3\br #4 & #5 & #6 \end{bmatrix}}$ $\newcommand{\Mew}[6]{\begin{bmatrix} #1 & #2 \br #3 & #4 \br #5 & #6 \end{bmatrix}}$ $\newcommand{\Mee}[9]{\begin{bmatrix} #1 & #2 & #3 \br #4 & #5 & #6 \br #7 & #8 & #9 \end{bmatrix}}$
Definition: Stochastic Process

A stochastic process is a mathematical model of a probabilistic experiment that evolves in time and generates a sequence of numerical values. Each numerical value in the sequence is modeled by a random variable, so a stochastic process is simply a (finite or infinite) sequence of random variables.

However, stochastic processes involve some change in emphasis relative to our earlier models. In particular:

  1. more focus on the dependencies in the sequence of values generated by the process.
    e.g. how do future prices of a stock depend on past values?
  2. more interest in long-term averages involving the entire sequence of generated values.
    e.g. what is the fraction of time that a machine is idle?
  3. interest in characterizing the likelihood or frequency of certain boundary events.
    e.g. what is the probability that within a given hour all circuits of some telephone system become simultaneously busy?
Example: Stochastic processes
  1. the sequence of daily prices of a stock
  2. the sequence of scores in a football game
  3. the sequence of failure times of a machine
  4. the sequence of hourly traffic loads at a node of a communication network
Definition: Arrival-Type Processes

Arrival-Type Processes are processes where occurrences have the character of an “arrival”.

We will focus on models where the interarrival times (the times between successive arrivals) are independent random variables.

The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process;

The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process;

| | Poisson | Bernoulli |
| --- | --- | --- |
| Times of arrival | Continuous | Discrete |
| PMF of # of arrivals | Poisson | Binomial |
| Interarrival time CDF | Exponential | Geometric |
| Arrival rate | $\lambda$ per unit time | $p$ per trial |
Definition: Bernoulli Process

A sequence $X_1, X_2, …$ of independent Bernoulli random variables $X_i $ with

$$P(X_i = 1) = P(\text{success at the }i\text{th trial}) = p$$ $$P(X_i = 0) = P(\text{failure at the }i\text{th trial}) = 1 - p$$
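As a minimal sketch of the model (the helper name `bernoulli_process` is ours, not from the text), the process is straightforward to simulate:

```python
import random

def bernoulli_process(p, n, seed=0):
    """Simulate the first n trials of a Bernoulli process with parameter p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

trials = bernoulli_process(p=0.3, n=20)
print(trials)       # a 0/1 sequence of length 20
print(sum(trials))  # the number of successes among the 20 trials
```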

Recall: Binomial Distribution

The distribution of the number $S $ of successes in $n $ independent trials.

$$p_S(k) = \binom{ n }{ k } p^k (1- p)^{ n - k }, k = 0, 1, …, n $$

$$E[S] = np, \var{ S } = np(1 - p)$$
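A quick numerical check of these formulas, as a sketch using only the standard library (parameter values are ours):

```python
from math import comb

def binomial_pmf(k, n, p):
    """p_S(k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
print(abs(sum(pmf) - 1.0) < 1e-12)  # True: the PMF sums to 1
print(mean, n * p)                   # both ~3.0
print(var, n * p * (1 - p))          # both ~2.1
```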

Recall: Geometric Distribution

The distribution of the number $T $ of trials up to (and including) the first success.

$$p_T(t) = (1 - p)^{t -1} p, t = 1,2, … $$

$$E[T] = \frac{ 1 }{ p }, \var{ T } = \frac{ 1-p }{ p^2 } $$
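Similarly, a truncated-sum check of the geometric mean and variance (our own sketch; the truncation point is arbitrary but ample):

```python
p = 0.3
# Truncate the infinite sums; the geometric tail (1-p)^t is negligible by t = 2000.
pmf = [(1 - p)**(t - 1) * p for t in range(1, 2001)]
mean = sum(t * q for t, q in enumerate(pmf, start=1))
var = sum(t * t * q for t, q in enumerate(pmf, start=1)) - mean**2
print(mean, 1 / p)             # both ~3.3333
print(var, (1 - p) / p**2)     # both ~7.7778
```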

Remarks: Independence and Memorylessness

The independence assumption underlying the Bernoulli process implies the memorylessness property, which, in a nutshell, means:

Whatever has happened in past trials provides no information about the outcomes of future trials.

Recall

For two random variables $U $ and $V $:

$$U, V\text{ independent } \implies g(U), h(V)\text{ independent}$$

Example

Define random variables $Y = (X_1 + X_3) X_6 X_7 $ and $Z = (X_2 + X_4) (X_5 + X_8) $. Since the $X_i $ are independent for all $i $, and the two collections $\set{ X_1, X_3, X_6, X_7 } $ and $\set{ X_2, X_4, X_5, X_8 } $ are disjoint, $Y $ and $Z $ are also independent.

Properties: Fresh-Start Property of the Bernoulli Process

Suppose a Bernoulli process has been running for $n $ time steps, and that we have observed the values of $X_1, X_2, …, X_n $. Notice that:

the sequence of future trials $X_{n+1}, X_{n+2}, … $ consists of independent Bernoulli trials and therefore forms a Bernoulli process. In addition, these future trials are independent of the past ones.

We conclude that starting from any given point in time, the future is also modeled by a Bernoulli process, which is independent of the past. We refer to this loosely as the fresh-start property of the Bernoulli process.

More concisely,
for any given time $n $, the sequence of random variables $X_{n+1}, X_{n+2}, … $ (the future of the process) is also a Bernoulli process, and is independent of $X_1, X_2, …, X_n $ (the past of the process).

Properties: Memorylessness Property of Geometric Distributions

Recall that the time $T $ until the first success is a geometric random variable. Suppose that we have been watching the process for $n $ time steps, and no success has been recorded. What can we say about the number $T-n $ of remaining trials until the first success?

Since the future of the process (after time $n $) is independent of the past and constitutes a fresh-starting Bernoulli process, the number of future trials until the first success is described by the same geometric PMF.

$$P(T-n = t | T> n) = (1-p)^{t-1} p = P(T = t), t = 1, 2, … $$

We refer to this as the memorylessness property.

Note: this can also be derived algebraically, using the definition of conditional probabilities.
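Concretely, since $P(T > n) = (1-p)^n $ and $P(T = n+t) = (1-p)^{n+t-1} p $, the claim follows directly from the definition of conditional probability:

$$P(T-n = t \mid T > n) = \frac{ P(T = n+t) }{ P(T > n) } = \frac{ (1-p)^{n+t-1} p }{ (1-p)^n } = (1-p)^{t-1} p, \quad t = 1, 2, …$$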

More concisely,
Let $n $ be a given time and let $\overline{ T } $ be the time of the first success after time $n $. Then $\overline{ T } - n $ has a geometric distribution with parameter $p $, and is independent of the random variables $X_1, X_2, …, X_n $.

Example

A computer executes two types of tasks, priority and nonpriority, and operates in discrete time units (slots). Each task requires one full slot to complete.
A priority task arrives with probability $p $ at the beginning of each slot, independent of other slots; a nonpriority task is always available and is executed at a given slot if no priority task is available.
We call a slot busy if within the slot the computer executes a priority task, and otherwise we call it idle.
We call a string of idle slots an idle period, and a string of busy slots a busy period.
We want to find the PMF, mean, and variance of the following random variables.

  1. $T=$ the time index of the first idle slot
  2. $B=$ the length (number of slots) of the first busy period
  3. $I=$ the length of the first idle period
  4. $Z=$ the number of slots after the first slot of the first busy period up to and including the first subsequent idle slot

Each slot is idle with probability $1-p $, independent of other slots, so $T \geodist{1-p} $.

Consider the first busy period. It starts with the first busy slot, call it slot $L $. Starting from slot $L+1 $, the process is fresh, so $Z \geodist{1-p} $. Since the busy period consists of slots $L, L+1, …, L+Z-1 $, we have $B = Z $, and hence $B \geodist{1-p} $.

By the same argument with the roles of busy and idle reversed, $I \geodist{p} $.
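These answers are easy to corroborate by simulation; a sketch under our own parameter choices (helper names are ours):

```python
import random

def first_T_B_I(p, rng, horizon=500):
    """One realization: T = index of the first idle slot, B = length of the
    first busy period, I = length of the first idle period.
    A slot is busy with probability p, idle with probability 1 - p."""
    busy = [rng.random() < p for _ in range(horizon)]  # horizon is ample for p = 0.7
    T = 1 + busy.index(False)
    L = busy.index(True)           # first busy slot
    B = busy[L:].index(False)      # consecutive busy slots = busy period length
    M = busy.index(False)          # first idle slot
    I = busy[M:].index(True)       # consecutive idle slots = idle period length
    return T, B, I

p, n = 0.7, 20_000
rng = random.Random(0)
samples = [first_T_B_I(p, rng) for _ in range(n)]
for name, vals, target in zip("TBI", zip(*samples), (1/(1-p), 1/(1-p), 1/p)):
    print(name, sum(vals) / n, target)  # empirical means ~3.33, ~3.33, ~1.43
```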

Remarks

If we start watching a Bernoulli process at a random time $N$, what we see is indistinguishable from a Bernoulli process that has just started, as long as $N$ is determined only by the past history of the process and does not convey any information about the future.

Example: Fresh-start at a Random Time

Let $N $ be the first time that we have a success immediately following a previous success. (That is, $N $ is the first $i $ for which $X_{i - 1} = X_i = 1 $). What is the probability $P(X_{N+1} = X_{N+2} = 0) $ that there are no successes in the two trials that follow?

Intuitively, once the condition $X_{N-1} = X_N = 1 $ is satisfied, the future of the process consists of independent Bernoulli trials. Therefore, the probability of an event that refers to the future of the process is the same as in a fresh-starting Bernoulli process, so that $P(X_{N+1} = X_{N+2} = 0) = (1 -p)^2 $.
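A Monte Carlo check of this answer (our own sketch; the helper name is hypothetical):

```python
import random

def failures_after_first_success_pair(p, rng):
    """Run the process until the first time N with X_{N-1} = X_N = 1,
    then report whether X_{N+1} = X_{N+2} = 0."""
    prev = 0
    while True:
        x = 1 if rng.random() < p else 0
        if prev == 1 and x == 1:
            return (rng.random() >= p) and (rng.random() >= p)
        prev = x

p, n = 0.4, 200_000
rng = random.Random(1)
est = sum(failures_after_first_success_pair(p, rng) for _ in range(n)) / n
print(est, (1 - p)**2)  # both ~0.36
```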

Remarks: Interarrival Times

An important random variable associated with the Bernoulli process is the time of the $k $th success (or arrival), which we denote by $Y_k $. A related random variable is the $k $th interarrival time, denoted by $T_k $. It is defined by

$$T_1 = Y_1, T_k = Y_k - Y_{k-1}, k = 2, 3, …$$

and represents the number of trials following the $(k-1) $st success until the next success. Note that

$$Y_k = T_1 + T_2 + … + T_k $$

Note that for all $i \neq j $, $T_i $ and $T_j $ are independent.

Definition
  1. Start with a sequence of independent geometric random variables $T_1, T_2, …$ with common parameter $p $, and let these stand for the interarrival times.

  2. Record a success (or arrival) at times $T_1 $, $T_1 + T_2 $, $T_1 + T_2 + T_3 $, etc., as sketched below.
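A sketch of this alternative construction (the helper `arrival_times` is ours):

```python
import random

def arrival_times(p, k, seed=0):
    """Construct the first k arrival times Y_1, ..., Y_k from i.i.d.
    Geometric(p) interarrival times, per the alternative description."""
    rng = random.Random(seed)
    times, y = [], 0
    for _ in range(k):
        t = 1
        while rng.random() >= p:   # draw T_i ~ Geometric(p)
            t += 1
        y += t                     # Y_i = T_1 + ... + T_i
        times.append(y)
    return times

print(arrival_times(p=0.25, k=5))  # five strictly increasing arrival times
```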

Example

It is observed that after a rainy day, the number of days until it rains again is geometrically distributed with parameter $p $, independent of the past. Find the probability that it rains on both the $5$th and the $8$th day of the month.

If we view rainy days as “arrivals”, then the description of the weather conforms to the alternative description of the Bernoulli process given above.

Any given day is rainy with probability $p$, independent of other days.

In particular, the probability that day $5 $ and $8 $ are rainy is equal to $p^2 $.

Definition: The Kth Arrival Time

The time $Y_k$ of the $k$th success (or arrival) is equal to the sum $Y_k = T_1 + T_2 + … + T_k$ of $k$ i.i.d. geometric random variables.

With that observation, we can then derive formulas for the mean, variance, and PMF of $Y_k $, which are given below.

Properties: Properties of the Kth Arrival Time

$$E[Y_k] = E[T_1] + … + E[ T_k ] = \frac{ k }{ p }$$

$$\var{ Y_k } = \var{ T_1 } + … + \var{ T_k } = \frac{ k(1-p)}{ p^2 } $$

$$p_{Y_k}(t) = \binom{ t-1 }{ k-1 } p^k(1-p)^{t-k} , t = k, k+1, …$$

This is known as the Pascal PMF of order $k$.
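A truncated-sum check that the Pascal PMF reproduces these moments (our own sketch):

```python
from math import comb

def pascal_pmf(t, k, p):
    """p_{Y_k}(t) = C(t-1, k-1) * p^k * (1-p)^(t-k), for t = k, k+1, ..."""
    return comb(t - 1, k - 1) * p**k * (1 - p)**(t - k)

k, p = 3, 0.3
pmf = {t: pascal_pmf(t, k, p) for t in range(k, 600)}  # truncated support
mean = sum(t * q for t, q in pmf.items())
var = sum(t * t * q for t, q in pmf.items()) - mean**2
print(mean, k / p)               # both ~10.0
print(var, k * (1 - p) / p**2)   # both ~23.33
```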

Definition: Splitting of a Bernoulli Process

Start with a Bernoulli process in which there is a probability $p$ of an arrival at each time, and consider splitting it as follows: whenever there is an arrival, we choose either to keep it (with probability $q$) or to discard it (with probability $1-q$).

Then the probability of a kept arrival at a given slot is $pq$, and the probability of a discarded arrival is $p(1-q)$. The sequence of kept arrivals and the sequence of discarded arrivals are each Bernoulli processes.
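A simulation sketch of splitting, checking the rates $pq$ and $p(1-q)$ (parameter values are ours):

```python
import random

# Split a Bernoulli(p) process: keep each arrival w.p. q, discard w.p. 1-q.
p, q, n = 0.5, 0.6, 1_000_000
rng = random.Random(0)
kept = discarded = 0
for _ in range(n):
    if rng.random() < p:           # an arrival in the original process
        if rng.random() < q:
            kept += 1
        else:
            discarded += 1
print(kept / n, p * q)             # both ~0.30
print(discarded / n, p * (1 - q))  # both ~0.20
```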

Definition: Merging of Two Bernoulli Processes

Start with two Bernoulli processes with parameters $p $ and $q $ respectively, and consider merging them as follows: an arrival is recorded in the merged process if and only if there is an arrival in at least one of the two original processes. This happens with probability $1 - (1-p)(1-q) = p + q - pq$.

Since different time slots in either of the original processes are independent, different slots in the merged process are also independent. Thus the merged process is Bernoulli with success probability $p+q-pq $ at each time step.
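A matching simulation sketch of merging (parameter values are ours):

```python
import random

# Merge two independent Bernoulli processes with parameters p and q.
p, q, n = 0.3, 0.2, 1_000_000
rng = random.Random(0)
# Short-circuit `or` skips the second draw when the first arrives; the
# marginal arrival probability is still p + (1-p)q = p + q - pq.
merged = sum(1 for _ in range(n) if rng.random() < p or rng.random() < q)
print(merged / n, p + q - p * q)   # both ~0.44
```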

Definition: The Poisson Approximation to the Binomial

The number of successes in $n$ independent Bernoulli trials is a binomial random variable with parameters $n$ and $p$, and its mean is $np$.

When $n$ is large but $p$ is small, the mean $np$ has a moderate value. Such situations can be studied by keeping the product $np$ fixed at a constant value $\lambda$: we let $n$ grow while simultaneously letting $p$ decrease.

In the limit, it turns out that the formula for the binomial PMF simplifies to the Poisson PMF, that is,

For any fixed $k \geq 0 $, $$\begin{align*} \limu{ n }{ \infty } p_S(k) &= \limu{ n }{ \infty } \frac{ n! }{(n-k)!k! } \cdot p^k (1-p)^{ n-k } \br &= e^{- \lambda} \frac{ \lambda^k }{ k! } = p_Z(k) \end{align*}$$

where $S \binomdist{ n }{ p }, Z \poissondist{ \lambda }, p = \lambda / n$

In general, the Poisson PMF is a good approximation to the binomial as long as $\lambda = np $, $n $ is very large, and $p $ is very small.

Proof

Let $S \binomdist{ n }{ p } $ with $p = \lambda / n $. Then

$$\begin{align*} p_S(k) &= \frac{ n! }{(n-k)!k! } \cdot p^k(1-p)^{n-k} \br &= \frac{n(n-1) …(n-k+1)}{k!}\cdot\frac{\lambda^k}{n^k}\pare{1-\frac{\lambda}{ n }}^{n-k} \br &= \frac{ n }{ n } \cdot \frac{ n-1 }{ n } … \frac{ n-k+1 }{ n } \cdot \frac{ \lambda^k }{ k! } \cdot \pare{ 1- \frac{ \lambda }{ n }}^{n-k} \end{align*}$$

Note that $\limu{ n }{ \infty } \frac{ n - c }{ n } = 1$ for any constant $c$, so each of the first $k$ factors tends to $1$.

Furthermore,

$$\pare{ 1- \frac{ \lambda }{ n }}^{-k} \to 1 $$

$$\pare{ 1 - \frac{ \lambda }{ n }}^n \to e^{- \lambda} $$

Then, for any fixed $k \geq 0 $,

$$\limu{ n }{ \infty } p_S(k) = e^{- \lambda} \frac{ \lambda^k }{ k! }$$

Note

As a rule of thumb, the Poisson approximation to the binomial is valid to several decimal places if $n \geq 100 $, $p \leq 0.01 $, and $\lambda = np $.
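A quick check at exactly this boundary case (our own sketch):

```python
from math import comb, exp, factorial

n, p = 100, 0.01          # the boundary case of the rule of thumb
lam = n * p               # lambda = 1
for k in range(5):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, f"{binom:.6f}", f"{poisson:.6f}")
# k = 0 prints 0.366032 vs 0.367879: the two PMFs differ by about 2e-3 here,
# and the agreement tightens rapidly as n grows and p shrinks.
```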