# Conditional Probability

Notes on the formal definition of conditional probability.

Valerio Gherardi https://vgherard.github.io
2023-11-03

Let $$(\Omega, \,\mathcal E,\,P)$$ be a probability space, and let $$X\colon \Omega \to \Omega _X$$ be a random variable with target space $$(\Omega _X, \mathcal X)$$. We denote the corresponding push-forward measure on $$\mathcal X$$ by $$X_*P$$, so that:

$(X_*P)(B)= P(X^{-1}(B))$

for all $$B\in \mathcal X$$. A measurable function $$f\colon \Omega _X \to \mathbb R$$ is integrable with respect to $$X_*P$$ if and only if $$f\circ X$$ is integrable with respect to $$P$$, in which case¹:

$\intop _{\Omega _X} f\,\text d(X_*P) = \intop _\Omega (f \circ X)\,\text dP.$

Now, given an arbitrary event $$E\in \mathcal E$$, define $$(X_*P)_E(A)=P(E\cap X^{-1}(A))$$ for $$A\in \mathcal X$$. Then $$(X_*P)_E$$ is a measure on $$\mathcal X$$ which is dominated by $$X_*P$$, since $$(X_*P)_E(A)\leq (X_*P)(A)$$ for all $$A\in \mathcal X$$. Hence, by the Radon-Nikodym theorem, there exists a derivative $$\frac{\text d (X_*P)_E}{\text d (X_*P)} \in L_1(X_*P)$$. We define the conditional probability of the event $$E$$ with respect to the random variable $$X$$ as the random variable:

$P(E\vert X)\equiv \frac{\text d (X_*P)_E}{\text d (X_*P)}.$

The intuition behind this definition comes from the following identity, valid for all $$A\in \mathcal X$$, which is a tautology given the definition in terms of the Radon-Nikodym derivative:

$P(E \cap (X\in A)) = \intop _{A} P(E\vert X)\,\text d(X_*P).$

On one hand, from elementary probability theory, one would expect any sensible definition of conditional probability to satisfy this identity. On the other hand, the identity uniquely determines $$P(E\vert X)$$ as the Radon-Nikodym derivative $$\frac{\text d (X_*P)_E}{\text d (X_*P)}$$, up to a set of $$X_*P$$-measure zero.
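On a finite space the Radon-Nikodym derivative reduces to a ratio of point masses, so the identity above can be checked directly. The following is a minimal sketch under assumptions that are not part of these notes: the space is that of two fair dice, $$X$$ is the sum of the two outcomes, and $$E$$ is the event that the first die shows 6. All names (`omega`, `mass`, `mass_E`, `cond`, and the two helper functions) are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example (an assumption, not from the text): two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

# Event E: the first die shows 6.
E = {w for w in omega if w[0] == 6}

values = sorted({X(w) for w in omega})  # Omega_X = {2, ..., 12}

# Point masses of the push-forward X_*P and of the measure (X_*P)_E.
mass = {x: sum(P[w] for w in omega if X(w) == x) for x in values}
mass_E = {x: sum(P[w] for w in E if X(w) == x) for x in values}

# Radon-Nikodym derivative d(X_*P)_E / d(X_*P): a ratio of point masses,
# well defined here because every value of X has positive mass.
cond = {x: mass_E[x] / mass[x] for x in values}

def prob_E_and_X_in(A):
    """Left-hand side: P(E ∩ (X ∈ A))."""
    return sum(P[w] for w in E if X(w) in A)

def integral_over(A):
    """Right-hand side: integral of P(E|X) over A with respect to X_*P."""
    return sum(cond[x] * mass[x] for x in A)

A = {7, 8, 12}
print(prob_E_and_X_in(A) == integral_over(A))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the two sides can be compared for equality rather than up to rounding.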

It is fairly easy to verify the following properties of conditional probability, where $$P(E\vert X = x)$$ denotes the value of $$P(E\vert X)$$ at $$x\in \Omega_X$$:

• Countable additivity. For any finite or countable family $$(E_i)_{i\in I}$$ of pairwise disjoint events, $$E_i \cap E_j = \emptyset$$ for $$i\neq j$$, we have: $P(\cup _{i\in I} E_i \vert X = x) = \sum _{i \in I}P(E_i \vert X = x)$ for almost all $$x\in \Omega_X$$.

• Positivity. For any event $$E$$ we have $$P(E \vert X=x) \geq 0$$ for almost all $$x \in \Omega_X$$.

• Normalization. $$P(\Omega \vert X = x) = 1$$ for almost all $$x \in \Omega_X$$.

These properties, however, do not generally imply that $$P(\cdot \vert X = x)$$ is a probability measure for almost all $$x\in \Omega_X$$². Functions $$\nu \colon \mathcal E \times \Omega _X \to \mathbb R^+$$ such that $$\nu(\cdot, x)$$ is a measure for all $$x\in \Omega _X$$, and $$\nu (E,\cdot)$$ is $$\mathcal X$$-measurable for all $$E\in \mathcal E$$, are called random measures. If $$\nu$$ satisfies

$P(E \cap (X\in A)) = \intop _{A} \nu (E,\cdot)\,\text d(X_*P)$

for all $$E\in \mathcal E$$ and $$A\in \mathcal X$$ (or, equivalently, if $$\nu (E,\cdot)$$ is a version of $$\frac{\text d (X_*P)_E}{\text d (X_*P)}$$ for all $$E\in \mathcal E$$), then $$\nu$$ is called a regular conditional probability for the random variable $$X$$. If the space $$(\Omega,\, \mathcal E)$$ is regular enough (e.g. if it is a Borel space), one can prove that a regular conditional probability exists for any random variable $$X$$; see e.g. Kallenberg (1997).
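In the discrete case a regular conditional probability can be written down explicitly as a ratio of probabilities. The sketch below assumes a hypothetical fair die with $$X$$ the parity of the outcome, an example chosen for illustration and not taken from these notes. It defines a candidate kernel `nu` and checks additivity, normalization and the defining identity for a few events; the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical example (an assumption, not from the text): one fair die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Parity of the outcome: 1 for odd, 0 for even."""
    return w % 2

def pushforward_mass(x):
    """Point mass (X_*P)({x})."""
    return sum(P[w] for w in omega if X(w) == x)

def nu(E, x):
    """Candidate kernel nu(E, x) = P(E ∩ {X = x}) / P(X = x)."""
    return sum(P[w] for w in E if X(w) == x) / pushforward_mass(x)

def defining_identity_holds(E, A):
    """Check P(E ∩ (X ∈ A)) == sum over x in A of nu(E, x) * (X_*P)({x})."""
    lhs = sum(P[w] for w in E if X(w) in A)
    rhs = sum(nu(E, x) * pushforward_mass(x) for x in A)
    return lhs == rhs

# For fixed x, nu(., x) is additive over disjoint events and normalized.
print(nu({1, 2}, 1) + nu({3}, 1) == nu({1, 2, 3}, 1))
print(nu(set(omega), 0) == 1)
print(defining_identity_holds({1, 2, 3}, {0, 1}))
```

Here both values of $$X$$ have positive mass, so `nu(E, x)` is defined for every $$x$$ and no exceptional null set arises; the difficulty described in the text only appears when uncountably many events are involved.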

If $$X = \chi _A \colon \Omega \to \{0,1\}$$, where $$A\in \mathcal E$$ has probability $$0<P(A)<1$$, we can easily compute:

$P(E\vert \chi _A) = \chi _A\cdot \frac{P(E\cap A)}{P(A)} + (1-\chi _A)\cdot \frac{P(E\cap A^c)}{P(A^c)}$

Here the right-hand side is written as a function on $$\Omega$$, that is, as $$P(E\vert \chi _A)$$ composed with $$\chi _A$$. In particular, $$P(E\vert A) \equiv P(E\vert \chi _A = 1)$$ agrees with the usual elementary definition of conditional probability.
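The formula for $$P(E\vert \chi _A)$$ can be evaluated on a small example. This is a minimal sketch assuming a hypothetical fair die with $$A = \{1,2,3\}$$ and $$E = \{2,4,6\}$$; these choices and all names are illustrative.

```python
from fractions import Fraction

# Hypothetical example (an assumption, not from the text): one fair die.
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}

A = {1, 2, 3}      # conditioning event, with 0 < P(A) < 1
A_c = omega - A    # its complement
E = {2, 4, 6}      # event whose conditional probability we compute

def prob(S):
    """P(S) for a subset S of omega."""
    return sum(P[w] for w in S)

def chi_A(w):
    """Indicator function of A."""
    return 1 if w in A else 0

def cond_prob(w):
    """P(E | chi_A) evaluated at the outcome w, following the formula above."""
    return (chi_A(w) * prob(E & A) / prob(A)
            + (1 - chi_A(w)) * prob(E & A_c) / prob(A_c))

# Elementary conditional probability P(E | A) for comparison.
elementary = prob(E & A) / prob(A)
print(cond_prob(1) == elementary)
```

Averaging `cond_prob` over all outcomes with weights `P[w]` recovers `prob(E)`, which is the defining identity with $$A = \Omega_X$$.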

More generally, if $$X = \text {id} _\Omega$$, where the target space is equipped with a sub-$$\sigma$$-algebra $$\mathcal F \subseteq \mathcal E$$, we have:

$P(E\vert \mathcal F)\equiv \frac{\text d (P\vert _\mathcal F)_E}{\text d (P\vert _\mathcal F)},$

which is sometimes taken as the definition of conditional probability with respect to a sub-$$\sigma$$-algebra. When $$\mathcal F$$ is the $$\sigma$$-algebra generated by a finite or countable partition $$\mathcal A = (A_i)_{i\in I}$$ of $$\Omega$$ such that $$P(A_i)>0$$ for all $$i\in I$$, we find:

$P(E\vert \mathcal A)=\sum _{i\in I} \frac{P(E\cap A_i)}{P(A_i)}\chi _{A_i},$

again in agreement with elementary definitions.
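The partition formula is a step function that is constant on each cell $$A_i$$. The sketch below evaluates it for a hypothetical loaded die and a three-cell partition; the weights, the partition and the event are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical loaded die (an assumption, not from the text); weights sum to 1.
P = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
     4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}

partition = [{1}, {2, 3}, {4, 5, 6}]  # cells A_i, each with P(A_i) > 0
E = {2, 4, 6}

def prob(S):
    """P(S) for a subset S of the sample space."""
    return sum(P[w] for w in S)

def cond_prob(w):
    """P(E | A)(w) = sum_i P(E ∩ A_i) / P(A_i) * chi_{A_i}(w)."""
    # Exactly one cell contains w, so the sum has a single term.
    return sum(prob(E & A_i) / prob(A_i) for A_i in partition if w in A_i)

for w in sorted(P):
    print(w, cond_prob(w))
```

On each cell the value is the elementary conditional probability $$P(E\cap A_i)/P(A_i)$$, and the weighted average of `cond_prob` over all outcomes equals `prob(E)`.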

Finally, if $$X\colon \Omega \to \mathbb R$$ is a real-valued random variable, where $$\mathbb R$$ is equipped with the Borel $$\sigma$$-algebra, $$X_*P$$ coincides with the Stieltjes measure generated by the cumulative distribution function $$P_X$$ of $$X$$. Denoting $$P(E\vert X)(x) \equiv P(E\vert X = x)$$, we may write:

$P(E \cap (X \in B))=\intop _B P(E\vert X=x) \,\text dP_X(x),$

for any Borel set $$B$$ and, in particular:

$P(E)=\intop _\mathbb R P(E\vert X=x) \,\text dP_X(x).$
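As a numerical illustration of the last two formulas, consider a hypothetical setup that is not part of these notes: $$X$$ and $$U$$ are independent and uniform on $$[0,1]$$, and $$E = \{U \leq X\}$$. In that case $$P(E\vert X=x)=x$$ is a version of the conditional probability on $$[0,1]$$ and $$P_X(x)=x$$ there. The sketch below approximates the Stieltjes integral by a midpoint sum weighted with increments of the distribution function; the function names and the number of subintervals are assumptions.

```python
def cond_prob(x):
    """Assumed version of P(E | X = x) for E = {U <= X}, with U and X
    independent and uniform on [0, 1] (a hypothetical example)."""
    return x

def cdf_X(x):
    """Cumulative distribution function P_X of a Uniform(0, 1) variable."""
    return min(max(x, 0.0), 1.0)

def stieltjes_integral(g, cdf, a, b, n=1000):
    """Approximate the integral of g dP_X over [a, b] by a Riemann-Stieltjes
    sum: g at each midpoint times the cdf increment of the subinterval."""
    total = 0.0
    for k in range(n):
        left = a + (b - a) * k / n
        right = a + (b - a) * (k + 1) / n
        total += g((left + right) / 2) * (cdf(right) - cdf(left))
    return total

# P(E): integrate over the support [0, 1] of X.
p_E = stieltjes_integral(cond_prob, cdf_X, 0.0, 1.0)
# P(E ∩ (X ∈ B)) for B = [0.25, 0.75].
p_E_and_X_in_B = stieltjes_integral(cond_prob, cdf_X, 0.25, 0.75)
print(p_E, p_E_and_X_in_B)
```

The exact values in this example are $$P(E) = 1/2$$ and $$P(E\cap (X\in B)) = (0.75^2-0.25^2)/2 = 1/4$$, which the sums reproduce up to floating-point error.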

Kallenberg, Olav. 1997. Foundations of Modern Probability. Vol. 2. Springer.

1. These claims can be proved by a standard argument using approximations by simple functions.

2. For instance, denoting $$N_E = \{x \in \Omega_X \,\vert\, P(E\vert X = x) < 0\}$$, positivity implies that $$(X_*P)(N_E)=0$$. However, there is no guarantee that $$\cup _{E\in \mathcal E} N_E$$ is also a set of measure zero (in fact, it need not even be measurable, since the union is generally uncountable).

### Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/vgherard/vgherard.github.io/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

### Citation

Gherardi (2023, Nov. 3). vgherard: Conditional Probability. Retrieved from https://vgherard.github.io/posts/2023-11-03-conditional-probability/