Let \((\Omega, \,\mathcal E,\,P)\) be a probability space, and let \(X\colon \Omega \to \Omega _X\) be a random variable with target space \((\Omega _X, \mathcal X)\). We denote the corresponding push-forward measure on \(\mathcal X\) by \(X_*P\), so that:
\[ (X_*P)(B)= P(X^{-1}(B)) \] for all \(B\in \mathcal X\). A measurable function \(f\colon \Omega _X \to \mathbb R\) is integrable with respect to \(X_*P\) if and only if \(f\circ X\) is integrable with respect to \(P\), in which case¹: \[ \intop _{\Omega _X} f\,\text d(X_*P) = \intop _{\Omega} (f \circ X)\,\text dP. \] Now, given an arbitrary event \(E\in \mathcal E\), define \((X_*P)_E(A)=P(E\cap X^{-1}(A))\) for \(A \in \mathcal X\). Then \((X_*P)_E\) is a finite measure on \(\mathcal X\) which is dominated by \(X_*P\), since \((X_*P)_E(A) \leq (X_*P)(A)\) for all \(A \in \mathcal X\). By the Radon-Nikodym theorem, there exists a derivative \(\frac{\text d (X_*P)_E}{\text d (X_*P)} \in L_1(X_*P)\). We define the conditional probability of the event \(E\) with respect to the random variable \(X\) as the random variable:
\[ P(E\vert X)\equiv \frac{\text d (X_*P)_E}{\text d (X_*P)}. \]
The intuition behind this definition comes from the tautology (given the definition in terms of Radon-Nikodym derivative):
\[ P(E \cap (X\in A)) = \intop _{A} P(E\vert X)\,\text d(X_*P), \] valid for all \(A \in \mathcal X\). On one hand, from elementary probability theory, one would expect any sensible definition of conditional probability to satisfy this identity. On the other hand, the identity uniquely determines \(P(E\vert X)\) as the Radon-Nikodym derivative \(\frac{\text d (X_*P)_E}{\text d (X_*P)}\), up to a set of \(X_*P\)-measure zero.
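On a finite probability space the Radon-Nikodym derivative reduces to a ratio of point masses, which makes the identity above easy to check by hand. The following is a minimal sketch under assumptions that are not part of the text: the fair die, the random variable `X` (remainder modulo 3), the event `E` and all helper names are purely illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}


def X(w):
    """Illustrative random variable: remainder of the outcome modulo 3."""
    return w % 3


def prob(event):
    """P(event) for any collection of outcomes."""
    return sum(P[w] for w in event)


def pushforward(x):
    """(X_*P)({x}) = P(X^{-1}({x}))."""
    return prob([w for w in omega if X(w) == x])


def cond_prob(E, x):
    """Density of (X_*P)_E with respect to X_*P at the atom x."""
    return prob([w for w in E if X(w) == x]) / pushforward(x)


def subsets(xs):
    """All subsets of xs, as tuples."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


E = {1, 2, 4}  # an arbitrary illustrative event
values = sorted({X(w) for w in omega})

# Check P(E ∩ (X ∈ A)) = ∫_A P(E|X) d(X_*P) for every subset A of the target space.
for A in subsets(values):
    lhs = prob([w for w in E if X(w) in A])
    rhs = sum(cond_prob(E, x) * pushforward(x) for x in A)
    assert lhs == rhs
```

Exact rational arithmetic (`Fraction`) is used so that the two sides can be compared with `==` rather than up to floating-point tolerance.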
Writing \(P(E\vert X = x)\) for the value \(P(E\vert X)(x)\), it is fairly easy to verify the following properties of conditional probability:
Countable additivity. For any finite or countable family \((E_i)_{i\in I}\) of pairwise disjoint events, \(E_i \cap E_j = \emptyset\) for \(i \neq j\), we have: \[ P(\cup _{i\in I} E_i \vert X = x) = \sum _{i \in I}P(E_i \vert X = x) \] for almost all \(x\in \Omega_X\).
Positivity. For any event \(E\) we have \(P(E \vert X=x) \geq 0\) for almost all \(x \in \Omega_X\).
Normalization. \(P(\Omega \vert X = x) = 1\) for almost all \(x \in \Omega_X\).
These properties, however, do not generally imply that \(P(\cdot \vert X = x)\) is a probability measure for almost all \(x\in \Omega_X\)², because the exceptional null set depends on the events involved. Functions \(\nu \colon \mathcal E \times \Omega _X \to \mathbb R^+\) such that \(\nu(\cdot, x)\) is a measure for all \(x\in \Omega _X\), and \(\nu (E,\cdot)\) is \(\mathcal X\)-measurable for all \(E\in \mathcal E\), are called random measures. If \(\nu\) satisfies \[ P(E \cap (X\in A)) = \intop _{A} \nu (E,\cdot)\,\text d(X_*P) \] (or, equivalently, if \(\nu (E,\cdot)\) is a version of \(\frac{\text d (X_*P)_E}{\text d (X_*P)}\)) for all \(E\in \mathcal E\), then \(\nu\) is called a regular conditional probability for the random variable \(X\). If the space \((\Omega,\, \mathcal E)\) is regular enough (e.g. if it is a Borel space), one can prove that a regular conditional probability exists for any random variable \(X\); see e.g. (Kallenberg 1997).
If \(X = \chi _A \colon \Omega \to \{0,1\}\), where \(A\in \mathcal E\) satisfies \(0<P(A)<1\), we can easily compute:
\[ P(E\vert \chi _A) = \chi _A\cdot \frac{P(E\cap A)}{P(A)} + (1-\chi _A)\cdot \frac{P(E\cap A^c)}{P(A^c)}, \] where the right-hand side is understood as a function on \(\Omega\), that is, as the composition \(P(E\vert \chi _A)\circ \chi _A\). In particular, \(P(E\vert A) \equiv P(E\vert \chi _A = 1)\) agrees with the usual elementary definition of conditional probability.
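To spell out the computation for \(X = \chi _A\): the target space \(\{0,1\}\) consists of two atoms, with masses \((X_*P)(\{1\}) = P(A)\) and \((X_*P)(\{0\}) = P(A^c)\), while \((X_*P)_E(\{1\}) = P(E\cap A)\) and \((X_*P)_E(\{0\}) = P(E\cap A^c)\). Since both atoms have positive \(X_*P\)-measure, the Radon-Nikodym derivative at each atom is simply the ratio of the two masses:
\[ P(E\vert \chi _A = 1) = \frac{(X_*P)_E(\{1\})}{(X_*P)(\{1\})} = \frac{P(E\cap A)}{P(A)}, \qquad P(E\vert \chi _A = 0) = \frac{(X_*P)_E(\{0\})}{(X_*P)(\{0\})} = \frac{P(E\cap A^c)}{P(A^c)}. \]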
More generally, if \(X = \text {id} _\Omega\), where the target space is equipped with a sub-\(\sigma\)-algebra \(\mathcal F \subseteq \mathcal E\), we have:
\[ P(E\vert \mathcal F)\equiv \frac{\text d (P\vert _\mathcal F)_E}{\text d (P\vert _\mathcal F)}, \] which is sometimes taken as the definition of conditional probability with respect to a sub-\(\sigma\)-algebra. When \(\mathcal F\) is the \(\sigma\)-algebra generated by a finite or countable partition \(\mathcal A = (A_i)_{i\in I}\) of \(\Omega\) such that \(P(A_i)>0\) for all \(i\in I\), we find:
\[ P(E\vert \mathcal A)=\sum _{i\in I} \frac{P(E\cap A_i)}{P(A_i)}\chi _{A_i}, \] again in agreement with elementary definitions.
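The partition formula can be evaluated directly on a small example. This is a minimal sketch, assuming a uniform ten-point space, a three-cell partition and an event `E` that are all hypothetical choices made for illustration; the function name `cond_prob_given_partition` is likewise not from the text.

```python
from fractions import Fraction

# Hypothetical finite space: outcomes 0..9 with uniform probability.
omega = range(10)
P = {w: Fraction(1, 10) for w in omega}

# Illustrative partition of omega into three cells of positive probability.
partition = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
E = {1, 3, 5, 7, 9}  # the event "the outcome is odd"


def prob(event):
    """P(event) for any collection of outcomes."""
    return sum(P[w] for w in event)


def cond_prob_given_partition(E, w):
    """Value at w of sum_i P(E ∩ A_i) / P(A_i) * chi_{A_i}."""
    # Exactly one cell contains w, so the sum has a single non-zero term.
    return sum(prob(E & A) / prob(A) for A in partition if w in A)


# Integrating P(E|A) over omega recovers P(E) (law of total probability).
total = sum(cond_prob_given_partition(E, w) * P[w] for w in omega)
assert total == prob(E)
```

The resulting function is constant on each cell of the partition, which is exactly what measurability with respect to the generated \(\sigma\)-algebra requires.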
Finally, if \(X\colon \Omega \to \mathbb R\) is a real-valued random variable, where \(\mathbb R\) is equipped with the Borel \(\sigma\)-algebra, \(X_*P\) coincides with the Stieltjes measure generated by the cumulative distribution function \(P_X\) of \(X\). Denoting \(P(E\vert X)(x) \equiv P(E\vert X = x)\), we may write:
\[ P(E \cap (X \in B))=\intop _B P(E\vert X=x) \,\text dP_X(x), \] and, in particular:
\[ P(E)=\intop _\mathbb R P(E\vert X=x) \,\text dP_X(x). \]
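As a concrete illustration (an example chosen here for definiteness, not one discussed above), take \(\Omega = [0,1]^2\) with the Lebesgue measure, \(X(u,v) = u\) and \(E = \{(u,v) \colon v \leq u^2\}\). Then \(P_X(x) = x\) for \(x\in [0,1]\), and Fubini's theorem gives, for any Borel set \(B \subseteq [0,1]\):
\[ P(E \cap (X \in B)) = \intop _B \left(\intop _0 ^1 \chi _{\{v \leq u^2\}}\,\text dv\right)\text du = \intop _B x^2 \,\text dP_X(x), \]
so that \(P(E\vert X = x) = x^2\) for almost all \(x\in [0,1]\), and \(P(E) = \intop _0^1 x^2\,\text dx = \frac{1}{3}\).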
References
Footnotes
These claims can be proved by a standard argument using approximations by simple functions.↩︎
For instance, denoting \(N_E = \{x \in \Omega_X \colon P(E\vert X = x) < 0\}\), positivity implies that \((X_*P)(N_E)=0\). However, there is no guarantee that \(\cup _{E\in \mathcal E} N_E\) is also a set of measure zero (in fact, it need not even be measurable, since the union is generally uncountable).↩︎
Citation
@online{gherardi2023,
author = {Gherardi, Valerio},
title = {Conditional {Probability}},
date = {2023-11-03},
url = {https://vgherard.github.io/posts/2023-11-03-conditional-probability/conditional-probability.html},
langid = {en}
}