Correlation Without Causation

Cum hoc ergo propter hoc

Statistics
Author
Published

March 30, 2023

It is part of common knowledge that correlation does not require causation. Absence of causation, say between a condition \(p\) and an effect \(q\), means that the realization of \(p\) has no influence on the presence of \(q\). If this is the case, a statistical correlation between \(p\) and \(q\) can still be present, if the realization of \(p\) modifies our state of information about \(q\).

As an example, let \(X,Y\) be two conditionally independent binary random variables, with a common probability \(\Theta\) of evaluating to one. Think, for instance, of a machine that produces pairs of identical biased coins, with a probability of tails \(\Theta\). If \(\Theta\) is equal to a given value \(\theta\), the joint probability distribution of \(X\) and \(Y\) is:

\[ \text {Pr}(X=x,Y=y\vert \Theta = \theta) = B(x;\theta)B(y;\theta), \tag{1}\]

where \(B(z; \theta) = \theta ^z (1 - \theta) ^ {1-z}\). Whether or not this provides a satisfying probabilistic description of experiments on \(X\) and \(Y\) depends on context.

From a frequentist point of view, if \(\Theta\) is fixed once and for all, the right hand side of Equation 1 correctly describes the experimental outcomes of \(X\) and \(Y\) for some value of \(\theta\). On the other hand, if \(\Theta\) can change from experiment to experiment in a random fashion, and we do not observe its values \(\theta\), we clearly cannot use Equation 1 as it stands, as its usage requires knowing \(\theta\). Finally, from a bayesian’s point of view, if \(\Theta\) is fixed but unknown, Equation 1 does not describe our state of knowledge about \(X\) and \(Y\), because it assumes unavailable information (\(\Theta = \theta\)).

In the last two cases, what we’re actually after is the unconditional probability:

\[ \text{Pr}(X=x,\,Y=y)=\intop\,\text{d}P_\Theta(\theta) \,\text{Pr}(X=x,Y=y\vert\Theta = \theta) \tag{2}\]

where \(\text{d}P_\Theta(\theta)\) can be regarded either as the actual probability distribution of \(\Theta\) (in a frequentist framework) or as a subjective prior distribution (in a bayesian framework).

Plugging Equation 1 into Equation 2, we find:

\[ \begin{split} \text{Pr}(X=1,\,Y=1) & = \mathbb E(\Theta)^2+\text{Var}(\Theta)\\ \text{Pr}(X=1,\, Y=0)&=\mathbb E(\Theta)-\mathbb E(\Theta)^2-\text{Var}(\Theta)\\ \text{Pr}(X=0,\, Y=1)&=\mathbb E(\Theta)-\mathbb E(\Theta)^2-\text{Var}(\Theta)\\ \text{Pr}(X=0,\,Y=0) & = \mathbb (1-\mathbb E(\Theta))^2+\text{Var}(\Theta) \\ \end{split} \]

In particular, we have:

\[ \dfrac{\text{Pr}(Y = 1 \vert\, X = 1)}{\text {Pr}(Y=1)} = 1+\frac{\text{Var}(\Theta)}{\mathbb{E}(\Theta)^2}, \tag{3}\]

which means that, unconditionally, \(X\) and \(Y\) are not independent, but in fact positively correlated1.

Observations of this kind apply, mutatis mutandis, in many practical situations. For instance if we were modeling the time series of new visitors to a website, we could reasonably assume that the number of yesterday’s new visitors does not influence the number of today’s ones (if individual visitors are unlikely to interact with each other). Yet, it would be wrong to assume, and easy to disprove, that these two numbers are by themselves statistically independent, because yesterday’s new visitors carry useful background information on today’s potential new visitors.

The bottom line of the post is that lack of causation does not imply lack of correlation, which is logically equivalent to the original motto… but, for some strange reason, I find easier to forget.

Footnotes

  1. Here I’m using the word correlation in a loose sense, as in the popular motto.↩︎

Reuse

Citation

BibTeX citation:
@online{gherardi2023,
  author = {Gherardi, Valerio},
  title = {Correlation {Without} {Causation}},
  date = {2023-03-30},
  url = {https://vgherard.github.io/posts/2023-03-10-correlation-without-causation/},
  langid = {en}
}
For attribution, please cite this work as:
Gherardi, Valerio. 2023. “Correlation Without Causation.” March 30, 2023. https://vgherard.github.io/posts/2023-03-10-correlation-without-causation/.