# Consistency and bias of OLS estimators

OLS estimators are consistent but generally biased - here’s an example.

Valerio Gherardi https://vgherard.github.io
2023-05-12

Given random variables $$Y\colon \Omega \to \mathbb R$$ and $$X\colon \Omega \to \mathbb R ^{p}$$ defined on an event space $$\Omega$$, denote:

$\beta = \arg \min _{\beta ^\prime } \mathbb E[(Y-X \beta^\prime )^2]= \mathbb E(X^TX)^{-1}\mathbb E(X^TY), \tag{1}$ so that $$X \beta$$ is the best linear predictor of $$Y$$ in terms of $$X$$ ($$X$$ is regarded as a row vector).

Let $$(\textbf Y, \textbf X)$$ be independent samples from the joint $$XY$$ distribution, with independent observations stacked vertically in $$N \times 1$$ and $$N \times p$$ matrices respectively, as customary. Then the usual Ordinary Least Squares (OLS) estimator of $$\beta$$ is given by:

$\hat \beta = \arg \min _{\beta ^\prime}(\textbf Y - \textbf X \beta ^\prime)^2=(\textbf X^T\textbf X)^{-1} \textbf X^T \textbf Y. \tag{2}$ This is a consistent, but generally biased estimator of $$\beta$$.

Comparing Eqs. (1) and (2), consistency follows immediately from the law of large numbers and continuity. In order to show that $$\mathbb E (\hat \beta) \neq \beta$$ in general, it is sufficient to provide an example.

Consider, for instance (example adapted from D.A. Freedman):

$X \sim \mathcal N (0, 1),\qquad Y=X(1+aX^2)$ Recalling that $$\mathbb E (X^4) = 3$$ for the standard normal, we have:

$\beta = 1+3a,$ where we have ignored a potential intercept term (which would vanish here, since $$\mathbb E (Y) = 0$$). To compute $$\mathbb E (\hat \beta)$$, we use the identity $$\frac{e^{-z}}{z} = \intop _1 ^\infty \text d t\, e ^{-zt}$$ to rewrite this expected value as:

$\begin{split} \mathbb E (\hat \beta) & = (2 \pi)^{-N/2} \intop \text d\textbf X \,e^{-\sum _j X_i ^2 /2} \dfrac{\sum _i X_i^2(1+aX_i^2)}{\sum _i X_i^2} = \frac{N}{2}\intop_1 ^\infty \text d t\,I(t) \\ I(t) & \equiv (2 \pi)^{-N/2} \intop \text d\textbf X\, e^{-t \sum _j X_j ^2 /2}X_1^2(1+aX_1^2) \end{split}$ The inner integral can be computed easily:

$I(t) = t^{-\frac{N}{2}}(\frac{1}{t}+a\frac{3}{t^2})$ and we eventually find: $\mathbb E (\hat \beta) = 1+3 a\frac{N}{N+2}$

The bias is thus given by:

$\beta - \mathbb E (\hat \beta) = \frac{6a}{N+2}$ This vanishes linearly, in agreement with the fact that $$\sqrt N (\hat \beta - \beta )$$ converges in probability to a gaussian with zero mean and finite variance (which requires the bias to be $$o(N^{-1/2})$$).

### Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

### Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/vgherard/vgherard.github.io/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

### Citation

Gherardi (2023, May 12). vgherard: Consistency and bias of OLS estimators. Retrieved from https://vgherard.github.io/posts/2023-05-12-consistency-and-bias-of-ols-estimators/
@misc{gherardi2023consistency,
}