Consistency and bias of OLS estimators

OLS estimators are consistent but generally biased - here’s an example.

Categories: Statistics, Regression, Linear Models, Model Misspecification

Author: Valerio Gherardi

Published: May 12, 2023

Given random variables \(Y\colon \Omega \to \mathbb R\) and \(X\colon \Omega \to \mathbb R ^{p}\) defined on a common sample space \(\Omega\), denote:

\[ \beta = \arg \min _{\beta ^\prime } \mathbb E[(Y-X \beta^\prime )^2]= \mathbb E(X^TX)^{-1}\mathbb E(X^TY), \tag{1}\]

so that \(X \beta\) is the best linear predictor of \(Y\) in terms of \(X\) (\(X\) is regarded as a row vector).

Let \((\textbf Y, \textbf X)\) be a sample of \(N\) independent observations from the joint distribution of \((X, Y)\), stacked vertically in \(N \times 1\) and \(N \times p\) matrices respectively, as customary. Then the usual Ordinary Least Squares (OLS) estimator of \(\beta\) is given by:

\[ \hat \beta = \arg \min _{\beta ^\prime}\lVert\textbf Y - \textbf X \beta ^\prime\rVert^2=(\textbf X^T\textbf X)^{-1} \textbf X^T \textbf Y. \tag{2}\]

This is a consistent, but generally biased estimator of \(\beta\).
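
In code, Equation 2 amounts to solving the normal equations \(\textbf X^T \textbf X \, \hat \beta = \textbf X^T \textbf Y\). A minimal NumPy sketch (function and variable names are my own):

```python
import numpy as np

def ols(X, Y):
    """OLS estimator of Equation 2: solve the normal equations X'X b = X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

In practice one would rather use a dedicated least-squares routine (e.g. `np.linalg.lstsq`, which is based on a more stable factorization), but the two agree whenever \(\textbf X^T\textbf X\) is well conditioned.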

Comparing Equation 1 and Equation 2, we see that consistency follows immediately from the law of large numbers and continuity. To show that \(\mathbb E (\hat \beta) \neq \beta\) in general, it is sufficient to exhibit an example.
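
Explicitly, writing \(X_i\) for the \(i\)-th row of \(\textbf X\) and \(Y_i\) for the \(i\)-th entry of \(\textbf Y\):

\[ \hat \beta = \Big(\frac{1}{N}\sum _i X_i^T X_i\Big)^{-1}\Big(\frac{1}{N}\sum _i X_i^T Y_i\Big) \xrightarrow{N \to \infty} \mathbb E(X^TX)^{-1}\mathbb E(X^TY) = \beta, \] where the convergence is almost sure by the strong law of large numbers, and is preserved by the map \((A, b) \mapsto A^{-1}b\), which is continuous at \(A = \mathbb E(X^TX)\) (assumed invertible).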

Consider, for instance (example adapted from D.A. Freedman):

\[ X \sim \mathcal N (0, 1),\qquad Y=X(1+aX^2) \] Recalling that \(\mathbb E (X^4) = 3\) for the standard normal, we have:

\[ \beta = \frac{\mathbb E (XY)}{\mathbb E (X^2)} = \mathbb E (X^2+aX^4) = 1+3a, \] where we have ignored a potential intercept term (which would vanish here, since \(\mathbb E (X) = \mathbb E (Y) = 0\)). To compute \(\mathbb E (\hat \beta)\), we use the identity \(\frac{e^{-z}}{z} = \intop _1 ^\infty \text d t\, e ^{-zt}\) to rewrite this expected value as:

\[ \begin{split} \mathbb E (\hat \beta) & = (2 \pi)^{-N/2} \intop \text d\textbf X \,e^{-\sum _j X_j ^2 /2} \dfrac{\sum _i X_i^2(1+aX_i^2)}{\sum _j X_j^2} = \frac{N}{2}\intop_1 ^\infty \text d t\,I(t), \\ I(t) & \equiv (2 \pi)^{-N/2} \intop \text d\textbf X\, e^{-t \sum _j X_j ^2 /2}X_1^2(1+aX_1^2), \end{split} \] where the factor \(\frac{1}{2}\) comes from applying the identity with \(z = \sum _j X_j^2/2\), and the factor \(N\) from the symmetry of the numerator under permutations of the \(X_i\). The inner integral can be computed easily:

\[ I(t) = t^{-\frac{N}{2}}\left(\frac{1}{t}+\frac{3a}{t^2}\right), \] and, since \(\intop _1 ^\infty \text d t\, t^{-\frac{N}{2}-1} = \frac{2}{N}\) and \(\intop _1 ^\infty \text d t\, t^{-\frac{N}{2}-2} = \frac{2}{N+2}\), we eventually find: \[ \mathbb E (\hat \beta) = \frac{N}{2}\left(\frac{2}{N}+\frac{6a}{N+2}\right) = 1+3 a\frac{N}{N+2} \]

The bias is thus given by:

\[ \beta - \mathbb E (\hat \beta) = \frac{6a}{N+2} \] This vanishes as \(N^{-1}\), in agreement with the fact that \(\sqrt N (\hat \beta - \beta )\) converges in distribution to a Gaussian with zero mean and finite variance (which, heuristically, requires the bias to be \(o(N^{-1/2})\)).
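
As a sanity check, both \(\beta = 1+3a\) and the finite-sample expectation \(\mathbb E(\hat \beta) = 1+3aN/(N+2)\) are easy to verify by simulation. A minimal NumPy sketch (the values of \(a\), \(N\) and the number of replications are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(840)
a, N, n_rep = 0.5, 10, 200_000   # arbitrary choices

# n_rep independent samples of size N from the model X ~ N(0,1), Y = X(1 + a X^2)
x = rng.standard_normal((n_rep, N))
y = x * (1 + a * x**2)

# One-dimensional OLS without intercept: beta_hat = sum(x*y) / sum(x^2)
beta_hat = (x * y).sum(axis=1) / (x**2).sum(axis=1)

print("Monte Carlo mean of beta_hat:", beta_hat.mean())         # ~ 2.25
print("Theory, 1 + 3aN/(N+2):", 1 + 3 * a * N / (N + 2))        # 2.25
print("True beta, 1 + 3a:", 1 + 3 * a)                          # 2.50
```

With these numbers the bias \(6a/(N+2) = 0.25\) is clearly visible; increasing \(N\) brings the Monte Carlo mean back towards \(\beta = 2.5\), illustrating consistency.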

Citation

BibTeX citation:
@online{gherardi2023,
  author = {Gherardi, Valerio},
  title = {Consistency and Bias of {OLS} Estimators},
  date = {2023-05-12},
  url = {https://vgherard.github.io/posts/2023-05-12-consistency-and-bias-of-ols-estimators/consistency-and-bias-of-ols-estimators.html},
  langid = {en}
}
For attribution, please cite this work as:
Gherardi, Valerio. 2023. “Consistency and Bias of OLS Estimators.” May 12, 2023. https://vgherard.github.io/posts/2023-05-12-consistency-and-bias-of-ols-estimators/consistency-and-bias-of-ols-estimators.html.