Consistency and bias of OLS estimators

Statistics Regression Linear Models Model Misspecification

OLS estimators are consistent but generally biased - here’s an example.


Given random variables \(Y\colon \Omega \to \mathbb R\) and \(X\colon \Omega \to \mathbb R ^{p}\) defined on an event space \(\Omega\), denote:

\[ \beta = \arg \min _{\beta ^\prime } \mathbb E[(Y-X \beta^\prime )^2]= \mathbb E(X^TX)^{-1}\mathbb E(X^TY), \tag{1} \] so that \(X \beta\) is the best linear predictor of \(Y\) in terms of \(X\) (\(X\) is regarded as a row vector).

Let \((\textbf Y, \textbf X)\) be independent samples from the joint \(XY\) distribution, with independent observations stacked vertically in \(N \times 1\) and \(N \times p\) matrices respectively, as customary. Then the usual Ordinary Least Squares (OLS) estimator of \(\beta\) is given by:

\[ \hat \beta = \arg \min _{\beta ^\prime}(\textbf Y - \textbf X \beta ^\prime)^2=(\textbf X^T\textbf X)^{-1} \textbf X^T \textbf Y. \tag{2} \] This is a consistent, but generally biased estimator of \(\beta\).

Comparing Eqs. (1) and (2), consistency follows immediately from the law of large numbers and continuity. In order to show that \(\mathbb E (\hat \beta) \neq \beta\) in general, it is sufficient to provide an example.

Consider, for instance (example adapted from D.A. Freedman):

\[ X \sim \mathcal N (0, 1),\qquad Y=X(1+aX^2) \] Recalling that \(\mathbb E (X^4) = 3\) for the standard normal, we have:

\[ \beta = 1+3a, \] where we have ignored a potential intercept term (which would vanish here, since \(\mathbb E (Y) = 0\)). To compute \(\mathbb E (\hat \beta)\), we use the identity \(\frac{e^{-z}}{z} = \intop _1 ^\infty \text d t\, e ^{-zt}\) to rewrite this expected value as:

\[ \begin{split} \mathbb E (\hat \beta) & = (2 \pi)^{-N/2} \intop \text d\textbf X \,e^{-\sum _j X_i ^2 /2} \dfrac{\sum _i X_i^2(1+aX_i^2)}{\sum _i X_i^2} = \frac{N}{2}\intop_1 ^\infty \text d t\,I(t) \\ I(t) & \equiv (2 \pi)^{-N/2} \intop \text d\textbf X\, e^{-t \sum _j X_j ^2 /2}X_1^2(1+aX_1^2) \end{split} \] The inner integral can be computed easily:

\[ I(t) = t^{-\frac{N}{2}}(\frac{1}{t}+a\frac{3}{t^2}) \] and we eventually find: \[ \mathbb E (\hat \beta) = 1+3 a\frac{N}{N+2} \]

The bias is thus given by:

\[ \beta - \mathbb E (\hat \beta) = \frac{6a}{N+2} \] This vanishes linearly, in agreement with the fact that \(\sqrt N (\hat \beta - \beta )\) converges in probability to a gaussian with zero mean and finite variance (which requires the bias to be \(o(N^{-1/2})\)).


If you see mistakes or want to suggest changes, please create an issue on the source repository.


Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

vgherard (2023, May 12). Valerio Gherardi: Consistency and bias of OLS estimators. Retrieved from

BibTeX citation

  author = {vgherard, },
  title = {Valerio Gherardi: Consistency and bias of OLS estimators},
  url = {},
  year = {2023}