AIC for the linear model: known vs. unknown variance

Model Selection Linear Models Regression Statistics

Does knowledge of noise variance have any effect on model selection for the mean?

Valerio Gherardi

The Akaike Information Criterion (AIC) for the linear model \(Y = X \beta + \varepsilon\) with Gaussian noise \(\varepsilon\) takes the form (up to additive constants common to all models):

\[ \text{AIC}^{\text{(k)}} = \frac{(\mathbf Y-\mathbf X\hat \beta )^2}{\sigma ^2} + 2p \]

if the noise variance \(\sigma ^2 = \mathbb V(\varepsilon\vert X)\) is known, and:

\[ \text{AIC}^{\text{(u)}} = N\ln(\hat \sigma ^2) + 2(p + 1) \]

if \(\sigma^2\) is unknown. Here \(\hat \beta\) denotes the maximum-likelihood estimate of \(\beta\), \((\mathbf Y -\mathbf X \hat \beta)^2\) is shorthand for the residual sum of squares, \(\hat \sigma ^2 = \frac{1}{N}(\mathbf Y -\mathbf X \hat \beta)^2\) is the corresponding maximum-likelihood estimate of \(\sigma ^2\), \(N\) is the number of observations, and \(p\) is the dimension of the covariate vector \(X\).
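As an illustration, the two criteria can be computed from an ordinary least-squares fit. The following is a minimal sketch using NumPy; the function names `aic_known` and `aic_unknown`, the simulated data, and the chosen coefficients are assumptions made for this example and are not part of the original post.

```python
import numpy as np

def aic_known(y, X, sigma2):
    """AIC with known noise variance: RSS / sigma^2 + 2p."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # ML estimate of beta
    rss = float(np.sum((y - X @ beta_hat) ** 2))      # residual sum of squares
    return rss / sigma2 + 2 * X.shape[1]

def aic_unknown(y, X):
    """AIC with unknown noise variance: N ln(sigma_hat^2) + 2(p + 1)."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2_hat = float(np.sum((y - X @ beta_hat) ** 2)) / n  # ML estimate of sigma^2
    return n * np.log(sigma2_hat) + 2 * (p + 1)

# Simulated data (hypothetical): three covariates, unit noise variance.
rng = np.random.default_rng(0)
n, sigma2 = 200, 1.0
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)

aic_k = aic_known(y, X, sigma2)
aic_u = aic_unknown(y, X)
print(f"AIC (known variance):   {aic_k:.2f}")
print(f"AIC (unknown variance): {aic_u:.2f}")
```

The two values are not directly comparable with each other, since they differ by model-independent constants and scaling; only differences between models under the same criterion matter.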

One would expect knowledge of the variance to have little effect on model selection for the mean, at least in a regime where the variance is reasonably well estimated. To check that this is actually the case, we expand the difference in \(\text{AIC}^{\text{(u)}}\) between two models to first order in \(\hat \sigma _1 ^2 - \hat \sigma _2 ^2\):

\[ \begin{split} \text{AIC}^{\text{(u)}}_1-\text{AIC}^{\text{(u)}}_2 &= N\ln\left(\frac{\hat \sigma ^2_1}{\hat \sigma ^2_2}\right) + 2(p_1-p_2)\\ &\approx N\frac{\hat \sigma _{1}^2-\hat \sigma _2 ^2}{\hat \sigma _2 ^2} + 2(p_1-p_2)\\ & = \text{AIC}^{\text{(k)}}_1-\text{AIC}^{\text{(k)}}_2+N\frac{(\hat \sigma _{1}^2-\hat \sigma _2 ^2)(\sigma ^2-\hat \sigma _2 ^2)}{\hat \sigma _2 ^2\sigma^2} \end{split} \]

The approximation in the second line requires \(\vert \hat \sigma _1 ^2 - \hat \sigma _2 ^2\vert \ll\hat \sigma _2 ^2\). Furthermore, the last term in the final expression is a small fraction of the variance term \(N(\hat \sigma _1 ^2-\hat \sigma _2 ^2)/\hat \sigma _2 ^2\) of \(\text{AIC}^{\text{(u)}}_1-\text{AIC}^{\text{(u)}}_2\), their ratio being \((\sigma ^2-\hat \sigma _2 ^2)/\sigma ^2\), if \(|\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2\).
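For completeness, the last equality in the expansion can be spelled out as follows. This is a sketch of the intermediate step, relying only on the definitions above and on the fact that the residual sum of squares of model \(i\) equals \(N\hat \sigma _i ^2\):

\[ \begin{split} \text{AIC}^{\text{(k)}}_1-\text{AIC}^{\text{(k)}}_2 &= N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\sigma ^2} + 2(p_1-p_2),\\ N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\hat \sigma _2 ^2} - N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\sigma ^2} &= N\frac{(\hat \sigma _1 ^2-\hat \sigma _2 ^2)(\sigma ^2-\hat \sigma _2 ^2)}{\hat \sigma _2 ^2\sigma ^2}. \end{split} \]

Adding the second identity to the first reproduces the linearized \(\text{AIC}^{\text{(u)}}\) difference.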

Putting these two conditions together, we obtain:

\[ |\hat \sigma _1 ^2 -\hat \sigma _2 ^2|,\ |\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2, \]

which means that \(\text{AIC}^{\text{(u)}}\) and \(\text{AIC}^{\text{(k)}}\) lead to the same model selection, provided that the models involved in the AIC comparison estimate the true variance reasonably well.
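The argument can be checked numerically on simulated data. The sketch below compares two nested models and evaluates the exact \(\text{AIC}^{\text{(u)}}\) difference, its first-order approximation, the \(\text{AIC}^{\text{(k)}}\) difference, and the correction term. The data-generating process, sample size, seed, and the helper name `sigma2_hat` are all assumptions chosen for illustration, not taken from the original post.

```python
import numpy as np

def sigma2_hat(y, X):
    """ML variance estimate RSS / N from a least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta_hat) ** 2))

# Hypothetical setup: three covariates, the third with a weak effect.
rng = np.random.default_rng(42)
n, sigma2 = 500, 1.0
X_full = rng.normal(size=(n, 3))
y = X_full @ np.array([1.0, -0.5, 0.1]) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Model 1 uses the first two covariates, model 2 uses all three.
X1, X2 = X_full[:, :2], X_full
p1, p2 = X1.shape[1], X2.shape[1]
s1, s2 = sigma2_hat(y, X1), sigma2_hat(y, X2)

delta_u = n * np.log(s1 / s2) + 2 * (p1 - p2)      # exact AIC(u) difference
delta_u_lin = n * (s1 - s2) / s2 + 2 * (p1 - p2)   # first-order approximation
delta_k = n * (s1 - s2) / sigma2 + 2 * (p1 - p2)   # AIC(k) difference
correction = n * (s1 - s2) * (sigma2 - s2) / (s2 * sigma2)

print(f"AIC(u) difference (exact):      {delta_u:.4f}")
print(f"AIC(u) difference (linearized): {delta_u_lin:.4f}")
print(f"AIC(k) difference:              {delta_k:.4f}")
print(f"correction term:                {correction:.4f}")
```

The linearized \(\text{AIC}^{\text{(u)}}\) difference equals the \(\text{AIC}^{\text{(k)}}\) difference plus the correction term by construction; how close the exact difference is to either depends on how well both models estimate \(\sigma ^2\).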

Concluding remarks:






For attribution, please cite this work as

Gherardi (2024, March 13). vgherard: AIC for the linear model: known vs. unknown variance.
