AIC for the linear model: known vs. unknown variance

Model Selection Linear Models Regression Statistics

Does knowledge of noise variance have any effect on model selection for the mean?

Valerio Gherardi https://vgherard.github.io
2024-03-13

The Akaike Information Criterion (AIC) for the linear model \(Y = X \beta + \varepsilon\), with Gaussian noise \(\varepsilon\), takes the form:

\[ \text{AIC}^{\text{(k)}} = \frac{(\mathbf Y-\mathbf X\hat \beta )^2}{\sigma ^2} + 2p \]

if the noise variance \(\sigma ^2 = \mathbb V(\varepsilon\vert X)\) is known, and:

\[ \text{AIC}^{\text{(u)}} = N\ln(\hat \sigma ^2) + 2(p + 1) \]

if \(\sigma^2\) is unknown. Here \(\hat \beta\) denotes the maximum-likelihood estimate of \(\beta\), and \(\hat \sigma ^2 = \frac{1}{N}(\mathbf Y -\mathbf X \hat \beta)^2\) the corresponding estimate of \(\sigma ^2\) if the latter is unknown; \(p\) is the dimension of the covariate vector \(X\).
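As a minimal sketch of the two definitions above, the following Python snippet computes both criteria from a least-squares fit. The function names `aic_known` and `aic_unknown`, the use of NumPy, and the simulated data are assumptions made for illustration only; they do not come from the original post.

```python
import numpy as np

def aic_known(y, X, sigma2):
    """AIC with known noise variance: RSS / sigma^2 + 2p."""
    # For Gaussian noise, the ML estimate of beta is the least-squares solution.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    return rss / sigma2 + 2 * X.shape[1]

def aic_unknown(y, X):
    """AIC with unknown noise variance: N ln(sigma_hat^2) + 2(p + 1)."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    # ML estimate of the variance: residual sum of squares divided by N.
    sigma2_hat = float(np.sum((y - X @ beta_hat) ** 2)) / n
    return n * np.log(sigma2_hat) + 2 * (p + 1)

# Illustrative simulated data (hypothetical values, chosen only for the demo).
rng = np.random.default_rng(0)
N, sigma2 = 200, 1.0
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=np.sqrt(sigma2), size=N)

print("AIC (known variance):  ", aic_known(y, X, sigma2))
print("AIC (unknown variance):", aic_unknown(y, X))
```

The two values are not directly comparable with each other, since they differ by model-independent terms; only differences between models within the same criterion matter for selection.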

One would expect knowledge of the variance to have little effect on model selection for the mean, at least in a regime in which the variance is reasonably well estimated. To check that this is actually the case, we expand the difference of \(\text{AIC}^{\text{(u)}}\) between two models to first order in \(\hat \sigma _1 ^2 - \hat \sigma _2 ^2\):

\[ \begin{split}
\text{AIC}^{\text{(u)}}_1-\text{AIC}^{\text{(u)}}_2 &= N\ln\left(\frac{\hat \sigma ^2_1}{\hat \sigma ^2_2}\right) + 2(p_1-p_2)\\
&\approx N\frac{\hat \sigma _{1}^2-\hat \sigma _2 ^2}{\hat \sigma _2 ^2} + 2(p_1-p_2)\\
&= \text{AIC}^{\text{(k)}}_1-\text{AIC}^{\text{(k)}}_2+N\frac{(\hat \sigma _{1}^2-\hat \sigma _2 ^2)(\sigma ^2-\hat \sigma _2 ^2)}{\hat \sigma _2 ^2\sigma^2}
\end{split} \]

The approximation in the second line requires \(\vert \hat \sigma _1 ^2 - \hat \sigma _2 ^2\vert \ll\hat \sigma _2 ^2\). Furthermore, the last term in the final expression is a small fraction of \(\text{AIC}^{\text{(u)}}_1-\text{AIC}^{\text{(u)}}_2\) if \(|\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2\).
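For completeness, the last equality in the expansion can be spelled out as follows. Since \((\mathbf Y-\mathbf X\hat \beta _i)^2 = N\hat \sigma _i ^2\) for each model \(i = 1, 2\), the known-variance difference reads:

\[ \text{AIC}^{\text{(k)}}_1-\text{AIC}^{\text{(k)}}_2 = N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\sigma ^2} + 2(p_1-p_2), \]

and the remaining term comes from exchanging \(\hat \sigma _2 ^2\) for \(\sigma ^2\) in the denominator:

\[ N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\hat \sigma _2 ^2} - N\frac{\hat \sigma _1 ^2-\hat \sigma _2 ^2}{\sigma ^2} = N(\hat \sigma _1 ^2-\hat \sigma _2 ^2)\left(\frac{1}{\hat \sigma _2 ^2}-\frac{1}{\sigma ^2}\right) = N\frac{(\hat \sigma _1 ^2-\hat \sigma _2 ^2)(\sigma ^2-\hat \sigma _2 ^2)}{\hat \sigma _2 ^2\sigma ^2}. \]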

Putting these two conditions together, we obtain:

\[ |\hat \sigma _1 ^2 -\hat \sigma _2 ^2|,\ |\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2, \]

which means that \(\text{AIC}^{\text{(u)}}\) and \(\text{AIC}^{\text{(k)}}\) lead to the same model selection, provided that the models involved in the AIC comparison estimate the true variance reasonably well.
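The argument can also be checked numerically. The sketch below is only an illustration under assumed settings (simulated Gaussian data, two nested models, and the hypothetical helper names `residual_variance` and `aic_differences`, none of which appear in the original post). It computes the exact \(\text{AIC}^{\text{(u)}}\) difference, its first-order approximation, the \(\text{AIC}^{\text{(k)}}\) difference, and the correction term relating the last two.

```python
import numpy as np

def residual_variance(y, X):
    """ML estimate of the noise variance: RSS / N from a least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta_hat) ** 2))

def aic_differences(y, X1, X2, sigma2):
    """Return (exact AIC^(u) difference, its first-order approximation,
    AIC^(k) difference, correction term) for model 1 minus model 2."""
    n = len(y)
    dp = 2 * (X1.shape[1] - X2.shape[1])
    s1, s2 = residual_variance(y, X1), residual_variance(y, X2)
    d_unknown = n * np.log(s1 / s2) + dp
    d_linear = n * (s1 - s2) / s2 + dp
    d_known = n * (s1 - s2) / sigma2 + dp
    correction = n * (s1 - s2) * (sigma2 - s2) / (s2 * sigma2)
    return d_unknown, d_linear, d_known, correction

# Illustrative simulated data (hypothetical values, chosen only for the demo).
rng = np.random.default_rng(1)
N, sigma2 = 500, 1.0
X_full = rng.normal(size=(N, 3))
y = X_full @ np.array([1.0, -0.5, 0.1]) + rng.normal(scale=np.sqrt(sigma2), size=N)

# Model 1 drops the third covariate; model 2 uses all three.
d_u, d_lin, d_k, corr = aic_differences(y, X_full[:, :2], X_full, sigma2)
print(f"AIC(u) difference (exact):       {d_u:.4f}")
print(f"AIC(u) difference (first order): {d_lin:.4f}")
print(f"AIC(k) difference:               {d_k:.4f}")
print(f"correction term:                 {corr:.4f}")
```

When both models estimate the variance well, the correction term should be small compared with the differences themselves, so both criteria rank the two models in the same way.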

Concluding remarks:

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/vgherard/vgherard.github.io/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gherardi (2024, March 13). vgherard: AIC for the linear model: known vs. unknown variance. Retrieved from https://vgherard.github.io/posts/2024-03-13-aic-for-the-linear-model-known-vs-unknown-variance/

BibTeX citation

@misc{gherardi2024aic,
  author = {Gherardi, Valerio},
  title = {vgherard: AIC for the linear model: known vs. unknown variance},
  url = {https://vgherard.github.io/posts/2024-03-13-aic-for-the-linear-model-known-vs-unknown-variance/},
  year = {2024}
}