AIC for the linear model: known vs. unknown variance

Does knowledge of noise variance have any effect on model selection for the mean?

Author: Valerio Gherardi

Published: March 13, 2024

The Akaike Information Criterion (AIC) for the linear model \(Y = X \beta + \varepsilon\) with Gaussian noise \(\varepsilon\) takes the form:

\[ \text{AIC}^{\text{(k)}} = \frac{(\mathbf Y-\mathbf X\hat \beta )^2}{\sigma ^2} + 2p \]

if the noise variance \(\sigma ^2 = \mathbb V(\varepsilon\vert X)\) is known, and:

\[ \text{AIC}^{\text{(u)}} = N\ln(\hat \sigma ^2) + 2(p + 1) \]

if \(\sigma^2\) is unknown. Here \(\hat \beta\) denotes the maximum-likelihood estimate of \(\beta\), and \(\hat \sigma ^2 = \frac{1}{N}(\mathbf Y -\mathbf X \hat \beta)^2\) the corresponding maximum-likelihood estimate of \(\sigma ^2\) when the latter is unknown. The square of a vector is shorthand for its squared Euclidean norm, \(N\) is the number of observations, and \(p\) is the dimension of the covariate vector \(X\).
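As a minimal sketch of the two formulas above, the following Python snippet computes both criteria with NumPy. The helper names (`fit_ols`, `aic_known`, `aic_unknown`) and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimate of beta (the Gaussian maximum-likelihood estimate)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def aic_known(X, y, sigma2):
    """AIC when the noise variance sigma2 is known: RSS / sigma2 + 2p."""
    resid = y - X @ fit_ols(X, y)
    return resid @ resid / sigma2 + 2 * X.shape[1]

def aic_unknown(X, y):
    """AIC when the noise variance is unknown: N ln(sigma2_hat) + 2(p + 1)."""
    n, p = X.shape
    resid = y - X @ fit_ols(X, y)
    sigma2_hat = resid @ resid / n  # maximum-likelihood variance estimate
    return n * np.log(sigma2_hat) + 2 * (p + 1)

# Hypothetical simulated data: N = 200 observations, p = 3 covariates, sigma^2 = 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=200)

print(aic_known(X, y, sigma2=1.0), aic_unknown(X, y))
```

The two numbers are not comparable with each other, since the criteria drop different additive constants; only differences between models under the same criterion are meaningful.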

One would expect knowledge of the variance to have little effect on model selection for the mean, at least in a regime in which the variance is reasonably well estimated. To check that this is actually the case, we expand \(\text{AIC}^{\text{(u)}}\) differences to first order in \(\hat \sigma _1 ^2 - \hat \sigma _2 ^2\):

\[ \begin{split} \text{AIC}^{\text{(u)}}_1-\text{AIC}^{\text{(u)}}_2 &= N\ln(\frac{\hat \sigma ^2_1}{\hat \sigma ^2_2}) + 2(p_1-p_2)\\ &\approx N\frac{\hat \sigma _{1}^2-\hat \sigma _2 ^2}{\hat \sigma _2 ^2} + 2(p_1-p_2)\\ & = \text{AIC}^{\text{(k)}}_1-\text{AIC}^{\text{(k)}}_2+N\frac{(\hat \sigma _{1}^2-\hat \sigma _2 ^2)(\sigma ^2-\hat \sigma _2 ^2)}{\hat \sigma _2 ^2\sigma^2} \end{split} \]

The approximation in the second line requires \(\vert \hat \sigma _1 ^2 - \hat \sigma _2 ^2\vert \ll\hat \sigma _2 ^2\). Furthermore, the last term in the final expression is a fraction \((\sigma ^2 -\hat \sigma _2 ^2)/\sigma ^2\) of the first-order term \(N(\hat \sigma _1 ^2 - \hat \sigma _2 ^2)/\hat \sigma _2 ^2\), and is therefore negligible compared to it if \(|\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2\).
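The expansion can be checked numerically. The sketch below plugs made-up values of \(N\), \(p_1\), \(p_2\), \(\sigma^2\), \(\hat\sigma_1^2\) and \(\hat\sigma_2^2\) (all hypothetical, chosen so that both conditions hold) into each line of the derivation.

```python
import math

# Hypothetical values, chosen so that both "much smaller than" conditions hold.
N, p1, p2 = 500, 3, 4
sigma2 = 1.0          # true (known) noise variance
s1, s2 = 1.03, 1.01   # variance estimates of models 1 and 2

# First line: exact AIC^(u) difference.
exact_u = N * math.log(s1 / s2) + 2 * (p1 - p2)
# Second line: first-order expansion of the logarithm.
first_order = N * (s1 - s2) / s2 + 2 * (p1 - p2)
# Third line: AIC^(k) difference plus the correction term.
diff_k = N * (s1 - s2) / sigma2 + 2 * (p1 - p2)
correction = N * (s1 - s2) * (sigma2 - s2) / (s2 * sigma2)

print("exact AIC(u) difference:", exact_u)
print("first-order expansion:  ", first_order)
print("AIC(k) difference:      ", diff_k)
print("correction term:        ", correction)
```

The second and third lines agree exactly, since the last step is an algebraic identity; the only approximation is the linearization of the logarithm.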

Putting these two conditions together, we obtain:

\[ |\hat \sigma _1 ^2 -\hat \sigma _2 ^2|,\ |\sigma ^2 -\hat \sigma _2 ^2| \ll \sigma ^2 \]

which means that \(\text{AIC}^{\text{(u)}}\) and \(\text{AIC}^{\text{(k)}}\) lead to the same model selection, provided that the models involved in the AIC comparison estimate the true variance reasonably well.
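A small simulation can illustrate this conclusion. The sketch below is only an illustration under assumed settings (sample size, coefficients, and a nested family of models built from the first \(k\) covariates are all hypothetical choices): it computes both criteria for each candidate model and reports which one each criterion selects.

```python
import numpy as np

# Hypothetical setup: 5 candidate covariates, only the first two are relevant.
rng = np.random.default_rng(42)
N, sigma2 = 500, 1.0
beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(N, beta.size))
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=N)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return resid @ resid

# Nested candidate models: model k uses the first k columns of X.
sizes = np.arange(1, beta.size + 1)
rss_values = np.array([rss(X[:, :k], y) for k in sizes])

aic_k = rss_values / sigma2 + 2 * sizes               # known variance
aic_u = N * np.log(rss_values / N) + 2 * (sizes + 1)  # unknown variance

best_k = sizes[np.argmin(aic_k)]
best_u = sizes[np.argmin(aic_u)]
print("variance estimates:", rss_values / N)
print("selected by AIC(k):", best_k, "| selected by AIC(u):", best_u)
```

The model with a single covariate omits a strong effect and badly overestimates the variance, so it violates the conditions above; the remaining models all estimate \(\sigma^2\) well, and that is the regime in which the two criteria are expected to rank models alike.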

Concluding remarks:

Citation

BibTeX citation:
@online{gherardi2024,
  author = {Gherardi, Valerio},
  title = {AIC for the Linear Model: Known Vs. Unknown Variance},
  date = {2024-03-13},
  url = {https://vgherard.github.io/posts/2024-03-13-aic-for-the-linear-model-known-vs-unknown-variance/},
  langid = {en}
}
For attribution, please cite this work as:
Gherardi, Valerio. 2024. “AIC for the Linear Model: Known Vs. Unknown Variance.” March 13, 2024. https://vgherard.github.io/posts/2024-03-13-aic-for-the-linear-model-known-vs-unknown-variance/.