Consistency of AIC, BIC, RIC

A model selection criterion is consistent if the probability of selecting the true model approaches 1 as n goes to infinity.

AIC is not consistent.
- AIC effectively has a prior that each feature enters the model with probability 1/2, so with positive probability it chooses a model containing features that should not be there.
- (Note that it is consistent in the weaker sense that its predictions converge to those of an unbiased estimator, so if you only care about prediction accuracy it behaves consistently.)

BIC is consistent.
- BIC effectively puts the prior probability of a variable being useful at 1/sqrt(n).
- As n goes to infinity (with p fixed), it chooses the right model: features are incorporated based on the cost of coding them to the accuracy dictated by the irreducible uncertainty.

RIC is not consistent.
- RIC effectively has a prior that each feature enters the model with probability 1/p.
- As n goes to infinity, this is less conservative than BIC (but only for n > p^2).
- Like AIC, with an infinite number of observations it yields an unbiased estimator, but one that includes too many features (though the coefficients on the features that should not have been included converge to zero).

A simulation contrasting the three penalties appears after this list.

More formally: a consistent estimator is a rule for computing estimates of a parameter θ with the property that, as the number of data points n increases indefinitely, the resulting sequence of estimates converges in probability to θ (https://en.wikipedia.org/wiki/Consistent_estimator). In particular, a sequence of estimators whose bias and variance both converge to zero converges in probability to θ and is therefore consistent.
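A minimal Monte Carlo sketch of the consistency claims, under assumptions not spelled out in the notes: an orthonormal design with known sigma^2 = 1, where each of the three criteria reduces to keeping feature j whenever its squared z-statistic exceeds the per-parameter penalty (2 for AIC, log n for BIC, 2 log p for RIC). The coefficient values, p, and the grid of n are illustrative only.

```python
# Sketch: P(selected model == true model) for AIC/BIC/RIC penalties,
# in an orthonormal-design regression with known sigma = 1, where a
# criterion with per-parameter penalty lam keeps feature j iff z_j^2 > lam.
import numpy as np

rng = np.random.default_rng(0)
p, n_true = 10, 3                 # p candidate features, 3 truly nonzero (illustrative)
beta = np.zeros(p)
beta[:n_true] = 0.5               # true coefficients (illustrative)
reps = 5000

def prob_correct_model(penalty, n):
    """Monte Carlo estimate of P(selected set equals the true set)."""
    hits = 0
    for _ in range(reps):
        # Under an orthonormal design with sigma = 1, z_j ~ N(sqrt(n) * beta_j, 1)
        z = rng.normal(np.sqrt(n) * beta, 1.0)
        selected = z**2 > penalty
        hits += np.array_equal(selected, beta != 0)
    return hits / reps

for n in [50, 500, 5000, 50000]:
    aic = prob_correct_model(2.0, n)             # AIC: penalty 2 per parameter
    bic = prob_correct_model(np.log(n), n)       # BIC: penalty log(n) per parameter
    ric = prob_correct_model(2 * np.log(p), n)   # RIC: penalty 2*log(p) per parameter
    print(f"n={n:6d}  P(correct): AIC={aic:.3f}  BIC={bic:.3f}  RIC={ric:.3f}")
```

With p fixed, the BIC column climbs toward 1 as n grows, while the AIC and RIC columns plateau below 1: their penalties do not grow with n, so each spurious feature is admitted with a fixed positive probability, and RIC's plateau is higher than AIC's because its penalty 2 log p is larger than 2.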