Consistency of AIC, BIC, RIC

A model selection criterion is consistent if the probability of selecting the true model approaches 1 as n goes to infinity.

AIC is not consistent.
- AIC effectively has a prior that each feature enters the model with probability 1/2, so with positive probability it chooses a model containing features that should not be there.
- (Note that it is consistent in the weaker sense that its predictions converge to those of an unbiased estimator, so if you only care about prediction accuracy it behaves consistently.)

BIC is consistent.
- BIC effectively puts the prior probability of a variable being useful at 1/sqrt(n).
- As n goes to infinity (with p fixed), it chooses the right model: features are incorporated based on the cost of coding them to the accuracy dictated by the irreducible uncertainty.

RIC is not consistent.
- RIC effectively has a prior that each feature enters the model with probability 1/p.
- As n goes to infinity, this is less conservative than BIC (but only for n > p^2).
- Like AIC, with an infinite number of observations it yields an unbiased estimator, but one that includes too many features (though the coefficients on the features that should not have been included converge to zero).

A simulation contrasting the three penalties appears after this list.

More formally: a consistent estimator is a rule for computing estimates of a parameter θ with the property that, as the number of data points n increases indefinitely, the resulting sequence of estimates converges in probability to θ (https://en.wikipedia.org/wiki/Consistent_estimator). In particular, a sequence of estimators whose bias and variance both converge to zero converges in probability to θ and is therefore consistent.
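A minimal Monte Carlo sketch of the consistency claims, under assumptions not spelled out in the notes: an orthonormal design with known sigma^2 = 1, where each of the three criteria reduces to keeping feature j whenever its squared z-statistic exceeds the per-parameter penalty (2 for AIC, log n for BIC, 2 log p for RIC). The coefficient values, p, and the grid of n are illustrative only.

```python
# Sketch: P(selected model == true model) for AIC/BIC/RIC penalties,
# in an orthonormal-design regression with known sigma = 1, where a
# criterion with per-parameter penalty lam keeps feature j iff z_j^2 > lam.
import numpy as np

rng = np.random.default_rng(0)
p, n_true = 10, 3                 # p candidate features, 3 truly nonzero (illustrative)
beta = np.zeros(p)
beta[:n_true] = 0.5               # true coefficients (illustrative)
reps = 5000

def prob_correct_model(penalty, n):
    """Monte Carlo estimate of P(selected set equals the true set)."""
    hits = 0
    for _ in range(reps):
        # Under an orthonormal design with sigma = 1, z_j ~ N(sqrt(n) * beta_j, 1)
        z = rng.normal(np.sqrt(n) * beta, 1.0)
        selected = z**2 > penalty
        hits += np.array_equal(selected, beta != 0)
    return hits / reps

for n in [50, 500, 5000, 50000]:
    aic = prob_correct_model(2.0, n)             # AIC: penalty 2 per parameter
    bic = prob_correct_model(np.log(n), n)       # BIC: penalty log(n) per parameter
    ric = prob_correct_model(2 * np.log(p), n)   # RIC: penalty 2*log(p) per parameter
    print(f"n={n:6d}  P(correct): AIC={aic:.3f}  BIC={bic:.3f}  RIC={ric:.3f}")
```

With p fixed, the BIC column climbs toward 1 as n grows, while the AIC and RIC columns plateau below 1: their penalties do not grow with n, so each spurious feature is admitted with a fixed positive probability, and RIC's plateau is higher than AIC's because its penalty 2 log p is larger than 2.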