Students often stumble over the meaning of "overfit", which Rasch program output signals with negative t-statistics and mean-squares less than 1.0 (chi-squares less than their degrees of freedom, d.f.). This aspect of Rasch measurement sounds strange to analysts familiar with conventional descriptive statistics and model-building.
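The parenthetical equivalence between "mean-square less than 1.0" and "chi-square less than its d.f." can be made explicit. Below is one common formulation, the unweighted (outfit) mean-square for person n across L dichotomous items, with ability B_n and item difficulty D_i. The notation is ours rather than the article's, and approximating the d.f. by the number of observations L is a simplifying assumption.

```latex
P_{ni} = \frac{\exp(B_n - D_i)}{1 + \exp(B_n - D_i)}, \qquad
z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}\,(1 - P_{ni})}}

\chi^2_n = \sum_{i=1}^{L} z_{ni}^2, \qquad
\text{MnSq}_n = \frac{\chi^2_n}{\text{d.f.}} \approx \frac{\chi^2_n}{L}
```

When the data fit the model, each squared standardized residual z² has expected value 1, so the mean-square is expected to be near 1.0. A mean-square below 1.0 therefore says the same thing as a chi-square below its d.f.: the responses sit closer to their expectations than the model's randomness predicts.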
The nearest thing to overfit in conventional statistical analysis (which fits models to data) is over-parameterization. But statistics courses usually spend little time on this problem, so students come to believe that "the better the fit, the better the model!" In building statistical models, they may be taught to choose between better- and worse-fitting models, but they are rarely told to reject a model because its fit is "too good".
A demonstration of over-parameterization is to put some points on an x-y plot, then fit a series of polynomials with more and more terms to those points. It becomes obvious that too many terms actually make prediction of a new point (especially one extrapolated outside the range of the current points) worse, not better.
Here is an example. The observations for 6 time-points are plotted. What is our prediction for the next time-point? The 4 trend lines are polynomials fitted to the observations. The higher the order of the polynomial, the better its fit to the existing points, but, beyond the quadratic, the worse its prediction of the next point.
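The polynomial demonstration described above can be reproduced with a short Python sketch. The six observations are hypothetical (a straight-line trend plus small fixed deviations), not the data in the original plot, and `numpy.polyfit` simply stands in for whatever curve-fitting tool drew the trend lines. As the degree rises, the residual sum of squares can only shrink, until the degree-5 polynomial passes through all six points exactly; yet its extrapolation to the seventh time-point lands far from the underlying trend.

```python
import numpy as np

# Hypothetical data (not the plotted observations): a straight-line trend
# y = 2 + 0.5*t plus small fixed deviations at six time-points.
t = np.arange(1, 7, dtype=float)
deviations = np.array([0.3, -0.2, 0.1, -0.3, 0.2, -0.1])
y = 2.0 + 0.5 * t + deviations

t_next = 7.0
y_next_trend = 2.0 + 0.5 * t_next  # the trend value we would hope to predict


def fit_and_predict(degree):
    """Least-squares polynomial fit; return (residual SS, prediction at t_next)."""
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    return float(np.sum(residuals ** 2)), float(np.polyval(coeffs, t_next))


results = {d: fit_and_predict(d) for d in range(1, 6)}
for d, (rss, pred) in results.items():
    print(f"degree {d}: residual SS = {rss:.4f}, "
          f"prediction at t=7 = {pred:.2f} (error {abs(pred - y_next_trend):.2f})")
```

With these particular numbers, the straight line predicts about 5.38 at t = 7, close to the trend value of 5.5, while the degree-5 polynomial, which fits the six points perfectly, predicts about -7.1.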
In Rasch analysis, we fit data to models, so the Rasch equivalent of overfit due to over-parameterization is overfit due to redundancy (over-predictability) in the data.
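Redundancy of this kind can be illustrated with a minimal Python sketch of the dichotomous Rasch model, using one familiar source of overfit: a perfect Guttman pattern, where every response is predictable from the item ordering. The item difficulties and the ability value are hypothetical, the ability is assumed rather than estimated from the responses, and the d.f. is approximated by the number of responses. The Guttman string produces an outfit mean-square well below 1.0 (overfit), while a string that reverses the expected order produces one well above 1.0 (underfit).

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def outfit_mean_square(responses, theta, difficulties):
    """Unweighted (outfit) mean-square: chi-square of standardized residuals divided by d.f.

    d.f. is approximated here by the number of responses (a simplifying assumption).
    """
    chi_square = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        z = (x - p) / math.sqrt(p * (1.0 - p))  # standardized residual
        chi_square += z * z
    return chi_square / len(responses)


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical items, easiest first
theta = 0.0  # hypothetical ability, assumed rather than estimated

guttman = [1, 1, 1, 0, 0]     # succeeds on every easy item, fails every hard one
unexpected = [0, 0, 1, 1, 1]  # fails the easy items, succeeds on the hard ones

print("Guttman pattern:   ", round(outfit_mean_square(guttman, theta, difficulties), 2))     # prints 0.4
print("Unexpected pattern:", round(outfit_mean_square(unexpected, theta, difficulties), 2))  # prints 4.24
```

The Guttman string is "too good": knowing the item order and the raw score, every response could have been predicted, so the residuals are smaller than the model's built-in randomness allows.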
Based on a comment by Steve Walter, Graduate Institute of Applied Linguistics.
Conceptualizing Overfit. S. Walter, Rasch Measurement Transactions, 2008, 22:2, p. 1165
The URL of this page is www.rasch.org/rmt/rmt222b.htm