Where Does Misfit Begin?

The Rasch model has the delightful feature that the probability of the occurrence of any particular set of observations can be determined precisely from the latent parameters. Furthermore, no set of observations is impossible, though many are highly improbable. Every conceivable set of observations has some non-zero probability and so could occur. What is surprising is that the probability of observing even the most likely set of observations can be rather small. None of this is news to those who have scrutinized simulated sets of observations. But how many have?
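To make the point about the most likely set of observations concrete, here is a minimal sketch for a single person taking a dichotomous test. The 21 item difficulties, the person ability of 0 logits, and the function names are illustrative assumptions, not values from any real analysis.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success on a dichotomous item under the Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def most_likely_string_probability(ability, difficulties):
    """Probability of the single most probable response string.

    For each item the more probable outcome (success or failure) is taken,
    and the item probabilities are multiplied (local independence).
    """
    prob = 1.0
    for d in difficulties:
        p = rasch_probability(ability, d)
        prob *= max(p, 1.0 - p)
    return prob

# Hypothetical test: 21 items evenly spaced from -2 to +2 logits.
difficulties = [-2.0 + 0.2 * i for i in range(21)]
ability = 0.0  # hypothetical person located at the center of the test

best = most_likely_string_probability(ability, difficulties)
print(f"Number of possible response strings: {2 ** len(difficulties)}")
print(f"Probability of the most likely string: {best:.6f}")
```

Under these assumptions even the best-fitting response string has a probability well below 1%, simply because it is one of more than two million possible strings.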

The procedural question is to decide the point at which the Rasch-model probability of a set of observations is so small that it is no longer reasonable to consider such a set of observations as the outcome of a Rasch measurement process. The field-tested solution is to calculate fit statistics which reflect, in some useful way, the improbability of a set (or sub-set) of observations. For example, a demonstrably useful, but approximate, fit statistic, based on standardized mean-square residuals (Wright and Panchapakesan 1969), may indicate that a particular set of observations, or a set less probable than it in the same way, can be expected to occur by chance only once in 100 times.
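As a rough illustration of how a residual-based fit statistic reflects improbability, the sketch below computes an unweighted mean of squared standardized residuals for one response string. It is written in the spirit of the statistic cited above, not as a reproduction of it: the further step of converting the mean-square to a probability is omitted, the ability is treated as known rather than estimated, and the item difficulties and response strings are invented for the example.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success on a dichotomous item under the Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def mean_square_residual(responses, ability, difficulties):
    """Unweighted mean of squared standardized residuals for one response string."""
    total = 0.0
    for x, d in zip(responses, difficulties):
        p = rasch_probability(ability, d)
        z = (x - p) / math.sqrt(p * (1.0 - p))  # standardized residual
        total += z * z
    return total / len(responses)

# Hypothetical five-item test, easiest item first.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability = 0.0  # assumed known for this sketch

expected_pattern = [1, 1, 1, 0, 0]  # succeeds on easy items, fails on hard ones
reversed_pattern = [0, 0, 1, 1, 1]  # fails on easy items, succeeds on hard ones

ms_expected = mean_square_residual(expected_pattern, ability, difficulties)
ms_reversed = mean_square_residual(reversed_pattern, ability, difficulties)
print(f"mean-square, expected pattern: {ms_expected:.2f}")
print(f"mean-square, reversed pattern: {ms_reversed:.2f}")
```

The model expects a mean-square near 1. The orderly string falls below that value and the reversed string far above it, which is the sense in which the statistic tracks improbability.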

Some purists attack this interpretation of the commonly-used fit statistics. Their position is that such a probability calculation can only be derived from a theoretical distribution and is only true when the distribution of the underlying parameters is of a particular ideal nature. Since this can never happen in practice, "all bets are off". Accordingly, they say, we cannot rely on any calculated probability, but we must adjust it in some way. Their argument may seem appealing, but it is naive. No observed statistic can ever have exactly the ideal distribution through which it is evaluated. But even were the probability calculations to be successfully adjusted, a more important problem would remain.

Suppose a test is given to 5,000 children. The misfit probability significance level is arbitrarily (there being no other way) chosen as .01. On inspection, it turns out that 50 of the children's fit statistics are at the .01 level or smaller. From an overall perspective the data clearly fit, since, on theoretical grounds, we expect 1% of the children's response strings to be at or below the .01 level and 1% are. Nevertheless, in any real testing situation, we are concerned not only about the sample, but also about the individual. We want the data to fit for each individual, i.e., "better" than sample theory requires. We really don't want any particular child's fit statistic to be at or below the .01 level. Consequently, we question and perhaps eliminate from the analysis the response strings of the 50 children. The hoped-for result is that all alarming misfit will be eliminated from the data set.

The fallacy in this approach is immediately apparent when we eliminate these 50 children and then reanalyze the data set. The effect of eliminating these children is, almost always, to produce an overfit of the data to the model. A consequence of the removal of the individual outliers is to alter the structure of the very randomness in the data which is intrinsic to Rasch model estimation. On re-estimation, this causes the generation of a revised set of estimates (assuming the parameter estimates are not pre-set or "anchored") together with new fit statistics. Again, we see, alas, that around 50 of the remaining children have fit statistics at or below the .01 level. If the "estimate - trim misfits - re-estimate - trim misfits" process is repeated mindlessly, the observations of all 5,000 children will eventually be eliminated. This kind of "inability to eliminate misfit" has caused considerable anxiety among practitioners.
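The arithmetic of mindless trimming can be sketched as follows. This is only a back-of-envelope model: it assumes that each re-estimation flags exactly 1% of the children who remain (the text says "around" 1%), it tracks the expected count rather than whole children, and it does not model the re-estimation itself.

```python
def rounds_until_empty(sample_size, alpha):
    """Count trim-and-re-estimate rounds until fewer than one child is expected to remain.

    Assumes a fraction `alpha` of the remaining sample is flagged and removed each round.
    """
    remaining = float(sample_size)
    rounds = 0
    while remaining >= 1.0:
        remaining *= (1.0 - alpha)
        rounds += 1
    return rounds

rounds = rounds_until_empty(5000, 0.01)
print(f"Rounds of trimming before the sample is exhausted: {rounds}")
```

Under these assumptions the sample shrinks geometrically, so it takes several hundred rounds to exhaust it, but no round ever ends with a data set free of flagged children.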

Let us look at the problem of misfitting response strings from a different vantage point: what are the odds of the occurrence of a data set, based on a realistic test length, in which none of the 5,000 children has a fit statistic at or below the .01 level? You can do the calculation for yourself, but I make it to be less than 1 in 1,000,000,000,000,000,000,000. In other words, a data set in which there are no misfits is so unlikely as to be a misfit itself!
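For readers who would like to do that calculation, here is one simple way to approach it. The sketch assumes that the children's fit statistics are independent and that each has exactly a .01 chance of falling at or below the .01 level when the data fit the model. Both are idealizations, and the variable names are illustrative.

```python
n_children = 5000
alpha = 0.01  # chance that one child's fit statistic is at or below the .01 level

# Probability that not a single child is flagged, under independence.
p_no_misfit = (1.0 - alpha) ** n_children

# Number of children expected to be flagged by chance alone.
expected_flagged = n_children * alpha

print(f"Expected number of flagged children: {expected_flagged:.0f}")
print(f"Probability that no child is flagged: {p_no_misfit:.2e}")
print(f"Less than 1 in 10^21? {p_no_misfit < 1e-21}")
```

Under these assumptions the probability of a misfit-free data set is 0.99 raised to the 5,000th power, which is indeed smaller than 1 in 1,000,000,000,000,000,000,000.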

So, a mechanical, mathematical approach is doomed to failure. Either we choose a conventional limit, such as .01, and are in danger of discovering, in the course of successive analyses, that none of our data fit the model, or we choose a misfit cut-off limit that is so remote that all data are decreed to fit the model. In practice, of course, we compromise. A common method of "saving face" is to neglect to report the fit statistics obtained on re-estimation.

My proposal for this dilemma is to face the facts! The fit statistics are indicative, not absolute. The decision as to whether a set of data fits the model is a matter for the informed judgment of the analyst, based on the details of a measurable performance by a member of the test sample on the protocol, rather than a matter of some arbitrary cut-off rule applied to a column of numbers on a computer print-out.

"Whoa!" some practitioners cry, "I have to make fit decisions about strings of responses relating to test protocols and students about whom I know nothing. I have no idea what a measurable string of responses looks like. I have no choice but to use the numerical values of the fit statistics as they stand". In this case, it would seem that an arbitrary criterion has to be chosen, and there is no way of knowing, a priori, how closely that criterion aligns with a cut-off chosen on the basis of measurableness of performance. In this case, the choice of the criterion is a decision made apart from the measurement properties of the Rasch model, statistical sophistry notwithstanding.

Based on informed judgment, how, then, do we deal with the 50 "misfitting" children? First, arrange them in order of improbability of response strings, which is approximated by the size of the misfit statistic. Then examine the substantive details of the most unlikely response string. Does it appear to be the outcome of a valid measurement process? If it does, we may care to examine the next most unlikely child's behavior, but we can expect it to appear even more measurable. If this is what happens, none of the children need be eliminated; the data fit the model. On the other hand, if a worst-fitting set of responses does not appear to be in accord with uni-dimensional measurement, then eliminate all, or at least the contradictory part, of that child's responses and continue on to the next child, until the criterion of measurableness is met. If there are many apparently misfitting children, we could stratify them into layers of improbability in order to expedite determining the measurable performance threshold. With experience gained over similar protocols and samples, a close correspondence between the measurableness criterion and some values of fit statistics may become clear, but these must be expected to be entirely local to the situation.
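The bookkeeping of this review procedure might be sketched as below. The substantive decision remains with the analyst: the `looks_measurable` callback is a hypothetical stand-in for that judgment, and the child identifiers and misfit values are invented for illustration. The sketch removes whole response strings only and does not cover eliminating just the contradictory part of a string.

```python
def review_misfits(flagged, looks_measurable):
    """Review flagged children from most to least improbable.

    flagged: list of (child_id, misfit_statistic) pairs; larger means more improbable.
    looks_measurable: callable taking a child_id and returning the analyst's
        judgment (True if the response string appears to be a valid measurement).
    Returns the ids judged unmeasurable, in the order they were reviewed.
    """
    ordered = sorted(flagged, key=lambda pair: pair[1], reverse=True)
    removed = []
    for child_id, _statistic in ordered:
        if looks_measurable(child_id):
            # Less improbable strings are expected to look even more measurable,
            # so the review stops at the first acceptable performance.
            break
        removed.append(child_id)
    return removed

# Hypothetical flagged children and hypothetical analyst judgments.
flagged = [("A", 2.6), ("B", 4.1), ("C", 3.0)]
judgments = {"A": True, "B": False, "C": True}

removed = review_misfits(flagged, lambda child_id: judgments[child_id])
print(removed)
```

In this invented case only child B, the most improbable, is set aside. Child C is the first to look measurable, so the review stops there and child A is never examined.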

Once unmeasurable performances have been removed, re-estimation will again produce around 1% of the fit statistics at or below the .01 level. But now this is no longer a cause for concern, since, once we determine where misfit begins, we also know where it ends. Such apparent misfit now confirms the stochastic nature of the data - that they do, indeed, fit the model.

Where does misfit begin? Linacre JM. … Rasch Measurement Transactions, 1990, 3:4 p.80

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
