Theory and Practice of Fit

The article by Douglas serves as a starting point for a discussion of the theory of fit and its practice in Rasch measurement. Although we often speak of fit as a unitary concept, there are really two underlying questions being asked when fit is discussed.

The first question concerns fit of the data to the model. If the desirable properties of Rasch measurement are to hold, then the data must approximate the model. This is important in calibrating item sets, in equating test forms, in studies of bias and in studies of the underlying definition of the variable. This question must be answered for the data as a whole, before further analyses of the data are useful. It is similar to that asked about any statistical analysis: are the specifications of the model approximated by the data under study? In Rasch measurement, many global, item, and person fit statistics have been used to address this question, including the Wright-Panchapakesan (1969) statistics.

The second question concerns the degree to which the total score that an examinee earns on a test adequately summarizes the examinee's total set of responses. This question of response fit comes later in the analysis, at the point when a decision must be made about the individual examinee based on the results of his/her particular examination performance: decisions such as admission, assigning grades, promotion, graduation, certification. A variety of person fit or appropriateness techniques have been developed to answer this question. This is not a question of the utility of the data for analysis by the measurement model, but of the meaning (validity) of the measure for the individual. It is possible, and in practice inevitable, that a favorable answer to the question of the utility of the data for analysis by a Rasch measurement model does not guarantee a favorable answer to the question concerning validity for every individual tested. No matter how hard we try to construct potentially valid tests, there will always be individuals for whose performances those tests were not valid.

To understand the first question it is necessary to understand that models are abstractions designed to bring order to observations. Real data can never fit any model perfectly. That is why simulated data must be used to develop significance values for fit indices. The relevant question is one of robustness. How robust is a set of data to violations of the model's requirements? Can the analysis extract useful information from the data? A vital property of strong models, such as the Rasch models, is that the information extracted from the data can be useful even when the data do not fit the model very well. This is because the model constructs a strong frame of reference against which the particular properties of the data are revealed. Experience has shown that data analyses guided by Rasch measurement models are quite robust to violations of the model's requirements. In particular, individual measurement disturbances seldom have tangible effects on equating or bias studies.
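The use of simulated data to develop significance values can be illustrated with a minimal sketch. It assumes a dichotomous Rasch model, a hypothetical set of nine item difficulties, and a simple unweighted mean-square of squared standardized residuals as the fit index; the function names and parameter values are illustrative and do not come from any calibration program. For brevity the generating ability is treated as known, whereas a fuller study would re-estimate it from each simulated response string.

```python
import math
import random

def rasch_prob(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def unweighted_mean_square(responses, ability, difficulties):
    """Mean of the squared standardized residuals for one response string."""
    total = 0.0
    for x, d in zip(responses, difficulties):
        p = rasch_prob(ability, d)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

def simulated_critical_value(ability, difficulties,
                             n_reps=2000, quantile=0.95, seed=0):
    """Empirical quantile of the fit index when the data fit the model."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        # Generate one model-conforming response string.
        responses = [1 if rng.random() < rasch_prob(ability, d) else 0
                     for d in difficulties]
        stats.append(unweighted_mean_square(responses, ability, difficulties))
    stats.sort()
    index = min(int(quantile * n_reps), n_reps - 1)
    return stats[index]

# Hypothetical test: nine items spread evenly from -2 to +2 logits.
difficulties = [-2.0 + 0.5 * i for i in range(9)]
critical = simulated_critical_value(0.0, difficulties)
print("simulated 95th percentile of the mean-square:", round(critical, 2))
```

An observed mean-square above the simulated value would then be treated as a signal to inspect the response string, not as proof of invalidity.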

The alert investigator can strengthen the "person-freed" item calibration property by removing from the calibration sample, a priori and without actually calculating person fit indices, individuals whose responses are likely to exhibit measurement disturbances. The BICAL item calibration program, for example, made it easy for investigators to omit extreme raw scores from item calibration, i.e., performances with scores near or below the chance level, where "guessing" might occur, or near-perfect performances, where "carelessness" might occur. The BICAL program also produced, on request, two sets of item calibrations, one with all misfitting persons (based on total weighted fit) excluded and one with them included. The alert investigator could compare these two calibrations to determine the effect, if any, of the misfitting persons on the item calibrations.

As more powerful data editors and word processors became available, these features were dropped from subsequent Rasch calibration programs. The fact remains, however, that misfit editing of the data prior to final calibration often produces more stable item calibrations.

When in doubt, run two calibrations, one including and one excluding the misfitting and/or extreme-scoring persons, and study the differences.
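One way to study the differences between two such calibrations is sketched below. The item labels, calibrations, and standard errors are hypothetical, and dividing each shift by a joint standard error is an assumption made here for illustration, not a procedure prescribed in this article. Because the two runs share most of their persons they are not independent, so the standardized shift is only a rough yardstick for deciding which items to look at first.

```python
import math

def compare_calibrations(calib_a, calib_b, se_a, se_b, threshold=2.0):
    """Compare two sets of item calibrations (in logits) item by item.

    Both sets are centered at zero first so that a difference in origin
    between the two runs is not mistaken for a shift in item difficulty.
    Returns a list of (item, shift, standardized_shift, flagged) tuples.
    """
    mean_a = sum(calib_a.values()) / len(calib_a)
    mean_b = sum(calib_b.values()) / len(calib_b)
    results = []
    for item in calib_a:
        shift = (calib_a[item] - mean_a) - (calib_b[item] - mean_b)
        joint_se = math.sqrt(se_a[item] ** 2 + se_b[item] ** 2)
        z = shift / joint_se
        results.append((item, round(shift, 3), round(z, 2), abs(z) > threshold))
    return results

# Hypothetical calibrations with all persons and with misfitting persons removed.
all_persons = {"A": -1.0, "B": -0.5, "C": 0.5, "D": 1.0}
edited      = {"A": -1.0, "B": -0.5, "C": 0.0, "D": 1.5}
se_all      = {"A": 0.15, "B": 0.15, "C": 0.15, "D": 0.15}
se_edited   = {"A": 0.15, "B": 0.15, "C": 0.15, "D": 0.15}

comparison = compare_calibrations(all_persons, edited, se_all, se_edited)
for row in comparison:
    print(row)
```

Items whose calibrations move noticeably between the two runs are the ones whose difficulty estimates depend on the misfitting or extreme-scoring persons.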

With regard to the validity of the total score or ability estimate for a person, a second set of concerns arises. Investigators often assume that the fit indices contained in calibration programs for items and persons are sufficient to guarantee the validity of the measure for the individual against all meaningful measurement disturbances. These global fit indices may provide adequate information for answering the question as to the utility of the data for analysis by the model. However, they only begin to provide the information necessary to answer the second question. Studies by Smith point to the need to use a combination of total and between fit statistics when investigating the validity of person measures or item difficulties (Smith, 1986, 1988; Smith & Hedges, 1982). The extent of care and thoroughness needed to validate person measures depends on the importance of the decision to be made with the measures.

It has been implied that the size of most testing programs makes it impractical to look closely at the validity of person measures. But recent efforts by the College Board (for PSAT and SAT tests) and the Australian Council for Educational Research (KIDMAPS for grade level achievement tests in New South Wales) show that the statistical results of person fit analysis can be expressed in terms that are accessible and useful to students and parents.

The primary tools for fit analysis in Rasch measurement have been the standardized chi-square statistics based on the work of Wright and Panchapakesan (1969) and further elaborated by Mead (1975), Wright (1977), Wright and Stone (1979), and Wright and Masters (1982). Since their inception these statistics have come under criticism from several fronts. Initial criticism was based on the fact that the squared differences between observed and predicted responses for item/person interactions were only approximately chi-square. Since, however, real data never fit any ideal model, all applications of chi-square are approximations.
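For reference, the quantities involved can be written out for dichotomous responses. The notation below is an assumption adopted for this sketch (person ability \beta_n, item difficulty \delta_i, response x_{ni}, test length L); it follows the common textbook presentation of unweighted and weighted total fit mean-squares and omits the further transformation that standardizes them.

```latex
% Model probability and standardized residual for person n on item i
\[
P_{ni} = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)},
\qquad
z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}\,(1 - P_{ni})}}
\]

% Unweighted and weighted total fit mean-squares for person n over L items
\[
u_n = \frac{1}{L} \sum_{i=1}^{L} z_{ni}^{2},
\qquad
v_n = \frac{\sum_{i=1}^{L} (x_{ni} - P_{ni})^{2}}
           {\sum_{i=1}^{L} P_{ni}\,(1 - P_{ni})}
\]
```

Each squared residual z_{ni}^2 has expectation 1 when the data fit the model, which is why the sums behave approximately, though not exactly, like chi-squares. The corresponding item statistics sum over persons instead of items.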

Later criticism was that the true distributional properties of these approximate chi-squares or their transformations were unknown. A variety of alternatives have been proposed (Andersen, 1973; Van den Wollenberg, 1982; Yen, 1981). But study and practice have shown that these other statistics offer no useful advantage over the Wright-Panchapakesan statistics. Work by Smith on the distribution of standardized residuals and the null distributions of standardized fit statistics has shown that even though these statistics are not "true" chi-squares, they are regular enough to identify outliers reliably.

The most recent suggestion for an alternative fit statistic, based on the exact probabilities of a given person response pattern (Molenaar and Hoijtink, 1990), is discussed in the Douglas paper. The Wright-Panchapakesan statistics are computationally simpler than the Molenaar-Hoijtink statistic and are highly correlated with the exact probabilistic results. In addition, they can be summarized to answer a priori hypotheses that are inaccessible with the Molenaar-Hoijtink statistic.

The Wright-Panchapakesan (WP) statistics and their derivatives have offered an efficient and practical way to evaluate fit to the Rasch measurement models for 20 years. The WP approximations stand up well in comparison with possibly more precise tests such as likelihood-ratio chi-squares and the Molenaar-Hoijtink statistic. Studies of the distributional properties of WP statistics show that the tails of their distributions are regular enough to identify outliers reliably. There is no practical reason to use anything more complicated.

Andersen, E.B. (1973) A goodness of fit test for the Rasch Model. Psychometrika, 38, 123-140.

Mead, R.J. (1975) Analysis of fit to the Rasch Model. Ph.D. dissertation. University of Chicago.

Smith, R.M. (1986) Person fit in the Rasch Model. Educational and Psychological Measurement, 46, 359-372.

Smith, R.M. & Hedges, L.V. (1982) A comparison of likelihood ratio chi-square and Pearsonian chi-square tests of fit in the Rasch model. Educational Research and Perspectives, 9, 44-54.

van den Wollenberg, A.L. (1982) Two new test statistics for the Rasch model. Psychometrika, 47, 123-140.

Wright, B.D. (1977) Solving measurement problems with the Rasch Model. Journal of Educational Measurement, 14, 97-116.

Wright, B.D. & Panchapakesan, N.A. (1969) A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23-48.

Yen, W.M. (1981) Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.

Theory and practice of fit. Smith RM. … Rasch Measurement Transactions, 1990, 3:4 p.78
