Functional Assessment of Fit

My perspective on evaluating the fit of data to the Rasch model was formed when I worked on a "functional assessment questionnaire" (FAQ) at the Hines Veterans Hospital (Becker et al., 1985; Schulz et al., 1985; Schulz, 1987). The FAQ was designed to assess the functioning of visually impaired veterans. Trainees of the Hines Blind Rehabilitation Center were assessed three times with the FAQ: immediately before, immediately after, and approximately six months after rehabilitation training. Rasch measures of trainees' functioning were used for treatment planning, needs assessment, and program evaluation.

One of my tasks was to convince rehabilitation workers that a Rasch measure of SKILL could be useful in treatment planning. I had to explain the "fit" of data to the Rasch model without relying on statistical abstractions, e.g., the variance of the standardized infit statistic across items or persons. This was a valuable experience for me. It opened my eyes to some inadequacies in, and misunderstandings of, summary indices of the fit of data to the Rasch model.

The Rasch measure of SKILL was constructed from the question "How hard is it for you to ...?" There were thirty-six skills. In the Table, these skills are represented by the four-letter acronyms in column one. Response choices were "easy", "moderate", and "hard". Moderate and hard were rated 0, and easy was rated 1. Skill calibrations, in logits, are shown in column two. They range from 4.03 for the most difficult skill, "reading braille" (RBRA), to -2.15 for the easiest skill, "dressing yourself" (DRES).

I gave rehabilitation workers examples of fit like those in the right-hand columns of the Table. These columns show the rated responses of three trainees who have the same level of SKILL, -.12 logits, but different levels of fit as indexed by the standardized infit statistic: high (-2.11), average (-0.04), and low (1.77). The central columns show the difficulty of each skill relative to these particular trainees, in logits, and the Rasch probability that these trainees will be rated '1' on each skill. Unexpected ratings, i.e., those with a probability below 0.5, are enclosed in parentheses, e.g., (0). The Table relates infit statistics to response patterns: to the number, improbability, and content of surprising ratings, rather than to the normal distribution or to the probability of the infit statistics. Practical examples like this help practitioners develop a functional understanding of person fit and of the level of fit needed to support various uses of the Rasch scale.
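
In the dichotomous Rasch model, the probability that a trainee with measure b is rated 1 on a skill with calibration d is P = exp(b - d) / (1 + exp(b - d)) = 1 / (1 + exp(d - b)), so the central columns of the Table follow from b and d alone. The Python sketch below recomputes them and marks unexpected ratings in parentheses, assuming the calibrations and trainee #76's ratings as transcribed from the Table and the Table's measure b = -0.12. The names `NAMES`, `D`, `TRAINEE_76`, `p_easy`, and `mark_unexpected` are illustrative, not taken from any Rasch program.

```python
import math

# Skill acronyms and SKILL calibrations d (logits), in the Table's order.
NAMES = ("RBRA WBRA READ GROC TYPE UNFA MEAS WRIT PWRT TOOL TRAN OVEN STRE MATC "
         "TAXI ELEV STOV CUPB POUR SIGN VACU WASH CLIP BLOK MBED MEAT TAPE STAI "
         "TABL DISH MEDI TELE SHAV FAMI HAIR DRES").split()
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65, -0.65,
     -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72, -1.72,
     -1.95, -2.15, -2.15]

# Trainee 76's ratings (1 = "easy", 0 = "moderate" or "hard"), same order.
TRAINEE_76 = [int(c) for c in "00001010000010100" "0110101101011111111"]


def p_easy(b, d):
    """Rasch probability that a trainee at b logits is rated 1 on a skill at d logits."""
    return 1.0 / (1.0 + math.exp(d - b))


def mark_unexpected(b, ratings):
    """Return ratings as strings, wrapping those with probability below 0.5 in parentheses."""
    marked = []
    for d, x in zip(D, ratings):
        p = p_easy(b, d)
        p_observed = p if x == 1 else 1.0 - p  # probability of the rating actually given
        marked.append(f"({x})" if p_observed < 0.5 else str(x))
    return marked


b = -0.12
for name, d, mark in zip(NAMES, D, mark_unexpected(b, TRAINEE_76)):
    print(f"{name}\t{d - b:5.2f}\t{p_easy(b, d):.3f}\t{mark}")
```

Because the calibrations are rounded to two decimals, recomputed probabilities may differ from the printed ones in the third decimal.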

This presentation also recognizes that practitioners are concerned with the fit of data to the Rasch model on a case-by-case and item-by-item basis. If a trainee fits the measurement scale, his measure is considered valid, regardless of the global fit of data to the scale. If a trainee's fit to the scale is poor, workers should question his ratings, regardless of how well other data fit. Overfit, i.e., a negative standardized infit statistic, is not a clinical problem for these practitioners.

Because they have to establish each person's need for training in each skill, rehabilitation staff are concerned with the validity of inferences about each person-item encounter. For this purpose, the probability of each person-item rating is useful information. Improbable ratings suggest something unique about the trainee and flag the need for a more detailed assessment. We consider a rating improbable when its probability is less than some critical value, say, .30. Trainee #76 has three such ratings: he was rated 0 on using stairs (STAI, probability .259), 1 on using a typewriter (TYPE, probability .131), and 1 on measuring length (MEAS, probability .209). The overall infit statistic of trainee #76 is acceptable (-0.04), yet these are three ratings we might want to investigate further.
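
The flagging rule can be sketched in a few lines of Python. The sketch assumes the Table's calibrations, trainee #76's ratings as transcribed from the Table, b = -0.12, and the illustrative critical value .30; the function name `improbable_ratings` is hypothetical.

```python
import math

NAMES = ("RBRA WBRA READ GROC TYPE UNFA MEAS WRIT PWRT TOOL TRAN OVEN STRE MATC "
         "TAXI ELEV STOV CUPB POUR SIGN VACU WASH CLIP BLOK MBED MEAT TAPE STAI "
         "TABL DISH MEDI TELE SHAV FAMI HAIR DRES").split()
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65, -0.65,
     -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72, -1.72,
     -1.95, -2.15, -2.15]
TRAINEE_76 = [int(c) for c in "00001010000010100" "0110101101011111111"]


def improbable_ratings(b, ratings, critical=0.30):
    """Return (skill, rating, probability of that rating) for each rating below critical."""
    flagged = []
    for name, d, x in zip(NAMES, D, ratings):
        p_one = 1 / (1 + math.exp(d - b))          # model probability of a rating of 1
        p_observed = p_one if x == 1 else 1 - p_one
        if p_observed < critical:
            flagged.append((name, x, round(p_observed, 3)))
    return flagged


for name, x, p in improbable_ratings(-0.12, TRAINEE_76):
    print(f"{name}: rated {x}, probability {p}")
```

For trainee #76 this lists TYPE, MEAS, and STAI, the three ratings named above.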

From a theoretical standpoint, we can accept a few improbable ratings in the data as random in nature and hence of no particular significance for the validity of inferences. We expect some improbable ratings, given our assumptions about the probability distribution of raw scores as a function of an underlying continuous variable. We recognize, however, that there may be other, systematic effects in the data, such as interviewer effects, and that these effects are more likely to be revealed by detailed systematic investigation than by summary fit statistics. Therefore, in practical situations like treatment planning, an improbable rating warrants investigation even when our summary fit statistics say the data fit the model.
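
Under the model's own assumptions, improbable ratings are not rare. The sketch below, assuming the Table's calibrations, a trainee at b = -0.12, and the illustrative .30 threshold, adds up the probability of each skill's improbable rating to give the number expected from a trainee whose data fit the Rasch model perfectly. The function name `expected_improbable` is hypothetical.

```python
import math

D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65, -0.65,
     -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72, -1.72,
     -1.95, -2.15, -2.15]


def expected_improbable(b, d_values, critical=0.30):
    """Expected count of ratings with probability < critical if the data fit exactly."""
    total = 0.0
    for d in d_values:
        p = 1 / (1 + math.exp(d - b))
        # With critical < 0.5, at most one of the two ratings can be improbable;
        # it occurs with its own model probability.
        for p_rating in (p, 1 - p):
            if p_rating < critical:
                total += p_rating
    return total


print(f"Expected improbable ratings at b = -0.12: {expected_improbable(-0.12, D):.2f}")
```

With these calibrations the expectation comes out between three and four, so trainee #76's three improbable ratings are about what the model itself predicts; the model alone cannot say which, if any, of them reflects a systematic effect.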

Summary fit indices can sometimes be of doubtful value. Take, for example, the variance of standardized Rasch infit statistics across items or persons. Experienced users know that the variance of these statistics across items is a function of the number of persons taking the test. One can always show that data do not fit the Rasch model if one is willing to collect enough data to increase the power of the statistical test. Why? Because no real data fit any mathematical model exactly. This is true even of the most popular model in statistics, the normal distribution: no data from the real world are exactly normally distributed.
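
The dependence on sample size can be shown without any data. The sketch below assumes the Wright and Masters (1982) expressions for the model standard deviation q of the infit mean square, q² = Σ(C - W²) / (ΣW)², and its cube-root (Wilson-Hilferty) standardization, t = (v^(1/3) - 1)(3/q) + q/3. It also assumes a hypothetical item at 0 logits and hypothetical samples of persons spread evenly from -2 to +2 logits, and it holds the infit mean square fixed at a hypothetical 1.2 while only the number of persons grows.

```python
import math


def infit_q(probs):
    """Model SD of the infit mean square for a dichotomous item."""
    w = [p * (1 - p) for p in probs]              # model variances W
    spread = sum(wi * (1 - 4 * wi) for wi in w)   # sum of C - W^2 for a 0/1 rating
    return math.sqrt(spread) / sum(w)


def infit_zstd(mnsq, q):
    """Cube-root (Wilson-Hilferty) standardization of an infit mean square."""
    return (mnsq ** (1 / 3) - 1) * (3 / q) + q / 3


def person_probs(n, d=0.0):
    """Hypothetical sample: n persons spread evenly from -2 to +2 logits."""
    measures = [-2 + 4 * i / (n - 1) for i in range(n)]
    return [1 / (1 + math.exp(d - b)) for b in measures]


MNSQ = 1.2  # hypothetical: 20% more noise than modeled, at every sample size
for n in (25, 100, 400, 1600):
    q = infit_q(person_probs(n))
    print(f"N={n:5d}  INFIT MNSQ={MNSQ:.2f}  ZSTD={infit_zstd(MNSQ, q):5.2f}")
```

The mean square, and hence the practical size of the misfit, is the same in every line; only the standardized value grows, roughly with the square root of N.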

I question the utility of asking "Do these data fit the Rasch model?" (or "Does the Rasch model fit these data?"), especially when one has chosen the model a priori, as many do with the Rasch model because it implements fundamental standards for measurement. What we need is more information on the degree to which misfit matters in the various uses of Rasch measures.

We also need indices that are more or less free of the effects of sample size. In my experience with Rasch analysis, the mean value of the weighted mean square residual (INFIT MEANSQ) for a given item does not vary with sample size. Formulas for the unweighted and weighted mean square residuals are given in Wright and Masters (1982, p. 100; see also RMT p. 81). The effects of sample size are introduced when the weighted mean square residual is standardized to mean 0 and variance 1, i.e., reported as INFIT ZSTD.
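
For a person, the weighted mean square is the sum of squared residuals divided by the sum of model variances, Σ(x - P)² / ΣP(1 - P). The sketch below computes it, with a cube-root (Wilson-Hilferty) standardization, for the three trainees in the Table. It assumes the Wright and Masters (1982) formulas, the rounded calibrations and measure printed in the Table, and ratings transcribed from the Table; `person_infit` is a hypothetical name. Because of the rounding, and because the computational details of the original program are not reported here, the output should be read as an approximation of the Table's infit values, not a reproduction.

```python
import math

D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65, -0.65,
     -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72, -1.72,
     -1.95, -2.15, -2.15]

# Ratings of the three trainees, in the Table's skill order.
RATINGS = {
    111: "00000000000000101" "1101101111011111111",
    76: "00001010000010100" "0110101101011111111",
    47: "00100000001100011" "1111110110111100100",
}


def person_infit(b, d_values, ratings):
    """Infit mean square and its cube-root standardization for one person."""
    p = [1 / (1 + math.exp(d - b)) for d in d_values]
    w = [pi * (1 - pi) for pi in p]                       # model variances
    x = [int(c) for c in ratings]
    mnsq = sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / sum(w)
    q = math.sqrt(sum(wi * (1 - 4 * wi) for wi in w)) / sum(w)  # model SD of mnsq
    zstd = (mnsq ** (1 / 3) - 1) * (3 / q) + q / 3
    return mnsq, zstd


for trainee, pattern in RATINGS.items():
    mnsq, zstd = person_infit(-0.12, D, pattern)
    print(f"Trainee {trainee}: INFIT MNSQ={mnsq:.2f}  ZSTD={zstd:+.2f}")
```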

I would also like to see Rasch computer programs that allow the user to treat data as missing when the ratings involve mismatched person-item encounters. For example, standard paper-and-pencil tests typically require respondents to take items that are too easy or too hard for them. Guessing and carelessness then become significant factors in test performance. The influence of these factors on Rasch estimation could be avoided if person measures and item calibrations were based only on data from person-item encounters within a reasonable range (scale distance) of the person or item. The loss of some data would increase the standard errors of person measures and item calibrations, but I doubt that the standard errors would increase as much as they would if one were to introduce parameters for guessing (and other noise). I am using data from the Iowa Tests of Basic Skills to study this issue with a version of MFORMS (Wright and Schulz, 1988) that allows one to reject mismatched person-item data on the basis of the person and item scale values. [This feature has been implemented in BIGSTEPS.]
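
The sketch below shows one way the idea could work; it is not the MFORMS or BIGSTEPS implementation. Item calibrations are held fixed, the person measure is estimated by maximum likelihood (Newton-Raphson), encounters more than a hypothetical 2 logits from that estimate are treated as missing, and the measure is re-estimated. The function names and the 2-logit cutoff are illustrative assumptions; the data are trainee #47's ratings as transcribed from the Table.

```python
import math

D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65, -0.65,
     -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72, -1.72,
     -1.95, -2.15, -2.15]
TRAINEE_47 = [int(c) for c in "00100000001100011" "1111110110111100100"]


def estimate_measure(d_values, ratings, b=0.0, tol=1e-6, max_iter=100):
    """Maximum-likelihood person measure, item calibrations held fixed (Newton-Raphson)."""
    score = sum(ratings)
    if score in (0, len(ratings)):
        raise ValueError("extreme score: no finite maximum-likelihood estimate")
    for _ in range(max_iter):
        p = [1 / (1 + math.exp(d - b)) for d in d_values]
        info = sum(pi * (1 - pi) for pi in p)
        step = (score - sum(p)) / info   # observed minus expected score, scaled
        b += step
        if abs(step) < tol:
            break
    p = [1 / (1 + math.exp(d - b)) for d in d_values]
    info = sum(pi * (1 - pi) for pi in p)
    return b, 1 / math.sqrt(info)        # measure and its model standard error


def trim_mismatched(d_values, ratings, b, max_distance=2.0):
    """Treat encounters farther than max_distance logits from b as missing."""
    kept = [(d, x) for d, x in zip(d_values, ratings) if abs(d - b) <= max_distance]
    return [d for d, _ in kept], [x for _, x in kept]


b_all, se_all = estimate_measure(D, TRAINEE_47)
d_kept, x_kept = trim_mismatched(D, TRAINEE_47, b_all)
b_trim, se_trim = estimate_measure(d_kept, x_kept, b=b_all)
print(f"all {len(D)} items:   b={b_all:+.2f}  SE={se_all:.2f}")
print(f"{len(d_kept)} items kept: b={b_trim:+.2f}  SE={se_trim:.2f}")
```

For trainee #47, the trimmed data drop the surprising (1) on READ as well as the (0)s on HAIR and DRES, and the standard error grows only slightly, in line with the expectation stated above.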

In sum, I think we need to deal more constructively with the fit of data to the Rasch model. We encourage naive and unfair expectations about the Rasch model when we look only at summary indices of fit that are sensitive to sample size, like the standardized infit statistic. We can never prove that real data fit the Rasch model or any model. Real data do not fit mathematical models. We need to produce more substantive illustrations of fit, as in the Table, so that practitioners can gain a functional understanding of fit. We also need to show that misfit, or more specifically misfitting ratings, do not seriously invalidate most uses of the Rasch model, but are diagnostically productive and/or can be systematically avoided when constructing Rasch scales from data.

Becker, S., Lambert, R.W., Schulz, E.M., and Wright, B.D. (1985) An instrument to measure the activity level of the blind. International Journal of Rehabilitation Research, 8(4), December.

Schulz, E.M. (1987) Functional assessment in rehabilitation: An example with the visually impaired. Ph.D. dissertation, University of Chicago.

Schulz, E.M., Lambert, R.W., Becker, S., Wright, B.D., and Bezruczko, N. (1985) An assessment of the needs of rehabilitated blinded veterans. Journal of Visual Impairment and Blindness, 79(7), 301-305.

Wright, B.D. and Schulz, E.M. (1987) MFORMS: A FORTRAN computer program for one-step (concurrent) item banking of dichotomous and partial credit data from multiple forms.

Examples of Fit of Trainees to Measurement Scale of Skill at Pre-rehabilitation
Skill name	SKILL scale value "d"	Difficulty "d-b" of skill for trainee with b=-.12	Probability trainee with b=-.12 rates "1"	Trainee 111 b=-.12 high fit	Trainee 76 b=-.12 average fit	Trainee 47 b=-.12 low fit
RBRA	4.03	4.15	.016	0	0	0
WBRA	3.60	3.72	.024	0	0	0
READ	2.48	2.60	.069	0	0	(1)
GROC	1.98	2.10	.109	0	0	0
TYPE	1.77	1.89	.131	0	(1)	0
UNFA	1.59	1.71	.153	0	0	0
MEAS	1.21	1.33	.209	0	(1)	0
WRIT	1.14	1.26	.221	0	0	0
PWRT	1.14	1.26	.221	0	0	0
TOOL	0.88	1.00	.269	0	0	0
TRAN	0.82	.94	.281	0	0	(1)
OVEN	0.44	.56	.364	0	0	(1)
STRE	0.39	.51	.375	0	(1)	0
MATC	0.20	.32	.421	0	0	0
TAXI	-0.07	.05	.488	(1)	(1)	0
ELEV	-0.11	.01	.498	0	0	(1)
STOV	-0.11	.01	.498	(1)	0	(1)
p=0.5
CUPB	-0.16	-.04	.510	1	(0)	1
POUR	-0.41	-.29	.572	1	1	1
SIGN	-0.57	-.45	.612	(0)	1	1
VACU	-0.61	-.49	.620	1	(0)	1
WASH	-0.65	-.53	.629	1	1	1
CLIP	-0.65	-.53	.629	(0)	(0)	1
BLOK	-0.65	-.53	.629	1	1	(0)
MBED	-0.77	-.65	.657	1	1	1
MEAT	-0.89	-.77	.684	1	(0)	1
TAPE	-1.09	-.97	.725	1	1	(0)
STAI	-1.17	-1.05	.741	(0)	(0)	1
TABL	-1.30	-1.18	.765	1	1	1
DISH	-1.30	-1.18	.765	1	1	1
MEDI	-1.46	-1.34	.792	1	1	1
TELE	-1.72	-1.60	.832	1	1	(0)
SHAV	-1.72	-1.60	.832	1	1	(0)
FAMI	-1.95	-1.83	.862	1	1	1
HAIR	-2.15	-2.03	.884	1	1	(0)
DRES	-2.15	-2.03	.884	1	1	(0)
Mean (d, d-b) / Sum (p, ratings)	0.00	+0.12	18.000	18	18	18
Trainee standardized infit:				-2.11	-0.04	1.77

Functional assessment of fit. Schulz EM. … Rasch Measurement Transactions, 1990, 3:4 p.82
