My perspective on evaluating the fit of data to the Rasch model was formed when I worked on a "functional assessment questionnaire" (FAQ) at the Hines Veterans Hospital (Becker et al., 1985; Schulz et al., 1985; Schulz, 1987). The FAQ was designed to assess the functioning of visually impaired veterans. Trainees of the Hines Blind Rehabilitation Center were assessed three times with the FAQ: immediately before, immediately after, and approximately six months after rehabilitation training. Rasch measures of trainees' functioning were used for treatment planning, needs assessment, and program evaluation.
One of my tasks was to convince rehabilitation workers that a Rasch measure of SKILL could be useful in treatment planning. I had to explain the "fit" of data to the Rasch model without relying on statistical abstractions such as the variance of the standardized infit statistic across items or persons. This was a valuable experience for me; it opened my eyes to some inadequacies in, and misunderstandings of, summary indices of fit of data to the Rasch model.
The Rasch measure of SKILL was constructed from the question, "How hard is it for you to ...?" There were thirty-six skills, represented in the Table by four-letter acronyms in column one. Response choices were "easy", "moderate", and "hard". Moderate and hard were rated 0, and easy was rated 1. Skill calibrations are shown in column two. Calibrations range from 4.03 for the most difficult skill, "reading braille" (RBRA), to -2.15 for the easiest skill, "dressing yourself" (DRES).
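The probabilities in the Table follow from the dichotomous Rasch model, in which a trainee with measure b is rated 1 on a skill with calibration d with probability P = 1 / (1 + exp(d - b)). The Python sketch below applies the recoding just described and that formula. The helper name `rasch_p` and the choice of four calibrations are only illustrative; the values are copied from the Table.

```python
import math

# Recoding described in the text: "easy" -> 1; "moderate" and "hard" -> 0.
SCORING = {"easy": 1, "moderate": 0, "hard": 0}

# A few SKILL calibrations d (logits) copied from the Table.
CALIBRATIONS = {"RBRA": 4.03, "TYPE": 1.77, "STAI": -1.17, "DRES": -2.15}


def rasch_p(b: float, d: float) -> float:
    """Probability of a rating of 1 under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(d - b))


# Recode one hypothetical set of answers to 0/1 ratings.
print([SCORING[r] for r in ("easy", "hard", "moderate")])

b = -0.12  # SKILL measure shared by the three trainees in the Table
for skill, d in CALIBRATIONS.items():
    # d - b is the skill's difficulty relative to this trainee.
    print(f"{skill}: d-b = {d - b:+.2f}, P(rating 1) = {rasch_p(b, d):.3f}")
```

Running it prints P = 0.016 for RBRA, 0.131 for TYPE, 0.741 for STAI and 0.884 for DRES, matching the Table's probability column.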
I gave rehabilitation workers examples of fit like those in the right-hand columns of the Table. These columns show the rated responses of three trainees who have the same SKILL measure, -.12 logits, and the same raw score, 18, but different levels of fit: high (-2.11), average (-0.04), and low (1.77). The central columns show the difficulty of each skill relative to these trainees (d-b, in logits) and the Rasch probability that a trainee at this level will be rated "1" on each skill. Unexpected responses, i.e., ratings whose probability is below 0.5, are enclosed in parentheses, e.g., (0). The Table thus relates infit statistics to response patterns: to the number, improbability, and content of surprising ratings, rather than to the normal distribution or to the probability of the infit statistics. Practical examples like this help practitioners develop a functional understanding of person fit and of the level of fit needed to support various uses of the Rasch scale.
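The standardized infit values can be approximated from the ratings in the Table. This sketch assumes the weighted mean square of Wright and Masters (1982), i.e., the sum of squared residuals (x - P) divided by the sum of model variances P(1 - P), and assumes the Wilson-Hilferty cube-root transformation to standardize it. The 1990 analysis may have standardized differently, and the probabilities here come from the Table's rounded calibrations. The ratings are transcribed from the Table with parentheses dropped; `infit` and `RATINGS` are hypothetical names.

```python
import math

# Skill acronyms and SKILL calibrations d (logits), in Table order.
SKILLS = ("RBRA WBRA READ GROC TYPE UNFA MEAS WRIT PWRT TOOL TRAN OVEN "
          "STRE MATC TAXI ELEV STOV CUPB POUR SIGN VACU WASH CLIP BLOK "
          "MBED MEAT TAPE STAI TABL DISH MEDI TELE SHAV FAMI HAIR DRES").split()
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65,
     -0.65, -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72,
     -1.72, -1.95, -2.15, -2.15]

# Ratings transcribed from the Table (parentheses dropped), same skill order:
# first the 17 skills above p = 0.5, then the 19 below it.
RATINGS = {
    "111": "00000000000000101" + "1101101111011111111",
    "76": "00001010000010100" + "0110101101011111111",
    "47": "00100000001100011" + "1111110110111100100",
}


def rasch_p(b, d):
    """Dichotomous Rasch probability of a rating of 1."""
    return 1.0 / (1.0 + math.exp(d - b))


def infit(ratings, b, calibrations):
    """Return (INFIT MNSQ, standardized infit) for one person."""
    probs = [rasch_p(b, d) for d in calibrations]
    w = [p * (1.0 - p) for p in probs]  # model variance of each rating
    mnsq = sum((x - p) ** 2 for x, p in zip(ratings, probs)) / sum(w)
    # Model SD of MNSQ: sqrt(sum(C - W^2)) / sum(W), where C is the fourth
    # central moment. For a 0/1 rating C = W(1 - 3W), so C - W^2 = W - 4W^2.
    q = math.sqrt(sum(wi - 4.0 * wi * wi for wi in w)) / sum(w)
    # Wilson-Hilferty cube-root transformation to an approximate unit normal.
    zstd = (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0
    return mnsq, zstd


results = {}
for trainee, pattern in RATINGS.items():
    x = [int(c) for c in pattern]
    results[trainee] = infit(x, -0.12, D)
    mnsq, z = results[trainee]
    print(f"Trainee {trainee}: raw score {sum(x)}, MNSQ {mnsq:.2f}, ZSTD {z:+.2f}")
```

It should print a raw score of 18 for each trainee and standardized values of roughly -2.2, -0.1 and +1.7, in the same order as the published -2.11, -0.04 and 1.77; the small gaps presumably reflect rounding and the original program's standardization. Because the three trainees share b and the same skills, the denominator and the model SD are identical for all of them, so their ranking depends only on how improbable their surprising ratings are.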
This presentation also recognizes that practitioners are concerned with the fit of data to the Rasch model on a case-by-case and item-by-item basis. If a trainee fits the measurement scale, his measure is considered valid, regardless of the global fit of data to the scale. If the trainee's fit to the scale is poor, workers should question his ratings, regardless of how well other data fit. Overfit, i.e., a negative standardized fit, is not a clinical problem for these practitioners.
Because they must establish each trainee's need for training in each skill, rehabilitation staff are concerned with the validity of inferences about each person-item encounter. For this purpose, the probability of each person-item rating is useful information. Improbable ratings suggest something unique about the trainee and flag the need for a more detailed assessment. We consider a rating improbable when its probability is less than some critical value, say, .30. Trainee #76 has three such ratings: he rated 0 on using stairs (STAI, probability .259), 1 on using a typewriter (TYPE, .131), and 1 on measuring length (MEAS, .209). His overall infit statistic is acceptable (-0.04), yet these three ratings might warrant further investigation.
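The screening rule in this paragraph can be written directly: for each rating, take the model probability of the rating actually given and flag it when that probability is below the critical value. A minimal sketch, assuming the Table's calibrations, b = -.12, and the illustrative .30 cut; `improbable_ratings` is a hypothetical helper, and trainee #76's ratings are transcribed from the Table.

```python
import math

SKILLS = ("RBRA WBRA READ GROC TYPE UNFA MEAS WRIT PWRT TOOL TRAN OVEN "
          "STRE MATC TAXI ELEV STOV CUPB POUR SIGN VACU WASH CLIP BLOK "
          "MBED MEAT TAPE STAI TABL DISH MEDI TELE SHAV FAMI HAIR DRES").split()
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65,
     -0.65, -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72,
     -1.72, -1.95, -2.15, -2.15]
# Trainee 76's ratings from the Table, in the same skill order.
TRAINEE_76 = [int(c) for c in "00001010000010100" + "0110101101011111111"]


def rasch_p(b, d):
    """Dichotomous Rasch probability of a rating of 1."""
    return 1.0 / (1.0 + math.exp(d - b))


def improbable_ratings(ratings, b, calibrations, names, critical=0.30):
    """List (skill, rating, probability of that rating) below `critical`."""
    flagged = []
    for name, d, x in zip(names, calibrations, ratings):
        p1 = rasch_p(b, d)
        p_obs = p1 if x == 1 else 1.0 - p1  # probability of the rating given
        if p_obs < critical:
            flagged.append((name, x, round(p_obs, 3)))
    return flagged


flagged = improbable_ratings(TRAINEE_76, -0.12, D, SKILLS)
for name, x, p in flagged:
    print(f"{name}: rated {x}, probability {p:.3f}")
```

For trainee #76 it flags TYPE (rated 1, probability .131), MEAS (1, .209) and STAI (0, .259), the three ratings named above.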
From a theoretical standpoint, we can accept a few improbable ratings in the data as random in nature and hence of no particular significance for the validity of inferences. We expect some improbable ratings, given our assumptions about the probability distribution of raw scores as a function of an underlying continuous variable. We recognize, however, that there may be other, systematic effects in the data, such as interviewer effects, and that these effects are more likely to be revealed by detailed systematic investigation than by summary fit statistics. Therefore, in practical situations like treatment planning, an improbable rating warrants investigation even when our summary fit statistics say the data fit the model.
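How many improbable ratings should we expect from random variation alone? For a trainee at b = -.12 whose responses follow the Rasch model exactly, the expected number of ratings with probability below .30 is the sum, over skills, of the probability of whichever response falls below that cut. A sketch under those assumptions, using the Table's calibrations; `expected_improbable` is a hypothetical name.

```python
import math

# SKILL calibrations d (logits) from the Table.
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65,
     -0.65, -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72,
     -1.72, -1.95, -2.15, -2.15]


def rasch_p(b, d):
    """Dichotomous Rasch probability of a rating of 1."""
    return 1.0 / (1.0 + math.exp(d - b))


def expected_improbable(b, calibrations, critical=0.30):
    """Expected number of ratings with probability below `critical` for a
    person whose responses follow the Rasch model exactly."""
    total = 0.0
    for d in calibrations:
        p1 = rasch_p(b, d)
        for p_category in (p1, 1.0 - p1):
            # A response category is observed with exactly its model
            # probability, so an improbable category adds that probability.
            if p_category < critical:
                total += p_category
    return total


expected = expected_improbable(-0.12, D)
print(f"Expected ratings with probability < .30 at b = -.12: {expected:.2f}")
```

The result is about 3.6, so three improbable ratings, as for trainee #76, are within what the model itself predicts for a trainee at this level.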
Summary fit indices can sometimes be of doubtful value. Take, for example, the variance of Rasch infit statistics across items or persons. Experienced users know that the variance of infit statistics across items is a function of the number of persons taking the test. One can always show that data do not fit the Rasch model if one is willing to collect enough data to increase the power of the statistical test. Why? Because no real data fit any mathematical model exactly, and a sufficiently powerful test will detect even a trivial discrepancy. This is true even of the most popular model in statistics, the normal distribution: no real-world data are exactly normally distributed.
I question the usefulness of asking "Do these data fit the Rasch model?" (or the converse, "Does the Rasch model fit these data?"), especially when one has chosen the model a priori, as many do with the Rasch model because it implements fundamental standards for measurement. What we need is more information on the degree to which misfit matters in the various uses of Rasch measures.
We also need indices that are more or less free of the effects of sample size. In my experience with Rasch analysis, the mean value of the weighted mean square residual (INFIT MEANSQ) for a given item does not vary with sample size. Formulas for the unweighted and weighted mean square residuals are in Wright and Masters (1982, p. 100; see also RMT p. 81). The effects of sample size are introduced when the weighted mean square residual is standardized to mean 0 and variance 1, i.e., reported as INFIT ZSTD: the model standard deviation of the mean square shrinks as the number of responses grows, so the same mean square yields a larger standardized value in a larger sample.
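To see how standardization brings in sample size, hold an item's INFIT mean square fixed and let the number of persons grow. The sketch assumes the Wilson-Hilferty cube-root standardization and, purely for illustration, that every person has probability .70 of success on the item; `infit_sd`, `MNSQ` and `P` are hypothetical names.

```python
import math


def zstd(mnsq, q):
    """Wilson-Hilferty standardization of a mean square with model SD q."""
    return (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0


def infit_sd(probs):
    """Model SD of INFIT MNSQ for 0/1 responses with success probabilities."""
    w = [p * (1.0 - p) for p in probs]
    # For a dichotomy, var((x - p)^2) = W - 4W^2.
    return math.sqrt(sum(wi - 4.0 * wi * wi for wi in w)) / sum(w)


MNSQ = 1.15  # hypothetical item mean square, held fixed
P = 0.70     # hypothetical success probability for every person
z_by_n = {}
for n in (30, 300, 3000):
    q = infit_sd([P] * n)
    z_by_n[n] = zstd(MNSQ, q)
    print(f"n = {n:5d}: model SD {q:.3f}, INFIT ZSTD {z_by_n[n]:.2f}")
```

The mean square is 1.15 in every case, yet the standardized value rises from about 1 to about 9 as n goes from 30 to 3000, roughly in proportion to the square root of n. The same degree of misfit looks harmless in a small sample and highly significant in a large one.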
I would also like to see Rasch computer programs that allow the user to treat data as missing when the ratings involve mismatched person-item encounters. For example, standard paper-and-pencil tests typically require respondents to take items that are too easy or too hard for them. Guessing and carelessness then become significant factors in test performance. These factors could be kept out of Rasch estimation if person measures and item calibrations were based only on data from person-item encounters within a reasonable range (scale distance) of the person or item. The loss of some data would increase the standard errors of person measures and item calibrations, but I doubt the standard errors would increase as much as they would if one were to introduce parameters for guessing (and other noise). I am using data from the Iowa Tests of Basic Skills to study this issue with a version of MFORMS (Wright and Schulz, 1988) that allows one to reject mismatched person-item data on the basis of the person and item scale values. [This feature has been implemented in BIGSTEPS.]
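One way to sketch the screening described here, assuming a simple one-pass rule rather than the MFORMS or BIGSTEPS implementation: estimate the person's measure from all ratings, treat encounters with items more than a chosen distance from that measure as missing (2 logits here, an arbitrary value), and re-estimate from the rest. Item calibrations are held fixed at the Table's values; `estimate_measure` and `drop_mismatched` are hypothetical helpers, and trainee #47's ratings are transcribed from the Table.

```python
import math

# SKILL calibrations d (logits) in Table order, held fixed.
D = [4.03, 3.60, 2.48, 1.98, 1.77, 1.59, 1.21, 1.14, 1.14, 0.88, 0.82, 0.44,
     0.39, 0.20, -0.07, -0.11, -0.11, -0.16, -0.41, -0.57, -0.61, -0.65,
     -0.65, -0.65, -0.77, -0.89, -1.09, -1.17, -1.30, -1.30, -1.46, -1.72,
     -1.72, -1.95, -2.15, -2.15]
# Trainee 47's ratings transcribed from the Table (parentheses dropped).
TRAINEE_47 = [int(c) for c in "00100000001100011" + "1111110110111100100"]


def rasch_p(b, d):
    """Dichotomous Rasch probability of a rating of 1."""
    return 1.0 / (1.0 + math.exp(d - b))


def estimate_measure(ratings, calibrations, b=0.0, tol=1e-6, max_iter=100):
    """Maximum-likelihood person measure by Newton-Raphson, items fixed.

    Returns (measure, standard error)."""
    score = sum(ratings)
    if score == 0 or score == len(ratings):
        raise ValueError("extreme scores have no finite ML estimate")
    for _ in range(max_iter):
        probs = [rasch_p(b, d) for d in calibrations]
        info = sum(p * (1.0 - p) for p in probs)
        step = (score - sum(probs)) / info
        step = max(-1.0, min(1.0, step))  # damp very large steps
        b += step
        if abs(step) < tol:
            break
    info = sum(rasch_p(b, d) * (1.0 - rasch_p(b, d)) for d in calibrations)
    return b, 1.0 / math.sqrt(info)


def drop_mismatched(ratings, calibrations, b, max_distance):
    """Treat encounters with items farther than max_distance from b as missing."""
    kept = [(x, d) for x, d in zip(ratings, calibrations)
            if abs(d - b) <= max_distance]
    return [x for x, _ in kept], [d for _, d in kept]


b_full, se_full = estimate_measure(TRAINEE_47, D)
x_kept, d_kept = drop_mismatched(TRAINEE_47, D, b_full, max_distance=2.0)
b_kept, se_kept = estimate_measure(x_kept, d_kept, b=b_full)
print(f"All 36 skills: b = {b_full:+.2f}, SE = {se_full:.2f}")
print(f"{len(x_kept)} matched skills: b = {b_kept:+.2f}, SE = {se_kept:.2f}")
```

For trainee #47 this treats six encounters as missing (RBRA, WBRA, READ, GROC, HAIR, DRES), including his surprising 1 on READ and 0s on HAIR and DRES, and the measure shifts upward. The standard error grows only slightly, because items far from a person's measure carry little statistical information about it.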
In sum, I think we need to deal more constructively with the fit of data to the Rasch model. We encourage naive and unfair expectations about the Rasch model when we look only at summary indices of fit that are sensitive to sample size, like the standardized infit statistic. We can never prove that real data fit the Rasch model or any model. Real data do not fit mathematical models exactly. We need to produce more substantive illustrations of fit, as in the Table, so that practitioners can gain a functional understanding of fit. We also need to show that misfit, or more specifically misfitting ratings, do not seriously invalidate most uses of the Rasch model, but are diagnostically productive and/or can be systematically avoided when constructing Rasch scales from data.
E. Matthew Schulz
Chicago Public Schools
Becker, S., Lambert, R.W., Schulz, E.M., and Wright, B.D. (1985) An instrument to measure the activity level of the blind. International Journal of Rehabilitation Research 8(4), December.
Schulz, E.M. (1987) Functional assessment in rehabilitation: an example with the visually impaired. Ph.D. dissertation, University of Chicago.
Schulz, E.M., Lambert, R.W., Becker, S., Wright, B.D., and Bezruczko, N. (1985) An assessment of the needs of rehabilitated blinded veterans. Journal of Visual Impairment and Blindness 79(7): 301-305.
Wright, B.D. and Schulz, E.M. (1987) MFORMS: a FORTRAN computer program for one-step (concurrent) item banking of dichotomous and partial credit data from multiple forms.
Examples of Fit of Trainees to Measurement Scale of Skill at Pre-rehabilitation

| Skill name | SKILL scale value "d" | Difficulty "d-b" of skill for trainee with b=-.12 | Probability trainee with b=-.12 rates "1" | Trainee 111, b=-.12, high fit | Trainee 76, b=-.12, average fit | Trainee 47, b=-.12, low fit |
|---|---|---|---|---|---|---|
| RBRA | 4.03 | 4.15 | .016 | 0 | 0 | 0 |
| WBRA | 3.60 | 3.72 | .024 | 0 | 0 | 0 |
| READ | 2.48 | 2.60 | .069 | 0 | 0 | (1) |
| GROC | 1.98 | 2.10 | .109 | 0 | 0 | 0 |
| TYPE | 1.77 | 1.89 | .131 | 0 | (1) | 0 |
| UNFA | 1.59 | 1.71 | .153 | 0 | 0 | 0 |
| MEAS | 1.21 | 1.33 | .209 | 0 | (1) | 0 |
| WRIT | 1.14 | 1.26 | .221 | 0 | 0 | 0 |
| PWRT | 1.14 | 1.26 | .221 | 0 | 0 | 0 |
| TOOL | 0.88 | 1.00 | .269 | 0 | 0 | 0 |
| TRAN | 0.82 | 0.94 | .281 | 0 | 0 | (1) |
| OVEN | 0.44 | 0.56 | .364 | 0 | 0 | (1) |
| STRE | 0.39 | 0.51 | .375 | 0 | (1) | 0 |
| MATC | 0.20 | 0.32 | .421 | 0 | 0 | 0 |
| TAXI | -0.07 | 0.05 | .488 | (1) | (1) | 0 |
| ELEV | -0.11 | 0.01 | .498 | 0 | 0 | (1) |
| STOV | -0.11 | 0.01 | .498 | (1) | 0 | (1) |
| p = 0.5 | | | | | | |
| CUPB | -0.16 | -0.04 | .510 | 1 | (0) | 1 |
| POUR | -0.41 | -0.29 | .572 | 1 | 1 | 1 |
| SIGN | -0.57 | -0.45 | .612 | (0) | 1 | 1 |
| VACU | -0.61 | -0.49 | .620 | 1 | (0) | 1 |
| WASH | -0.65 | -0.53 | .629 | 1 | 1 | 1 |
| CLIP | -0.65 | -0.53 | .629 | (0) | (0) | 1 |
| BLOK | -0.65 | -0.53 | .629 | 1 | 1 | (0) |
| MBED | -0.77 | -0.65 | .657 | 1 | 1 | 1 |
| MEAT | -0.89 | -0.77 | .684 | 1 | (0) | 1 |
| TAPE | -1.09 | -0.97 | .725 | 1 | 1 | (0) |
| STAI | -1.17 | -1.05 | .741 | (0) | (0) | 1 |
| TABL | -1.30 | -1.18 | .765 | 1 | 1 | 1 |
| DISH | -1.30 | -1.18 | .765 | 1 | 1 | 1 |
| MEDI | -1.46 | -1.34 | .792 | 1 | 1 | 1 |
| TELE | -1.72 | -1.60 | .832 | 1 | 1 | (0) |
| SHAV | -1.72 | -1.60 | .832 | 1 | 1 | (0) |
| FAMI | -1.95 | -1.83 | .862 | 1 | 1 | 1 |
| HAIR | -2.15 | -2.03 | .884 | 1 | 1 | (0) |
| DRES | -2.15 | -2.03 | .884 | 1 | 1 | (0) |
| Total (mean for d and d-b) | 0.00 | +0.12 | 18.000 | 18 | 18 | 18 |
| Trainee standardized infit | | | | -2.11 | -0.04 | 1.77 |
Functional assessment of fit. Schulz EM. … Rasch Measurement Transactions, 1990, 3:4 p.82
The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
The URL of this page is www.rasch.org/rmt/rmt34d.htm