Functional Assessment of Fit

My perspective on evaluating the fit of data to the Rasch model was formed when I worked on a "functional assessment questionnaire" (FAQ) at the Hines Veterans Hospital (Becker et al., 1985; Schulz et al., 1985; Schulz, 1987). The FAQ was designed to assess the functioning of visually impaired veterans. Trainees of the Hines Blind Rehabilitation Center were assessed with the FAQ three times: immediately before, immediately after, and approximately six months after rehabilitation training. Rasch measures of trainees' functioning were used for treatment planning, needs assessment, and program evaluation.

One of my tasks was to convince rehabilitation workers that a Rasch measure of SKILL could be useful in treatment planning. I had to explain the "fit" of data to the Rasch model without relying on statistical abstractions such as the variance of the standardized infit statistic across items or persons. This was a valuable experience that opened my eyes to some inadequacies in, and misunderstandings of, summary indices of fit of data to the Rasch model.

The Rasch measure of SKILL was constructed from responses to the question, "How hard is it for you to ...?" for thirty-six skills, which appear in the first column of the Table as four-letter acronyms. Response choices were "easy", "moderate", and "hard"; "moderate" and "hard" were rated 0, and "easy" was rated 1. Skill calibrations are shown in the second column. They range from 4.03 for the most difficult skill, "reading braille" (RBRA), to -2.15 for the easiest skill, "dressing yourself" (DRES).
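For reference, the dichotomous Rasch model behind the Table's probabilities can be written as follows, with b a trainee's SKILL measure and d a skill's scale value, both in logits (the symbols used in the Table headings). The second line is my own worked check of the RBRA row for a trainee with b = -.12, not part of the original analysis.

```latex
P(x = 1 \mid b, d) = \frac{\exp(b - d)}{1 + \exp(b - d)}

\text{RBRA: } d = 4.03,\; b = -0.12 \;\Rightarrow\; b - d = -4.15,\qquad
P = \frac{e^{-4.15}}{1 + e^{-4.15}} \approx \frac{0.0158}{1.0158} \approx 0.016
```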

I gave rehabilitation workers examples of fit like those in the right-hand columns of the Table. These columns show the ratings of three trainees with the same SKILL measure, -.12 logits, but different levels of fit: high (standardized infit -2.11), average (-0.04), and low (1.77). The central columns show the difficulty of each skill, in logits, relative to these particular trainees, and the Rasch probability that such a trainee will be rated "1" on each skill. Based on this information, unexpected ratings, i.e., ratings whose model probability is less than .5 (a 1 where p < .5, or a 0 where p > .5), are shown enclosed in parentheses, e.g., (0). The Table relates infit statistics to response patterns: to the number, improbability, and content of surprising ratings, rather than to the normal distribution or to the probability of the infit statistics. Practical examples like this help practitioners develop a functional understanding of person fit, and of the level of fit needed to support various uses of the Rasch scale.
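The probabilities and parentheses in the Table can be recomputed from the skill calibrations alone. The sketch below assumes the standard dichotomous Rasch model, P = 1/(1 + exp(d - b)), with b = -.12 for all three trainees; the rating strings are my transcription of the Table's right-hand columns, and the function names (`rasch_p`, `is_unexpected`, `unexpected`, `print_table`) are hypothetical.

```python
import math

B = -0.12  # SKILL measure shared by trainees 111, 76 and 47 (logits)

# (skill acronym, scale value d), in Table order from hardest to easiest
SKILLS = [
    ("RBRA", 4.03), ("WBRA", 3.60), ("READ", 2.48), ("GROC", 1.98),
    ("TYPE", 1.77), ("UNFA", 1.59), ("MEAS", 1.21), ("WRIT", 1.14),
    ("PWRT", 1.14), ("TOOL", 0.88), ("TRAN", 0.82), ("OVEN", 0.44),
    ("STRE", 0.39), ("MATC", 0.20), ("TAXI", -0.07), ("ELEV", -0.11),
    ("STOV", -0.11), ("CUPB", -0.16), ("POUR", -0.41), ("SIGN", -0.57),
    ("VACU", -0.61), ("WASH", -0.65), ("CLIP", -0.65), ("BLOK", -0.65),
    ("MBED", -0.77), ("MEAT", -0.89), ("TAPE", -1.09), ("STAI", -1.17),
    ("TABL", -1.30), ("DISH", -1.30), ("MEDI", -1.46), ("TELE", -1.72),
    ("SHAV", -1.72), ("FAMI", -1.95), ("HAIR", -2.15), ("DRES", -2.15),
]

# Ratings transcribed from the Table (1 = "easy", 0 = "moderate" or "hard"),
# one character per skill, in the same order as SKILLS
RATINGS = {
    111: "00000000000000101" "1101101111011111111",
    76: "00001010000010100" "0110101101011111111",
    47: "00100000001100011" "1111110110111100100",
}


def rasch_p(b, d):
    """Dichotomous Rasch probability that a person at b is rated 1 on a skill at d."""
    return 1.0 / (1.0 + math.exp(d - b))


def is_unexpected(p1, rating):
    """True when the rating actually given had model probability below .5."""
    p_observed = p1 if rating == "1" else 1.0 - p1
    return p_observed < 0.5


def unexpected(trainee):
    """Skills on which this trainee's rating was unexpected, in Table order."""
    return [skill for (skill, d), r in zip(SKILLS, RATINGS[trainee])
            if is_unexpected(rasch_p(B, d), r)]


def print_table():
    """Print d, d-b, P and the three trainees' ratings, unexpected ones in parentheses."""
    for i, (skill, d) in enumerate(SKILLS):
        p1 = rasch_p(B, d)
        cells = []
        for ratings in RATINGS.values():
            r = ratings[i]
            cells.append(f"({r})" if is_unexpected(p1, r) else r)
        print(f"{skill:<6}{d:6.2f}{d - B:7.2f}{p1:7.3f}"
              + "".join(f"{c:>6}" for c in cells))


print_table()
```

Each rating string contains eighteen 1s. Because the raw score is sufficient for b in the Rasch model, that is why all three trainees share the measure -.12 despite their very different patterns.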

This presentation also recognizes that practitioners are concerned with the fit of data to the Rasch model on a case-by-case and item-by-item basis. If a trainee fits the measurement scale, his measure is considered valid, regardless of the global fit of data to the scale. If a trainee's fit to the scale is poor, workers should question his ratings, regardless of how well other data fit. Over-fit, i.e., a negative standardized fit statistic, is not a clinical problem for these practitioners.

Because they have to establish each person's need for training in each skill, rehabilitation staff are concerned with the validity of inferences about each person-item encounter. For this purpose, the probability of each person-item rating is useful information. Improbable ratings suggest something unique about the trainee and flag the need for a more detailed assessment. We consider a rating improbable when the model probability of the rating actually given is less than some critical value, say, .30. Trainee #76 has three such ratings: he rated 0 on using stairs (STAI), 1 on using a typewriter (TYPE), and 1 on measuring length (MEAS). The overall infit statistic of trainee #76 is acceptable (-0.04), yet these three ratings may warrant further investigation.
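The critical-value rule in the preceding paragraph is easy to automate. This minimal sketch assumes the dichotomous Rasch model and the Table's rounded calibrations; `CRITICAL` and `improbable` are illustrative names, and .30 is the example cut-off from the text, not a recommended standard.

```python
import math

B = -0.12        # SKILL measure of trainee #76 (logits)
CRITICAL = 0.30  # example cut-off from the text; illustrative, not a standard

# (skill acronym, scale value d), in Table order from hardest to easiest
SKILLS = [
    ("RBRA", 4.03), ("WBRA", 3.60), ("READ", 2.48), ("GROC", 1.98),
    ("TYPE", 1.77), ("UNFA", 1.59), ("MEAS", 1.21), ("WRIT", 1.14),
    ("PWRT", 1.14), ("TOOL", 0.88), ("TRAN", 0.82), ("OVEN", 0.44),
    ("STRE", 0.39), ("MATC", 0.20), ("TAXI", -0.07), ("ELEV", -0.11),
    ("STOV", -0.11), ("CUPB", -0.16), ("POUR", -0.41), ("SIGN", -0.57),
    ("VACU", -0.61), ("WASH", -0.65), ("CLIP", -0.65), ("BLOK", -0.65),
    ("MBED", -0.77), ("MEAT", -0.89), ("TAPE", -1.09), ("STAI", -1.17),
    ("TABL", -1.30), ("DISH", -1.30), ("MEDI", -1.46), ("TELE", -1.72),
    ("SHAV", -1.72), ("FAMI", -1.95), ("HAIR", -2.15), ("DRES", -2.15),
]

# Trainee #76's ratings transcribed from the Table (1 = "easy"), same order as SKILLS
RATINGS_76 = "00001010000010100" "0110101101011111111"


def improbable(ratings, b=B, critical=CRITICAL):
    """Return (skill, rating, probability of that rating) for every rating whose
    Rasch model probability is below `critical`."""
    flagged = []
    for (skill, d), r in zip(SKILLS, ratings):
        p1 = 1.0 / (1.0 + math.exp(d - b))  # probability of a rating of 1
        p_observed = p1 if r == "1" else 1.0 - p1
        if p_observed < critical:
            flagged.append((skill, int(r), round(p_observed, 3)))
    return flagged


for skill, rating, p in improbable(RATINGS_76):
    print(f"{skill}: rated {rating}, probability {p:.3f}")
```

Applied to trainee #76, this flags exactly the three skills named above (TYPE, MEAS, STAI); a slightly higher cut-off would also catch MEAT, a 0 whose probability is about .32.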

From a theoretical standpoint, we can accept a few improbable ratings in the data as random in nature and hence of no particular significance for the validity of inferences. We expect some improbable ratings, given our assumptions about the probability distribution of raw scores as a function of an underlying continuous variable. We recognize, however, that there may be other, systematic effects in the data, such as interviewer effects, and that these effects are more likely to be revealed by detailed systematic investigation than by summary fit statistics. Therefore, in practical situations like treatment planning, an improbable rating warrants investigation even when our summary fit statistics say the data fit the model.
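The expectation that some improbable ratings occur by chance can be made concrete. Under the model, the expected number of ratings with probability below the cut-off is the sum, over skills, of the probability of whichever rating would be improbable. This sketch assumes the dichotomous Rasch model, the Table's calibrations, b = -.12, and the illustrative cut-off of .30; all names are hypothetical.

```python
import math

B = -0.12
CRITICAL = 0.30  # illustrative cut-off from the text

# (skill acronym, scale value d), in Table order from hardest to easiest
SKILLS = [
    ("RBRA", 4.03), ("WBRA", 3.60), ("READ", 2.48), ("GROC", 1.98),
    ("TYPE", 1.77), ("UNFA", 1.59), ("MEAS", 1.21), ("WRIT", 1.14),
    ("PWRT", 1.14), ("TOOL", 0.88), ("TRAN", 0.82), ("OVEN", 0.44),
    ("STRE", 0.39), ("MATC", 0.20), ("TAXI", -0.07), ("ELEV", -0.11),
    ("STOV", -0.11), ("CUPB", -0.16), ("POUR", -0.41), ("SIGN", -0.57),
    ("VACU", -0.61), ("WASH", -0.65), ("CLIP", -0.65), ("BLOK", -0.65),
    ("MBED", -0.77), ("MEAT", -0.89), ("TAPE", -1.09), ("STAI", -1.17),
    ("TABL", -1.30), ("DISH", -1.30), ("MEDI", -1.46), ("TELE", -1.72),
    ("SHAV", -1.72), ("FAMI", -1.95), ("HAIR", -2.15), ("DRES", -2.15),
]

# Ratings transcribed from the Table (1 = "easy"), same order as SKILLS
RATINGS = {
    111: "00000000000000101" "1101101111011111111",
    76: "00001010000010100" "0110101101011111111",
    47: "00100000001100011" "1111110110111100100",
}


def p_one(d, b=B):
    """Dichotomous Rasch probability of a rating of 1."""
    return 1.0 / (1.0 + math.exp(d - b))


def expected_improbable(b=B, critical=CRITICAL):
    """Model-expected number of ratings with probability below `critical`.
    With critical < .5, at most one of the two possible ratings can be improbable."""
    total = 0.0
    for _, d in SKILLS:
        p1 = p_one(d, b)
        if p1 < critical:          # a rating of 1 would be improbable
            total += p1
        elif 1.0 - p1 < critical:  # a rating of 0 would be improbable
            total += 1.0 - p1
    return total


def observed_improbable(ratings, b=B, critical=CRITICAL):
    """Number of ratings actually given whose model probability is below `critical`."""
    count = 0
    for (_, d), r in zip(SKILLS, ratings):
        p1 = p_one(d, b)
        count += (p1 if r == "1" else 1.0 - p1) < critical
    return count


print(f"expected: {expected_improbable():.2f}")
for trainee, ratings in RATINGS.items():
    print(f"trainee {trainee}: observed {observed_improbable(ratings)}")
```

By my arithmetic the expected count is about 3.6, against observed counts of 1, 3, and 7 for trainees 111, 76, and 47. Trainee #76's three improbable ratings are thus about what chance alone would produce, which, as the paragraph above argues, does not rule out a systematic cause.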

Sometimes summary fit indices are of doubtful value. Take, for example, the variance of Rasch infit statistics across items or persons. Experienced users know that the variance of infit statistics across items is a function of the number of persons taking the test. One can always show that data do not fit the Rasch model if one is willing to collect enough data to increase the power of the statistical test. Why? Because no real data fit any mathematical model exactly. This is true even of the most popular model in statistics, the normal distribution: absolutely no data from the real world are exactly normally distributed.
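A deterministic thought experiment shows why significance grows with sample size. The sketch below takes hypothetical ratings of four persons on one item (toy data, not from the Table), then replicates those same persons 10 and 100 times, so the degree of misfit is unchanged. The infit mean square is the sum of squared residuals divided by the sum of model variances; the standardization is the cube-root (Wilson-Hilferty) transformation commonly used in Rasch programs, which I assume here rather than take from this article.

```python
import math


def infit(x, p):
    """Infit mean square (v) and its cube-root standardization (z) for 0/1 ratings x
    with model probabilities p of a rating of 1."""
    w = [pi * (1.0 - pi) for pi in p]  # model variance of each rating
    sum_w = sum(w)
    v = sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / sum_w
    # model SD of v: sqrt(sum of (fourth central moment - w^2)) / sum of w,
    # where the fourth central moment of a 0/1 rating is w * (1 - 3w)
    q = math.sqrt(sum(wi * (1.0 - 3.0 * wi) - wi ** 2 for wi in w)) / sum_w
    z = (v ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0
    return v, z


# Hypothetical ratings of four persons on one item, with their model probabilities
x = [1, 0, 1, 0]
p = [0.2, 0.4, 0.6, 0.8]

for m in (1, 10, 100):  # the same four persons, replicated m times
    v, z = infit(x * m, p * m)
    print(f"N = {4 * m:4d}  INFIT MNSQ = {v:.2f}  INFIT ZSTD = {z:.2f}")
```

By my arithmetic the mean square stays at 2.00 while the standardized value rises from about 1.9 (below the conventional 2.0) to about 17: the same misfit becomes overwhelmingly "significant" purely because N grew.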

I question the utility of asking, "Do these data fit the Rasch model?" (or "Does the Rasch model fit these data?"), especially when one has chosen the model a priori, as many do with the Rasch model because it implements fundamental standards for measurement. What we need is more information on the degree to which misfit matters in the various uses of Rasch measures.

We also need indices that are more or less free of the effects of sample size. In my experience with Rasch analysis, the mean value of the "weighted mean square residual", INFIT MEANSQ, for a given item does not vary with sample size. Formulas for the unweighted and weighted mean square residual are in Wright and Masters (1982 p. 100, see RMT p. 81). The effects of sample size are introduced when the weighted mean square residual is standardized to mean 0 and variance 1, i.e., reported as INFIT ZSTD.
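To connect these statistics to the Table, the following sketch computes INFIT MEANSQ and INFIT ZSTD for trainees 111, 76, and 47 from the rounded calibrations shown there, using the cube-root (Wilson-Hilferty) standardization commonly used in Rasch programs (an assumption on my part). Because the calibrations are rounded and I cannot confirm the exact procedure of the original program, the results should match the published -2.11, -0.04, and 1.77 only approximately; the ordering is the point.

```python
import math

B = -0.12  # SKILL measure shared by the three trainees (logits)

# (skill acronym, scale value d), in Table order from hardest to easiest
SKILLS = [
    ("RBRA", 4.03), ("WBRA", 3.60), ("READ", 2.48), ("GROC", 1.98),
    ("TYPE", 1.77), ("UNFA", 1.59), ("MEAS", 1.21), ("WRIT", 1.14),
    ("PWRT", 1.14), ("TOOL", 0.88), ("TRAN", 0.82), ("OVEN", 0.44),
    ("STRE", 0.39), ("MATC", 0.20), ("TAXI", -0.07), ("ELEV", -0.11),
    ("STOV", -0.11), ("CUPB", -0.16), ("POUR", -0.41), ("SIGN", -0.57),
    ("VACU", -0.61), ("WASH", -0.65), ("CLIP", -0.65), ("BLOK", -0.65),
    ("MBED", -0.77), ("MEAT", -0.89), ("TAPE", -1.09), ("STAI", -1.17),
    ("TABL", -1.30), ("DISH", -1.30), ("MEDI", -1.46), ("TELE", -1.72),
    ("SHAV", -1.72), ("FAMI", -1.95), ("HAIR", -2.15), ("DRES", -2.15),
]

# Ratings transcribed from the Table (1 = "easy"), same order as SKILLS
RATINGS = {
    111: "00000000000000101" "1101101111011111111",
    76: "00001010000010100" "0110101101011111111",
    47: "00100000001100011" "1111110110111100100",
}


def infit(x, p):
    """Infit mean square (v) and its cube-root standardization (z) for 0/1 ratings x
    with model probabilities p of a rating of 1."""
    w = [pi * (1.0 - pi) for pi in p]  # model variance of each rating
    sum_w = sum(w)
    v = sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / sum_w
    # model SD of v; the fourth central moment of a 0/1 rating is w * (1 - 3w)
    q = math.sqrt(sum(wi * (1.0 - 3.0 * wi) - wi ** 2 for wi in w)) / sum_w
    z = (v ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0
    return v, z


def person_infit(ratings, b=B):
    """Person infit across the 36 skills for a trainee at measure b."""
    x = [int(r) for r in ratings]
    p = [1.0 / (1.0 + math.exp(d - b)) for _, d in SKILLS]
    return infit(x, p)


for trainee, ratings in RATINGS.items():
    v, z = person_infit(ratings)
    print(f"trainee {trainee}: INFIT MNSQ {v:.2f}, INFIT ZSTD {z:+.2f}")
```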

I would also like to see Rasch computer programs that allow the user to treat data as missing when the ratings involve mismatched person-item encounters. For example, standard paper-and-pencil tests typically require respondents to take items that are too easy or too hard for them. Guessing and carelessness then become significant factors in test performance. Their influence on Rasch estimation could be avoided if person measures and item calibrations were based only on data from person-item encounters within a reasonable range (scale distance) of the person or item. The loss of some data would increase the standard errors of person measurement and item calibration, but I doubt the standard errors would increase as much as they would were one to introduce parameters for guessing (and other noise). I am using data from the Iowa Tests of Basic Skills to study this issue with a version of MFORMS (Wright and Schulz, 1988) that allows one to reject mismatched person-item data on the basis of the person and item scale values. [This feature has been implemented in BIGSTEPS.]
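The proposed screening of mismatched encounters can be sketched for a single person. The sketch assumes a hypothetical cut-off `MAX_DISTANCE` of 2 logits (the article names no value), treats encounters farther than that from the person's measure as missing, and compares the model standard error 1/sqrt(sum of P(1-P)) before and after. For simplicity it holds b fixed at -.12; a real analysis would re-estimate b from the retained data. This is not the MFORMS or BIGSTEPS implementation.

```python
import math

B = -0.12           # person measure, held fixed for illustration (logits)
MAX_DISTANCE = 2.0  # hypothetical "reasonable range"; the article names no value

# (skill acronym, scale value d), in Table order from hardest to easiest
SKILLS = [
    ("RBRA", 4.03), ("WBRA", 3.60), ("READ", 2.48), ("GROC", 1.98),
    ("TYPE", 1.77), ("UNFA", 1.59), ("MEAS", 1.21), ("WRIT", 1.14),
    ("PWRT", 1.14), ("TOOL", 0.88), ("TRAN", 0.82), ("OVEN", 0.44),
    ("STRE", 0.39), ("MATC", 0.20), ("TAXI", -0.07), ("ELEV", -0.11),
    ("STOV", -0.11), ("CUPB", -0.16), ("POUR", -0.41), ("SIGN", -0.57),
    ("VACU", -0.61), ("WASH", -0.65), ("CLIP", -0.65), ("BLOK", -0.65),
    ("MBED", -0.77), ("MEAT", -0.89), ("TAPE", -1.09), ("STAI", -1.17),
    ("TABL", -1.30), ("DISH", -1.30), ("MEDI", -1.46), ("TELE", -1.72),
    ("SHAV", -1.72), ("FAMI", -1.95), ("HAIR", -2.15), ("DRES", -2.15),
]


def information(b, ds):
    """Sum of Rasch model variances P(1-P) over skills with calibrations ds."""
    total = 0.0
    for d in ds:
        p1 = 1.0 / (1.0 + math.exp(d - b))
        total += p1 * (1.0 - p1)
    return total


def within_range(b, skills, max_distance=MAX_DISTANCE):
    """Keep encounters within max_distance logits of b; the rest are treated as missing."""
    return [(s, d) for s, d in skills if abs(d - b) <= max_distance]


kept = within_range(B, SKILLS)
dropped = [s for s, d in SKILLS if abs(d - B) > MAX_DISTANCE]
se_all = 1.0 / math.sqrt(information(B, [d for _, d in SKILLS]))
se_kept = 1.0 / math.sqrt(information(B, [d for _, d in kept]))
print("set aside:", dropped)
print(f"standard error: all {se_all:.3f}, within range {se_kept:.3f}")
```

With this cut-off six skills are set aside (RBRA, WBRA, READ, GROC, HAIR, DRES), including three of trainee 47's unexpected ratings (READ, HAIR, DRES), and by my arithmetic the standard error rises only from about .39 to about .41 logits.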

In sum, I think we need to deal more constructively with the fit of data to the Rasch model. We encourage naive and unfair expectations about the Rasch model when we look only at summary indices of fit that are sensitive to sample size, like the standardized infit statistic. We can never prove that real data fit the Rasch model or any model. Real data do not fit mathematical models. We need to produce more substantive illustrations of fit, as in the Table, so that practitioners can gain a functional understanding of fit. We also need to show that misfit, or more specifically misfitting ratings, do not seriously invalidate most uses of the Rasch model, but are diagnostically productive and/or can be systematically avoided when constructing Rasch scales from data.

E. Matthew Schulz
Chicago Public Schools

Becker, S., Lambert, R.W., Schulz, E.M., and Wright, B.D. (1985) An instrument to measure the activity level of the blind. International Journal of Rehabilitation Research, 8(4), December.

Schulz, E.M. (1987) Functional assessment in rehabilitation: An example with the visually impaired. Ph.D. dissertation, University of Chicago.

Schulz, E.M., Lambert, R.W., Becker, S., Wright, B.D., and Bezruczko, N. (1985) An assessment of the needs of rehabilitated blinded veterans. Journal of Visual Impairment and Blindness, 79(7), 301-305.

Wright, B.D. and Schulz, E.M. (1987) MFORMS: A FORTRAN computer program for one-step (concurrent) item banking of dichotomous and partial credit data from multiple forms.

Examples of Fit of Trainees to Measurement Scale of Skill at Pre-rehabilitation

Skill = skill acronym; d = SKILL scale value of the skill; d-b = difficulty of the skill for a trainee with b = -.12; P = probability that a trainee with b = -.12 rates "1". Trainees 111 (high fit), 76 (average fit), and 47 (low fit) all have b = -.12.

Skill      d    d-b      P   #111   #76   #47
RBRA    4.03   4.15   .016      0     0     0
WBRA    3.60   3.72   .024      0     0     0
READ    2.48   2.60   .069      0     0   (1)
GROC    1.98   2.10   .109      0     0     0
TYPE    1.77   1.89   .131      0   (1)     0
UNFA    1.59   1.71   .153      0     0     0
MEAS    1.21   1.33   .209      0   (1)     0
WRIT    1.14   1.26   .221      0     0     0
PWRT    1.14   1.26   .221      0     0     0
TOOL    0.88   1.00   .269      0     0     0
TRAN    0.82    .94   .281      0     0   (1)
OVEN    0.44    .56   .364      0     0   (1)
STRE    0.39    .51   .375      0   (1)     0
MATC    0.20    .32   .421      0     0     0
TAXI   -0.07    .05   .488    (1)   (1)     0
ELEV   -0.11    .01   .498      0     0   (1)
STOV   -0.11    .01   .498    (1)     0   (1)
------------------- p = 0.5 -----------------
CUPB   -0.16   -.04   .510      1   (0)     1
POUR   -0.41   -.29   .572      1     1     1
SIGN   -0.57   -.45   .612    (0)     1     1
VACU   -0.61   -.49   .620      1   (0)     1
WASH   -0.65   -.53   .629      1     1     1
CLIP   -0.65   -.53   .629    (0)   (0)     1
BLOK   -0.65   -.53   .629      1     1   (0)
MBED   -0.77   -.65   .657      1     1     1
MEAT   -0.89   -.77   .684      1   (0)     1
TAPE   -1.09   -.97   .725      1     1   (0)
STAI   -1.17  -1.05   .741    (0)   (0)     1
TABL   -1.30  -1.18   .765      1     1     1
DISH   -1.30  -1.18   .765      1     1     1
MEDI   -1.46  -1.34   .792      1     1     1
TELE   -1.72  -1.60   .832      1     1   (0)
SHAV   -1.72  -1.60   .832      1     1   (0)
FAMI   -1.95  -1.83   .862      1     1     1
HAIR   -2.15  -2.03   .884      1     1   (0)
DRES   -2.15  -2.03   .884      1     1   (0)
Total   0.00  +0.12 18.000     18    18    18

Trainee standardized infit: #111 -2.11, #76 -0.04, #47 1.77

Functional assessment of fit. Schulz EM. … Rasch Measurement Transactions, 1990, 3:4 p.82
The URL of this page is www.rasch.org/rmt/rmt34d.htm