Response Patterns and Their Probabilities

A common misapprehension about person fit statistics is the belief that the widely known and widely used standardized infit and outfit statistics can detect every type of departure from the objective measurement model. These statistics, either of which is referred to here as Z, were designed to identify misfit associated with undifferentiated patterns of responses and, in the case of outfit, the presence of lucky guessing or unlucky carelessness. What is often not appreciated is that there are other aspects of person misfit for which Z is not a suitable detective. Not only should we not expect Z to detect every aspect of misfit in persons; any insistence on a statistic claiming such universality would be naive.

The response, therefore, to critics who point out that Z does not detect this or that particular type of misfit (usually induced artificially via specifically distributed simulated data) is to reemphasize which types of data problems Z can usefully detect. Certainly data constructed to produce two or more clearly identifiable subsets of items or persons can result in response patterns which Z would not identify as "misfitting". This is because the multidimensionality of such data can go unrecognized without fit analysis based on relevant partitions of the persons or the items. In such cases, some form of between-item-group or between-person-group fit statistic is called for. The decision concerning which subsets of items or persons to investigate corresponds to an a priori hypothesis about the data. One might describe this type of misfit investigation as "confirmatory", in contrast to the "exploratory" role for which Z was designed. The practical utility of Z has been noted in the literature since its invention (Wright and Panchapakesan, 1969), yet some call upon it to solve problems for which it was never intended.

There are also critics who, whilst recognizing the situations in which Z is rightly called for, claim that even there it is too crude to do a decent job of detective work: it sometimes flags patterns as aberrant when it should not, or fails to flag misfitting patterns when it should. Statisticians would describe these problems in terms of the power of the Z statistic: too much power in the first case and too little in the second. Recent research completed by the author and Ben Wright shows that at the exploratory level, where some form of statistic is required to fine-tune our decisions about misfitting response patterns, Z is in fact quite satisfactory. We intend to report this work in a future article. For the present, some observations arising out of that study are in order.

Of all the possible response patterns resulting in a particular raw score, the majority are so unlikely under the objective measurement model that one does not need any finely tuned Z statistic in order to make sensible decisions about person fit. Even in the case of a very short test of L=8 items (spread uniformly in difficulty between plus and minus three logits), of the seventy patterns resulting in a raw score of r=4, 80 percent have exceedance probabilities less than 0.05.

This comment requires amplification. The probabilities referred to are the conditional probabilities of the individual response patterns, conditioned on the raw score (in this example, the raw score of four). Furthermore, determining "exceedance" requires calculating the conditional probability of every response pattern which is less probable than the one under investigation. To implement this, all patterns are ordered from most to least likely according to their probability. When items are sequenced in order of increasing difficulty, no matter how long the test nor what the value of the raw score under study, the most likely pattern is always the Guttman pattern (1111...0000) and the least likely is always its inverse (0000...1111). For the eight items mentioned and a raw score of four, (11110000) has a probability of 0.43 and (00001111) a probability of 0.0000005. The remaining 68 patterns, listed in the Table, have probabilities between these two values. These individual probabilities, however, are not much help by themselves, since, for any test of practical length, they are vanishingly small for the majority of patterns. The individual probability of a response pattern such as (10101100) in the current example is 0.006, but the accumulated probability of this pattern and the 53 more aberrant, and hence less probable, patterns amounts to 0.04. If hypothesis testing is the psychometrician's preference, then the conclusion at an alpha level of 0.05 would be that the pattern (10101100) does not fit the objective measurement model. In practice, rigid alpha levels are unlikely to be useful, so patterns within some exceedance interval, say .025 to .075 (flagged by "?" in the Table), might instead be identified as warranting investigation.
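The bookkeeping just described - enumerate every pattern yielding a raw score of four, order the patterns by conditional probability, and accumulate tail probabilities - is straightforward to sketch in code. The following Python sketch is illustrative, not the author's program; it assumes the eight difficulties are spaced evenly between -3 and +3 logits as in the example:

```python
from itertools import combinations
from math import exp

L, r = 8, 4
# Item difficulties spread uniformly between -3 and +3 logits (easiest first)
difficulties = [-3.0 + 6.0 * i / (L - 1) for i in range(L)]

# For each way of getting r items correct, the conditional probability given
# the raw score is exp(-sum of the correct items' difficulties), normalized
# by the sum of that quantity over all C(8,4) = 70 patterns (gamma_r).
weights = {c: exp(-sum(difficulties[i] for i in c))
           for c in combinations(range(L), r)}
gamma_r = sum(weights.values())
probs = sorted((w / gamma_r for w in weights.values()), reverse=True)

# Exceedance: accumulate each pattern's probability together with those of
# all the patterns below it in the ordering.
exceedances, tail = [], 0.0
for p in reversed(probs):
    tail += p
    exceedances.append(tail)
exceedances.reverse()

print(len(probs))          # 70 patterns score r = 4
print(round(probs[0], 4))  # ~0.4317 for the Guttman pattern 11110000
print(sum(e < 0.05 for e in exceedances) / len(probs))  # ~0.8
```

The direct enumeration is feasible here only because the test is short; for realistic test lengths the number of patterns per score grows combinatorially, which is one reason summary statistics such as Z are used instead.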

The final observation concerns an "exact" fit statistic for patterns, but one with a distribution which has to be approximated in a way more complex than Z. As with so many aspects of objective measurement, the first suggestion of an exact fit statistic came from Georg Rasch in his 1960 book. He was describing fit in an overall sense when he made the observation that an exact test would require the determination of the probabilities of all data matrices (of 0's and 1's) less probable than the one under investigation. He recognized, as have others since (Douglas, 1982), that the combinatorial problems of determining which matrices are less probable are sufficient to direct researchers to approximations, and to focus those approximations on the rows (for persons) or columns (for items) of the matrix of responses.

The conditional probability of a pattern is a function of (1) the sum of the difficulties, Di, of the items the respondent had correct, and (2) the elementary symmetric function, γr, associated with a raw score of r, that is,

P({Xi} | r) = exp(-ΣXiDi) / γr

where Xi = 0 or 1 (for incorrect, correct) and Di is the difficulty of item i.

Because, for fixed r, the denominator γr is a constant, there is a direct monotonic relationship between the accumulation of these probabilities (the exceedance) and the exponent in the numerator, ΣXiDi, which we will call Wr: the larger Wr, the smaller the pattern probability and hence its exceedance.
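The normalizer γr is the elementary symmetric function of order r of the quantities exp(-Di), and it can be accumulated one item at a time by the standard recurrence rather than by brute-force enumeration. A sketch, assuming the illustrative uniform difficulties used throughout (the function name is mine, not the article's):

```python
from math import exp

def symmetric_functions(difficulties):
    """Return [gamma_0, gamma_1, ..., gamma_L]: gamma_r is the sum, over all
    r-item subsets of the test, of exp(-sum of the subset's difficulties)."""
    gamma = [1.0]  # with no items, gamma_0 = 1 and nothing else
    for d in difficulties:
        eps = exp(-d)
        # Adding one item: new gamma_r = old gamma_r + eps * old gamma_{r-1}
        gamma = ([1.0]
                 + [gamma[r] + eps * gamma[r - 1] for r in range(1, len(gamma))]
                 + [eps * gamma[-1]])
    return gamma

# Eight difficulties spread uniformly between -3 and +3 logits
difficulties = [-3.0 + 6.0 * i / 7 for i in range(8)]
gamma = symmetric_functions(difficulties)

# Conditional probability of the Guttman pattern 11110000 given r = 4:
w = exp(-sum(difficulties[:4]))
print(round(w / gamma[4], 4))  # ~0.4317, matching the Table
```

The recurrence costs O(L^2) operations in total, which is why conditional-likelihood methods can afford exact γr values even for long tests.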

All response patterns may be ordered from least likely to most likely by the magnitude of Wr. This gives us an "exact" fit statistic for person performance. If the difficulties of the eight items in the example being used as illustration are ±3.0, ±2.1, ±1.3, and ±0.4, and we focus on a raw score of r=4, the most likely pattern (11110000) has W4 of -3.0 - 2.1 - 1.3 - 0.4 = -6.8. An intermediate pattern such as (10101100) has W4 = -3.0 - 1.3 + 0.4 + 1.3 = -2.6, and the least likely pattern (00001111) has W4 = +6.8. Extreme values of W in either direction, like extreme values of Z, indicate misfit.
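As a check on this ordering property, the sketch below (illustrative code, using the rounded difficulties above) ranks all 70 patterns by Wr and confirms that the Guttman pattern and its inverse sit at the two extremes:

```python
from itertools import combinations

# Rounded difficulties from the running example, easiest item first
difficulties = [-3.0, -2.1, -1.3, -0.4, 0.4, 1.3, 2.1, 3.0]
r = 4

# W for a pattern is the sum of the difficulties of the items scored correct
patterns = list(combinations(range(8), r))
W = {p: sum(difficulties[i] for i in p) for p in patterns}

guttman = (0, 1, 2, 3)  # 11110000: four easiest items correct
inverse = (4, 5, 6, 7)  # 00001111: four hardest items correct

print(round(W[guttman], 1))  # -6.8 with these rounded difficulties
print(round(W[inverse], 1))  #  6.8

# Since P(pattern | r) = exp(-W) / gamma_r, sorting by W in ascending order
# is the same as sorting from most to least likely.
ranked = sorted(patterns, key=lambda p: W[p])
print(ranked[0] == guttman and ranked[-1] == inverse)  # True
```

Note that W orders the patterns without any reference to γr, which is what makes it attractive as an "exact" statistic; the difficulty lies in its null distribution, taken up next.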

A preliminary investigation of this statistic and approximations to its distribution has been carried out by Molenaar and Hoijtink (1990). Since W depends on both the number and the dispersion of the item difficulties, however, it is capable of little more than a probability ordering of the patterns for a given score. The published approximations reduce these dependencies, but they increase the complexity of the calculations substantially and would appear to have limited utility. Our research shows that the approximations embedded in Z are sufficiently accurate without resorting to complex expressions involving second- and third-order symmetric functions. These conclusions add further evidence to the extensive body of research on fit carried out over the last decade by Richard Smith (1988).

Graham A. Douglas
University of Western Australia

Douglas, G.A. (1982). Issues in the fit of data to psychometric models. Education Research and Perspectives, 9, 32-43.

Molenaar, I.W. and Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75-106.

Rasch, G. (1960, 1992). Probabilistic models for some intelligence and attainment tests. Chicago: MESA Press.

Smith, R.M. (1988). The distributional properties of Rasch standardized residuals. Educational and Psychological Measurement, 48, 657-667.

Wright, B.D. and Panchapakesan, N. (1969). A procedure for sample-free item analysis. Educational and Psychological Measurement, 29, 23-48.

Ways of scoring 4 on 8 items, uniformly distributed ±3 logits
Response strings in descending order of probability; items ordered from easiest to hardest.
Exceedance (the probability of a pattern plus that of all less probable patterns) is shown at the last member of each equal-probability group and applies to every member of that group.

Response String   Probability   Exceedance   Diagnosis
11110000          0.4317        1.0000       Muted (most likely)
11101000          0.1832        0.5683       OK
11011000          0.0777                     OK
11100100          0.0777        0.3074       OK
10111000          0.0330                     OK
11100010          0.0330                     OK
11010100          0.0330        0.1637       OK
01111000          0.0140                     ?
10110100          0.0140                     ?
11001100          0.0140                     ?
11010010          0.0140                     ?
11100001          0.0140        0.0747       ?
01110100          0.0059                     ?
10101100          0.0059                     ?
10110010          0.0059                     ?
11001010          0.0059                     ?
11010001          0.0059        0.0369       ?
01101100          0.0025                     Noisy
01110010          0.0025                     Noisy
10011100          0.0025                     Noisy
10101010          0.0025                     Noisy
10110001          0.0025                     Noisy
11000110          0.0025                     Noisy
11001001          0.0025        0.0158       Noisy
10011010          0.0011                     Noisy
10100110          0.0011                     Noisy
01011100          0.0011                     Noisy
01101010          0.0011                     Noisy
01110001          0.0011                     Noisy
10101001          0.0011                     Noisy
11000101          0.0011        0.0069       Noisy
00111100          0.0005                     Noisy
01011010          0.0005                     Noisy
01100110          0.0005                     Noisy
10010110          0.0005                     Noisy
10011001          0.0005                     Noisy
10100101          0.0005                     Noisy
11000011          0.0005                     Noisy
01101001          0.0005        0.0026       Noisy
00111010          0.0002                     Noisy
01010110          0.0002                     Noisy
10001110          0.0002                     Noisy
10010101          0.0002                     Noisy
10100011          0.0002                     Noisy
01011001          0.0002                     Noisy
01100101          0.0002        0.0010       Noisy
00110110          0.0001                     Noisy
00111001          0.0001                     Noisy
01001110          0.0001                     Noisy
01010101          0.0001                     Noisy
01100011          0.0001                     Noisy
10001101          0.0001                     Noisy
10010011          0.0001        0.0004       Noisy
00101110          0.0000                     Noisy
10001011          0.0000                     Noisy
00110101          0.0000                     Noisy
01001101          0.0000                     Noisy
01010011          0.0000        0.0001       Noisy
10000111          0.0000                     Noisy
00011110          0.0000                     Noisy
00101101          0.0000                     Noisy
00110011          0.0000                     Noisy
01001011          0.0000        0.0000       Noisy
00011101          0.0000                     Noisy
00101011          0.0000                     Noisy
01000111          0.0000        0.0000       Noisy
00011011          0.0000                     Noisy
00100111          0.0000        0.0000       Noisy
00010111          0.0000        0.0000       Noisy
00001111          0.0000        0.0000       Noisy (least likely)

Response patterns and their probabilities. Douglas GA. … Rasch Measurement Transactions, 1990, 3:4 p.75

