An All-Purpose Person Fit Statistic?

Li & Olejnik (1997) investigate six person fit statistics and conclude that "Practitioners need no longer be confused by the large number of possible person-fit indexes available to detect nonfitting examinees. The lz index will provide as reliable and accurate identification of unusual responding as other person fit statistics" (p. 229). Oh joy! Oh rapture! Our fit detection problems are finally over! But wait: Klauer (1995) cautions that "a given person [index] implies a bias for detecting the [index]-specific deviations and a bias against detecting other kinds of deviations" (p. 109). Since Li & Olejnik consider only dichotomous tests, let us contain our joy while we evaluate the utility of lz for polytomies.

lz (Hulin et al., 1983) is a likelihood-based statistic, computed in several steps:

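Step 1. Compute the likelihood, Λn, of the observed response string for person n of estimated ability Bn across the L items,

$$\Lambda_n = \prod_{i=1}^{L} P_{nix}$$

where Pnix is the Rasch-model probability that person n on item i would respond in the observed category x of a 0 to M rating scale. This has log-likelihood,

$$\ln \Lambda_n = \sum_{i=1}^{L} \ln P_{nix}$$

Step 2. Compute the expected log-likelihood, summing over every category k = 0 to M of each item, where Pnik is the model probability of category k,

$$E(\ln \Lambda_n) = \sum_{i=1}^{L} \sum_{k=0}^{M} P_{nik} \ln P_{nik}$$

Step 3. Compute the log-likelihood variance,

$$\mathrm{Var}(\ln \Lambda_n) = \sum_{i=1}^{L} \left[ \sum_{k=0}^{M} P_{nik} \left( \ln P_{nik} \right)^2 - \left( \sum_{k=0}^{M} P_{nik} \ln P_{nik} \right)^2 \right]$$

Step 4. Compute the lz index,

$$l_z = \frac{\ln \Lambda_n - E(\ln \Lambda_n)}{\sqrt{\mathrm{Var}(\ln \Lambda_n)}}$$
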
The lz index is standardized, so that a value of 0.0 is intended to reflect a perfectly typical response string. Values greater than 2.0 could indicate unexpectedly good fit (overfit). Values below -2.0 could indicate unexpectedly poor fit (noise).
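
Steps 1 to 4 can be sketched in Python. This is a minimal illustration, not the BIGSTEPS code: it assumes the rating scale model with the item and step anchor values listed in the control file further down, and it estimates Bn by bisection on the raw score. The function names (`category_probs`, `estimate_ability`, `lz_index`, `parse`) are made up for this example.

```python
import math

# Anchor values copied from the BIGSTEPS control file in this article.
ITEM_DIFFICULTIES = [round(-1.9 + 0.2 * i, 1) for i in range(20)]
STEP_ANCHORS = [0.0, -1.0, 0.0, 1.0]  # category 0 is the reference (no step)


def category_probs(b, d, steps=STEP_ANCHORS):
    """Rating-scale-model probabilities of categories 0..M (assumed model)."""
    logits = [0.0]
    for f in steps[1:]:
        logits.append(logits[-1] + b - d - f)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(b, difficulties):
    """Model-expected raw score at ability b."""
    return sum(k * p for d in difficulties
               for k, p in enumerate(category_probs(b, d)))


def estimate_ability(responses, difficulties=ITEM_DIFFICULTIES):
    """Maximum-likelihood ability: expected raw score equals observed raw score."""
    raw = sum(responses)
    top_score = (len(STEP_ANCHORS) - 1) * len(responses)
    if raw <= 0 or raw >= top_score:
        raise ValueError("extreme score: no finite ability estimate")
    lo, hi = -20.0, 20.0
    for _ in range(100):  # bisection; expected score increases with ability
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lz_index(responses, difficulties=ITEM_DIFFICULTIES):
    """Standardized log-likelihood of one response string."""
    b = estimate_ability(responses, difficulties)
    observed = expected = variance = 0.0
    for x, d in zip(responses, difficulties):
        probs = category_probs(b, d)
        logs = [math.log(p) for p in probs]
        observed += logs[x]                                   # Step 1
        mean = sum(p * l for p, l in zip(probs, logs))
        expected += mean                                      # Step 2
        variance += sum(p * l * l for p, l in zip(probs, logs)) - mean ** 2  # Step 3
    return (observed - expected) / math.sqrt(variance)        # Step 4


def parse(string):
    """Turn a response string such as '3322...' into a list of integers."""
    return [int(c) for c in string]
```

For example, `lz_index(parse("33222222221111111100"))` evaluates the "most expected" string. If the sketch mirrors what BIGSTEPS computed, the results should be close to the lz column of the Table.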

Applying this to Smith's (1996, RMT 10:3 p.516-7) simulated polytomous data produces the results in the Table. In Section A, lz is biased upward: response strings simulated to fit the model have values around 0.75, rather than the proclaimed 0.0.

In Section B, only one overfitting (Guttman-like) response string exceeds 2.0 (the "most likely" string, at 2.67). The others reach only as far as 1.95, and so are not large enough to be flagged. Response patterns corresponding to misuse of the rating scale (central tendency, extremism, erratic behavior), often detected by INFIT and OUTFIT, are not detected by lz.
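
For comparison, the INFIT and OUTFIT mean-squares of a single response string can be sketched as follows. The sketch uses the usual residual-based definitions (OUTFIT as the mean squared standardized residual, INFIT as the sum of squared residuals divided by the sum of model variances). It again assumes the rating scale model and the anchor values from the control file further down, and the function names are made up for this illustration.

```python
import math

# Anchor values copied from the BIGSTEPS control file in this article.
ITEM_DIFFICULTIES = [round(-1.9 + 0.2 * i, 1) for i in range(20)]
STEP_ANCHORS = [0.0, -1.0, 0.0, 1.0]


def category_probs(b, d, steps=STEP_ANCHORS):
    """Rating-scale-model probabilities of categories 0..M (assumed model)."""
    logits = [0.0]
    for f in steps[1:]:
        logits.append(logits[-1] + b - d - f)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def moments(b, d):
    """Model expectation and variance of the rating on one item."""
    probs = category_probs(b, d)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def estimate_ability(responses, difficulties):
    """Bisection for the ability whose expected score equals the raw score."""
    raw = sum(responses)
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(moments(mid, d)[0] for d in difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def fit_mean_squares(responses, difficulties=ITEM_DIFFICULTIES):
    """Return (INFIT MnSq, OUTFIT MnSq) for a non-extreme response string."""
    b = estimate_ability(responses, difficulties)
    sq_residuals, variances = [], []
    for x, d in zip(responses, difficulties):
        mean, var = moments(b, d)
        sq_residuals.append((x - mean) ** 2)
        variances.append(var)
    # OUTFIT: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(responses)
    # INFIT: information-weighted, so off-target items count for less.
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit
```

Calling `fit_mean_squares([int(c) for c in "33222222221111111100"])` gives both mean-squares for the "most expected" string, which can be set beside the INFIT and OUTFIT columns of the Table.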

In Section C, large negative values of lz do usefully pinpoint grossly aberrant response patterns. Even so, there was only one instance in which lz flagged misbehavior that INFIT and OUTFIT missed. This was the "one category" response set (Block V, line 1), which the lz value of -2.06 flags as unexpected. However, the RPM (point-measure correlation, similar to point-biserial) value of 0.0 provides a powerful and immediate diagnosis of this response string. In general, lz does not flag borderline response strings.
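
The point-measure correlation is simple enough to sketch directly. The version below assumes it is the Pearson correlation between a person's ratings and item easiness (the negative of the anchored difficulty), which is the sign convention suggested by the Table, where well-behaved strings have positive RPM. Returning 0.0 for a constant string is likewise an assumption, chosen to match the .00 reported for the "one category" string. The function name is hypothetical.

```python
import math

# Item anchor values copied from the BIGSTEPS control file in this article.
ITEM_DIFFICULTIES = [round(-1.9 + 0.2 * i, 1) for i in range(20)]


def point_measure_correlation(responses, difficulties=ITEM_DIFFICULTIES):
    """Pearson correlation between ratings and item easiness (assumed definition)."""
    easiness = [-d for d in difficulties]
    n = len(responses)
    mean_x = sum(responses) / n
    mean_e = sum(easiness) / n
    sxx = sum((x - mean_x) ** 2 for x in responses)
    see = sum((e - mean_e) ** 2 for e in easiness)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(responses, easiness))
    if sxx == 0 or see == 0:
        # A constant string has no variance, so the correlation is undefined;
        # report 0.0 (an assumption, matching the Table's "one category" row).
        return 0.0
    return sxe / math.sqrt(sxx * see)
```

A string such as "00000000003333333333", with low ratings on the easy items and high ratings on the hard ones, yields a strongly negative correlation, while a constant string yields 0.0.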

Alas, lz is markedly less useful than INFIT and OUTFIT mean-squares. Our joy must be deferred.

John M. Linacre

Hulin C.L., Drasgow F., Parsons C. (1983) Item Response Theory: Applications to Psychological Measurement. Homewood, IL: Dow Jones-Irwin.

Klauer K.C. (1995) The assessment of person fit. p. 97-110. In G.H. Fischer & I.W. Molenaar (Eds.) Rasch Models: Foundations, Recent Developments and Applications. New York: Springer-Verlag.

Li M.F., Olejnik S. (1997) The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement 21:3, p. 215-231.

Investigation of the lz Person Fit Statistic

Diagnostic use:
INFIT MnSq:   1.0 typical, >1.3 noisy, <0.7 overfit
OUTFIT MnSq:  1.0 typical, >1.3 noisy, <0.7 overfit
RPM Corr.:    <0.0 reversed, 0.0 useless
lz Index:     0.0 typical, <-2.0 noisy, >2.0 overfit

Response String      | INFIT | OUTFIT |  RPM  |   lz   |
Easy..........Hard   | MnSq  | MnSq   | Corr. | Index  | Diagnosis
---------------------+-------+--------+-------+--------+----------------------

A. Data fits model - lz biased positive, OUTFIT and INFIT at expectation

I. model-fitting:
33333132210000001011 |   .98 |    .99 |   .78 |    .49 | lz bias of 0.7 misleadingly diagnoses data as too deterministic
31332332321220000000 |   .98 |   1.04 |   .81 |    .75 |   "
33333331122300000000 |  1.06 |    .97 |   .87 |   1.12 |   "
33333331110010200001 |  1.03 |   1.00 |   .81 |    .72 |   "

B. Poor fit, merits attention - lz flags 1 of 12, OUTFIT 9 of 12

II. overfitting (muted):
33222222221111111100 |   .18 |    .22 |   .92 |   1.95 | most expected
33333222221111100000 |   .31 |    .35 |   .97 |   2.67 | most likely
33333333221100000000 |   .80 |    .77 |   .93 |   1.95 | high discrimination
32222222221111111110 |   .21 |    .26 |   .89 |   1.39 | low discrimination
32323232121212101010 |   .52 |    .54 |   .82 |   1.27 | tight progression

III. limited categories:
33333333332222222222 |   .24 |    .24 |   .87 |   1.27 | high (low) categories
22222222221111111111 |   .24 |    .34 |   .87 |    .67 | 2 central categories
33333322222222211111 |   .16 |    .20 |   .93 |   1.95 | only 3 categories

IV. informative-noisy:
32222222201111111130 |   .94 |   1.22 |   .55 |   -.85 | noisy outliers
33233332212333000000 |  1.25 |   1.09 |   .77 |    .21 | erratic transitions
33333333330000000000 |  1.37 |   1.20 |   .87 |    .67 | extreme categories
33133330232300101000 |  1.49 |   1.40 |   .72 |   -.56 | noisy progression

C. Obvious gross misfit, requires attention - lz and RPM flag 10 of 10

V. non-informative:
22222222222222222222 |   .85 |   1.21 |   .00 |  -2.06 | one category
12121212121212121212 |  1.50 |   1.96 |  -.09 |  -3.73 | central flip-flop
03202002101113311002 |  2.99 |   3.59 |  -.01 |  -6.73 | random responses
01230123012301230123 |  3.62 |   4.61 |  -.19 |  -9.34 | rotate categories
03030303030303030303 |  5.14 |   6.07 |  -.09 | -12.54 | extreme flip-flop

VI. contradictory:
11111122233222111111 |  1.75 |   2.02 |   .00 |  -4.13 | folded pattern
22222222223333333333 |  2.11 |   4.13 |  -.87 |  -6.64 | high reversal
11111111112222222222 |  2.56 |   3.20 |  -.87 |  -7.33 | central reversal
00111111112222222233 |  4.00 |   5.58 |  -.92 | -11.82 | Guttman reversal
00000000003333333333 |  8.30 |   9.79 |  -.87 | -23.35 | extreme reversal

This is the BIGSTEPS control file for the data above:

&inst
TITLE='COMPUTING STATISTICS'
NI=20
ITEM1=1           ; include response strings in person name
name1=1
namlen=30
CODES=0123
ptbis=no          ; compute point-measure correlation
INUMB=YES         ; no item labels
TFILE=*
6        ; Table 6 - persons in fit order
18       ; table 18 - persons in entry order
*
IAFILE=*          ; item anchor values - uniform
1 -1.9
2 -1.7
3 -1.5
4 -1.3
5 -1.1
6 -0.9
7 -0.7
8 -0.5
9 -0.3
10 -0.1
11 0.1
12 0.3
13 0.5
14 0.7
15 0.9
16 1.1
17 1.3
18 1.5
19 1.7
20 1.9
*
SAFILE=*          ; step anchor values
0 0
1 -1
2 0
3 1
*
&end
33333132210000001011 modelled
31332332321220000000 modelled
33333331122300000000 modelled
33333331110010200001 modelled
33222222221111111100 most expected        
33333222221111100000 most likely          
33333333221100000000 high discrimination  
32222222221111111110 low discrimination   
32323232121212101010 tight progression    
33333333332222222222 high (low) categories
22222222221111111111 2 central categories 
33333322222222211111 only 3 categories    
32222222201111111130 noisy outliers       
33233332212333000000 erratic transitions  
33333333330000000000 extreme categories   
33133330232300101000 noisy progression    
22222222222222222222 one category         
12121212121212121212 central flip-flop    
03202002101113311002 random responses     
01230123012301230123 rotate categories    
03030303030303030303 extreme flip-flop    
11111122233222111111 folded pattern       
22222222223333333333 high reversal        
11111111112222222222 central reversal     
00111111112222222233 Guttman reversal     
00000000003333333333 extreme reversal     

An All-Purpose Person Fit Statistic? Linacre J.M. … Rasch Measurement Transactions, 1997, 11:3 p. 582-3.


The URL of this page is www.rasch.org/rmt/rmt113n.htm
