## DIF in Polytomous Items

Zwick & Thayer (Z&T, 1996) extend the Mantel-Haenszel dichotomous DIF method to estimate DIF (differential item functioning) in polytomous items. Because the Mantel-Haenszel statistic is a log-odds estimator, it produces DIF findings similar to those of Rasch techniques. How do the polytomous versions compare?

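Extract from Zwick & Thayer's Table 2: counts by rating category on the target item

| Subject Group | 1 | 2 | 3 |
|---|---|---|---|
| Low Performers: Reference | 13 | 5 | 7 |
| Low Performers: Focal | 5 | 14 | 4 |
| High Performers: Reference | 28 | 54 | 98 |
| High Performers: Focal | 1 | 2 | 10 |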

Z&T present a small data set that is easily analyzed with Rasch programs. Their table shows that, although there are many more subjects in the Reference than in the Focal groups (as expected), the average rating for both the Low and High Focal groups looks higher than for the corresponding Reference group. Could this be accidental?
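The averages behind that impression can be checked directly. The sketch below assumes the extract is read as Reference/Focal pairs within each rating category, a reading that reproduces the counts (205 and 36) and observed scores (474 and 80) reported in the Facets table; the dictionary layout and function name are illustrative only.

```python
# Counts per rating category (1, 2, 3), transcribed from the extract of Z&T's Table 2.
# Assumption: values are paired Reference/Focal within each category.
counts = {
    ("Low", "Reference"): [13, 5, 7],
    ("Low", "Focal"): [5, 14, 4],
    ("High", "Reference"): [28, 54, 98],
    ("High", "Focal"): [1, 2, 10],
}

def average_rating(category_counts):
    """Mean rating, with categories scored 1, 2, 3."""
    total = sum(category_counts)
    score = sum(cat * n for cat, n in enumerate(category_counts, start=1))
    return score / total

averages = {key: average_rating(c) for key, c in counts.items()}

for (stratum, group), avg in averages.items():
    n = sum(counts[(stratum, group)])
    print(f"{stratum:4s} {group:9s} n={n:3d} mean rating={avg:.2f}")
```

Under this reading the Focal mean exceeds the Reference mean in both strata (about 1.96 against 1.76 for low performers, 2.69 against 2.39 for high performers), but the Focal counts are small, which is why a standard error is needed before drawing conclusions.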

```
+------------------------------------------------------------+
|Obsvd    Exp.  Obsvd  Obs-Exp| Bias  Model        |         |
|Score   Score  Count  Average|Measure S.E. Z-Score|Group    |
+-----------------------------+--------------------+---------+
|  474    480.0   205     -.03|  -.05   .09    .57 |Reference|
|   80     74.0    36      .17|   .29   .22  -1.30 |Focal    |
+-----------------------------+--------------------+---------+
|  277.0  277.0   120.5    .07|  -.12   .16   -.36 |Mean     |
|  197.0  203.0    84.5    .10|   .17   .06    .93 |S.D.     |
+------------------------------------------------------------+
|Fixed (all = 0) chi-square: 2.0  d.f.: 2  significance: .36 |
+------------------------------------------------------------+
```

The Facets Rasch analysis program incorporates a post-hoc bias/interaction measurement routine. For this analysis, all low performers (regardless of group) are asserted to have the same measure, and similarly all high performers. All performers share the same three-category rating scale. The analysis finds the rating scale to be not very discriminating, with only .14 logits between the step difficulty from category 1 to 2 and the step difficulty from category 2 to 3. The high performers measure .90 logits higher than the low performers.
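To give a feel for what step difficulties only .14 logits apart imply, here is a minimal sketch of rating scale model category probabilities. It is not the Facets estimation routine. Centring the step difficulties on zero (at -.07 and +.07) and the three measures used are hypothetical choices for illustration; the note reports only the .14 and .90 logit differences, not absolute values.

```python
import math

def category_probabilities(measure, thresholds):
    """Rating scale model probabilities for categories 1..len(thresholds)+1.

    `measure` is the performer measure relative to the item, in logits;
    `thresholds` are the step difficulties between adjacent categories.
    """
    log_numerators = [0.0]
    for tau in thresholds:
        # Each step up adds (measure - step difficulty) to the log-numerator.
        log_numerators.append(log_numerators[-1] + (measure - tau))
    numerators = [math.exp(v) for v in log_numerators]
    total = sum(numerators)
    return [v / total for v in numerators]

def expected_rating(measure, thresholds):
    """Model-expected rating on the 1..3 scale."""
    probs = category_probabilities(measure, thresholds)
    return sum(cat * p for cat, p in enumerate(probs, start=1))

# Assumption: step difficulties centred on zero, .14 logits apart.
thresholds = [-0.07, 0.07]

# Hypothetical measures .90 logits apart, for illustration only.
for measure in (-0.45, 0.0, 0.45):
    probs = category_probabilities(measure, thresholds)
    formatted = " ".join(f"{p:.3f}" for p in probs)
    print(f"measure {measure:+.2f}: P(1,2,3) = {formatted}, "
          f"expected = {expected_rating(measure, thresholds):.2f}")
```

With step difficulties this close together, the middle category is the most probable one only over a narrow band of measures, so ratings of 2 carry little information for separating performers.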

The Facets Bias/Interaction Table (shown here) reports that the 205 ratings of the Reference group sum to 474 and the 36 ratings of the Focal group sum to 80. On average the Reference group was rated .03 points lower and the Focal group .17 points higher than expected after allowing for the relative performance of the high and low strata and the structure of the rating scale. This difference in points gives a measured advantage (DIF) to the Focal group on this item of .29 - (-.05) = .34 logits with a joint standard error of sqrt(.09^2 + .22^2) = .24 logits. The Z statistic for the DIF is .34/.24 = 1.42, slightly more conservative than Z&T's two Z statistics of 1.45 and 1.55, but equivalent in meaning as "not significantly improbable". Rasch and Z&T's methods produce similar results.
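The arithmetic in this paragraph can be reproduced in a few lines. This sketch takes the bias measures and standard errors from the Facets table as given; the variable names are illustrative, and the two-sided normal p-value is an addition for orientation, not something reported in the note.

```python
import math

# Bias measures and model standard errors (logits) from the Facets table.
ref_bias, ref_se = -0.05, 0.09
focal_bias, focal_se = 0.29, 0.22

# DIF contrast: Focal bias minus Reference bias.
dif = focal_bias - ref_bias

# Joint standard error of the difference of two independent estimates.
joint_se = math.sqrt(ref_se**2 + focal_se**2)

# Z from unrounded values, and from the two-decimal values used in the text.
z = dif / joint_se
z_from_rounded = round(dif, 2) / round(joint_se, 2)

# Two-sided p-value under a standard normal reference distribution (an addition).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"DIF = {dif:.2f} logits, joint S.E. = {joint_se:.2f}")
print(f"Z = {z:.2f} (unrounded), {z_from_rounded:.2f} (from rounded values)")
print(f"two-sided p = {p_two_sided:.2f}")
```

Carrying unrounded values gives Z of about 1.43 rather than 1.42; either way the two-sided probability is well above .05, consistent with "not significantly improbable".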

Facets also tests the hypothesis that the two reported Biases represent the same zero bias value. This fixed chi-square test yields a significance of .36, suggesting that, though the group values are somewhat far apart, it is reasonable to consider them as reflecting the same underlying common value.
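The reported chi-square is consistent with a sum of squared standardized biases. The sketch below assumes that this is how the statistic is formed (the note does not spell out the formula) and uses the 2 degrees of freedom shown in the table, for which the chi-square upper-tail probability has the closed form exp(-x/2).

```python
import math

# (bias measure, standard error) in logits, from the Facets table.
biases = [(-0.05, 0.09), (0.29, 0.22)]

# Assumption: the fixed (all = 0) chi-square is the sum of squared
# standardized biases, each tested against zero.
chi_square = sum((bias / se) ** 2 for bias, se in biases)

# Degrees of freedom as reported in the table.
dof = len(biases)

# For exactly 2 d.f., the upper-tail probability is exp(-x/2).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.1f}, d.f. = {dof}, significance = {p_value:.2f}")
```

Under these assumptions the computation returns a chi-square of about 2.0 and a significance of about .36, in line with the table.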

Z&T are also concerned about how differences in item discrimination across groups affect DIF. In Rasch methodology this is easily investigated. Merely allow each group to define its own rating scale structure and compare results.

John Michael Linacre

Zwick R, Thayer DT (1996) Evaluating the magnitude of Differential Item Functioning in polytomous items. Journal of Educational Statistics 21:3 187-201.

DIF in polytomous items. Linacre J.M. … Rasch Measurement Transactions, 1996, 10:3 p. 520.

