DIF detection: Rasch versus Mantel-Haenszel

Schulz et al. (1989) compare the Rasch (RM) and Mantel-Haenszel (MH) procedures for detecting differential item functioning (DIF) (also RMT 1989 3:2 51-53). The RM procedure, following Wright et al. (1976), was implemented with computer programs MSCALE and LINK (Schulz 1984). The MH procedure (Holland and Thayer 1988) was implemented with program MHDIP (Raju 1988).

Sensitivity to DIF: With small groups, MH was significantly less sensitive to DIF than RM. MH tests significance with chi-square statistics evaluated under the null hypothesis of no DIF. The observed variance of these chi-squares was less than modelled. Male/Female DIF detected by both RM and MH when groups were N=1000 was lost by MH when groups were randomly reduced to N=100, but was still detected by RM.
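For readers unfamiliar with the MH statistic, the following is a minimal sketch of the standard Mantel-Haenszel chi-square and common odds ratio for one studied item. It is not the MHDIP program used in the study. The function name and table layout are illustrative assumptions, the continuity correction is applied only when the observed-minus-expected difference is at least 0.5 (one common convention), and the -2.35 ln(alpha) rescaling is the usual delta-scale index, which may not be the index reported by Schulz et al.

```python
import math

def mantel_haenszel(tables):
    """Return (chi_square, common_odds_ratio, mh_d_dif) for one studied item.

    tables: one (A, B, C, D) tuple per matching level (hypothetical layout):
      A, B = reference-group counts answering right, wrong
      C, D = focal-group counts answering right, wrong
    """
    sum_a = sum_e = sum_v = num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a level with fewer than 2 persons carries no information
        n_ref, n_foc = a + b, c + d      # group sizes at this level
        m_right, m_wrong = a + c, b + d  # right/wrong totals at this level
        sum_a += a
        sum_e += n_ref * m_right / n     # expected A under no DIF
        sum_v += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    if sum_v == 0 or num == 0 or den == 0:
        raise ValueError("tables carry too little information for MH")
    delta = abs(sum_a - sum_e)
    correction = 0.5 if delta >= 0.5 else 0.0  # continuity correction
    chi_square = (delta - correction) ** 2 / sum_v
    alpha = num / den                    # common odds ratio across levels
    return chi_square, alpha, -2.35 * math.log(alpha)

# Two matching levels in which the reference group outperforms the focal group.
chi_square, alpha, d_dif = mantel_haenszel([(30, 10, 20, 20), (35, 5, 25, 15)])
print(chi_square > 3.84, alpha > 1.0)  # 3.84 is the .05 critical value, 1 df
```

Because the variance term shrinks the chi-square as level counts fall, the same odds ratio that is significant with large groups can fail to reach significance with small ones, which is the loss of sensitivity described above.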

Reliability: Contrary to claims made for MH, empirical results show RM to be more reliable than MH when groups are small (N=100 to 200), and always as reliable as MH when groups are large (N>300).

Validity: When groups are comparable in achievement, RM and MH detect "the same thing". Since RM and MH DIF indices from the male/female contrast correlate at their statistical maximum, .99, one cannot explain the greater sensitivity and reliability of RM as due to the two methods detecting "something different".

DIF versus Between-Group Achievement Differences: DIF must not be confused with real group differences in achievement. To be acceptable, a DIF procedure must produce "no net DIF" over items. Three of the four MH variants fail this criterion. The MH variants differ in 1) whether the studied item is included in or excluded from the total score used for matching, and 2) whether matching is fat or fine (fat: seven or fewer levels of total score; fine: all possible levels of total score). When contrast groups differ significantly in achievement, RM yields "no net DIF". The only MH variant which yields "no net DIF" is the one which includes the studied item in the total score and uses all levels of total score for matching. This is the MH variant most similar to RM. When contrast groups differ in achievement, the other three MH variants yield net DIF across items that is significantly different from zero (p<.001).
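The four MH variants can be illustrated with a short sketch that builds the matching levels for one studied item. This is an assumption-laden illustration, not the MHDIP implementation: the function name, the "ref"/"foc" group labels, and the equal-width banding rule used for fat matching are all hypothetical choices made here.

```python
from collections import defaultdict

def mh_strata(responses, groups, item, include_studied=True, n_bands=None):
    """Cross-tabulate one studied item by group within matching levels.

    responses: list of 0/1 response lists, one per person
    groups: "ref" or "foc" per person (hypothetical labels)
    include_studied: count the studied item in the matching score or not
    n_bands: None for fine matching (every total score is its own level);
             an integer such as 7 for fat matching into that many bands
    Returns {level: (A, B, C, D)} with A, B = reference right, wrong and
    C, D = focal right, wrong.
    """
    n_items = len(responses[0])
    max_score = n_items if include_studied else n_items - 1
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for resp, grp in zip(responses, groups):
        score = sum(resp) - (0 if include_studied else resp[item])
        if n_bands is None:
            level = score                                # fine matching
        else:
            level = score * n_bands // (max_score + 1)   # fat matching
        column = (0 if resp[item] == 1 else 1) + (0 if grp == "ref" else 2)
        tables[level][column] += 1
    return {lvl: tuple(counts) for lvl, counts in sorted(tables.items())}

# Tiny illustrative data set: 4 persons, 3 items, item 0 is the studied item.
responses = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
groups = ["ref", "foc", "ref", "foc"]
print(mh_strata(responses, groups, item=0))                         # fine, included
print(mh_strata(responses, groups, item=0, include_studied=False))  # fine, excluded
print(mh_strata(responses, groups, item=0, n_bands=2))              # fat, included
```

The default arguments correspond to the one variant reported above as yielding "no net DIF": the studied item is included and every total score is its own level.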

DIF versus Item-by-Achievement Interactions: DIF is intended to detect item-by-group interaction exclusively. When groups also differ in achievement, some DIF indices confound item-by-group and item-by-achievement interactions. The correlation between RM and MH DIF indices was at its theoretical maximum of .99 for equal achievement contrasts. For unequal achievement contrasts, however, it was substantially lower (r=.81) than its theoretical maximum (.98). In that case, RM and MH DIF procedures no longer detect "the same thing".

The differences between RM and MH DIF indices estimated from unequal achievement contrasts are systematically related to item-by-achievement interactions of the kind detected by RM item fit statistics. Highly discriminating (low infit) items are biased in favor of high achievers while poorly discriminating (high infit) items are biased in favor of low achievers. Thus RM DIF indices correlate positively with RM infit statistics (r=.32), but, inexplicably, MH DIF indices correlate negatively (r=-.32).

Recommendations: When contrast groups differ in achievement, construct achievement-matched samples of the largest possible size. When contrast groups are achievement-matched, RM item Z-scores are more sensitive to DIF than MH chi-squares and at least as reliable. A practical advantage of RM is that it measures DIF in the same units as person achievement.
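An RM item Z-score of the kind recommended here can be sketched as the difference between an item's two separately estimated difficulties divided by their joint standard error. The sketch below assumes the difficulties and standard errors (in logits) have already been estimated for each group by some Rasch program, and it links the two calibrations by removing the mean difference between them so that the DIF contrasts sum to zero over items. The function name is hypothetical, and the exact linking used by MSCALE and LINK may differ.

```python
import math

def rasch_dif_z(d_ref, se_ref, d_foc, se_foc):
    """Return a list of (contrast_in_logits, z) pairs, one per item.

    d_ref, d_foc: item difficulties from separate calibrations of the
                  reference and focal groups (logits)
    se_ref, se_foc: the corresponding standard errors
    """
    # Link the two calibrations: shift the focal difficulties so both sets
    # have the same mean, which forces "no net DIF" over items.
    shift = sum(d_foc) / len(d_foc) - sum(d_ref) / len(d_ref)
    results = []
    for dr, sr, df, sf in zip(d_ref, se_ref, d_foc, se_foc):
        contrast = (df - shift) - dr              # DIF size in logits
        z = contrast / math.sqrt(sr ** 2 + sf ** 2)
        results.append((contrast, z))
    return results

# Illustrative numbers only: three items, standard errors of 0.1 logits.
for contrast, z in rasch_dif_z([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1],
                               [-0.5, 0.2, 1.8], [0.1, 0.1, 0.1]):
    print(round(contrast, 2), round(z, 2))
```

Because the contrast is expressed in logits, it can be read on the same scale as the person achievement measures, which is the practical advantage noted above.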

Holland PW & Thayer DT 1988 Differential item performance and Mantel-Haenszel. In H Wainer and H Braun (Eds.), Test Validity. Hillsdale NJ: Lawrence Erlbaum.

Schulz EM 1984 LINK. A program for comparing paired Rasch estimates and linking tests. Chicago: MESA Press.

Schulz EM, Perlman CP, Rice WK, Wright BD 1989 Empirical Comparison of Rasch and Mantel-Haenszel Procedures. AERA.

Wright BD, Mead RJ, Draba R 1976 Detecting and correcting test item bias with a logistic model. Chicago: MESA.

DIF detection: Rasch versus Mantel-Haenszel, E M Schulz … Rasch Measurement Transactions, 1990, 4:2 p. 107


