# An Adjustment for Sample Size in DIF Analysis

A statistical test for differential item functioning (DIF, item bias) between two groups, 1 and 2, is:

[1]

where di(1) - di(2) is the shift of the item i measures between groups 1 and 2; sei(1) and sei(2) are the standard errors of the item measures. A shift greater than 0.5 logits is considered evidence of DIF, and a t value of 1.96 or more is statistically significant (p<.05) for large samples (Draba, 1977). The ETS DIF Classification is similar, but more elaborate. Draba's rule approximates the ETS Category B rule.

ETS DIF Categorywith DIF Size (Logits) and DIF Statistical Significance
C = moderate to large|DIF| ≥ 0.64 logitsprob( |DIF| ≤ 0.43 logits ) < .05
approximately: |DIF| > 0.43 logits + 2 * DIF S.E.
B = slight to moderate|DIF| ≥ 0.43 logitsprob( |DIF| = 0 logits ) < .05
approximately: |DIF| > 2 * DIF S.E.
A = negligible--
C-, B- = DIF against focal group; C+, B+ = DIF against reference group
Note: ETS (Educational Testing Service) use Delta units. 1 logit = 2.35 Delta δ units. 1 Delta δ unit = 0.426 logits.
Zwick, R., Thayer, D.T., Lewis, C. (1999) An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis. Journal of Educational Measurement, 36, 1, 1-28

Formula [1] has been employed in a 21 items test, answered by 21820 persons, and the purpose is to detect gender bias. Standard errors are deliberately shown with 4 decimal places. It appears that even for small shift values, 20 of the 21 items produce significant ti(12) values, indicating that practically all the items show gender bias according to the significance rule alone. The same analysis was developed with sub-samples of 250 women and 250 men, now only 5 of the 21 items produce significant t values and so it is a minority of items that show gender bias. The main reason of the difference in results is that the standard errors of the item measures depend on the sizes of the focus and reference groups. But, according to Clauser & Hambleton (1994), "DIF analysis should be based on the largest sample available", so their guideline implies that even the smallest difference could be significant, nullifying the ETS Significance rule.

It can be seen that the conventional t values, ti(12), differ considerably across calculations. The measure estimates for the large sample are more precise than for the sub-sample, so the large sample produces smaller standard errors of measurement and higher t values. This suggests that a DIF Significance test is needed that is robust against sample size. Here is one based on computationally normalizing the empirical sample size of N to a standard sample size of 100:

[2]

The reference value of 100 has been chosen because it is not only a simple number to remember, but also because the mean S.E.normalized for item p-values between 0.001 and 0.999 is 0.965, i.e., close to 1, and, as can be seen in Fig. 1, the minimum possible value of S.E.normalized is 0.2. These values are of the same order of magnitude as the t values expected according to formula [1]. In addition, t values bigger than 1.96 may occur for shifts of 0.55 (Fig. 2), which closely corresponds to the half logit rule, so the conclusions of the DIF size and DIF significance rules are in accord. For a closer match with the ETS criteria for DIF Category B, instead of 100, normalize with 100*(0.43/0.55)^2 ≈ 60.

Figure 1. Values of S.E.normalized for different item p-values.

Figure 2. Theoretical t values as a function of shift and S.E. normalized

Figure 2 shows that, independently of the S.E., shifts below 0.55 imply that no item bias will be present and that shifts above 1.3 imply item bias (both are close to the 0.5 logit rule and the 1.5 logit rule). Therefore bias depends on S.E.normalized (i.e., item-sample targeting) only for shifts between 0.55 and 1.3 logits.

The right-hand columns of Table 1 show the normalized standard errors and t-tests. It can be seen that the size of S.E.normalized is comparable between the complete sample and the sub-sample and that the t values, ti(12)n, are similar. The DIF sizes and significances are now in closer accord. It can now be seen that is it meaningful to apply both the size and significance criteria in classifying items for DIF.

Agustín Tristán

Clauser B.E. & Hambleton, R.K. (1994) Review of Differential Item Functioning, P. W. Holland, H. Wainer. Journal of Educational Measurement, 31, 1, 88-92.

Draba R.E. (1977) The identification and interpretation of item bias. Educational Statistics Laboratory. Memo 25. University of Chicago. www.rasch.org/memo25.htm

An Adjustment for Sample Size in DIF Analysis, Tristan A. … Rasch Measurement Transactions, 2006, 20:3 p. 1070-1

Rasch Publications
Rasch Measurement Transactions (free, online) Rasch Measurement research papers (free, online) Probabilistic Models for Some Intelligence and Attainment Tests, Georg Rasch Applying the Rasch Model 3rd. Ed., Bond & Fox Best Test Design, Wright & Stone
Rating Scale Analysis, Wright & Masters Introduction to Rasch Measurement, E. Smith & R. Smith Introduction to Many-Facet Rasch Measurement, Thomas Eckes Invariant Measurement: Using Rasch Models in the Social, Behavioral, and Health Sciences, George Engelhard, Jr. Statistical Analyses for Language Testers, Rita Green
Rasch Models: Foundations, Recent Developments, and Applications, Fischer & Molenaar Journal of Applied Measurement Rasch models for measurement, David Andrich Constructing Measures, Mark Wilson Rasch Analysis in the Human Sciences, Boone, Stave, Yale
in Spanish: Análisis de Rasch para todos, Agustín Tristán Mediciones, Posicionamientos y Diagnósticos Competitivos, Juan Ramón Oreja Rodríguez

 Forum Rasch Measurement Forum to discuss any Rasch-related topic

Go to Top of Page
Go to index of all Rasch Measurement Transactions
AERA members: Join the Rasch Measurement SIG and receive the printed version of RMT
Some back issues of RMT are available as bound volumes
Subscribe to Journal of Applied Measurement

Go to Institute for Objective Measurement Home Page. The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.

Coming Rasch-related Events
March 21, 2019, Thur. 13th annual meeting of the UK Rasch user group, Cambridge, UK, http://www.cambridgeassessment.org.uk/events/uk-rasch-user-group-2019
April 4 - 8, 2019, Thur.-Mon. NCME annual meeting, Toronto, Canada,https://ncme.connectedcommunity.org/meetings/annual
April 5 - 9, 2019, Fri.-Tue. AERA annual meeting, Toronto, Canada,www.aera.net/Events-Meetings/Annual-Meeting
April 12, 2019, Fri. On-line course: Understanding Rasch Measurement Theory - Master's Level (G. Masters), https://www.acer.org/au/professional-learning/postgraduate/rasch
May 24 - June 21, 2019, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
May 22 - 30, 2019, Wed.-Thu. Measuring and scale construction (with the Rasch Model), University of Manchester, England, https://www.cmist.manchester.ac.uk/study/short/intermediate/measurement-with-the-rasch-model/
June 4 - 7, 2019, Tue.-Fri.In-Person Italian Rasch Analysis Workshop based on RUMM (entirely in Italian). For enquiries and registration email to workshop.rasch@gmail.com.
June 17-19, 2019, Mon.-Wed. In-person workshop, Melbourne, Australia: Applying the Rasch Model in the Human Sciences: Introduction to Rasch measurement (Trevor Bond, Winsteps), Announcement
June 20-21, 2019, Thurs.-Fri. In-person workshop, Melbourne, Australia: Applying the Rasch Model in the Human Sciences: Advanced Rasch measurement with Facets (Trevor Bond, Facets), Announcement
June 28 - July 26, 2019, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Further Topics (E. Smith, Winsteps), www.statistics.com
July 2-5, 2019, Tue.-Fri. 2019 International Measurement Confederation (IMEKO) Joint Symposium, St. Petersburg, Russia,https://imeko19-spb.org
July 11-12 & 15-19, 2019, Thu.-Fri. A Course in Rasch Measurement Theory (D.Andrich), University of Western Australia, Perth, Australia, flyer - http://www.education.uwa.edu.au/ppl/courses
Aug 5 - 10, 2019, Mon.-Sat. 6th International Summer School "Applied Psychometrics in Psychology and Education", Institute of Education at HSE University Moscow, Russia.https://ioe.hse.ru/en/announcements/248134963.html
Aug. 9 - Sept. 6, 2019, Fri.-Fri. On-line workshop: Many-Facet Rasch Measurement (E. Smith, Facets), www.statistics.com
August 25-30, 2019, Sun.-Fri. Pacific Rim Objective Measurement Society (PROMS) 2019, Surabaya, Indonesia https://proms.promsociety.org/2019/
Oct. 11 - Nov. 8, 2019, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
Nov. 3 - Nov. 4, 2019, Sun.-Mon. International Outcome Measurement Conference, Chicago, IL,http://jampress.org/iomc2019.htm
Jan. 24 - Feb. 21, 2020, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
May 22 - June 19, 2020, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
June 26 - July 24, 2020, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Further Topics (E. Smith, Winsteps), www.statistics.com
Aug. 7 - Sept. 4, 2020, Fri.-Fri. On-line workshop: Many-Facet Rasch Measurement (E. Smith, Facets), www.statistics.com
Oct. 9 - Nov. 6, 2020, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com
June 25 - July 23, 2021, Fri.-Fri. On-line workshop: Practical Rasch Measurement - Further Topics (E. Smith, Winsteps), www.statistics.com