The issue of consistent grader severity is an ongoing concern for all who score performance examinations. This study explored the consistency of common graders' severity across three administrations of a performance examination. Each administration was analyzed with the multi-facet Rasch model, which produced calibrations of grader severity.
The data are from three annual administrations of a medical oral examination, labeled administrations A, B, and C. Between administrations there were some common graders and some non-common graders. To be included in the study, a common grader had to rate candidates in at least two of the three administrations, although some graders were common to all three. In this study, 115 common graders met this criterion. The examination also had standardized items and tasks that graders used to rate the candidates. The candidates in each of the three administrations were completely different; however, the examination process was the same.
Graders rate a random sample of the candidates who take the examination in a given administration. During each administration, every grader gives many ratings, which are used to calibrate his or her severity. Because each grader gives so many ratings, the calibrations of grader leniency or severity are very precise.
The items in this oral examination were carefully developed for consistency and content coverage. The skills being rated were well defined and the same across all administrations. The rating scale is well defined at each rating level. Graders were trained prior to the examination with regard to the content of the items and the examination procedures. Many of the graders had a great deal of experience with the examination process. The multi-facet formula used for this analysis was:
log_e (P_nijkx / P_nijk(x-1)) = B_n - D_i - C_j - H_k - F_x

where B_n = ability of candidate n;
D_i = difficulty of item i;
C_j = severity of grader j;
H_k = difficulty of task k; and
F_x = Rasch-Andrich threshold (step calibration) for category x.
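To make the model concrete, the category probabilities implied by the formula above can be computed from cumulative sums of (B_n - D_i - C_j - H_k - F_x). The sketch below assumes the standard rating-scale parameterization with F_0 = 0; all numeric values are hypothetical illustrations, not estimates from this examination.

```python
import math

def category_probabilities(b, d, c, h, thresholds):
    """Category probabilities under the many-facet rating scale model.

    b: candidate ability, d: item difficulty, c: grader severity,
    h: task difficulty (all in logits);
    thresholds: Rasch-Andrich thresholds F_1..F_m (F_0 = 0 is implied).
    Returns probabilities for categories 0..m.
    """
    # Each category's log-numerator is the cumulative sum of
    # (b - d - c - h - F_x) up to that category; category 0 gets 0.
    log_numerators = [0.0]
    running = 0.0
    for f in thresholds:
        running += b - d - c - h - f
        log_numerators.append(running)
    denom = sum(math.exp(v) for v in log_numerators)
    return [math.exp(v) / denom for v in log_numerators]

# Hypothetical values: able candidate, average item, slightly severe
# grader, slightly easy task, 3-category rating scale.
probs = category_probabilities(b=1.0, d=0.0, c=0.3, h=-0.2,
                               thresholds=[-0.5, 0.5])
```

With these hypothetical values the highest category is the most probable, as expected when a candidate's ability exceeds the combined item difficulty, task difficulty, and grader severity.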
Because the examination materials are so well standardized, differences in grader severity within an administration are most likely due to inherent differences in grader expectations and standards, which are unlikely to change substantially with training. Grader severity was calibrated with the multi-facet model for each of the three administrations. The center of each scale was anchored at 0.00 logits for all three administrations. The grader severity calibrations were then compared across administrations using z-scores and correlations for the common graders.
Using the grader severity estimates and their measurement errors, the standardized difference between grader severities across administrations was calculated as a z-score (Forsyth, Sarsangjan, & Gilmer, 1981). The formula for the standardized difference between grader severity calibrations is:
Z_j = (C_j1 - C_j2) / (S_j1² + S_j2²)^½

where C_j1 and C_j2 are the grader's severity estimates in the two administrations, and S_j1 and S_j2 are the estimated measurement errors associated with those severity estimates.
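A minimal sketch of this z-score computation; the severity values and standard errors below are hypothetical, chosen only to illustrate the calculation, and the 1.96 cutoff is the usual two-tailed 95% criterion.

```python
def severity_z(c1, c2, se1, se2):
    """Standardized difference between two severity calibrations:
    Z = (C1 - C2) / sqrt(SE1^2 + SE2^2)."""
    return (c1 - c2) / (se1 ** 2 + se2 ** 2) ** 0.5

# Hypothetical grader: severity 0.40 logits (SE 0.10) in one
# administration, 0.15 logits (SE 0.12) in another.
z = severity_z(0.40, 0.15, 0.10, 0.12)
flag = abs(z) >= 1.96  # statistically significant at the 95% level?
```

Here z is about 1.60, so this hypothetical grader would not be flagged as significantly different across the two administrations.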
Correlations were also used to confirm the patterns of grader severity.
The calibrated severity estimates for the common graders ranged from -1.78 to 1.55 logits during administration A, from -2.07 to 1.50 logits during administration B, and from -1.96 to 1.52 logits during administration C. Within each administration, the severity estimates differed significantly among graders, as indicated by a chi-square test and the separation reliability. Grader severity differed significantly even after training and working within a carefully structured examination process.
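The separation reliability mentioned above can be computed from the severity estimates and their standard errors; a common formulation takes the proportion of observed variance in the estimates that is not attributable to measurement error. This is a sketch with hypothetical values, not data from the study.

```python
def separation_reliability(estimates, errors):
    """Rasch separation reliability: the share of observed variance in
    the calibrations that is 'true' (not measurement error)."""
    n = len(estimates)
    mean = sum(estimates) / n
    observed_var = sum((e - mean) ** 2 for e in estimates) / n
    mean_square_error = sum(se ** 2 for se in errors) / n
    true_var = max(observed_var - mean_square_error, 0.0)
    return true_var / observed_var

# Hypothetical severities (logits) and standard errors for five graders:
r = separation_reliability([-1.0, -0.4, 0.0, 0.5, 1.2],
                           [0.10, 0.10, 0.12, 0.10, 0.11])
```

A value near 1.0, as here, indicates that the graders' severities are reliably spread apart relative to their measurement error, consistent with the significant differences reported above.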
An absolute z-score of 1.96 or greater indicates, with 95% confidence, a statistically significant difference in grader severity across administrations. Comparing the grader severity estimates across administrations with this z-score analysis showed that, of the 115 common graders, only one differed significantly in severity across administrations at the 95% confidence level. That grader was very lenient during administration A but significantly more severe during administrations B and C.
The graders within an administration differed significantly from each other in severity; however, each grader was consistent within and across administrations. This suggests that severity is a grader characteristic that should be included in the analysis of performance examinations to improve validity and reliability. The multi-facet model provides the opportunity to incorporate this facet into the analysis of performance examinations and to better understand graders' rating patterns.
Mary E. Lunz
Measurement Research Associates, Inc.
Forsyth, R., Sarsangjan, V., & Gilmer, J. (1981). Some empirical results related to the robustness of the Rasch model. Applied Psychological Measurement, 5, 175-186.
An Example of Grader Consistency using the Multi-Facet Model. Mary E. Lunz Rasch Measurement Transactions, 2007, 21:2 p. 1101-1102
The URL of this page is www.rasch.org/rmt/rmt212c.htm