The Measurement of Examiner Consistency

MEASUREMENT RESEARCH ASSOCIATES

TEST INSIGHTS

November 2008

Greetings!

An oral certification examination is a complex process in which examiners, candidates, cases, and tasks all interact in scoring. Because examiners are primarily responsible for scoring, the consistency with which they score is an important aspect of the examination's validity and reliability.

Lidia Martinez
Manager, Computer Based Testing and Analysis

The Measurement of Examiner Consistency

Multi-facet Rasch analysis provides measures of the consistency with which examiners score candidates. The statistic we find most useful is the outfit mean-square, which is based on the ratio of observed variance to expected variance: it is the average, over an examiner's ratings, of each squared difference between the observed and expected rating divided by the model variance of that rating. Its expected value is 1.00. A value near 1.00 indicates that the examiner rated as expected, given his or her severity, the ability of the candidate, and the difficulty of the task.
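To make the outfit calculation concrete, here is a minimal Python sketch. It assumes a many-facet rating scale model in which the log-odds of rating category k over category k-1 equal candidate ability minus task difficulty minus examiner severity minus a category threshold F_k. It also assumes the measures are already known; in practice they are estimated jointly by a program such as Facets. The function names, the 1-4 rating scale, and all numeric values are hypothetical illustrations, not Facets output or its API.

```python
import math


def category_probabilities(ability, difficulty, severity, thresholds):
    """Probabilities of rating categories 0..m under a many-facet rating
    scale model: log(P_k / P_(k-1)) = ability - difficulty - severity - F_k.

    All measures are in logits; thresholds = [F_1, ..., F_m].
    """
    log_numerators = [0.0]  # the bottom category has log-numerator 0
    for f_k in thresholds:
        log_numerators.append(log_numerators[-1] + ability - difficulty - severity - f_k)
    top = max(log_numerators)  # subtract the max to avoid overflow in exp()
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(ability, difficulty, severity, thresholds, lowest_rating=1):
    """Model-expected rating and model variance for a single observation.
    lowest_rating maps category 0 onto the reported scale (assumed 1-4)."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    ratings = [lowest_rating + k for k in range(len(probs))]
    expected = sum(r * p for r, p in zip(ratings, probs))
    variance = sum((r - expected) ** 2 * p for r, p in zip(ratings, probs))
    return expected, variance


def outfit_mean_square(observations, thresholds, lowest_rating=1):
    """Outfit MNSQ for one examiner: the unweighted mean of the squared
    standardized residuals (observed - expected)**2 / variance.

    observations: (rating, candidate ability, task difficulty, examiner severity)
    """
    z_squared = []
    for rating, ability, difficulty, severity in observations:
        expected, variance = expected_and_variance(
            ability, difficulty, severity, thresholds, lowest_rating
        )
        z_squared.append((rating - expected) ** 2 / variance)
    return sum(z_squared) / len(z_squared)


# Usage with hypothetical measures for one examiner (severity 0.3 logits).
thresholds = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 1-4 scale
examiner_ratings = [
    (3, 0.2, 0.0, 0.3),
    (4, 1.5, 0.1, 0.3),
    (2, -0.8, 0.0, 0.3),
    (3, 0.6, -0.2, 0.3),
]
print(f"outfit MNSQ = {outfit_mean_square(examiner_ratings, thresholds):.2f}")
```

Because outfit is an unweighted average of squared standardized residuals, a few very unexpected ratings can dominate it, which is why it is sensitive to the kinds of deviations described below.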

Various criteria for acceptable values have been used. For most medical oral examinations, we flag an examiner as misfitting when the outfit mean-square is less than 0.5 or greater than 1.5. A value below 0.5 indicates a non-discriminating, or overly consistent, examiner who tends to give the same rating to all candidates. A value above 1.5 indicates an inconsistent examiner who gave unexpectedly high ratings to some less able candidates and/or unexpectedly low ratings to some able candidates.
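The screening rule can be written as a small function. This is a sketch: the cutoffs default to the 0.5 and 1.5 criteria given above but are left as parameters, since other criteria are also used. The examiner IDs and outfit values are hypothetical.

```python
def classify_examiner(outfit, lower=0.5, upper=1.5):
    """Label an examiner's outfit mean-square using the 0.5 / 1.5 criteria.
    Values exactly at 0.5 or 1.5 are treated as acceptable."""
    if outfit < lower:
        return "non-discriminating"  # ratings vary less than the model expects
    if outfit > upper:
        return "inconsistent"  # some ratings are far more unexpected than the model predicts
    return "acceptable"


# Hypothetical examiner IDs and outfit values, for illustration only.
outfits = {"E01": 0.42, "E02": 1.07, "E03": 1.83}
for examiner, value in outfits.items():
    print(examiner, value, classify_examiner(value))
```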

One reason examiners show up as misfitting is that they deviate from their own pattern of ratings. For example, if an examiner is relatively lenient but gives several unsatisfactory ratings to relatively able candidates, the examiner is likely to have a large outfit statistic, reflecting that examiner's inconsistency in rating.

On the other hand, if an examiner rarely deviates from a particular rating, he or she is likely to have a low fit statistic. For example, an examiner who gives ratings of 3 (satisfactory) to 90% of the candidates scored, regardless of the candidates' abilities, is likely to have an outfit statistic less than 0.5. Since it is unlikely that this examiner had only satisfactory or better candidates, it is highly probable that the examiner is not using the rating scale to distinguish between less able and more able candidates.
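The two patterns described in the preceding paragraphs can be reproduced with a few hypothetical numbers. This Python sketch assumes the expected ratings and model variances have already been produced by a many-facet analysis; the values are invented for illustration, on an assumed 1-4 scale where 3 is satisfactory and 2 is unsatisfactory. Outfit is computed as the mean of the squared standardized residuals.

```python
def outfit_from_residuals(ratings, expected, variance):
    """Outfit mean-square from observed ratings and the model's expected
    ratings and variances (assumed to come from a prior analysis)."""
    if not (len(ratings) == len(expected) == len(variance)):
        raise ValueError("inputs must have the same length")
    z_squared = [(x - e) ** 2 / w for x, e, w in zip(ratings, expected, variance)]
    return sum(z_squared) / len(z_squared)


# Pattern 1: a lenient examiner (hypothetical values).
# Eight average candidates (expected 3.0) and two able candidates (expected 3.8).
lenient_expected = [3.0] * 8 + [3.8, 3.8]
lenient_variance = [0.8] * 8 + [0.3, 0.3]
usual = [4, 2, 3, 4, 2, 3, 4, 2, 4, 4]
surprising = [4, 2, 3, 4, 2, 3, 4, 2, 2, 2]  # both able candidates rated unsatisfactory
outfit_usual = outfit_from_residuals(usual, lenient_expected, lenient_variance)
outfit_surprising = outfit_from_residuals(surprising, lenient_expected, lenient_variance)

# Pattern 2: a muted examiner who gives a 3 to 90% of candidates.
muted_expected = [2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 2.9, 3.5]
muted_variance = [0.8] * 10
muted = [3] * 9 + [4]
outfit_muted = outfit_from_residuals(muted, muted_expected, muted_variance)

print(f"lenient, usual ratings: {outfit_usual:.2f}")
print(f"lenient, two surprises: {outfit_surprising:.2f}")
print(f"muted examiner:         {outfit_muted:.2f}")
```

With these inputs the lenient examiner's outfit rises from about 0.78 to 2.91 when two able candidates receive unsatisfactory ratings, while the muted examiner's outfit is about 0.07: the model expects each rating to scatter around its expected value (variance 0.8 here), but this examiner's ratings barely move.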

Fortunately, relatively few examiners show up as misfits in the oral certification examinations we typically analyze. The table shows, for five different oral certification exams, the number of examiners and the number and percent of examiners who misfit. These results suggest that examiners are familiar with the scoring process and able to distinguish among candidates' levels of medical skill and expertise. Since examiners' ratings are critical to oral examination scoring, it is reassuring to know that they do their jobs effectively.

Exam	N of Examiners	N Outfit < 0.5 (Non-discriminating)	N Outfit > 1.5 (Inconsistent)	Total % Misfit Examiners
Exam 1	109	4	11	14%
Exam 2	159	10	5	9%
Exam 3	24	0	0	0%
Exam 4	33	2	4	18%
Exam 5	49	0	2	4%
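As an arithmetic check, the last column can be recomputed from the counts in the table. This short Python sketch simply transcribes those counts; the pooled figure at the end is an additional summary computed from the same numbers. The names used are illustrative.

```python
# Counts transcribed from the table: (examiners, outfit < 0.5, outfit > 1.5)
TABLE = {
    "Exam 1": (109, 4, 11),
    "Exam 2": (159, 10, 5),
    "Exam 3": (24, 0, 0),
    "Exam 4": (33, 2, 4),
    "Exam 5": (49, 0, 2),
}


def misfit_percent(n_examiners, n_low, n_high):
    """Percent of examiners flagged as misfitting in either direction."""
    return 100.0 * (n_low + n_high) / n_examiners


for exam, (n, low, high) in TABLE.items():
    print(f"{exam}: {low + high}/{n} misfit = {misfit_percent(n, low, high):.0f}%")

# Pooled across all five exams.
total_n = sum(n for n, _, _ in TABLE.values())
total_misfit = sum(low + high for _, low, high in TABLE.values())
print(f"All exams: {total_misfit}/{total_n} = {misfit_percent(total_n, total_misfit, 0):.0f}%")
```

Rounded to whole percents, the results match the table (14%, 9%, 0%, 18%, 4%); pooled across the five exams, 38 of 374 examiners (about 10%) misfit.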

Measurement Research Associates, Inc.

505 North Lake Shore Dr., Suite 1304

Chicago, IL 60611

Phone: (312) 822-9648 Fax: (312) 822-9650

www.MeasurementResearch.com
