August 2009
Measuring each examiner's level of rating severity allows the multi-facet analysis to account for differences among examiners.  Even though examiners differ in rating severity, they are able to distinguish among candidates' abilities.

Lidia Martinez
Manager, Test Development and Analysis

How Examiners of Different Severity Grade Candidates of Different Ability
Overall differences in rating severity have been shown for all oral performance examinations that require the intervention of examiners.  One of the assumptions of the multi-facet model is that more able candidates will earn higher scores regardless of the severity of the examiners or the difficulty of the task or item. The purpose of this study is to show how examiners of different measured severity grade candidates of different measured ability.
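For reference, one common way to write the many-facet Rasch model for rating scale data is given below. The study does not state which formulation was used, so the symbols here are assumptions for illustration: B_n is the ability of candidate n, D_i the difficulty of item or task i, C_j the severity of examiner j, and F_k the difficulty of being rated in category k rather than category k-1.

```latex
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Because ability enters with a positive sign and severity with a negative sign, a more able candidate is expected to earn a higher rating from any given examiner, while a more severe examiner is expected to give a lower rating to any given candidate.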

For this study, data were simulated based on oral examination data and the multi-facet analysis was completed. The multi-facet analysis program calculates a rating severity for each examiner based on all of the ratings that examiner gives to all of the candidates encountered during the examination. The mean for examiner severity is 5.00 with a range of 2.99 to 9.34 scaled score points. Examiners were then divided into three groups based on their severity, and labeled lenient (more than 1 SD below the mean), moderate (within 1 SD of the mean), and severe (more than 1 SD above the mean).
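The three-way grouping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the examiner labels, and the severity values are hypothetical and are not the study's data, and the sample standard deviation is used because the study does not say which SD was applied.

```python
from statistics import mean, stdev

def group_by_sd(measures, low_label, high_label, mid_label="moderate"):
    """Assign each measure to a group using cutoffs 1 SD either side of the mean."""
    centre = mean(measures.values())
    spread = stdev(measures.values())  # sample SD; an assumption
    groups = {}
    for name, value in measures.items():
        if value < centre - spread:
            groups[name] = low_label
        elif value > centre + spread:
            groups[name] = high_label
        else:
            groups[name] = mid_label
    return groups

# Hypothetical examiner severities in scaled score points (not the study's data)
severities = {"E1": 3.0, "E2": 4.6, "E3": 5.0, "E4": 5.4, "E5": 7.0}
examiner_groups = group_by_sd(severities, "lenient", "severe")
print(examiner_groups)
```

The same function could be applied to candidate ability measures by passing labels such as "low" and "high".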

The candidates were also divided into three groups based on their measured ability: low ability (more than 1 SD below the mean), moderate ability (within 1 SD of the mean), and high ability (more than 1 SD above the mean).  Each graph shows one candidate ability group, with bars for the percent of ratings given to that group by lenient, moderate, and severe examiners. The rating scale is 1 = unsatisfactory (blue), 2 = marginal (green), 3 = satisfactory (tan), and 4 = excellent (purple). The graphs are based on counts of the raw ratings given by each examiner.
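The percentages behind each graph are simple tallies of raw ratings. A minimal sketch of that tally for one candidate ability group follows; the function name and the sample ratings are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter

def rating_percentages(ratings, scale=(1, 2, 3, 4)):
    """Percent of each rating given by each examiner group.

    ratings: iterable of (examiner_group, rating) pairs for one candidate group.
    """
    counts = {}
    for group, rating in ratings:
        counts.setdefault(group, Counter())[rating] += 1
    table = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for ratings the group never used
        table[group] = {r: 100.0 * counter[r] / total for r in scale}
    return table

# Hypothetical raw ratings given to one candidate ability group
sample = [("lenient", 4), ("lenient", 4), ("lenient", 3), ("lenient", 4),
          ("severe", 3), ("severe", 3), ("severe", 2), ("severe", 4)]
table = rating_percentages(sample)
print(table)
```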

The first graph shows the high ability candidates (more than 1 SD above the mean).  As shown in Figure 1, severe examiners used the full range of ratings and gave fewer 4s than other examiners, but gave many 3s (over 60%).  Moderate examiners gave primarily 3s and many 4s, while lenient examiners gave primarily 4s (over 60%) and many 3s.  While the patterns for all groups of examiners suggest able candidates, they differ somewhat depending upon the overall expectations of the examiners.

Figure 1.
The pattern for the least able candidates is quite different. Lenient and moderate examiners used the entire rating scale. All examiners gave the lower ratings of 1 and 2, but lenient examiners gave them less frequently than moderate and severe examiners. Lenient examiners gave 3s about 60% of the time, although they gave 1s and 2s to some candidates. Moderate examiners gave primarily 2s and 3s. Severe examiners gave 3s only 35% of the time and gave no 4s.
Figure 2.
For the moderate group of candidates, examiners of all levels of severity used the full range of the rating scale.  Examiners tended to give a high percentage of 3s, regardless of their severity. It is highly probable that 3s are appropriate for moderate ability candidates. The lenient examiners, as expected, gave the most 4s, but approximately 65% of their ratings were 3s, which is similar to the moderate and severe examiners.
Figure 3.
It appears that all examiners, regardless of their level of severity, can identify differences in candidate performance and give commensurate ratings, even though those ratings may lean toward a pattern of severity, moderation, or leniency.  The lenient examiners give higher ratings more frequently than other examiners, while the more severe examiners tend to give lower ratings more frequently than other examiners. Since the multi-facet analysis accounts for these differences in examiner rating patterns, all candidates who are able to pass have a comparable opportunity to pass regardless of the rating severity of the examiners they happen to encounter.
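How a measurement model can separate examiner severity from candidate ability can be illustrated with a small calculation. The sketch below assumes a rating scale form of the many-facet Rasch model with item difficulty fixed at zero; the function names, the logit values, and the thresholds are hypothetical, and the study itself reports scaled score points rather than logits.

```python
import math

def category_probabilities(ability, severity, thresholds):
    """Probability of each rating (1 .. len(thresholds) + 1) for one candidate-examiner pair."""
    # Log-numerator of the lowest category is 0; each higher category adds
    # (ability - severity - threshold) for the step into that category.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (ability - severity - tau))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(ability, severity, thresholds):
    """Model-expected rating on the 1-4 scale."""
    probs = category_probabilities(ability, severity, thresholds)
    return sum((k + 1) * p for k, p in enumerate(probs))

# Hypothetical step thresholds in logits for a four-category scale
thresholds = [-1.5, 0.0, 1.5]

for severity, label in [(-1.0, "lenient"), (1.0, "severe")]:
    low = expected_rating(0.0, severity, thresholds)
    high = expected_rating(2.0, severity, thresholds)
    print(f"{label}: low ability {low:.2f}, high ability {high:.2f}")
```

Under these assumptions, the expected rating falls as severity rises, yet for any fixed examiner the more able candidate still has the higher expected rating. An analysis that estimates both parameters can therefore adjust for which examiners a candidate happened to meet.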
Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650
