Investigating Judge Local Independence

Local independence is required of data that are to support Rasch measures. Local independence exists when the Rasch measures explain all systematic differences among the data, so that there is independence among the residual differences between the observed data and those expected from the Rasch measures. When judges award ratings, it may not be obvious whether their task is to act as independent experts or merely to code data. An investigation into local independence can help to clarify this.
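The idea of independent residuals can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a dichotomous Rasch model, uses hypothetical item difficulties and person abilities, and computes residuals from the generating values rather than from estimates. Three items are simulated independently, and a fourth simply copies the second, so that one pair of items is deliberately not locally independent.

```python
import math
import random


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def residual_covariance(res_a, res_b):
    """Sample covariance of two equally long lists of residuals."""
    n = len(res_a)
    mean_a = sum(res_a) / n
    mean_b = sum(res_b) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in zip(res_a, res_b)) / (n - 1)


def simulate_item_residuals(n_persons=2000, seed=1):
    """Simulate three independent items plus an exact copy of the second.

    All parameter values are hypothetical. Residuals are observed minus
    expected, with expectations taken from the generating parameters.
    """
    rng = random.Random(seed)
    difficulties = [-1.0, 0.0, 1.0]
    residuals = [[] for _ in range(4)]  # index 3 holds the copy of item 1
    for _ in range(n_persons):
        ability = rng.gauss(0.0, 1.0)
        for i, difficulty in enumerate(difficulties):
            p = rasch_probability(ability, difficulty)
            response = 1 if rng.random() < p else 0
            residuals[i].append(response - p)
        # The copied item has the same response and the same expectation.
        residuals[3].append(residuals[1][-1])
    return residuals


residuals = simulate_item_residuals()
cov_independent = residual_covariance(residuals[0], residuals[2])
cov_duplicated = residual_covariance(residuals[1], residuals[3])
print(f"independent pair: {cov_independent:+.3f}")
print(f"duplicated pair:  {cov_duplicated:+.3f}")
```

Under these assumptions the independent pair should show a residual covariance near zero, while the copied pair shows a clearly positive one.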

"Analysis of the fit of data to [local independence] is the statistical device by which data are evaluated for their measurement potential - for their measurement validity" [Wright 1991 RMT 5:3 p.159]. Yet typical chi-square fit statistics, such as INFIT and OUTFIT, detect lack of local independence only indirectly. If the same item is repeated twice in an MCQ test, then each item predicts the responses to the other too well. This means that the residuals for both items are smaller than expected, leading to smaller than expected chi-square statistics. But no direct indication is given that the two small chi-squares are caused by an interaction between these two particular items. An investigation of response covariance would immediately flag the interdependency of the two items.

Can covariance investigation also detect a lack of judge independence? A carefully conducted study of judge behavior was Rasch analyzed. Examinees performed several writing tasks. Each examinee-task performance was rated separately by each judge.

Initial analysis indicated that the spread of judge severities was about one-third that of examinee abilities, certainly too large to be ignored. The judge mean-square chi-square fit statistics for these well-trained judges ranged from 0.5 to 1.4 - not unusual for this type of rating situation. Even though these judges seemed to be exercising their expertise independently enough, judge rating covariances were also investigated.

The actual judge rating covariances were calculated from the observed ratings. Then a simulation of independent ratings was generated from the Rasch estimates of judge severity, examinee ability, writing task difficulty, and rating scale structure. The judge covariances for the simulated data were also estimated. Comparison of the covariances is intriguing.
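The comparison described above can be sketched in a few lines. This is not the analysis used in the study: it assumes a many-facet rating scale model, takes the covariances to be covariances of rating residuals (observed minus expected) between pairs of judges, and uses hypothetical generating values in place of Rasch estimates. The `shared_sd` argument is an invented device for producing dependent ratings: it adds a disturbance specific to each examinee-task performance that every judge sees.

```python
import itertools
import math
import random


def category_probs(measure, thresholds):
    """Rating scale model: probabilities of categories 0..m, where
    measure = ability - task difficulty - judge severity (logits)."""
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + measure - f)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_residuals(rng, abilities, tasks, judges, thresholds, shared_sd=0.0):
    """Return one residual list per judge, one entry per examinee-task
    performance. With shared_sd > 0 the ratings are no longer locally
    independent (hypothetical mechanism, for illustration only)."""
    residuals = [[] for _ in judges]
    for b in abilities:
        for d in tasks:
            shock = rng.gauss(0.0, shared_sd) if shared_sd > 0 else 0.0
            for j, c in enumerate(judges):
                model = category_probs(b - d - c, thresholds)
                expected = sum(k * p for k, p in enumerate(model))
                actual = category_probs(b - d - c + shock, thresholds)
                rating = rng.choices(range(len(actual)), weights=actual)[0]
                residuals[j].append(rating - expected)
    return residuals


def covariance(x, y):
    """Sample covariance of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def judge_pair_covariances(residuals):
    """Residual covariance for every pair of judges."""
    pairs = itertools.combinations(range(len(residuals)), 2)
    return [covariance(residuals[j], residuals[k]) for j, k in pairs]


# Hypothetical design: 300 examinees, 3 tasks, 5 judges, 4 rating categories.
rng = random.Random(1997)
abilities = [rng.gauss(0.0, 1.5) for _ in range(300)]
tasks = [-0.5, 0.0, 0.5]
judges = [-0.5, -0.2, 0.0, 0.3, 0.4]
thresholds = [-1.5, 0.0, 1.5]

independent = judge_pair_covariances(
    simulate_residuals(rng, abilities, tasks, judges, thresholds))
dependent = judge_pair_covariances(
    simulate_residuals(rng, abilities, tasks, judges, thresholds, shared_sd=1.0))
print("independent:", [round(c, 2) for c in independent])
print("dependent:  ", [round(c, 2) for c in dependent])
```

In a real investigation the first set of covariances would come from the observed ratings and the second from data simulated with the estimated measures. Here both sets are simulated, only to show the contrast between covariances scattered around zero and covariances that are all positive.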

The judge plot shows the frequency distribution of judge covariance sizes for the observed and simulated data sets. The covariances for the simulated, locally independent data are centered on 0, and rarely get above 0.5 score points. But none of the observed covariances are below 0, and one is just above 1 score point. The largest covariance is between the two judges identified as most unpredictable (noisy) by the chi-square statistics. The covariances of the other judges with the most predictable judge are generally about 0.25 score points.

As a check on the study, the covariances of examinee responses were also computed. These are shown in the examinee plot. They raise no special concerns because their center is close to 0, with most covariances less than 0.5 score points.

Positive judge covariances imply that when one judge gives a higher than expected rating to a particular examinee on a particular task, the others also tend to do so, and when one gives a lower than expected rating, so do the others. These tendencies are apart from any systematic rating patterns across examinees or tasks, which would raise or lower the corresponding measures. It seems there is something in particular examinee-task performances that prompts the judges, en masse, to raise or lower their severity levels. Perhaps this indicates that the judges are not exhibiting the local independence the model specifies, or perhaps it indicates local strength or weakness by subsets of examinees on tasks.

What are the measurement implications of judge over-conformity? Lack of local independence, just like other forms of misfit, degrades the measurement process and increases standard errors. On the one hand, the judges are acting like bathroom scales with the 0 calibrated at different weights, so there must still be an adjustment for their relative severities. On the other hand, their ratings are not fully independent, so each extra rating does not contain as much new statistical information as the previous ones. This means that the precision of measurement is not as great as the number of ratings suggests. Consequently, model-based standard errors are too small.
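A rough sense of how much precision is lost can be had from the standard result for the mean of equally correlated observations. This is an approximation brought in for illustration, not a calculation from the study: it assumes every pair of ratings of a performance shares the same residual correlation `rho` (a correlation, not a covariance in score points), and it treats the ratings as if they were simply averaged. The function names are hypothetical.

```python
import math


def effective_ratings(n_ratings, rho):
    """Number of independent ratings carrying the same information as
    n_ratings equally correlated ones (assumes a common correlation rho)."""
    return n_ratings / (1.0 + (n_ratings - 1) * rho)


def se_inflation(n_ratings, rho):
    """Factor by which a standard error computed under independence
    understates the standard error under common correlation rho."""
    return math.sqrt(1.0 + (n_ratings - 1) * rho)


# Five judges whose residuals correlate 0.25 with one another.
print(effective_ratings(5, 0.25))  # 2.5
print(round(se_inflation(5, 0.25), 3))  # 1.414
```

Under these assumptions, five ratings with a common residual correlation of 0.25 are worth about two and a half independent ratings, and a standard error computed as if they were independent is too small by a factor of about 1.4.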

John Michael Linacre

Investigating Judge Local Independence. Linacre J. M. … Rasch Measurement Transactions, 1997, 11:1 p. 546-7.
