Test Equating For Comparable Passing Standards

MEASUREMENT RESEARCH ASSOCIATES

TEST INSIGHTS

November 2009

Greetings

Test equating is a method of ensuring that candidates are measured against the same criterion-referenced standard regardless of the test administration they challenge. Exams meant to test the same content area may vary in difficulty from administration to administration. Test equating accounts for these differences so that the same criterion-referenced standard can be used.

Mary E. Lunz, Ph.D.
Executive Director

The purpose of test equating is to place examination administrations on the same Benchmark Scale. Differences in difficulty between administrations are accounted for, so the same criterion-referenced standard can be used from administration to administration.

For certification testing, Rasch common-item test equating is frequently used in conjunction with a criterion-referenced standard. First, a Benchmark Scale is calibrated using the data from one exam administration. That exam should match the test blueprint to ensure content validity, and it should include a sufficient number of items that have been field-tested or previously used, to ensure that the exam is a satisfactory measure of the construct. A criterion-referenced standard can then be established using any of the accepted methods, such as a modified Angoff, objective standard setting, bookmark (if item calibrations are available), or another accepted method. Once the Benchmark Scale is in place, the criterion-referenced standard is expressed as a score on that scale.

Equating to that Benchmark Scale and criterion standard requires that subsequent test administrations include a number of items that are calibrated to the Benchmark Scale (commonly called equators). The group of items chosen to be equators should represent all content areas, and should include items with a range of difficulty calibrations. The purpose of the equators is to statistically identify differences in difficulty between the Benchmark Scale and the current test administration. The current test administration may be more difficult or easier than the Benchmark Scale. Test equating allows these differences to be taken into account, so that the criterion-referenced standard can be used.
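The role of the equators can be illustrated with a minimal sketch of one common Rasch linking approach, a mean shift over the common items. This is an assumption for illustration, not necessarily the procedure used to produce the results in this article: the item labels, the logit values, and the function names `equating_shift` and `to_benchmark_scale` are hypothetical, and operational equating would normally also screen the equators for drift before using them.

```python
from statistics import mean


def equating_shift(benchmark, current):
    """Return the constant that moves current-form calibrations onto the Benchmark Scale.

    benchmark: dict of item id -> difficulty already on the Benchmark Scale (logits)
    current:   dict of item id -> difficulty from a free calibration of the current form
    Only items present in both dicts (the equators) are used.
    """
    common = sorted(set(benchmark) & set(current))
    if not common:
        raise ValueError("the two calibrations share no equator items")
    # Average difference between the benchmark and current calibrations of the equators.
    return mean([benchmark[item] - current[item] for item in common])


def to_benchmark_scale(current, shift):
    """Add the equating constant to every item difficulty on the current form."""
    return {item: difficulty + shift for item, difficulty in current.items()}


# Hypothetical data: A, B and C are equators; N1 and N2 are new items.
benchmark = {"A": -1.0, "B": 0.0, "C": 1.0}
current = {"A": -1.3, "B": -0.3, "C": 0.7, "N1": 0.2, "N2": 0.5}

shift = equating_shift(benchmark, current)
equated = to_benchmark_scale(current, shift)
print(f"equating constant: {shift:.2f} logits")
for item, difficulty in sorted(equated.items()):
    print(f"{item}: {difficulty:.2f}")
```

Because the same constant is added to every item, the new items N1 and N2 land on the Benchmark Scale even though they were never administered with the benchmark exam. That is what allows the original criterion-referenced standard to be applied to the current form.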

Using the Rasch model, the initial mean difficulty of the Benchmark Scale is set at a scaled score of 5.00. The mean difficulty represents the average difficulty of all items on the test. Therefore, if a subsequent test form is more difficult, the mean difficulty will be more than 5.00, but if the test is easier, the mean difficulty will be less than 5.00.

The pass point that is determined by the standard setting is set as a scaled score on the Benchmark Scale. However, if we translate the scaled score back to a percent correct, it is easier to understand how test equating works. For example, if a test administration is more difficult, the percent correct necessary to pass would be lowered to be equivalent to the criterion standard. On the other hand, if a test administration is easier, the percent correct necessary to pass would be higher to be equivalent to the criterion standard. Test equating is the statistical process that accounts for the differences in test difficulty and then adjusts the scale of the current test administration so that the same criterion standard can be used.
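The link between form difficulty and the percent correct needed to pass can be made concrete with the Rasch model, under which the probability of a correct answer depends only on the difference between candidate ability and item difficulty. The sketch below assumes measures expressed in logits (the article's scaled scores, such as 5.00, are presumably a linear rescaling whose constants are not given here) and uses made-up item difficulties and a made-up cut score; `rasch_p` and `percent_to_pass` are hypothetical helper names.

```python
import math


def rasch_p(theta, b):
    """Rasch probability that a candidate of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def percent_to_pass(cut_theta, difficulties):
    """Expected percent correct on a form for a candidate located exactly at the cut score."""
    expected_raw = sum(rasch_p(cut_theta, b) for b in difficulties)
    return 100.0 * expected_raw / len(difficulties)


# Hypothetical cut score and item difficulties, all on the same (equated) logit scale.
cut = 0.6
benchmark_form = [-1.0, -0.5, 0.0, 0.5, 1.0]
harder_form = [b + 0.3 for b in benchmark_form]  # every item 0.3 logits harder
easier_form = [b - 0.2 for b in benchmark_form]  # every item 0.2 logits easier

for name, form in [("benchmark", benchmark_form),
                   ("harder", harder_form),
                   ("easier", easier_form)]:
    print(f"{name}: {percent_to_pass(cut, form):.1f}% correct needed to pass")
```

The cut score stays fixed at the same ability level in all three cases. Only the percent correct that corresponds to it changes, falling on the harder form and rising on the easier one.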

The table below shows how the test equating process works. Five different exams are represented. The test forms are different administrations of each exam, each of which includes equator items and is calibrated to the Benchmark Scale. Some test administrations of a particular exam are more difficult, while others are easier. The results are simulated from samples of real data, and the percent to pass is an approximation for demonstration purposes.

Mean Item Difficulty and Percent Correct Equivalent of the Criterion Standard
Exam	Benchmark Scale (% pass point)	Test Form #1 (% to pass)	Test Form #2 (% to pass)	Test Form #3 (% to pass)
1	5.00 (65%)	5.39 (harder, 62%)	4.87 (easier, 67%)	5.35 (harder, 63%)
2	5.00 (60%)	5.12 (harder, 57%)	4.99 (easier, 61%)	5.17 (harder, 56%)
3	5.00 (65%)	4.98 (easier, 66%)	4.83 (easier, 67%)	4.82 (easier, 68%)
4	5.00 (55%)	5.39 (harder, 53%)	4.99 (easier, 56%)	5.20 (harder, 52%)
5	5.00 (65%)	5.28 (harder, 63%)	5.20 (harder, 64%)	5.42 (harder, 61%)

Measurement Research Associates, Inc.

505 North Lake Shore Dr., Suite 1304

Chicago, IL 60611

Phone: (312) 822-9648 Fax: (312) 822-9650
