Calibration Matrices For Test Equating

A study of student performance across grades K through 8 required the equating of 17 test forms for each of two curriculum areas: Mathematics and Reading. Advances in Rasch technology enabled this to be achieved through the construction of one response matrix for each area. I will focus on the Mathematics analysis.

Seventeen test forms, comprising Levels 6 through 14 (Grades K through 8) of the ITBS Form 7 and Levels 7 through 14 (Grades 1 through 8) of CPS90, were equated in one step. Figure 1 shows the equating design. Each lettered rectangle corresponds to one test form. Some students took only one test form in the usual way. Others took two test forms to provide common-person linking. Each of the 14 arrows in Figure 1 indicates a group of 100 to 150 students who took the two test forms marked by the arrow's ends. These groups are the common-person links between pairs of forms. The test publishers designed these test forms so that adjacent levels between Levels 9 and 14 share common items. This provides "common-item" equating at the higher levels.

Valid equating of math forms requires data that capture the math variable. Data contaminated by guessing, response sets or disinterest must be set aside from an equating study and reintroduced only later, for diagnostic or individual reporting.

Irrelevant test behavior was "cleaned out" of these data in five stages. Set aside were:

(1) Answer forms with scanning or marking problems: (a) more than three double-marked responses; (b) lightly marked forms with more than one blank response followed by non-blank responses; (c) very lightly marked forms.

(2) Response strings indicating extreme student disinterest or out-of-level testing: (a) more than 25% of the items left blank; (b) many identical responses ("response sets"); (c) repeating patterns of responses.

(3) When each test form at each grade level was analyzed separately, response strings showing excessive off-variable behavior, i.e., with infit and outfit mean squares above 2.5.

(4) Response strings with infit or outfit mean squares above 2.5 and many standardized residuals of 3 or larger (suggesting guessing or carelessness).

(5) When students took two test forms, the standardized difference between their pair of measures was above 2, and the responses in their lower test performance showed evidence of irrelevant test-taking behavior, e.g., many omitted responses or response sets.
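The person fit statistics used in stages (3) and (4) can be illustrated with a minimal sketch. It assumes dichotomous (right/wrong) items and the standard Rasch definitions: outfit is the mean of the squared standardized residuals, and infit is the information-weighted mean square. The function name `person_fit`, the input layout, the use of absolute values when counting large residuals, and the example measures are all assumptions made for illustration; they are not taken from the study.

```python
import math

def person_fit(responses, ability, difficulties):
    """Return (infit, outfit, n_large) for one person's dichotomous responses.

    responses    : list of 0/1, with None for items not administered
    ability      : person measure in logits
    difficulties : item measures in logits, aligned with responses
    n_large counts standardized residuals with absolute value >= 3
    (taking the absolute value is an assumption of this sketch).
    """
    sq_resid_sum = 0.0  # sum of squared raw residuals
    var_sum = 0.0       # sum of model variances (statistical information)
    z2_sum = 0.0        # sum of squared standardized residuals
    n_used = 0
    n_large = 0
    for x, d in zip(responses, difficulties):
        if x is None:
            continue  # missing data contribute nothing
        p = 1.0 / (1.0 + math.exp(-(ability - d)))  # Rasch success probability
        var = p * (1.0 - p)
        z = (x - p) / math.sqrt(var)
        sq_resid_sum += (x - p) ** 2
        var_sum += var
        z2_sum += z * z
        n_used += 1
        if abs(z) >= 3.0:
            n_large += 1
    infit = sq_resid_sum / var_sum
    outfit = z2_sum / n_used
    return infit, outfit, n_large

# Hypothetical example: a person at 0 logits who fails the easiest item
# and succeeds on the hardest one.
item_measures = [-4.0, -2.0, 0.0, 2.0, 4.0]
infit, outfit, n_large = person_fit([0, 1, 1, 0, 1], 0.0, item_measures)
flagged = (infit > 2.5 or outfit > 2.5) and n_large >= 2
print(flagged, n_large)
```

Both mean squares exceed 2.5 for this response string, and its two unexpected responses each produce a standardized residual larger than 3 in absolute value. How many large residuals count as "many" is a judgment call; the threshold of 2 used here is only a placeholder.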

The standardized differences in stage (5) were obtained by:

\[ \frac{M_H - M_L}{\sqrt{S_H^2 + S_L^2}} \]

where M_H and M_L are the higher and lower performance measures of a student relative to the mean of the common persons on that form, and S_H and S_L are the measures' standard errors.
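As a worked illustration, the sketch below assumes the conventional form of a standardized difference: the difference between the two measures divided by the square root of the sum of their squared standard errors. The function name and the numerical values are hypothetical, and the measures are assumed to have been centered already on the mean of the common persons for each form.

```python
import math

def standardized_difference(m_high, se_high, m_low, se_low):
    """Difference between two measures in joint standard-error units.

    m_high, m_low   : higher and lower measures (logits), each already
                      centered on the common-person mean for its form
    se_high, se_low : standard errors of those measures
    """
    return (m_high - m_low) / math.sqrt(se_high ** 2 + se_low ** 2)

# Hypothetical student: 1.8 logits (S.E. 0.30) on one form,
# 0.6 logits (S.E. 0.40) on the other.
z = standardized_difference(1.8, 0.30, 0.6, 0.40)
print(round(z, 2))   # about 2.4, so above the cut-off of 2
needs_review = z > 2
```

A value above 2 only prompts inspection of the lower performance; in the procedure described above, a record was set aside only if that performance also showed irrelevant test-taking behavior.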

The common-person and common-item links enabled all 17 test forms to be amalgamated into one block-diagonal "giant" matrix, shown schematically in Figure 2. Responses to different test items by the same person were aligned in the same row. Since many pairs of test forms had items in common, students who took two forms often took the same item twice. In these cases, the chronologically first response was used. Responses to the same item by different persons were stacked in the same column.

Clerical mistakes were hard to avoid in setting up this equating design. Positioning the common items in the giant matrix required care. ITBS items are shared by two and sometimes three tests. Each distinct item was assigned its own column in the matrix. When counting out columns, it proved easy to miscount, which threw subsequent item columns out of alignment. Sometimes a miscount went unnoticed until the analysis reported a number of items different from that expected. It was then necessary to determine which columns were misplaced and realign them.
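One way to reduce the risk of miscounted columns is to key each column by an item identifier instead of by position. The sketch below is a hypothetical illustration of that idea, not the procedure used in the study: the record layout `(person_id, item_id, response)`, the function names, and the tiny data set are all assumptions. It keeps the chronologically first response when a person meets the same item twice, and it checks the item count against the expected number, which is the symptom by which miscounts were noticed.

```python
def build_giant_matrix(records):
    """Assemble a persons-by-items matrix from long-format records.

    records: iterable of (person_id, item_id, response) tuples, assumed
             to be in chronological order.
    Returns (person_ids, item_ids, matrix); cells are None where the
    person was not administered the item.
    """
    person_index = {}  # person_id -> row number, in order of first appearance
    item_index = {}    # item_id -> column number, in order of first appearance
    cells = {}
    for person_id, item_id, response in records:
        row = person_index.setdefault(person_id, len(person_index))
        col = item_index.setdefault(item_id, len(item_index))
        # setdefault keeps the first response; later repeats are ignored
        cells.setdefault((row, col), response)
    matrix = [[None] * len(item_index) for _ in person_index]
    for (row, col), response in cells.items():
        matrix[row][col] = response
    return list(person_index), list(item_index), matrix

def missing_fraction(matrix):
    """Proportion of cells with no response."""
    total = sum(len(row) for row in matrix)
    missing = sum(cell is None for row in matrix for cell in row)
    return missing / total

# Hypothetical records: P2 took two forms that share item A02.
records = [
    ("P1", "A01", 1), ("P1", "A02", 0),
    ("P2", "A02", 1), ("P2", "B01", 1),
    ("P2", "A02", 0),                    # second exposure, ignored
    ("P3", "B01", 0), ("P3", "B02", 1),
]
persons, items, matrix = build_giant_matrix(records)
expected_item_count = 4
assert len(items) == expected_item_count, "item columns do not match the design"
print(missing_fraction(matrix))  # 0.5 for this toy design
```

Because a shared item always resolves to the same column through its identifier, a common item cannot drift out of alignment, although the identifiers themselves still have to be entered correctly.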

Once the giant matrix was correctly constructed, it was analyzed by computer in the usual way. As discussed in RMT (5:3, p.172), obtaining good estimates from the block-diagonal form of Figure 2, with 86% of the data missing, required fine convergence criteria. These criteria overcame the vertical-equating "range restriction" problems sometimes reported in the literature. Convergence required 263 iterations, 240 more than usual for a single test form. The decisive convergence criterion was the maximum marginal score residual. Convergence was not considered satisfactory until the largest marginal score residual was less than 0.5 score points.
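The convergence check can be sketched as follows. The sketch assumes that a "marginal score residual" is the difference between the observed raw score of a person or item and the raw score expected under the dichotomous Rasch model from the current estimates, summed over the non-missing cells only. The function name, the data layout and the toy values are hypothetical; the estimation step that updates the measures between checks is not shown.

```python
import math

def max_marginal_score_residual(matrix, abilities, difficulties):
    """Largest absolute (observed - expected) raw score over all persons and items.

    matrix       : persons-by-items list of lists of 0/1, None where missing
    abilities    : current person estimates in logits, one per row
    difficulties : current item estimates in logits, one per column
    """
    person_resid = [0.0] * len(abilities)
    item_resid = [0.0] * len(difficulties)
    for n, row in enumerate(matrix):
        for i, x in enumerate(row):
            if x is None:
                continue  # missing cells add to neither margin
            p = 1.0 / (1.0 + math.exp(-(abilities[n] - difficulties[i])))
            person_resid[n] += x - p
            item_resid[i] += x - p
    return max(abs(r) for r in person_resid + item_resid)

# Toy data with all estimates still at their starting value of 0 logits.
toy = [[1, 0, None],
       [None, 1, 0]]
worst = max_marginal_score_residual(toy, [0.0, 0.0], [0.0, 0.0, 0.0])
converged = worst < 0.5   # criterion stated in the text
print(worst, converged)   # 0.5 False
```

With sparse data, a single poorly estimated item or person can keep this maximum large long after the average change per iteration has become small, which is one plausible reason a score-residual criterion demands many more iterations than a single complete test form.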

The fact that all students and all test items were now part of the same connected data set, regardless of grade, test form or test publisher, enabled all student and item measures to be located on a single common scale of mathematics competency. The measures were then used for further investigation into such topics as the equivalence of test forms and the changes in math competency across grades.

Calibration Matrices For Test Equating. Lee O.K. … Rasch Measurement Transactions, 1992, 6:1, 202-203


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
