Lessons from Statewide Assessment: Equating and Linking

Rasch methodology solved several measurement problems and practical issues in the 1991 Oregon Statewide Assessment. The assessment is administered to all students in grades 3, 5, 8 and 11 in Oregon's public schools. The emphasis is on the assessment of skill levels and curriculum evaluation, but the tests are not strictly "competency tests" and no passing scores are implemented. With the help of extensive Rasch-calibrated, computerized item banks (NWEA, 1990) and the expertise of many people, solutions were obtained to several common testing problems.

Efficient Linking. Multiple test forms are needed in large-scale assessment so that broad curriculum coverage can be obtained with brief, time-efficient tests. All item response theory models allow for pre-equating multiple test forms. The advantage of the Rasch model is that stable calibrations can be obtained from relatively small field-test samples for the new items added each year. With changes in curriculum emphasis and the need for test security, new items are always needed. Research by Fred Forster and George Ingebo has shown that excellent calibration stability can be obtained with samples of 200+. Our experience shows, however, that linking should be done with approximately 20 previously calibrated items embedded in each field-test form (in addition to the new items). Field testing is completed several months in advance of the construction of the final test forms. The Oregon Statewide Assessment has used Fred Forster's "fixed parameter" model (which fixes the calibrations of the item-bank items during the iterative scaling of new items) to complete the linking, with excellent results.
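
The idea of fixing bank calibrations while new items are scaled can be illustrated with a minimal sketch. It is not Forster's program: it assumes a plain joint maximum-likelihood routine with Newton-Raphson steps, and the function names (p_correct, calibrate_with_fixed_anchors, simulate_responses) are hypothetical. Anchor items keep their bank difficulties throughout the iterations, so the new items are estimated directly on the bank scale.

```python
import math
import random


def p_correct(theta, b):
    """Rasch probability of success for ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate_with_fixed_anchors(responses, anchors, n_iter=50):
    """Estimate difficulties of new items while anchor items stay fixed.

    responses: list of per-person lists of 0/1 scores, one entry per item.
    anchors:   dict mapping item index -> bank difficulty (never updated).
    Returns a list of difficulties for all items, on the bank scale.
    """
    n_items = len(responses[0])
    # Zero and perfect scores have no finite ability estimate; set them aside.
    data = [r for r in responses if 0 < sum(r) < n_items]
    thetas = [0.0] * len(data)
    b = [anchors.get(j, 0.0) for j in range(n_items)]

    def clamp(step):
        # Limit each update to one logit to keep the iterations stable.
        return max(-1.0, min(1.0, step))

    for _ in range(n_iter):
        # Person step: one Newton-Raphson update of each ability.
        for i, r in enumerate(data):
            probs = [p_correct(thetas[i], bj) for bj in b]
            info = sum(p * (1.0 - p) for p in probs)
            thetas[i] += clamp((sum(r) - sum(probs)) / info)
        # Item step: update only the new (non-anchor) items.
        for j in range(n_items):
            if j in anchors:
                continue
            probs = [p_correct(t, b[j]) for t in thetas]
            info = sum(p * (1.0 - p) for p in probs)
            observed = sum(r[j] for r in data)
            b[j] += clamp((sum(probs) - observed) / info)
    return b


def simulate_responses(abilities, difficulties, rng):
    """Generate 0/1 responses under the Rasch model (demonstration only)."""
    return [[1 if rng.random() < p_correct(t, d) else 0 for d in difficulties]
            for t in abilities]
```

In practice a field-test form would carry about 20 anchors, as described above; a demonstration with simulated data can use fewer to stay small. Joint maximum-likelihood estimates carry a known bias on short tests, so this sketch shows the mechanics of anchoring rather than a production calibration.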

Include "core" items. Limited testing time requires matrix sampling of items in order to keep test forms brief. Our experience, and that of others such as Richard Hill, is that a "core test" embedded in each multiple form is a wise idea. The core test allays concerns that statistical "wizardry" must be used to equate results. The core test can also be used to conduct final checks on the stability of calibrations. Occasionally, items are affected by historical events, such as a Dr. Martin Luther King Jr. reading-comprehension item which showed a calibration shift after national attention was given to the anniversary of his death.
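
A stability check of this kind might look like the following sketch. It assumes each core item has a bank calibration and a fresh calibration from the current administration, removes the average difference between the two sets, and flags items whose remaining displacement is large. The function name flag_drifting_items and the 0.5-logit threshold are illustrative assumptions, not values taken from the Oregon program.

```python
def flag_drifting_items(bank, current, threshold=0.5):
    """Return {item_id: displacement} for core items whose calibration moved.

    bank, current: dicts mapping item id -> difficulty in logits.
    threshold:     hypothetical cutoff (logits) for calling a shift notable.
    """
    common = sorted(set(bank) & set(current))
    if not common:
        raise ValueError("no core items shared by the two calibrations")
    # Average difference between the calibrations (scale origin shift).
    shift = sum(current[k] - bank[k] for k in common) / len(common)
    flagged = {}
    for k in common:
        # Displacement left over once the common shift is removed.
        displacement = current[k] - bank[k] - shift
        if abs(displacement) > threshold:
            flagged[k] = round(displacement, 3)
    return flagged
```

A flagged item is a candidate for review, not automatic removal; an item that has become easier after a public event, as in the example above, would show a negative displacement.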

Tie the scale to the curriculum. Longitudinal comparisons are enabled by Rasch scaling. A "curriculum-referenced" scale can be used to track the proportion of students at various skill levels at various times. Oregon uses calibrated items to define the types of skills students have mastered (e.g., with 80% probability of passing) at each of three levels (e.g., Basic, Proficient, and Advanced). Educators find such curriculum-referencing meaningful--perhaps more than the scale scores themselves. With Rasch scales, the proportion of students at each level can be compared across time. The proportions of students in the lower, middle and upper ranges, for example, can be useful for judging educational equity.
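
The 80% criterion translates directly into the Rasch metric: a student has probability p of success on an item of difficulty b when ability equals b + ln(p / (1 - p)), which for p = 0.80 is b + ln 4, about 1.39 logits above the item. The sketch below shows that conversion and a tally of students by level. The function names, the cut scores and the assignment of three labels to two cuts are hypothetical; Oregon's actual cut scores are not given here.

```python
import math
from bisect import bisect_right


def mastery_ability(b, p=0.8):
    """Ability (logits) at which an item of difficulty b is passed with probability p."""
    return b + math.log(p / (1.0 - p))


def level_proportions(scores, cuts, labels):
    """Proportion of students in each level.

    scores: student measures in logits.
    cuts:   ascending cut scores; a score equal to a cut falls in the higher level.
    labels: one label per level, so len(labels) == len(cuts) + 1.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut scores")
    counts = [0] * len(labels)
    for s in scores:
        counts[bisect_right(cuts, s)] += 1
    return {label: c / len(scores) for label, c in zip(labels, counts)}
```

Because the cut scores sit on a fixed scale, calling level_proportions on the measures from two different years with the same cuts gives directly comparable proportions.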

Cooperate! Linking state and local tests is essential because school districts often invest greatly in their own testing systems. Portland (OR) Public Schools have developed an important series of functional-levels tests. These tests allow emphasis to be placed on local curricula as well as state-mandated goals. With technical help from Ron Houser and Gage Kingsbury, a series of Rasch-based linkings was established between the Portland and Statewide testing systems. A core of 20 calibrated items at the functional center of each grade level was duplicated in the State and Local tests to provide a "hard link" as well as the linkage formed from the scaling of the item banks.
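
One common way to use duplicated items as a link is to compute a single constant that moves local measures onto the state scale. The sketch below assumes that approach; it is not a description of the procedure Houser and Kingsbury used, and the function names are hypothetical. The standard error gives a rough sense of how well the shared items agree about the shift.

```python
import math
from statistics import mean, stdev


def link_constant(state_b, local_b):
    """Mean difference (state minus local) over shared items, with its standard error.

    state_b, local_b: dicts mapping item id -> difficulty in logits on each scale.
    """
    common = sorted(set(state_b) & set(local_b))
    if not common:
        raise ValueError("the two tests share no items")
    diffs = [state_b[k] - local_b[k] for k in common]
    const = mean(diffs)
    # The standard error is undefined with a single shared item.
    se = stdev(diffs) / math.sqrt(len(diffs)) if len(diffs) > 1 else float("nan")
    return const, se


def to_state_scale(local_measure, const):
    """Express a local-scale measure (person or item) on the state scale."""
    return local_measure + const
```

With a core of 20 shared items, a small standard error would suggest that the hard link and the item-bank linkage are telling the same story.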

"Often devising a measurement instrument is as important in what it teaches us about the variable as are the subsequent acts of measurement."
Andrich, D. Rasch Models for Measurement, 1988, Sage, p. 10

Lessons from Statewide Assessment, G Roid … Rasch Measurement Transactions, 1991, 5:1 p. 133

