Trials with Vertical Equating

I ran into some problems with vertical equating. My bank contains 1,000 items ranging from beginner level to very advanced. I found that the spread of difficulties calculated for any one test form was too wide: extremely high or low percentage-correct scores produced ability and item difficulty estimates that were too far from the test form mean. As a result, when test forms were combined, the most difficult items in a dead easy test form, which were still easy items, came out at an intermediate level once all the forms were linked into the bank. The same problem occurred for person abilities. My solution has been to cut off the top and bottom 20% of raw scores. This, combined with excluding or retrialling items that were badly targeted the first time round, considerably reduced the apparent logit range of item difficulties and produced an item difficulty line-up that makes sense.
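The stretching at the extremes can be illustrated with a minimal Python sketch. It uses the simple log-odds of percentage correct as a first approximation to a logit measure, not a full Rasch estimation, which would also depend on the spread of the item difficulties. The function names and the reading of the trimming rule (keeping only persons who scored between 20% and 80% correct) are assumptions made for illustration; the rule could equally be read as dropping the top and bottom 20% of persons.

```python
import math

def logit(p):
    """Log-odds of a proportion correct p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def trim_by_percent_correct(raw_scores, n_items, lower=0.20, upper=0.80):
    """Keep raw scores whose percentage correct lies within [lower, upper].

    Assumed reading of "cut off the top and bottom 20% of raw scores".
    """
    return [r for r in raw_scores if lower <= r / n_items <= upper]

# Equal steps in percentage correct become ever larger steps in logits
# as the score approaches 100% (and, symmetrically, 0%).
for pct in (50, 80, 90, 95, 99):
    print(pct, round(logit(pct / 100), 2))
```

On this approximation, 80% correct is about 1.4 logits from the form mean, while 99% correct is about 4.6 logits away, so a handful of extreme scores can dominate the apparent range of a form.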

Ben Wright comments:
The Rasch model provides measures for items relevant to persons. Off-target items provoke guessing, carelessness and other misbehaviors that reduce the measurement capabilities of the test. Neil's sensible solution to this problem is motivated by a fundamental concern: results have got to make sense! No statistical sleight of hand is going to do better than Neil has done. Neil, have you tried building your bank with just one global calibration (one-step (concurrent) vertical equating)?
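As a rough sketch of the data layout a one-step calibration would need, the Python below stacks several test forms into a single person-by-item table, with items a person never saw marked as missing rather than wrong. The function name, the form names and the tiny data set are hypothetical, and person identifiers are assumed to be unique across forms. The forms must be connected by common items (or common persons) for a single calibration to place everything on one scale.

```python
def stack_forms(forms):
    """Merge per-form response records into one person-by-item table.

    forms maps form name -> {person id -> {item id -> scored response}}.
    Items a person did not take are recorded as None (missing).
    """
    all_items = sorted({item
                        for people in forms.values()
                        for resp in people.values()
                        for item in resp})
    table = {}
    for people in forms.values():
        for person, resp in people.items():
            # dict.get returns None for items not on this person's form
            table[person] = [resp.get(item) for item in all_items]
    return all_items, table

# Hypothetical data: two short forms linked by the shared item "i3".
forms = {
    "easy": {"p1": {"i1": 1, "i2": 1, "i3": 0}},
    "hard": {"p2": {"i3": 1, "i4": 0, "i5": 0}},
}
items, table = stack_forms(forms)
```

The resulting table would then be passed, as one data set, to Rasch software that accepts missing responses; the estimation itself is not shown here.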



Trials with Vertical Equating, N Jones & B Wright … Rasch Measurement Transactions, 1992, 6:3 p. 240


The URL of this page is www.rasch.org/rmt/rmt63l.htm
