Item Recalibration and Stability

Item recalibration and stability. Lunz ME, Bergstrom BA. … 1995, 8:4 p.396

Immediate reporting of candidate abilities at the end of a computer-adaptive test (CAT) requires that abilities be estimated from banked item difficulties. These item difficulties are open to recalibration after a sample of candidates has been tested. For fairness and accuracy, it is important to project how different the ability estimates might have been, had they been based on recalibrated item difficulties.

In general, the impact of recalibration is small. For example, if 2 or 3 percent of the items in the item bank change difficulty by as much as 1.00 logit, all in the same direction, the average change in the ability estimate of a candidate who responded to 100 items is:
2% × 1.0 logit = 0.02 logits ability change
3% × 1.0 logit = 0.03 logits ability change
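The arithmetic above is simply the mean change in difficulty across the items one candidate answered. A minimal Python sketch, assuming a 100-item test in which the changed items all move by +1.0 logit and the rest stay put (the function name is illustrative, not from the article):

```python
from statistics import fmean


def mean_difficulty_shift(item_shifts: list[float]) -> float:
    """Mean change in difficulty over the items one candidate took.

    For a well-targeted test this is the first-order approximation to the
    change in that candidate's ability estimate used in the text.
    """
    return fmean(item_shifts)


N_ITEMS = 100  # items answered by the candidate (assumed)
for n_changed in (2, 3):
    # n_changed items become 1.0 logit harder; the others are unchanged
    shifts = [1.0] * n_changed + [0.0] * (N_ITEMS - n_changed)
    print(f"{n_changed}% x 1.0 logit = {mean_difficulty_shift(shifts):.2f} logits")
# prints:
# 2% x 1.0 logit = 0.02 logits
# 3% x 1.0 logit = 0.03 logits
```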

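These changes are far smaller than the standard error of measurement (SEM) for a candidate taking a CAT of 100 targeted items. Each dichotomous item contributes at most 0.25 to the test information (p(1 - p) is largest when p = 0.5), so the SEM for 100 items is at least 1/√(100 × 0.25) = 0.2 logits. In practice, some items will recalibrate as more difficult and some as easier, so their effects partly cancel. An ability estimate changes only to the extent that the mean difficulty of the items presented to that candidate changes.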
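Both claims can be checked numerically. The sketch below, a minimal one assuming dichotomous Rasch items and maximum-likelihood ability estimation by Newton-Raphson, computes the SEM lower bound and re-estimates one hypothetical candidate's ability after three items recalibrate in the same direction and after three move each way. The test design, response pattern, and function names are illustrative assumptions, not the article's data.

```python
import math


def sem_lower_bound(n_items: int) -> float:
    """Smallest possible SEM (logits) for n dichotomous Rasch items.

    Each item contributes information p(1 - p) <= 0.25, the maximum being
    reached when the item difficulty equals the candidate's ability.
    """
    return 1.0 / math.sqrt(n_items * 0.25)


def rasch_ability(responses: list[int], difficulties: list[float],
                  tol: float = 1e-10, max_iter: int = 100) -> float:
    """Maximum-likelihood Rasch ability by Newton-Raphson.

    Assumes a non-extreme score (at least one 0 and one 1 in responses).
    """
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(b - theta)) for b in difficulties]
        residual = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, residual / information))  # damped step
        theta += step
        if abs(step) < tol:
            break
    return theta


print(sem_lower_bound(100))  # 0.2

# Hypothetical 100-item test: difficulties spread evenly over -1..+1 logits,
# candidate answers every other item correctly (raw score 50).
difficulties = [-1.0 + 2.0 * k / 99 for k in range(100)]
responses = [1 if k % 2 == 0 else 0 for k in range(100)]
base = rasch_ability(responses, difficulties)

# Three items recalibrate 1.0 logit harder: all in the same direction.
same_dir = [b + 1.0 if k < 3 else b for k, b in enumerate(difficulties)]
# Three items 1.0 logit harder and three 1.0 logit easier: mixed directions.
mixed = [b + 1.0 if k < 3 else b - 1.0 if k < 6 else b
         for k, b in enumerate(difficulties)]

print(rasch_ability(responses, same_dir) - base)
print(rasch_ability(responses, mixed) - base)
```

The same-direction change should come out close to the 0.03-logit approximation, and the mixed change noticeably smaller, because the opposite shifts partly cancel. Shifting every difficulty by the same amount moves the estimate by exactly that amount, since only the difference between ability and difficulty enters the model.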

As an empirical check, an investigation of estimation stability was conducted on CAT data collected in 1993 from 1,699 candidates responding to a pool of 792 items. A baseline group of 92 items and 549 candidates was identified. The criteria for inclusion were: 1) at least 100 baseline candidates answered each baseline item, and 2) at least 30 baseline items were administered to each baseline candidate.
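The article states the two inclusion criteria but not how they were applied. Because dropping a candidate can push an item below 100 responses, and dropping an item can push a candidate below 30 items, one plausible reading is to apply both filters repeatedly until neither removes anything. The following Python sketch assumes that reading and a hypothetical data layout (candidate mapped to {item: 0/1}); it is not the authors' procedure.

```python
Responses = dict[str, dict[str, int]]  # candidate -> {item: 0/1}


def select_baseline(responses: Responses, min_per_item: int = 100,
                    min_per_person: int = 30) -> tuple[set[str], set[str]]:
    """Return (candidates, items) satisfying both inclusion criteria.

    Assumption: the criteria are applied repeatedly, because removing a
    candidate can drop an item below min_per_item and vice versa.
    """
    persons = set(responses)
    items = {i for answered in responses.values() for i in answered}
    while True:
        # Responses to each retained item from retained candidates.
        counts = dict.fromkeys(items, 0)
        for p in persons:
            for i in responses[p]:
                if i in counts:
                    counts[i] += 1
        kept_items = {i for i, c in counts.items() if c >= min_per_item}
        # Retained items administered to each retained candidate.
        kept_persons = {
            p for p in persons
            if sum(i in kept_items for i in responses[p]) >= min_per_person
        }
        if kept_items == items and kept_persons == persons:
            return persons, items
        persons, items = kept_persons, kept_items


# Tiny illustration with both thresholds lowered to 2.
demo = {
    "c1": {"i1": 1, "i2": 0},
    "c2": {"i1": 0, "i2": 1},
    "c3": {"i1": 1, "i3": 0},
}
print(select_baseline(demo, min_per_item=2, min_per_person=2))
```

Since each pass can only remove candidates or items, the loop always terminates.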

Baseline ability measures and item calibrations were obtained using the entire baseline sample. Each item was calibrated from the responses of the baseline candidates to whom that item had been administered, so the number of responses per item ranged from 113 to 395. A series of independent Rasch analyses was then performed on random samples of 30, 50, and 100 candidates drawn from the baseline population of 549 candidates.
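The article does not name its estimation method. As a sketch only, the following Python code uses joint maximum likelihood (JMLE), one common way to calibrate the incomplete response matrices a CAT produces: each item's difficulty is updated from just the candidates who saw it, and the origin is fixed by setting the mean item difficulty to 0, the item-mean convention the article says was used. It assumes no candidate or item has an extreme (all-correct or all-wrong) score, and unadjusted JMLE estimates are known to be somewhat biased for short tests. Function names and the demo data are hypothetical.

```python
import math

Responses = dict[str, dict[str, int]]  # candidate -> {item: 0/1}


def _prob(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(b - theta))


def jmle(data: Responses, max_iter: int = 500, tol: float = 1e-8):
    """Joint maximum-likelihood Rasch calibration for incomplete data.

    Assumes no candidate or item has an extreme (all-0 or all-1) score.
    Returns (abilities, difficulties) with mean item difficulty fixed at 0.
    """
    theta = dict.fromkeys(data, 0.0)
    by_item: dict[str, list[tuple[str, int]]] = {}
    for p, answered in data.items():
        for i, x in answered.items():
            by_item.setdefault(i, []).append((p, x))
    b = dict.fromkeys(by_item, 0.0)

    for _ in range(max_iter):
        biggest = 0.0
        # Newton step for each candidate, holding item difficulties fixed.
        for p, answered in data.items():
            probs = [(x, _prob(theta[p], b[i])) for i, x in answered.items()]
            resid = sum(x - q for x, q in probs)
            info = sum(q * (1.0 - q) for _, q in probs)
            step = max(-1.0, min(1.0, resid / info))
            theta[p] += step
            biggest = max(biggest, abs(step))
        # Newton step for each item, using only the candidates who saw it.
        for i, seen in by_item.items():
            probs = [(x, _prob(theta[p], b[i])) for p, x in seen]
            resid = sum(x - q for x, q in probs)
            info = sum(q * (1.0 - q) for _, q in probs)
            step = max(-1.0, min(1.0, -resid / info))  # more correct -> easier
            b[i] += step
            biggest = max(biggest, abs(step))
        # Fix the origin: mean item difficulty = 0 (shift abilities too).
        center = sum(b.values()) / len(b)
        for i in b:
            b[i] -= center
        for p in theta:
            theta[p] -= center
        if biggest < tol:
            break
    return theta, b


# Hypothetical complete 6 x 4 data set with no extreme scores.
demo = {
    "c1": {"i1": 1, "i2": 0, "i3": 0, "i4": 0},
    "c2": {"i1": 1, "i2": 1, "i3": 0, "i4": 0},
    "c3": {"i1": 1, "i2": 1, "i3": 1, "i4": 0},
    "c4": {"i1": 1, "i2": 0, "i3": 1, "i4": 0},
    "c5": {"i1": 0, "i2": 1, "i3": 0, "i4": 1},
    "c6": {"i1": 1, "i2": 1, "i3": 0, "i4": 1},
}
abilities, difficulties = jmle(demo)
print({i: round(d, 2) for i, d in difficulties.items()})
```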

For each sample, item difficulties were estimated from whatever responses that sample's candidates had made. In the 30-candidate sample, 4 items had not been administered to any sampled candidate; the remaining 88 items were recalibrated from 8 to 24 responses each. In the 50-candidate sample, all 92 items were recalibrated from the responses of 9 to 40 candidates, and in the 100-candidate sample, from the responses of 16 to 73 candidates.
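The per-sample bookkeeping described above amounts to three steps: draw candidates at random from the baseline group, count how many sampled candidates saw each item, and set aside any item that nobody in the sample saw. A minimal Python sketch of those steps follows; the function names, fixed seed, and dictionary layout are assumptions, and the article says only that the samples were random. Items that every sampled candidate answered correctly, or every one incorrectly, would also have to be set aside before a maximum-likelihood recalibration, since their difficulties cannot be estimated.

```python
import random

Responses = dict[str, dict[str, int]]  # candidate -> {item: 0/1}


def draw_sample(data: Responses, n: int, seed: int) -> Responses:
    """Draw n candidates at random, keeping every response they made."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    chosen = rng.sample(sorted(data), n)
    return {p: data[p] for p in chosen}


def responses_per_item(sample: Responses, bank: list[str]) -> dict[str, int]:
    """Count how many sampled candidates were administered each bank item."""
    counts = dict.fromkeys(bank, 0)
    for answered in sample.values():
        for item in answered:
            if item in counts:
                counts[item] += 1
    return counts


# Hypothetical baseline data: 4 candidates, 4 items, CAT-style gaps.
demo = {
    "c1": {"i1": 1, "i2": 0},
    "c2": {"i2": 1, "i3": 1},
    "c3": {"i1": 0, "i3": 1},
    "c4": {"i2": 0, "i3": 0},
}
bank = ["i1", "i2", "i3", "i4"]

sample = draw_sample(demo, n=2, seed=1)
counts = responses_per_item(sample, bank)
unseen = [i for i, c in counts.items() if c == 0]  # cannot be recalibrated
seen = [c for c in counts.values() if c > 0]
print(f"not administered: {unseen}; responses per item: {min(seen)} to {max(seen)}")
```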

Candidate measures obtained from these three samples were compared with their baseline measures to investigate stability. The plots show the results. As expected, the item calibrations were quite unstable. Nevertheless, the ability estimates were stable, even under the most adverse conditions: no discrepancy exceeded the 0.3 logit S.E. of each ability measure. Though this finding is highly satisfactory, the impact of item recalibration on ability estimation in high-stakes situations can be reduced further by keeping the mean of the candidate ability estimates constant across recalibration, instead of setting the mean of the item difficulties equal to a constant (as was done here).
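The stability check and the suggested alternative anchoring can both be sketched in a few lines of Python. The sketch assumes that "constant" means equal to the same candidates' mean baseline measure; the function names and numbers are illustrative, not the article's data. Shifting abilities and difficulties by the same amount leaves every ability-minus-difficulty difference, and hence the model fit, unchanged; it only moves the origin.

```python
from statistics import fmean


def max_discrepancy(baseline: dict[str, float], recal: dict[str, float]) -> float:
    """Largest |recalibrated - baseline| ability over shared candidates."""
    shared = baseline.keys() & recal.keys()
    return max(abs(recal[p] - baseline[p]) for p in shared)


def anchor_on_person_mean(abilities: dict[str, float],
                          difficulties: dict[str, float],
                          target_mean: float):
    """Shift a recalibration so its candidates' mean ability equals target_mean.

    Abilities and difficulties move by the same amount, so the model fit
    is unchanged; only the origin of the scale moves.
    """
    shift = target_mean - fmean(abilities.values())
    return ({p: t + shift for p, t in abilities.items()},
            {i: d + shift for i, d in difficulties.items()})


# Illustrative numbers only (not from the article).
baseline = {"c1": -0.40, "c2": 0.10, "c3": 0.90}
recal_theta = {"c1": -0.10, "c2": 0.45, "c3": 1.15}  # item-mean anchoring
recal_b = {"i1": -0.50, "i2": 0.50}

theta, b = anchor_on_person_mean(recal_theta, recal_b, fmean(baseline.values()))
SE = 0.3  # S.E. of each ability measure, as reported in the text
print(max_discrepancy(baseline, recal_theta) <= SE,
      max_discrepancy(baseline, theta) <= SE)
```

In this made-up case, item-mean anchoring leaves a discrepancy of about 0.35 logits, while anchoring on the candidates' mean brings it down to about 0.05.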

Plots (not reproduced): 30 Candidates - Items; 30 Candidates - Persons; 50 Candidates - Items; 50 Candidates - Persons; 100 Candidates - Items; 100 Candidates - Persons.

Item recalibration and stability. Lunz ME, Bergstrom BA. … Rasch Measurement Transactions, 1995, 8:4 p.396
