Computer-Adaptive Tests (CAT), Item Selection, Standard Errors and Stopping Rules

The standard error of measurement (S.E.) is widely used for stopping a computer-adaptive test (CAT). For instance, if the current measure estimate is more than 1.96 S.E.s from the pass-fail measure, then there is 95% confidence in the pass-fail decision. At more than 2.58 S.E.s, there is 99% confidence. But how many items are needed to reach a desired S.E.?
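
The stopping rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `pass_fail_decision` and its arguments are hypothetical, and the rule is assumed to be applied after each administered item using the current estimate and its S.E., both in logits.

```python
def pass_fail_decision(estimate, standard_error, cut_point, z=1.96):
    """Return 'pass', 'fail', or 'continue' for the current CAT estimate.

    z is the number of S.E.s required between the estimate and the
    pass-fail measure: 1.96 for 95% confidence, 2.58 for 99% confidence.
    """
    # Distance of the estimate from the pass-fail point, in S.E. units.
    t = (estimate - cut_point) / standard_error
    if t >= z:
        return "pass"
    if t <= -z:
        return "fail"
    # Not yet far enough from the cut point: administer another item.
    return "continue"
```

For example, an estimate of 1.0 logits with an S.E. of 0.4 logits is 2.5 S.E.s above a pass-fail point at 0.0 logits, so the test could stop with a "pass" at 95% confidence, but not at 99% confidence.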

If a person has probability, P, of succeeding on a dichotomous item (such as a multiple-choice question), then the statistical information in the response is P*(1-P). The information in the test is the sum of the information in the administered items, so the standard error of the estimated measure is
S.E. = 1/sqrt(information) = 1/sqrt(sum(P*(1-P)))
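
As a minimal sketch of this formula, the following Python function computes the S.E. from the success probabilities of the administered items. The name `standard_error` and the list-of-probabilities input are assumptions made for illustration.

```python
import math


def standard_error(probabilities):
    """S.E. (in logits) of a measure estimated from dichotomous items.

    probabilities: the person's probability of success on each
    administered item.
    """
    # Each response contributes information P*(1-P); the contributions add.
    information = sum(p * (1.0 - p) for p in probabilities)
    return 1.0 / math.sqrt(information)
```

For instance, 16 items each with P=0.5 give information 16*0.25 = 4, and so an S.E. of 1/sqrt(4) = 0.5 logits.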

The largest information, and so the smallest standard error, occurs when P=0.5, i.e., when the CAT items are targeted exactly on the person. But this can produce an unsatisfactory testing experience for the examinee, so higher probabilities of success are targeted, such as P=0.7 (items are selected so that the person achieves about 70% success on the administered items) or P=0.8 (about 80% success). The Table below shows, for each targeting, the minimum number of items that must be administered to reach a specific S.E.:

Minimum Number of CAT Items Administered

Targeting:              S.E. (Logits)
Probability of Success      0.5    0.4    0.3    0.2   0.15    0.1
P=0.5                        16     25     45    100    178    400
P=0.6                        17     27     47    105    186    417
P=0.7                        20     30     53    120    212    477
P=0.8                        25     40     70    157    278    625
P=0.9                        45     70    124    278    494   1112

It is seen that the penalty for going from P=0.5 to P=0.6 targeting is the administration of about 5% more items. From P=0.5 to P=0.7 is about 20% more items. From P=0.5 to P=0.8 is about 56% more items. P=0.9 almost triples the test length. Because the number of items is inversely proportional to the square of the S.E., an S.E. of 0.15 logits requires about 11 times as many items as an S.E. of 0.5 logits.
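
The item counts for a specific S.E. follow from rearranging the S.E. formula: if every item is answered with the same success probability P, then N items give information N*P*(1-P), so N = 1/(S.E.^2 * P*(1-P)), rounded up. The sketch below assumes exactly that simplification (every administered item hits the target P); the function name `min_items_for_se` is hypothetical.

```python
import math


def min_items_for_se(target_se, p_success):
    """Smallest number of items, all answered with probability p_success,
    that brings the S.E. down to target_se logits or less."""
    info_per_item = p_success * (1.0 - p_success)
    items = 1.0 / (target_se ** 2 * info_per_item)
    # Subtract a tiny tolerance so floating-point noise does not push an
    # exact answer (such as 625.0000000000001) up to the next integer.
    return math.ceil(items - 1e-9)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.8, 0.9):
        row = [min_items_for_se(se, p) for se in (0.5, 0.4, 0.3, 0.2, 0.15, 0.1)]
        print(f"P={p}: {row}")
```

The ratio of test lengths between two targetings depends only on P*(1-P), for example 0.25/0.21 for P=0.5 against P=0.7, which is why the percentage penalties are the same at every S.E.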

Minimum Number of Items for 95% Confidence (|t|>=1.96) in Pass-Fail Decision

Targeting:              Logit Distance of Ability Estimate from Pass-Fail Point
Probability of Success       1   0.9   0.8   0.7   0.6   0.5   0.4   0.3   0.2   0.1
P=0.5                       16    19    25    32    43    62    97   171   385  1537
P=0.6                       17    20    26    33    45    65   101   178   401  1601
P=0.7                       19    23    29    38    51    74   115   204   458  1830
P=0.8                       25    30    38    49    67    97   151   267   601  2401
P=0.9                       43    53    67    88   119   171   267   475  1068  4269
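
The pass-fail item counts combine the two ideas above: a decision at 95% confidence needs an S.E. no larger than the logit distance divided by 1.96, and that S.E. in turn fixes the number of items. A minimal sketch, again assuming that every administered item is answered with the same targeted probability of success (the function name `min_items_for_decision` is hypothetical):

```python
import math


def min_items_for_decision(distance, p_success, z=1.96):
    """Smallest number of items, all answered with probability p_success,
    for which an estimate `distance` logits from the pass-fail point is
    at least z S.E.s away from it."""
    # Largest S.E. that still satisfies |t| = distance / S.E. >= z.
    required_se = distance / z
    info_per_item = p_success * (1.0 - p_success)
    items = 1.0 / (required_se ** 2 * info_per_item)
    # Tolerance guards against floating-point noise on exact answers.
    return math.ceil(items - 1e-9)


if __name__ == "__main__":
    distances = (1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)
    for p in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"P={p}: {[min_items_for_decision(d, p) for d in distances]}")
```

Passing z=2.58 instead gives the corresponding counts for 99% confidence. Halving the distance from the pass-fail point roughly quadruples the number of items required.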

When administering many items in a CAT, it is also wise to consider item response times: "Utilizing Response Time Distributions for Item Selection in CAT," Zhewen Fan, Chun Wang, Hua-Hua Chang, and Jeffrey Douglas, Journal of Educational and Behavioral Statistics, 2012.

John Michael Linacre


Computer-Adaptive Tests (CAT), Standard Errors and Stopping Rules, Linacre J.M. … Rasch Measurement Transactions, 2006, 20:2 p. 1062

The URL of this page is www.rasch.org/rmt/rmt202f.htm

Website: www.rasch.org/rmt/contents.htm