March 2010

Since candidate sample sizes vary for certification exams, the study explores the impact of small sample sizes.  The results show that while item calibration is less accurate, candidate separation reliability can be acceptable regardless of the candidate sample size. 

Mary E. Lunz, Ph.D.
Executive Director

Comparison of Item Performance with Large and Small Samples
Usually a candidate population of fewer than 50 is considered a small sample for multiple choice exams.  Indeed, the multiple choice exam was developed to test large samples of candidates effectively and efficiently.  However, the multiple choice format has become so well accepted that it is now often applied to small as well as large samples.  To better understand the impact of candidate sample size on test item performance, a test with a large sample of over 1,000 candidates and a test with a small sample of fewer than 20 candidates were analyzed using the Rasch model.  The criteria for reviewing item performance were 1) item separation reliability, 2) item discrimination, and 3) the error associated with the calibrated difficulty of each item.
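For readers unfamiliar with the model, the dichotomous Rasch model predicts the probability of a correct response from the difference between a candidate's ability and an item's difficulty, both expressed in logits. A minimal sketch (the function name is illustrative, not from any particular software package):

```python
import math

def rasch_probability(theta, b):
    """Probability that a candidate of ability theta answers an item
    of difficulty b correctly, under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 50%.
print(rasch_probability(0.5, 0.5))   # 0.5
# A more able candidate has a higher probability on the same item.
print(rasch_probability(2.0, 0.5))   # about 0.82
```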

Item separation reliability is an indication of the reproducibility of the item difficulties.  High item separation reliability means that there is a high probability that items will maintain the same difficulty estimates across examinations.  For the large sample test, the item separation reliability was 1.00, while for the small sample test it was .75, indicating that the test item variable is less well defined when a small sample is used to calibrate the items.
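Separation reliability can be computed as the proportion of the observed variance in the item calibrations that is not attributable to measurement error. The sketch below uses hypothetical calibrations and standard errors (not the study's data) to show how larger standard errors alone drive the reliability down, even when the item difficulties themselves are unchanged:

```python
def separation_reliability(calibrations, standard_errors):
    """Rasch separation reliability: the share of observed variance in
    the calibrations that survives after removing mean error variance."""
    n = len(calibrations)
    mean = sum(calibrations) / n
    observed_var = sum((c - mean) ** 2 for c in calibrations) / n
    error_var = sum(se ** 2 for se in standard_errors) / n
    return max(0.0, (observed_var - error_var) / observed_var)

# Hypothetical five-item test spanning -2 to +2 logits.
items = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Small standard errors (as with a large candidate sample):
print(round(separation_reliability(items, [0.06] * 5), 2))  # 1.0
# Large standard errors (as with a small candidate sample):
print(round(separation_reliability(items, [0.68] * 5), 2))  # 0.77
```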

An important factor in item assessment is the discrimination capability of each test item. It is sometimes difficult to interpret discrimination calculated from small samples because chance occurrences may affect a candidate's response to an item. Examples of a chance occurrence are an able candidate getting an easy item incorrect by, say, reading the item too quickly and missing a better distractor, or misreading the item. Alternatively, a less able candidate may get a difficult item correct, perhaaps because they happen to be familiar with the particular fact the item asks about, or simply make a lucky guess.  Item discrimination for the small sample ranged from -.44 to .65 with an average of .19. Item discrimination for the large sample ranged from -.12 to .51 with an average of .22.  Thus, the pattern of item discrimination for large and small samples differs in range, but is fairly similar on average.
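A common discrimination index of the kind reported above is the point-biserial correlation between a scored (0/1) item response and candidates' total test scores. A small self-contained sketch on made-up data (not the study's), illustrating why a single lucky or unlucky response swings the statistic far more in a sample of five than it would in a sample of a thousand:

```python
def point_biserial(item_scores, total_scores):
    """Point-biserial discrimination: Pearson correlation between a
    0/1 item score and each candidate's total test score."""
    n = len(item_scores)
    mx = sum(item_scores) / n
    my = sum(total_scores) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(item_scores, total_scores)) / n
    sx = (sum((x - mx) ** 2 for x in item_scores) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in total_scores) / n) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: the three highest scorers answered the item correctly.
item   = [1, 1, 1, 0, 0]
totals = [90, 80, 70, 60, 50]
print(round(point_biserial(item, totals), 2))  # 0.87

# One chance response flips: the most able candidate misses the item.
item_flipped = [0, 1, 1, 0, 0]
print(round(point_biserial(item_flipped, totals), 2))  # drops sharply
```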

The biggest difference in the performance of items for large and small samples is the error of measurement.  For the large sample the average error of measurement for item calibrations was .06 logits with a range of .06 to .09 logits, while for the small sample the average error of measurement for item calibrations was .68 logits with a range of .47 to 1.84 logits.  Items with calibrated difficulties in the center of the scale (p-values of 50%-60%) have lower measurement errors for both large and small samples than items with extreme high or low difficulty calibrations (p-values of 5% or 100%).  The small sample analysis had more items that all candidates answered correctly, and therefore higher measurement errors.
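Both effects described above follow from the asymptotic standard error of a Rasch item calibration, which is one over the square root of the summed binomial information P(1-P) across candidates: fewer candidates contribute less information, and extreme items (where P is near 0 or 1) contribute almost none per candidate. A hedged sketch with hypothetical candidate abilities, not the study's data:

```python
import math

def item_calibration_se(abilities, difficulty):
    """Asymptotic standard error of a Rasch item difficulty estimate:
    1 / sqrt(sum over candidates of P*(1-P))."""
    info = 0.0
    for theta in abilities:
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        info += p * (1.0 - p)  # each candidate's information contribution
    return 1.0 / math.sqrt(info)

# On-target item, 1000 candidates of ability 0: SE is small.
print(round(item_calibration_se([0.0] * 1000, 0.0), 2))  # 0.06
# Same item, 16 candidates: SE grows to about half a logit.
print(round(item_calibration_se([0.0] * 16, 0.0), 2))    # 0.5
# Extreme item (3 logits off-target), 16 candidates: SE is larger still.
print(round(item_calibration_se([0.0] * 16, 3.0), 2))    # 1.18
```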

In summary, item difficulty and discrimination are not as accurately measured using a small sample; however, the items are calibrated with sufficient accuracy to produce an acceptable level of candidate separation (.89 for the large sample, and .86 for the small sample) suggesting acceptably accurate measurement of candidate performance regardless of sample size.

Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650
