March 2010

Because candidate sample sizes vary across certification exams, this study explores the impact of small samples.  The results show that although item calibration is less accurate with a small sample, candidate separation reliability can remain acceptable regardless of sample size.

Mary E. Lunz, Ph.D.
Executive Director

Comparison of Item Performance with Large and Small Samples
Usually a candidate population of fewer than 50 is considered a small sample for multiple-choice exams.  Indeed, the multiple-choice exam was developed to test large samples of candidates effectively and efficiently.  However, the format has become so well accepted that it is now often applied to small as well as large samples.  To better understand the impact of candidate sample size on test item performance, a test taken by a large sample of more than 1,000 candidates and a test taken by a small sample of fewer than 20 candidates were analyzed using the Rasch model.  The criteria for reviewing item performance were 1) item separation reliability, 2) item discrimination, and 3) the error associated with the calibrated difficulty of each item.

Item separation reliability is an indication of the reproducibility of the item difficulty estimates.  High item separation reliability means that there is a high probability that items will retain the same difficulty estimates across examinations.  For the large-sample test, the item separation reliability was 1.00, while for the small-sample test it was .75, indicating that the item difficulty variable is less well defined when a small sample is used to calibrate the items.
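The article does not say how its statistics were computed, but the standard Rasch definition of separation reliability is the ratio of "true" variance (observed variance of the item difficulties minus the mean error variance) to observed variance. A minimal sketch of that formula, with an illustrative function name and made-up example values (not the study's data):

```python
import statistics

def separation_reliability(difficulties, standard_errors):
    """Rasch separation reliability: (observed variance - mean error
    variance) / observed variance of the estimated item difficulties."""
    observed_var = statistics.pvariance(difficulties)
    error_var = statistics.mean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var
```

With small standard errors (like the large sample's .06 logits) the reliability approaches 1.00; with errors near .68 logits it drops toward the .75 reported for the small sample.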

An important factor in item assessment is the discrimination capability of each test item.  Discrimination calculated from small samples can be difficult to interpret because chance occurrences may affect a candidate's response to an item.  For example, an able candidate may get an easy item incorrect by reading it too quickly and missing a better distractor, or by misreading the item.  Alternatively, a less able candidate may get a difficult item correct, perhaps because he or she happens to be familiar with the particular fact the item asks about, or simply makes a lucky guess.  Item discrimination for the small sample ranged from -.44 to .65, with an average of .19; for the large sample it ranged from -.12 to .51, with an average of .22.  Thus, item discrimination for large and small samples differs in range but is fairly similar on average.
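A common way to express item discrimination is the point-biserial correlation between the 0/1 item score and candidates' total test scores; the article does not specify its index, so this is an assumed sketch with an illustrative function name:

```python
import math

def point_biserial(item_responses, total_scores):
    """Correlation between a dichotomous (0/1) item score and each
    candidate's total test score; high values mean the item separates
    able from less able candidates."""
    n = len(item_responses)
    mean_x = sum(item_responses) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_responses, total_scores)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in item_responses) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in total_scores) / n)
    return cov / (sd_x * sd_y)
```

With only a handful of candidates, a single lucky guess or careless slip flips one of the 0/1 entries and can swing this correlation sharply, which is why small-sample discrimination indices span a wider range.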

The biggest difference in the performance of items for large and small samples is the error of measurement.  For the large sample, the average error of measurement for item calibrations was .06 logits, with a range of .06 to .09 logits; for the small sample, the average was .68 logits, with a range of .47 to 1.84 logits.  Items with calibrated difficulties in the center of the scale (p-values of 50%-60%) have lower measurement errors, for both large and small samples, than items with extremely high or low difficulty calibrations (p-values of 5% or 100%).  The small-sample analysis had more items that all candidates answered correctly, and therefore higher measurement errors.
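The inverse relationship between sample size and calibration error follows from the Rasch model itself: the standard error of an item difficulty is approximately one over the square root of the summed binomial information p(1-p) across candidates. A sketch under that assumption (function name and ability values are illustrative, not from the study):

```python
import math

def item_calibration_se(abilities, difficulty):
    """Approximate standard error of a Rasch item difficulty estimate:
    1 / sqrt(sum over candidates of p * (1 - p)), where p is the model
    probability of a correct response."""
    info = 0.0
    for theta in abilities:
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)
```

For a well-targeted item, 1,000 candidates give a standard error near .06 logits while 20 candidates give roughly .45 logits, consistent with the averages reported above; an item far off target (one nearly everyone answers correctly) contributes little information per candidate, so its error is larger still.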

In summary, item difficulty and discrimination are not measured as accurately with a small sample.  However, the items are calibrated with sufficient accuracy to produce an acceptable level of candidate separation reliability (.89 for the large sample, .86 for the small sample), suggesting acceptably accurate measurement of candidate performance regardless of sample size.

Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650
