March 2010

Because candidate sample sizes vary across certification exams, this study explores the impact of small sample sizes on item performance. The results show that although item calibration is less accurate with a small sample, candidate separation reliability can be acceptable regardless of candidate sample size.

Mary E. Lunz, Ph.D.
Executive Director

Comparison of Item Performance with Large and Small Samples
Usually a candidate population of fewer than 50 is considered a small sample for multiple choice exams. Indeed, the concept of the multiple choice exam was developed to test large samples of candidates effectively and efficiently. However, the multiple choice format has become so well accepted that it is now often applied to small as well as large samples. To better understand the impact of candidate sample size on test item performance, a test with a large sample of over 1,000 candidates and a test with a small sample of fewer than 20 candidates were analyzed using the Rasch model. The criteria for reviewing item performance were 1) item separation reliability, 2) item discrimination, and 3) the error associated with the calibrated difficulty of each item.

Item separation reliability is an indication of the reproducibility of the item difficulties. High item separation reliability means that there is a high probability that items will maintain the same difficulty estimates across examinations. For the large sample test, the item separation reliability was 1.00, while for the small sample test it was .75, indicating that the test item variable is less well defined when a small sample is used to calibrate the items.
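
As a rough illustration of how a separation reliability is commonly computed in Rasch analysis, the sketch below takes the observed variance of the calibrations, subtracts the average squared standard error, and divides by the observed variance. The function name, the use of population variance, and the three item difficulties are assumptions made for the example; the article does not state how its software computed the reported values.

```python
from statistics import fmean, pvariance


def separation_reliability(measures, standard_errors):
    """Share of observed variance in the measures that is not measurement error."""
    observed_var = pvariance(measures)  # variance of the logit estimates
    error_var = fmean([se ** 2 for se in standard_errors])  # mean squared standard error
    if observed_var <= 0:
        return 0.0
    # Clamp at zero: error variance can exceed observed variance in tiny samples.
    return max(observed_var - error_var, 0.0) / observed_var


# Hypothetical item difficulties (logits), calibrated with small vs. large errors.
difficulties = [-1.0, 0.0, 1.0]
precise = separation_reliability(difficulties, [0.1, 0.1, 0.1])
noisy = separation_reliability(difficulties, [0.68, 0.68, 0.68])
print(round(precise, 3), round(noisy, 3))
```

With the same spread of difficulties, larger standard errors alone pull the reliability down, which is the pattern the small sample shows.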

An important factor in item assessment is the discrimination capability of each test item. It is sometimes difficult to interpret discrimination calculated from small samples because chance occurrences may affect a candidate's response to an item. One example of a chance occurrence is an able candidate getting an easy item incorrect after reading the item too quickly and overlooking a better answer option, or after misreading the item. Alternatively, a less able candidate may get a difficult item correct, perhaps through familiarity with the particular fact the item asks about, or through a lucky guess. Item discrimination for the small sample ranged from -.44 to .65 with an average of .19. Item discrimination for the large sample ranged from -.12 to .51 with an average of .22. Thus, the pattern of item discrimination for large and small samples differs in range but is fairly similar on average.
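
The sensitivity of discrimination to a single chance response can be sketched with a small example. The code assumes discrimination is expressed as a point-biserial correlation between the scored item (0/1) and the candidates' total scores; the article does not say which discrimination index was used, and the four candidates and their scores are hypothetical.

```python
import math


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    sxx = sum((x - mean_x) ** 2 for x in item_scores)
    syy = sum((y - mean_y) ** 2 for y in total_scores)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when everyone gets the same item score
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores for four candidates, lowest to highest.
totals = [1, 2, 3, 4]
expected_pattern = point_biserial([0, 0, 1, 1], totals)  # the two ablest succeed
one_slip = point_biserial([0, 0, 1, 0], totals)          # the ablest candidate slips
print(round(expected_pattern, 3), round(one_slip, 3))
```

In a sample this small, one unexpected response moves the index substantially; with over 1,000 candidates the same slip would barely register.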

The biggest difference in the performance of items for large and small samples is the error of measurement. For the large sample, the average error of measurement for item calibrations was .06 logits with a range of .06 to .09 logits, while for the small sample, the average error of measurement for item calibrations was .68 logits with a range of .47 to 1.84 logits. Items with calibrated difficulties in the center of the scale (p-values of 50%-60%) have lower measurement errors, for both large and small samples, than items with extremely high or low difficulty calibrations (p-values of 5% or 100%). The small sample analysis had more items that all candidates answered correctly, and therefore higher measurement errors.
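
Why the error depends on both sample size and p-value can be seen from a simplified approximation. Under the Rasch model, the standard error of an item calibration is roughly the reciprocal of the square root of the summed response variances p(1 - p) across candidates. The sketch below assumes, purely for illustration, that every candidate has the same probability of success on the item; the function name and the sample sizes passed to it are hypothetical, and operational software applies its own estimation details.

```python
import math


def approx_item_se(n_candidates, p_correct):
    """Approximate standard error (logits) of a Rasch item calibration.

    Simplifying assumption: every candidate has the same probability
    p_correct of answering the item correctly.
    """
    information = n_candidates * p_correct * (1.0 - p_correct)
    if information <= 0:
        return math.inf  # an item everyone gets right (or wrong) carries no information
    return 1.0 / math.sqrt(information)


for n in (1000, 20):
    for p in (0.5, 0.05):
        print(n, p, round(approx_item_se(n, p), 3))
```

Under these assumptions a centered item calibrated on 1,000 candidates has an error of about .06 logits, while the same item calibrated on 20 candidates has an error of about .45 logits, the same order of magnitude as the figures reported above. Items with p-values near 5% or 100% carry much less information, so their errors grow sharply.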

In summary, item difficulty and discrimination are not measured as accurately with a small sample; however, the items are calibrated with sufficient accuracy to produce an acceptable level of candidate separation reliability (.89 for the large sample and .86 for the small sample), suggesting acceptably accurate measurement of candidate performance regardless of sample size.

Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650
