Comparison of Item Performance with Large and Small Samples

MEASUREMENT RESEARCH ASSOCIATES

TEST INSIGHTS

March 2010

Greetings,

Because candidate sample sizes vary across certification exams, this study explores the impact of small sample sizes on item performance. The results show that although item calibration is less accurate with a small sample, candidate separation reliability can be acceptable regardless of candidate sample size.

Mary E. Lunz, Ph.D.
Executive Director

A candidate population of fewer than 50 is usually considered a small sample for multiple choice exams. Indeed, the multiple choice exam was developed to test large samples of candidates effectively and efficiently. However, the format has become so well accepted that it is now often applied to small as well as large samples. To better understand the impact of candidate sample size on test item performance, a test with a large sample of over 1,000 candidates and a test with a small sample of fewer than 20 candidates were analyzed using the Rasch model. The criteria for reviewing item performance were 1) item separation reliability, 2) item discrimination, and 3) the error associated with the calibrated difficulty of each item.

Item separation reliability indicates the reproducibility of the item difficulties. High item separation reliability means that there is a high probability that items will maintain the same difficulty estimates across examinations. For the large sample test, the item separation reliability was 1.00, while for the small sample test it was .75, indicating that the test item variable is less well defined when a small sample is used to calibrate the items.
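To make the statistic concrete, here is a minimal sketch of how separation reliability is commonly computed in Rasch analysis: the share of the observed variance in the measures that is not attributable to measurement error. The function name, the use of the population variance, and the item difficulties and standard errors below are illustrative assumptions, not the data or software settings from this study.

```python
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Estimate separation reliability as true variance / observed variance.

    measures: calibrated measures in logits (item difficulties here).
    standard_errors: the standard error of each measure, in logits.
    Assumption: observed variance is the population variance of the measures.
    """
    observed_var = pvariance(measures)
    # Error variance is the mean of the squared standard errors.
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # "True" variance cannot be negative, so floor it at zero.
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var


# Hypothetical item difficulties (logits); not taken from the study.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]

# Small standard errors, as a large calibration sample would give.
print(round(separation_reliability(difficulties, [0.06] * 5), 2))  # 1.0

# Larger standard errors, as a small calibration sample would give.
print(round(separation_reliability(difficulties, [0.5] * 5), 2))   # 0.75
```

The same spread of item difficulties yields a lower reliability when the standard errors are larger, which is the pattern reported above for the small sample.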

An important factor in item assessment is the discrimination capability of each test item. It is sometimes difficult to interpret discrimination calculated from small samples because chance occurrences may affect a candidate's response to an item. Examples of chance occurrences include an able candidate getting an easy item incorrect due to, say, reading the item too quickly and missing a better distractor, or misreading the item. Alternatively, a less able candidate may get a difficult item correct, perhaps because they happen to be familiar with the particular fact the item asks about, or because they make a lucky guess. Item discrimination for the small sample ranged from -.44 to .65 with an average of .19. Item discrimination for the large sample ranged from -.12 to .51 with an average of .22. Thus, the pattern of item discrimination for large and small samples differs in range but is fairly similar on average.
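The sensitivity of discrimination to a single chance response can be illustrated with a short sketch. It assumes discrimination is measured as a point-biserial correlation between the item response (0 or 1) and the candidate's total score; the article does not state which index was used. The function name and the ten-candidate data set are hypothetical.

```python
import math


def point_biserial(responses, total_scores):
    """Pearson correlation between 0/1 item responses and total scores."""
    n = len(responses)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    mean_r = sum(responses) / n
    mean_s = sum(total_scores) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(responses, total_scores))
    var_r = sum((r - mean_r) ** 2 for r in responses)
    var_s = sum((s - mean_s) ** 2 for s in total_scores)
    if var_r == 0 or var_s == 0:
        # For example, every candidate answered the item correctly.
        raise ValueError("correlation is undefined when there is no variation")
    return cov / math.sqrt(var_r * var_s)


# Hypothetical small sample: ten candidates ordered by total score.
scores = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
expected_pattern = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
# Same data, except the most able candidate misreads the item and misses it.
one_chance_miss = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

print(round(point_biserial(expected_pattern, scores), 2))
print(round(point_biserial(one_chance_miss, scores), 2))
```

With only ten candidates, one unexpected response lowers the index noticeably. In a sample of over 1,000, the same single response would barely move it, which is consistent with the wider range of discrimination values in the small sample.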

The biggest difference in the performance of items for large and small samples is the error of measurement. For the large sample, the average error of measurement for item calibrations was .06 logits with a range of .06 to .09 logits, while for the small sample, the average error of measurement for item calibrations was .68 logits with a range of .47 to 1.84 logits. Items with calibrated difficulties in the center of the scale (p-values of 50%-60%) have lower measurement errors, for both large and small samples, than items with extremely high or low difficulty calibrations (p-values of 5% or 100%). The small sample analysis had more items that all candidates answered correctly and therefore higher measurement errors.
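The dependence of calibration error on sample size and item targeting follows from the usual model-based standard error of a Rasch item difficulty, which is the reciprocal of the square root of the summed response variances p(1 - p) across candidates. The sketch below is illustrative only: it treats candidate abilities as known and places every candidate at 0 logits, which is a simplifying assumption, and the function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_standard_error(abilities, difficulty):
    """Model standard error (logits) of an item difficulty estimate.

    Each candidate contributes information p * (1 - p); the standard
    error is 1 / sqrt(total information).
    """
    information = 0.0
    for ability in abilities:
        p = rasch_probability(ability, difficulty)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


# Well-targeted item (about 50% correct), all candidates at 0 logits.
print(round(item_standard_error([0.0] * 1000, 0.0), 3))  # 0.063
print(round(item_standard_error([0.0] * 20, 0.0), 3))    # 0.447

# Very easy item (about 95% correct) with the same 20 candidates.
print(round(item_standard_error([0.0] * 20, -3.0), 3))
```

Under these assumptions, the well-targeted item has a standard error of about .06 logits with 1,000 candidates and about .45 logits with 20, close to the lower ends of the ranges reported above. The very easy item has a considerably larger error with the same 20 candidates, because responses that are nearly all correct carry little information about the item's difficulty.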

In summary, item difficulty and discrimination are not as accurately measured using a small sample; however, the items are calibrated with sufficient accuracy to produce an acceptable level of candidate separation reliability (.89 for the large sample and .86 for the small sample), suggesting acceptably accurate measurement of candidate performance regardless of sample size.

Measurement Research Associates, Inc.

505 North Lake Shore Dr., Suite 1304

Chicago, IL 60611

Phone: (312) 822-9648 Fax: (312) 822-9650

www.MeasurementResearch.com
