MEASUREMENT RESEARCH ASSOCIATES
TEST INSIGHTS
May 2008
Greetings!
 

The quest to improve the reliability of certification examinations is ongoing.  The quality of the items is the basis for educational measurement.  We have observed that removing poorly performing items (usually poorly written items) from scoring actually reduces the error of measurement and improves the reliability of the examination.

 
Mary E. Lunz, Ph.D.

Deleting Items Improves Reliability on Multiple Choice Examinations

The purpose of written certification examinations is to identify the candidates who are qualified to practice effectively.  The mechanism for accomplishing this is usually four- or five-part multiple choice items.  The quality of the multiple choice items included in an examination is the basis for the reliability, or accuracy, of the decisions made about candidate performance.  In classical terms, this means each item should have a good p-value (percent correct) and point biserial correlation.  In Rasch terms, it means the item difficulty, as well as the infit and outfit statistics, should be within acceptable limits.  Of course, the items must also reasonably represent the pertinent content areas in the field of practice.  Meeting the criteria for good item performance leads to a lower error of measurement and more accurate outcomes for candidates.  Candidate separation reliability, (Standard Deviation^2 - Standard Error^2) / Standard Deviation^2, estimates the accuracy of the measured differences in candidate performance.
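The separation reliability formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Standard Deviation is taken as the observed standard deviation of the candidate measures, the Standard Error^2 term is taken as the mean of the squared standard errors across candidates, and the function name and the sample measures are hypothetical, not data from any examination discussed here.

```python
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Compute (SD^2 - SE^2) / SD^2 for a set of candidate measures.

    Assumption: SE^2 is the mean squared standard error across candidates.
    """
    observed_var = pvariance(measures)  # SD^2 of the candidate measures
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_var - error_var) / observed_var


# Hypothetical candidate measures (in logits) and their standard errors.
measures = [-1.2, -0.4, 0.0, 0.5, 1.1, 1.8]
standard_errors = [0.30, 0.28, 0.27, 0.28, 0.30, 0.35]

reliability = separation_reliability(measures, standard_errors)
print(f"Candidate separation reliability: {reliability:.2f}")
```

With the spread of measures held fixed, smaller standard errors move the result closer to 1, which is the sense in which lower measurement error means higher reliability.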

 

On items that are good measures, candidates who do well on the total test have the highest probability of answering the item correctly, while candidates who do poorly have the lowest probability of answering the item correctly.  There are many item writing guides that reiterate item writing principles (see Item Development Guidelines at www.MeasurementResearch.com). When multiple choice items are well written, they distinguish between more and less knowledgeable candidates, reduce the error of measurement, and consequently lead to a higher candidate separation reliability.
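As a rough illustration of the classical statistics mentioned above, the sketch below computes a p-value and a point biserial correlation for each item in a small scored response matrix. The data are made up, the function name is hypothetical, and correlating each item with the total score on the remaining items (rather than the full total) is an assumption of this sketch.

```python
from statistics import mean, pstdev


def item_statistics(responses):
    """Return a (p_value, point_biserial) pair for each item.

    responses[c][i] is 1 if candidate c answered item i correctly, else 0.
    The point biserial here is the Pearson correlation between the item
    score and the total score on the remaining items (an assumption).
    """
    n_items = len(responses[0])
    stats = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        p_value = mean(item)  # proportion of candidates answering correctly
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            r_pb = float("nan")  # undefined when there is no variation
        else:
            cov = mean(x * y for x, y in zip(item, rest)) - mean(item) * mean(rest)
            r_pb = cov / (sd_item * sd_rest)
        stats.append((p_value, r_pb))
    return stats


# Hypothetical scored responses: rows are candidates, columns are items.
# The last item is answered correctly mostly by the weaker candidates.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

for number, (p_value, r_pb) in enumerate(item_statistics(responses), start=1):
    print(f"Item {number}: p-value = {p_value:.2f}, point biserial = {r_pb:.2f}")
```

In this made-up data the last item has a negative point biserial, which is the pattern that would flag it as a poorly performing item.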

 

One way to reduce measurement error is to include a sufficient number of items on the examination, at least 100.  The conventional wisdom is that more items decrease the error of measurement and increase reliability.  However, after reviewing the data from many examinations, we have found that it takes more than long tests to improve reliability.  The consistency of item content within sections and within the test is critical for good reliability.  Another issue is the statistical performance of each item on the test.  Whether item performance is measured with classical statistics or with Rasch IRT, items that do not perform well introduce measurement error and subsequently reduce examination reliability.  In fact, we have found that deleting poorly performing items often increases the reliability of the examination, even though the total number of items decreases.  Some examples that confirm the value of deleting poorly performing items are shown in the table below.

 

Reliability of Candidate Separation before and after item deletion

Exam      Number of items    Reliability        Number of items    Reliability
          before deletion    before deletion    after deletion     after deletion

Exam 1    150                0.89               133                0.91
Exam 2    351                0.88               313                0.90
Exam 3    225                0.77               217                0.80
Exam 4    200                0.82               190                0.83
Exam 5    150                0.83               142                0.85
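To put the gains in the table in context, one rough comparison is the classical Spearman-Brown formula, which predicts how reliability changes when a test is shortened by removing items of average quality. The sketch below is not part of the original analysis, and applying Spearman-Brown to candidate separation reliability is an approximation assumed here for illustration only.

```python
def spearman_brown(reliability, old_length, new_length):
    """Predicted reliability after changing test length (Spearman-Brown)."""
    k = new_length / old_length
    return k * reliability / (1 + (k - 1) * reliability)


# Values from the table:
# (items before, reliability before, items after, reliability after)
exams = {
    "Exam 1": (150, 0.89, 133, 0.91),
    "Exam 2": (351, 0.88, 313, 0.90),
    "Exam 3": (225, 0.77, 217, 0.80),
    "Exam 4": (200, 0.82, 190, 0.83),
    "Exam 5": (150, 0.83, 142, 0.85),
}

for name, (n_before, r_before, n_after, r_after) in exams.items():
    predicted = spearman_brown(r_before, n_before, n_after)
    print(f"{name}: predicted if average items were removed = {predicted:.3f}, "
          f"reported after deletion = {r_after:.2f}")
```

Under this approximation, removing average-quality items would lower reliability slightly on every exam, whereas the reported values all rise. That contrast is consistent with the deleted items having contributed more error than information.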

 
Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650
 
