Consequences of Flawed Items

MEASUREMENT RESEARCH ASSOCIATES

TEST INSIGHTS

September 2009

Greetings

Item writers often find it difficult to write multiple choice items that comply with good item-writing guidelines. This study shows that the extra effort spent writing good items is worthwhile.

Ross Brown
Manager, Test Development and Analysis

Many guidelines for writing good multiple choice items are intended to reduce the measurement error that results when candidates who may know the material being tested answer an item incorrectly because of how the item is constructed. Two examples of item flaws that may introduce such measurement error are multiple true/false items and items with negative stems.

Multiple true/false items violate the principle that an item should focus on a single idea or issue. Multiple true/false items usually consist of a minimal stem and options that are conceptually unrelated. Candidates are required to assess each option independently and determine whether it is true or false. For example:

                        The common cold:
                        A. is transmitted through saliva only.
                        B. is evident in a chest X-ray.
                        C. will most often clear up after two days.
                        D. is treatable with Tamiflu.

Items with negative stems require candidates to select the one option that does NOT meet the conditions described in the stem. Candidates may answer these items incorrectly because they skim over and miss the negative word in the stem, and mistakenly choose a response that does meet the conditions in the stem. In addition, these items do not assess what the candidate actually knows, but rather whether the candidate can identify an incorrect response to the issue presented in the stem. For example, a candidate can answer the question below without knowing the color of a pomegranate.

                         Which of the following is NOT red?
                         A.     apples
                         B.     pomegranates
                         C.     pears
                         D.     tomatoes

This study looked at the consequences of using items with these flaws in terms of 1) item difficulty and 2) candidate outcomes. This study is patterned after a study of items administered to medical school students by Downing (2005). The analysis was conducted on a group of 138 items, of which 69 were flawed items and 69 were unflawed items. The item flaws were multiple true/false and negative items.

An item's p-value is the proportion of candidates who answered the item correctly. The table below shows that the average p-value for the flawed items (.61) was lower than for the unflawed items (.69) and for all items combined (.65), indicating that the flawed items were more difficult for candidates to answer correctly.

	69 Flawed Items	69 Unflawed Items	138 Total Items
P-value	.61	.69	.65
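
The calculation behind a table like this can be sketched in a few lines of Python. This is only an illustration, not the analysis code used in the study: the function names and the tiny response matrix are made up, and responses are assumed to be scored 1 (correct) or 0 (incorrect). Note that when the two item sets are the same size, as in this study, the average p-value for all items is simply the midpoint of the two subset averages.

```python
def item_p_values(responses):
    """Return the proportion of candidates answering each item correctly.

    responses: list of candidate rows, each a list of 0/1 item scores.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_candidates
            for i in range(n_items)]


def mean_p_value(p_values, item_indices):
    """Average the p-values over a chosen subset of items."""
    return sum(p_values[i] for i in item_indices) / len(item_indices)


# Made-up data (not from the study): 4 candidates x 4 items.
# Items 0-1 are treated as "flawed", items 2-3 as "unflawed".
toy_responses = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

toy_p = item_p_values(toy_responses)
flawed_mean = mean_p_value(toy_p, [0, 1])
unflawed_mean = mean_p_value(toy_p, [2, 3])
total_mean = mean_p_value(toy_p, [0, 1, 2, 3])

print(toy_p, flawed_mean, unflawed_mean, total_mean)
```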

For purposes of this study, the passing standard was set arbitrarily at a score of 65% correct. Candidate outcomes were then determined based on the total items, the flawed items only, and the unflawed items only. Only 37% of the candidates passed when the flawed items were used, compared to 71% when the unflawed items were used and 52% when all items were used.
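
A pass rate comparison of this kind could be computed along the following lines. This is a minimal sketch with made-up data, not the study's procedure: the function name is hypothetical, responses are assumed to be scored 0/1, and a candidate is assumed to pass when the proportion correct on the chosen item set is at or above the cut score.

```python
def pass_rate(responses, item_indices, cut=0.65):
    """Return the proportion of candidates whose score on the chosen items
    is at or above the cut (assumption: a score equal to the cut passes)."""
    passed = 0
    for row in responses:
        proportion_correct = sum(row[i] for i in item_indices) / len(item_indices)
        if proportion_correct >= cut:
            passed += 1
    return passed / len(responses)


# Made-up data (not from the study): 5 candidates x 6 items.
# Items 0-2 are treated as "flawed", items 3-5 as "unflawed".
toy_scores = [
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
]
flawed_items = [0, 1, 2]
unflawed_items = [3, 4, 5]
all_items = flawed_items + unflawed_items

flawed_rate = pass_rate(toy_scores, flawed_items)
unflawed_rate = pass_rate(toy_scores, unflawed_items)
total_rate = pass_rate(toy_scores, all_items)

print(flawed_rate, unflawed_rate, total_rate)
```

The same candidates are scored three times against the same cut, so any difference in pass rates comes from the items alone, which is the comparison the study makes.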

Although this study is a simulation based on real data, it confirms the impact of flawed items found by Downing (2005). It also provides concrete evidence that supports eliminating multiple true/false items and items with negative stems from examinations.

Reference
Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10, 133-143.

Measurement Research Associates, Inc.

505 North Lake Shore Dr., Suite 1304

Chicago, IL 60611

Phone: (312) 822-9648 Fax: (312) 822-9650

www.MeasurementResearch.com
