September 2009

Item writers often find it difficult to write multiple-choice items that comply with good item-writing guidelines. This study shows that the extra effort spent writing good items is worthwhile.

Ross Brown
Manager, Test Development and Analysis

Consequences of Flawed Items
Many guidelines for writing good multiple-choice items are intended to reduce the measurement error that results when candidates who may know the information being tested get an item wrong because of how the item is constructed. Two examples of item flaws that may introduce such measurement error are multiple true/false items and items with negative stems.

Multiple true/false items violate the principle that an item should focus on a single idea or issue. They usually consist of a minimal stem and options that are conceptually unrelated. Candidates must assess each option independently and determine whether it is true or false. For example:

                        The common cold:
                        A.  is transmitted through saliva only.
                        B.  is evident in a chest X-ray.
                        C.  will most often clear up after two days.
                        D.  is treatable with Tamiflu.

Items with negative stems require candidates to select the one option that does NOT satisfy the conditions described in the stem. Candidates may answer these items incorrectly because they skim over the negative word in the stem and mistakenly choose a response that does meet the conditions. In addition, these items do not assess what the candidate actually knows, but rather whether the candidate can identify an incorrect response to the issue presented in the stem. For example, a candidate can answer the question below without knowing the color of a pomegranate.

                         Which of the following is NOT red?
                         A.     apples
                         B.     pomegranates
                         C.     pears
                         D.     tomatoes

This study looked at the consequences of using items with these flaws in terms of 1) item difficulty and 2) candidate outcomes. It is patterned after a study of items administered to medical school students by Downing (2005). The analysis was conducted on a group of 138 items, of which 69 were flawed and 69 were unflawed. The flaws examined were the multiple true/false format and negative stems.

An item's p-value is the percentage of candidates who answered the item correctly. The table below shows that the average p-value for the flawed items was lower than for the unflawed items and the total items, indicating that the flawed items were more difficult for candidates to answer correctly.


[Table: average p-value for Flawed Items, Unflawed Items, and Total Items]
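
As a rough illustration of the statistic defined above, the sketch below computes item p-values from a small matrix of scored responses (1 = correct, 0 = incorrect) and averages them by item group. The response data, the function names, and the assignment of items to the flawed and unflawed groups are all hypothetical; none of it comes from the study. The p-values are expressed as proportions; multiply by 100 for percentages.

```python
from statistics import mean


def item_p_values(scored):
    """Return, for each item, the proportion of candidates who answered it correctly.

    `scored` is a list of candidate rows; each row holds 1 (correct)
    or 0 (incorrect) for every item, in the same item order.
    """
    n_candidates = len(scored)
    n_items = len(scored[0])
    return [sum(row[i] for row in scored) / n_candidates for i in range(n_items)]


def mean_p_value(p_values, item_indices):
    """Average the p-values of the items at the given positions."""
    return mean(p_values[i] for i in item_indices)


# Hypothetical scored responses: 4 candidates (rows) by 4 items (columns).
scored = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
flawed = [0, 1]      # assumed positions of the flawed items
unflawed = [2, 3]    # assumed positions of the unflawed items

p = item_p_values(scored)
print("Flawed items:  ", mean_p_value(p, flawed))
print("Unflawed items:", mean_p_value(p, unflawed))
print("Total items:   ", mean_p_value(p, flawed + unflawed))
```

A lower average p-value for a group means that fewer candidates answered those items correctly, that is, the group was harder.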




For purposes of this study, the passing standard was set arbitrarily at a score of 65% correct. Candidate outcomes were then determined based on the total items, the flawed items only, and the unflawed items only. Only 37% of the candidates passed when the flawed items were used, compared to 71% when the unflawed items were used and 52% based on the total items.
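
The comparison of candidate outcomes can be sketched along the following lines. This is only an illustration under stated assumptions: the scored responses, the item groupings, and the function names are hypothetical, and a candidate is assumed to pass when the percent-correct score is at or above the 65% standard (the source does not say whether a score of exactly 65% passes).

```python
def percent_correct(row, item_indices):
    """Percent of the selected items that one candidate answered correctly."""
    return 100 * sum(row[i] for i in item_indices) / len(item_indices)


def pass_rate(scored, item_indices, cut=65.0):
    """Percent of candidates whose score on the selected items meets the cut.

    Assumption: a score equal to the cut counts as a pass.
    """
    passed = sum(1 for row in scored if percent_correct(row, item_indices) >= cut)
    return 100 * passed / len(scored)


# Hypothetical scored responses: 4 candidates (rows) by 6 items (columns).
scored = [
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
flawed = [0, 1, 2]     # assumed positions of the flawed items
unflawed = [3, 4, 5]   # assumed positions of the unflawed items
total = flawed + unflawed

for label, items in (("Flawed", flawed), ("Unflawed", unflawed), ("Total", total)):
    print(f"{label} items: {pass_rate(scored, items):.0f}% pass")
```

Scoring the same candidates against each item set separately shows how much of the pass/fail decision depends on which items are counted, rather than on the candidates themselves.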

Although this study is simulated from real data, it confirms the impact of flawed items found by Downing (2005). It also provides concrete evidence that supports eliminating multiple true/false items and items with negative stems from examinations.

Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10, 133-143.
Measurement Research Associates, Inc.
