Multiple-choice items continue to dominate educational testing because they are an effective and relatively easy way to measure constructs such as ability and achievement. Considerable professional attention has been given to analyzing responses to multiple-choice test items, and this attention has led to advances in item response theories. In contrast, there is little scientific basis for multiple-choice item writing (Haladyna & Downing 1988a,b). Most item-writing knowledge is based on personal experience or wisdom passed on from particular mentors. A paradox exists: we place more emphasis on analyzing responses than on how we obtain them. This article works toward restoring the balance by focusing on advances in item writing.
Theories of Item Writing:
There are many benefits in item-writing theories such as those of Bormuth, Guttman (facet theory), Hively, Tiemann & Markle, and Williams & Haladyna. First, they emphasize the operational definition of content in terms of the types and extent of cognitive behavior to be tested. Second, they reduce the idiosyncrasies of item writers through standardized rules. Third, they facilitate the generation of items to cover the entire relevant content domain. Fourth, they provide evidence of construct and content validity. On the other hand, the laborious nature of their item-writing rules limits their usefulness. Nevertheless, they offer the hope that item writing will surmount the problems that limit test development today.
Old Item Formats:
* Multiple-choice (MCQ). A stem, in question or partial-sentence form, and four or five options. There is some support for preferring three options.
For what is San Diego best known?
A. Outstanding restaurants and fine dining
B. Mild climate
C. Major league baseball and football teams
* Matching. A modified MCQ in which a single set of options precedes the stems, thus focusing testing on one set of concepts.
Match the city with a distinguishing feature.
A. San Diego
B. San Francisco
C. Seattle
D. Los Angeles
1. Proximity to islands and protected ocean water
2. Entertainment and tourist attractions
3. Cool climate and unpolluted air
* True-false. This format has gained a negative reputation because of substantial evidence against its use. But there are revisions which seem worthwhile (see Alternate-choice and Multiple true-false below).
New Item Formats:
Here are some new multiple-choice formats, with my recommendations based on a survey of the research.
* Alternate-choice. A two-option MCQ, limited by the 50% probability of guessing the right answer. Offsetting this is efficiency: one can administer many more alternate-choice items than conventional MCQs in a fixed time. Lord argues that two-option testing is ideal for high-achieving examinees, while four- or five-option MCQs work best with low achievers. When Steve Downing and I analyzed distractor use for three standardized tests, we found that most items contained only one or two working distractors. Recommended.
Which is more popular in downtown San Diego?
A. Horton Plaza
B. Gaslite District
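The guessing trade-off described above can be made concrete with a small calculation. A minimal sketch (the function name and the illustrative item counts are my own, not from the article):

```python
def expected_chance_score(n_items: int, n_options: int) -> float:
    """Expected number of items answered correctly by blind guessing,
    assuming one correct answer per item and random choice among options."""
    return n_items / n_options

# A 100-item alternate-choice test vs. a 100-item four-option MCQ test:
print(expected_chance_score(100, 2))  # 50.0 correct by chance
print(expected_chance_score(100, 4))  # 25.0 correct by chance
```

Efficiency offsets the higher chance score: if more alternate-choice items fit into the same testing time (the article's point, though it gives no exact ratio), the larger item count can compensate for the easier guessing.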
* Complex multiple-choice (Type K). Sets of answers are combined to form the multiple-choice options. These items are usually more difficult and less discriminating, and they require more development and administration time than MCQs. The National Board of Medical Examiners has discontinued the use of this format. Not recommended.
What best represents San Diego's attractiveness?
1. Climate and location
2. Beaches and water sports
3. Tourist attractions (e.g. Sea World)
A. 1 & 2
B. 2 & 3
C. 1 & 3
D. 1, 2, & 3
* Multiple true-false. Like an MCQ, but the examinee evaluates the truthfulness of each option. Each option is numbered because each is a true-false item, while the stem is not numbered because it is the stimulus. The obstacle to this format is lack of familiarity. Another problem may be the tendency for one option to influence the response to another. This format is efficient, and any MCQ can be presented this way. Recommended.
What are major attractions in San Diego?
1. Sailing and water sports
2. Restaurants
3. Shopping
4. Tourist attractions (e.g. Sea World)
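Scoring follows directly from the description: each option is an independent true-false item. A minimal sketch, with a hypothetical answer key (the key and response values below are invented for illustration):

```python
def score_mtf(key, responses):
    """Score a multiple true-false item set: one point per option whose
    true/false response matches the key."""
    return sum(k == r for k, r in zip(key, responses))

# Hypothetical key for a four-option multiple true-false item:
key = [True, True, True, True]
responses = [True, False, True, True]
print(score_mtf(key, responses))  # 3
```

Because every option is scored, a single stem with four options yields four scorable responses, which is the efficiency the format offers.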
* Context-dependent item set (Testlet). Stimulus material plus 5-12 related test items; any item format may be used. There are four types of context-dependent item sets: (1) pictorial - pictures, maps, drawings, graphs, data, photographs, art; (2) interlinear - a passage with denotations that provide a basis for questioning, usually for detecting grammatical, spelling, punctuation, and capitalization errors; (3) interpretive - for reading comprehension; and (4) problem-solving. The item set is inefficient to construct and administer, but it is versatile and able to measure higher-level thinking. It is used extensively in certification and licensing testing programs. Recommended.
You are planning a one-week vacation to a West Coast city with a mild summer climate.
1. What is a reasonable estimate of daily food and lodging costs for two?
A. $50
B. $100
C. $200
2. What is a reasonable estimate of the minimum weekly rental rate of a sub-compact car?
A. $120
B. $150
C. $175
* Item shell. A successfully performing item from which the content has been removed, leaving the syntactic structure. A mathematics teacher wants to test problem solving in the context of financing the purchase of an automobile. The original successful question states:
What is the annual interest charge on an auto loan of $10,000 at 8.5%?
The teacher strips out the loan amount and percentage rate, making an item shell, and replaces them with new values to generate similar items. This example may appear facile, but sophisticated item shells for medical problem solving and pharmacy have been developed that tap aspects of higher level thinking. The ease of constructing the item shell counteracts "writer's block" and makes it appealing. Recommended.
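Shell-based generation is mechanical enough to automate. A minimal sketch in Python (the shell string and function name are my own; the article describes the idea, not this code):

```python
# The item shell: syntactic structure kept, content values parameterized.
SHELL = "What is the annual interest charge on an auto loan of ${amount:,} at {rate}%?"

def generate_item(amount: int, rate: float):
    """Instantiate the shell with new values and compute the answer key."""
    stem = SHELL.format(amount=amount, rate=rate)
    answer = round(amount * rate / 100, 2)  # simple annual interest
    return stem, answer

stem, answer = generate_item(10000, 8.5)
print(stem)    # What is the annual interest charge on an auto loan of $10,000 at 8.5%?
print(answer)  # 850.0

# New values yield parallel items from the same shell:
print(generate_item(15000, 7.25))
```

Each substitution produces a structurally parallel item with its own key, which is what makes the shell useful against "writer's block."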
References:
Haladyna TM, Downing SM (1988a) A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 37-50.
Haladyna TM, Downing SM (1988b) The validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 51-78.
Advances in Item Design, T Haladyna Rasch Measurement Transactions, 1990, 4:2 p. 103-104