Advances in Item Design

Multiple-choice items continue to dominate educational testing because they are an effective and relatively easy way to measure constructs such as ability and achievement. The professional attention given to analyzing responses to multiple-choice items is considerable and has led to advances in item response theory. In contrast, multiple-choice item writing has little scientific basis (Haladyna & Downing 1988a,b). Most item-writing knowledge is based on personal experience or on wisdom passed on by particular mentors. A paradox exists: we place more emphasis on analyzing responses than on how we obtain them. This article works toward restoring the balance by focusing on advances in item writing.

Theories of Item Writing:
Item-writing theories such as those of Bormuth, Guttman (facet theory), Hively, Tiemann & Markle, and Williams & Haladyna offer several benefits. First, they define content operationally in terms of the types and extent of cognitive behavior to be tested. Second, they reduce the idiosyncrasies of individual item writers through standardized rules. Third, they facilitate generating items that cover the entire relevant content domain. Fourth, they provide evidence of construct and content validity. However, their item-writing rules are laborious to apply, which limits their usefulness. Nevertheless, they offer hope that item writing will overcome the problems that limit test development today.

Old Item Formats:
* Multiple-choice (MCQ). A stem, in question or partial-sentence form, and four or five options. There is some support for preferring three options.

  For what is San Diego best known?
  A. Outstanding restaurants and fine dining
  B. Mild climate
  C. Major league baseball and football teams
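The case for fewer options rests partly on how many distractors actually attract examinees. A minimal sketch of such a distractor count follows, assuming a distractor "works" when at least 5% of examinees choose it. The cut-off, the helper name `functional_distractors`, and the response data are illustrative assumptions, not taken from the article.

```python
from collections import Counter


def functional_distractors(responses, key, options, threshold=0.05):
    """List the distractors chosen by at least `threshold` of examinees.

    The 5% default is an assumed cut-off for a "working" distractor.
    """
    if not responses:
        return []
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n >= threshold]


# Hypothetical responses of 100 examinees to a four-option item keyed "B".
responses = ["B"] * 70 + ["A"] * 25 + ["C"] * 4 + ["D"] * 1
print(functional_distractors(responses, key="B", options=["A", "B", "C", "D"]))  # prints ['A']
```

In this made-up case only one of the three distractors works, so dropping the weakest distractor to leave a three-option item would lose little.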

* Matching. A modified MCQ in which a single set of options precedes the stems, thus focusing testing on one set of concepts.

  Match the city with a distinguishing feature.
  A. San Diego
  B. San Francisco
  C. Seattle
  D. Los Angeles

  1. Proximity to islands and protected ocean water
  2. Entertainment and tourist attractions
  3. Cool climate and unpolluted air

* True-false. This format has gained a negative reputation because of substantial evidence against its use. However, some revised versions of it seem worthwhile (see Alternate-choice and Multiple true-false below).

New Item Formats:
Here are some new multiple-choice formats, with my recommendations based on a survey of the research.

* Alternate-choice. A two-option MCQ. Its limitation is a 50% probability of guessing the right answer; offsetting this is efficiency, since one can administer many more alternate-choice items than conventional MCQs in a fixed time. Lord argues that two-option testing is ideal for high-achieving examinees, while four- or five-option MCQs work best with low achievers. When Steve Downing and I analyzed distractor use for three standardized tests, we found that most items contained only one or two working distractors, suggesting that the extra options in conventional items often add little. Recommended.

  What is more popular in downtown San Diego?
  A. Horton Plaza
  B. Gaslite District
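The trade-off between the 50% guessing rate and efficiency can be weighed with the binomial distribution. The sketch below assumes pure blind guessing (independent, equally likely choices) and a hypothetical 70% cut score; the helper `p_at_least` and the test lengths are illustrative, not from the article.

```python
from math import comb


def p_at_least(n_items, n_options, min_correct):
    """Chance that blind guessing answers at least `min_correct` of `n_items` correctly.

    Assumes every answer is an independent guess among `n_options` equally likely options.
    """
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical 70% cut score on tests of different length and option count.
for n_items, n_options, cut in [(20, 2, 14), (40, 2, 28), (20, 4, 14)]:
    print(f"{n_items} items, {n_options} options: P(score >= {cut}) = "
          f"{p_at_least(n_items, n_options, cut):.4f}")
```

With 20 two-option items, a pure guesser reaches 70% with probability of about 0.058; if the time saved allows twice as many two-option items, that chance falls further, which is the sense in which efficiency offsets guessing.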

* Complex multiple-choice (Type K). Numbered primary answers are combined into sets, and these sets form the multiple-choice options. These items are usually more difficult, less discriminating, and require more development and administration time than MCQs. The National Board of Medical Examiners has discontinued use of this format. Not recommended.

  What best represents San Diego's attractiveness?
  1. Climate and location
  2. Beaches and water sports
  3. Tourist attractions (e.g. Sea World)

  A. 1 & 2
  B. 2 & 3
  C. 1 & 3
  D. 1, 2, & 3
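How Type K options are assembled can be made explicit by enumerating combinations of the primary answers. This sketch assumes, as in the San Diego example, that each option is a combination of two or more primary answers; `type_k_options` is a hypothetical helper.

```python
from itertools import combinations
from string import ascii_uppercase


def type_k_options(n_primary, min_size=2):
    """Every combination of at least `min_size` numbered primary answers."""
    numbers = range(1, n_primary + 1)
    options = []
    for size in range(min_size, n_primary + 1):
        options.extend(combinations(numbers, size))
    return options


# Print the lettered options for three primary answers.
for letter, combo in zip(ascii_uppercase, type_k_options(3)):
    print(f"{letter}. " + ", ".join(str(n) for n in combo))
```

For n primary answers there are 2^n - n - 1 such combinations: 4 for three answers, but already 11 for four, so the writer must also decide which subset of combinations to offer.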

* Multiple true-false. Like an MCQ, but the examinee judges whether each option is true or false. Each option is numbered because each is a true-false item, while the stem is not numbered because it is the stimulus. The main obstacle to this format is examinees' lack of familiarity with it. Another problem may be the tendency for the response to one option to influence the response to another. This format is efficient, and any MCQ can be presented this way. Recommended.

  What are major attractions in San Diego?
  1. Sailing and water sports
  2. Restaurants
  3. Shopping
  4. Tourist attractions (e.g. Sea World)
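Because any MCQ can be presented as a multiple true-false item, the conversion and scoring can be sketched in a few lines. This assumes number-right scoring with one point per option, which the article does not specify; the class and function names are hypothetical, and the example item is invented because the San Diego item above has no stated key.

```python
from dataclasses import dataclass


@dataclass
class MultipleTrueFalse:
    stem: str
    options: list[str]  # each option is scored as its own true-false item
    key: list[bool]     # True where the option correctly answers the stem

    def score(self, responses: list[bool]) -> int:
        """Assumed number-right score: one point per option judged correctly."""
        if len(responses) != len(self.key):
            raise ValueError("need one true/false response per option")
        return sum(r == k for r, k in zip(responses, self.key))


def from_mcq(stem, options, correct_index):
    """Recast a one-answer MCQ: the keyed option is true, every distractor false."""
    key = [i == correct_index for i in range(len(options))]
    return MultipleTrueFalse(stem, list(options), key)


# Hypothetical MCQ whose key is the option at index 1 ("7").
item = from_mcq("Which number is prime?", ["4", "7", "9"], correct_index=1)
print(item.score([False, True, True]))  # prints 2
```

Scoring each option separately is what makes the format efficient: one stem yields as many scored responses as it has options.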

* Context-dependent item set (Testlet). Stimulus material followed by 5-12 related test items, in any item format. There are four types of context-dependent item sets: (1) pictorial, based on pictures, maps, drawings, graphs, data, photographs, or art; (2) interlinear, a passage with denotations that provide a basis for questioning, usually for detecting grammatical, spelling, punctuation, and capitalization errors; (3) interpretive, for reading comprehension; and (4) problem-solving. The item set is inefficient to construct and administer, but it is versatile and can measure higher-level thinking. It is used extensively in certification and licensing testing programs. Recommended.

  You are planning a one-week vacation to a West Coast city with a mild summer climate.

  1. What is a reasonable estimate of daily food and lodging costs for two?
     A. $50
     B. $100
     C. $200

  2. What is a reasonable estimate of the minimum weekly rental rate of a sub-compact car?
     A. $120
     B. $150
     C. $175

* Item shell. A successfully performing item from which the content has been removed, leaving only its syntactic structure. For example, a mathematics teacher wants to test problem solving in the context of financing the purchase of an automobile. The original, successful question states:

  What is the annual interest charge on an auto loan of $10,000 at 8.5%?

The teacher strips out the loan amount and percentage rate, making an item shell, and replaces them with new values to generate similar items. This example may appear facile, but sophisticated item shells for medical problem solving and pharmacy have been developed that tap aspects of higher level thinking. The ease of constructing the item shell counteracts "writer's block" and makes it appealing. Recommended.
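The shell-and-substitute procedure can be sketched as a string template. This sketch assumes the key is simple interest on the full principal for one year (the article does not say how interest is computed); `SHELL`, `make_item`, and the substituted values are hypothetical. Decimal arithmetic keeps the dollar amounts exact.

```python
from decimal import Decimal

# Hypothetical shell: the loan amount and rate of the original item become placeholders.
SHELL = "What is the annual interest charge on an auto loan of ${principal:,} at {rate}%?"


def make_item(principal, rate):
    """Fill the shell and compute the key.

    Assumes simple interest on the full principal for one year.
    `rate` is a string such as "8.5" so that Decimal keeps it exact.
    """
    stem = SHELL.format(principal=principal, rate=rate)
    key = (Decimal(principal) * Decimal(rate) / 100).quantize(Decimal("0.01"))
    return stem, key


# Generate the original item plus two hypothetical variants.
for principal, rate in [(10000, "8.5"), (12500, "6.0"), (8000, "7.25")]:
    stem, key = make_item(principal, rate)
    print(stem, "->", f"${key}")
```

The sketch produces only stems and keys; each variant would still need its own distractors.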

References:
Haladyna TM, Downing SM (1988a) A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 37-50.
Haladyna TM, Downing SM (1988b) The validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 1, 51-78.

Advances in Item Design, T Haladyna … Rasch Measurement Transactions, 1990, 4:2 p. 103-104
