RMT 21:2 Notes and Quotes

The Difficulty of an MCQ Item

"We shall define the difficulty of a multiple choice test item as being a function of that proportion of individuals answering the item which knows which of the alternatives is the best answer. This definition involves the assumption that there is some objective criterion which determines that one particular alternative is a better answer to the item than any of the others."

Paul Horst, "The Difficulty of a Multiple Choice Test Item," Journal of Educational Psychology, xxiv (1933), 229-232.


Effect of Misfit on Measures

Question: I have a fairly large sample of 5,000 subjects. As an experiment I ran the calibration with all subjects, and then again with the 500 worst-fitting subjects (OUTFIT mean-square range from 2 to 9.9) excluded. There was some change in parameter estimates and item fit, but it was not huge and not what I expected. This is comforting, but has this been the experience of others, or is it probably a quirk of my data or the large sample size?

Answer: Yes, your experience with trimming misfitting persons is typical. You are removing the most unpredictable, the noisiest part of the data, so the remaining data must have a slightly more orderly, closer-to-Guttman pattern. So expect to see a slight increase in the logit range of the measure estimates when you trim the data. But it is unusual for this slightly wider spread of the measures to have any substantive implications except where subject measures are adjacent to pre-set cut-points.
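The trimming step described in the question can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes a dichotomous Rasch model, it takes the person measures and item difficulties as already estimated, and the function names (rasch_prob, person_outfit, trim_misfitting) are hypothetical, not taken from any Rasch software. The cutoff of 2.0 matches the lower end of the OUTFIT range mentioned in the question. In practice the calibration would then be rerun on the persons that remain.

```python
import math

def rasch_prob(theta, b):
    """Probability of success for person measure theta on item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_outfit(responses, theta, difficulties):
    """OUTFIT mean-square: the average squared standardized residual for one person."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

def trim_misfitting(data, thetas, difficulties, cutoff=2.0):
    """Return the indices of persons whose OUTFIT mean-square is below the cutoff."""
    kept = []
    for idx, (row, theta) in enumerate(zip(data, thetas)):
        if person_outfit(row, theta, difficulties) < cutoff:
            kept.append(idx)
    return kept

# Illustrative data (made up): five items, two persons with the same measure.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
data = [
    [1, 1, 1, 0, 0],  # orderly, Guttman-like pattern
    [0, 0, 1, 1, 1],  # reversed, highly unpredictable pattern
]
thetas = [0.0, 0.0]

for row, theta in zip(data, thetas):
    print(row, round(person_outfit(row, theta, difficulties), 2))
print("kept:", trim_misfitting(data, thetas, difficulties))
```

The orderly pattern yields a mean-square well below 1 and the reversed pattern one well above 2, so only the first person survives the trim. Removing rows like the second one is what leaves the remaining data closer to a Guttman pattern.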


Fit Statistics: Size or Significance?

Question: Which one is most relevant to decide if an item is misfitting, the size of the mean-square statistic or its statistical significance?

Answer: When considering measurement dilemmas, it is always helpful to think of the equivalent situation in physical measurement. The statistical significance reports how certain we are that the measurement misrepresents the data - but not how serious the misrepresentation is. The mean-square reports the size of the misrepresentation, but not how certain we are that this isn't merely reflecting the random component in the data predicted by the Rasch model.

In physical measurement, we are usually more concerned about the size of any possible misrepresentation ("measure twice, cut once") than about how certain we are that there is a misrepresentation ("I'm sure I measured it right, so there's no need to measure it again!"). If size of misrepresentation is more important than certainty, then the size of the mean-square is more crucial than its significance. But much of statistics is based on hypothesis testing, where only the probability of misrepresentation is seriously considered.
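The distinction between size and significance can be made concrete with a small sketch. It assumes, as a simplification, that a mean-square behaves like a chi-square statistic divided by its degrees of freedom, and it converts the mean-square to an approximate unit-normal deviate with the Wilson-Hilferty cube-root transformation. The function name wilson_hilferty_z and the choice of degrees of freedom are illustrative assumptions; Rasch software typically computes the variance of the mean-square from the data rather than from a nominal df.

```python
import math

def wilson_hilferty_z(mean_square, df):
    """Approximate standardized value of a mean-square (chi-square / df).

    Wilson-Hilferty: the cube root of chi-square/df is roughly normal
    with mean 1 - 2/(9*df) and variance 2/(9*df).
    """
    var = 2.0 / (9.0 * df)
    return (mean_square ** (1.0 / 3.0) - (1.0 - var)) / math.sqrt(var)

# The same size of misfit (mean-square = 1.3) at different amounts of data.
for df in (10, 50, 200, 1000):
    print(df, round(wilson_hilferty_z(1.3, df), 2))
```

Under these assumptions the same mean-square of 1.3 is not significant at the conventional 1.96 level with 10 degrees of freedom, yet is highly significant with 1,000. The size of the misrepresentation is identical in every case; only our certainty about it changes.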


RMT 21:2 Notes and Quotes … Rasch Measurement Transactions, 2007, 21:2


The URL of this page is www.rasch.org/rmt/rmt212d.htm

Website: www.rasch.org/rmt/contents.htm