"The Rasch model became popular in the United Kingdom in the 1970s but has now fallen into disrepute as a result of its premature and rather ill-considered application on a wide scale. [Georg] Rasch was able to show that if it is assumed that any guessing factor is held constant (as is the case if all items are of the same type), and if only items with equal discrimination are accepted, then some of the restrictions of classical psychometrics are loosened. In particular the model seemed to be able to generate, on computer, a single statistic [the difficulty] for each item which enabled that item to be used in a wide variety of different situations, regardless of which other items were included in the test and of the particular respondents involved. This approach showed considerable promise in the calibration of item banks. ..." (Rust & Golombok, p. 58)
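For reference, the dichotomous Rasch model expresses the probability that person n succeeds on item i solely in terms of the person's ability and the item's difficulty:

```latex
P(X_{ni} = 1) = \frac{e^{\,\beta_n - \delta_i}}{1 + e^{\,\beta_n - \delta_i}}
```

Because no item-specific discrimination or guessing parameter appears, the item difficulty is the "single statistic" for each item referred to in the quotation above.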
"In the 1970s it proved particularly attractive in the United Kingdom to a group set up by the Department of Education and Science to monitor academic standards, which became known as the Assessment of Performance Unit (APU). It seemed that if an item bank of appropriate items for measuring, say, mathematics achievement at age 11, was available, then by taking random sets of items from the bank for different groups of 11-year-olds, it would be possible to compare schools, teaching methods and even, with the lapse of years between testing, the rise or fall of standards. However, this was not to be. It soon became apparent that there were several serious flaws within the model." (R&G p. 59)
"First, it was noticed that a set of items identified as suitable for the finished test seemed to be more or less the same whether classical item analysis or the Rasch technique was used in selection. This generated a paradox. If the items in a classical test and in the Rasch scaled test were the same, how could it be that Rasch's claim that the test was item free and respondent free was justified in the one case, but not in the other?"
Misunderstanding: It is not the items that are test-free; it is the measures!
"The success of the Rasch model depended on selecting a special set of items, those with equal discrimination, while this was not a requisite of classical item analysis, which merely required good discrimination. In particular, an item which correlated very highly with the total test score should be accepted by the classical method and not by the Rasch method."
Misunderstanding: There is never a set of items of "equal discrimination". There must be a set of items of similar discrimination in order for a construct to be stable.
"It was particularly important for the claim of 'respondent freeness' that the items should have the same discrimination as each other, as it was on this assumption that properties of the item for a group at one level of ability were extrapolated to a group at another level."
Misunderstanding: A noticeably unequal discrimination manifests itself as a relative change in item difficulty across ability levels. Such an item is routinely eliminated at the pilot-testing stage. In fact, items in the British bank were given only to children at or near their intended levels and were continually monitored for fit from both the item and child perspectives (Choppin, 1985).
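The relative shift in difficulty that pilot testing screens for can be illustrated with a minimal sketch (the slope and difficulty values here are hypothetical): an over-discriminating item of the same nominal difficulty looks harder than its Rasch-fitting neighbor to a low-ability group, and easier to a high-ability group.

```python
import math

def icc(theta, b, a=1.0):
    """2-PL item characteristic curve: P(correct | theta)
    with difficulty b and discrimination (slope) a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two items of equal difficulty (b = 0) but unequal discrimination.
uniform = dict(a=1.0, b=0.0)   # fits the Rasch model
steep   = dict(a=2.5, b=0.0)   # over-discriminating item

# Success probabilities for low-, mid-, and high-ability groups:
for theta in (-2.0, 0.0, 2.0):
    p_u = icc(theta, **uniform)
    p_s = icc(theta, **steep)
    print(f"theta={theta:+.1f}  uniform={p_u:.2f}  steep={p_s:.2f}")
```

The two curves agree only at their common difficulty; away from it, the steep item's relative difficulty changes with the ability of the group tested, which is exactly what routine fit monitoring detects.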
"If we find the two techniques accepting the same items, therefore, there is the implication that the test of equality of discrimination of the items is in fact not sufficiently powerful to eliminate items that are atypical in their discrimination."
Misunderstanding: Well-constructed items can be expected to have nearly equal discriminations, and to be accepted by all methods. In fact, Rasch techniques did eliminate "bad items" (Choppin, p. 85).
"This indeed turns out to be the case. The test applied for parallelism within Rasch is a test of equivalent slope with acceptance based on proving the null hypothesis, a notoriously non-powerful statistic. If the test is not sufficiently powerful, then many of the items accepted into the bank will in fact not be parallel. If this is the case, then the claims of respondent freeness and item freeness fail."
Misunderstanding: From one perspective, this is not a criticism of the Rasch model, but of the choice of fit test. This can be easily remedied if, indeed, it really was a flaw. From another perspective, the admission that ICCs are not parallel is an admission that the raw score is not a sufficient statistic [meaningful summary of performance] for the tests then in use, but the raw score was routinely used for pass-fail decisions in Britain.
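The sufficiency claim at issue can be checked directly: under equal slopes (Rasch), every response pattern with the same raw score yields the same maximum-likelihood ability estimate, whereas under unequal slopes (2-PL) it does not. A minimal sketch, with hypothetical item parameters:

```python
import math

def mle_theta(pattern, a, b, lo=-6.0, hi=6.0):
    """Maximum-likelihood ability for a 0/1 response pattern, found by
    bisection on the score equation: sum(a_i * (x_i - P_i(theta))) = 0."""
    def score(theta):
        return sum(ai * (x - 1.0 / (1.0 + math.exp(-ai * (theta - bi))))
                   for x, ai, bi in zip(pattern, a, b))
    for _ in range(60):           # score() decreases in theta
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b     = [-1.0, 0.0, 1.0, 2.0]     # item difficulties
rasch = [1.0, 1.0, 1.0, 1.0]      # equal slopes (Rasch)
twopl = [0.5, 1.0, 1.5, 2.0]      # unequal slopes (2-PL)

p1 = [1, 1, 0, 0]                 # two patterns with the same
p2 = [0, 0, 1, 1]                 # raw score of 2

for a, label in ((rasch, "Rasch"), (twopl, "2-PL")):
    t1, t2 = mle_theta(p1, a, b), mle_theta(p2, a, b)
    print(f"{label}: theta(p1)={t1:+.2f}  theta(p2)={t2:+.2f}")
```

Under the Rasch slopes the two estimates coincide, so the raw score alone determines the measure; under the 2-PL slopes the same raw score maps to two different abilities, so a raw-score pass-fail cut is no longer defensible.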
"If the [Rasch] technique were to be used, public policy on schools would be based on mistaken evidence."
Misunderstanding: The problem was that the evidence was too good! (Choppin, p. 91) It upset political sensibilities and motivated a witch hunt.
"A further difficulty for the technique arose from its being treated as if considerations of content validity were not necessary. It may have been felt that so long as a wide variety of item topics was used, taking a consensus of all the views of various educational theorists and practitioners, parents, politicians, religious leaders, etc., into consideration, then the item banks on a subject should be fairly representative. However, it turns out that comparisons between different groups of respondents are particularly sensitive even to small changes in consensus, and not in a fashion that can be ignored."
Misunderstanding: The fact that we can't decide what, for example, comprises "math" is not only a problem for Rasch, but for all techniques, a problem that has emerged again in TIMSS.
"... the consensus view on arithmetic would almost certainly be different 10 years on; and so, therefore, would be the arithmetic syllabus. New ideas will have been introduced, and old ideas dropped. The child taking the test at this later date will, however, be taking a test designed 10 years earlier."
Misunderstanding: This definitely does not occur with a well-constructed Rasch item bank. Items are continually reviewed. Obsolete and over-exposed items are dropped. New items are introduced. Teachers have input into item selection. (Choppin, p. 89).
"It is important to point out that most of the criticisms of the Rasch model do not apply to the two- and three-parameter models. These make no assumptions about the equality of discriminability of the items, and the three-parameter model additionally takes into account the effects of guessing." (R&G p. 61)
Misunderstanding: The use of 2-PL and 3-PL models to equate tests across levels, across time, or to construct item banks is far more complex and problematic than Rasch. Other objections to Rasch, such as "the same selection of items as traditional techniques", and "lack of content definition" apply equally, if not more strongly, to these models.
"It was for reasons of this type that the use of Rasch scaling techniques was discredited in British education. The use of some of the sub-tests on the British Ability Scale (Elliot 1983) also seemed to be discredited as these similarly had depended on Rasch scaling techniques for their construction." (R&G p.61)
Now, at last, our critics respond to themselves!
"One important example of the use of IRT models in test construction has been the British Ability Scale. [Published in the USA as the Differential Ability Scales (DAS), RMT 4:2, p. 108.] For a long time in the late 1970s, at a time when the Rasch model had fallen into disfavor, it was felt that the use of this model in the construction of the scales had been a mistake. However, the test itself proved to have been so well constructed overall, and so useful in practice, that the use of the Rasch-constructed subscales of numerical and other computational abilities continued to be recommended, albeit with caution. In fact, they have proved to be particularly robust in their use for the clinical assessment of children, and the generalization across ability levels from different sub-sets of items has, in spite of many misgivings, been found to be informative. Elliot (1983) argues that many of the doubts about the Rasch model have arisen from situations where it has been applied to pre-existing data, or to test data that had not been specifically designed to fit the model. If the Rasch model is used carefully and with a full knowledge of its limitations, as in the development of the British Ability Scale, then it is possible to make use of its subject-free and item-free characteristics." (R&G p. 183)
"The proof of the pudding is in the eating!"
Choppin, B. (1985) Bruce Choppin on Measurement and Education. Evaluation in Education (now International Journal of Educational Research), 9, 1.
Elliot, C.D. (1983) British Ability Scales Technical Handbook. Windsor: NFER-Nelson.
Rust, J. & Golombok, S. (1999) Modern Psychometrics: The Science of Psychological Assessment. 2nd Ed. New York: Routledge.
Linacre, J.M. (2000) Historic Misunderstandings of the Rasch Model. Rasch Measurement Transactions, 14:2, 748-9.