The Rasch Model cannot be "Disproved"!

Assaults on the Rasch model employ three strategies:

a) Attack the Rasch model's mathematical formulation.
Rebuttal: Since the Rasch model is a mathematical derivation from the requirement that linear measures be constructed from ordered qualities, it has the same standing as Pythagoras' theorem. It cannot be mathematically disproved.

b) Attack the Rasch model as a general solution to measurement problems.
Rebuttal: Proponents of multi-parameter IRT and raw score analysis claim that either Rasch is "inappropriate" or that it produces the same results as their more complex (IRT) or less complex (raw scores) analyses. Rasch does not incorporate features of the data known to be sample sensitive (guessing, discrimination), but does incorporate features known to be general (item difficulty, non-linearity of raw scores due to range restriction). If there is little guessing and item discriminations are similar, then IRT and Rasch produce similar results. If all scores are central, then raw score analysis and Rasch produce similar results. But neither IRT nor raw score analysis implements quality control of the construct and the data effectively.

c) Attack the Rasch model as a viable solution to a particular measurement problem.
Rebuttal: Complaints such as "the Rasch model just doesn't describe my data" or "My Rasch results don't make sense to me" are not criticisms of the Rasch model, but of the data. Failure of a data set to fit the Rasch model implies that the data do not support the construction of measures suitable for stable inference. Such data may be a compilation of historical incidents, but they don't add up to anything that lies along any one line of inquiry. Usually, if the data have any meaning at all, they can be segmented into meaningful subsets that do fit the Rasch model and do support inference.
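
The non-linearity of raw scores mentioned in rebuttal (b) can be illustrated with a minimal sketch. It assumes a hypothetical test of 20 dichotomous items that all share the same difficulty (set to 0 logits), in which case the measure for a non-extreme raw score r out of L items is log(r / (L - r)). The function name and the test length are illustrative choices, not taken from any analysis cited here.

```python
import math

def raw_score_to_logit(raw_score: int, n_items: int) -> float:
    """Logit measure for a non-extreme raw score.

    Assumption (illustrative only): all items are dichotomous and share
    the same difficulty, taken as 0 logits.
    """
    if not 0 < raw_score < n_items:
        raise ValueError("extreme scores have no finite logit estimate")
    return math.log(raw_score / (n_items - raw_score))

N_ITEMS = 20  # hypothetical test length

# Logit distance covered by one raw-score point in the middle of the range ...
central_step = raw_score_to_logit(11, N_ITEMS) - raw_score_to_logit(10, N_ITEMS)
# ... and by one raw-score point near the top of the range.
extreme_step = raw_score_to_logit(19, N_ITEMS) - raw_score_to_logit(18, N_ITEMS)

print(f"one raw-score point near the center: {central_step:.2f} logits")
print(f"one raw-score point near the top:    {extreme_step:.2f} logits")
```

Under these assumptions, a one-point gain near the top of the score range corresponds to a much larger change in the measure than a one-point gain in the middle, which is why central scores behave almost linearly while extreme scores do not.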

A Case Study

Dickson and Köhler (D&K, 1996) use these strategies in commentary on their analysis of Functional Independence Measure (FIM) ratings. Here are their arguments:

Strategy (a) - the Rasch model is claimed to be mathematically flawed:

i) "The Rasch model is supposed to have probabilities of a correct response running from zero to one... The method of scoring the FIM does not allow a lower asymptote of 0, as the chance of producing a correct score is 1/7 for any FIM item."
FIM items are rated on a 7-level rating scale. D&K describe a model for 7-option dichotomous MCQ items. They rightly challenge its use for this application. But reported analysis of the FIM is based on Andrich's (1978) rating scale variant of the Rasch model, which does have 0 lower asymptotes for all categories.
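
The point about lower asymptotes can be checked numerically. The following is a minimal sketch of category probabilities under a rating scale model of the Andrich (1978) type. The function name, the six threshold values (giving seven categories), and the person and item measures are illustrative assumptions, not values from any FIM analysis.

```python
import math

def rating_scale_probs(ability, difficulty, thresholds):
    """Category probabilities for categories 0..m under a rating scale model.

    thresholds holds the m step thresholds F_1..F_m (in logits).
    The log-numerator for category k is the sum over j <= k of
    (ability - difficulty - F_j); category 0 has log-numerator 0.
    """
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + (ability - difficulty - f))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds for a 7-category scale (not estimated from data).
THRESHOLDS = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]

very_low = rating_scale_probs(-10.0, 0.0, THRESHOLDS)
very_high = rating_scale_probs(10.0, 0.0, THRESHOLDS)

print("very low ability: ", [round(p, 4) for p in very_low])
print("very high ability:", [round(p, 4) for p in very_high])
```

For a very low measure, the probability of every category above the bottom one approaches 0 rather than 1/7, and for a very high measure the probability of every category below the top one approaches 0.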

ii) "Any system of measurement based on probabilities must necessarily be imprecise."
Every system of measurement is based on probabilities, and all are more or less imprecise. If just one measurement is made with a ruler, the assumption is "the most likely length of the object is the number I just obtained." More sophisticated measurement involves the estimation of standard errors, i.e., a distribution of probable error. Many numbers, such as raw scores, are reported as point estimates without their standard errors, but that does not make them perfectly precise. The Rasch model does not introduce probabilities or imprecision into the data, rather it capitalizes on their presence in the data to construct a measurement system.
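
As a minimal sketch of how imprecision is quantified rather than hidden, the model standard error of a person measure on dichotomous items is the reciprocal square root of the summed response variances p(1 - p). The item difficulties and person measures below are hypothetical, and the dichotomous form is a simplification of the rating scale case.

```python
import math

def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def measure_standard_error(ability, difficulties):
    """Model standard error of a person measure: 1 / sqrt(sum of p(1-p))."""
    information = sum(
        rasch_prob(ability, d) * (1.0 - rasch_prob(ability, d))
        for d in difficulties
    )
    return 1.0 / math.sqrt(information)

# Hypothetical item difficulties in logits.
item_difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

se_on_target = measure_standard_error(0.0, item_difficulties)   # person centered on the items
se_off_target = measure_standard_error(4.0, item_difficulties)  # person far above the items

print(f"SE for an on-target person:  {se_on_target:.2f} logits")
print(f"SE for an off-target person: {se_off_target:.2f} logits")
```

The off-target person is measured less precisely, and the standard error says by how much.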

iii) "Rasch analysis supposes a one-dimensional latent space or, in other words, that a single continuum of performance is being measured."
Here a virtue is presented as a vice. Physical measurement takes great pains to measure one thing at a time. We don't want the patient's temperature reading to be biased by his weight, or height, or blood pressure. It is only when we have clearly isolated one dimension that we can understand the meaning of the measure, and then study how that measure relates to measures on other dimensions. Rasch analysis enables items that participate in the one dimension to define and construct it. Misfitting items can be separated out for constructing other dimensions in other analyses.

iv) "First order indices of item misfit, such as described by Wright & Stone (1979), are said to be insensitive to multidimensionality."
Data can misfit any model in infinitely many ways. Each fit statistic is designed to focus on one aspect of misfit. Pearson-type statistics, such as Wright's INFIT and OUTFIT, have proved useful in detecting many types of aberrant behavior, but they look at one person or one item at a time. Multidimensionality requires a relationship among items, so that the items must be examined two or more at a time. Though explicit statistical tests of multidimensionality exist, factor analysis of residual matrices is more informative.
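
A minimal sketch of the two Pearson-type statistics for a single dichotomous item follows. OUTFIT is the unweighted mean of squared standardized residuals, and INFIT is the information-weighted version. The person measures are treated as known here for simplicity (in practice they are estimated), and all numerical values are hypothetical.

```python
import math

def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean-squares for one dichotomous item."""
    sq_resid_sum = 0.0   # sum of squared raw residuals (x - p)^2
    variance_sum = 0.0   # sum of model variances p(1 - p)
    z_sq_sum = 0.0       # sum of squared standardized residuals
    for x, b in zip(responses, abilities):
        p = rasch_prob(b, difficulty)
        variance = p * (1.0 - p)
        sq_resid_sum += (x - p) ** 2
        variance_sum += variance
        z_sq_sum += (x - p) ** 2 / variance
    infit = sq_resid_sum / variance_sum
    outfit = z_sq_sum / len(responses)
    return infit, outfit

abilities = [-3.0, -1.0, 0.0, 1.0, 3.0]   # hypothetical person measures
orderly = [0, 0, 1, 1, 1]                 # responses in line with the measures
surprising = [1, 0, 1, 1, 1]              # the least able person succeeds

infit_orderly, outfit_orderly = item_fit(orderly, abilities, 0.0)
infit_surprise, outfit_surprise = item_fit(surprising, abilities, 0.0)

print(f"orderly:    infit={infit_orderly:.2f}  outfit={outfit_orderly:.2f}")
print(f"surprising: infit={infit_surprise:.2f}  outfit={outfit_surprise:.2f}")
```

Both statistics are computed from one item's column of responses at a time, which is why they cannot by themselves reveal a relationship among items.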

v) "Rasch analysis does not allow a post hoc scaling."
By the term "post hoc scaling", D&K appear to mean the choice of item weights that maximize the correlation(s) between the weighted scores obtained by a particular sample of patients and indicator variable(s). Though the analyst has freedom to weight the items, and then subject the weighted items to Rasch analysis (RMT 8:4 p.403), this contradicts the intention that this set of items be representative of all items, and this sample of persons be representative of the relevant population. If it is discovered that Item 5, say, is particularly highly correlated with an indicator variable, then this suggests that constructing measures based on a test of items like Item 5 may be useful. But weighting items is analogous to differential weighting of the various parts of the body. The tactic of down-weighting the extremities and up-weighting the torso may make "body weight" correlate more highly with nutritional status, but diminishes the overall usefulness of "weight" as a quantity.

Strategy (b) - the Rasch model is claimed not to work for real data:

i) "The Rasch model assumes all items have equal discriminative power."
Since the FIM items are intended to be representative of all similar probes of functional independence, item characteristics that contradict such intentions in the construction of measures are not in accord with Rasch model specifications. Consequently the Rasch model specifies that all items must have discriminations that are equal enough to be regarded as the same. Quality control misfit statistics flag items that fail to meet this measurement specification. In practice, unequal discrimination is diagnostic of various types of item malfunction and misinformation. Allowing or parameterizing discrimination, which is a sample-dependent index, limits the meaning of the measures to just that subset of items and persons producing these particular data. This prevents any general inferences over all possible items probing that construct among all possible relevant persons.

ii) "The Rasch model assumes that item parameters are the same across all samples."
Again virtue is presented as vice. Constant item parameters imply a constant construct. Different item parameters across samples of the relevant population imply that the construct has changed. Then measures can't be compared across samples, and we are reduced to a vague notion of what we are measuring. Rasch analysis specifies that item parameters be sample independent, but provides many methods for investigating where and by how much different samples fail to conform to this basic principle of measurement.

iii) "No item fits the model exactly."
True, the Rasch model is a theoretical ideal - a definition of measurement. Empirical data always fall short of theory - just as manufactured rulers fall short of their theoretical archetype. Carpenters are concerned only that their tape measures approximate ideal measuring instruments usefully. The relevant question is "Do the items fit the Rasch model well enough to construct useful measures?"

iv) "There is no agreed standard method of interpretation of misfit statistics."
There are numerous misfit statistics, each carefully formulated to investigate a particular type of failure in the data to meet Rasch model specifications. There is consensus about what each statistic investigates in the data. Actions based on the size of misfit statistics, however, depend on what purpose the measures are going to serve. There is no general answer to the question, "What size of misfit statistic disqualifies what part of the data as a basis for measurement construction?" Barely noticeable misfit can be screened out of well-controlled data, such as tests based on MCQ items. Large amounts of misfit must be expected and accommodated in largely uncontrollable data, such as when measuring the health status of gun-shot victims. Large misfit will make the measuring system less precise and less generalizable across different samples of patients, but the measures continue to approximate an objective and linear form.

v) "No description of the sample distribution exists in Rasch analysis."
Since every person receives a measure, a standard error and as many different fit statistics as the analyst cares to compute, there is an abundance of sample distribution descriptions available.

vi) "the statistical methodologies employed should be appropriate to the data."
Data are good servants, but bad masters. Our intention is to construct linear, objective measures from the data, not to have the data describe itself to us, in all its infinite complexity and confusion. The statistical methodologies employed should be chosen in accordance with our intentions.

Strategy (c) - Rasch does not "work" for our data:

i) "we have seen people who could... climb stairs [a difficult item], but not swallow [an easy item]. These patients do not fit the interval scale."
These data contradict Guttman's deterministic ordering of the items, but are only very unexpected with a Rasch-constructed interval scale. Data like these are predicted by the Rasch model to occur occasionally, but are always unexpected when they do occur - just as someone must win the lottery, but you would be surprised if you won it.
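
How rare such a pattern is can be sketched under simplifying assumptions: the two items are treated as dichotomous (the FIM itself uses 7-level ratings), responses are locally independent, and the item difficulties and patient measure below are hypothetical values chosen only for illustration.

```python
import math

def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

patient = 0.0      # hypothetical patient measure (logits)
easy_item = -3.0   # hypothetical difficulty of an easy item such as swallowing
hard_item = 3.0    # hypothetical difficulty of a hard item such as stair climbing

# Local independence: the joint probability is the product of the two terms.
p_surprise = rasch_prob(patient, hard_item) * (1.0 - rasch_prob(patient, easy_item))

print(f"P(succeeds on hard item and fails on easy item) = {p_surprise:.4f}")
```

With these values the pattern has a probability of roughly 2 in 1,000: small, but not zero, whereas a deterministic Guttman ordering would forbid it entirely.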

ii) "There is strong correlation between transfer items [to bed, to toilet, to tub], and very poor correlation between stair climbing and eating."
Transfer items probe similar activities so may exhibit some local dependency. More important, they are of about equal difficulty so responses to them are likely to be very similar on the 1 to 7 FIM scale and so highly correlated. Climbing is so much more difficult than eating that patients with problems eating [1-5 on the scale] probably can't climb at all [1]. Only when patients are eating independently [6-7] do they attempt climbing [2-7]. Consequently, since the reported data were for ratings on admission to rehabilitation, when few people can climb stairs, there would be very little variance in the stair climbing rating, and a poor correlation with eating is to be expected.

iii) "The principal components analysis shows that for all impairment groups ... the dimensionality is at least six. To explain 80% of the variance three factors are required."
In D&K's principal components analysis of the motor FIM rating data, the first six factors explained 64%, 10%, 7%, 4%, 3%, 3% of the variance. The first factor is 6.4 times the second. Compare this with the 6.8 value reported for Smith & Miao's (1994) "perfect" Rasch-simulated data. The motor FIM is unidimensional for practical purposes. Nevertheless, it is true that there are shadows of multidimensionality in these data. D&K's second factor is dominated by "bowel" and "bladder", which are known to misfit the motor scale due to their socio-emotional aspects, but are kept in the measurement system due to their clinical importance.

iv) "Tests of dimensionality have not been reported by those applying Rasch analysis to FIM data."
In fact, the dimensional nature of the FIM is very well understood and described in many published papers. Perhaps, however, there is an opportunity here for some energetic analyst to investigate the FIM using different fit statistics.

John Michael Linacre

Andrich D (1978) A rating formulation for ordered response categories. Psychometrika 43: 561-573.

Dickson HG, Köhler F (1996) The multi-dimensionality of the FIM motor items precludes an interval scaling using Rasch analysis. Scandinavian Journal of Rehabilitation Medicine 26:159-162.

Smith RM, Miao CY (1994) Assessing unidimensionality for Rasch measurement. Ch. 18 in M. Wilson, Objective Measurement: Theory into Practice, Vol. 2. Norwood NJ: Ablex.

Wright BD, Stone MH (1979) Best Test Design. Chicago: MESA Press.


We like to hear reasoned objections to the Rasch model. They have prompted us to think in new ways and develop new techniques. For instance, McDonald's 1985 objection motivated a major advance in Rasch dimensionality analysis, the Principal Components Analysis (PCA) of residuals: "Unfortunately, some computer programs for fitting the Rasch model do not give any information about these. A choice would be to examine the covariance matrix of the item residuals, not the sizes of the residuals themselves, to see if the items are indeed conditionally uncorrelated, as required by the principle of local independence" (McDonald RP (1985) Factor Analysis and Related Methods. Hillsdale, NJ: Lawrence Erlbaum. p. 212).
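
McDonald's suggestion can be sketched in a few lines: compute standardized residuals, then correlate the residuals of pairs of items. This is only the first step of a PCA of residuals, which would go on to decompose the full residual correlation matrix. The toy responses, person measures, and item difficulties are hypothetical, and the measures are treated as known rather than estimated.

```python
import math

def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def standardized_residuals(responses, abilities, difficulties):
    """z[n][i] = (x - p) / sqrt(p(1 - p)) for person n and item i."""
    rows = []
    for person_responses, b in zip(responses, abilities):
        row = []
        for x, d in zip(person_responses, difficulties):
            p = rasch_prob(b, d)
            row.append((x - p) / math.sqrt(p * (1.0 - p)))
        rows.append(row)
    return rows

def residual_correlation(z, i, j):
    """Pearson correlation between the residual columns of items i and j."""
    col_i = [row[i] for row in z]
    col_j = [row[j] for row in z]
    n = len(z)
    mean_i = sum(col_i) / n
    mean_j = sum(col_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(col_i, col_j))
    var_i = sum((a - mean_i) ** 2 for a in col_i)
    var_j = sum((b - mean_j) ** 2 for b in col_j)
    return cov / math.sqrt(var_i * var_j)

# Hypothetical toy data: 6 persons by 3 items. Items 0 and 1 always agree.
abilities = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
difficulties = [-0.5, 0.0, 0.5]
responses = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 0],
]

z = standardized_residuals(responses, abilities, difficulties)
corr_01 = residual_correlation(z, 0, 1)
corr_02 = residual_correlation(z, 0, 2)

print(f"residual correlation, items 0 and 1: {corr_01:.2f}")
print(f"residual correlation, items 0 and 2: {corr_02:.2f}")
```

In this toy example the residuals of items 0 and 1 are strongly positively correlated, the kind of departure from local independence that single-item fit statistics do not examine.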

Conspicuous objectors include Goldstein and Divgi:
Divgi DR (1986) Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement 23(4): 283-298.
Goldstein H (1980) Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology 33: 234-246.

Goldstein's 1980 paper and Divgi's 1986 paper have both proved stimulating. They identify misunderstandings that can be usefully addressed. For instance, Divgi's objection that coin-tosses fit the Rasch model exemplifies a misunderstanding of what we expect to happen when a person's ability exactly matches an item's difficulty (the model then specifies a success probability of 0.5).

Goldstein's 1980 paper was primarily an attempt to undermine Bruce Choppin, see https://www.rasch.org/rmt/rmt84e.htm - Goldstein was the leading "statistician" mentioned in that piece. Goldstein continues to attack the Rasch model: Harvey Goldstein (2004) The Education World Cup: international comparisons of student achievement. Plenary talk to Association for Educational Assessment - Europe, Budapest, Nov. 4-6, 2004.

Objections based on correlations are discussed at www.rasch.org/rmt/rmt121b.htm

Objections based on test reliability are discussed at www.rasch.org/rmt/rmt113l.htm

An implied objection by a measurement theoretician is addressed at www.rasch.org/rmt/rmt52a.htm

Rasch philosophy is further discussed at www.rasch.org/rmt/rmt114h.htm

But we have observed that rebutting an objection rarely convinces an objector. It does, however, reassure those using or thinking of using Rasch methodology.

This comment also reflects our experience:
That is the true test of a brilliant theory, says a member of the Nobel Economics Prize committee. What first is thought to be wrong is later shown to be obvious. People see the world as they are trained to see it, and resist contrary explanations. That's what makes innovation unwelcome and discovery almost impossible. An important scientific innovation rarely makes its way by gradually winning over and converting its opponents, noted the physicist Max Planck. What does happen is that its opponents gradually die out and that the growing generation is familiarized with the [new] idea from the beginning. No wonder that the most profound discoveries are often made by the young or the outsider, neither of whom has yet learned to ignore the obvious or live with the accepted wisdom.
From a New York Times Editorial, "Naked Orthodoxy", October 17, 1985

"An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out and that the growing generation is familiarized with the idea from the beginning: another instance of the fact that the future lies with youth."
[Max Planck. Philosophy of Physics. W.W. Norton & Company Inc., New York, 1936. p. 97].


The Rasch model cannot be "disproved"! Linacre J.M. … Rasch Measurement Transactions, 1996, 10:3 p. 512-514.




The URL of this page is www.rasch.org/rmt/rmt103e.htm
