Assaults on the Rasch model employ three strategies:
a) Attack the Rasch model's mathematical formulation.
Rebuttal: Since the Rasch model is a mathematical
derivation from the requirement that linear measures be constructed
from ordered qualities, it has the same standing as Pythagoras'
theorem. It cannot be mathematically disproved.
b) Attack the Rasch model as a general solution to
measurement problems.
Rebuttal: Proponents of multi-parameter IRT and raw score
analysis claim either that Rasch is "inappropriate" or that it
produces the same results as their more complex (IRT) or less
complex (raw scores) analyses. Rasch does not incorporate features
of the data known to be sample sensitive (guessing,
discrimination), but does incorporate features known to be general
(item difficulty, non-linearity of raw scores due to range
restriction). If there is little guessing and item discriminations
are similar, then IRT and Rasch produce similar results. If all
scores are central, then raw score analysis and Rasch produce
similar results. But neither IRT nor raw score analysis implements
effective quality control of the construct and the data.
c) Attack the Rasch model as a viable solution to a
particular measurement problem.
Rebuttal: Complaints such as "the Rasch model just doesn't
describe my data" or "My Rasch results don't make sense to me" are
not criticisms of the Rasch model, but of the data. Failure of a
data set to fit the Rasch model implies that the data do not
support the construction of measures suitable for stable inference.
Such data may be a compilation of historical incidents, but they
don't add up to anything that lies along any one line of inquiry.
Usually, if the data have any meaning at all, they can be segmented
into meaningful subsets that do fit the Rasch model and do support
inference.
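The point in rebuttal (b) about central scores can be illustrated with a minimal sketch. It assumes a hypothetical 20-item test whose items are all of similar difficulty, so that the log-odds of the raw score serves as a first approximation to the Rasch measure. The function name and the test length are illustrative choices, not part of any published analysis.

```python
import math

def score_to_logit(raw_score: int, max_score: int) -> float:
    """Log-odds of a raw score: a rough stand-in for a Rasch measure
    when all items have similar difficulty (an assumption of this sketch)."""
    if not 0 < raw_score < max_score:
        raise ValueError("extreme scores have no finite logit")
    return math.log(raw_score / (max_score - raw_score))

MAX_SCORE = 20  # hypothetical test length
logits = {r: score_to_logit(r, MAX_SCORE) for r in range(1, MAX_SCORE)}

# Logit gain for one extra raw-score point, near the center and near the top.
central_step = logits[11] - logits[10]
extreme_step = logits[19] - logits[18]

print(f"central step (10 -> 11): {central_step:.3f} logits")
print(f"extreme step (18 -> 19): {extreme_step:.3f} logits")
```

Under these assumptions one raw-score point near the center corresponds to a much smaller change in the measure than one point near the extreme, which is why raw scores and Rasch measures agree closely only when scores are central.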
Dickson and Köhler (D&K, 1996) use these strategies in commentary on their analysis of Functional Independence Measure (FIM) ratings. Here are their arguments:
Strategy (a) - the Rasch model is claimed to be mathematically flawed:
i) "The Rasch model is supposed to have probabilities of a
correct response running from zero to one... The method of scoring
the FIM does not allow a lower asymptote of 0, as the chance of
producing a correct score is 1/7 for any FIM item."
FIM items are rated on a 7-level rating scale. D&K describe a
model for 7-option dichotomous MCQ items. They rightly challenge
its use for this application. But the reported analysis of the FIM
is based on Andrich's (1978) rating scale variant of the Rasch
model, in which every category has a lower asymptote of 0.
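A minimal sketch of category probabilities under a rating scale model of the Andrich type may make the asymptote point concrete. The threshold values and the item difficulty of 0 logits are hypothetical, chosen only to give 7 categories. They are not FIM estimates.

```python
import math

def rating_scale_probs(theta, delta, thresholds):
    """Category probabilities for one item under a rating scale model.

    theta: person measure, delta: item difficulty, thresholds: tau_1..tau_M,
    all in logits. Returns M + 1 probabilities, bottom category first.
    """
    # Running sums of (theta - delta - tau_j); the bottom category has an empty sum of 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + theta - delta - tau)
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

TAUS = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # hypothetical thresholds, 7 categories

low = rating_scale_probs(-10.0, 0.0, TAUS)   # very low-functioning person
mid = rating_scale_probs(0.0, 0.0, TAUS)     # person targeted on the item
high = rating_scale_probs(10.0, 0.0, TAUS)   # very high-functioning person

for label, probs in (("low", low), ("mid", mid), ("high", high)):
    print(label, [round(p, 4) for p in probs])
```

In this sketch no category is held at a floor of 1/7: for a person far below the item, every rating above the lowest has a probability close to 0, and for a person far above it, every rating below the highest does.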
ii) "Any system of measurement based on probabilities must
necessarily be imprecise."
Every system of measurement is based on probabilities, and all are
more or less imprecise. If just one measurement is made with a
ruler, the assumption is "the most likely length of the object is
the number I just obtained." More sophisticated measurement
involves the estimation of standard errors, i.e., a distribution of
probable error. Many numbers, such as raw scores, are reported as
point estimates without their standard errors, but that does not
make them perfectly precise. The Rasch model does not introduce
probabilities or imprecision into the data; rather, it capitalizes
on their presence in the data to construct a measurement
system.
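As a hedged illustration of how imprecision is reported rather than hidden, the sketch below computes the model standard error of a person measure for dichotomous Rasch items, using the standard result that it is the reciprocal square root of the test information. The two tests (10 and 40 items, all targeted exactly on the person) are hypothetical.

```python
import math

def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of success on a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def standard_error(theta: float, difficulties: list[float]) -> float:
    """Model standard error of a person measure: 1 / sqrt(test information)."""
    information = sum(
        p * (1.0 - p)
        for p in (rasch_probability(theta, d) for d in difficulties)
    )
    return 1.0 / math.sqrt(information)

# Hypothetical tests: every item has difficulty 0 logits, person measure is 0 logits.
se_short = standard_error(0.0, [0.0] * 10)
se_long = standard_error(0.0, [0.0] * 40)

print(f"10 items: SE = {se_short:.3f} logits")
print(f"40 items: SE = {se_long:.3f} logits")
```

More items shrink the standard error, but it never reaches zero, which matches the observation that all measurement is more or less imprecise.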
iii) "Rasch analysis supposes a one-dimensional latent space or,
in other words, that a single continuum of performance is being
measured."
Here a virtue is presented as a vice. Physical measurement takes
great pains to measure one thing at a time. We don't want the
patient's temperature reading to be biased by his weight, or
height, or blood pressure. It is only when we have clearly
isolated one dimension that we can understand the meaning of the
measure, and then study how that measure relates to measures on
other dimensions. Rasch analysis enables items that participate in
the one dimension to define and construct it. Misfitting items can
be separated out for constructing other dimensions in other
analyses.
iv) "First order indices of item misfit, such as described by
Wright & Stone (1979), are said to be insensitive to
multidimensionality."
Data can misfit any model in infinitely many ways. Each fit
statistic is designed to focus on one aspect of misfit.
Pearson-type statistics, such as Wright's INFIT and OUTFIT, have
proved useful in detecting many types of aberrant behavior, but
they look at one person or one item at a time. Multidimensionality
requires a relationship among items, so that the items must be
examined two or more at a time. Though explicit statistical tests
of multidimensionality exist, factor analysis of residual matrices
is more informative.
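The one-item-at-a-time character of these statistics can be seen in a minimal sketch of the usual mean-square formulas for a single dichotomous item: OUTFIT as the mean of squared standardized residuals, INFIT as the information-weighted version. The person measures, the item difficulty, and the two response strings are hypothetical, and the function name is an illustrative choice.

```python
import math

def fit_mean_squares(thetas, difficulty, responses):
    """Return (infit, outfit) mean-squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for theta, x in zip(thetas, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))  # expected score
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))                     # model variance
    # OUTFIT: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(thetas)
    # INFIT: squared residuals weighted by the information in each response.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

THETAS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical person measures
ORDERLY = [0, 0, 0, 1, 1, 1, 1]                  # Guttman-like responses
SURPRISE = [1, 0, 0, 1, 1, 1, 1]                 # lowest person unexpectedly succeeds

infit_ok, outfit_ok = fit_mean_squares(THETAS, 0.0, ORDERLY)
infit_bad, outfit_bad = fit_mean_squares(THETAS, 0.0, SURPRISE)

print(f"orderly:  infit = {infit_ok:.2f}, outfit = {outfit_ok:.2f}")
print(f"surprise: infit = {infit_bad:.2f}, outfit = {outfit_bad:.2f}")
```

Both statistics use only the residuals of this one item, so they respond to a single surprising response but carry no information about how this item's residuals relate to those of other items. That relationship is what an analysis of the residual matrix examines.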
v) "Rasch analysis does not allow a post hoc scaling."
By the term "post hoc scaling", D&K appear to mean the choice of
item weights that maximize the correlation(s) between the weighted
scores obtained by a particular sample of patients and indicator
variable(s). Though the analyst has freedom to weight the items,
and then subject the weighted items to Rasch analysis (RMT 8:4
p.403), this contradicts the intention that this set of items be
representative of all items, and this sample of persons be
representative of the relevant population. If it is discovered
that Item 5, say, is particularly highly correlated with an
indicator variable, then this suggests that constructing measures
based on a test of items like Item 5 may be useful. But weighting
items is analogous to differential weighting of the various parts
of the body. The tactic of down-weighting the extremities and
up-weighting the torso may make "body weight" correlate more highly
with nutritional status, but diminishes the overall usefulness of
"weight" as a quantity.
Strategy (b) - the Rasch model is claimed not to work for real data:
i) "The Rasch model assumes all items have equal discriminative
power."
Since the FIM items are intended to be representative of all
similar probes of functional independence, item characteristics
that contradict such intentions in the construction of measures are
not in accord with Rasch model specifications. Consequently the
Rasch model specifies that all items must have discriminations that
are equal enough to be regarded as the same. Quality control
misfit statistics flag items that fail to meet this measurement
specification. In practice, unequal discrimination is diagnostic
of various types of item malfunction and misinformation. Allowing
or parameterizing discrimination, which is a sample-dependent
index, limits the meaning of the measures to just that subset of
items and persons producing these particular data. This prevents
any general inferences over all possible items probing that
construct among all possible relevant persons.
ii) "The Rasch model assumes that item parameters are the same
across all samples."
Again virtue is presented as vice. Constant item parameters imply
a constant construct. Different item parameters across samples of
the relevant population imply that the construct has changed. Then
measures can't be compared across samples, and we are reduced to a
vague notion of what we are measuring. Rasch analysis specifies
that item parameters be sample independent, but provides many
methods for investigating where and by how much different samples
fail to conform to this basic principle of measurement.
iii) "No item fits the model exactly."
True, the Rasch model is a theoretical ideal - a definition of
measurement. Empirical data always fall short of theory - just as
manufactured rulers fall short of their theoretical archetype.
Carpenters are concerned only that their tape measures approximate
ideal measuring instruments usefully. The relevant question is "Do
the items fit the Rasch model well enough to construct useful
measures?"
iv) "There is no agreed standard method of interpretation of
misfit statistics."
There are numerous misfit statistics, each carefully formulated to
investigate a particular type of failure in the data to meet Rasch
model specifications. There is consensus about what each statistic
investigates in the data. Actions based on the size of misfit
statistics, however, depend on what purpose the measures are going
to serve. There is no general answer to the question, "What size
of misfit statistic disqualifies what part of the data as a basis
for measurement construction?" Barely noticeable misfit can be
screened out of well-controlled data, such as tests based on MCQ
items. Large amounts of misfit must be expected and accommodated
in largely uncontrollable data, such as when measuring the health
status of gunshot victims. Large misfit will make the measuring
system less precise and less generalizable across different samples
of patients, but the measures continue to approximate an objective
and linear form.
v) "No description of the sample distribution exists in Rasch
analysis."
Since every person receives a measure, a standard error and as many
different fit statistics as the analyst cares to compute, there is
an abundance of sample distribution descriptions available.
vi) "the statistical methodologies employed should be
appropriate to the data."
Data are good servants, but bad masters. Our intention is to
construct linear, objective measures from the data, not to have the
data describe themselves to us, in all their infinite complexity and
confusion. The statistical methodologies employed should be chosen
in accordance with our intentions.
Strategy (c) - Rasch does not "work" for our data:
i) "we have seen people who could... climb stairs [a difficult
item], but not swallow [an easy item]. These patients do not fit
the interval scale."
These data contradict Guttman's deterministic ordering of the
items, but are only very unexpected with a Rasch-constructed
interval scale. Data like these are predicted by the Rasch model
to occur occasionally, but are always unexpected when they do occur
- just as someone must win the lottery, but you would be surprised
if you won it.
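A rough sketch shows how rare, yet possible, such a pattern is. It simplifies the 7-level FIM ratings to dichotomous "can / cannot" items and assumes hypothetical difficulties of -3 logits for swallowing and +3 logits for stair climbing, with a patient at 0 logits. None of these values are FIM estimates.

```python
import math

def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of success on a dichotomous Rasch item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

PATIENT = 0.0    # hypothetical patient measure, in logits
SWALLOW = -3.0   # hypothetical difficulty of an easy item
STAIRS = 3.0     # hypothetical difficulty of a hard item

p_climb = rasch_probability(PATIENT, STAIRS)
p_no_swallow = 1.0 - rasch_probability(PATIENT, SWALLOW)

# Under local independence the joint probability is the product.
p_both = p_climb * p_no_swallow
expected_in_10000 = 10_000 * p_both

print(f"P(climbs stairs)           = {p_climb:.4f}")
print(f"P(cannot swallow)          = {p_no_swallow:.4f}")
print(f"P(both)                    = {p_both:.5f}")
print(f"expected among 10,000 such patients = {expected_in_10000:.1f}")
```

Under these assumptions the pattern is improbable for any one patient, but a handful of cases would be expected in a large enough group, so seeing some is not evidence against the model.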
ii) "There is strong correlation between transfer items [to bed,
to toilet, to tub], and very poor correlation between stair
climbing and eating."
Transfer items probe similar activities so may exhibit some local
dependency. More important, they are of about equal difficulty so
responses to them are likely to be very similar on the 1 to 7 FIM
scale and so highly correlated. Climbing is so much more difficult
than eating that patients with problems eating [1-5 on the scale]
probably can't climb at all [1]. Only when patients are eating
independently [6-7] do they attempt climbing [2-7]. Consequently,
since the reported data were for ratings on admission to
rehabilitation, when few people can climb stairs, there would be
very little variance in the stair climbing rating, and a poor
correlation with eating is to be expected.
iii) "The principal components analysis shows that for all
impairment groups ... the dimensionality is at least six. To
explain 80% of the variance three factors are required."
In D&K's principal components analysis of the motor FIM rating
data, the first six factors explained 64%, 10%, 7%, 4%, 3%, 3% of
the variance. The first factor is 6.4 times the second. Compare
this with the 6.8 value reported for Smith & Miao's (1994)
"perfect" Rasch-simulated data. The motor FIM is unidimensional
for practical purposes. Nevertheless, it is true that there are
shadows of multidimensionality in these data. D&K's second factor
is dominated by "bowel" and "bladder", which are known to misfit
the motor scale due to their socio-emotional aspects, but are kept
in the measurement system due to their clinical importance.
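The arithmetic behind these figures can be checked with a short sketch that uses only the six percentages quoted above. The variable names are illustrative.

```python
from itertools import accumulate

# Percent of variance explained by D&K's first six factors, as quoted above.
variance_pct = [64, 10, 7, 4, 3, 3]

# Size of the first factor relative to the second.
first_to_second = variance_pct[0] / variance_pct[1]

# Running total of variance explained, and the number of factors needed to reach 80%.
cumulative = list(accumulate(variance_pct))
factors_for_80 = next(i + 1 for i, total in enumerate(cumulative) if total >= 80)

print(f"first / second factor: {first_to_second:.1f}")
print(f"cumulative percent:    {cumulative}")
print(f"factors needed for 80%: {factors_for_80}")
```

The running totals agree with D&K's statement that three factors are needed to reach 80%, while the ratio of the first factor to the second is close to the 6.8 reported for the simulated data.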
iv) "Tests of dimensionality have not been reported by those
applying Rasch analysis to FIM data."
In fact, the dimensional nature of the FIM is very well understood
and described in many published papers. Perhaps, however, there is
an opportunity here for some energetic analyst to investigate the
FIM using different fit statistics.
John Michael Linacre
Andrich D (1978) A rating formulation for ordered response categories. Psychometrika 43: 561-573.
Dickson HG, Köhler F (1996) The multi-dimensionality of the FIM motor items precludes an interval scaling using Rasch analysis. Scandinavian Journal of Rehabilitation Medicine 26:159-162.
Smith RM, Miao CY (1994) Assessing unidimensionality for Rasch measurement. Ch. 18 in M. Wilson, Objective Measurement: Theory into Practice, Vol. 2. Norwood NJ: Ablex.
Wright BD, Stone MH (1979) Best Test Design. Chicago: MESA Press.
We like to hear reasoned objections to the Rasch model. They have prompted us to think in new ways and develop new techniques. For instance, McDonald's 1985 objection motivated a major advance in Rasch dimensionality analysis, the Principal Components Analysis (PCA) of residuals: "Unfortunately, some computer programs for fitting the Rasch model do not give any information about these. A choice would be to examine the covariance matrix of the item residuals, not the sizes of the residuals themselves, to see if the items are indeed conditionally uncorrelated, as required by the principle of local independence" (McDonald RP (1985) Factor Analysis and Related Methods. Hillsdale, NJ: Lawrence Erlbaum. p. 212).
Conspicuous objectors include Goldstein and Divgi:
Divgi DR (1986) Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement 23(4): 283-298.
Goldstein H (1980) Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology 33: 234-246.
Goldstein's 1980 paper and Divgi's 1986 paper have both proved stimulating. They identify misunderstandings that can be usefully addressed. For instance, Divgi's objection that coin-tosses fit the Rasch model exemplifies a misunderstanding of what we expect to happen when a person's ability exactly matches an item's difficulty.
Goldstein's 1980 paper was primarily an attempt to undermine Bruce Choppin, see https://www.rasch.org/rmt/rmt84e.htm - Goldstein was the leading "statistician" mentioned in that piece. Goldstein continues to attack the Rasch model: Harvey Goldstein (2004) The Education World Cup: international comparisons of student achievement. Plenary talk to Association for Educational Assessment - Europe, Budapest, Nov. 4-6, 2004.
Objections based on correlations are discussed at www.rasch.org/rmt/rmt121b.htm
Objections based on test reliability are discussed at www.rasch.org/rmt/rmt113l.htm
An implied objection by a measurement theoretician is addressed at www.rasch.org/rmt/rmt52a.htm
Rasch philosophy is further discussed at www.rasch.org/rmt/rmt114h.htm
But we have observed that rebutting an objection rarely convinces an objector. It does, however, reassure those using or thinking of using Rasch methodology.
This comment also reflects our experience:
That is the true test of a brilliant theory, says a member of the Nobel Economics Prize committee. What first is thought to be wrong is later shown to be obvious. People see the world as they are trained to see it, and resist contrary explanations. That's what makes innovation unwelcome and discovery almost impossible. An important scientific innovation rarely makes its way by gradually winning over and converting its opponents, noted the physicist Max Planck. What does happen is that its opponents gradually die out and that the growing generation is familiarized with the [new] idea from the beginning. No wonder that the most profound discoveries are often made by the young or the outsider, neither of whom has yet learned to ignore the obvious or live with the accepted wisdom.
From a New York Times Editorial, "Naked Orthodoxy", October 17, 1985
"An important scientific innovation rarely makes its way
by gradually winning over and converting its opponents: it
rarely happens that Saul becomes Paul. What does happen
is that its opponents gradually die out and that the growing
generation is familiarized with the idea from the beginning:
another instance of the fact that the future lies with youth"
[Max Planck. Philosophy of Physics. W.W. Norton & Company
Inc., New York, 1936. p. 97].
The Rasch model cannot be "disproved"! Linacre J.M. … Rasch Measurement Transactions, 1996, 10:3 p. 512-514.
The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
The URL of this page is www.rasch.org/rmt/rmt103e.htm
Website: www.rasch.org/rmt/contents.htm