In fitting data to the Rasch model in order to use them to establish measurement, our aim is to construct a system of invariant linear measures, to estimate their precision (standard errors) and to assess the degree to which these measures and their errors are confirmed in the data (accuracy, i.e., fit statistics). This is quite different from that "assignment of numbers to observed phenomena" so often cited as a definition of measurement in the social sciences. When "measurement" is by mere "assignment", the resulting numerical labels do not maintain their implied arithmetical meaning - the meaning necessary to use them to calculate differences, means, variances or regressions. This unnecessary numerical ambiguity causes social scientists a great deal of uncertainty and confusion.
A more useful aim for the construction of measurement is to enable the same kind of quantitative reasoning for the social sciences that has been so productive in the evolution of the physical sciences - namely, the careful construction and maintenance of invariant linear measures. Physicists do not dwell on linearity because they take for granted that their instruments maintain it. Everyone expects that "labelling" a mountain as "20,000 feet high" carries with it the well-known and universal measurement properties of length. In contrast, "labelling" a student as "3.0 in grade point average" carries with it no more than an ordinal classification of some grade of "B" in some particular context. This is clearly not a linear measure, and most certainly not invariant over teachers, let alone schools.
Physical units are defined prior to the current experiment, and then carefully implemented in the design of the instruments used to make the observations, such as yardsticks for measuring length. The resulting experimental observations are recorded as counts of well-defined, carefully maintained, and entirely artificial measurement units (such as inches) from arbitrary origins (such as one end of a yardstick). Since a different measurement unit, such as centimeters, produces a different count of length, even the count is an abstraction.
Physicists are unequivocal as to the quantitative length of an inch, but its substantive implication, its qualitative meaning, depends on the context: one inch added to the height of a mole-hill has different meaning than one inch added to the height of a mountain.
Rasch measurement can lead to the same kind of arithmetical numbers that physicists reason with. But to do this, we must address explicitly the first step in the construction of measurement - a step no longer explicit in most physical measures.
The initial experimental observations in the construction of any measurement system are counts of the occurrence of observable events, such as the number of correct responses achieved on a test. Once the indicative events have been defined, these counts are based on concrete observations. The only way to change one of these counts is to change the experiment, say, by dropping one item from the test and then recounting the correct responses.
The mathematical unit of Rasch measurement, the log-odds unit or "logit", is defined prior to the experiment. One logit is the distance along the line of the variable that increases the odds of observing the event specified in the measurement model by a factor of 2.718..., the value of "e", the base of the "natural" or Napierian logarithms used to calculate "log-" odds. All logits are the same length with respect to this change in the odds of observing the indicative event.
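The definition of the logit can be checked numerically. The following is a minimal sketch, assuming the dichotomous Rasch model in which the probability of the indicative event depends only on the difference between person measure and item difficulty; the function names and the example values 0.5 and 1.5 are illustrative choices, not part of the original text.

```python
import math

def probability_from_logit(difference):
    """Probability of the indicative event for a given
    (person measure - item difficulty) difference in logits,
    under the dichotomous Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-difference))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

# Two hypothetical positions on the variable, exactly one logit apart.
p_before = probability_from_logit(0.5)
p_after = probability_from_logit(1.5)

# Moving up one logit multiplies the odds by e, wherever we start.
odds_ratio = odds(p_after) / odds(p_before)
print(odds_ratio, math.e)
```

The same ratio results from any starting point, which is the sense in which all logits have the same mathematical length.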
As with an inch, the substantive length of a logit, i.e., what a logit means in terms of the composition of the underlying variable in any particular application, is not pre-determined. When benchmark elements are chosen to give meaning to a variable, the number of logits estimated between a pair of benchmarks depends on the particular distribution of counts obtained in the current experiment. The substantive length of the logit depends not only on its numerical value, but also on the conceptual distance between the benchmark elements. If a second experiment should lead to a different distribution of counts, then the number of logits between the pair of benchmarks will become different, even though their conceptual distance might remain unaltered. As a result, it is useful to represent the results of the measurement process in terms of a linear transformation of the initial logits which preserves the conceptual structure of the measurement system - the differences between benchmarks. Considerations along these lines are explained and applied by Wright & Stone (1979, Chap.8).
To see that the substantive length of a logit is affected by the distribution of the observations, consider a judging situation. The more discriminating the judges, the more precisely and consistently they assign ratings to performances, and the more peaked is the distribution of the ratings given by each judge to each level of performance. The more peaked the distribution of observations, the larger the number of logits between levels of performance. This occurs irrespective of the ability of the persons, difficulty of the items, severity of the judges or construction of the rating scale.
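A simplified numerical illustration of this effect follows. It is only a sketch under strong assumptions: the rating is reduced to a dichotomy (a "high" rating is awarded or not), and the proportions used for the two panels of judges are hypothetical values chosen for illustration, not data from any study.

```python
import math

def logit(p):
    """Log-odds of an event that has probability p."""
    return math.log(p / (1.0 - p))

def logit_distance(p_lower_level, p_higher_level):
    """Logits separating two levels of performance, given the
    proportion of 'high' ratings awarded at each level."""
    return logit(p_higher_level) - logit(p_lower_level)

# Less discriminating judges: ratings at the two levels overlap heavily
# (hypothetical proportions of 'high' ratings: 40% versus 60%).
diffuse = logit_distance(0.40, 0.60)

# More discriminating judges: ratings are sharply peaked at each level
# (hypothetical proportions of 'high' ratings: 10% versus 90%).
peaked = logit_distance(0.10, 0.90)

print(round(diffuse, 2), round(peaked, 2))  # about 0.81 and 4.39
```

The conceptual distance between the two levels of performance is the same in both cases; only the number of logits spanning it differs.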
The manner in which a particular rating scale works affects the distribution of responses across the categories, and so also affects the substantive length of the logit. The rating scale is part of the instrumentation of the test. Changing the form of the rating scale changes the experiment and changes the substantive length of the logit. If observations of persons are made on a three-category scale, a particular set of logit measures will be estimated. Then, if the top two categories are combined into one category, making the test dichotomous, another set of logit measures will be estimated. The relative utility of these alternative sets of measures will depend on the fit and separation statistics they produce and on the meaning and purpose of the test. In general, the standard deviations of the two sets of measures for the same persons will differ. Since it is not useful to think that the person abilities have changed, we must conclude that the substantive length of the logit has changed, and rescale our measurement units accordingly.
This realization alerts us to a step we must take which precedes those we see physicists taking. Since logit measures are estimated from the counts observed in the current experiment, the meaning of a logit in terms of the underlying variable need not be invariant between experiments. Inches are implemented so that every inch has the same length. Every logit has the same mathematical length in terms of log-odds, but not necessarily the same substantive length in terms of what it implies about the distances between the defining benchmarks of the underlying variable. Any linear transformation of the logit maintains its equal-interval status. Often the strict probabilistic interpretation of measurement units is of only incidental interest, particularly for rating scale data. Then the analyst is free to choose the most useful linear rescaling of the logit. A convenient transformation can be to rescale the lowest observable person measure to 0, and the highest to 100, so that reported measures can be interpreted as a kind of "percentage" progress up the effective range of the measurement instrument.
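The 0-to-100 rescaling can be sketched as follows. The function name, the example measures, and the assumed lowest and highest observable measures of -5 and +5 logits are hypothetical; in practice those two limits would come from the analysis of the instrument.

```python
def rescale_to_percent(logit_measures, lowest, highest):
    """Linearly rescale logit measures so that `lowest` maps to 0
    and `highest` maps to 100. Being linear, the transformation
    keeps the equal-interval property of the logit scale."""
    slope = 100.0 / (highest - lowest)
    return [slope * (m - lowest) for m in logit_measures]

# Hypothetical person measures in logits.
measures = [-5.0, -1.0, 0.0, 2.5, 5.0]

# Assumed lowest and highest observable person measures.
rescaled = rescale_to_percent(measures, lowest=-5.0, highest=5.0)
print(rescaled)  # [0.0, 40.0, 50.0, 75.0, 100.0]
```

Equal differences in logits remain equal differences in the rescaled units, so the reported numbers can still be used to calculate differences, means and variances.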
Our intention is to maintain, by choice of item and response format, units of equal substantive length for tests constructed to measure what we intend to be the same underlying variable. Comparing measures from tests intended to be commensurate therefore requires an equating step that adjusts not only for differences in local origin, but also for variation in the substantive length of the measurement unit we have constructed for the underlying variable.
Equating of the interval scales constructed from two tests is confirmed when plots of the measures of elements common to the tests follow an identity line stochastically. When this verification fails, a necessary step is to linearly adjust the relative lengths of the logits constructed by the two tests (and intended to be based on the same underlying variable) by the ratio of the observed standard deviations of the measures common to those tests, so that both tests measure in the units with the same substantive meaning.
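The adjustment by the ratio of standard deviations can be sketched as below. This is a minimal illustration, not a full equating procedure: the function name and the example measures are hypothetical, and aligning the local origins through the means of the common elements is an assumption made here for the sketch.

```python
import statistics

def equate_to_reference(measures_b, common_a, common_b):
    """Express Test B measures in the units and origin of Test A.

    common_a, common_b: measures of the same common elements as
    estimated by Test A and by Test B respectively.
    The unit is adjusted by the ratio of the standard deviations of
    the common measures; the origin is aligned using their means
    (an assumption of this sketch).
    """
    scale = statistics.stdev(common_a) / statistics.stdev(common_b)
    mean_a = statistics.mean(common_a)
    mean_b = statistics.mean(common_b)
    return [mean_a + scale * (m - mean_b) for m in measures_b]

# Hypothetical measures of three common elements on each test.
common_a = [-1.0, 0.0, 1.0]
common_b = [-1.5, 0.5, 2.5]   # same elements, longer "ruler" and shifted origin

# After adjustment, the common elements should fall on the identity line.
adjusted = equate_to_reference(common_b, common_a, common_b)
print(adjusted)
```

Plotting the adjusted Test B measures of the common elements against their Test A measures should then follow the identity line, apart from the scatter expected from measurement error.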
John M. Linacre and Benjamin D. Wright
Wright, B.D. & Stone, M.H. (1979). Best Test Design. Chicago: MESA Press.
The "Length" of a Logit. Linacre J.M. & Wright B.D. Rasch Measurement Transactions, 1989, 3:2 p.54-55
The URL of this page is www.rasch.org/rmt/rmt32b.htm