The Length of a Logit

In fitting data to the Rasch model in order to use them to establish measurement, our aim is to construct a system of invariant linear measures, to estimate their precision (standard errors) and to assess the degree to which these measures and their errors are confirmed in the data (accuracy, i.e., fit statistics). This is quite different from that "assignment of numbers to observed phenomena" so often cited as a definition of measurement in the social sciences. When "measurement" is by mere "assignment", the resulting numerical labels do not maintain their implied arithmetical meaning - the meaning necessary to use them to calculate differences, means, variances or regressions. This unnecessary numerical ambiguity causes social scientists a great deal of uncertainty and confusion.

A more useful aim for the construction of measurement is to enable the same kind of quantitative reasoning for the social sciences that has been so productive in the evolution of the physical sciences - namely, the careful construction and maintenance of invariant linear measures. Physicists do not dwell on linearity because they take for granted that their instruments maintain it. Everyone expects that "labelling" a mountain as "20,000 feet high" carries with it the well-known and universal measurement properties of length. In contrast, "labelling" a student as "3.0 in grade point average" carries with it no more than an ordinal classification of some grade of "B" in some particular context. This is clearly not a linear measure, and most certainly not invariant over teachers, let alone schools.

Physical units are defined prior to the current experiment, and then carefully implemented in the design of the instruments used to make the observations, such as yardsticks for measuring length. The resulting experimental observations are recorded as counts of well-defined, carefully maintained, and entirely artificial measurement units (such as inches) from arbitrary origins (such as one end of a yardstick). Since a different measurement unit, such as centimeters, produces a different count of length, even the count is an abstraction.

Physicists are unequivocal as to the quantitative length of an inch, but its substantive implication, its qualitative meaning, depends on the context: one inch added to the height of a mole-hill has a different meaning from one inch added to the height of a mountain.

Rasch measurement can lead to the same kind of arithmetical numbers that physicists reason with. But to do this, we must address explicitly the first step in the construction of measurement - a step no longer explicit in most physical measures.

The initial experimental observations in the construction of any measurement system are counts of the occurrence of observable events, such as the number of correct responses achieved on a test. Once the indicative events have been defined, these counts are based on concrete observations. The only way to change one of these counts is to change the experiment, say, by dropping one item from the test and then recounting the correct responses.

The mathematical unit of Rasch measurement, the log-odds unit or "logit", is defined prior to the experiment. One logit is the distance along the line of the variable that increases the odds of observing the event specified in the measurement model by a factor of 2.718..., the value of "e", the base of the "natural" or Napierian logarithms used for the calculation of "log-" odds. All logits are the same length with respect to this change in the odds of observing the indicative event.
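The definition of the logit can be checked numerically. The sketch below assumes the dichotomous Rasch model, in which the log-odds of the indicative event equal the person measure minus the item difficulty. The function names and the example values are illustrative and are not taken from the original note.

```python
import math

def odds_of_event(person_measure, item_difficulty):
    """Odds of the indicative event, assuming the dichotomous Rasch model:
    log-odds = person_measure - item_difficulty (both in logits)."""
    return math.exp(person_measure - item_difficulty)

def probability_of_event(person_measure, item_difficulty):
    """Probability corresponding to the odds above."""
    odds = odds_of_event(person_measure, item_difficulty)
    return odds / (1.0 + odds)

# Moving one logit up the variable multiplies the odds by e,
# wherever on the variable the move starts (hypothetical start points).
for start in (-2.0, 0.0, 1.5):
    ratio = odds_of_event(start + 1.0, 0.0) / odds_of_event(start, 0.0)
    print(f"from {start:+.1f} to {start + 1.0:+.1f} logits: odds ratio = {ratio:.3f}")
```

The odds ratio is the same at every starting point, which is the sense in which all logits have the same mathematical length. The change in probability, by contrast, is not constant along the variable.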

As with an inch, the substantive length of a logit, i.e., what a logit means in terms of the composition of the underlying variable in any particular application, is not pre-determined. When benchmark elements are chosen to give meaning to a variable, the number of logits estimated between a pair of benchmarks depends on the particular distribution of counts obtained in the current experiment. The substantive length of the logit depends not only on its numerical value, but also on the conceptual distance between the benchmark elements. If a second experiment should lead to a different distribution of counts, then the number of logits between the pair of benchmarks will differ, even though their conceptual distance might remain unaltered. As a result, it is useful to represent the results of the measurement process in terms of a linear transformation of the initial logits which preserves the conceptual structure of the measurement system - the differences between benchmarks. Considerations along these lines are explained and applied by Wright & Stone (1979, Chap. 8).

To see how the substantive length of a logit is affected by the distribution of the observations, consider a judging situation. The more discriminating the judges, the more precisely and consistently will they assign ratings to performances, and the more peaked will be the distribution of the ratings given by each judge to each level of performance. The more peaked the distribution of observations, the larger the number of logits between levels of performance. This occurs irrespective of the ability of the persons, difficulty of the items, severity of the judges or construction of the rating scale.
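A rough numerical illustration of the judging example follows. It is a deliberate simplification and not a Rasch estimation: the rating is reduced to a single dichotomous event (a "high" rating), and the proportions are hypothetical values chosen only to contrast blurred and sharply peaked rating distributions.

```python
import math

def logit(proportion):
    """Log-odds of an observed proportion strictly between 0 and 1."""
    return math.log(proportion / (1.0 - proportion))

def logit_distance(p_lower_level, p_higher_level):
    """Distance in logits between two performance levels, given the
    proportion of 'high' ratings each level receives."""
    return logit(p_higher_level) - logit(p_lower_level)

# Hypothetical, less discriminating judges: ratings barely separate the levels.
blurred = logit_distance(0.40, 0.60)

# Hypothetical, more discriminating judges: ratings are sharply peaked.
peaked = logit_distance(0.10, 0.90)

print(f"blurred ratings: {blurred:.2f} logits between the two levels")
print(f"peaked ratings:  {peaked:.2f} logits between the two levels")
```

The two performance levels are conceptually the same in both cases. Only the distribution of the observations differs, yet the number of logits between the levels is several times larger for the more discriminating judges.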

The manner in which a particular rating scale works affects the distribution of responses across the categories, and so also affects the substantive length of the logit. The rating scale is part of the instrumentation of the test. Changing the form of the rating scale changes the experiment and changes the substantive length of the logit. If observations of persons are made on a three-category scale, a particular set of logit measures will be estimated. Then, if the top two categories are combined into one category, making the test dichotomous, another set of logit measures will be estimated. The relative utility of these alternative sets of measures will depend on the fit and separation statistics they produce and on the meaning and purpose of the test. In general, the standard deviations of the two sets of measures for the same persons will differ. Since it is not useful to think that the person abilities have changed, this difference shows that the substantive length of the measurement unit has changed, and we must rescale our measurement units accordingly.

This realization alerts us to a step we must take which precedes those we see physicists taking. Since logit measures are estimated from the counts observed in the current experiment, the meaning of a logit in terms of the underlying variable need not be invariant between experiments. Inches are implemented so that every inch has the same length. Every logit has the same mathematical length in terms of log-odds, but not necessarily the same substantive length in terms of what it implies about the distances between the defining benchmarks of the underlying variable. Any linear transformation of the logit maintains its equal-interval status. Often the strict probabilistic interpretation of measurement units is of only incidental interest, particularly for rating scale data. Then the analyst is free to choose the most useful linear rescaling of the logit. A convenient transformation can be to rescale the lowest observable person measure to 0, and the highest to 100, so that reported measures can be interpreted as a kind of "percentage" progress up the effective range of the measurement instrument.
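The 0 to 100 rescaling described above is a simple linear transformation. The sketch below assumes that the lowest and highest observable person measures are already known in logits. The function name and the example values are hypothetical.

```python
def rescale_to_percent(logit_measures, lowest_observable, highest_observable):
    """Linearly rescale logit measures so that lowest_observable maps to 0
    and highest_observable maps to 100. Equal intervals stay equal."""
    span = highest_observable - lowest_observable
    if span <= 0:
        raise ValueError("highest_observable must exceed lowest_observable")
    return [(m - lowest_observable) * 100.0 / span for m in logit_measures]

# Hypothetical person measures on an instrument whose effective range
# is assumed to run from -3 to +3 logits.
measures = [-3.0, -1.5, 0.0, 1.5, 3.0]
print(rescale_to_percent(measures, -3.0, 3.0))
```

Because the transformation is linear, differences between measures keep their relative sizes. Only the origin and the size of the reporting unit change.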

Our intention is to maintain, by choice of item and response format, units of equal substantive length for tests constructed to measure what we intend to be the same underlying variable. Comparing measures from tests intended to be commensurate therefore requires an equating step that adjusts not only for differences in local origin, but also for variation in the substantive length of the measurement unit we have constructed for the underlying variable.

Equating of the interval scales constructed from two tests is confirmed when plots of the measures of elements common to the tests follow an identity line stochastically. When this verification fails, a necessary step is to linearly adjust the relative lengths of the logits constructed by the two tests (and intended to be based on the same underlying variable) by the ratio of the observed standard deviations of the measures common to those tests, so that both tests measure in units with the same substantive meaning.
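The adjustment by the ratio of standard deviations can be sketched as follows. The slope comes from the ratio described above. Aligning the local origins through the means of the common elements is an assumption made here for illustration, as are the function names and the example measures.

```python
import statistics

def equating_constants(common_on_reference, common_on_new):
    """Return (slope, intercept) that map measures from the new test onto
    the reference test's scale, using elements common to both tests.

    slope: ratio of the observed standard deviations of the common measures.
    intercept: aligns the means of the common measures (an assumed choice
    for adjusting the local origin)."""
    slope = statistics.stdev(common_on_reference) / statistics.stdev(common_on_new)
    intercept = (statistics.mean(common_on_reference)
                 - slope * statistics.mean(common_on_new))
    return slope, intercept

def to_reference_scale(measure, slope, intercept):
    """Express a new-test measure in the reference test's units."""
    return slope * measure + intercept

# Hypothetical measures (in logits) of the same common elements on two tests.
reference = [-1.5, 0.5, 2.5]
new_test = [-1.0, 0.0, 1.0]

slope, intercept = equating_constants(reference, new_test)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print([to_reference_scale(m, slope, intercept) for m in new_test])
```

After the adjustment, a plot of the common elements should follow the identity line apart from random scatter. If it does not, the tests may not be measuring the same underlying variable.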

The "Length" of a Logit. Linacre J.M. & Wright B.D. … Rasch Measurement Transactions, 1989, 3:2 p.54-55


