Roberts (1994, pp. 625-6) points out that, in measuring:
"...one seeks to assign numbers to objects so that a is judged louder than b if and only if the number assigned to a is greater than the number assigned to b. Such a mapping from objects to numbers is called a homomorphism from the observed relation to the numerical relation. In measurement theory, scales are identified with homomorphisms. Formally, an admissible transformation of a scale is then a transformation of the numbers assigned so that one gets another homomorphism."
Thus, Roberts (p. 626) continues:
"One of the goals of research in the theory of measurement is to develop a collection of tools which can be used to determine what assertions can meaningfully be made and what conclusions can meaningfully be drawn, using scales of measurement. A statement involving scales of measurement is called meaningful if its truth value is unchanged whenever every scale in the statement is modified by an admissible transformation. This definition goes back to Suppes (1959) and Suppes & Zinnes (1963). (While it is not mentioned explicitly in the work of Stevens, it is inherent in his treatment of admissible transformations; see Mundy (1986).)"
Roberts (p. 627) concludes his summary of measurement theory saying:
"The notion of meaningfulness is concerned with which assertions it makes sense to make, and which ones are just artifacts of the particular version of the scale of measurement that happens to be in use. This notion of meaningfulness is closely related to the concept of invariance in classical geometry."
"The definitions we have given are reasonably well accepted, at least to the extent that it is widely agreed that 'invariance' is a desirable condition and that it is implied by 'meaningfulness'."
The Consequences of Ordinal Status for Measurement
Roberts (pp. 628-30) then gives a list of meaningful and meaningless statements, and shows the logical fallacy involved in averaging and comparing raw scores. One example shows that we have a strong tendency to treat ordinal scales as interval, contrary to the empirical fact that the spacing between the categories is unknown. When this fact of unknown and likely variable spacing is recognized, we see that the categories may be acceptably scored by any algorithm that maintains their order, no matter how different the spacing between them. That is, ordinal homomorphisms restrict only the order of the categories, not their spacing, since the spacing is unknown.
Roberts gives the example of two groups of three individuals, each individual rated once on a five-point scale scored in the way commonly deemed natural, as 1, 2, 3, 4, 5. Group 1 scores 4, 4, and 4; group 2 scores 5, 4, and 1. Group 1's mean of 4 is higher than group 2's mean of 3.33.
Now, given that we recognize and accept our scale's status as ordinal, the ratings may be transformed in any way that invariantly preserves their order. A logical and scientific way of proceeding to test the hypothesis of the group difference would then require that we try out different admissible transformations of the scale to see if we obtain the same result. Roberts accordingly rescores 5s as 200, 4s as 100, 3s as 50, 2s as 20, and 1s as 3. Now group 1 has a mean of 100, and group 2 has a mean of 101.
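The arithmetic of Roberts' example can be checked with a minimal Python sketch. The ratings and the rescoring are those quoted above; the variable and function names are illustrative assumptions, not anything Roberts provides.

```python
from statistics import mean

# Ratings from Roberts' example, on the conventional 1-5 scoring.
group1 = [4, 4, 4]
group2 = [5, 4, 1]

# Roberts' order-preserving rescoring of the five categories.
rescoring = {1: 3, 2: 20, 3: 50, 4: 100, 5: 200}

def rescore(ratings, mapping):
    """Replace each rating with its transformed score."""
    return [mapping[r] for r in ratings]

raw_means = (mean(group1), mean(group2))
new_means = (mean(rescore(group1, rescoring)),
             mean(rescore(group2, rescoring)))

# Under the raw scoring group 1 has the higher mean;
# under the rescoring group 2 has the higher mean.
print("raw means:     ", raw_means)
print("rescored means:", new_means)
```

Both scorings keep the categories in the same order, yet the comparison of means reverses.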
The change in the ordering of the groups in the context of an admissible transformation of the raw scores renders any test of a hypothetical average difference between the groups undecidable; the failure of invariance makes any statement about the groups' order meaningless. Roberts notes that comparing the group medians would be meaningful, since the order would always be preserved across admissible transformations.
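The contrast between means and medians can be explored with a small Python sketch that applies many order-preserving rescorings to the two groups and records how each comparison comes out. The random rescorings, the seed, and all names are assumptions made for illustration; only the two groups and the rescoring 3, 20, 50, 100, 200 come from Roberts' example.

```python
import random
from statistics import mean, median

CATEGORIES = [1, 2, 3, 4, 5]
group1 = [4, 4, 4]
group2 = [5, 4, 1]

def sign_of_difference(stat, a, b):
    """Return 1, 0, or -1 as stat(a) is greater than, equal to, or less than stat(b)."""
    x, y = stat(a), stat(b)
    return (x > y) - (x < y)

def random_rescoring(rng):
    """Draw strictly increasing scores for the five ordered categories."""
    values = sorted(rng.sample(range(1, 1001), len(CATEGORIES)))
    return dict(zip(CATEGORIES, values))

rng = random.Random(0)
rescorings = [
    {c: c for c in CATEGORIES},            # the conventional 1-5 scoring
    {1: 3, 2: 20, 3: 50, 4: 100, 5: 200},  # Roberts' rescoring
]
rescorings += [random_rescoring(rng) for _ in range(500)]

mean_signs, median_signs = set(), set()
for mapping in rescorings:
    a = [mapping[r] for r in group1]
    b = [mapping[r] for r in group2]
    mean_signs.add(sign_of_difference(mean, a, b))
    median_signs.add(sign_of_difference(median, a, b))

# More than one outcome for the means signals a failure of invariance;
# a single outcome for the medians signals a meaningful comparison.
print("mean comparison outcomes:  ", sorted(mean_signs))
print("median comparison outcomes:", sorted(median_signs))
```

In this example both groups have the category 4 as their median, so the median comparison returns the same answer (equality) under every order-preserving rescoring, while the mean comparison does not.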
Though Roberts does not go into it, we see in this example why ordinal comparisons are commonly justified within the context of normal distributions and similar standard deviations. The two groups of scores in Roberts' example have markedly different standard deviations (Group 1 SD = 0; Group 2 SD = 2.08). Were the scores in Group 1 more dispersed, or those in Group 2 less so, the original scoring's order would more likely be preserved across permissible transformations.
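A short Python sketch reproduces the two standard deviations and illustrates the point about dispersion. The less dispersed second group, 4, 3, 3, is a hypothetical invented here for illustration and is not part of Roberts' example.

```python
from statistics import mean, stdev

group1 = [4, 4, 4]
group2 = [5, 4, 1]

# Sample standard deviations (n - 1 in the denominator).
sd1 = stdev(group1)
sd2 = stdev(group2)
print("SDs:", sd1, round(sd2, 2))

# Hypothetical, less dispersed second group (an assumption for illustration).
group2_tight = [4, 3, 3]
rescoring = {1: 3, 2: 20, 3: 50, 4: 100, 5: 200}

raw_order = mean(group1) > mean(group2_tight)
rescored_order = (mean([rescoring[r] for r in group1])
                  > mean([rescoring[r] for r in group2_tight]))

# Group 1 stays ahead under both scorings.
print("group 1 ahead (raw, rescored):", raw_order, rescored_order)
```

For this hypothetical pair the order of the means cannot reverse under any order-preserving rescoring, because no rating in the second group exceeds any rating in the first.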
Even though similar and normally distributed variation across groups can aid in preventing meaningless assertions, ones that "are just artifacts of the particular version of the scale of measurement that happens to be in use," a number of other problems dog ordinal scores (Wright & Linacre, 1989). As was recognized by Wilson (1971):
"The ordinal level of measurement prohibits all but the weakest inferences concerning the fit between data and a theoretical model formulated in terms of interval variables.... The task of developing valid, reliable interval measurement is not a technical detail that can be postponed indefinitely while the main efforts in sociological research are devoted to substantive theory construction; rather it is the central theoretical and methodological problem in scientifically-oriented sociology."
It is in this context that one sees the real truth and value of an opinion widely held among natural scientists and often attributed (Wise, 1995, p. 11) to Ernest Rutherford, winner of the 1908 Nobel Prize in Chemistry, namely, that if your experiment requires statistics, then you should have designed a better experiment. This opinion is expressed by Feinstein (1995), the long-time editor of the Journal of Clinical Epidemiology, in his critical examination of meta-analytic methodology. The implication is that when measurement is realized, it provides all the relevant information needed to make informed judgments about more and less.
Roberts provides another 30 pages of analyses concerning the kinds of conclusions that may be logically drawn from different scales of measurement in different contexts. He does not take up the problem of how interval/ratio scales might be calibrated on the basis of ordinal observations.
Ordinal to Ratio
To take up this question ourselves requires first of all recognizing that the rating scale is simply a generic way of labeling observations that we suppose involve some increasing amount of something. At the start of a new investigation into a new construct, we do not know how much increase is represented by any transition across categories, or even whether any increase at all is represented by these transitions. It may, after all, turn out that the construct cannot be quantified, or that the items and/or people brought together to explore the construct's quantitative status do not work well together, and so falsify the quantitative hypothesis.
Accordingly, how the categories are labeled is irrelevant. The labels are there only so that we can unambiguously distinguish them from one another and place them in ascending qualitative order according to some construct theory. The object of our interest is how many observations are labeled by each category. When that is ascertained, then we can estimate the log-odds that any respondent will reply in any one of the categories relative to any other category, for any item or group of items. The numbering of the rating scale categories is merely a convenience to facilitate thinking and to simplify the log-odds estimation procedure.
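As a rough illustration of why the labels do not matter, the following Python sketch computes the log-odds of a response in each category relative to the category just below it, using nothing but category counts. The counts are hypothetical, and this simple log-ratio of frequencies is an assumption made for illustration; it is not the estimation procedure implemented in Rasch software.

```python
import math

def adjacent_log_odds(counts):
    """Log-odds of each category relative to the next lower one.

    `counts` maps ordered category labels to observation counts.
    """
    labels = sorted(counts)
    return [math.log(counts[hi] / counts[lo])
            for lo, hi in zip(labels, labels[1:])]

# Hypothetical counts of observations in five ordered categories.
counts_1_to_5 = {1: 10, 2: 25, 3: 40, 4: 20, 5: 5}

# The same observations with the categories relabeled 3, 20, 50, 100, 200.
counts_relabeled = {3: 10, 20: 25, 50: 40, 100: 20, 200: 5}

base = adjacent_log_odds(counts_1_to_5)
relabeled = adjacent_log_odds(counts_relabeled)

# The log-odds depend only on the counts and the category order.
print([round(v, 3) for v in base])
print(base == relabeled)
```

Any relabeling that keeps the categories in the same order leaves these log-odds untouched, since only the counts enter the calculation.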
As is demonstrated in numerous developments in Rasch measurement theory and applications (Andrich, 1978a, 1978b; Linacre, 1999, 2002; Wright & Masters, 1982), this analysis reveals whether the rating categories are in fact ordered as hypothesized, and, if so, what their actual spacing is. Each numeric unit increase in the measures homomorphically maps the observed relation onto the numeric relation. The log-odds unit provides a ratio scale in the sense that any meaningful difference between two ratings, two items, two respondents, a respondent and an item, or a respondent and a category on an item could be identified as the smallest meaningful unit of measurement, and all other differences could be scaled in that unit. In other words, any magnitude difference can be divided up into any number of smaller ratio-unit differences, or aggregated into any number of larger ones, with no change in either the order or the proportionate spacing of any individual measures or group averages.
Admissible transformations for ratio scales are then those that preserve both the order of the relations and the magnitude of their proportionate spacing. Had the measures given in Roberts' example of the meaninglessness of averaged ordinal scores been ratio, all permissible transformations would have invariantly maintained the same proportionate difference between the individual measures and between the groups' average measures.
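The invariance claimed for ratio measures can be sketched in Python by treating a change of unit, multiplication by a positive constant, as the admissible transformation. The measures and the rescaling constants below are hypothetical values chosen for illustration.

```python
import math
from statistics import fmean

# Hypothetical ratio-scale measures for two groups (assumed values).
group_a = [1.0, 1.0, 1.0]
group_b = [2.0, 1.0, 0.3]

base_ratio = fmean(group_a) / fmean(group_b)
base_order = fmean(group_a) < fmean(group_b)

checks = []
for unit in (0.1, 2.54, 100.0):  # arbitrary positive changes of unit
    a = [unit * x for x in group_a]
    b = [unit * x for x in group_b]
    same_order = (fmean(a) < fmean(b)) == base_order
    same_ratio = math.isclose(fmean(a) / fmean(b), base_ratio)
    checks.append((unit, same_order, same_ratio))

# Each row: the unit, whether the order held, whether the ratio held.
for row in checks:
    print(row)
```

Unlike the ordinal rescoring in Roberts' example, no change of unit can reverse the order of the group averages or alter their proportionate difference.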
The Structure of Scientific Laws
Roberts closes his article with speculations based on Luce's (1959, 1990) classic articles on the ratio form of scientific laws in general. When both independent and dependent variables are ratio scales, scientific laws are power laws. In Ohm's Law, for instance, voltage is proportional to current when resistance is fixed.
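A brief Python sketch shows the property that makes power laws the natural form for relations between ratio scales: changing the unit of the independent variable multiplies the dependent variable by one constant factor, whatever the starting value. Ohm's Law serves as the case with exponent 1. The numerical values and function names are assumptions chosen for illustration.

```python
import math

def power_law(x, coefficient, exponent):
    """Evaluate y = coefficient * x ** exponent."""
    return coefficient * x ** exponent

# Ohm's Law as a power law with exponent 1: voltage = resistance * current.
resistance_ohms = 10.0               # assumed fixed resistance
currents_amps = [0.5, 1.0, 2.0]      # assumed currents
voltages = [power_law(i, resistance_ohms, 1) for i in currents_amps]
print(voltages)

def rescaling_factors(xs, unit_change, coefficient, exponent):
    """Ratio y(unit_change * x) / y(x) for each x."""
    return [power_law(unit_change * x, coefficient, exponent)
            / power_law(x, coefficient, exponent) for x in xs]

# For a power law the ratio is unit_change ** exponent at every x.
factors = rescaling_factors(currents_amps, 1000.0, 3.0, 2)
print(factors)
```

Here a thousandfold change of unit with exponent 2 rescales every value of the dependent variable by the same factor, so the form of the law is unchanged.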
These comments echo similar observations made by Rasch (1960, pp. 110-5) concerning the identical form shared by his model for reading measurement and Maxwell's model for the relations of mass, force, and acceleration. Just as force is proportional to acceleration when mass is fixed, so, too, is reading ability proportional to reading comprehension when the reading difficulty of the text is fixed.
Roberts (p. 664) points out that researchers have been able to establish psychological laws that conform with Luce's method "only in rather limited circumstances." This conclusion would seem to clash with the widespread applicability to an enormous variety of data types enjoyed by Rasch's models. Rasch software routinely 1) scales both the independent and dependent variables in ratio form, and 2) assesses and isolates failures of invariance via fit analysis, overcoming both of the major barriers to identifying and testing scientific power laws.
Perhaps because construct theory continues to be underdeveloped, the value of the laws established by means of Rasch scaling remains under-appreciated. The invariant stability of the qualitative relations quantified in Rasch measurement constitutes a fundamental form of capital. But much remains to be done before the human and economic value of that capital is leveraged in practical applications.
William P. Fisher, Jr.
Andrich, D. A. (1978a). A binomial latent trait model for the study of Likert-style attitude questionnaires. British Journal of Mathematical and Statistical Psychology, 31, 84-98.
Andrich, D. A. (1978b). A rating formulation for ordered response categories. Psychometrika, 43, 357-374.
Feinstein, A. R. (1995). Meta-analysis: Statistical alchemy for the 21st century. Journal of Clinical Epidemiology, 48(1), 71-79.
Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome Measurement, 3(2), 103-122.
Linacre, J. M. (2002). Understanding Rasch measurement: Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1), 85-106.
Luce, R. D. (1959). On the possible psychophysical laws. Psychological Review, 66, 81-95.
Luce, R. D. (1990). "On the possible psychophysical laws" revisited: Remarks on cross-modal matching. Psychological Review, 97, 66-77.
Mundy, B. (1986). On the general theory of meaningful representation. Synthese, 67, 391-437.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980).
Roberts, F. S. (1994). Limitations on conclusions using scales of measurement. In A. Barnett, S. Pollock & M. Rothkopf (Eds.), Operations research and the public sector (pp. 621-671). Amsterdam: Elsevier.
Suppes, P. (1959). Measurement, empirical meaningfulness, and three-valued logic. In C. W. Churchman & P. Ratoosh (Eds.), Measurement: Definitions and theories (pp. 129-143). New York, New York: Wiley.
Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. R. Bush & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 1-76). New York, New York: Wiley.
Wilson, T. P. (1971). Critique of ordinal variables. Social Forces, 49, 432-444.
Wise, M. N. (Ed.). (1995). The values of precision. Princeton, New Jersey: Princeton University Press.
Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal; measurements, however, must be interval. Archives of Physical Medicine and Rehabilitation, 70(12), 857-867. www.rasch.org/memo44.htm
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.
Ordinal vs. Ratio Revisited Again, Fisher W. P. Jr. … Rasch Measurement Transactions, 2004, 18:2 p.980-982
The URL of this page is www.rasch.org/rmt/rmt182e.htm