Does the Rasch Model Convert an Ordinal Scale into an Interval Scale?

Promoting the Rasch Model

Empirical research papers advocating the use of the Rasch model (Rasch, 1960) typically emphasize the unique properties of Rasch measurement, for example, specific objectivity (Rasch, 1977), invariance, sample independence or raw score sufficiency, which are, in fact, closely related. While researchers using factor analytic approaches often downplay the role of specific objectivity or invariance, the scale level of the raw score remains a serious problem in factor analysis. Factor analysis is typically applied to the matrix of Pearson correlations, which require interval-scaled item scores. Therefore, the fact that the Rasch model does not depend on interval-scaled item scores is often put forward as an important property of the model.

Often it is explicitly stated that the Rasch model transforms ordinal raw scores into interval-scaled measures: Chien et al. (2008, p.418) say that "Rasch (1960) analysis transforms ordinal scores into the logit scale ...". Tennant and Conaghan (2007, p.1359) argue that Rasch analysis provides "a transformation of an ordinal score into a linear, interval-level variable, given fit of data to Rasch model expectations." Sometimes the argument is presented implicitly. Ewing et al. (2005, p.26) state that "Rasch measurement assumes responses on an ordinal level"; Salzberger and Sinkovics (2006, p.412) point out that "[t]he manifest responses are assumed to be ordinal and need not be interval-scaled." Similarly, Pallant et al. (2006) as well as Pallant and Tennant (2007) speak of an ordinal raw score.

The scale level of the raw score

This claim deserves closer attention since, as a matter of principle, in statistics a lower scale level cannot be transformed to a higher level. Does the Rasch model travel faster than the speed of light? And if so, should we then not be allowed to transform the raw score in any way we want as long as the order is preserved? Actually, the aforementioned claims are rather pragmatic and aimed at a non-Rasch audience. The statements simply express the fact that the item category scores merely have to be ordered with reference to the property to be measured. When applying the Rasch model, we actually do not have to be concerned with the scale level of the raw score. Based on a solid theoretical definition of a latent variable, fit of the data to the model assures us of having successfully measured a quantitative variable. However, Stevens' (1946, 1951) scheme of scale levels has been so influential that the question unavoidably arises as to which scale level we should actually ascribe to the raw score.

The raw score in the dichotomous Rasch Model

In the dichotomous case, the raw score is the observed number of items that are answered correctly (or agreed with) by the respondent. In other words, we count the number of correct items (Linacre and Wright, 1993). Counting, however, is distinctly different from measurement. The fact that it is often considered to be some sort of measurement is due to the misleading definition by Stevens (1946, 1951), who argued that measurement is achieved by assigning numerals to objects. Then the raw score would indeed succeed measurement, as the latter would be effected by coding the responses. Therefore, in the factor analytic world, manifest items are often referred to as measures of a latent variable.

In Rasch measurement, we define measurement as the successful discovery of the structure of quantity in the data (Michell, 1990, 1997), tantamount to data fitting the model. The raw score is actually the input to the analysis; it precedes rather than succeeds measurement. The raw score is the basis of an attempt to infer measures of a linear, interval-scaled latent variable. However, it is not some sort of "crude measurement" or "an approximation" per se. The scale level of the raw score is not an unconditional property of the score. It depends on what the scale level refers to. What we read off a measuring tape represents a ratio-scaled value of people's height, but as a raw score to be used in the measurement of intelligence, it would not even be ordinal-scaled. It is therefore improper to argue that the raw score a priori has a particular scale level pertaining to the quantitative property to be measured.

Given that the raw score is a count, its scale level is the highest possible, that is absolute. The point of origin is given by the extreme score of all items incorrect, while the unit is "one item". Obviously, it is permissible and meaningful to conclude that, for example, Mary has answered correctly twice as many items as John, if Mary mastered, say, six items, while John only got three items right. Statements of this sort are permitted regardless of whether the data fit the model or not, as we do not infer anything from the comparison of Mary and John beyond the number of correctly answered items. The absolute scale level of the raw score also implies that, and explains why, scale transformations of any sort are not allowed. It also justifies the fact that the raw score is calculated as the sum of individual item scores. If the individual item scores were ordinal, they could not be added up, since ordinal scale properties do not allow for addition. Hence, the Rasch model does not "travel faster than light". Specifically, it does not transform an ordinal raw score into an interval-scaled measure. However, the Rasch model does not downgrade a higher scale level (absolute) to a lower one (interval), either. The fit of the data to the model tells us that an interval-scaled measure of a latent variable can be inferred from an a priori absolutely scaled observed raw score. If and only if data fit the model, we may ask what the scale level of the raw score a posteriori is with reference to the latent variable measured. The interval-scaled measures are derived from the raw score by a unique non-linear, s-shaped transformation. If the raw score were ordinal, such a transformation would not be possible. Consequently, the scale level of the raw score is higher than ordinal but lower than interval-scaled, as the unit is not preserved across the continuum. 
Thus, the Rasch model tests whether an a priori absolutely scaled raw score represents an a posteriori (that is, after having demonstrated that a quantitative latent variable can be inferred from the data) non-linear raw score, which can be transformed into a linear interval-scaled measure of the latent variable (see Table 1). Prior to the assessment of fit to the Rasch model, or in case of misfit, the scale level of the raw score with reference to the latent variable is undefined.
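The non-linear, s-shaped relation between raw score and measure can be illustrated with a minimal sketch. It assumes that the data fit the model and that the item difficulties are already known; the six equally difficult items and the function names are hypothetical and chosen only for illustration. For a given raw score, the person measure is the location at which the expected raw score equals the observed one.

```python
import math

def expected_raw_score(theta, difficulties):
    """Expected raw score at person location theta (dichotomous Rasch model)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def raw_score_to_measure(raw, difficulties, tol=1e-10):
    """Find theta whose expected raw score equals the observed raw score.

    Uses bisection; extreme scores (0 or all items) have no finite measure.
    """
    n_items = len(difficulties)
    if not 0 < raw < n_items:
        raise ValueError("extreme raw scores have no finite measure")
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical test: six items of equal difficulty (0 logits).
difficulties = [0.0] * 6
measures = {r: raw_score_to_measure(r, difficulties) for r in range(1, 6)}

for r in range(1, 6):
    print(f"raw score {r}: measure {measures[r]:+.3f} logits")

# One more item correct is a larger step in logits near the extremes
# than in the middle: the raw-score unit is not preserved on the latent scale.
step_low = measures[2] - measures[1]
step_mid = measures[3] - measures[2]
print(f"step 1->2: {step_low:.3f}, step 2->3: {step_mid:.3f}")
```

In this sketch every raw score maps to exactly one measure, so the order is preserved, while equal raw-score differences correspond to unequal differences in logits.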

The raw score in generalized IRT

The term "generalized IRT" shall refer to all IRT models which are not Rasch models. In the Rasch model, the raw score does not depend on model estimates. By contrast, in the two-parameter logistic model (Birnbaum, 1968), the raw score is weighted by model parameters, which are a result of the model calibration. Thus, in the Rasch model, the input to and the output of the measurement analysis are strictly separated (which is just another way to express that the Rasch model features invariance). In generalized IRT the input and the output are entangled, unless the item discrimination parameters are known constants, as in the one parameter logistic model (OPLM, Verhelst and Glas, 1995). Since the raw score in generalized IRT is not completely defined by the simple observation of items answered correctly, it is not a simple count. The distinction between an a priori raw score which is independent of the model estimates and an a posteriori raw score which has a scale level with reference to the latent variable is not possible, either. Since the fit of data to generalized IRT models cannot support the hypothesis of a quantitative variable, the scale level of the latent variable and of the weighted raw score remains questionable.
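The contrast between a simple count and a weighted score can be sketched as follows. The discrimination values and response patterns are hypothetical, and the function names are introduced here only for illustration; in an actual two-parameter analysis the weights would be estimates obtained from the calibration.

```python
def raw_score(responses):
    """Rasch model: the score is a plain count of correct responses."""
    return sum(responses)

def weighted_score(responses, discriminations):
    """2PL: each response is weighted by the item's discrimination parameter."""
    return sum(a * x for a, x in zip(discriminations, responses))

# Hypothetical discrimination estimates for three items.
discriminations = [0.5, 1.0, 2.0]

# Two response patterns with the same number of correct answers.
pattern_a = [1, 1, 0]
pattern_b = [0, 1, 1]

print(raw_score(pattern_a), raw_score(pattern_b))    # identical counts
print(weighted_score(pattern_a, discriminations),
      weighted_score(pattern_b, discriminations))    # different weighted scores
```

The count is fixed by the observations alone, whereas the weighted score changes whenever the calibration yields different discrimination estimates.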

Raw score in the polytomous Rasch Model

Multicategorical responses have to be scored with successive integers starting at zero (Andersen, 1977; Andrich, 1978). This is compatible with the interpretation of the raw score as a count of all thresholds a respondent has passed. Consequently, the raw score is scaled absolutely in the polytomous case as well, provided the scoring of the categories adequately reflects the order of the thresholds (see Andrich, 1995a, 1995b). Strictly speaking, this qualification applies to the dichotomous model, too. If the response categories are wrongly scored, that is, if a score of one implies less of the property to be measured rather than more, the item will misfit. Rescoring the item will then resolve the problem, unless other reasons for misfit persist. In the polytomous model, the empirical thresholds may be reversed, signifying that the scoring is inappropriate. Then categories should be collapsed. However, rescoring the response categories alters the raw score. It is argued that both the original raw score and the revised raw score based on the amended scoring scheme are absolutely scaled, since neither score implies any meaning beyond the sheer count. Once the data have been shown to fit the polytomous Rasch model, we can ascribe meaning to the raw score with reference to the latent variable.
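The reading of the polytomous raw score as a count of passed thresholds, and the effect of collapsing categories, can be sketched as follows. The responses and the rescoring scheme are hypothetical and serve only as an illustration.

```python
def polytomous_raw_score(responses):
    """Sum of category scores; with scoring 0, 1, 2, ... each score equals
    the number of thresholds passed on that item."""
    return sum(responses)

def rescore(responses, mapping):
    """Apply an amended scoring scheme, e.g. after collapsing categories."""
    return [mapping[x] for x in responses]

# Hypothetical responses to four items with categories scored 0..3.
responses = [3, 1, 2, 0]

# Hypothetical rescoring: categories 1 and 2 are collapsed into one.
mapping = {0: 0, 1: 1, 2: 1, 3: 2}

original = polytomous_raw_score(responses)
revised = polytomous_raw_score(rescore(responses, mapping))

print(original)  # thresholds passed under the original scoring
print(revised)   # thresholds passed under the amended scoring
```

Both totals are counts in their own right; they differ only because the amended scheme defines fewer thresholds per item.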

| Fit of the data to the Rasch model | Scale level a priori, with reference to the observed responses | Inference of measures of a quantitative latent variable | Scale level a posteriori, with reference to the quantitative variable |
|---|---|---|---|
| not tested yet | absolute | not applicable | not applicable |
| misfit | absolute | impossible | not applicable |
| fit | absolute | possible | > ordinal, non-linear |

Table 1: Scale level of the raw score

Conclusion

In summary, the fit of the data to the Rasch model implies that the raw score, which is scaled absolutely, conveys meaning regarding the quantitative property to be measured. With reference to the latent variable, the raw score is non-linear but clearly more than ordinal. In the case of misfit, though, the raw score has no such meaning at all. It is therefore advisable to refrain from claims that the Rasch model transforms or converts ordinal scales into interval scales. Rather, it should be pointed out that the Rasch model is capable of constructing linear measures from counts of qualitatively-ordered observations (Linacre and Wright, 1993), provided the structure of quantity is present in the data. The difference between ordered observations and an ordinal scale may seem subtle, but counts as such are certainly not merely ordinal, nor is the raw score merely ordinal with reference to the property to be measured once fit of the data to the model has been demonstrated. It goes without saying that those who apply the Rasch model are aware of this, at least implicitly. Alluding to ordinal scales of measurement may accommodate the traditional way of thinking, but it is misleading in the end.

The essential difference between the Rasch model and models rooted in classical test theory lies in the definition of measurement. In the Rasch model, the assignment of numerals to response categories merely enables us to properly count the number of correct items, or passed thresholds, but it is not equivalent to measurement. Measurement is achieved by successfully demonstrating that the latent variable complies with the structure of quantity. In factor analysis, measurement is essentially still based on assignment in Stevens' tradition. This is why the scale levels of the codes assigned to response categories receive so much attention there, while in fact testing the correspondence of the data to the structure of quantity is the core problem of measurement.

Thomas Salzberger

References

Andersen, E.B. (1977). Sufficient Statistics and Latent Trait Models. Psychometrika, 42, 69-81.

Andrich, D. (1978). Application of a Psychometric Rating Model to Ordered Categories which are Scored with Successive Integers. Applied Psychological Measurement, 2 (4), 581-594.

Andrich, D. (1995a). Models for Measurement, Precision and the Non-Dichotomization of Graded Responses. Psychometrika, 60 (1), 7-26.

Andrich, D. (1995b). Further Remarks on Non-Dichotomization of Graded Responses. Psychometrika, 60 (1), 37-46.

Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee's Ability. In F.M. Lord and M.R. Novick (eds), Statistical Theories of Mental Test Scores, Reading, MA: Addison-Wesley, Chapters 17-20.

Chien, T.-W., Hsu, S.-Y., Chein, T., Guo, H.-R., & Su, S.B. (2008). Using Rasch Analysis to Validate the Revised PSQI to Assess Sleep Disorders in Taiwan's Hi-tech Workers. Community Mental Health Journal, 44:417–425.

Ewing, M., Salzberger, T., & Sinkovics, R. (2005). An Alternate Approach to Assessing Cross-Cultural Measurement Equivalence in Advertising Research. Journal of Advertising, 34 (1), 17-36.

Linacre, M., & Wright, B. (1993). Constructing linear measures from counts of qualitative observations. Paper presented at the Fourth International Conference on Bibliometrics, Informetrics and Scientometrics, Berlin, Germany.

Michell, J. (1990). An Introduction to the Logic of Psychological Measurement. Hillsdale: Erlbaum.

Michell, J. (1997). Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355-383.

Pallant, J.F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46 (1), 1-18.

Pallant, J.F., Miller, R.L., & Tennant, A. (2006). Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis. BMC Psychiatry, 6:28.

Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.

Rasch, G. (1977). On Specific Objectivity: an Attempt at Formalizing the Request for Generality and Validity of Scientific Statements. Danish Yearbook of Philosophy, 14, 58-93.

Salzberger, T., & Sinkovics, R. (2006). Reconsidering the Problem of Data Equivalence in International Marketing Research – Contrasting Approaches Based on CFA and the Rasch Model for Measurement. International Marketing Review, 23 (4), 390-417.

Stevens, S.S. (1946). On the Theory of Scales of Measurement. Science, 103, 667-680.

Stevens, S.S. (1951). Mathematics, Measurement, and Psychophysics. In S.S. Stevens (ed), Handbook of Experimental Psychology, New York, NY: Wiley, 1-49.

Tennant, A., & Conaghan, P.G. (2007). The Rasch Measurement Model in Rheumatology: What Is It and Why Use It? When Should It Be Applied, and What Should One Look for in a Rasch Paper? Arthritis & Rheumatism (Arthritis Care & Research), 57 (8), 1358–1362.

Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model. In G.H. Fischer and I.W. Molenaar (eds), Rasch Models: Foundations, Recent Developments, and Applications, New York: Springer, pp. 215-237.

Does the Rasch Model Convert an Ordinal Scale into an Interval Scale?, T. Salzberger ... Rasch Measurement Transactions, 2010, 24:2 p. 1273-5
