Does the Rasch Model Convert an Ordinal Scale into an Interval Scale?

Empirical research papers advocating the use of the Rasch model (Rasch, 1960) typically emphasize the unique properties of Rasch measurement, for example, specific objectivity (Rasch, 1977), invariance, sample independence, or raw score sufficiency, which are, in fact, closely related. Researchers using factor analytic approaches often downplay the role of specific objectivity or invariance; even so, the scale level of the raw score remains a serious problem in factor analysis. Factor analysis is typically applied to the matrix of Pearson correlations, which presuppose interval-scaled item scores. Therefore, the fact that the Rasch model does not depend on interval-scaled item scores is often put forward as an important property of the model.
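Why the scale level of item scores matters for Pearson correlations can be illustrated with a minimal Python sketch. The data and the recoding below are hypothetical, chosen only for illustration: the sketch computes the correlation between two four-category items once with the usual codes 0-3 and once after an order-preserving recoding that gives the top category a larger number. An ordinal interpretation of the categories cannot prefer one coding over the other, yet the two correlations differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical responses of eight people to two four-category items (scored 0-3).
item_a = [0, 1, 1, 2, 2, 3, 3, 3]
item_b = [0, 0, 1, 1, 2, 2, 3, 1]

# An order-preserving recoding: only the numeral attached to category 3 changes.
recode = {0: 0, 1: 1, 2: 2, 3: 10}
item_a_recoded = [recode[v] for v in item_a]
item_b_recoded = [recode[v] for v in item_b]

r_original = pearson(item_a, item_b)
r_recoded = pearson(item_a_recoded, item_b_recoded)
print(round(r_original, 3), round(r_recoded, 3))
```

With these made-up data the coefficient drops from about 0.77 to about 0.57, so a factor solution built on such correlations would depend on an arbitrary choice of spacing between ordered categories.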

Often it is explicitly stated that the Rasch model transforms ordinal raw scores into interval-scaled measures. Chien et al. (2008, p. 418) say that "Rasch (1960) analysis transforms ordinal scores into the logit scale ...", and Tennant and Conaghan (2007, p. 1359) argue that Rasch analysis provides "a transformation of an ordinal score into a linear, interval-level variable, given fit of data to Rasch model expectations." Sometimes the argument is presented implicitly. Ewing et al. (2005, p. 26) state that "Rasch measurement assumes responses on an ordinal level", and Salzberger and Sinkovics (2006, p. 412) point out that "[t]he manifest responses are assumed to be ordinal and need not be interval-scaled." Similarly, Pallant et al. (2006) as well as Pallant and Tennant (2007) speak of an ordinal raw score.

This claim deserves closer attention since, as a matter of principle, a lower scale level cannot be transformed into a higher one in statistics. Does the Rasch model travel faster than the speed of light? And if so, should we then not be allowed to transform the raw score in any way we want as long as the order is preserved? Actually, the aforementioned claims are rather pragmatic and aimed at a non-Rasch audience. The statements simply express the fact that the item category scores merely have to be ordered with reference to the property to be measured. When applying the Rasch model, we actually do not have to be concerned with the scale level of the raw score. Given a solid theoretical definition of the latent variable, fit of the data to the model assures us that we have successfully measured a quantitative variable. However, Stevens' (1946, 1951) scheme of scale levels has been so influential that the question unavoidably arises as to which scale level we should actually ascribe to the raw score.

In the dichotomous case, the raw score is the observed number of items that the respondent answers correctly (or agrees with). In other words, we count the correctly answered items (Linacre and Wright, 1993). Counting, however, is distinctly different from measurement. That counting is often considered to be some sort of measurement is due to the misleading definition by Stevens (1946, 1951), who argued that measurement is achieved by assigning numerals to objects. Under that definition, the raw score would indeed succeed (that is, follow) measurement, as measurement would already be effected by coding the responses. This is why, in the factor analytic world, manifest items are often referred to as measures of a latent variable.

In Rasch measurement, we define measurement as the successful discovery of the structure of quantity in the data (Michell, 1990, 1997), which is tantamount to the data fitting the model. The raw score is actually the input to the analysis; it precedes rather than succeeds measurement. The raw score is the basis of an attempt to infer measures of a linear, interval-scaled latent variable. However, it is not some sort of "crude measurement" or "an approximation" per se. The scale level of the raw score is not an unconditional property of the score; it depends on what the scale level refers to. What we read off a measuring tape represents a ratio-scaled value of a person's height, but as a raw score to be used in the measurement of intelligence, it would not even be ordinal-scaled. It is therefore improper to argue that the raw score a priori has a particular scale level pertaining to the quantitative property to be measured.

Given that the raw score is a count, its scale level is the highest possible, that is, absolute. The point of origin is given by the extreme score of all items incorrect, while the unit is "one item". Obviously, it is permissible and meaningful to conclude that, for example, Mary has answered twice as many items correctly as John if Mary mastered, say, six items, while John got only three items right. Statements of this sort are permitted regardless of whether the data fit the model or not, as we do not infer anything from the comparison of Mary and John beyond the number of correctly answered items. The absolute scale level of the raw score also implies that, and explains why, scale transformations of any sort are not allowed. It also justifies calculating the raw score as the sum of the individual item scores. If the individual item scores were ordinal, they could not be added up, since ordinal scale properties do not allow for addition. Hence, the Rasch model does not "travel faster than light"; specifically, it does not transform an ordinal raw score into an interval-scaled measure. However, the Rasch model does not downgrade a higher scale level (absolute) to a lower one (interval), either. The fit of the data to the model tells us that an interval-scaled measure of a latent variable can be inferred from an a priori absolutely scaled observed raw score. If and only if the data fit the model may we ask what the scale level of the raw score a posteriori is with reference to the latent variable measured. The interval-scaled measures are derived from the raw score by a unique non-linear, S-shaped transformation. If the raw score were merely ordinal, such a unique transformation would not be possible. Consequently, with reference to the latent variable, the scale level of the raw score is higher than ordinal but lower than interval, as the unit is not preserved across the continuum.
Thus, the Rasch model tests whether an a priori absolutely scaled raw score represents an a posteriori (that is, after having demonstrated that a quantitative latent variable can be inferred from the data) non-linear raw score, which can be transformed into a linear, interval-scaled measure of the latent variable (see Table 1). Prior to the assessment of fit to the Rasch model, or in case of misfit, the scale level of the raw score with reference to the latent variable is undefined.
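The S-shaped transformation from raw score to measure can be sketched in Python for the dichotomous case. The sketch assumes that the data fit the model and that the item difficulties are already known; the ten difficulties below are hypothetical. The person measure for each non-extreme raw score is taken as the maximum-likelihood solution of "expected score = observed raw score", found by Newton-Raphson iteration. This is one common estimation choice, not the only one (weighted likelihood estimates, for example, give somewhat different values), and the function names are illustrative.

```python
from math import exp


def rasch_probability(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(delta - theta))


def measure_for_raw_score(raw, difficulties, tol=1e-10, max_iter=200):
    """Maximum-likelihood person measure (logits) for a non-extreme raw score.

    Solves sum_i P_i(theta) == raw by Newton-Raphson, assuming known difficulties.
    """
    if not 0 < raw < len(difficulties):
        raise ValueError("extreme raw scores have no finite ML estimate")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, d) for d in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)  # derivative of expected score
        step = (raw - expected) / information
        step = max(-1.0, min(1.0, step))  # cap each step at 1 logit for robustness
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item difficulties (logits) of a ten-item test.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]

measures = {r: measure_for_raw_score(r, difficulties) for r in range(1, len(difficulties))}
for r in range(2, len(difficulties)):
    gap = measures[r] - measures[r - 1]
    print(f"raw score {r - 1} -> {r}: {measures[r - 1]:+.2f} -> {measures[r]:+.2f} logits (gap {gap:.2f})")
```

With these symmetric hypothetical difficulties, a raw score of 5 maps to 0 logits, and the logit distance between adjacent raw scores is larger near the extremes than in the middle: one additional item correct does not correspond to a constant amount of the latent variable. Extreme scores are excluded because they have no finite maximum-likelihood estimate.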

The term "generalized IRT" shall refer to all IRT models that are not Rasch models. In the Rasch model, the raw score does not depend on model estimates. By contrast, in the two-parameter logistic model (Birnbaum, 1968), the raw score is weighted by model parameters, which are a result of the model calibration. Thus, in the Rasch model, the input to and the output of the measurement analysis are strictly separated (which is just another way of saying that the Rasch model features invariance). In generalized IRT, the input and the output are entangled, unless the item discrimination parameters are known constants, as in the one-parameter logistic model (OPLM; Verhelst and Glas, 1995). Since the raw score in generalized IRT is not completely defined by the simple observation of items answered correctly, it is not a simple count. Nor is it possible to distinguish between an a priori raw score, which is independent of the model estimates, and an a posteriori raw score, which has a scale level with reference to the latent variable. Since the fit of data to generalized IRT models cannot support the hypothesis of a quantitative variable, the scale level of the latent variable and of the weighted raw score remains questionable.
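The contrast between a simple count and a weighted raw score can be made concrete with a small Python sketch. In the two-parameter logistic model, with the discrimination parameters a_i treated as given, the statistic that carries the information about the person parameter is the weighted sum of item scores, sum_i a_i x_i; in practice the a_i are estimates from a calibration. The discrimination values and response patterns below are hypothetical.

```python
def rasch_raw_score(pattern):
    """Rasch raw score: a plain count of correct responses, no model estimates needed."""
    return sum(pattern)


def weighted_raw_score(pattern, discriminations):
    """2PL-type raw score: each correct response weighted by its discrimination estimate."""
    return sum(a * x for a, x in zip(discriminations, pattern))


# Two hypothetical response patterns on four items, each with two items correct.
pattern_1 = [1, 1, 0, 0]
pattern_2 = [0, 0, 1, 1]

# Two hypothetical sets of discrimination estimates, e.g. from two calibrations.
calibration_a = [0.6, 1.0, 1.4, 2.0]
calibration_b = [0.8, 1.0, 1.2, 1.5]

for pattern in (pattern_1, pattern_2):
    print(
        pattern,
        rasch_raw_score(pattern),
        round(weighted_raw_score(pattern, calibration_a), 2),
        round(weighted_raw_score(pattern, calibration_b), 2),
    )
```

Both hypothetical patterns yield the same count (2), yet their weighted scores differ (1.6 versus 3.4 under the first set of estimates), and the weighted score of one and the same pattern shifts when the estimates change (1.6 versus 1.8 for the first pattern). Only the count is fixed by the observations alone.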

Multicategorical responses have to be scored with successive integers starting at zero (Andersen, 1977; Andrich, 1978). This is compatible with the interpretation of the raw score as a count of all thresholds a respondent has passed. Consequently, the raw score is absolutely scaled in the polytomous case as well, provided the scoring of the categories adequately reflects the order of the thresholds (see Andrich, 1995a, 1995b). Strictly speaking, this qualification applies to the dichotomous model, too. If the response categories are wrongly scored, that is, if a score of one implies less rather than more of the property to be measured, the item will misfit. Rescoring the item will then resolve the problem, unless other reasons for misfit persist. In the polytomous model, the empirical thresholds may be reversed, signifying that the scoring is inappropriate. In that case, the categories concerned should be collapsed. However, rescoring the response categories alters the raw score. It is argued here that both the original raw score and the revised raw score based on the amended scoring scheme are absolutely scaled, since neither score implies any meaning beyond the sheer count. Once the data have been shown to fit the polytomous Rasch model, we can ascribe meaning to the raw score with reference to the latent variable.
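The reading of polytomous scores as counts of passed thresholds, and the effect of collapsing categories, can be sketched in Python. The sketch assumes items with five ordered categories scored 0-4 (four thresholds); the responses and the collapsing scheme, which merges categories 2 and 3 as one might after finding their thresholds reversed, are hypothetical.

```python
def threshold_indicators(category, n_thresholds):
    """Category k (scored 0..m) means the first k of the m ordered thresholds were passed."""
    if not 0 <= category <= n_thresholds:
        raise ValueError("category must be scored 0..m")
    return [1] * category + [0] * (n_thresholds - category)


def raw_score(responses, n_thresholds):
    """Raw score computed as the total number of thresholds passed across items."""
    return sum(sum(threshold_indicators(c, n_thresholds)) for c in responses)


# Hypothetical responses of one person to three items with categories 0-4.
responses = [3, 1, 4]
print(raw_score(responses, 4), sum(responses))  # the count equals the sum of integer scores

# Hypothetical rescoring: categories 2 and 3 collapsed, keeping successive integers 0-3.
collapse = {0: 0, 1: 1, 2: 2, 3: 2, 4: 3}
rescored = [collapse[c] for c in responses]
print(rescored, raw_score(rescored, 3))
```

Under both schemes the raw score is simply a count (8 thresholds passed under the original scoring, 6 under the collapsed one); the rescoring changes what is counted, not the fact that it is a count.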

In summary, the fit of the data to the Rasch model implies that the raw score, which is absolutely scaled, conveys meaning regarding the quantitative property to be measured. With reference to the latent variable, the raw score is non-linear but clearly more than ordinal. In the case of misfit, though, the raw score has no such meaning at all. It is therefore advisable to refrain from claiming that the Rasch model transforms or converts ordinal scales into interval scales. Rather, it should be pointed out that the Rasch model is capable of constructing linear measures from counts of qualitatively ordered observations (Linacre and Wright, 1993), provided the structure of quantity is present in the data. The difference between ordered observations and an ordinal scale may seem subtle, but counts as such are certainly not merely ordinal, nor is the raw score merely ordinal with reference to the property to be measured once fit of the data to the model has been demonstrated. It goes without saying that those who apply the Rasch model are aware of this, at least implicitly. Alluding to ordinal scales of measurement may accommodate the traditional way of thinking, but it is ultimately misleading.

The essential difference between the Rasch model and models rooted in classical test theory lies in the definition of measurement. In the Rasch model, the assignment of numerals to response categories merely enables us to properly count the number of correct items, or passed thresholds; it is not equivalent to measurement. Measurement is achieved by successfully demonstrating that the latent variable complies with the structure of quantity. In factor analysis, measurement is essentially still based on assignment in Stevens' tradition. This is why the scale levels of the codes assigned to response categories are considered so important there, whereas in fact testing the correspondence of the data to the structure of quantity is the core problem of measurement.

Andersen, E.B. (1977). Sufficient Statistics and Latent Trait Models. Psychometrika, 42, 69-81.

Andrich, D. (1978). Application of a Psychometric Rating Model to Ordered Categories which are Scored with Successive Integers. Applied Psychological Measurement, 2 (4), 581-594.

Andrich, D. (1995a). Models for Measurement, Precision and the Non-Dichotomization of Graded Responses. Psychometrika, 60 (1), 7-26.

Andrich, D. (1995b). Further Remarks on Non-Dichotomization of Graded Responses. Psychometrika, 60 (1), 37-46.

Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee's Ability. In F.M. Lord and M.R. Novick (eds), Statistical Theories of Mental Test Scores, Reading, MA: Addison-Wesley, Chapters 17-20.

Chien, T.-W., Hsu, S.-Y., Chein, T., Guo, H.-R., & Su, S.B. (2008). Using Rasch Analysis to Validate the Revised PSQI to Assess Sleep Disorders in Taiwan's Hi-tech Workers. Community Mental Health Journal, 44, 417-425.

Ewing, M., Salzberger, T., & Sinkovics, R. (2005). An Alternate Approach to Assessing Cross-Cultural Measurement Equivalence in Advertising Research. Journal of Advertising, 34 (1), 17-36.

Linacre, J.M., & Wright, B.D. (1993). Constructing linear measures from counts of qualitative observations. Paper presented at the Fourth International Conference on Bibliometrics, Informetrics and Scientometrics, Berlin, Germany.

Michell, J. (1990). An Introduction to the Logic of Psychological Measurement. Hillsdale: Erlbaum.

Michell, J. (1997). Quantitative Science and the Definition of Measurement in Psychology. British Journal of Psychology, 88, 355-383.

Pallant, J.F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46 (1), 1-18.

Pallant, J.F., Miller, R.L., & Tennant, A. (2006). Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis. BMC Psychiatry, 6, 28.

Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.

Rasch, G. (1977). On Specific Objectivity: an Attempt at Formalizing the Request for Generality and Validity of Scientific Statements. Danish Yearbook of Philosophy, 14, 58-93.

Salzberger, T., & Sinkovics, R. (2006). Reconsidering the Problem of Data Equivalence in International Marketing Research - Contrasting Approaches Based on CFA and the Rasch Model for Measurement. International Marketing Review, 23 (4), 390-417.

Stevens, S.S. (1946). On the Theory of Scales of Measurement. Science, 103, 677-680.

Stevens, S.S. (1951). Mathematics, Measurement, and Psychophysics. In S.S. Stevens (ed), Handbook of Experimental Psychology, New York, NY: Wiley, 1-49.

Tennant, A., & Conaghan, P.G. (2007). The Rasch Measurement Model in Rheumatology: What Is It and Why Use It? When Should It Be Applied, and What Should One Look for in a Rasch Paper? Arthritis & Rheumatism (Arthritis Care & Research), 57 (8), 1358-1362.

Verhelst, N.D., & Glas, C.A.W. (1995). The one parameter logistic model. In G.H. Fischer and I.W. Molenaar (eds), Rasch Models: Foundations, Recent Developments, and Applications, New York: Springer, pp. 215-237.

Fit of the data to the Rasch model	Scale level a priori, with reference to the observed responses	Inference of measures of a quantitative latent variable	Scale level a posteriori, with reference to the quantitative variable
not tested yet	absolute	not applicable	not applicable
misfit	absolute	impossible	not applicable
fit	absolute	possible	> ordinal, non-linear
Table 1: Scale level of the raw score

Does the Rasch Model Convert an Ordinal Scale into an Interval Scale?, T. Salzberger ... Rasch Measurement Transactions, 2010, 24:2 p. 1273-5
