Vanishing Tricks and Intellectualist Condescension: Measurement, Metrology, and the Advancement of Science

What exactly does it mean for data to fit a Rasch model? Satisfaction of Rasch's separability theorem provides access to sufficient statistics, invariant metrics, separable parameters, etc., but isn't there a more tangible and practical sense of these technical accomplishments?

I think there is, and I think that many of us who value Rasch's models and use them routinely may not have grasped the full meaning of one of the primary concrete consequences of fit to a Rasch model. To begin to trace out this full meaning, let's start with a statement from Jane Loevinger's 1965 review of Rasch's book:

"Rasch must be credited with an outstanding contribution to one of the two central psychometric problems, the achievement of non-arbitrary measures" (Loevinger, 1965, p. 151).

"Non-arbitrary measures": measures that are not arbitrary, that are not capriciously based in fleeting preferences or whims, or left to individual judgment. And indeed, "non-arbitrary" is the right word to choose for describing the way individual instruments function across multiple samples, and the way multiple instruments can converge on a common construct.
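One way to make this non-arbitrariness concrete is a small numerical check of the sufficiency property behind it. The sketch below assumes the dichotomous Rasch model in logits; the item difficulties, the response pattern, and the function names are hypothetical and chosen only for illustration. It computes the probability of a response pattern given the raw score for three very different abilities. If the model holds, the three values should agree, because the ability parameter cancels out of the conditional probability.

```python
from itertools import product
from math import exp

def p_correct(ability, difficulty):
    """Dichotomous Rasch model: probability of success, parameters in logits."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))

def pattern_given_score(pattern, ability, difficulties):
    """P(response pattern | raw score) for a person of the given ability."""
    def likelihood(responses):
        prob = 1.0
        for x, d in zip(responses, difficulties):
            p = p_correct(ability, d)
            prob *= p if x else 1.0 - p
        return prob

    score = sum(pattern)
    # All response patterns that yield the same raw score.
    same_score = [r for r in product((0, 1), repeat=len(difficulties))
                  if sum(r) == score]
    return likelihood(pattern) / sum(likelihood(r) for r in same_score)

difficulties = [-1.0, 0.0, 1.5]  # hypothetical item calibrations (logits)
pattern = (1, 1, 0)              # success on the two easier items only

# The same conditional probability, evaluated for low, middle, and high ability.
conditional = [pattern_given_score(pattern, b, difficulties)
               for b in (-2.0, 0.0, 3.0)]
print(conditional)
```

The point of the exercise is that the item comparison does not depend on who happens to be in the sample, which is one practical reading of "non-arbitrary."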

But how far does this non-arbitrariness go? Consider Rasch's (1960, pp. 110-5) own sense of the results he obtained from reading test data. He observes that the multiplicative form of the model he employed has the same structure as that used by Maxwell in the study of mass, force, and acceleration, meaning that the model actually states a law concerning the relation of reading ability, text difficulty, and comprehension rate.

"Where this law can be applied it provides a principle of measurement on a ratio scale of both stimulus parameters and object parameters, the conceptual status of which is comparable to that of measuring mass and force. Thus, ... the reading accuracy of a child ... can be measured with the same kind of objectivity as we may tell its weight ...."
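The parallel Rasch draws can be written out in a few lines. The notation below is chosen here for illustration and may not match Rasch's own symbols; it assumes the multiplicative form in which an expected rate (for example, of misreadings) is the ratio of a text parameter to a person parameter.

```latex
% Notation assumed for illustration; Rasch's own symbols may differ.
% Mechanics: acceleration of object v under force j.
\[ A_{vj} = \frac{F_j}{M_v} \]
% Reading: expected rate for person v on text i, with text difficulty
% \delta_i and person ability \xi_v.
\[ \lambda_{vi} = \frac{\delta_i}{\xi_v} \]
% Comparing two persons v and w on the same text cancels the text parameter.
\[ \frac{\lambda_{vi}}{\lambda_{wi}} = \frac{\xi_w}{\xi_v} \]
```

Under these assumptions, the comparison of two readers is a ratio that does not depend on which text was read, just as the comparison of two masses does not depend on which force was applied.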

This bold statement has recently been further substantiated by Burdick, Stone, and Stenner (2006), who draw an analogy between the Rasch reading law and the combined gas law's prediction of how temperature and pressure relate to a constant volume. Burdick et al. close with the statement that "the implications of this kind of law-making for construct validity should be evident."

Yes, the implications for construct validity and for construct theory should indeed be evident. However, if they were evident, would it not be universally apparent that, insofar as a reading test measures reading ability and calibrates the reading difficulties of texts, it must follow the Rasch Reading Law? And does it not then also follow that if the test follows this law, then whatever reading test is used, and no matter what range of numbers is used as the metric, insofar as the test really measures reading ability, it will measure in Lexiles?

Given ongoing research into the measurement of reading and writing performance, these implications are apparently not self-evident. They seem to remain so unperceived and unapprehended that no one has even been disturbed by what some might find to be the grandiosity of the claim. Is no one provoked enough to challenge the hegemony of this Rasch Reading Law and show readers and tests for which it does not apply?

Perhaps no one is so provoked, and that may be because one implication of laws for the measurement of valid constructs seems quite lost, and not just on the Rasch measurement audience, but on researchers in general, as well as the general public. One of the first to bring out this lost implication was Thomas Kuhn (1977, p. 219), who observed that

"The road from scientific law to scientific measurement can rarely be traveled in the reverse direction. To discover quantitative regularity one must normally know what regularity one is seeking and one's instruments must be designed accordingly; even then nature may not yield consistent or generalizable results without a struggle."

A few pages before this passage, Kuhn points out that examples of productive measurement in the Scientific Revolution are found only in longstanding areas of research, such as optics, mechanics, and astronomy. Measurement in areas involving heat, electricity, magnetism, and chemistry did not come into their own as sciences until the 19th century, because of the extensive qualitative understandings that had to be developed before quantification could be achieved.

Building on decades of others' qualitative understandings applied to reading measurement, Rasch made a point of ensuring that his measurement research and model formulation would arrive at predetermined ends capable of supporting the kinds of mathematical conclusions and scientific generalizations he wanted to draw. In so doing, he arrived at a formulation that is nothing less than a law of laws, a model of models, and a very broad basis for generalization.

That is an incredible accomplishment. But, contrary to what seems assumed in common practice, Rasch models do not automatically articulate a construct theory for whatever kind of data happen to be analyzed using Rasch software. That is, data that fit a Rasch model certainly provide evidence supporting the existence of a lawlike structure relative to the construct measured. But the law itself goes unstated as long as no one explicitly says it out loud. And it furthermore goes untested as long as no one uses it to generate new items that exhibit the properties predicted by the theory.

So Rasch used an intuited or implicit sense of what makes a scientific law valuable in formulating a model, and then observed that his reading test data behaved in conformity with that law. He made bold assertions about being able to measure reading ability with the same kind of objectivity that has long been established in measuring weight.

But Rasch did not articulate a predictive theory of the reading construct. Neither did the authors of the Anchor Test Study (Jaeger, 1973) conducted 20 years later. Instead, what we had were decades of routinely repeated expressions of the reading construct, with the Anchor Test Study showing conclusively that the major reading tests were measuring the same thing, and that they could do so in the same universal, uniform metric.

The non-arbitrariness of the repeated emergence of the same construct over tests and samples culminated in the articulation of a reading construct specification equation, after the model devised by Stenner and Smith (1982), and Stenner, Smith, and Burdick (1983). The simple, elegant, and parsimonious description of the structure of texts and tests has a predictive structure that has been studied by dozens of psychometricians in multiple state departments of education, and book and test publishing companies, and that has been validated in reading tests taken by millions of students.
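The general idea of a construct specification equation can be sketched in a few lines: regress empirical item or text calibrations on features that the construct theory says should drive difficulty, then use the fitted equation to predict the difficulty of material that has never been administered. The example below is only a sketch under stated assumptions. It uses a single hypothetical text feature and invented numbers, not the published equation or any real calibrations, and `fit_line` is a helper defined here for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Invented values, for illustration only: one text feature per passage
# (say, log mean sentence length) and each passage's empirical Rasch
# calibration in logits.
feature = [1.8, 2.1, 2.4, 2.7, 3.0, 3.3]
calibration = [-1.9, -1.2, -0.4, 0.3, 0.9, 1.8]

intercept, slope = fit_line(feature, calibration)
predicted = [intercept + slope * f for f in feature]

# Share of calibration variance reproduced by the theory-based predictions.
mean_c = sum(calibration) / len(calibration)
ss_total = sum((c - mean_c) ** 2 for c in calibration)
ss_resid = sum((c - p) ** 2 for c, p in zip(calibration, predicted))
r_squared = 1.0 - ss_resid / ss_total

# Theory-based difficulty for a passage that has never been administered.
new_text_difficulty = intercept + slope * 2.5
print(round(slope, 2), round(r_squared, 3), round(new_text_difficulty, 2))
```

The closer the predicted calibrations come to the empirical ones, the more the theory, rather than any particular sample of readers, can be said to account for what the instrument measures.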

It is safe to say that no other construct measured in education or in the social sciences has either the empirical evidence or the theoretical stature enjoyed by the Rasch Reading Law (Burdick, Stone, and Stenner, 2006).

And so, researchers are avidly exploiting this boon for the advancement of science, aren't they? Given the assumption that researchers espouse and embody ostensibly mathematical research values, that is what one would expect. But how many research publications take advantage of, or seek to expose the flaws in, the Rasch Reading Law? How many grant applications are focused on determining how listening, reading, oral, and written comprehension relate to one another relative to the established lawlike relation between an individual's abilities and the difficulty of what is spoken or written?

Just about none. I'm no expert, and my search has been cursory, but it isn't happening. Reading research has been atheoretical for most of its history, reading theory traditionally has had little influence on reading tests, and measurement theory, usually in the form of classical test theory, has had too much influence on reading tests (Engelhard, 2001). Why should this be so?

Two factors come to bear. First, working from research into the history of science, Galison (1999) offers the possibility that experimentalists focused on data, technicians focused on instrumentation, and theoreticians focused on ideas each function within separate, distinct communities, with incommensurable beliefs and behaviors. In this scenario, no field of research is driven exclusively or even primarily by just empirical data (privileged by the positivists) or by theoretical expectations (privileged by the post-positivists).

Instead, each of these subcultures has its own criteria, standards, and methods. Galison suggests an open-ended model that allows partial autonomy to each area, with revolutionary transformations occurring with different periodizations, and with each in relative parity with the other two.

Galison's account of science in general seems to be in accord with Engelhard's observations of the situation with reading research, theory, and measurement.

Now, consider a second factor, namely that "in quoting quantitative empirical laws, scientists frequently neglect to specify the various scales entering in the equations" (Falmagne & Narens, 1983, p. 287). This unstated invariant proportionality of scientific laws underlies the value of standards, which, "to do their job...must operate as a set of shared assumptions, the unexamined background against which we strike agreements and make distinctions" (Alder, 2002, p. 2). In leaving measurement scales and standards unstated and unexamined, in the background, as shared assumptions, we find ourselves in a situation in which

"...the absence of metrological information in scientific papers is simply a part of the culture of science that effaces the work needed to make its universality self-evident. This culture is reinforced by the division of labor within the lab; metrological activity is largely invisible to the scientists who write papers simply because it is performed by their technicians, and at time different from when experiments are performed" (O'Connell, 1993, p. 159).

The technical work done by instrumentalists is not only done at times different from when experiments are performed, but is likely done in a different place by persons unknown to the experimentalist. In the history of science, the technical means by which experimental results were produced were sometimes literally cut out of the picture (Shapin, 1989) as were the roles of everyone but the propertied gentleman who sponsored and directed the research (Shapin, 1991).

"Metrology has not often been granted much historical significance. ... Intellectualist condescension distracts our attention from these everyday practices, from their technical staff, and from the work which makes results count outside laboratory walls" (Schaffer, 1992, pp. 23-4).

In the natural sciences, there are commercially available precision tools calibrated to universally uniform reference standards built up out of scale-free laws. The transparency of the substantive qualitative meanings shared in an elaborated metrological network renders the effects of metrology's lack of historical significance relatively harmless.

But what might the consequences of this intellectualist condescension be for the work that makes Rasch scaling results count outside laboratory walls? Given that metrological activity is largely invisible to scientists writing papers, what happens to fields in which this unexamined background activity is assumed, as it has always been assumed in all scientific fields, but is now being taken for granted in fields in which it does not exist, in which it has never existed?

In this scenario, should we not expect just what we have? Each psychosocial field's batteries of incommensurable instruments measuring in locally-dependent, nonadditive, statistically insufficient, and variable metrics are akin to so many Towers of Babel. Because metrological work is discounted, ignored, and cut out of the picture, no one noticed its importance to the success of science, and no one noticed its absence when it was not being done.

We have here nothing less than an answer to the question raised by Joel Michell (1990, 2000) concerning how psychology's methodological thought disorder became such a pathological episode in the history of science. That is, it became possible for psychology to establish itself as a putative quantitative science without systematic tests of the hypothesis that its variables are quantitative precisely because that work has historically never been done by theoreticians or experimental laboratorians. It has always been performed by metrological engineers and instrumentalists, working within their own subculture according to its standards and traditions. These subcultures, as Galison points out, are so separate that, in psychology's case, the absence of the metrologists was never noticed!

But what do we have without them? We can only begin to estimate what we are missing by comparing science's loftiest achievements to what would have been possible in a world without metrology. Is it, after all, a mere coincidence that the birth in the early 19th century of the second scientific revolution and of the industrial revolution coincides with the birth of the concept of objectivity as we understand it today (Daston and Galison, 1992) and also with the birth of metrology as a professional discipline? Metrology is a necessary factor in all monumental architectural accomplishments, from the Great Wall to the Great Pyramid, and in all major industrial and engineering accomplishments, from the auto industry to interstate highway systems. The rise of western Europe as a world power in the years from 1250 to 1600 is held to be due to the unity of mathematics and measurement in a quantitative model of the world; that model made it possible for Europeans "to organize large collections of people and capital and to exploit physical reality for useful knowledge and for power more efficiently than any other people of the time" (Crosby, 1997, p. x). Also consider that we spend two to three times as much on creating and maintaining measurement standards as we do on scientific research as a whole (Latour, 1987, p. 251). From all of this, we can surmise that the world would be vastly different without metrology and metrologists. The entire cumulative history of science would disappear in one fell swoop.

"Immense labor had been performed to achieve the vanishing trick through which the local practices needed to make standards had simply disappeared. ...the absolute system depended on no particular instrument, or technique, or institution. This helps account for metrology's power. Metrology involves work which sets up values and then makes their origin invisible."

At a deeper philosophical level than that plumbed by Michell, then, we can see a way toward accounting for what Husserl (1970) termed Galileo's "fateful omission" of the means by which mathematical understandings of nature were formulated (Fisher, 2003). It seems as though the greatest strength of transparent measurement, namely its capacity to bring encapsulated theoretical and inferential power to end users ignorant of theory and technicalities, is also its greatest weakness.

In not requiring telescope or microscope users to understand optics, in making thermometers useful to those unschooled in thermodynamics, in bringing high fidelity music into the homes of millions with no clue as to how lasers can translate pits in plastic-coated aluminum foil into arias and drumbeats, technoscience simultaneously erases the conditions of its possibilities as it writes out the terms of new realities.

Rasch's probabilistic models tap into and exploit deeply rooted, widespread, and usually unarticulated and unexamined assumptions about what makes words, numbers, and measures meaningful and useful. As these assumptions are made progressively more explicit, conceptually and practically, in theory and experiment, in a wide array of fields and applications, the value of the models will become more apparent, and their range of application will deepen and broaden.

But we have more and higher hurdles to cross in the psychosocial sciences than in the natural sciences. In the psychosocial sciences, but not in the natural sciences, the invisibility of metrology is debilitating because of the way measurement becomes assumed even when it is absent. In addition, in the psychosocial sciences, there are many putative variables, and associated collections of observations hardly of sufficient value to call data. These would fail to scale in the natural sciences, and would be much less likely to form the basis of entire communities of research in the way they have in the psychosocial sciences.

How will these hurdles be surmounted? How might intellectualist condescension toward metrology and the discipline's own vanishing tricks be turned from weaknesses into strengths? Probably through the creation of value, value not obtainable anywhere else, or by any other means. What that value is and how it is produced is another story for another time.

Alder, K. (2002). The measure of all things: The seven-year odyssey and hidden error that transformed the world. New York: The Free Press.

Burdick, D. S., Stone, M. H., & Stenner, A. J. (2006). The Combined Gas Law and a Rasch Reading Law. Rasch Measurement Transactions, 20(2), 1059-60, https://www.rasch.org/rmt/rmt202.pdf

Crosby, A. W. (1997). The measure of reality: Quantification and Western society, 1250-1600. Cambridge: Cambridge University Press.

Daston, L., & Galison, P. (1992, Fall). The image of objectivity. Representations, 40, 81-128.

Engelhard, G., Jr. (2001). Historical view of the influences of measurement and reading theories on the assessment of reading. Journal of Applied Measurement, 2(1), 1-26.

Falmagne, J.-C., & Narens, L. (1983). Scales and meaningfulness of quantitative laws. Synthese, 55, 287-325.

Fisher, W. P., Jr. (2003, December). Mathematics, measurement, metaphor, metaphysics: Part II. Accounting for Galileo's "fateful omission." Theory & Psychology, 13(6), 791-828.

Galison, P. (1999). Trading zone: Coordinating action and belief. In M. Biagioli (Ed.), The science studies reader (pp. 137-160). New York, New York: Routledge.

Husserl, E. (1954, 1970). The crisis of European sciences and transcendental phenomenology: An introduction to phenomenological philosophy (D. Carr, Trans.). Evanston, Illinois: Northwestern University Press.

Jaeger, R. M. (1973). The national test equating study in reading (The Anchor Test Study). Measurement in Education, 4, 1-8.

Kuhn, T. S. (1961). The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in T. S. Kuhn, (Ed.). (1977). The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press.)

Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, Massachusetts: Harvard University Press.

Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72(2), 143-155.

Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Michell, J. (2000, October). Normal science, pathological science and psychometrics. Theory & Psychology, 10(5), 639-667.

O'Connell, J. (1993). Metrology: The creation of universality by the circulation of particulars. Social Studies of Science, 23, 129-173.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen, Denmark: Danmarks Paedogogiske Institut.

Schaffer, S. (1992). Late Victorian metrology and its instrumentation: A manufactory of Ohms. In R. Bud & S. E. Cozzens (Eds.), Invisible connections: Instruments, institutions, and science (pp. 23-56). Bellingham, WA: SPIE Optical Engineering Press.

Shapin, S. (1989, November-December). The invisible technician. American Scientist, 77, 554-563.

Shapin, S. (1991). 'A Scholar and a Gentleman': The problematic identity of the scientific practitioner in early modern England. History of Science, 29, 279-327.

Stenner, A. J., & Smith III, M. (1982). Testing construct theories. Perceptual and Motor Skills, 55, 415-426.

Stenner, A. J., Smith, M., III, & Burdick, D. S. (1983, Winter). Toward a theory of construct definition. Journal of Educational Measurement, 20(4), 305-316.

Vanishing Tricks and Intellectualist Condescension: Measurement, Metrology, and the Advancement of Science. … W.P. Fisher, Jr., Rasch Measurement Transactions, 2008, 21:3 p. 1118-21
