Recent social studies of science show that many social scientists hold a mistaken belief about generalized systems of measurement, that is, systems in which variables are universally measured in a common, scale-free metric (Bud & Cozzens, 1992; O'Connell, 1993). The mistaken belief is that agreement among scientists concerning the unit size, numeric range, and measurement validity of the instruments in their fields somehow happens by itself, as if some objects of measurement were endowed with a "natural" metric property and others were not.
When it is realized that metrological standards are created, maintained, and enforced through the circulation of reference-standard data and instruments by professional societies, the problems of social measurement come into clearer focus. Metrological standards based on classical true-score theory (CTT) and the method of summated ratings, for instance, would be cumbersome, if not impossible, to implement: under this approach, a common measurement standard requires every user who measures a particular variable to do so with exactly the same collection of items, and with complete data across those items for every measure.
The history of psychosocial measurement documents the failure of this approach. Because there are different reasons for measuring, tests with different numbers of items and rating scale points, and with different item phrasings, are required from application to application. Sometimes new instruments are designed simply because the designers do not know that an instrument meeting their needs already exists. The obstacles encountered in attempting to obtain complete data from all respondents are too numerous to list.
There is an alternative. Rasch's probabilistic conjoint measurement models not only tolerate missing data but can also be used to equate different instruments measuring the same variable into a unified measurement system. These facts are well known to readers of RMT. Yet something important, even crucial, has been lacking in the activities of the Rasch Measurement SIG: we have not overcome the habits of mind characteristic of the true-score/summated-ratings approach. We continue to expect that Rasch measurement will be adopted as the standard approach to rating scale instrument calibration simply because of its simplicity, elegance, and parsimony, or because of its effectiveness and efficiency.
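To make the equating claim concrete, here is a minimal sketch of one common linking method, not described in this article: when two instruments share a set of calibrated items, the mean difference in the shared items' difficulties supplies a constant that places one instrument's calibrations in the other's frame of reference. All names and values below are hypothetical illustrations.

```python
# Illustrative sketch (not the article's procedure): common-item equating
# under the Rasch model, using the mean-difference method.
# Item difficulties are in logits; all values are hypothetical.

def equating_constant(shared_a, shared_b):
    """Mean difference in difficulty for the items common to both instruments."""
    diffs = [a - b for a, b in zip(shared_a, shared_b)]
    return sum(diffs) / len(diffs)

# Difficulties of three shared items as calibrated on each instrument:
instrument_a = [-1.2, 0.3, 1.1]
instrument_b = [-1.5, 0.1, 0.7]

shift = equating_constant(instrument_a, instrument_b)
# Adding `shift` to every calibration from instrument B expresses both
# instruments' items, and hence their measures, in instrument A's frame
# of reference, regardless of which persons answered which items.
```

Because the model tolerates missing data, the two instruments need not have been given to the same persons; the shared items alone carry the link.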
We have been lulled into this frame of mind by three factors. The first is a cultural bias that blinds us to the critical role played by technicians and their social organizations in the making of scientific knowledge (Shapin, 1989; Bud & Cozzens, 1992; O'Connell, 1993). The laboratory work underlying scientific experiment and observation has gone largely undocumented in the history of science, partly because technicians were invisible whenever the experimental apparatus worked as it should, and partly because of issues of social hierarchy and institutional authority. Social scientists have likewise failed to notice how extensively networks of laboratories collaborate in deciding upon and maintaining common units of measurement. Attending to this factor will be fundamental in making the Rasch measurement alternative a reality.
The second factor is the transparency of instruments. Just as the work of technicians is ignored in the historical record, so instruments are ignored in the act of observation and in the philosophical record. Metric units do not exist in nature; they are cultural constructs, often derived from convenient bases of comparison such as the length of a thumb, hand, arm, or stride. Yet well-designed and properly functioning instruments give so clear a view of the thing of interest that scientists routinely speak of seeing things that are represented only as images or readings on an instrument (Lynch, 1985; Heelan, 1983a, 1983b; Ackermann, 1985; Hacking, 1983; Ihde, 1991). Philosophers of science, in turn, have ignored the role of instruments in the production of scientific knowledge, speaking as though facts resided in nature apart from any human devices, when those facts were completely inaccessible, and could have no discernible impact on human social, political, or economic life, until the relevant technical frameworks for observation were established.
The existence of the third factor depends on the prior existence of the other two. Because the first two arose so early in the history of science, we have had a general philosophical and theoretical difficulty in coming to grips with what has been called Galileo's "fateful omission" (Husserl, 1970) of the means by which nature was mathematicized. As Michell (1990) shows, psychology and social science in general have been under the spell of the "quantitative imperative", which takes quantitative measurement to be the hallmark of science. According to Michell, psychologists have interpreted this imperative as quantification at any cost, even if measurement is reduced to simply fooling with numbers.
The enhanced understanding and practice of measurement offered by Thurstone, Guttman, Loevinger, Rasch, Luce & Tukey, and others addresses the third factor, the historical gap in our understanding of how natural and psychosocial phenomena can be mathematicized. The transparency of well-functioning instruments, the second factor, is well described by Rasch's separability theorem and his notion of specific objectivity. But these factors, in themselves, appear insufficient to change the measurement behavior of individual psychosocial researchers.
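The separability theorem invoked here can be stated compactly. In the dichotomous Rasch model, the odds that person $n$ succeeds on item $i$ are $e^{\beta_n - \delta_i}$, where $\beta_n$ is the person's ability and $\delta_i$ the item's difficulty. The ratio of two persons' odds on the same item is then

$$
\frac{P_{ni}/(1-P_{ni})}{P_{mi}/(1-P_{mi})} \;=\; \frac{e^{\beta_n - \delta_i}}{e^{\beta_m - \delta_i}} \;=\; e^{\beta_n - \beta_m},
$$

which is free of the item parameter $\delta_i$. The comparison of persons does not depend on which particular item (or instrument) mediates it; this specific objectivity is what renders a well-functioning instrument transparent in the sense of the second factor.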
So what else can be done to advance the cause of improved measurement in the psychosocial sciences? The invisibility of research technicians and their organizations, the first factor, remains unaccounted for. Rather than wait for unified measurement systems to organize themselves, or for existing professional organizations to see and capitalize on the opportunities Rasch measurement offers, it is up to us to form measurement networks in our respective fields, guiding others of like mind toward the decisions that will make metrological standards a reality in the psychosocial sciences.
Explicit standards for measurement quality, similar to those spelled out by Hunter (1980) for physical measurement, and routines for conducting round-robin instrument calibration trials (Bailar, 1985; Mandel, 1977, 1978; Veith, 1987; Wernimont, 1977, 1978), are now being articulated (Fisher, et al., 1996). These standards specify statistical criteria that instruments measuring a particular variable must meet or surpass in order to be certified as measuring in the relevant unit. The statistics for instrument certification will include:

1) a correlation of at least .85 between the tested instrument's measures and the reference instrument's measures of the construct on a common sample (given that the new instrument is not intended to measure at one or the other extreme);

2) a correlation of at least .85 between the calibrations of any items on the tested instrument and any on the reference instrument identified as addressing the same aspect of the construct;

3) graphical evidence supporting the meaning of the correlations;

4) measurement and calibration reliabilities of at least .85 (indicating a ratio of variation to error of about 2.6 to 1); and

5) sufficient statistical consistency (data-model fit) in both the item calibration estimates and the person measures.

Similar statistics on the instrument's performance across diverse samples, taken from geographically separate locations and from the different kinds of facilities where observations are made, will also need to be included.
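The arithmetic behind criterion 4 is worth making explicit. One reading of the quoted figure, consistent with the 2.6-to-1 ratio in the text, is that a reliability R relates the observed spread of the measures to their measurement error as 1/sqrt(1 - R); the function name below is illustrative, not from the article.

```python
import math

# Illustrative sketch of the reliability arithmetic in criterion 4.
# If R is the reliability, the observed standard deviation of the measures
# stands to the root-mean-square measurement error as 1 / sqrt(1 - R).

def spread_to_error_ratio(reliability):
    """Observed SD of measures divided by RMS measurement error."""
    return 1.0 / math.sqrt(1.0 - reliability)

ratio = spread_to_error_ratio(0.85)
print(round(ratio, 1))  # about 2.6, the ratio quoted in the text
```

The closely related Rasch separation index, sqrt(R / (1 - R)), compares the error-corrected (true) spread to the error instead, and gives about 2.4 at R = .85; the text's 2.6 corresponds to the observed-spread version.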
New instruments will set reference standards when they establish the best values for the measurement range, variation, error, ratio of variation to error, or data-model fit. It may happen that no one brand of instrument becomes the reference standard for a variable. Most useful would be to consider the entire collection of items from certified instruments as the reference standard.
Ackermann, J. R. (1985). Data, instruments, and theory: a dialectical approach to understanding science. Princeton, NJ: Princeton University Press.
Bailar, B. A. (1985). Quality issues in measurement. International Statistical Review, 53(2), 123-139.
Bud, R., & Cozzens, S. E. (Eds.). (1992). Invisible connections: Instruments, institutions, and science (SPIE Institutes, Vol. 9; R. F. Potter, Series Ed.). Bellingham, WA: SPIE Optical Engineering Press.
Fisher, W. P., Jr., et al. (1996). Variable-specific standards for rating scale measurement.
Hacking, I. (1983). Representing and intervening: introductory topics in the philosophy of natural science. Cambridge, UK: Cambridge University Press.
Heelan, P. (1983a). Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.
Heelan, P. (1983b). Space perception and the philosophy of science. Berkeley, CA: University of California Press.
Hunter, J. S. (1980, November). The national system of scientific measurement. Science, pp. 869-874.
Husserl, E. (1970). The crisis of the European sciences and transcendental phenomenology. Evanston, IL: Northwestern University Press.
Ihde, D. (1991). Instrumental realism: The interface between philosophy of science and philosophy of technology (The Indiana Series in the Philosophy of Technology). Bloomington, IN: Indiana University Press.
Lynch, M. (1985). Discipline and the material form of images: an analysis of scientific visibility. Social Studies of Science, 15(1), 37-66.
Mandel, J. (1977, March). The analysis of interlaboratory test data. ASTM Standardization News, pp. 17-20, 56.
Mandel, J. (1978, December). Interlaboratory testing. ASTM Standardization News, pp. 11-12.
Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.
O'Connell, J. (1993). Metrology: The creation of universality by the circulation of particulars. Social Studies of Science, 23, 129-173.
Shapin, S. (1989, November-December). The invisible technician. American Scientist, pp. 554-563.
Veith, A. G. (1987). Precision in polymer testing: An important world-wide issue. Polymer Testing, 7, 239-267.
Wernimont, G. (1977, March). Ruggedness evaluation of test procedures. ASTM Standardization News, pp. 13-16.
Wernimont, G. (1978, December). Careful intralaboratory study must come first. ASTM Standardization News, pp. 11-12.
Rasch alternative. Fisher WP Jr. Rasch Measurement Transactions, 1996, 9:4 p.466
The URL of this page is www.rasch.org/rmt/rmt94g.htm