Measurement is qualitatively and paradigmatically quite different from statistics, even though statistics obviously play important roles in measurement, and vice versa. The perception of measurement as conceptually difficult stems in part from its rearrangement of most of the concepts that we take for granted in the statistical paradigm as landmarks of quantitative thinking. When we recognize and accept the qualitative differences between statistics and measurement, they both become easier to understand.
Statistical analyses are commonly referred to as quantitative, even though the numbers analyzed usually have not been derived from the mapping of an invariant substantive unit onto a number line. Measurement takes such mapping as its primary concern, focusing on the quantitative meaningfulness of numbers (Falmagne & Narens, 1983). Statistical models focus on group processes and relations among variables, while measurement models focus on individual processes and relations within variables (Duncan, 1992). Statistics makes assumptions about factors beyond its control, while measurement sets requirements for objective inference (Andrich, 1989). Statistics primarily involves data analysis, while measurement primarily calibrates instruments in common metrics for interpretation at the point of use (Cohen, 1994).
The scientific value of statistics resides largely in the reproducibility of cross-variable data relations. Statistics focuses on making the most of the data in hand, while measurement focuses on using the data in hand to inform (a) instrument calibration and improvement, and (b) the prediction and efficient gathering of meaningful new data on individuals in practical applications. Whereas statistical "measures" are defined inherently by a particular analytic method, measures read from calibrated instruments (and the raw observations informing these measures) need not be computerized for further analysis.
Because statistical "measures" are usually derived from ordinal raw scores, changes to the instrument change their meaning, resulting in a strong inclination to avoid improving the instrument. Measures, in contrast, take missing data into account, so their meaning remains invariant over instrument configurations, providing a firm basis for the emergence of a culture of measurement quality improvement. So statistical "measurement" begins and ends with data analysis, whereas measurement from calibrated instruments is in a constant cycle of application, new item calibrations, and critical recalibrations that require only intermittent resampling.
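To illustrate how measures can take missing data into account, here is a minimal sketch assuming the dichotomous Rasch model, P = exp(B - D) / (1 + exp(B - D)), for person measure B and item difficulty D. The article does not spell out a particular model, so this choice is an assumption, as is the existence of an item bank whose difficulties have already been calibrated; the bank values and response patterns below are hypothetical. Because the calibrations are fixed, a person's measure can be estimated by maximum likelihood from whatever subset of items was actually administered, and the result is expressed in the same logit metric whichever items were taken.

```python
import math
from typing import Optional, Sequence


def rasch_p(b: float, d: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(d - b))


def person_measure(difficulties: Sequence[float],
                   responses: Sequence[Optional[int]],
                   tol: float = 1e-8,
                   max_iter: int = 100) -> float:
    """Maximum-likelihood person measure, in logits, from whichever items were answered.

    responses[i] is 1 (correct), 0 (incorrect), or None (not administered or missing);
    missing items are simply skipped, so no imputation is needed.
    """
    observed = [(d, x) for d, x in zip(difficulties, responses) if x is not None]
    if not observed:
        raise ValueError("no observed responses")
    raw = sum(x for _, x in observed)
    if raw == 0 or raw == len(observed):
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("extreme score: no finite estimate")
    b = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(b, d) for d, _ in observed]
        expected = sum(probs)                     # expected raw score at b
        info = sum(p * (1.0 - p) for p in probs)  # test information at b
        step = (raw - expected) / info            # Newton-Raphson step
        step = max(-1.0, min(1.0, step))          # damp very large steps
        b += step
        if abs(step) < tol:
            return b
    raise RuntimeError("estimate did not converge")


# Hypothetical item bank calibrated in logits, and two persons who
# took different subsets of it (None = item not administered).
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
form_a = [1, 1, 1, 0, None]
form_b = [None, 1, 1, 0, 0]
print(round(person_measure(bank, form_a), 3))
print(round(person_measure(bank, form_b), 3))  # 0.5: its items are symmetric about 0.5
```

Skipping unanswered items rather than imputing them is what lets different forms, or adaptive selections from the bank, report in a common metric; zero and perfect scores are rejected because their maximum-likelihood estimates are infinite.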
The vast majority of statistical methods and models make strong assumptions about the nature of the unit of measurement, but provide either very limited ways of checking those assumptions, or no checks at all. Statistical models are descriptive in nature, meaning that models are fit to data, that the validity of the data is beyond the immediate scope of interest, and that the model accounting for the most variation is regarded as best. Finally, and perhaps most importantly, statistical models are inherently oriented toward the relations among variables at the level of samples and populations.
Measurement models, however, impose strong requirements on data quality in order to achieve the unit of measurement that is easiest to think with: one that remains invariant across the local particulars of instrument and sample. Measurement methods and models, then, provide extensive and varied ways of checking the quality of the unit, and so are necessarily prescriptive rather than descriptive. That is, measurement models define the data quality that must be obtained for objective inference. In the measurement paradigm, data are fit to models, data quality is of paramount interest, and data quality evaluation must be informed as much by qualitative criteria as by quantitative ones.
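As one example of the quantitative checks on the unit, Rasch analysts commonly compute infit and outfit mean-square statistics, which compare observed responses with those the model expects; values near 1 indicate data that fit, and values well above 1 flag unexpected, noisy responses. The following is a minimal sketch for a single person, assuming the dichotomous Rasch model, a person measure already estimated, and calibrated item difficulties (all values hypothetical). It covers only one quantitative criterion and is no substitute for the qualitative review the paragraph calls for.

```python
import math
from typing import Optional, Sequence, Tuple


def rasch_p(b: float, d: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(d - b))


def person_fit(measure: float,
               difficulties: Sequence[float],
               responses: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Infit and outfit mean-squares for one person; None responses are skipped."""
    sq_resid, variances, z2 = [], [], []
    for d, x in zip(difficulties, responses):
        if x is None:
            continue
        p = rasch_p(measure, d)
        var = p * (1.0 - p)   # model variance of the response
        r2 = (x - p) ** 2     # squared residual
        sq_resid.append(r2)
        variances.append(var)
        z2.append(r2 / var)   # squared standardized residual
    outfit = sum(z2) / len(z2)              # unweighted mean: sensitive to outliers
    infit = sum(sq_resid) / sum(variances)  # information-weighted mean
    return infit, outfit


items = [-2.0, -1.0, 1.0, 2.0]   # hypothetical calibrations (logits)
expected_pattern = [1, 1, 0, 0]  # succeeds on easy items, fails on hard ones
reversed_pattern = [0, 0, 1, 1]  # fails easy items, succeeds on hard ones
print(person_fit(0.0, items, expected_pattern))
print(person_fit(0.0, items, reversed_pattern))
```

With these inputs both response patterns have a raw score of 2 and a maximum-likelihood measure of 0 logits, yet the expected pattern yields mean-squares of about 0.29 (infit) and 0.25 (outfit), while the reversed pattern yields about 4.34 and 5.05, signaling that the measure does not adequately summarize those responses.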
To repeat the most fundamental point, measurement models are oriented toward individual-level response processes, not group-level aggregate processes. Herbert Blumer pointed out as early as 1930 that quantitative method is not equivalent to statistical method, and that the natural sciences had conspicuous degrees of success long before the emergence of statistical techniques (Hammersley, 1989). Both the initial scientific revolution of the 16th-17th centuries and the second scientific revolution of the 19th century found a basis in measurement for publicly objective and reproducible results, but statistics played little or no role in the major discoveries of those times. Now we are in a position to appreciate a comment by Ernest Rutherford, winner of the 1908 Nobel Prize in Chemistry, who held that if you need statistics to understand the results of your experiment, then you should have designed a better experiment (Wise, 1995).
The rarely appreciated point is that the generalizable replication and application of results depend heavily on the existence of a portable and universally uniform observational framework. The inferences, judgments, and adjustments that clinicians, teachers, managers, and others can make at the point of use when provided with additive measures expressed in a substantively meaningful common metric far outstrip those they can make with ordinal scores that depend on the particular instrument and sample.
These contrasts show that the confounding of statistics and measurement is a problem of vast significance that persists in spite of repeated efforts to clarify the distinction. In business, marketing, health care, and quality improvement circles, we find near-universal repetition of the mantra, "You manage what you measure," with little or no attention paid to the quality of the numbers treated as measures. And so we find ourselves stuck with so-called measurement systems in which,
· instead of linear measures defined by a unit that remains constant across samples and instruments we saddle ourselves with nonlinear scores and percentages defined by units that vary in unknown ways across samples and instruments;
· instead of availing ourselves of the capacity to take missing data into account, we hobble ourselves with the need for complete data;
· instead of dramatically reducing data volume with no loss of information, we insist on constantly re-enacting the meaningless ritual of poring over indigestible masses of numbers;
· instead of calibrating instruments in an experimental test of the hypothesis that the intended construct is in fact structured in such a way as to make its mapping onto a number line meaningful, we assign numbers and make quantitative inferences with no idea as to whether they relate at all to anything real;
· instead of checking to see whether rating scales work as intended, with higher ratings consistently representing more of the variable, we make assumptions that may be contradicted by the order and spacing that rating scale categories actually exhibit in practice;
· instead of defining a comprehensive framework for interpreting measures relative to a construct, we accept the narrow limits of frameworks defined by the local sample and items;
· instead of capitalizing on the practicality and convenience of theories capable of accurately predicting item calibrations and measures apart from data, we counterproductively define measurement empirically in terms of data analysis;
· instead of putting calibrated tools into the hands of front-line managers, service representatives, teachers and clinicians, we require them to submit to cumbersome data entry, analysis, and reporting processes that defeat the purpose of measurement by ensuring the information provided is obsolete by the time it gets back to the person who could act on it; and
· instead of setting up efficient systems for communicating meaningful measures in common languages with shared points of reference, we settle for inefficient systems for communicating meaningless scores in local incommensurable languages.
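The first contrast in the list, between nonlinear scores and linear measures, can be made concrete with a small calculation. In the idealized, hypothetical case of a test whose items all have a difficulty of 0 logits, the dichotomous Rasch model gives the maximum-likelihood measure for a raw score of r out of n items exactly as ln(r / (n - r)). The sketch below assumes that simplification; real tests require the actual item calibrations.

```python
import math


def measure_from_score(r: int, n: int) -> float:
    """Rasch measure (logits) for raw score r out of n items, all of difficulty 0.

    Exact only for this idealized equal-difficulty test (hypothetical);
    real instruments need their item calibrations.
    """
    if not 0 < r < n:
        raise ValueError("extreme scores have no finite estimate")
    return math.log(r / (n - r))


n = 20
for r in (10, 11, 18, 19):
    print(r, round(measure_from_score(r, n), 3))
# prints:
# 10 0.0
# 11 0.201
# 18 2.197
# 19 2.944
```

A one-point gain near the middle of this 20-item test moves the measure by about 0.20 logits, while a one-point gain near the top moves it by about 0.75 logits, so the same raw-score difference represents very different amounts of the variable.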
We ought not to accept the factuality of data as the sole criterion of objectivity, with all theory and instruments constrained by, and focused on, the passing particulars of individual local data sets. Properly defined and operationalized via a balanced interrelation of theory, data, and instrument, advanced measurement is not a mere mathematical exercise but offers a wealth of advantages and conveniences that cannot otherwise be obtained. We ignore its potential at our peril.
William P. Fisher, Jr.
References
Andrich, D. (1989). Distinctions between assumptions and requirements in measurement in the social sciences. In J. A. Keats, R. Taft, R. A. Heath & S. H. Lovibond (Eds.), Mathematical and Theoretical Systems: Vol. 4 (pp. 7-16). North-Holland: Elsevier Science Publishers.
Cohen, J. (1994). The earth is round (p < 0.05). American Psychologist, 49, 997-1003.
Duncan, O. D. (1992, September). What if? Contemporary Sociology, 21(5), 667-668.
Falmagne, J.-C., & Narens, L. (1983). Scales and meaningfulness of quantitative laws. Synthese, 55, 287-325.
Hammersley, M. (1989). The dilemma of qualitative method: Herbert Blumer and the Chicago Tradition. New York: Routledge.
Wise, M. N. (Ed.). (1995). The values of precision. Princeton, New Jersey: Princeton University Press.
Fisher, W. P., Jr. (2010). Statistics and Measurement: Clarifying the Differences. Rasch Measurement Transactions, 23(4), 1229-1230.
The URL of this page is www.rasch.org/rmt/rmt234a.htm