When the twentieth century is viewed from the perspective of mental test technology, the Rasch model stands out as a watershed between earlier forms of empirical investigation and the construction of objective social research. By its elimination of reference groups and its emphasis on objective measurement which statistically models the properties of linearity and additivity, the Rasch model offers researchers the opportunity to undertake quantitative studies of mental growth and development with a precision and clarity that, even now, we only expect in the physical sciences.
Unfortunately, advances in methodology, however significant, are generally resisted by a research community (Cohen, 1985). New ideas and methods, despite their benefits, require reappraisal of prevailing practice and the acquisition of new concepts and skills. In general, the more concepts and techniques that must be given up, the greater the resistance to a new methodology. This resistance to change is necessary to protect the practice of science from frivolity and triviality but, not surprisingly, it also inhibits the dissemination of genuine advances in scientific method and thinking.
Four topics associated with traditional mental testing obscure the advantages that objective measurement has for the study of educational and mental growth: (1) the distinction between qualitative and quantitative observations, (2) empirical analyses based on grade equivalents, (3) the longing for an absolute zero for the measurement of mental characteristics, and (4) a rigid conceptualization of reliability.
Qualitative versus quantitative:
Few issues in contemporary discussions of research methodology, and arguably in the history of social research, are more artificial and have led to more muddled thinking than the futile distinction that some researchers make between qualitative and quantitative observations. Miles & Huberman (1984) argue that social reality consists of a fundamental dichotomy between quantitative and qualitative observations which logically prohibits the application of statistical methods to particular social observations. Their approach attempts to preserve the "qualitative" integrity of social phenomena from a feared debasement by quantitative methods. Many philosophers have analyzed this misbegotten perspective and have concluded that the claimed distinction between qualitative and quantitative observations has no logical basis (Kaplan, 1964; Reichardt & Cook, 1979, 1980; Walker & Evers, 1988). Unfortunately, most measurement specialists in contemporary social research, unable or unwilling to address this conclusion, have abandoned vast areas of empirical study to inadequate methods which cannot even approximate objectivity.
In the 1920s, Thurstone concluded that, while every measure begins as a qualitative experience, all empirical investigations, on close inspection, involve the application of quantitative reasoning. A measure focuses on a single aspect of experience, associated with some quality of particular interest, and describes it numerically in order to accomplish a fundamental scientific goal: the precise description of variation.
Observations do not fall into mutually exclusive quantitative and qualitative classes. All observations are at first qualitative. The methods by which observations are used, however, are almost all quantitative. An important distinction between methods is the degree to which observations are summarized numerically. At one extreme are the "qualitative" methods that employ only non-numerical description, such as personal impression and subjective opinion. Other methods, achieving greater generality, apply increasingly numerical description, e.g., rank orders. At the other extreme are methods that rely exclusively on scientifically modelled linear measures.
Grade equivalents:
In 1972, Angoff noted the severe shortcomings of grade equivalents (GEs) when measuring intellectual growth. The definition of GEs enforces an equal amount of growth each year, forcing all growth curves to be straight lines of predetermined slope, thus completely concealing variations in growth rate. Angoff explained how differences in GE could not be interpreted as differences in ability and so urged that GEs be avoided. Twenty years later, scholarly journals, public school systems and government agencies, otherwise committed to clarity and precision, continue to study and report growth in GEs. What a scientific embarrassment!
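The way a GE scale conceals variation in growth rate can be sketched numerically. The example below is only an illustration under stated assumptions: the grade medians are invented, the logit scale stands in for a linear measure, and `grade_equivalent` is a hypothetical helper that interpolates between adjacent grade medians, which is one simple way a GE might be assigned.

```python
# Hypothetical median ability (in logits) of the norm group at each grade.
# The values are invented for illustration; growth slows in later grades.
MEDIAN_BY_GRADE = {1: -3.0, 2: -1.5, 3: -0.5, 4: 0.2, 5: 0.7, 6: 1.0}
GRADES = sorted(MEDIAN_BY_GRADE)


def grade_equivalent(measure):
    """Map a logit measure to a GE by interpolating between grade medians."""
    for lower, upper in zip(GRADES, GRADES[1:]):
        m_lower, m_upper = MEDIAN_BY_GRADE[lower], MEDIAN_BY_GRADE[upper]
        if m_lower <= measure <= m_upper:
            return lower + (measure - m_lower) / (m_upper - m_lower)
    raise ValueError("measure lies outside the normed range")


# Yearly growth of the median student, expressed in each unit.
logit_gains = [MEDIAN_BY_GRADE[g + 1] - MEDIAN_BY_GRADE[g] for g in GRADES[:-1]]
ge_gains = [
    grade_equivalent(MEDIAN_BY_GRADE[g + 1]) - grade_equivalent(MEDIAN_BY_GRADE[g])
    for g in GRADES[:-1]
]

print("logit gains per year:", [round(x, 2) for x in logit_gains])
print("GE gains per year:   ", [round(x, 2) for x in ge_gains])
```

In this sketch the yearly gains in logits shrink from grade to grade, while the same growth expressed in GEs is one unit every year by construction. A one-GE difference therefore corresponds to different amounts of ability at different grades.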
Absolute zero:
For many researchers and laypersons alike, measurement in the social sciences will always seem fundamentally flawed because measures of non-physical characteristics, such as mental ability or attitude, do not seem to have the "natural" absolute zeroes so plentiful in physics. Even measurement specialists, knowledgeable about their particular techniques, fail to provide an adequate response to this naive apprehension. In fact, the "no zero" criticism is frequently accepted as an inherent limitation on the application of science to human affairs. This, in turn, perpetuates a myth, associated with Descartes, that the human aspects of experience are not suitable for scientific investigation.
While the role of zero in measurement can be viewed from several perspectives, I offer the reader two from the physical sciences. First is the simple fact that many measurement applications in the physical sciences, such as pitch, hue, loudness and hardness, do well without any absolute zeroes. The Mohs hardness scale is not even a measure, but a physical operation for ranking geological specimens! In fact, the familiar measures of length and time only acquire their zeroes through the context in which they are applied. Neither length nor time has a natural origin or absolute zero. What they have are agreed-upon starting points, the points from which differences are measured. The practical importance of "natural" zeroes is vastly overrated.
Second is a lesson from thermodynamics, where researchers use scales with various zeroes, each of which has its own theoretical significance. For a social research devoted to the expansion of scientific knowledge, this theoretical significance is the central concern. At the simplest level, say for the measurement of temperature, the correlation of an observation with the physical expansion of a criterion requires only a convention to establish the numerical values on a scale such as centigrade or Fahrenheit. The zero is no more than a convenient means of anchoring the numbers on the scale.
At higher levels, speculation on theoretical constructs that might underlie the interaction of observation and instrument becomes central. The instrument developer puts greater emphasis on assigning numbers to a scale according to a reproducible consistency between numerical order and hypothesized theoretical terms. When temperature scales are instead based on molecular activity and heat exchange, the concept of zero acquires a theoretical context, the utility of which can be investigated through empirical research. When successful, this approach results in a measure with broad empirical implications, an outcome not possible when measurement is based on no more than a correlation with a criterion. The importance of conceptual insight to the development of the theoretical context for a scale of temperature with a meaningful zero applies equally well to the development of social measures.
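The point that a conventional zero merely anchors the numbers can be illustrated with the temperature scales themselves. The following is a minimal sketch with invented readings: comparisons of differences come out the same on every scale, whatever its zero, while ratios of raw readings depend entirely on where the zero was placed.

```python
def celsius_to_fahrenheit(celsius):
    """Convert centigrade (Celsius) degrees to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(celsius):
    """Convert centigrade (Celsius) degrees to kelvins."""
    return celsius + 273.15


def difference_ratio(morning, noon, evening):
    """How many times larger the morning-to-noon rise is than the noon-to-evening fall."""
    return (noon - morning) / (noon - evening)


# Hypothetical readings in degrees centigrade, invented for illustration.
readings_c = (10.0, 20.0, 15.0)
readings_f = tuple(celsius_to_fahrenheit(c) for c in readings_c)
readings_k = tuple(celsius_to_kelvin(c) for c in readings_c)

for name, (morning, noon, evening) in (
    ("centigrade", readings_c),
    ("Fahrenheit", readings_f),
    ("kelvin", readings_k),
):
    print(
        name,
        "ratio of differences:", round(difference_ratio(morning, noon, evening), 3),
        "ratio of raw readings:", round(noon / morning, 3),
    )
```

Only the kelvin scale, whose zero comes from theory, gives the ratio of raw readings a physical interpretation. Everything that depends on differences alone is available on all three scales.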
Reliability:
Reliability is a term that has taken on a sacred and obscure status in contemporary social research. Few measurement terms have wider application and less meaning. Researchers rely upon reliability to qualify the fundamental adequacy of their research. They use it as the blanket criterion for success or failure. From the perspective of objective measurement, however, the implications of any particular reliability are revealed as ambiguous at best. The reliability of a test is determined by a local and by no means general or necessary mixture of item difficulties and person abilities. A minor, even trivial, change in any part of this mixture will change the value of the reliability coefficient. Indeed, it is not possible to decide from the value of a reliability coefficient alone whether the test in question is useful or useless. This widespread misunderstanding about reliability leads to confusion at best and to entirely erroneous conclusions at worst.
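The dependence of reliability on the local mixture of items and persons can be sketched with a small calculation. The example assumes one common formulation, not taken from this article: reliability as the variance of the person measures divided by that variance plus the mean model error variance, with the error variance of a Rasch person measure approximated by the reciprocal of the test information. The item difficulties and both person samples are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model probability of success; both arguments are in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def error_variance(ability, difficulties):
    """Approximate error variance of a person measure: 1 / test information."""
    information = 0.0
    for difficulty in difficulties:
        p = rasch_probability(ability, difficulty)
        information += p * (1.0 - p)
    return 1.0 / information


def reliability(abilities, difficulties):
    """Person variance divided by person variance plus mean error variance."""
    n = len(abilities)
    mean = sum(abilities) / n
    person_variance = sum((a - mean) ** 2 for a in abilities) / n
    mean_error = sum(error_variance(a, difficulties) for a in abilities) / n
    return person_variance / (person_variance + mean_error)


# One hypothetical nine-item test, given to two hypothetical samples.
items = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
wide_sample = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
narrow_sample = [-0.5, -0.25, 0.0, 0.25, 0.5]

wide_reliability = reliability(wide_sample, items)
narrow_reliability = reliability(narrow_sample, items)

print("wide sample reliability:  ", round(wide_reliability, 2))
print("narrow sample reliability:", round(narrow_reliability, 2))
```

In this sketch the identical test yields a high coefficient for the widely spread sample and a low one for the narrow sample, even though the narrow sample is measured with smaller error per person. The coefficient by itself says nothing about which situation produced it.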
Faulty Thinking by Educational Researchers. N. Bezruczko. Rasch Measurement Transactions, 1990, 4:3, p. 114-115
The URL of this page is www.rasch.org/rmt/rmt43e.htm