Faulty Thinking by Educational Researchers

When the twentieth century is viewed from the perspective of mental test technology, the Rasch model stands out as a watershed between earlier forms of empirical investigation and the construction of objective social research. By its elimination of reference groups and its emphasis on objective measurement which statistically models the properties of linearity and additivity, the Rasch model offers researchers the opportunity to undertake quantitative studies of mental growth and development with a precision and clarity that, even now, we only expect in the physical sciences.
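For readers who want the model in symbols, the dichotomous Rasch model is commonly written in log-odds form. The notation here is an assumption of this sketch rather than something taken from the text above: $B_n$ is the ability of person $n$, $D_i$ is the difficulty of item $i$, and $P_{ni}$ is the probability that person $n$ succeeds on item $i$.

```latex
\log\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i
```

Additivity appears as the simple difference on the right-hand side. Linearity appears in the logit unit on the left, in which a given difference between person and item carries the same meaning anywhere along the scale. No reference group enters the expression.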

Unfortunately, advances in methodology, however significant, are generally resisted by a research community (Cohen, 1985). New ideas and methods, despite their benefits, require reappraisal of prevailing practice and the acquisition of new concepts and skills. In general, the more concepts and techniques that must be given up, the more resistance there is against a new methodology. This resistance to change is necessary to protect the practice of science from frivolity and triviality but, not surprisingly, it also inhibits the dissemination and dispersal of genuine advances in scientific method and thinking.

Four topics associated with traditional mental testing obscure the advantages that objective measurement has for the study of educational and mental growth: (1) the distinction between qualitative and quantitative observations, (2) empirical analyses based on grade equivalents, (3) the longing for an absolute zero for the measurement of mental characteristics, and (4) a rigid conceptualization of reliability.

Qualitative versus quantitative:
Few issues in contemporary discussions of research methodology, and arguably in the history of social research, are more artificial and have led to more muddled thinking than the futile distinction that some researchers make between qualitative and quantitative observations. Miles & Huberman (1984) argue that social reality consists of a fundamental dichotomy between quantitative and qualitative observations which logically prohibits the application of statistical methods to particular social observations. Their approach attempts to preserve the "qualitative" integrity of social phenomena from a feared debasement by quantitative methods. Many philosophers have analyzed this misbegotten perspective and have concluded that the claimed distinction between qualitative and quantitative observations has no logical basis (Kaplan, 1964; Reichardt & Cook, 1979, 1980; Walker & Evers, 1988). Unfortunately, most measurement specialists in contemporary social research, unable or unwilling to address this conclusion, have abandoned vast areas of empirical study to inadequate methods which cannot even approximate objectivity.

In the 1920's, Thurstone concluded that, while every measure begins as a qualitative experience, all empirical investigations, on close inspection, involve the application of quantitative reasoning. A measure focuses on a single aspect of experience, associated with some quality of particular interest, and describes it numerically in order to accomplish a fundamental scientific goal -- the precise description of variation.

Observations do not fall into mutually exclusive quantitative and qualitative classes. All observations are at first qualitative. The methods by which observations are used, however, are almost all quantitative. An important distinction between methods is the degree to which observations are summarized numerically. At one extreme are the "qualitative" methods that employ only non-numerical description, such as personal impression and subjective opinion. Other methods achieving greater generality apply increasingly numerical description, e.g. rank orders. At the other extreme, there are methods that rely exclusively on scientifically modelled linear measures.

Grade equivalents:
In 1972, Angoff noted the severe shortcomings of grade equivalents (GE) when measuring intellectual growth. The definition of GE's enforces an equal amount of growth each year, forcing all growth curves to be straight lines of predetermined slope, thus completely concealing variations in growth rate. Angoff explained how differences in GE could not be interpreted as differences in ability and so urged that GE's be avoided. Twenty years later, scholarly journals, public school systems and government agencies, otherwise committed to clarity and precision, continue to study and report growth in GE's. What a scientific embarrassment!
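The straightening effect of GE's can be seen in a minimal sketch. It assumes that a GE is assigned by linear interpolation between the median measures of adjacent grades, which is one common way of constructing GE tables. The median measures below are invented for illustration and are not data from any test.

```python
# Hypothetical median ability measures (in logits) for grades 1 to 6.
# These values are invented for illustration; growth decelerates with grade.
GRADES = [1, 2, 3, 4, 5, 6]
MEDIAN_MEASURES = [-2.0, -0.8, 0.0, 0.5, 0.8, 1.0]


def to_grade_equivalent(measure):
    """Convert a measure to a GE by interpolating between adjacent grade medians."""
    for i in range(len(GRADES) - 1):
        lo, hi = MEDIAN_MEASURES[i], MEDIAN_MEASURES[i + 1]
        if lo <= measure <= hi:
            fraction = (measure - lo) / (hi - lo)
            return GRADES[i] + fraction * (GRADES[i + 1] - GRADES[i])
    raise ValueError("measure lies outside the tabled grade range")


# Year-to-year growth of the median student on the linear (logit) scale.
measure_gains = [b - a for a, b in zip(MEDIAN_MEASURES, MEDIAN_MEASURES[1:])]

# The same growth expressed in grade equivalents.
ge_values = [to_grade_equivalent(m) for m in MEDIAN_MEASURES]
ge_gains = [b - a for a, b in zip(ge_values, ge_values[1:])]

print("gains in logits:", [round(g, 2) for g in measure_gains])
print("gains in GE:    ", ge_gains)
```

On the logit scale the yearly gains shrink from grade to grade, but in GE's every yearly gain is exactly one unit by construction, so the deceleration disappears from view.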

Absolute zero:
For many researchers and lay-persons, measurement in the social sciences will always seem fundamentally flawed because measures of non-physical characteristics, such as mental ability or attitude, do not seem to have the "natural" absolute zeroes so plentiful in physics. Even measurement specialists, knowledgeable about their particular techniques, fail to provide an adequate response to this naive apprehension. In fact, the "no zero" criticism is frequently accepted as an inherent limitation on the application of science to human affairs. This, in turn, perpetuates a myth, associated with Descartes, that the human aspects of experience are not suitable for scientific investigation.

While the role of zero in measurement can be viewed from several perspectives, I offer the reader two from the physical sciences. First is the simple fact that many measurement applications in the physical sciences, such as pitch, hue, loudness and hardness, do well without any absolute zeroes. The Mohs hardness scale is not even a measure, but a physical operation for ranking geological specimens! In fact, the familiar measures of length and time only acquire their zeroes through the context in which they are applied. Neither length nor time has a natural origin or absolute zero. What they have is agreed-upon starting points, the points from which differences are measured. The practical importance of "natural" zeroes is vastly overrated.

Second is a lesson from thermodynamics, where researchers use scales with various zeroes, each of which has its own theoretical significance. In social research devoted to the expansion of scientific knowledge, that theoretical significance is the central concern. At the simplest level, say for measurement of temperature, the correlation of an observation with the physical expansion of a criterion requires only a convention to establish the numerical values on a scale such as Centigrade or Fahrenheit. The zero is no more than a convenient means of anchoring the numbers on the scale.
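A short arithmetic sketch illustrates what a conventional zero does and does not affect. The two temperatures are arbitrary examples, and the function names are hypothetical. Only the standard conversion formulas are assumed.

```python
def celsius_to_kelvin(c):
    """Shift the origin: same unit size, different zero."""
    return c + 273.15


def celsius_to_fahrenheit(c):
    """Change both the unit size and the zero."""
    return c * 9.0 / 5.0 + 32.0


cool, warm = 20.0, 30.0  # two arbitrary temperatures in degrees Centigrade

# Differences do not depend on where the zero is placed.
diff_c = warm - cool
diff_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)
# With a different unit size, the difference is rescaled by a constant (9/5).
diff_f = celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool)

# Ratios, by contrast, depend entirely on the choice of zero.
ratio_c = warm / cool
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print("difference:", diff_c, "C,", round(diff_k, 2), "K,", diff_f, "F")
print("ratio:", ratio_c, "on Centigrade,", round(ratio_k, 3), "on Kelvin")
```

Differences, which are what most practical comparisons require, survive any choice of origin. Only ratio statements need a zero, and then the zero must be one that carries theoretical meaning.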

At higher levels, speculation on theoretical constructs that might underlie the interaction of observation and instrument becomes central. The instrument developer puts greater emphasis on assigning numbers to a scale according to a reproducible consistency between numerical order and hypothesized theoretical terms. When temperature scales are based on molecular activity and heat exchange, the concept of zero acquires a theoretical context whose utility can be investigated through empirical research. When successful, this approach results in a measure with broad empirical implications, an outcome not possible when measurement is based on no more than a correlation with a criterion. The importance of conceptual insight to the development of the theoretical context for a scale of temperature with a meaningful zero applies equally well to the development of social measures.

Reliability:
Reliability is a term that has taken on a sacred and obscure status in contemporary social research. Few measurement terms have wider application and less meaning. Researchers rely upon reliability to qualify the fundamental adequacy of their research. They use it as the blanket criterion for success or failure. From the perspective of objective measurement, however, the implications of any particular reliability are revealed as ambiguous at best. The reliability of a test is determined by a local and by no means general or necessary mixture of item difficulties and person abilities. A minor, even trivial, change in any part of this mixture will change the value of the reliability coefficient. Indeed, it is not possible to decide from the value of a reliability coefficient alone whether the test in question is useful or useless. This widespread misunderstanding about reliability leads to confusion at best and to entirely erroneous conclusions at worst.
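The dependence of reliability on the local mixture of persons and items can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: responses are generated from the dichotomous Rasch model, reliability is computed as KR-20, and the item difficulties, sample sizes and ability spreads are invented for illustration. The same 20 items are given to two samples that differ only in how widely their abilities are spread.

```python
import math
import random


def rasch_prob(ability, difficulty):
    """Rasch probability of success for a person-item pair, both in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_responses(abilities, difficulties, rng):
    """Return a persons-by-items matrix of 0/1 responses drawn from the Rasch model."""
    return [[1 if rng.random() < rasch_prob(b, d) else 0 for d in difficulties]
            for b in abilities]


def kr20(responses):
    """Kuder-Richardson 20 reliability coefficient for dichotomous responses."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    item_var_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_persons
        item_var_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - item_var_sum / var_total)


rng = random.Random(1990)  # fixed seed so the sketch is repeatable

# Hypothetical test: 20 items spread evenly from -2 to +2 logits.
difficulties = [-2.0 + 4.0 * i / 19 for i in range(20)]

# Two hypothetical samples of 500 persons, differing only in ability spread.
narrow_sample = [rng.gauss(0.0, 0.3) for _ in range(500)]
wide_sample = [rng.gauss(0.0, 2.0) for _ in range(500)]

rel_narrow = kr20(simulate_responses(narrow_sample, difficulties, rng))
rel_wide = kr20(simulate_responses(wide_sample, difficulties, rng))

print("KR-20, narrow ability spread:", round(rel_narrow, 2))
print("KR-20, wide ability spread:  ", round(rel_wide, 2))
```

The items are identical in both runs, yet the coefficient is much higher for the widely spread sample. The reliability coefficient describes this test with this sample, not the test alone.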

Faulty Thinking by Educational Researchers, N Bezruczko … Rasch Measurement Transactions, 1990, 4:3 p. 114-115
