Faulty Thinking by Educational Researchers

When the twentieth century is viewed from the perspective of mental test technology, the Rasch model stands out as a watershed between earlier forms of empirical investigation and the construction of objective social research. By its elimination of reference groups and its emphasis on objective measurement which statistically models the properties of linearity and additivity, the Rasch model offers researchers the opportunity to undertake quantitative studies of mental growth and development with a precision and clarity that, even now, we only expect in the physical sciences.

Unfortunately, advances in methodology, however significant, are generally resisted by a research community (Cohen, 1985). New ideas and methods, despite their benefits, require reappraisal of prevailing practice and the acquisition of new concepts and skills. In general, the more concepts and techniques that must be given up, the more resistance there is against a new methodology. This resistance to change is necessary to protect the practice of science from frivolity and triviality but, not surprisingly, it also inhibits the dissemination and dispersal of genuine advances in scientific method and thinking.

Four topics associated with traditional mental testing obscure the advantages that objective measurement has for the study of educational and mental growth: (1) the distinction between qualitative and quantitative observations, (2) empirical analyses based on grade equivalents, (3) the longing for an absolute zero for the measurement of mental characteristics, and (4) a rigid conceptualization of reliability.

Qualitative versus quantitative:
Few issues in contemporary discussions of research methodology, and arguably in the history of social research, are more artificial and have led to more muddled thinking than a futile distinction that some researchers make between qualitative and quantitative observations. Miles & Huberman (1984) argue that social reality consists of a fundamental dichotomy between quantitative and qualitative observations which logically prohibits the application of statistical methods to particular social observations. Their approach attempts to preserve the "qualitative" integrity of social phenomena from a feared debasement by quantitative methods. Many philosophers have analyzed this misbegotten perspective and have concluded that the claimed distinction between qualitative and quantitative observations has no logical basis (Kaplan, 1964; Reichardt & Cook, 1979, 1980; Walker & Evers, 1988). Unfortunately, most measurement specialists in contemporary social research, unable or unwilling to address this conclusion, have abandoned vast areas of empirical study to inadequate methods which cannot even approximate objectivity.

In the 1920s, Thurstone concluded that, while every measure begins as a qualitative experience, all empirical investigations, on close inspection, involve the application of quantitative reasoning. A measure focuses on a single aspect of experience, associated with some quality of particular interest, and describes it numerically in order to accomplish a fundamental scientific goal -- the precise description of variation.

Observations do not fall into mutually exclusive quantitative and qualitative classes. All observations are at first qualitative. The methods by which observations are used, however, are almost all quantitative. An important distinction between methods is the degree to which observations are summarized numerically. At one extreme are the "qualitative" methods that employ only non-numerical description, such as personal impression and subjective opinion. Other methods achieving greater generality apply increasingly numerical description, e.g. rank orders. At the other extreme, there are methods that rely exclusively on scientifically modelled linear measures.

Grade equivalents:
In 1972, Angoff noted the severe shortcomings of grade equivalents (GEs) when measuring intellectual growth. The definition of GEs enforces an equal amount of growth each year, forcing all growth curves to be straight lines of predetermined slope, thus completely concealing variations in growth rate. Angoff explained how differences in GEs could not be interpreted as differences in ability and so urged that GEs be avoided. Twenty years later, scholarly journals, public school systems and government agencies, otherwise committed to clarity and precision, continue to study and report growth in GEs. What a scientific embarrassment!
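Angoff's objection can be illustrated with a small sketch. The grade medians below are hypothetical linear measures invented for this illustration (they are not data from Angoff or from any test), and `grade_equivalent` is an assumed helper that interpolates linearly between adjacent grade medians, which is one common way GEs are assigned.

```python
def grade_equivalent(score, grade_medians):
    """Map a score to a grade equivalent by linear interpolation
    between the median scores of adjacent grades."""
    grades = sorted(grade_medians)
    for lower, upper in zip(grades, grades[1:]):
        s_lower, s_upper = grade_medians[lower], grade_medians[upper]
        if s_lower <= score <= s_upper:
            return lower + (upper - lower) * (score - s_lower) / (s_upper - s_lower)
    raise ValueError("score lies outside the range of the grade medians")


# Hypothetical median measures by grade: growth slows down every year.
grade_medians = {1: 0.0, 2: 2.0, 3: 3.2, 4: 4.0, 5: 4.5}

grades = sorted(grade_medians)
measures = [grade_medians[g] for g in grades]
equivalents = [grade_equivalent(m, grade_medians) for m in measures]

# Year-to-year gains of the median student on each scale.
measure_gains = [b - a for a, b in zip(measures, measures[1:])]
ge_gains = [b - a for a, b in zip(equivalents, equivalents[1:])]

print("gains in linear measures:", measure_gains)
print("gains in grade equivalents:", ge_gains)
```

Under these assumptions the gains on the linear scale shrink from year to year, while the gains in GEs are one grade per year by construction, so the slowing of growth is invisible in the GE report.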

Absolute zero:
For many researchers and lay-persons alike, measurement in the social sciences will always seem fundamentally flawed because measures of non-physical characteristics, such as mental ability or attitude, do not seem to have the "natural" absolute zeroes so plentiful in physics. Even measurement specialists, knowledgeable about their particular techniques, fail to provide an adequate response to this naive apprehension. In fact, the "no zero" criticism is frequently accepted as an inherent limitation on the application of science to human affairs. This, in turn, perpetuates a myth, associated with Descartes, that the human aspects of experience are not suitable for scientific investigation.

While the role of zero in measurement can be viewed from several perspectives, I offer the reader two from the physical sciences. First is the simple fact that many measurement applications in the physical sciences, such as pitch, hue, loudness and hardness, do well without any absolute zeroes. The Mohs hardness scale is not even a measure, but a physical operation for ranking geological specimens! In fact, the familiar measures of length and time only acquire their zeroes through the context in which they are applied. Neither length nor time has a natural origin or absolute zero. What they have is agreed-upon starting points - the points from which differences are measured. The practical importance of "natural" zeroes is vastly overrated.

Second is a lesson from thermodynamics, where researchers use scales with various zeroes, each of which has its own theoretical significance. In social research devoted to the expansion of scientific knowledge, the theoretical significance of a zero is the central concern. At the simplest level, say for measurement of temperature, the correlation of an observation with the physical expansion of a criterion requires only a convention to establish the numerical values on a scale such as centigrade or Fahrenheit. The zero is no more than a convenient means of anchoring the numbers on the scale.

At higher levels, speculation on theoretical constructs that might underlie the interaction of observation and instrument becomes central. The instrument developer puts greater emphasis on assigning numbers to a scale according to a reproducible consistency between numerical order and hypothesized theoretical terms. When temperature scales are based on molecular activity and heat exchange, the concept of zero acquires a theoretical context, the utility of which can be investigated through empirical research. When successful, this approach results in a measure with broad empirical implications, an outcome not possible when measurement is based on no more than a correlation with a criterion. The importance of conceptual insight to the development of the theoretical context for a scale of temperature with a meaningful zero applies equally well to the development of social measures.
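The temperature argument can be checked with a little arithmetic. The sketch below uses three arbitrary illustrative readings (not data from the article) and the standard conversions between Celsius (centigrade), Fahrenheit and Kelvin. It shows that comparisons of differences survive a change of zero, while ratios of the raw numbers do not.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: a different unit and a different conventional zero."""
    return c * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(c):
    """Standard conversion: same unit size, zero placed by thermodynamic theory."""
    return c + 273.15


def difference_ratio(a, b, c):
    """Ratio of two differences, (c - b) / (b - a)."""
    return (c - b) / (b - a)


# Three illustrative readings, chosen only for this example.
readings_c = [0.0, 20.0, 30.0]
readings_f = [celsius_to_fahrenheit(c) for c in readings_c]
readings_k = [celsius_to_kelvin(c) for c in readings_c]

# Comparisons of differences do not depend on where the zero sits.
ratio_c = difference_ratio(*readings_c)
ratio_f = difference_ratio(*readings_f)
ratio_k = difference_ratio(*readings_k)

# Ratios of the raw numbers do depend on the zero.
raw_ratio_c = readings_c[2] / readings_c[1]
raw_ratio_f = readings_f[2] / readings_f[1]

print(ratio_c, ratio_f, ratio_k)
print(raw_ratio_c, raw_ratio_f)
```

The three difference ratios agree (up to floating-point rounding), whereas the raw ratios disagree between Celsius and Fahrenheit. What the Kelvin zero adds is not arithmetic convenience but a theoretical interpretation.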

Reliability:
Reliability is a term that has taken on a sacred and obscure status in contemporary social research. Few measurement terms have wider application and less meaning. Researchers rely upon reliability to qualify the fundamental adequacy of their research. They use it as the blanket criterion for success or failure. From the perspective of objective measurement, however, the implications of any particular reliability are revealed as ambiguous at best. The reliability of a test is determined by a local and by no means general or necessary mixture of item difficulties and person abilities. A minor, even trivial, change in any part of this mixture will change the value of the reliability coefficient. Indeed, it is not possible to decide from the value of a reliability coefficient alone whether the test in question is useful or useless. This widespread misunderstanding about reliability leads to confusion at best and to entirely erroneous conclusions at worst.
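The dependence of reliability on the local mixture of persons and items can be seen in a small simulation. This is only a sketch under stated assumptions: responses are generated from the dichotomous Rasch model, reliability is summarized with the KR-20 coefficient, and the sample sizes, item difficulties, ability spreads and random seed are all invented for illustration.

```python
import math
import random


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: success probability for measures in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_responses(abilities, difficulties, rng):
    """One row of 0/1 responses per person, one column per item."""
    return [
        [1 if rng.random() < rasch_probability(b, d) else 0 for d in difficulties]
        for b in abilities
    ]


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items 0/1 matrix."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    item_var_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_persons
        item_var_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - item_var_sum / var_total)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# The same 20 items, evenly spaced from -2 to +2 logits, for both samples.
difficulties = [-2.0 + 4.0 * i / 19 for i in range(20)]

# Two hypothetical samples that differ only in the spread of ability.
narrow_sample = [rng.gauss(0.0, 0.5) for _ in range(500)]
wide_sample = [rng.gauss(0.0, 2.0) for _ in range(500)]

rel_narrow = kr20(simulate_responses(narrow_sample, difficulties, rng))
rel_wide = kr20(simulate_responses(wide_sample, difficulties, rng))

print("KR-20, narrow ability spread:", round(rel_narrow, 2))
print("KR-20, wide ability spread:", round(rel_wide, 2))
```

The items are identical in both runs, yet the coefficient is noticeably higher for the sample with the wider spread of abilities. The number describes this particular meeting of persons and items rather than a fixed property of the test.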



Faulty Thinking by Educational Researchers, N Bezruczko … Rasch Measurement Transactions, 1990, 4:3 p. 114-115


The URL of this page is www.rasch.org/rmt/rmt43e.htm
