"A paradoxical property of a test is a property such that the validity of the test is not a monotonic function of that property... Is it not intuitively valid, however, to demand that the most basic concept of psychometrics shall be a non-paradoxical property of tests? Reliability is paradoxical." J. Loevinger, 1954, pp. 500-501
The Attenuation Paradox was named by Loevinger (1954). It was recognized earlier by Gulliksen: "The criterion of maximizing test variance [reliability] cannot be pushed to extremes. Test variance is a maximum, if half of the population makes zero scores, and the other half makes perfect scores. Such a score distribution is not desirable for obvious reasons, yet current [true-score classical] test theory (CTT) provides no rationale for rejecting such a score distribution. Obviously the best test score distribution is one which accurately reflects the true ability distribution in the group, but there is perhaps little hope of obtaining such a distribution by the current procedure of assigning a score based upon the sheer number of correct answers. At present the only solution to such difficulties seems to lie in some type of absolute scaling theory (Thurstone, 1925), to replace the number correct score" (1945, pp. 90-91). Gulliksen, however, ignores Thurstone and perpetuates the paradoxical true-score tradition: "In order to maximize the reliability and variance of a test, the items should have high inter-correlations, all items should be of the same difficulty level, and the level should be as near to 50% as possible" (1945, p. 79).
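Gulliksen's remark about maximum test variance can be checked with a little arithmetic. The sketch below is illustrative only: the 10-item test, the group sizes, and the three score distributions are assumptions made here, not data from Gulliksen (1945). It compares the number-correct variance of a half-zero, half-perfect group with two more plausible score distributions.

```python
from math import comb
from statistics import pvariance

N_ITEMS = 10  # hypothetical test length, chosen only for illustration

# Half of the group scores zero, the other half scores perfectly.
half_and_half = [0] * 500 + [N_ITEMS] * 500

# Every possible score from 0 to N_ITEMS is equally common.
spread_evenly = [s for s in range(N_ITEMS + 1) for _ in range(100)]

# Bell-shaped scores: frequencies proportional to binomial(10, 0.5) weights.
bell_shaped = [s for s in range(N_ITEMS + 1) for _ in range(comb(N_ITEMS, s))]

# The largest variance any score bounded by 0 and N_ITEMS can have.
MAX_VARIANCE = N_ITEMS ** 2 / 4

if __name__ == "__main__":
    for name, scores in [
        ("half zero, half perfect", half_and_half),
        ("spread evenly", spread_evenly),
        ("bell shaped", bell_shaped),
    ]:
        print(f"{name:25s} variance = {float(pvariance(scores)):.2f}")
    print(f"upper bound for 0..{N_ITEMS} scores = {MAX_VARIANCE:.2f}")
```

Under these assumptions the dichotomous group reaches the upper bound on variance, even though it distinguishes only two levels of performance.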
Tucker (1946) provides an excellent analysis of the "inconsistencies between higher reliability and better measurement" (p. 1). He observes that "if the reliability of the items were increased to unity, all correlations between the items would also become unity and a person passing one item would pass all items and another failing one item would fail all items. Thus the only possible scores are a perfect score or one of zero... Is a dichotomy of scores the best that can be expected from a test with items of equal difficulty?" (p. 2). Using scaling theory (in current terminology, a two-parameter item response theory model based on the normal ogive and random normal probabilities), Tucker shows how increasing test reliability must lead to decreasing test validity.
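The pattern Tucker describes can be illustrated numerically. What follows is a minimal sketch under stated assumptions, not Tucker's own derivation: each of n equivalent items is passed when a standard normal latent response, loading a on a standard normal ability, exceeds zero (so every item has 50% difficulty); "reliability" is taken as the Spearman-Brown coefficient built from the inter-item correlation; and "validity" is taken as the correlation between the number-correct score and the latent ability. The function names and the 20-item test length are hypothetical.

```python
import math

ITEM_VARIANCE = 0.25  # variance of a dichotomous item passed by 50% of persons


def item_covariance(loading):
    """Covariance of two items whose latent responses correlate loading**2."""
    # For bivariate normal latent responses cut at zero:
    # P(both passed) = 1/4 + asin(rho) / (2*pi), so cov = asin(rho) / (2*pi).
    return math.asin(loading ** 2) / (2 * math.pi)


def total_score_variance(loading, n_items):
    """Variance of the number-correct score on n_items equivalent items."""
    return (n_items * ITEM_VARIANCE
            + n_items * (n_items - 1) * item_covariance(loading))


def reliability(loading, n_items):
    """Spearman-Brown reliability from the inter-item (phi) correlation."""
    r = item_covariance(loading) / ITEM_VARIANCE
    return n_items * r / (1 + (n_items - 1) * r)


def validity(loading, n_items):
    """Correlation of the number-correct score with the latent ability."""
    # Each item's covariance with ability is loading * phi(0) = loading / sqrt(2*pi).
    covariance = n_items * loading / math.sqrt(2 * math.pi)
    return covariance / math.sqrt(total_score_variance(loading, n_items))


if __name__ == "__main__":
    n_items = 20  # hypothetical test length
    for loading in (0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"loading={loading:.1f}  "
              f"reliability={reliability(loading, n_items):.3f}  "
              f"validity={validity(loading, n_items):.3f}")
```

In this sketch reliability rises steadily toward 1 as the loading increases, while validity rises, peaks at an intermediate loading, and then falls back toward the value for a two-point score distribution, which is the non-monotonic relation that Loevinger calls paradoxical.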
Laudan, Laudan and Donovan (1988) describe seven empirically testable hypotheses regarding how scientists react to data-dominated empirical anomalies and theory-dominated conceptual paradoxes. The reactions of psychometricians to the attenuation paradox of true-score theory provide instructive case studies of how scientists function. In the next columns, I will examine how measurement theorists of the 1940s, 1950s and today react to the Attenuation Paradox.
Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10(2), 79-91.
Laudan, R., Laudan, L., & Donovan, A. (1988). Testing theories of scientific change. In A. Donovan, L. Laudan, & R. Laudan (Eds.), Scrutinizing science: Empirical studies of scientific change (pp. 3-44). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493-504.
Thurstone, L.L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451.
Tucker, L.R. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11(1), 1-13.
What is The Attenuation Paradox?, G Engelhard Jr … Rasch Measurement Transactions, 1993, 6:4 p. 257
The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
The URL of this page is www.rasch.org/rmt/rmt64h.htm