Test-equating based upon False Premises

The Psychopathy Checklist-Revised (PCL-R: Hare, 1991) is a 20-item summated rating instrument with 0-1-2 response categories, completed by a trained rater either during an interview with the patient or from patient records. Cooke and Michie (1998) have attempted to equate diagnostic cut-off scores across countries using a 2-parameter IRT model.

Premise: Metric Equivalence = Meaning Equivalence

Quote #1: "As noted above, the presence of a common metric was ensured by anchoring the traits together using the 3 'anchor' items; that is, with the items with similar parameters in Scotland and North America. Using regression procedures, it was possible to demonstrate that the North American diagnostic cut-off score of 30 on the PCL-R North America is metrically equivalent to the diagnostic cut-off score of 25 in Scotland." (p. 30)

The cut-off score of 25 is justified on two grounds: "metric equivalence" and "inferred recidivism equivalence" (p. 40). In essence, metric equivalence is equated directly with meaning equivalence, which then permits a UK user to draw upon the extensive body of North American predictive validity results as an evidence base. This argument, however, requires, minimally, that the PCL-R makes equivalent measurement of the attribute "psychopathy" in the UK and US cultures.

Unfortunately, the use of a 2-parameter IRT model introduces two-dimensional measurement: individuals can differ not only in the amount of psychopathy, but perhaps also in something else that affects the linear measurement of the trait, such that items can discriminate differentially between individuals over different regions of the trait measure. Basically, some items can discriminate very well between high and low scorers on the trait, whereas other items barely discriminate at all. The issue here is that, if only one "thing" (e.g., psychopathy) is being measured by the items, then all items must, by definition, discriminate equally across the range of the trait. For the only way in which individuals can differ from one another is in how much or how little they possess of the trait, because the trait is being measured using equal-interval, additive measurement units. If the unit concatenation operation (arithmetic additivity) remains constant over the range of the trait, then items cannot differ in their discrimination, because such differences require that the unit of measurement changes its properties over the range of the scale, such that addition of units no longer fulfills at least the associativity axiom (Michell, 1990, p. 52) over all unit magnitudes on the scale.
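The role of the discrimination parameter can be made concrete with a small sketch. The 2PL response function below is the standard textbook form; the two items and their parameter values are invented for illustration, not taken from Cooke and Michie's analysis:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a positive
    response at trait level theta, with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty but very
# different discrimination (values are illustrative only).
sharp = dict(a=2.5, b=0.0)   # discriminates steeply near b
flat  = dict(a=0.4, b=0.0)   # barely discriminates anywhere

# Change in response probability across the same trait interval:
delta_sharp = p_2pl(0.5, **sharp) - p_2pl(-0.5, **sharp)
delta_flat  = p_2pl(0.5, **flat)  - p_2pl(-0.5, **flat)
```

Over the identical interval of the trait, the "sharp" item separates respondents far more strongly than the "flat" item; this differential separation over regions of the scale is precisely what a single, constant measurement unit forbids.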

Let's put this in terms of the measurement of length; that is, let us imagine that we are measuring length instead of psychopathy. We have a ruler (the trait scale = the psychopathy latent trait) that measures units of length. We measure objects with this ruler, and can then position these objects against our measurement scale (length) in order of magnitude of length (magnitude of psychopathy). The units (mm) on our ruler do not vary in width depending upon where they sit on the ruler (the lower range or the upper range). So each object's position, relative to every other object measured with our ruler, can be defined using linear, arithmetic concatenation of the fundamental unit of measurement, the mm (or what we assume is a fundamental unit of psychopathy).

But if, for some reason, we use a ruler whose mm units vary in width over the range of the scale, some objects are likely to be given the same length measurement, because two or more objects of actually different length now fall within the same "stretched" unit (measuring to the nearest mm). If we systematically stretch our units at the low end of the ruler, and compress them to near unit-width equality at the high end, then for objects of low real length (low scorers on the PCL-R?) we can barely discriminate between them using our "trait" ruler, whereas for high scorers we will be making much sharper discriminations. You can now see what has happened to our assumed "metric" unidimensional measurement in order to achieve this ersatz "measurement": the measured trait in an object varies not only as a function of the amount of trait it possesses, but also in relation to the relative position of that magnitude on the measurement scale. In short, objects fail to be discriminated solely as a function of the amount of trait they "possess".
This does not give one any confidence in attempting to linearly "equate" trait scores between the North American and UK samples.
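The stretched-ruler analogy can be simulated directly. The warp function below is arbitrary, chosen only so that units are wide (poorly resolving) at the low end and narrower at the high end; nothing here is fitted to PCL-R data:

```python
def warped_reading(true_length):
    """Read a true length off a ruler whose units are stretched
    at the low end, rounding to the nearest (warped) unit.
    The quadratic warp is an arbitrary illustration."""
    warped = true_length ** 2 / 10.0
    return round(warped)

low_pair  = (2.0, 2.2)    # two genuinely different low scorers
high_pair = (18.0, 18.2)  # two equally-spaced high scorers

low_readings  = [warped_reading(x) for x in low_pair]
high_readings = [warped_reading(x) for x in high_pair]
```

The two low objects, though genuinely different in length, collapse onto the same ruler reading, while the high pair, separated by exactly the same true difference, are distinguished. Rank order survives; interval information does not.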

Quote #2: "However, it was found that when the a (discrimination) parameters alone were constrained to be equal [across countries], then the model fitted well. This indicated that the a parameters were essentially equal and that items discriminate as well in Scotland as they do in North America. However, the variation in the b (difficulty, facility) parameters revealed that the level of the underlying trait at which the characteristics of the disorder become apparent differed in the two settings." (p. 28)

What we have in Quote #2 is an admission that the unit-width discrepancies are, in fact, near equal in both latent trait scales, but that the amount of trait reflected in each item differs between the two cultures. This is equivalent to saying that we are using the same ruler with unequal-width mm units in the two cultures, but that in the UK all units on the ruler are also shortened by some constant factor, such that the two rulers differ in overall length. The shortening is now assumed to be a perfect linear function of the North American ruler; hence, Cooke and Michie can then equate a score of 30 on the US "psychopathy ruler" to 25 on the UK one.
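The arithmetic of such a linking can be sketched with synthetic parameters. Only the structure, equal discriminations with a constant shift in the difficulties, mirrors the reported finding; the item values, the shift, and the crude "each 0-1-2 item contributes twice its 2PL probability" test characteristic curve are all invented for illustration:

```python
import math

def p(theta, a, b):
    """Standard 2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_total(theta, a, bs):
    # Crude sketch: each 0-1-2 item contributes 2 * P(theta).
    return sum(2.0 * p(theta, a, b) for b in bs)

A = 1.0                                        # equal discrimination
B_US = [-2.0 + 0.2 * i for i in range(20)]     # invented US difficulties
SHIFT = 0.6                                    # invented cultural b-shift
B_UK = [b + SHIFT for b in B_US]

# Bisect for the trait level at which the US expected score hits 30:
lo, hi = -6.0, 6.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if expected_total(mid, A, B_US) < 30.0:
        lo = mid
    else:
        hi = mid
theta_cut = (lo + hi) / 2.0

# The UK expected score at that same trait level falls below 30,
# which is the logic behind equating 30 to a lower UK cut-off.
uk_equivalent = expected_total(theta_cut, A, B_UK)
```

With every UK difficulty shifted upward by a constant, the same trait level necessarily yields a lower expected UK score, so the equating reduces to subtracting an approximately constant number of score points. The sketch shows only that the arithmetic is coherent given the model; the article's point is that the model's measurement assumptions are what is in question.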

The problem here is that we have now lost all connection with measurement, and are in some strange land where units not only change their width as a function of magnitude, but also linearly change their unequal widths as a function of culture. Surely, the only sensible and honest way to use the PCL-R raw or IRT "scores" is as ordinal magnitudes, where only ordinal relations between different magnitudes hold.
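The ordinal-only claim can be illustrated directly: any strictly increasing "stretch" of a score scale preserves rank order but destroys interval information. The scores and the squaring transformation below are arbitrary, chosen only for illustration:

```python
scores = [5, 10, 25, 30]   # hypothetical raw scores; the two gaps
                           # (5->10 and 25->30) are equal: 5 points each

# An arbitrary strictly increasing (nonlinear) rescaling.
stretched = [s ** 2 for s in scores]

# Rank order is unchanged by any monotone transformation...
order_before = sorted(range(len(scores)), key=lambda i: scores[i])
order_after  = sorted(range(len(stretched)), key=lambda i: stretched[i])

# ...but the two equal raw-score differences are no longer equal:
gap_low  = stretched[1] - stretched[0]
gap_high = stretched[3] - stretched[2]
```

Only conclusions that rely on order ("this patient scores higher than that one") survive such a rescaling; any conclusion relying on equal intervals, including the subtraction of a fixed 5 points, does not.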

So, I would conclude that the "metric equivalence" argument justifying the simple subtraction of 5 score points from a North American cut-off score is flawed to some unknown degree, being predicated on an IRT model that uses two parameters to make ostensibly unidimensional measurement.

Premise: Numbers = Meaning

Finally, in the above I have ignored the a priori specification of the meaning of psychopathy and its rules for instantiation. I note that Salekin et al. (1996) refer to the PCL-R as a polythetic model with "more than 15,000 possible variations of psychopathy for scores equal to or greater than 30 (Rogers, 1995)". This test badly needs a Rasch model analysis to help sort out both its measurement and its supposed "polythetic" nature! The "polythetic" adjective seems more an excuse for clinicians' unwillingness to think clearly about the meaning instantiation and subsequent measurement of their constructs than a serious, meaningful, construct-definitional adjective.
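The scale of this polythetic variation is easy to verify combinatorially. The short dynamic-programming count below enumerates the distinct 0-1-2 response patterns over 20 items whose total reaches 30 or more; this is a direct count of response patterns, which need not be the quantity behind Rogers' (1995) figure of 15,000, but it shows that figure is, if anything, conservative:

```python
# Count the distinct 20-item response patterns (each item scored
# 0, 1, or 2) with total >= 30, via dynamic programming over the
# distribution of total scores.
N_ITEMS, CUTOFF = 20, 30

counts = [1]                       # zero items: one empty pattern
for _ in range(N_ITEMS):
    new = [0] * (len(counts) + 2)
    for s, c in enumerate(counts):
        for step in (0, 1, 2):     # each item adds 0, 1, or 2
            new[s + step] += c
    counts = new                   # counts[s] = patterns totalling s

patterns_at_or_above_cutoff = sum(counts[CUTOFF:])
```

Many qualitatively different item profiles thus reach the diagnostic cut-off, which is exactly the meaning-instantiation problem the "polythetic" label papers over.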

Paul Barrett, The State Hospital (Carstairs), and University of Liverpool, UK

Cooke, D.J., Michie, C. (1998) Psychopathy across cultures. In Cooke, D.J., Forth, A.E., Hare, R.D. (Eds.), Psychopathy: Theory, Research, and Implications for Society. Kluwer Academic Publishers.

Hare, R. (1991) Hare Psychopathy Checklist, Revised. New York: Multi-Health Systems Inc.

Michell, J. (1990) An Introduction to the Logic of Psychological Measurement. Lawrence Erlbaum.

Rogers, R. (1995) Diagnostic and Structured Interviewing. Odessa, FL: Psychological Assessment Resources.

Salekin, R.T., Rogers, R., Sewell, K.W. (1996) A review and meta-analysis of the Psychopathy Checklist and Psychopathy Checklist-Revised: predictive validity of dangerousness. Clinical Psychology: Science and Practice, 3(3), 203-215.

Test-equating based upon false premises. Barrett P. … Rasch Measurement Transactions, 2000, 14:1 p.732



