Test-equating based upon False Premises

The Psychopathy Checklist-Revised (PCL-R: Hare, 1991) is a 20-item summated rating instrument with 0-1-2 response categories, completed by a trained rater either during an interview with the patient or from patient records. Cooke and Michie (1998) have attempted to equate diagnostic cut-off scores across countries using a 2-parameter IRT model.

Premise: Metric Equivalence = Meaning Equivalence

Quote #1: "As noted above, the presence of a common metric was ensured by anchoring the traits together using the 3 'anchor' items; that is, with the items with similar parameters in Scotland and North America. Using regression procedures, it was possible to demonstrate that the North American diagnostic cut-off score of 30 on the PCL-R North America is metrically equivalent to the diagnostic cut-off score of 25 in Scotland." (p. 30)

The cut-off score of 25 is justified on two grounds: "metric equivalence" and "inferred recidivism equivalence" (p. 40). In essence, metric equivalence is equated directly with meaning equivalence, which then permits a UK user to draw upon the extensive body of North American predictive validity results as an evidence base. This argument, however, requires, minimally, that the PCL-R makes equivalent measurement of the attribute "psychopathy" in the UK and US cultures.

Unfortunately, the use of a 2-parameter IRT model introduces two-dimensional measurement: individuals can differ not only in the amount of psychopathy, but perhaps also in something else that affects the linear measurement of the trait, such that items can discriminate differentially between individuals over different regions of the trait measure. Basically, some items can discriminate very well between high and low scorers on the trait, whereas other items barely discriminate at all. The issue here is that, if only one "thing" (e.g., psychopathy) is being measured by the items, then all items must, by definition, discriminate equally across the range of the trait. For the only way in which individuals can differ from one another is in how much or how little they possess of the trait, because the trait is being measured using equal-interval, additive measurement units. If the unit concatenation operation (arithmetic additivity) remains constant over the range of the trait, then items cannot differ in their discrimination, because such differences require that the unit of measurement changes its properties over the range of the scale, such that addition of units no longer fulfills at least the associativity axiom (Michell, 1990, p. 52) over all unit magnitudes on the scale.
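The role of the discrimination parameter can be made concrete with a small sketch. The 2PL response function below is the standard textbook form; the two items and their parameter values are invented for illustration, not taken from Cooke and Michie's analysis:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a positive
    response at trait level theta, with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty but very
# different discrimination (values are illustrative only).
sharp = dict(a=2.5, b=0.0)   # discriminates steeply near b
flat  = dict(a=0.4, b=0.0)   # barely discriminates anywhere

# Change in response probability across the same trait interval:
delta_sharp = p_2pl(0.5, **sharp) - p_2pl(-0.5, **sharp)
delta_flat  = p_2pl(0.5, **flat)  - p_2pl(-0.5, **flat)
```

Over the identical interval of the trait, the "sharp" item separates respondents far more strongly than the "flat" item; this differential separation over regions of the scale is precisely what a single, constant measurement unit forbids.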

Let's put this in terms of the measurement of length; that is, let us imagine that we are measuring length instead of psychopathy. We have a ruler (the trait scale = the psychopathy latent trait) that measures units of length. We measure objects with this ruler, and can then position these objects against our measurement scale (length) in order of magnitude of length (magnitude of psychopathy). The units (mm) on our ruler do not vary in width depending upon where they sit on the ruler (the lower range or the upper range). So each object's position, relative to every other object measured with our ruler, can be defined using linear, arithmetic concatenation of the fundamental unit of measurement, the mm (or what we assume is a fundamental unit of psychopathy).

But if, for some reason, we use a ruler whose mm units vary in width over the range of the scale, some objects are likely to be given the same length measurement, because two or more objects of actually different length now fall within the same "stretched" unit (measuring to the nearest mm). If we systematically stretch our units at the low end of the ruler, and compress them to near unit-width equality at the high end, then for objects of low real length (low scorers on the PCL-R?) we can barely discriminate between them using our "trait" ruler, whereas for high scorers we will be making much sharper discriminations. You can now see what has happened to our assumed "metric" unidimensional measurement in order to achieve this ersatz "measurement": the measured trait in an object varies not only as a function of the amount of trait it possesses, but also in relation to the relative position of that magnitude on the measurement scale. In short, objects fail to be discriminated solely as a function of the amount of trait they "possess".
This does not give one any confidence in attempting to linearly "equate" trait scores between the North American and UK samples.
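The stretched-ruler analogy can be simulated directly. The warp function below is arbitrary, chosen only so that units are wide (poorly resolving) at the low end and narrower at the high end; nothing here is fitted to PCL-R data:

```python
def warped_reading(true_length):
    """Read a true length off a ruler whose units are stretched
    at the low end, rounding to the nearest (warped) unit.
    The quadratic warp is an arbitrary illustration."""
    warped = true_length ** 2 / 10.0
    return round(warped)

low_pair  = (2.0, 2.2)    # two genuinely different low scorers
high_pair = (18.0, 18.2)  # two equally-spaced high scorers

low_readings  = [warped_reading(x) for x in low_pair]
high_readings = [warped_reading(x) for x in high_pair]
```

The two low objects, though genuinely different in length, collapse onto the same ruler reading, while the high pair, separated by exactly the same true difference, are distinguished. Rank order survives; interval information does not.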

Quote #2: "However, it was found that when the a (discrimination) parameters alone were constrained to be equal [across countries], then the model fitted well. This indicated that the a parameters were essentially equal and that items discriminate as well in Scotland as they do in North America. However, the variation in the b (difficulty, facility) parameters revealed that the level of the underlying trait at which the characteristics of the disorder become apparent differed in the two settings." (p. 28)

What we have in Quote #2 is an admission that the unit-width discrepancies are, in fact, near equal in both latent trait scales, but that the amount of trait reflected in each item differs between the two cultures. This is equivalent to saying that we are using the same ruler with unequal-width mm units in the two cultures, but that in the UK all units on the ruler are also shortened by some constant factor, such that the two rulers differ in overall length. The shortening is now assumed to be a perfect linear function of the North American ruler; hence, Cooke and Michie can then equate a score of 30 on the US "psychopathy ruler" to 25 on the UK one.
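The arithmetic of such a linking can be sketched with synthetic parameters. Only the structure, equal discriminations with a constant shift in the difficulties, mirrors the reported finding; the item values, the shift, and the crude "each 0-1-2 item contributes twice its 2PL probability" test characteristic curve are all invented for illustration:

```python
import math

def p(theta, a, b):
    """Standard 2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_total(theta, a, bs):
    # Crude sketch: each 0-1-2 item contributes 2 * P(theta).
    return sum(2.0 * p(theta, a, b) for b in bs)

A = 1.0                                        # equal discrimination
B_US = [-2.0 + 0.2 * i for i in range(20)]     # invented US difficulties
SHIFT = 0.6                                    # invented cultural b-shift
B_UK = [b + SHIFT for b in B_US]

# Bisect for the trait level at which the US expected score hits 30:
lo, hi = -6.0, 6.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if expected_total(mid, A, B_US) < 30.0:
        lo = mid
    else:
        hi = mid
theta_cut = (lo + hi) / 2.0

# The UK expected score at that same trait level falls below 30,
# which is the logic behind equating 30 to a lower UK cut-off.
uk_equivalent = expected_total(theta_cut, A, B_UK)
```

With every UK difficulty shifted upward by a constant, the same trait level necessarily yields a lower expected UK score, so the equating reduces to subtracting an approximately constant number of score points. The sketch shows only that the arithmetic is coherent given the model; the article's point is that the model's measurement assumptions are what is in question.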

The problem here is that we have now lost all connection with measurement, and are in some strange land where units not only change their width as a function of magnitude, but also linearly change their unequal widths as a function of culture. Surely, the only sensible and honest way to use the PCL-R raw or IRT "scores" is as ordinal magnitudes, where only ordinal relations between different magnitudes hold.
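The ordinal-only claim can be illustrated directly: any strictly increasing "stretch" of a score scale preserves rank order but destroys interval information. The scores and the squaring transformation below are arbitrary, chosen only for illustration:

```python
scores = [5, 10, 25, 30]   # hypothetical raw scores; the two gaps
                           # (5->10 and 25->30) are equal: 5 points each

# An arbitrary strictly increasing (nonlinear) rescaling.
stretched = [s ** 2 for s in scores]

# Rank order is unchanged by any monotone transformation...
order_before = sorted(range(len(scores)), key=lambda i: scores[i])
order_after  = sorted(range(len(stretched)), key=lambda i: stretched[i])

# ...but the two equal raw-score differences are no longer equal:
gap_low  = stretched[1] - stretched[0]
gap_high = stretched[3] - stretched[2]
```

Only conclusions that rely on order ("this patient scores higher than that one") survive such a rescaling; any conclusion relying on equal intervals, including the subtraction of a fixed 5 points, does not.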

So, I would conclude that the "metric equivalence" argument justifying the simple subtraction of 5 score points from a North American cut-off score is flawed to some unknown degree, being predicated on an IRT model that uses two parameters to make ostensibly unidimensional measurement.

Premise: Numbers = Meaning

Finally, in the above I have ignored the a priori specification of the meaning of psychopathy and its rules for instantiation. I note that Salekin et al. (1996) refer to the PCL-R as a polythetic model with "more than 15,000 possible variations of psychopathy for scores equal to or greater than 30 (Rogers, 1995)". This test badly needs a Rasch model analysis to help sort out both its measurement and its supposed "polythetic" nature! The "polythetic" adjective seems more an excuse for clinicians' unwillingness to think clearly about the meaning instantiation and subsequent measurement of their constructs than a serious, meaningful, construct-definitional adjective.
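The scale of this polythetic variation is easy to verify combinatorially. The short dynamic-programming count below enumerates the distinct 0-1-2 response patterns over 20 items whose total reaches 30 or more; this is a direct count of response patterns, which need not be the quantity behind Rogers' (1995) figure of 15,000, but it shows that figure is, if anything, conservative:

```python
# Count the distinct 20-item response patterns (each item scored
# 0, 1, or 2) with total >= 30, via dynamic programming over the
# distribution of total scores.
N_ITEMS, CUTOFF = 20, 30

counts = [1]                       # zero items: one empty pattern
for _ in range(N_ITEMS):
    new = [0] * (len(counts) + 2)
    for s, c in enumerate(counts):
        for step in (0, 1, 2):     # each item adds 0, 1, or 2
            new[s + step] += c
    counts = new                   # counts[s] = patterns totalling s

patterns_at_or_above_cutoff = sum(counts[CUTOFF:])
```

Many qualitatively different item profiles thus reach the diagnostic cut-off, which is exactly the meaning-instantiation problem the "polythetic" label papers over.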

Paul Barrett, The State Hospital (Carstairs), and University of Liverpool, UK

Cooke, D.J., Michie, C. (1998) Psychopathy across cultures. In Cooke, D.J., Forth, A.E., Hare, R.D. (Eds.), Psychopathy: Theory, Research, and Implications for Society. Kluwer Academic Publishers.

Hare, R. (1991) Hare Psychopathy Checklist, Revised. New York: Multi-Health Systems Inc.

Michell, J. (1990) An Introduction to the Logic of Psychological Measurement. Lawrence Erlbaum.

Rogers, R. (1995) Diagnostic and Structured Interviewing. Odessa, FL: Psychological Assessment Resources.

Salekin, R.T., Rogers, R., Sewell, K.W. (1996) A review and meta-analysis of the Psychopathy Checklist and Psychopathy Checklist-Revised: predictive validity of dangerousness. Clinical Psychology: Science and Practice, 3(3), 203-215.

Test-equating based upon false premises. Barrett P. … Rasch Measurement Transactions, 2000, 14:1 p.732



