Comparing and Choosing between "Partial Credit Models" (PCM) and "Rating Scale Models" (RSM)

"I'd like to compare a Rasch rating scale model and a partial credit model with the same data. Are there ways to compare the two models?"

A "rating scale" model is one in which all items (or groups of items) share the same rating scale structure. A "partial credit" model is one in which each item has a unique rating scale structure. Moving from the rating scale model to the partial credit model increases the number of free parameters to be estimated by (L-1)*(m-2), where L is the number of items and m is the number of categories in the rating scale. A statistician might reply "Comparing models is easy! Compute the difference between the chi-squares for the models. Then test the hypothesis that the partial credit model fits no better than the rating scale model." A measurement practitioner might suggest "Compare the sample separation indices. Has there been a noticeable improvement in measure discrimination?" In practice, however, estimate stability and the communication of useful results both push towards fewer rating scale parameters.
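The parameter count and the statistician's chi-square difference can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, the counting convention (one constraint fixing the origin, and thresholds summing to zero within each shared scale) is an assumption, and each model's chi-square is taken to be -2 times its log-likelihood, which is one common choice.

```python
def free_parameters_rsm(n_items: int, n_categories: int) -> int:
    """Item-side free parameters when all items share one rating scale.

    Assumed convention: L - 1 free item difficulties (the origin fixes one)
    plus m - 2 free thresholds (m - 1 thresholds constrained to sum to zero).
    """
    return (n_items - 1) + (n_categories - 2)


def free_parameters_pcm(n_items: int, n_categories: int) -> int:
    """Item-side free parameters when every item has its own scale.

    Assumed convention: m - 1 thresholds per item, minus one constraint
    that fixes the origin of the measurement scale.
    """
    return n_items * (n_categories - 1) - 1


def extra_parameters(n_items: int, n_categories: int) -> int:
    """Additional parameters PCM estimates beyond RSM: (L - 1) * (m - 2)."""
    return (free_parameters_pcm(n_items, n_categories)
            - free_parameters_rsm(n_items, n_categories))


def likelihood_ratio_chi_square(loglik_rsm: float, loglik_pcm: float) -> float:
    """Difference between the two models' chi-squares (-2 * log-likelihood)."""
    return 2.0 * (loglik_pcm - loglik_rsm)


if __name__ == "__main__":
    # Illustrative sizes only: 25 items, 4 categories.
    print(extra_parameters(25, 4))  # 48, i.e. (25 - 1) * (4 - 2)
```

The chi-square difference would then be referred to a chi-square distribution with `extra_parameters(L, m)` degrees of freedom. Person parameters are left out of the count because they are the same in both models and cancel in the difference.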

Estimate stability. With the partial credit model, there may be only a few, or maybe no, observations in some categories of some items. The difficulty estimates of those items will be insecure, and a weak basis for inference (which is why we are bothering to do this analysis, isn't it?). A slightly better fit of the data to the model is of no value if it does not lead to a stronger basis for inference. [At least 10 ratings per category are recommended.]
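A quick check of the 10-ratings guideline can be sketched as below. The data layout (a dictionary mapping each item label to its list of ratings) and the function name are assumptions made for illustration. The ratings in the demo are made up.

```python
from collections import Counter

MIN_RATINGS = 10  # recommended minimum number of ratings per category


def sparse_categories(responses, categories, min_ratings=MIN_RATINGS):
    """Return {item: {category: count}} for categories below min_ratings.

    responses: dict mapping item label -> list of ratings (None = missing).
    categories: every category the rating scale is intended to have.
    """
    flagged = {}
    for item, ratings in responses.items():
        counts = Counter(r for r in ratings if r is not None)
        # Counter returns 0 for categories that were never observed.
        low = {c: counts[c] for c in categories if counts[c] < min_ratings}
        if low:
            flagged[item] = low
    return flagged


if __name__ == "__main__":
    # Made-up ratings on a 0-3 scale for two items.
    demo = {
        "item_1": [0] * 12 + [1] * 20 + [2] * 18 + [3] * 12,
        "item_2": [0] * 3 + [1] * 25 + [2] * 34,
    }
    print(sparse_categories(demo, categories=[0, 1, 2, 3]))
    # {'item_2': {0: 3, 3: 0}}
```

Items that appear in the result are the ones whose partial credit estimates would rest on thin evidence.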

Communication of useful results. If all the items or groups of items use the same response format, e.g., Strongly Disagree, Disagree, Agree, Strongly Agree, then the test constructors, respondents, and test users all perceive those items to share the same rating scale. To attempt to explain a separate parameterization for the rating-scale structure of each item would be an exercise in futility. It may be that one or two items clearly have unique structures. For instance, they may be badly-worded "true-false" item stems with Likert responses. Model most items to share the common rating scale(s), and allow just those very few idiosyncratic items to have their own scales. Even then, expect your audience to stumble as you try to explain your construct and your substantive findings. It will probably be more productive simply to omit those aberrant items.

Comparing and Choosing between "Partial Credit Models" (PCM) and "Rating Scale Models" (RSM), Linacre, J.M. … Rasch Measurement Transactions, 2000, 14:3 p.768

PCM vs. RSM is a difficult decision. Here are some considerations:

1. Design of the items.
If items are obviously intended to share the same rating scale (e.g., Likert agreement) then the Rating Scale Model (RSM) or Group Rating Scale Model is indicated, and it requires strong evidence for us to use a Partial Credit Model (PCM).
But if each item is designed to have a different rating scale, then PCM (or grouped-items) is indicated, and it requires strong evidence for us to use RSM.

2. Communication with the audience.
It is difficult to communicate many small variants of the functioning of the same substantive rating scale (e.g., a Likert scale). So the differences between items must be big enough to merit all the extra effort.
Sometimes this difference requires recoding of the rating scale for some items. A typical situation is where the item does not match the response options, such as when a Yes/No item "Are you a smoker?" has Likert-scale responses: "SD, D, N, A, SA".

3. Size of the dataset.
If fewer than 10 observations fall in a category used for estimation, then the estimation is not robust against accidents in the data. In most datasets, RSM does not have this problem. But in a dataset of items with 4 response categories and only 62 measured persons, some items may well have fewer than 10 observations in some categories. This may be evidence against PCM.
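The arithmetic behind the 62-person example can be sketched as follows, assuming complete data (every person responds to every item). The function names are hypothetical.

```python
def mean_count_per_category(n_persons: int, n_categories: int) -> float:
    """Average ratings per category for one item if every person responds."""
    return n_persons / n_categories


def min_share_for_threshold(n_persons: int, min_count: int = 10) -> float:
    """Share of an item's responses a category needs to reach min_count."""
    return min_count / n_persons


if __name__ == "__main__":
    print(mean_count_per_category(62, 4))        # 15.5
    print(round(min_share_for_threshold(62), 3))  # 0.161
```

With an average of only 15.5 ratings per category, any category that attracts less than about 16% of an item's responses falls below 10 observations under PCM. Under RSM the same category is pooled across all items that share the scale, so its count is usually much larger.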

4. Construct and Predictive Validity.
Compare the two sets of item difficulties, and also the two sets of person abilities.
Is there any meaningful difference? If not, use RSM.
If there is a meaningful difference, which analysis is more meaningful?
For instance, PCM of the "Liking for Science" data loses its meaning because not all categories are observed for all items. This invalidates the item difficulty hierarchy for PCM.
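One way to start the comparison of the two sets of estimates is sketched below. The function names and the 0.5-logit flagging value are illustrative assumptions, not recommendations from the text. What counts as a "meaningful" difference remains a substantive judgment. The measures in the demo are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of measures."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_estimates(rsm, pcm, flag_logits=0.5):
    """Compare measures (in logits) from the RSM and PCM analyses.

    rsm, pcm: dicts mapping item (or person) label -> measure.
    Returns the correlation and the PCM-minus-RSM shifts whose absolute
    size is at least flag_logits (an assumed, adjustable cut-off).
    """
    labels = sorted(set(rsm) & set(pcm))
    x = [rsm[k] for k in labels]
    y = [pcm[k] for k in labels]
    shifts = {k: pcm[k] - rsm[k] for k in labels}
    flagged = {k: d for k, d in shifts.items() if abs(d) >= flag_logits}
    return pearson(x, y), flagged


if __name__ == "__main__":
    # Made-up item difficulties from the two analyses.
    rsm_items = {"item_1": -1.20, "item_2": 0.10, "item_3": 1.10}
    pcm_items = {"item_1": -1.15, "item_2": 0.85, "item_3": 1.05}
    correlation, big_shifts = compare_estimates(rsm_items, pcm_items)
    print(correlation, big_shifts)
```

The same function can be applied to the two sets of person abilities. A cross-plot of the two sets of measures conveys the same information visually.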

5. Fit considerations.
AIC is a global fit indicator, which is good. But global-fit also means that an increase in noise in one part of the data can be masked by an increase in dependency in another part of the data. In Rasch analysis, underfit (excess non-modeled noise) is a much greater threat to the validity of the measures than overfit (over-predictability, dependency). So we need to monitor parameter-level fit statistics along with global fit statistics.
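For reference, AIC is computed from a model's log-likelihood and its number of estimated parameters, with lower values preferred. The sketch below uses made-up log-likelihoods and parameter counts. Which parameters are counted depends on the estimation method, so the counts here are assumptions.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * n_parameters - 2 * log_likelihood


if __name__ == "__main__":
    # Made-up values: PCM fits slightly better but uses 48 more parameters.
    aic_rsm = aic(log_likelihood=-5230.0, n_parameters=26)
    aic_pcm = aic(log_likelihood=-5200.0, n_parameters=74)
    print("RSM", aic_rsm, "PCM", aic_pcm)
    print("preferred by AIC:", "RSM" if aic_rsm <= aic_pcm else "PCM")
```

Because a single number summarizes the whole dataset, it cannot show where noise and dependency offset each other, which is why item-level and person-level fit statistics still need to be inspected.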

Global log-likelihood is intended to compare descriptive (explanatory) models, for which the model fits the data. Rasch models are prescriptive models for which the data fit the model. We "prescribe" the Rasch model with the properties we need for our measures to be useful, and then fit the data to that model. If the fit is poor, then the data are deficient. We need better data (not a better model) for the purposes for which our data are intended.

6. Constructing new items.
With the partial-credit model, the thresholds of the rating scale for a new item are unknown in advance of data collection. With the rating-scale model, a new item is expected to share the current thresholds, already estimated from the existing items.
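The practical consequence can be sketched with the category probabilities of the Rasch polytomous model, in which the log-odds of category x over category x-1 are (ability - difficulty - threshold x). Under the rating-scale model, a new item needs only a target difficulty, because the thresholds are carried over. The threshold values and item difficulty below are made up, and the function name is hypothetical.

```python
import math


def category_probabilities(ability, difficulty, thresholds):
    """Probabilities of categories 0..m-1 for one person on one item.

    thresholds: the m - 1 threshold values F_1..F_{m-1} in logits.
    The log-numerator of category x is the sum over k <= x of
    (ability - difficulty - F_k); category 0 has log-numerator 0.
    """
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + (ability - difficulty - f))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Made-up thresholds shared by all items under the rating-scale model.
    shared_thresholds = [-1.5, 0.0, 1.5]
    # A new item with an intended difficulty of 0.4 logits (hypothetical).
    probs = category_probabilities(ability=1.0, difficulty=0.4,
                                   thresholds=shared_thresholds)
    print([round(p, 3) for p in probs])
```

Under the partial-credit model, the `thresholds` argument for a new item would have no values until data on that item had been collected and analyzed.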

7. Unobserved categories.
With the partial-credit model, unobserved (in this sample) categories for an item distort the rating-scale structure. With the rating-scale model, the functioning of an unobserved category for one item is inferred from observations of the same category for other items.
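The contrast can be sketched by listing the categories each item never uses and then pooling the counts across items, as the rating-scale model effectively does. The data layout and function names are assumptions, and the ratings in the demo are made up.

```python
from collections import Counter


def unobserved_by_item(responses, categories):
    """Return {item: [categories with no observations]} for affected items."""
    missing = {}
    for item, ratings in responses.items():
        seen = set(ratings)
        gaps = [c for c in categories if c not in seen]
        if gaps:
            missing[item] = gaps
    return missing


def pooled_counts(responses, categories):
    """Category counts pooled over all items that share the rating scale."""
    pooled = Counter()
    for ratings in responses.values():
        pooled.update(ratings)
    return {c: pooled[c] for c in categories}


if __name__ == "__main__":
    # Made-up ratings on a 0-2 scale; item_2 never uses category 2.
    demo = {
        "item_1": [0, 1, 1, 2, 2, 2],
        "item_2": [0, 0, 1, 1, 1, 1],
    }
    print(unobserved_by_item(demo, [0, 1, 2]))  # {'item_2': [2]}
    print(pooled_counts(demo, [0, 1, 2]))       # {0: 3, 1: 6, 2: 3}
```

An item listed by `unobserved_by_item` has no information of its own about the missing category, while the pooled counts show what the shared scale can borrow from the other items.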


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
