Using a Partial-Credit Rasch Model to Detect Social Desirability Bias

In public health, many causes of disease, disability and death are preventable through behavior change, so there is a strong reliance on designing and evaluating prevention-focused interventions. In these evaluations, researchers rely heavily on scales to assess complex variables including knowledge, attitudes and even behavior. Sensitive topics or behaviors are often measured using these self-reported scales. Under Classical Test Theory (CTT), and using traditional statistical tests (e.g. t-tests on raw scores), evaluators assume that Likert scales have equal-interval properties without ever systematically assessing whether the scales truly fit that assumption. The danger is that if respondents are not using the rating scale categories in the hypothesized manner, results obtained from these analyses may be misleading or incorrect.

To illustrate this challenge, data from the longitudinal evaluation of the Americorps Program were used (CNS, 2004). Among the immediate hypothesized outcomes of the Americorps program is Awareness of Others/Diversity, which is influenced both by specific educational activities offered by the program and by the diversity of the Corps itself. The program hopes to improve participants' understanding of diverse cultures and backgrounds, and their appreciation of the value of diverse people and opinions. The evaluation of the program included an 11-item Appreciation of Ethnic and Cultural Diversity Scale to assess change in this latent variable.

While many interventions have a similar desired impact on appreciation of diversity and other potentially sensitive topics, measurement of this latent variable is fraught with challenges. For one, social norms affect respondent behavior on self-reported surveys, even those that employ validated scales. The influence of these norms on respondent behavior in self-administered surveys is known as Social Desirability Bias (e.g. Nederhof, 1985). Those assessing variables that are highly susceptible to social norms should be particularly interested in detecting whether Social Desirability Bias is at work in their sample.

A partial credit FACETS model was run using the data from this evaluation (n=4,016). The model included three facets (respondents, items and time period/group) to account for the design of the original evaluation, which included a pre-test and post-test for both Americorps members and a comparison group. Overall fit for the model varied across the three facets (Table 1). While the mean OUTFIT mean-square for respondents was 1.08, close to its expected value of 1.0, the S.D. was somewhat larger than is typically encountered for well-behaved data. Eight percent of respondents had alarmingly large mean-square values over 2.0, forcing another 12% to have mean-square values less than 0.5. This misfit prompted an investigation into whether Social Desirability Bias may have influenced these people's responses.
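To make the fit statistic concrete, the following is a minimal Python sketch of how an OUTFIT mean-square could be computed for one respondent under a partial credit model. It is not the FACETS implementation. It assumes that person measures and item thresholds have already been estimated, it folds the time period/group facet into the person measure, and the function names (pcm_probs, outfit_mean_square) and threshold values are hypothetical.

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities for one item under a partial credit model.

    theta: person measure in logits.
    thresholds: step difficulties for categories 1..m (category 0 has none).
    """
    # Running sums of (theta - threshold) give the log-numerators per category.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    num = [math.exp(v - peak) for v in log_num]
    total = sum(num)
    return [v / total for v in num]

def outfit_mean_square(theta, responses, item_thresholds):
    """Unweighted mean of squared standardized residuals across items."""
    z_squared = []
    for x, thresholds in zip(responses, item_thresholds):
        probs = pcm_probs(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        z_squared.append((x - expected) ** 2 / variance)
    return sum(z_squared) / len(z_squared)

# Hypothetical thresholds for three five-category items (not from the evaluation).
item_thresholds = [[-1.5, -0.5, 0.5, 1.5]] * 3
expected_pattern = outfit_mean_square(1.0, [3, 3, 2], item_thresholds)
surprising_pattern = outfit_mean_square(1.0, [0, 4, 0], item_thresholds)
```

In this sketch, a response string that contradicts the person's measure (surprising_pattern) produces a mean-square well above 1.0, while responses close to expectation (expected_pattern) produce a value below 1.0.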

The items were asked using a five-point Likert-type scale. A FACETS model was used to estimate the mean "ability" (location on the latent variable) of those who responded in each category of each item's rating scale. If the mean abilities are disordered, this could indicate that respondents did not treat the rating scale as strictly monotonic, resulting in empirically disordered categories. Consequently, there would be reason to believe that Social Desirability Bias may have skewed the use of the rating scales.
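The check described here can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the FACETS procedure: person measures are taken as given, categories are coded 0 to 4, and the function names and data values are hypothetical.

```python
def category_mean_abilities(abilities, responses, n_categories):
    """Mean person measure (logits) of the respondents who chose each category.

    Returns None for a category that nobody selected.
    """
    sums = [0.0] * n_categories
    counts = [0] * n_categories
    for theta, category in zip(abilities, responses):
        sums[category] += theta
        counts[category] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def disordered_steps(mean_abilities):
    """Pairs of adjacent observed categories where the mean ability goes down."""
    observed = [(k, m) for k, m in enumerate(mean_abilities) if m is not None]
    return [
        (k_low, k_high)
        for (k_low, m_low), (k_high, m_high) in zip(observed, observed[1:])
        if m_high < m_low
    ]

# Made-up person measures and responses to a single item (not study data).
abilities = [1.0, 0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
responses = [0, 0, 1, 1, 2, 3, 4, 4]
means = category_mean_abilities(abilities, responses, n_categories=5)
flags = disordered_steps(means)
```

Here the lowest category has a higher mean ability than the category above it, so the pair (0, 1) is flagged as disordered.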

Figure 1 shows three examples of the distribution of ability estimates for each of the five rating scale options. For example, for the second item the ability estimate that corresponds with Strongly Disagree is 0.83 logits, while the ability estimate for Strongly Agree is 2.74 logits. Six of the eleven items show a similar disordering of the rating scale categories. In other words, for those items there is at least one point in the rating scale where the ability estimate that corresponds to the category goes down while the category goes up. This indicates that respondents chose a higher rating for those items than their actual ability would warrant, a sign that Social Desirability Bias is likely in play.

Using these methods to detect Social Desirability Bias may also provide opportunities to correct the analysis plan. Replacing rating scale category values (e.g. 1 for Strongly Disagree) with the estimated ability from the Rasch model, for example, will allow the analysis to take into account the disordered ratings of participants. This allows researchers to account for the impact of inaccurate self-assessment without altering the format of the scale itself.
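As a rough illustration of this recoding, the Python sketch below replaces each raw category code with the mean ability estimated for that category of that item. The function name, the 0-based category coding and all measure values are assumptions made for the example. In practice the category measures would come from the Rasch output.

```python
def recode_with_measures(responses, category_measures):
    """Replace each raw category code with that category's estimated measure.

    responses: one person's category codes (0-based), one per item;
               None marks a missing response and is passed through.
    category_measures: per item, the estimated measure (logits) of each category.
    """
    recoded = []
    for item, category in enumerate(responses):
        if category is None:
            recoded.append(None)
        else:
            recoded.append(category_measures[item][category])
    return recoded

# Hypothetical category measures for two five-category items (not study data).
category_measures = [
    [-1.0, -0.5, 0.0, 0.5, 1.0],   # ordered item
    [0.5, -0.25, 0.0, 0.75, 1.5],  # disordered item: category 0 sits above category 1
]
raw = [0, 0]  # "Strongly Disagree" on both items
recoded = recode_with_measures(raw, category_measures)
```

With these made-up values, the same raw response contributes a low value on the ordered item but a moderately high value on the disordered one, which is the adjustment the recoding is meant to capture.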

Using CTT and t-tests on raw scores for these items assumes that Strongly Disagree is the lowest rating, but these results show that for some items Disagree represented the lowest rating. In the Americorps evaluation, participants from certain programs performed significantly worse on this variable at follow-up than at baseline. The evaluators concluded that program-related experiences "may [have led] to short-term disillusion with the concept of working in diverse groups." This analysis, however, indicates that an alternate approach that incorporates the ability estimates for each item's rating scale may provide a more accurate estimate of the program's impact, one that is less affected by Social Desirability Bias.

Nederhof, A.J. (1985). Methods of coping with social desirability bias: a review. European Journal of Social Psychology, 15, 263-280.

Corporation for National and Community Service (CNS), Office of Research and Policy Development. (2004). Serving Country and Community: A Longitudinal Study of Service in Americorps. Washington, DC. Available at: http://www.americorps.org/pdf/06_1223_longstudy_report.pdf

Using a Partial-Credit Rasch Model to Detect Social Desirability Bias … L.M. Lessard, Rasch Measurement Transactions, 2008, 21:4 p. 1134-5

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
