Optimizing a rating scale is "fine-tuning" to try to squeeze the last ounce of performance out of a test. So the first stage is to check that everything else about the test is working as well as is reasonable. For instance, there is no point in trying to optimize a rating scale if half the sample employ a "response set". Clean the data as much as possible. Put to one side for the moment clearly misfitting items and idiosyncratic people. When you have a core that looks like it should work well, take a look at the misfitting responses. Make sure that no data entry errors, random guessing, or other off-dimensional "bad spots" remain. Now you are ready to begin optimizing. Remember these are only guidelines. Not all apply. Not all are good to do under all circumstances. Keep a good eye on what is happening at the item level. The more you collapse categories, the more statistical and diagnostic information you lose.
| Stage | Guideline | Measure Stability | Measure Accuracy (Fit) | Description of this sample | Inference for next sample |
|---|---|---|---|---|---|
| Pre. | Scale oriented with latent variable | Essential | Essential | Essential | Essential |
| 1. | At least 10 observations of each category. | Essential | Helpful | | Helpful |
| 2. | Regular observation distribution. | Helpful | | | Helpful |
| 3. | Average measures advance monotonically with category. | Helpful | Essential | Essential | Essential |
| 4. | OUTFIT mean-squares less than 2.0. | Helpful | Essential | Helpful | Helpful |
| 5. | Step calibrations advance. | | | | Helpful |
| 6. | Ratings imply measures, and measures imply ratings. | | Helpful | | Helpful |
| 7. | Step difficulties advance by at least 1.4 logits. | | | | Helpful |
| 8. | Step difficulties advance by less than 5.0 logits. | Helpful | | | |

Summary of Guideline Pertinence, from JAM, 2002.
This is an early research note. See Journal of Applied Measurement 3:1 2002 p.85-106.
Also:
Optimizing Rating Scales for Self-Efficacy (and Other) Research
Educational and Psychological Measurement, 1 June 2003, vol. 63, no. 3, pp. 369-391(23)
SMITH Jr. E.V.[1]; WAKELY M.B.[2]; de KRUIF R.E.L.[3]; SWARTZ C.W.[4]
[1] University of Illinois at Chicago [2] All Kinds of Minds [3] Virginia Commonwealth University [4] MetaMetrics, Inc.
Abstract:
This article (a) discusses the assumptions underlying the use of rating scales, (b) describes the use of information available within the context of Rasch measurement that may be useful for optimizing rating scales, and (c) demonstrates the process in two studies. Participants in the first study were 330 fourth- and fifth-grade students. Participants provided responses to the Index of Self-Efficacy for Writing. Based on category counts, average measures, thresholds and category fit statistics, the responses on the original 10-point scale were better represented by a 4-point scale. The modified 4-point scale was given to a replication sample of 668 fourth- and fifth-grade students. The rating scale structure was found to be congruent with the results from the first study. In addition, the item fit statistics and item hierarchy indicated the writing self-efficacy construct to be stable across the two samples. Combined, these results provide evidence for the generalizability of the findings and hence utility of this scale for use with samples of respondents from the same population.
Example: Guilford's Ratings of Creativity (Guilford, Psychometric Methods, 1954, p. 282)

| Category Score | Counts Used | % | Cum. % | Average Measure | OUTFIT MnSq | Step Calibration Measure | Step Calibration S.E. | Expectation Measure at Category (Mean) | Expectation Measure at -0.5 | Most Probable from (Modal) | Thurstone Threshold at (Median) | Cat Peak Prob | Response Category Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 4% | 4% | -.85 | .8 | | | (-2.69) | | low | low | 100% | lowest |
| 2 | 4 | 4% | 8% | -.11 | 2.6 | -.63 | .53 | -1.64 | -2.20 | | -1.73 | 17% | |
| 3 | 25 | 24% | 31% | -.36* | .9 | -2.31 | .39 | -.93 | -1.25 | -1.47 | -1.38 | 48% | |
| 4 | 8 | 8% | 39% | -.43* | .5 | .84 | .25 | -.41 | -.65 | | -.46 | 11% | |
| 5 | 31 | 30% | 69% | -.04 | .8 | -1.48 | .24 | .02 | -.19 | -.32 | -.29 | 39% | middle |
| 6 | 6 | 6% | 74% | -.46* | 4.1 | 1.71 | .25 | .44 | .23 | | .33 | 9% | |
| 7 | 21 | 20% | 94% | .45 | .6 | -1.01 | .26 | .93 | .67 | .35 | .47 | 47% | |
| 8 | 3 | 3% | 97% | .74 | .5 | 2.35 | .44 | 1.61 | 1.23 | | 1.36 | 16% | |
| 9 | 3 | 3% | 100% | .76 | .7 | .53 | .60 | (2.67) | 2.16 | 1.44 | 1.68 | 100% | highest |
Probability Curves
[Plot not reproduced: probability of each category 1 to 9 (vertical axis, 0 to 1) against measure (horizontal axis, -3.0 to 3.0).]
First, express the rating scale as a clearly defined, substantively relevant, ordered sequence of categories. Then use these guidelines to check it for measurement effectiveness.
Guideline 1: At least 10 observations of a category.
Step difficulty (Fk) is approximately the log-ratio of the frequency of adjacent
categories. When category frequency is low, then the step difficulty is poorly estimated and
unstable.
In example: Used counts as low as 3.
Solution: combine adjacent categories, or omit observations (e.g., "don't know")
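The count check and the log-ratio approximation can be sketched in a few lines of Python. This is only an illustration using the "Used" counts from the example table. The function names are made up for this sketch, and the log-ratio is the rough frequency-based approximation described above, not a Rasch estimate.

```python
import math

# "Used" counts for categories 1-9, copied from the example table.
used = {1: 4, 2: 4, 3: 25, 4: 8, 5: 31, 6: 6, 7: 21, 8: 3, 9: 3}

MIN_COUNT = 10  # Guideline 1 threshold


def sparse_categories(counts, minimum=MIN_COUNT):
    """Return the categories observed fewer than `minimum` times."""
    return [cat for cat, n in sorted(counts.items()) if n < minimum]


def rough_step_difficulties(counts):
    """Approximate each step difficulty F_k as log(count[k-1] / count[k]).

    Frequency-based approximation only; a Rasch analysis would also
    account for the person and item measures.
    """
    cats = sorted(counts)
    return {k: math.log(counts[j] / counts[k]) for j, k in zip(cats, cats[1:])}


low = sparse_categories(used)
steps = rough_step_difficulties(used)
print("Categories with fewer than", MIN_COUNT, "observations:", low)
for k, f in steps.items():
    print(f"step into category {k}: roughly {f:+.2f} logits")
```

With counts this small, changing a single observation in category 8 or 9 shifts the log-ratio noticeably, which is the instability the guideline warns about.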
Guideline 2: Observation distribution.
Irregularity in category observation frequency signals irregularity in usage. Look for
unimodal use, or peaking in a central or an extreme category.
In example: roller-coaster Used distribution.
Solution: combine adjacent categories, or omit observations (e.g., "other")
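Combining adjacent categories amounts to recoding the data before re-analysis. The sketch below shows the mechanics only. The recoding map is hypothetical, chosen purely for illustration, and is not a recommendation from this note; any real recoding should be justified substantively.

```python
from collections import Counter

# "Used" counts for categories 1-9, copied from the example table.
used = {1: 4, 2: 4, 3: 25, 4: 8, 5: 31, 6: 6, 7: 21, 8: 3, 9: 3}

# HYPOTHETICAL recoding (original category -> collapsed category),
# invented for this illustration only.
recode = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 3}


def collapse(counts, mapping):
    """Sum the observation counts of categories that share a new code."""
    merged = Counter()
    for cat, n in counts.items():
        merged[mapping[cat]] += n
    return dict(sorted(merged.items()))


def is_order_preserving(mapping):
    """True if new codes never decrease, so only adjacent categories merge."""
    new_codes = [mapping[cat] for cat in sorted(mapping)]
    return all(a <= b for a, b in zip(new_codes, new_codes[1:]))


collapsed = collapse(used, recode)
print("order preserved:", is_order_preserving(recode))
print("collapsed counts:", collapsed)
```

After any recoding, rerun the analysis and check the guidelines again, since collapsing changes the step calibrations and the fit statistics.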
Guideline 3: Average category measures advance.
Average measures are an empirical indicator of the context in which the category is used.
Since higher categories are intended to reflect higher measures, then the average measures are
expected to advance.
In example: average measure for category 6 is noticeably less than for category 5.
Solution: combine out of order categories with those below them.
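A monotonicity check on the average measures is easy to automate. The sketch below uses the "Average Measure" column of the example table; the function name is an assumption of this sketch.

```python
# Average measures for categories 1-9, copied from the example table.
avg_measure = {1: -0.85, 2: -0.11, 3: -0.36, 4: -0.43, 5: -0.04,
               6: -0.46, 7: 0.45, 8: 0.74, 9: 0.76}


def out_of_order(avg):
    """Return categories whose average measure is below that of the
    category immediately beneath them."""
    cats = sorted(avg)
    return [k for j, k in zip(cats, cats[1:]) if avg[k] < avg[j]]


flagged = out_of_order(avg_measure)
print("Average measure does not advance at categories:", flagged)
```

The categories flagged this way are the ones marked with an asterisk in the example table.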
Guideline 4: Outfit mean-squares less than 2.0.
We model a definite amount of randomness in choosing categories. This amount is
indicated by a mean-square of 1.0. Values over 2.0 indicate that there is more unexpected than
expected randomness. A high mean-square value indicates that this category has been used in
contexts in which the expected category is far different.
In example: category 6 has a mean-square of 4.1.
Solution: omit observations, combine categories or drop categories.
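To see why a rating used in an unexpected context inflates the mean-square, here is a minimal sketch. It assumes that the category mean-square is the average squared standardized residual of the observations in that category, computed under a rating scale model with the step calibrations from the example table. The measures passed in are hypothetical; the original person and item measures are not given in this note.

```python
import math

# Step calibrations for categories 2-9, copied from the example table.
STEPS = [-0.63, -2.31, 0.84, -1.48, 1.71, -1.01, 2.35, 0.53]


def category_probabilities(measure, steps=STEPS):
    """Rating scale model probabilities for categories 1..len(steps)+1.

    `measure` is the person measure minus the item difficulty, in logits.
    """
    log_num = [0.0]
    for f in steps:
        log_num.append(log_num[-1] + measure - f)
    top = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def squared_std_residual(observed, measure, steps=STEPS):
    """(observed - expected)^2 / model variance for one rating."""
    probs = category_probabilities(measure, steps)
    cats = range(1, len(probs) + 1)
    expected = sum(k * p for k, p in zip(cats, probs))
    variance = sum((k - expected) ** 2 * p for k, p in zip(cats, probs))
    return (observed - expected) ** 2 / variance


def mean_square(observations, steps=STEPS):
    """Average squared standardized residual over (rating, measure) pairs."""
    z2 = [squared_std_residual(x, m, steps) for x, m in observations]
    return sum(z2) / len(z2)


# HYPOTHETICAL observations: a rating of 6 given where the measure is low.
surprising = [(6, -1.5), (6, -1.0)]
print(f"mean-square for surprising ratings of 6: {mean_square(surprising):.1f}")
```

A rating close to its expected value contributes well under 1.0, so a few ratings like the hypothetical ones above are enough to push a category's mean-square past 2.0.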
Guideline 5: Step difficulties advance.
Advancing step difficulties imply that each category in turn is most likely to be chosen.
This makes the probability curves look like a range of hills. Disordered step difficulties imply
that a category may not be observed as one advances along the variable. Categories with narrow
definitions produce disordered step difficulties. Disordered step difficulties do not mean that the
categories are out of order. The decision to eliminate or combine narrow categories must be
made substantively, based on the reasons for selecting the rating categories. For developmental
scales, ordered categories support the interpretation that a rating of k implies having passed
through k-1 lower categories.
In example: step 3 is less than step 2.
Solution: combine categories, edit data, but may not be attainable.
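The link between disordered step difficulties and categories that are never most probable can be checked numerically. The sketch below assumes a rating scale model with the step calibrations from the example table and scans measures from -3.0 to 3.0. The function names and the grid are choices made for this illustration.

```python
# Step calibrations for categories 2-9, copied from the example table.
STEPS = [-0.63, -2.31, 0.84, -1.48, 1.71, -1.01, 2.35, 0.53]


def disordered_steps(steps):
    """Categories whose step calibration is below the preceding one.

    steps[i] is the calibration for the step into category i + 2.
    """
    return [i + 2 for i in range(1, len(steps)) if steps[i] < steps[i - 1]]


def most_probable_category(measure, steps):
    """Category with the largest model probability at `measure`.

    Probabilities share one denominator, so comparing the log-numerators
    (cumulative sums of measure - step) is sufficient.
    """
    log_num, best_cat, best_val = 0.0, 1, 0.0
    for cat, f in enumerate(steps, start=2):
        log_num += measure - f
        if log_num > best_val:
            best_cat, best_val = cat, log_num
    return best_cat


def modal_categories(steps, lo=-3.0, hi=3.0, points=601):
    """Categories that are most probable somewhere on the grid lo..hi."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return sorted({most_probable_category(m, steps) for m in grid})


print("Disordered at categories:", disordered_steps(STEPS))
print("Modal somewhere on the grid:", modal_categories(STEPS))
```

Under these assumptions only the odd-numbered categories are ever most probable, which agrees with the "Most Probable from" column of the example table, where categories 2, 4, 6 and 8 have no entry.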
Guideline 6: Step difficulties advance by at least 1.4 logits.
When all step difficulty advances are larger than 1.4 logits, then the rating scale can be
decomposed, in theory, into a series of independent dichotomous items. Even though such
dichotomies may not be empirically meaningful, their possibility implies that the rating scale is
equivalent to a subtest of (category count - 1) dichotomies. For developmental scales, this
supports the interpretation that a rating of k implies successful leaping of k hurdles.
In example: this is not seen, due to disordering.
Solution: combine categories, edit data, but may not be attainable.
Guideline 7: Step difficulties advance by less than 5.0 logits.
When adjacent step difficulties are too far apart, then a category becomes too wide and
a less informative dead zone appears in the middle of the category. This corresponds to a sag
in the statistical information available from the item. Often this results from Guttman-style
(forced consensus) rating procedures.
Solution: define more categories; change rating procedures.
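The two spacing guidelines (advances of at least 1.4 logits and less than 5.0 logits) can be checked together. The sketch below applies them to the step calibrations from the example table. The labels returned by `classify` are wording invented for this sketch.

```python
# Step calibrations for categories 2-9, copied from the example table.
STEPS = [-0.63, -2.31, 0.84, -1.48, 1.71, -1.01, 2.35, 0.53]

MIN_ADVANCE = 1.4  # logits, lower spacing guideline
MAX_ADVANCE = 5.0  # logits, upper spacing guideline


def step_advances(steps):
    """Difference between each step calibration and the one before it."""
    return [round(b - a, 2) for a, b in zip(steps, steps[1:])]


def classify(advance):
    """Label one advance against the spacing guidelines."""
    if advance < 0:
        return "disordered"
    if advance < MIN_ADVANCE:
        return "too close"
    if advance >= MAX_ADVANCE:
        return "too wide"
    return "ok"


advances = step_advances(STEPS)
labels = [classify(a) for a in advances]
# advances[i] is the advance from the step into category i + 2
# to the step into category i + 3.
for cat, (a, label) in enumerate(zip(advances, labels), start=3):
    print(f"step {cat - 1} -> step {cat}: {a:+.2f} logits ({label})")
```

In the example every second advance is negative, so the 1.4-logit guideline cannot be met, while no advance comes near 5.0 logits.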
MESA Research Note #2 by John Michael Linacre
Midwest Objective Measurement Seminar, Chicago, June 1997
Our current URL is www.rasch.org
The URL of this page is www.rasch.org/rn2.htm