Recommendation: If the additional weight is intended to indicate a higher level of performance, then use a rating scale.
If the additional weight is intended to indicate replications of the same level of performance, then use item weighting.
If the additional weight is merely to make the scores look nicer, then use linear rescaling of the measurement units.
Example: a dichotomous item is scored 0-4 instead of 0-1.
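To make the distinction concrete, here is a minimal Python sketch of what each choice implies for the 0-1 item in this example; the responses are hypothetical and the contrast is only illustrative:

    # Hypothetical responses to one dichotomous item (0 = wrong, 1 = right).
    responses = [0, 1, 1, 0, 1]

    # Rating scale: recode 0-1 as 0-4, implying a higher level of performance
    # (only sensible if the intermediate categories 1, 2, 3 could in principle occur).
    as_rating_scale = [4 * r for r in responses]

    # Item weighting: keep the 0-1 scoring but count the item 4 times,
    # i.e., four replications of the same level of performance.
    weighted_raw_score = 4 * sum(responses)

    # Linear rescaling: analyze the item as 0-1, then multiply the reported score
    # by 4 purely to change the reported units.
    rescaled_raw_score = 4 * sum(responses)

    print(as_rating_scale, weighted_raw_score, rescaled_raw_score)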
In general, each observation is expected to be an independent and equal witness to examinee ability. The scientific motivation for this expectation is comparable to the motivations for random sampling and randomization. The introduction of arbitrary emphases, such as item weights, degrades the inferential stability of results and biases conclusions in an unreproducible way.
In the political world of examinations, however, some observations are decreed more important than others. For instance, if a pass-fail decision is to be made on the composite outcome of a 100-item MCQ test and one essay graded from 0 to 10, then the examination board may decide to weight the essay rating 10 times more heavily in order to give the essay and the MCQ items supposedly "equal" weight in the final decision.
Should you fall victim to such a decree, there are several ways the weights can be implemented with Rasch computer programs. Since each method has its drawbacks, initial data screening and quality control should proceed as though no weights existed. Once the measurement process has been validated, the following assignment methods may help:
1. The essay ratings and the MCQ items are analyzed separately, yielding two ability measures for each examinee. If there is insufficient overlap among the essay ratings, then additional constraints are required, such as modelling the ratings as binomial trials, and asserting that each grader is equally severe, in order for a coherent set of essay measures to be produced. For the pass-fail decision, a weighted sum of the pairs of ability measures is used; the precise formula will be complicated by the different logit ranges of the two variables. The way to see what to do is to plot MCQ vs. Essay measures, and then to draw on this plot the line that best asserts the conjoint judgment of the standard-setting committee. This method is the most comprehensible.
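As a rough illustration of the combination step, here is a minimal Python sketch assuming the two sets of measures have already been estimated; the measures, the 0.5/0.5 mixing weights, and the cut score are placeholders, since the actual judgment line must come from the standard-setting committee:

    import matplotlib.pyplot as plt

    # Hypothetical examinee measures (logits) from the two separate analyses.
    mcq_measures   = [-1.2, -0.3, 0.4, 1.1, 2.0]   # 100-item MCQ analysis
    essay_measures = [-0.8,  0.1, 0.2, 1.5, 1.8]   # essay analysis

    # Placeholder decision rule: a weighted sum of the two measures.
    # The weights and the cut score must come from the standard-setting committee.
    w_mcq, w_essay, cut = 0.5, 0.5, 0.3
    composite = [w_mcq * m + w_essay * e for m, e in zip(mcq_measures, essay_measures)]
    decisions = ["pass" if c >= cut else "fail" for c in composite]

    # Plot MCQ vs. essay measures; the judgment line w_mcq*x + w_essay*y = cut
    # separates pass from fail on this plot.
    plt.scatter(mcq_measures, essay_measures)
    xs = [min(mcq_measures), max(mcq_measures)]
    plt.plot(xs, [(cut - w_mcq * x) / w_essay for x in xs])
    plt.xlabel("MCQ measure (logits)")
    plt.ylabel("Essay measure (logits)")
    plt.show()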
2. Each essay rating is entered 10 times (or each essay rating is given a weight of 10), and then the MCQ items and the essay ratings are analyzed together. This diminishes local independence among the observations but avoids the complication of two measurement scales. The replicated data will make the reported standard errors too small. In this example, they should be inflated by about 75%. The 10 essay difficulties will be reported at about the same location on the variable as the one original essay difficulty.
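A minimal Python sketch of the replication step for one examinee's record, with a placeholder standard error; the 75% inflation is the figure quoted above for this example:

    # Hypothetical record for one examinee: 100 MCQ responses (0/1) and one 0-10 essay rating.
    mcq_responses = [1, 0, 1] * 33 + [1]       # 100 dichotomous responses
    essay_rating = 7

    # Method 2: enter the essay rating 10 times so the analysis weights it 10-fold.
    replications = 10
    weighted_record = mcq_responses + [essay_rating] * replications

    # The replications are not independent observations, so the reported standard
    # error is too small; following the text, inflate it by about 75% in this example.
    reported_se = 0.25                          # placeholder value from an analysis output
    corrected_se = reported_se * 1.75
    print(len(weighted_record), round(corrected_se, 3))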
3. Use explicit item weights, e.g., IWEIGHT= in Winsteps, but adjust the item weights to maintain approximately correct standard errors and score range. The original score range is 0-110 (100 MCQ items plus the 0-10 essay). Upweighting the essay 10 times would give a score range of 0-200. So, to keep the meaningful score range, the weights need to be adjusted by 110/200 = .55: each MCQ item is weighted .55, and the essay item is weighted 5.50. This method is operationally the simplest.
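The weight arithmetic can be sketched as follows; the Python below also writes the result in the IWEIGHT= list style (the item entry numbers, and the exact list format, should be checked against the Winsteps documentation):

    # Adjust the weights so the weighted score range matches the original 0-110 range.
    n_mcq, essay_max, essay_weight = 100, 10, 10
    original_range = n_mcq + essay_max                    # 110
    weighted_range = n_mcq + essay_max * essay_weight     # 200
    scale = original_range / weighted_range               # 0.55

    mcq_item_weight = 1 * scale                  # 0.55 per MCQ item
    essay_item_weight = essay_weight * scale     # 5.50 for the essay

    # Write the weights as Winsteps-style IWEIGHT= list lines
    # (entry numbers 1-100 assumed for the MCQ items, 101 for the essay).
    lines = ["IWEIGHT=*"]
    lines += [f"{i} {mcq_item_weight:.2f}" for i in range(1, n_mcq + 1)]
    lines += [f"{n_mcq + 1} {essay_item_weight:.2f}", "*"]
    print("\n".join(lines))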
4. Each essay rating is multiplied by 10, and then the rescaled 0-100 essay ratings are analyzed with the MCQ items. Since only every 10th category of the 0-100 essay rating scale is observed, the analysis must allow for structurally present, but empirically absent, categories (Wilson, RMT 5:1 p.128). Again, standard errors will need to be inflated by about 75% due to the effect of the fictitious categories. Only one essay difficulty will be reported, but it will not be at the same location on the variable as the 0-10 essay would have been. By convention, the difficulty of a rating scale item is chosen so that the sum of the step difficulties is zero, i.e., at the location on the variable where the highest and lowest possible ratings on the item are equally probable. If the difficulty of the 0-10 essay item is D logits from the center of the person ability distribution, the difficulty of the 0-100 essay item will be much closer to the mean ability, only about D/10 logits away. This makes the construct harder to understand, and can be confusing if the assigned weights are changed.
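A minimal Python sketch of the rescaling, showing why most of the 0-100 categories are structurally present but empirically absent; the ratings are hypothetical:

    # Hypothetical 0-10 essay ratings for a group of examinees.
    essay_ratings = [3, 5, 7, 8, 10, 6, 4, 9, 2, 7]

    # Method 4: multiply each rating by 10 and analyze it as a 0-100 rating scale item.
    rescaled = [10 * r for r in essay_ratings]

    # Only multiples of 10 can ever be observed, so the 0-100 analysis must allow for
    # structurally present, but empirically absent, categories (Wilson, RMT 5:1 p.128).
    observed = sorted(set(rescaled))
    absent = [c for c in range(0, 101) if c not in observed]
    print(observed)       # only multiples of 10 appear
    print(len(absent))    # at least 90 of the 101 categories (here 92) are never observed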
Assigning item weights: Item Replication or Rating Scales? Linacre J.M., Wright B.D. Rasch Measurement Transactions, 1995, 8:4, p.403