# Estimating Rasch Measures for Extreme Scores

Extreme scores (zero and perfect scores) imply extreme, but indefinitely located, measures. Indefinite measures are awkward to report and difficult to use in further analyses, such as computing means and standard deviations. What can be done to give these measures definite values? Here are several approaches. They are all based on the Bayesian idea that we would not have administered the test to the person, or included the item on the test, unless we thought that the person or item was relevant. Consequently, an extreme score implies a measure only slightly out of the measurement range of the test, not a measure a considerable distance away.

I. The extreme score is only barely extreme.

Raw scores are observed on an ordinal scale, and fractional raw scores are unobservable. Consequently, any measure whose expected raw score lies within 0.5 score points of an extreme score is expected to be observed as producing that extreme score. The most central measure for a zero score is therefore the one corresponding to a raw score of 0.5, and for a perfect score the one corresponding to the perfect score less 0.5 score points. After the measures for the non-extreme items and persons have been estimated in the usual way, the measures corresponding to these almost-extreme raw scores can be estimated (RMT 10:2 p. 499). Other commonly used extreme score corrections are 1/3 and 1/4 of a score point.
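A minimal sketch of Approach I, assuming dichotomous items whose difficulties have already been estimated. The function names are hypothetical, not WINSTEPS routines. Bisection is used because the expected raw score rises monotonically with the measure; a Newton-Raphson solver would serve equally well.

```python
import math


def expected_score(theta, difficulties):
    """Expected raw score at measure theta under the dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(d - theta)) for d in difficulties)


def measure_for_score(r, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Measure whose expected raw score equals r (r may be fractional).

    The expected score increases monotonically with theta, so bisection
    solves expected_score(theta) = r.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def extreme_measures(difficulties, correction=0.5):
    """Approach I: treat a zero score as `correction` score points and a
    perfect score as L - `correction` score points."""
    n_items = len(difficulties)
    return (measure_for_score(correction, difficulties),
            measure_for_score(n_items - correction, difficulties))


if __name__ == "__main__":
    items = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties
    for c in (1 / 2, 1 / 3, 1 / 4):
        low, high = extreme_measures(items, c)
        print(f"correction {c:.3f}: zero -> {low:.2f}, perfect -> {high:.2f}")
```

Smaller corrections place the extreme measures farther out, which is the trade-off the 1/3 and 1/4 alternatives make.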

II. The extreme measure is only barely extreme.

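From raw score $R$, a measure $M_R$ and its standard error $SE_R$ can be estimated. The measure for score $R+1$ is approximately $M_R + SE_R^2$ (see Wright & Stone, BTD, 1979, 192-5), because the rate of change of the measure with respect to the raw score is the reciprocal of the raw-score variance, which is $SE_R^2$. Thus the measure for an extreme score can be estimated from the measure for a score 1 point less extreme [see Table]. If $S$ is the perfect score, then $M_S \approx M_{S-1} + SE_{S-1}^2$; likewise, for a zero score, $M_0 \approx M_1 - SE_1^2$.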
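A minimal sketch of Approach II, again assuming dichotomous items with known difficulties and hypothetical helper names. The standard error is the model value $1/\sqrt{\sum p(1-p)}$, so $SE^2$ is the reciprocal of the raw-score variance at the measure.

```python
import math


def rasch_p(theta, d):
    """Probability of success on a dichotomous item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - theta))


def measure_for_score(r, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Measure whose expected raw score equals r, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, d) for d in difficulties) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def standard_error(theta, difficulties):
    """Model SE: 1 / sqrt(raw-score variance), variance = sum of p(1-p)."""
    var = sum(rasch_p(theta, d) * (1.0 - rasch_p(theta, d)) for d in difficulties)
    return 1.0 / math.sqrt(var)


def extrapolate_by_se_squared(difficulties):
    """Approach II: M_0 ~ M_1 - SE_1^2 and M_S ~ M_(S-1) + SE_(S-1)^2."""
    n_items = len(difficulties)
    m_bottom = measure_for_score(1, difficulties)
    m_top = measure_for_score(n_items - 1, difficulties)
    zero = m_bottom - standard_error(m_bottom, difficulties) ** 2
    perfect = m_top + standard_error(m_top, difficulties) ** 2
    return zero, perfect
```

For 10 equally difficult items the extrapolation distance is $SE_{S-1}^2 = L/(L-1) = 10/9$ logits, the "LMD lower bound" entry for L = 10 in the Table.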

III. The extreme measure is only barely significantly different.

Only measures statistically significantly more extreme than non-extreme measures would provoke separate consideration. Thus the measure $M_S = M_{S-1} + 1.65\,SE_{S-1}$ is the most central one that would cause rejection, at the .05 level (one-sided), of the hypothesis that $M_S$ and $M_{S-1}$ are statistically equivalent.
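A minimal sketch of Approach III under the same assumptions as before (dichotomous items, known difficulties, hypothetical names). The multiplier `z` defaults to 1.65, the one-sided .05 critical value; the zero-score case mirrors the perfect-score case.

```python
import math


def rasch_p(theta, d):
    """Probability of success on a dichotomous item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - theta))


def measure_for_score(r, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Measure whose expected raw score equals r, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, d) for d in difficulties) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def extrapolate_by_significance(difficulties, z=1.65):
    """Approach III: step z standard errors beyond the nearest non-extreme
    measure. Returns (zero-score measure, perfect-score measure)."""
    n_items = len(difficulties)
    results = []
    for r, sign in ((1, -1.0), (n_items - 1, 1.0)):
        m = measure_for_score(r, difficulties)
        var = sum(rasch_p(m, d) * (1.0 - rasch_p(m, d)) for d in difficulties)
        results.append(m + sign * z / math.sqrt(var))
    return tuple(results)
```

Because $1.65\,SE$ exceeds $SE^2$ whenever $SE < 1.65$, this approach usually places extreme measures farther out than Approach II.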

IV. The extreme measure aligns smoothly with non-extreme measures.

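This can be achieved by curve-fitting. For instance, fitting a quadratic through $M_{S-1}$, $M_{S-2}$ and $M_{S-3}$ and extrapolating one score point yields $M_S = 3M_{S-1} - 3M_{S-2} + M_{S-3}$, since the third differences of a quadratic are zero.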
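The quadratic extrapolation is a one-line formula. The demonstration below assumes a test of equally difficult items (difficulty 0), for which the measure for raw score $R$ is exactly $\ln[R/(L-R)]$; that test is illustrative only.

```python
import math


def quadratic_extrapolate(m1, m2, m3):
    """Approach IV: extrapolate to score S the quadratic through the measures
    for scores S-1, S-2, S-3. Third differences of a quadratic vanish, so
    M_S - 3 M_(S-1) + 3 M_(S-2) - M_(S-3) = 0."""
    return 3.0 * m1 - 3.0 * m2 + m3


def uniform_test_measure(r, n_items):
    """Measure for raw score r on n_items equally difficult (difficulty 0) items."""
    return math.log(r / (n_items - r))


if __name__ == "__main__":
    L = 10
    m1, m2, m3 = (uniform_test_measure(L - k, L) for k in (1, 2, 3))
    m_s = quadratic_extrapolate(m1, m2, m3)
    print(f"M_S = {m_s:.2f}, extrapolation = {m_s - m1:.2f} logits")
```

For the zero score, apply the same function to $M_1$, $M_2$, $M_3$.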

V. The extreme response string is only barely modal.

The likelihood of each possible response string for a particular measure can be computed as $L_R = \prod_i P_{nix_{ni}}$, the product over items of the probabilities of the responses $x_{ni}$ in that string, where $R = \sum_i x_{ni}$ is the raw score corresponding to that response string. If $L_0 > 0.5$ for a measure, then that measure will probably produce the response string with a raw score of zero. If $L_0 < 0.5$, then a non-zero score will probably be observed. The most central measure likely to produce an extreme score is the one for which $L_0 = 0.5$.
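A minimal sketch of Approach V, assuming dichotomous items with known difficulties; function names are hypothetical. $L_0$ is the product of the failure probabilities and decreases as the measure increases, so bisection finds the measure where $L_0 = 0.5$. The perfect-score case uses the Rasch model's symmetry: the all-correct string at $(\theta, d)$ is as likely as the all-wrong string at $(-\theta, -d)$.

```python
import math


def prob_zero_string(theta, difficulties):
    """Likelihood L_0 of the all-zero response string at measure theta."""
    return math.prod(1.0 / (1.0 + math.exp(theta - d)) for d in difficulties)


def barely_modal_zero(difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Approach V for a zero score: the measure at which L_0 = 0.5.
    L_0 falls as theta rises, so move `lo` up while L_0 > 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prob_zero_string(mid, difficulties) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def barely_modal_perfect(difficulties):
    """Approach V for a perfect score, via the sign-flip symmetry."""
    return -barely_modal_zero([-d for d in difficulties])
```

For 10 items of difficulty 0, $L_0 = (1 + e^{\theta})^{-10}$, so the solution is $\theta = \ln(2^{0.1} - 1) \approx -2.63$.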

VI. Data augmentation with non-extreme responses.

The belief in test relevance can be expressed in terms of additional artificial responses (Jannarone et al., 1990). For instance, two further responses could be added to every person and item response string: a "1" and a "0". Then no response string can be extreme. If the additional responses are arranged to alternate "01" and "10", then the additional artificial persons and items will have close to 50% success rates, and so have minimal impact on the measurement system. Once the set of measures has been estimated, the measures can be anchored. Then the augmented data can be dropped, allowing standard errors and fit statistics to be computed from the observed data alone. If the prior belief is twice as strong, then 4 items can be added. If the belief is equivalent to a fraction of an item, weights can be used for the artificial items.
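A minimal sketch of the augmentation step only, assuming a complete persons-by-items matrix of 0/1 responses stored as lists. Estimation, anchoring and weighting are not shown, and the particular placement of the artificial responses is one arrangement consistent with the "01"/"10" alternation described above, not a prescribed layout.

```python
def augment(data):
    """Append two artificial persons (rows) and two artificial items (columns)
    so that no response string, real or artificial, is extreme.

    data: list of rows, each a list of 0/1 responses (persons x items).
    """
    n_items = len(data[0])
    # Artificial persons alternate "01" / "10" across the real items, so every
    # real item receives exactly one extra "0" and one extra "1".
    person_a = [i % 2 for i in range(n_items)]
    person_b = [1 - x for x in person_a]
    rows = [list(r) for r in data] + [person_a, person_b]
    # Artificial items alternate "01" / "10" down the persons, so every person
    # receives exactly one extra "0" and one extra "1".
    return [row + ([0, 1] if n % 2 == 0 else [1, 0]) for n, row in enumerate(rows)]


if __name__ == "__main__":
    observed = [[0, 0, 0], [1, 1, 1], [1, 0, 1]]  # includes zero and perfect scores
    for row in augment(observed):
        print(row)
```

Each artificial person and item ends up with close to a 50% success rate, which keeps its influence on the other measures small.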

VII. The underlying distribution is specified.

If the underlying distribution of, say, persons is specified to be normal (or any other distribution), then measures can be imputed for extreme scores that result in the best fit to that distribution. These measures are constrained to be more extreme than the measures estimated from similar non-extreme response strings.
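The article does not say how "best fit" is determined. One possible reading, offered only as an assumption, imputes each extreme measure as the mean of the tail of the specified normal distribution lying beyond the most extreme non-extreme measure. A tail mean always lies beyond its cut-off, so the constraint above is met automatically. The function name and its arguments are hypothetical.

```python
from statistics import NormalDist


def impute_extreme_measures(mu, sigma, lowest_measure, highest_measure):
    """Impute zero- and perfect-score measures as truncated-normal tail means.

    Assumes the person distribution is N(mu, sigma); lowest_measure and
    highest_measure are the most extreme non-extreme person measures.
    """
    std = NormalDist()  # standard normal
    a = (lowest_measure - mu) / sigma
    b = (highest_measure - mu) / sigma
    # E[X | X < lowest_measure]: mean of the lower tail
    zero_score = mu - sigma * std.pdf(a) / std.cdf(a)
    # E[X | X > highest_measure]: mean of the upper tail
    perfect_score = mu + sigma * std.pdf(b) / (1.0 - std.cdf(b))
    return zero_score, perfect_score


if __name__ == "__main__":
    # Hypothetical specification and cut-offs, in logits
    low, high = impute_extreme_measures(0.3, 1.2, -2.5, 3.1)
    print(f"zero score -> {low:.2f}, perfect score -> {high:.2f}")
```

Other criteria (for example, matching the proportion of extreme scores expected under the distribution) would give somewhat different values.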

VIII. Posterior distribution = Prior distribution.

The distribution of the measures estimated from the data is intended to coincide with the distribution of the measures that generated it. This can be used to refine the measure estimates for extreme scores.

After extreme measures are estimated using one of the methods above, the means and standard deviations of the item and person measure distributions are computed. Then data are simulated using the entire set of measures (extreme and non-extreme). From these data, a new set of measures is estimated for non-extreme and extreme scores. The means and S.D.s of these new measures are computed and compared to their previous values. The previous "extreme" measures are adjusted, and new means and S.D.s computed, so as to make the two distributions as similar as possible. Further data are simulated from the revised measures and the distributions are again compared. The extreme measures are again adjusted to make the distributions coincide. This iterative process continues until no more adjustments are necessary or there is no further improvement in distribution coincidence.
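
"Least Measurable Distance" Extrapolations for Extreme Score Measures, in Logits (L = number of dichotomous items or polytomous steps)

| Approach | Extrapolation | L = 10 | L = 25 | L = 50 | L = 100 |
|---|---|---|---|---|---|
| I. Extreme score adjustment | R = 1/2: $\ln[(2L-1)/(L-1)]$ | 0.75 | 0.71 | 0.70 | 0.70 |
| | R = 1/3: $\ln[(3L-1)/(L-1)]$ | 1.17 | 1.13 | 1.11 | 1.11 |
| | R = 1/4: $\ln[(4L-1)/(L-1)]$ | 1.57 | 1.51 | 1.48 | 1.48 |
| II. Measure extrapolation | LMD lower bound: $L/(L-1)$ | 1.11 | 1.04 | 1.02 | 1.01 |
| | Test width 2 logits: $C_{f2}/L$, $f = (L-1)/L$ | 1.16 | 1.04 | 1.02 | 1.01 |
| | Test width 4 logits: $C_{f4}/L$ | 1.22 | 1.08 | 1.04 | 1.01 |
| | Test width 6 logits: $C_{f6}/L$ | 1.37 | 1.12 | 1.07 | 1.04 |
| | Test width 8 logits: $C_{f8}/L$ | 1.44 | 1.17 | 1.10 | 1.04 |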

Which to choose?

Most of the differences between these approaches are hair-splitting [see Table], but questions to be addressed include:

(a) Are the items dichotomous, polytomous or mixed?
(b) Is the test fixed length or adaptive?
(c) Are there missing data?
(d) What is known about the underlying distributions?
(e) What computational resources are available?
(f) Are the computed extreme measures reasonable?

Choose an extrapolation approach that provides consistently reasonable measures for your data and is easy to explain. Approach I has proved robust and flexible for small samples with missing data and is implemented in WINSTEPS.

A Rule of Thumb

Measures corresponding to extreme scores 0 and L should be no closer to their next integer neighbors 1 and L-1 than the least measurable distance (LMD) between integer neighbors, estimated at scores 1 and L-1. According to Best Test Design (Wright & Stone, 1979, pp. 132, 135, 192-198, 214), when R = 1 or L-1,

$LMD = C_{fw}/L \ge L/[R(L-R)] = L/(L-1)$
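The bound can be checked numerically. This sketch assumes dichotomous items with known difficulties and takes the LMD at score L-1 to be $SE^2$ at the measure for that score, as in Approach II; helper names are hypothetical. Equally difficult items attain the bound $L/(L-1)$; spreading the items lowers the test information at that score, so the LMD grows.

```python
import math


def rasch_p(theta, d):
    """Probability of success on a dichotomous item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - theta))


def measure_for_score(r, difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Measure whose expected raw score equals r, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, d) for d in difficulties) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lmd_at_top(difficulties):
    """SE^2 (1 / raw-score variance) at the measure for score L-1."""
    m = measure_for_score(len(difficulties) - 1, difficulties)
    var = sum(rasch_p(m, d) * (1.0 - rasch_p(m, d)) for d in difficulties)
    return 1.0 / var


if __name__ == "__main__":
    L = 10
    uniform = [0.0] * L
    spread = [-2.0 + 4.0 * i / (L - 1) for i in range(L)]  # hypothetical 4-logit test
    print(f"bound L/(L-1) = {L / (L - 1):.3f}")
    print(f"uniform test  = {lmd_at_top(uniform):.3f}")
    print(f"4-logit test  = {lmd_at_top(spread):.3f}")
```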

From the Table, reasonable values are generally in the range

$1.0 \le M_S - M_{S-1} \le 1.2$ logits

A rule of thumb follows:

No extreme score extrapolation should be less than one logit. Extrapolations greater than 1.2 logits require convincing justification.
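The rule of thumb can be applied mechanically. The demonstration assumes a test of L equally difficult items (difficulty 0), for which $M_R = \ln[R/(L-R)]$ and $SE_R^2 = L/[R(L-R)]$ hold exactly; it is illustrative only, and the function names are hypothetical.

```python
import math


def classify_extrapolation(distance):
    """Rule of thumb for M_S - M_(S-1), in logits."""
    if distance < 1.0:
        return "too small"
    if distance <= 1.2:
        return "reasonable"
    return "needs justification"


def uniform_measure(r, n_items):
    """Measure for (possibly fractional) raw score r on equally difficult items."""
    return math.log(r / (n_items - r))


if __name__ == "__main__":
    L = 10
    m_top = uniform_measure(L - 1, L)
    se2_top = L / (L - 1)  # SE^2 at score L-1
    distances = {
        "I   (R = 1/2)": uniform_measure(L - 1 / 2, L) - m_top,
        "I   (R = 1/3)": uniform_measure(L - 1 / 3, L) - m_top,
        "II  (SE^2)": se2_top,
        "III (1.65 SE)": 1.65 * math.sqrt(se2_top),
        "IV  (quadratic)": 2 * m_top - 3 * uniform_measure(L - 2, L)
        + uniform_measure(L - 3, L),
    }
    for name, dist in distances.items():
        print(f"{name}: {dist:.2f} logits -> {classify_extrapolation(dist)}")
```

The quadratic entry is $(3M_{S-1} - 3M_{S-2} + M_{S-3}) - M_{S-1}$ written out.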

Standard Errors of Extreme Measures

Extreme measures have indefinite standard errors, but the following provide useful values:

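(1) $SE_S > SE_{S-1}$

(2) $SE_S \approx SE_{S-1} + SE_{S-1}^2/2$

(3) $SE_S \approx 1/\sqrt{\mathrm{Var}(R \mid M_S)}$, the reciprocal of the square root of the model variance of the raw score at measure $M_S$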
[This is implemented in WINSTEPS]
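A minimal sketch comparing rules (2) and (3), assuming dichotomous items with known difficulties and reading rule (3) as one over the square root of the raw-score variance, since a model standard error is $1/\sqrt{\text{information}}$. Function names are hypothetical, and the extreme measure in the demonstration comes from Approach II purely for illustration.

```python
import math


def rasch_p(theta, d):
    """Probability of success on a dichotomous item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - theta))


def score_variance(theta, difficulties):
    """Model variance of the raw score at measure theta: sum of p(1-p)."""
    return sum(rasch_p(theta, d) * (1.0 - rasch_p(theta, d)) for d in difficulties)


def extreme_se(se_prev, m_extreme, difficulties):
    """Return (rule 2, rule 3) values for SE_S.

    Rule (2): SE_(S-1) + SE_(S-1)^2 / 2.
    Rule (3): 1 / sqrt(raw-score variance at M_S) (square root assumed).
    """
    rule2 = se_prev + se_prev ** 2 / 2.0
    rule3 = 1.0 / math.sqrt(score_variance(m_extreme, difficulties))
    return rule2, rule3


if __name__ == "__main__":
    L = 10
    items = [0.0] * L                # equally difficult items (illustrative)
    m_prev = math.log(L - 1)         # M_(S-1) for this test
    se_prev = 1.0 / math.sqrt(score_variance(m_prev, items))
    m_s = m_prev + se_prev ** 2      # Approach II extrapolation
    rule2, rule3 = extreme_se(se_prev, m_s, items)
    print(f"SE_(S-1) = {se_prev:.3f}, rule (2) = {rule2:.3f}, rule (3) = {rule3:.3f}")
```

Both values exceed $SE_{S-1}$, consistent with rule (1).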

Benjamin D. Wright

Jannarone R.J., Yu K.F., Laughlin J.E. (1990) Easy Bayes estimation for Rasch-type models. Psychometrika 55, 3, 449-460.

Estimating Rasch measures for extreme scores. Wright B.D. … Rasch Measurement Transactions, 1998, 12:2 p. 632-3.
