Measures for extreme scores

Extreme scores (zero and perfect scores) imply extreme, but indefinitely located, measures. Indefinite measures are awkward to report and difficult to use in further analyses, such as computing means and standard deviations. What can be done to give these measures definite values? Here are several approaches. They are all based on the Bayesian idea that we would not have administered the test to the person, or included the item on the test, unless we thought that the person or item was relevant. Consequently, an extreme score implies a measure only slightly out of the measurement range of the test, not a measure a considerable distance away.

Raw scores are observed on an ordinal scale. Fractional raw scores are unobservable. Consequently, any measure that yields an expected raw score closer than 0.5 score points to an extreme score is expected to be observed as an extreme score. The most central measure for a zero score is therefore the one corresponding to 0.5 score points, and for a perfect score the one corresponding to the perfect score less 0.5 score points. After the measures for non-extreme items and persons have been estimated in the usual way, the measures corresponding to these almost-extreme raw scores can be estimated (RMT 10:2 p. 499). Other commonly used extreme-score corrections are 1/3 and 1/4 of a score point.
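
As a rough illustration of the half-point adjustment, the sketch below finds the measure whose expected raw score equals a fractional target. It assumes dichotomous Rasch items with already-estimated (anchored) difficulties. The function names, the bisection solver and the ten equal-difficulty items are illustrative assumptions, not part of the original note.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score on dichotomous Rasch items at ability theta (logits)."""
    return sum(1.0 / (1.0 + math.exp(d - theta)) for d in difficulties)

def measure_for_score(target, difficulties, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection search for the theta whose expected raw score equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < target:
            lo = mid   # expected score too low: the measure must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

difficulties = [0.0] * 10          # hypothetical anchored item difficulties
L = len(difficulties)
m_zero = measure_for_score(0.5, difficulties)         # stands in for raw score 0
m_one = measure_for_score(1.0, difficulties)          # nearest non-extreme score
m_perfect = measure_for_score(L - 0.5, difficulties)  # stands in for raw score L
```

For this hypothetical 10-item test, the measure for a score of 0.5 lies about 0.75 logits below the measure for a score of 1, in line with the R=1/2 entry for L=10 in the Table.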

From raw score R, a measure M_R and its standard error SE_R can be estimated. The measure for score R+1 is approximately M_R + SE_R² (see Wright & Stone, BTD, 1979, 192-5). Thus the measure for an extreme score can be estimated from the measure for a score 1 point less extreme [see Table]. If S is the perfect score, then M_S ≈ M_S-1 + SE_S-1².
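
A minimal sketch of this extrapolation, assuming dichotomous Rasch items and the usual model standard error, SE = 1/sqrt(sum of P(1-P)). The helper names and the ten items at 0 logits are hypothetical choices made for the example.

```python
import math

def rasch_p(theta, d):
    """Probability of success on a dichotomous Rasch item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - theta))

def model_variance(theta, difficulties):
    """Model variance of the raw score at theta: sum of P(1-P) over items."""
    return sum(rasch_p(theta, d) * (1.0 - rasch_p(theta, d)) for d in difficulties)

def extrapolate(m_neighbor, se_neighbor, direction=+1):
    """M_extreme ~ M_neighbor +/- SE_neighbor**2 (direction=-1 for a zero score)."""
    return m_neighbor + direction * se_neighbor ** 2

difficulties = [0.0] * 10           # hypothetical items, all at 0 logits
L = len(difficulties)
m_prev = math.log(L - 1)            # measure for score L-1 when all items are at 0
se_prev = 1.0 / math.sqrt(model_variance(m_prev, difficulties))
m_perfect = extrapolate(m_prev, se_prev)
# By symmetry, score 1 has measure -m_prev and the same standard error here.
m_zero = extrapolate(-m_prev, se_prev, direction=-1)
```

Here SE² for a score of 9 is 10/9, about 1.11 logits, so the perfect-score measure is placed about 1.11 logits above the measure for a score of 9.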

Only measures statistically significantly more extreme than non-extreme measures would provoke separate consideration. Thus a measure M_S = M_S-1 + 1.65*SE_S-1 is the most central that would cause the rejection of the hypothesis, at the .05 level, that M_S and M_S-1 are statistically equivalent.
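
As a small worked example of the significance-based rule, the sketch below applies M_S = M_S-1 + 1.65*SE_S-1 to made-up values. The measure 2.20 and standard error 1.05 are hypothetical inputs, not results from any dataset.

```python
def significance_extrapolation(m_prev, se_prev, z=1.65):
    """Most central extreme measure that differs from m_prev at the one-sided .05 level."""
    return m_prev + z * se_prev

m_prev, se_prev = 2.20, 1.05   # hypothetical measure and S.E. for score S-1
m_s = significance_extrapolation(m_prev, se_prev)
```

With these inputs the extrapolation distance is 1.65 * 1.05, roughly 1.73 logits.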

Extreme measures can also be extrapolated by curve-fitting. For instance, a quadratic fit of M_S to M_S-1, M_S-2 and M_S-3 yields M_S = 3*M_S-1 - 3*M_S-2 + M_S-3.
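
The quadratic formula can be checked with a short sketch. It assumes the three neighboring scores are equally spaced (one score point apart), and uses the measures of a hypothetical test of ten equal-difficulty items, for which the measure for score R is log(R/(10-R)).

```python
import math

def quadratic_extrapolate(m1, m2, m3):
    """Extrapolate M_S from M_{S-1}=m1, M_{S-2}=m2, M_{S-3}=m3 with a quadratic fit."""
    return 3.0 * m1 - 3.0 * m2 + m3

# Sanity check: the rule is exact for any quadratic sampled at 0, 1, 2 and read at 3.
f = lambda x: 0.5 * x * x - 2.0 * x + 1.0
exact_check = quadratic_extrapolate(f(2), f(1), f(0))

# Hypothetical 10-item test with all items at 0 logits.
m9, m8, m7 = math.log(9 / 1), math.log(8 / 2), math.log(7 / 3)
m10 = quadratic_extrapolate(m9, m8, m7)
```

For this hypothetical test the extrapolated perfect-score measure is about 3.28 logits, roughly 1.08 logits above the measure for a score of 9.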

The likelihood of each possible response string for a particular measure can be computed as L_R = Π P_nix, the product of the model probabilities of the responses x in the string, where R is the raw score corresponding to that response string. If L₀>0.5 for a measure, then that measure will probably produce a response string with a raw score of zero. If L₀<0.5, then a non-zero score will probably be observed. The most central measure likely to produce an extreme score is the one for which L₀ = 0.5.
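
A sketch of the L₀ = 0.5 criterion for a zero score, assuming dichotomous Rasch items so that L₀ is the product of the failure probabilities. The function names, the bisection search and the ten items at 0 logits are assumptions made for illustration.

```python
import math

def zero_score_likelihood(theta, difficulties):
    """Probability of failing every dichotomous Rasch item at ability theta."""
    return math.prod(1.0 - 1.0 / (1.0 + math.exp(d - theta)) for d in difficulties)

def measure_for_l0(difficulties, target=0.5, lo=-20.0, hi=20.0, tol=1e-9):
    """Bisection search for the theta at which the zero-score likelihood equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if zero_score_likelihood(mid, difficulties) > target:
            lo = mid   # a zero score is still too likely: move the measure up
        else:
            hi = mid
    return (lo + hi) / 2.0

difficulties = [0.0] * 10   # hypothetical anchored item difficulties
m_zero = measure_for_l0(difficulties)
```

For this hypothetical test the result is about -2.63 logits. The measure for a perfect score follows by the same search applied to the product of the success probabilities.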

The belief in test relevancy can be expressed in terms of additional artificial responses (Jannarone et al., 1990). For instance, two further responses could be added to every person and item response string: a "1" and a "0". Then no response string can be extreme. If the additional responses are arranged to alternate "01" and "10", then the additional artificial persons and items will have close to 50% success rates, and so have minimal impact on the measurement system. Once the set of measures has been estimated, the measures can be anchored. Then the augmented data can be dropped, allowing standard errors and fit statistics to be computed from the observed data. If the prior belief is twice as strong, then 4 items can be added. If belief is expressed in fractions of an item, the artificial items can be weighted accordingly.
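
The augmentation step might be sketched as follows for a complete dichotomous data matrix. The function name, the layout (persons as rows, items as columns) and the responses of the artificial persons to the artificial items are assumptions of this sketch. Estimation, anchoring and dropping the artificial data are left to the analysis software.

```python
def augment(data):
    """Add two artificial items and two artificial persons to a 0/1 data matrix."""
    n_items = len(data[0])
    # Every real person gains a "0" and a "1", alternating "01" and "10".
    rows = [list(row) + ([0, 1] if n % 2 == 0 else [1, 0])
            for n, row in enumerate(data)]
    # Two artificial persons with complementary alternating responses, so that
    # every real item gains exactly one "1" and one "0".
    art_a = [i % 2 for i in range(n_items)]
    art_b = [1 - x for x in art_a]
    rows.append(art_a + [0, 1])   # artificial-by-artificial cells: an assumption
    rows.append(art_b + [1, 0])
    return rows

# Hypothetical data: person 3 has a zero score and item 4 is never answered correctly.
data = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 0]]
augmented = augment(data)
```

After augmentation every person row and every item column in this example contains at least one "1" and one "0", so no score is extreme.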

If the underlying distribution of, say, persons is specified to be normal (or any other distribution), then measures can be imputed for extreme scores that result in the best fit to that distribution. These measures are constrained to be more extreme than the measures estimated from similar non-extreme response strings.

The distribution of the measures estimated from the data is intended to coincide with the distribution of the measures that generated it. This can be used to refine the measure estimates for extreme scores.

After extreme measures are estimated using one of the methods above, the means and standard deviations of the item and person measure distributions are computed. Then data are simulated using the entire set of measures (extreme and non-extreme). From these data, a new set of measures is estimated for non-extreme and extreme scores. The means and S.D.s of these new measures are computed and compared to their previous values. The previous "extreme" measures are adjusted, and new means and S.D.s computed, so as to make the two distributions as similar as possible. Further data are simulated from the revised measures and the distributions are again compared. The extreme measures are again adjusted to make the distributions coincide. This iterative process continues until no more adjustments are necessary or there is no improvement in distribution coincidence.

"Least Measurable Distance" Extrapolations for Extreme Score Measures in Logits
(L = number of dichotomous items or polytomous steps)

Approach                        Formula               L=10    L=25    L=50    L=100
I. Extreme Score Adjustment
   R=1/2                        (2L-1)/(L-1)          0.75    0.71    0.70    0.70
   R=1/3                        (3L-1)/(L-1)          1.17    1.13    1.11    1.11
   R=1/4                        (4L-1)/(L-1)          1.57    1.51    1.48    1.48
II. Measure Extrapolation
   LMD lower bound              L/(L-1)               1.11    1.04    1.02    1.01
   Test width 2 logits          C_f2/L, f=(L-1)/L     1.16    1.04    1.02    1.01
   Test width 4 logits          C_f4/L                1.22    1.08    1.04    1.01
   Test width 6 logits          C_f6/L                1.37    1.12    1.07    1.04
   Test width 8 logits          C_f8/L                1.44    1.17    1.10    1.04
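
Some of the tabled values can be reproduced with a few lines of arithmetic. The sketch assumes that the Approach I entries are natural logarithms of the tabled ratios, which is the distance between the measures for scores 1 and 1/k on L equal-difficulty items. That reading reproduces the R=1/2 and R=1/3 rows; the tabled R=1/4 values are somewhat larger than this approximation gives. The function names are illustrative.

```python
import math

def fractional_adjustment(k, L):
    """Approximate logit distance between scores 1 and 1/k on L equal-difficulty items."""
    return math.log((k * L - 1) / (L - 1))

def lmd_lower_bound(L):
    """SE squared at a raw score of 1 (or L-1) on L equal-difficulty items."""
    return L / (L - 1)

for L in (10, 25, 50, 100):
    print(L,
          round(fractional_adjustment(2, L), 2),   # R=1/2 row
          round(fractional_adjustment(3, L), 2),   # R=1/3 row
          round(lmd_lower_bound(L), 2))            # LMD lower bound row
```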

Most of the difference between these approaches is hair-splitting [see Table], but questions to be addressed include:

(a) Are the items dichotomous, polytomous or mixed?
(b) Is the test fixed length or adaptive?
(c) Are there missing data?
(d) What is known about the underlying distributions?
(e) What computational resources are available?
(f) Are the computed extreme measures reasonable?

Choose an extrapolation approach that provides consistently reasonable measures for your data and is easy to explain. Approach I has proved robust and flexible for small samples with missing data and is implemented in WINSTEPS.

Measures corresponding to extreme scores 0 and L should be no closer to their next integer neighbors 1 and L-1 than the least measurable distance, LMD, between integer neighbors estimated at 1 and L-1. According to Best Test Design (Wright & Stone, 1979, pp. 132, 135, 192-198, 214), when R=1 or L-1, the LMD is at least L/(L-1) logits (the "LMD lower bound" row of the Table) and increases with test width.

No extreme score extrapolation can be less than one logit. Extrapolations >1.2 logits require convincing justification.

Extreme measures have indefinite standard errors, but the following provide useful values:

(3) SE_S ≈ 1/√(Variance of raw score S | M_S)
[This is implemented in WINSTEPS]
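
A sketch of formula (3) for dichotomous Rasch items, taking the variance of the raw score at a measure to be the sum of P(1-P) over items and the standard error to be the reciprocal of its square root. The function name and the example test are assumptions. This is not a description of how WINSTEPS performs the computation internally.

```python
import math

def extreme_se(m_extreme, difficulties):
    """Standard error at an assigned extreme measure: 1 / sqrt(raw-score variance)."""
    variance = 0.0
    for d in difficulties:
        p = 1.0 / (1.0 + math.exp(d - m_extreme))
        variance += p * (1.0 - p)
    return 1.0 / math.sqrt(variance)

difficulties = [0.0] * 10            # hypothetical items, all at 0 logits
m_perfect = math.log(9.5 / 0.5)      # measure for a score of 9.5 on these items
se_perfect = extreme_se(m_perfect, difficulties)
```

For this hypothetical test the raw-score variance at the perfect-score measure is 10 * 0.95 * 0.05 = 0.475, giving a standard error of about 1.45 logits.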

Jannarone R.J., Yu K.F., Laughlin J.E. (1990) Easy Bayes estimation for Rasch-type models. Psychometrika 55, 3, 449-460.

Estimating Rasch measures for extreme scores. Wright B.D. … Rasch Measurement Transactions, 1998, 12:2 p. 632-3.


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
