Estimation: Iteration and Convergence

A question that arises when we use computer programs (e.g. BICAL, MSCALE) based on a maximum likelihood algorithm (e.g. JMLE, also known as UCON) is: "When has the procedure converged?"

The first problem that may arise is that the iterative procedure does not converge at all: the changes in the estimates from iteration to iteration do not get smaller and may even get larger. The estimation equations are usually derived from the likelihood of the observed data under the Rasch model. The parameters in that likelihood (the person "abilities" and the item "difficulties") are estimated so as to maximize the likelihood of the observed data. The estimates are improved once per iteration by a Newton-Raphson step, which proposes new estimates that should give the observed data a greater overall likelihood than the previous set of estimates did.
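As a rough illustration of one such iteration, here is a minimal Python sketch of joint maximum likelihood (JMLE) estimation for dichotomous Rasch data. It assumes complete data with no extreme (all-right or all-wrong) scores, zero starting values, person updates followed by item updates, and item difficulties re-centred on zero after each iteration. The names `rasch_p` and `jmle_iteration` are hypothetical and are not taken from BICAL, MSCALE or any other program.

```python
import math


def rasch_p(b: float, d: float) -> float:
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def jmle_iteration(data, abilities, difficulties):
    """One JMLE iteration: a Newton-Raphson step for every person, then every item.

    data[n][i] is 1 (success) or 0 (failure). Persons and items with extreme
    scores are assumed to have been removed beforehand.
    Returns (new_abilities, new_difficulties, largest_absolute_change).
    """
    n_items = len(data[0])
    largest = 0.0

    # Persons: item difficulties held fixed at their previous values.
    new_b = []
    for n, b in enumerate(abilities):
        probs = [rasch_p(b, d) for d in difficulties]
        residual = sum(data[n]) - sum(probs)             # observed - expected
        information = sum(p * (1.0 - p) for p in probs)
        step = residual / information
        new_b.append(b + step)
        largest = max(largest, abs(step))

    # Items: abilities held fixed at their newly updated values.
    new_d = []
    for i, d in enumerate(difficulties):
        probs = [rasch_p(b, d) for b in new_b]
        residual = sum(row[i] for row in data) - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        # More successes than expected means the item is easier: lower d.
        new_d.append(d - residual / information)

    # Reset the mean item difficulty to zero.
    mean_d = sum(new_d) / n_items
    new_d = [d - mean_d for d in new_d]
    for old, new in zip(difficulties, new_d):
        largest = max(largest, abs(new - old))
    return new_b, new_d, largest


# Usage: iterate until the largest change falls below a chosen tolerance.
data = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
abilities, difficulties = [0.0] * 4, [0.0] * 3
for iteration in range(1, 1001):
    abilities, difficulties, change = jmle_iteration(data, abilities, difficulties)
    if change < 1e-6:
        break
```

Each update is a single Newton-Raphson step for one parameter with all the others held fixed, which is the working assumption discussed next and the reason such a loop can fail to settle on some data sets.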

However, the derivation of these Newton-Raphson equations assumes that each parameter estimate is sufficiently independent of all the others that maximizing the likelihood for each parameter in turn will maximize it for all of them jointly. This assumption sometimes fails. Further, the mean item difficulty is usually reset to zero at each iteration, which introduces an additional perturbation into the estimates for extreme items. Consequently, it is not unusual to encounter data sets that do not converge satisfactorily with the standard programs, though I have yet to find any data that will not converge with more conservative techniques, even if many iterations are required.
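The article does not say which "more conservative techniques" it has in mind. One common option, assumed here purely for illustration, is to cap the size of each Newton-Raphson step so that a poorly conditioned update cannot overshoot. The sketch applies the cap to a single person's ability with the item difficulties held fixed; the function name and the `max_step` parameter are hypothetical.

```python
import math


def capped_newton_ability(responses, difficulties, max_step=1.0,
                          tol=1e-6, max_iter=1000):
    """Ability for one person by Newton-Raphson, each step capped at max_step logits.

    Assumes a non-extreme score (neither all 0 nor all 1), so a finite estimate exists.
    """
    b = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(d - b)) for d in difficulties]
        residual = sum(responses) - sum(probs)          # observed - expected
        information = sum(p * (1.0 - p) for p in probs)
        step = residual / information                   # full Newton-Raphson step
        step = max(-max_step, min(max_step, step))      # conservative: limit its size
        b += step
        if abs(step) < tol:
            break
    return b


# Usage: 4 successes on 5 items spread from -2 to +2 logits.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
b = capped_newton_ability([1, 1, 1, 1, 0], difficulties, max_step=0.5)
```

A smaller cap trades more iterations for a lower risk of oscillation, in line with the article's remark that conservative methods may need many iterations.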

Lack of convergence indicates that the data do not fit the model well: there are too many unexpected, poorly fitting observations. A data set that fails to converge can usually be rescued by setting aside, for separate study, the person or item performances that contain these unexpected responses.
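The article does not specify how to locate the unexpected responses. One conventional approach, assumed here, is to compute the standardized residual z = (x - P) / sqrt(P(1 - P)) for every response from the current estimates and flag the large ones. The function name and the cutoff of 3 are hypothetical choices.

```python
import math


def unexpected_responses(data, abilities, difficulties, z_limit=3.0):
    """List (person, item, z) for responses whose standardized residual exceeds z_limit."""
    flagged = []
    for n, row in enumerate(data):
        for i, x in enumerate(row):
            p = 1.0 / (1.0 + math.exp(difficulties[i] - abilities[n]))
            z = (x - p) / math.sqrt(p * (1.0 - p))
            if abs(z) > z_limit:
                flagged.append((n, i, round(z, 2)))
    return flagged
```

For instance, a failure by a person 3 logits above an item's difficulty gives z of roughly -4.5 and would be flagged, while a success by the same person on the same item would not.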

Assuming the estimates do converge, when should one stop the estimation procedure? Some programs report a number such as ".13 logits" at the end of each iteration: the largest absolute change in any logit measure when the estimates are recalculated from their previous values. Others report a mean-square or average absolute change. Fortunately, with high-speed computers, one can always let the problem run over lunch, or overnight, and obtain estimates that are changing by less than .0001 logits per iteration. But is this necessary?
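The three summaries mentioned above can be computed from two successive sets of estimates. This is a minimal sketch; "mean square" is taken here to mean the mean of the squared changes, which is an assumption about how any particular program defines it, and the function name is hypothetical.

```python
def change_summaries(old, new):
    """Largest absolute, average absolute, and mean-square change between two sets of estimates."""
    changes = [new_value - old_value for old_value, new_value in zip(old, new)]
    largest = max(abs(c) for c in changes)
    mean_absolute = sum(abs(c) for c in changes) / len(changes)
    mean_square = sum(c * c for c in changes) / len(changes)
    return largest, mean_absolute, mean_square


# Usage: a stopping rule based on the largest change, as reported by some programs.
largest, mean_absolute, mean_square = change_summaries([0.0, 0.0, 0.0, 0.0],
                                                       [0.13, -0.05, 0.0, 0.02])
stop = largest < 0.0001
```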

Let us consider what is happening behind the scenes. During each iteration the computer calculates the expected score of each person from the logit measure obtained at the end of the previous iteration. Suppose that the estimated measure is 1.0 logits, giving an expected score of 113.6 on a 200-item test. If the person's actual score is 114, then the expected and observed scores are less than .5 score points apart. Test scores can only be observed as integers: 113, 114, 115 and so on. Consequently, a person whose true ability corresponds to an expected score of 114.2 cannot be observed to score 114.2, only the nearest integer, 114. Any ability corresponding to an expected score between 113.5 and 114.5 would be expected to produce an observed score of 114. Since our current measure already gives an expected score of 113.6, we may in fact already have the best estimate we can get for this person. Thus any set of estimates that gives expected scores within half a score point of the observed scores must be "correct" estimates for the data set.
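The half-point criterion can be checked directly. In this sketch, assuming dichotomous items with known difficulties, a person's expected score is the sum of the Rasch success probabilities, and the estimates pass when every expected score lies within 0.5 of the observed score. Both function names are hypothetical.

```python
import math


def expected_score(ability, difficulties):
    """Sum of Rasch success probabilities over the items: the expected raw score."""
    return sum(1.0 / (1.0 + math.exp(d - ability)) for d in difficulties)


def within_half_point(observed_scores, abilities, difficulties):
    """True if every person's expected score is within 0.5 of the observed score."""
    return all(abs(obs - expected_score(b, difficulties)) < 0.5
               for obs, b in zip(observed_scores, abilities))


# Usage: 200 items all at 0 logits and a person at 0 logits give an expected score of 100.
ok = within_half_point([100], [0.0], [0.0] * 200)
```

The same check applies to items, with the roles of persons and items exchanged.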

You may be convinced by this argument but be using a computer program that does not report the largest gap between observed and expected scores. If your program instead reports the largest change in the logit estimates during an iteration, you need to know how much logit change per iteration corresponds to a score gap of half a score point. The last person or item measure to converge is generally the one with the most extreme set of observations. If those observations are all successes or all failures, the estimate tends to infinity and no maximum likelihood estimate can be obtained. However, an estimate at which the observed and expected scores differ by half a score point can be obtained.
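Both ideas can be sketched in a few lines, assuming dichotomous items. A Newton-Raphson step converts a score gap into a logit change by dividing by the sum of P(1 - P); for a perfect score, the ability whose expected score equals n - 0.5 can be found by bisection, because the expected score rises steadily with ability. The function names and search limits are hypothetical. An all-failures score would use a target of 0.5 in the same way.

```python
import math


def rasch_p(b, d):
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def logit_change_for_gap(ability, difficulties, gap=0.5):
    """Logit change that moves the expected score by about `gap` points (one Newton step)."""
    information = sum(rasch_p(ability, d) * (1.0 - rasch_p(ability, d)) for d in difficulties)
    return gap / information


def perfect_score_estimate(difficulties, lo=-20.0, hi=20.0, tol=1e-10):
    """Ability whose expected score is n - 0.5 for a person who succeeded on all n items."""
    target = len(difficulties) - 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, d) for d in difficulties) < target:
            lo = mid      # expected score too low: the estimate lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For 10 items all at 0 logits, the perfect-score estimate is ln(19), about 2.94 logits, where each success probability is 0.95.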

As a guide to when convergence to within half a score point has been reached, I have constructed the accompanying table. The table is for a dichotomous test of n items. (To use it for a rating scale, multiply n by the number of categories minus one: e.g. for a 25-item test with a 4-category rating scale, look up the entry for 75 items.) Now note, either from experience or from a previous estimation, the logit distance of the most extreme person measure from the mean of the items, and the logit distance of the most extreme item from the mean of the persons. Look up both distances in the table, the first with n equal to the number of items and the second with n equal to the number of persons. The smaller of the two entries gives the approximate size of the largest logit change per iteration you can expect when the greatest difference between any observed score and its expected score is less than 0.5 score points.

For example, 25 persons take a test of 100 items. The mean of the items has been set at zero logits, with a range from -2 to +3 logits. The test is somewhat easy, so the mean of the persons is 1 logit, with a range from 0 to 2 logits. For the most extreme person, 2 logits from the item mean, we look up 2 logits for 100 items = 0.05 logits change per iteration. For the most extreme item, at -2 logits and so 3 logits from the person mean, we look up 3 logits for 25 persons = 0.44 logits. Thus, when the maximum change per iteration is less than 0.05 logits, all expected scores are within 0.5 score points of their observed scores.
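The bookkeeping in this example can be written out as a short sketch. The two looked-up values are copied from the example, not computed; the helper names are hypothetical.

```python
def effective_length(n_items, n_categories=2):
    """Table entry to use: n for dichotomous items, n * (categories - 1) for a rating scale."""
    return n_items * (n_categories - 1)


def extreme_distance(measures, other_mean):
    """Logit distance of the most extreme measure from the mean of the other facet."""
    return max(abs(m - other_mean) for m in measures)


# The worked example: only the ends of each range matter.
person_distance = extreme_distance([0.0, 2.0], other_mean=0.0)   # item mean is 0
item_distance = extreme_distance([-2.0, 3.0], other_mean=1.0)    # person mean is 1
# Entries for (2 logits, 100 items) and (3 logits, 25 persons), as given in the example.
looked_up = {"person": 0.05, "item": 0.44}
criterion = min(looked_up.values())
```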

The algorithm used to calculate the table assumes that persons and items are uniformly distributed about their common mean value, out to the extreme distance on either side.

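Then, for the most extreme item, of difficulty D, the maximum change in the item difficulty estimate per iteration is

$$\Delta D_{\max} = \frac{0.5}{\sum_{n=1}^{N} P_n (1 - P_n)}, \qquad P_n = \frac{\exp(B_n - D)}{1 + \exp(B_n - D)},$$

where N is the number of observations of the item and the person measures B_n are uniformly distributed over the common range of the persons and items. This is the Newton-Raphson step for a gap of 0.5 score points between observed and expected scores, since the denominator is the rate at which the expected score changes per logit.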
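The table formula can be evaluated with a short sketch. It assumes, as one reading of "uniformly distributed", that the N opposing measures are equally spaced over [-D, +D] around a common mean of zero, with the extreme element at +D. Because the original discretization and rounding are not described, this sketch is not guaranteed to reproduce the published table entries exactly. The function name is hypothetical.

```python
import math


def max_logit_change(distance, n_observations):
    """0.5 / sum of P_n (1 - P_n) for an element `distance` logits from the common mean.

    Assumes the opposing measures B_n are equally spaced over [-distance, +distance].
    """
    if n_observations < 2:
        raise ValueError("need at least two observations")
    spacing = 2.0 * distance / (n_observations - 1)
    information = 0.0
    for k in range(n_observations):
        b = -distance + k * spacing
        p = math.exp(b - distance) / (1.0 + math.exp(b - distance))
        information += p * (1.0 - p)
    return 0.5 / information


# Usage: the two look-ups from the worked example.
person_case = max_logit_change(2.0, 100)   # most extreme person, 100 items
item_case = max_logit_change(3.0, 25)      # most extreme item, 25 persons
```

As in the table, the change grows with the distance of the extreme element and shrinks as the number of observations increases.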

[Table: Logit change per iteration corresponding to a difference of less than 0.5 score points between observed and expected scores]

Rasch estimation: iteration and convergence. Linacre JM. … Rasch Measurement Transactions, 1987, 1:1 p.7-8
