A question that arises when we use computer programs (e.g. BICAL, MSCALE) that implement a Maximum Likelihood algorithm (e.g. JMLE, UCON) is "When has the procedure converged?".
The first problem that may arise is that the iterative procedure does not converge at all - the changes in the estimates from iteration to iteration do not get smaller and may even get larger. The estimation equations are usually derived from the likelihood equation for any set of observations according to the Rasch model. Estimates of the parameters in the likelihood equation (the "abilities" and "difficulties") are calculated in an attempt to maximize the likelihood of the observed data. These estimates are then improved once each iteration using a Newton-Raphson method. This suggests new estimates of the parameters which should give an overall likelihood of the observed data greater than that given by the previous set of estimates.
However, in deriving these Newton-Raphson equations, the working assumption is that each parameter estimate is sufficiently independent of all others that maximizing the likelihood for each parameter in turn will lead to maximizing it for all. This sometimes does not work. Further, the mean item difficulty is usually set to zero at each iteration which introduces a further source of perturbation into the estimation of values for extreme items. Consequently, it is not unusual to encounter sets of observations which do not converge satisfactorily using the standard programs, though I have yet to find any data which will not converge using more conservative techniques even though many iterations may be required.
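The iteration described above can be sketched in a few lines of Python. This is only an illustrative toy, not the code of BICAL, MSCALE or any other program: the function names (`prob`, `clamp`, `jmle`), the one-logit limit on each step, and the small data matrix are all assumptions made for the example. It treats each parameter in turn, re-centers the item difficulties at zero after every iteration, and records the largest logit change per iteration.

```python
import math

def prob(ability, difficulty):
    """Rasch probability of success on a dichotomous item."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def clamp(step, limit=1.0):
    """Limit a Newton-Raphson step to +/- limit logits (a cautious assumption)."""
    return max(-limit, min(limit, step))

def jmle(data, tol=0.01, max_iter=500):
    """Toy joint estimation for a complete 0/1 matrix with no extreme scores.

    data[n][i] is person n's response to item i. Returns the abilities,
    the difficulties, and the largest absolute logit change at each iteration.
    """
    n_persons, n_items = len(data), len(data[0])
    person_scores = [sum(row) for row in data]
    item_scores = [sum(row[i] for row in data) for i in range(n_items)]
    abilities = [0.0] * n_persons
    difficulties = [0.0] * n_items
    history = []
    for _ in range(max_iter):
        previous = abilities + difficulties  # copy of the current estimates
        # One Newton-Raphson step per person, items held fixed.
        for n in range(n_persons):
            ps = [prob(abilities[n], d) for d in difficulties]
            info = sum(p * (1.0 - p) for p in ps)
            abilities[n] += clamp((person_scores[n] - sum(ps)) / info)
        # One Newton-Raphson step per item, persons held fixed.
        for i in range(n_items):
            ps = [prob(b, difficulties[i]) for b in abilities]
            info = sum(p * (1.0 - p) for p in ps)
            difficulties[i] -= clamp((item_scores[i] - sum(ps)) / info)
        # Set the mean item difficulty to zero, as most programs do.
        mean_d = sum(difficulties) / n_items
        difficulties = [d - mean_d for d in difficulties]
        current = abilities + difficulties
        biggest = max(abs(c - p) for c, p in zip(current, previous))
        history.append(biggest)
        if biggest < tol:
            break
    return abilities, difficulties, history

# Hypothetical responses: 6 persons by 4 items, no zero or perfect scores.
example = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
abilities, difficulties, history = jmle(example)
print(len(history), "iterations; last change", round(history[-1], 4))
```

The `history` list is the kind of per-iteration figure that some programs print, so it can be watched for changes that fail to shrink.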
Lack of convergence is an indication that the data do not fit the model well, because there are too many poorly fitting observations. A data set showing lack of convergence can usually be rescued by setting aside for separate study the person or item performances which contain these unexpected responses.
Assuming the estimates do converge, when does one stop the estimation procedure? Some programs report a number such as ".13 logits" at the end of each iteration, which is the largest absolute change in any estimated logit measurement found when recalculating the estimates based on their previous values. Others report a mean square or average absolute change. Fortunately, with high-speed computers, one can always let the problem run during lunch, or overnight, and obtain estimates which are changing at less than .0001 logits per iteration. But is this necessary?
Let's consider what is happening behind the scenes. During each iteration the computer calculates the expected score of each person based on the logit measure obtained at the end of the last iteration. Let us say that the estimated measure is 1.0 logits, giving an estimated score of 113.6 on a 200 item test. If the person's actual score was 114, then the estimated and actual scores are less than .5 score points apart. In counting test scores, we can only observe integers, scores such as 113, 114, 115 and so on. Consequently, someone whose true ability corresponds to a score of 114.2 cannot have this observed but only the nearest integral performance, 114. The range of abilities between 113.5 and 114.5 would be expected to produce a score of 114. Now our current estimated measure has already produced an estimated score of 113.6, so we may, in fact, already have the best estimate we can get for the person under observation. Thus we could say that any set of estimates which give estimated scores within half a score point of the observed scores must be "correct" estimates for the data set.
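The half-score-point criterion is easy to state in code. The sketch below is a minimal illustration with made-up function names (`expected_score`, `largest_score_gap`, `within_half_point`); it assumes dichotomous items and takes the expected score to be the sum of the Rasch success probabilities over the items.

```python
import math

def expected_score(measure, difficulties):
    """Expected raw score: sum of Rasch success probabilities over the items."""
    return sum(1.0 / (1.0 + math.exp(d - measure)) for d in difficulties)

def largest_score_gap(observed, expected):
    """Biggest absolute difference between observed and expected scores."""
    return max(abs(o - e) for o, e in zip(observed, expected))

def within_half_point(observed, expected):
    """True when every expected score is within 0.5 of its observed score."""
    return largest_score_gap(observed, expected) < 0.5

# The case from the text: observed score 114, expected score 113.6.
print(within_half_point([114], [113.6]))
```

A program that stopped on this test would need the expected scores for every person and every item, not only the one shown here.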
You may be convinced by this argument, but be using a computer program that does not report the biggest gap between observed and estimated scores. If your program reports the biggest change in logit estimates made during an iteration, you need to know the amount of logit change per iteration which corresponds to a score gap of half a score point. The last person or item measure to converge is generally that one which has the most extreme set of observations. If the observations are all successes or all failures, then that estimate tends to infinity and the maximum likelihood estimate cannot be obtained. However, an estimate corresponding to half a score point difference between observed and expected scores can be obtained.
As a guide to when convergence to within half a score point has been obtained, I have constructed the accompanying table. This table is for a dichotomous test of n items. (To use this table for a rating scale situation, multiply n by the number of categories less one, e.g. for a 4 category rating scale test of 25 items, look up the entry corresponding to 75 items.) Now note, either from experience, or from a previous estimation, the logit distance of the most extreme person measure from the mean of the items, and the most extreme item from the mean of the people. Look both of these up on the table. The smaller number will give the approximate size of the biggest logit change per iteration you can expect when the greatest difference between any observed score and the corresponding expected score is less than 0.5 score points.
For example, 25 persons take a test of 100 items. The mean of the items has been set at zero logits, with a range from -2 logits to +3 logits. The test is somewhat easy so that the mean of the people is 1 logit, with a range from 0 logits to 2 logits. Then for the most extreme person we look up 2 logits for 100 items = 0.02 logits change per iteration. For the most extreme item, 3 logits for 25 persons = 0.1 logits. Thus, when the maximum change per iteration is less than 0.02 logits, all expected scores are within 0.5 score points of their observed scores.
The algorithm used to calculate the chart assumes that persons and items are uniformly distributed about their common mean value for a distance up to the extreme distance.
Then, considering the most extreme item of difficulty D,

$$\text{maximum change in item difficulty estimate per iteration} = \frac{0.5}{\sum_{n=1}^{N} P_{n}\,(1 - P_{n})}$$

where N is the number of observations and

$$P_{n} = \frac{\exp(B_{n} - D)}{1 + \exp(B_{n} - D)}$$

where B_{n} is uniformly distributed over the common range of the persons and items.
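The formula can be evaluated directly. The following sketch is one possible reading, not the program used to build the chart: the function name `max_change` and its parameters are hypothetical, and the "uniform distribution" of B_{n} is approximated by N evenly spaced points (interval midpoints) between `low` and `high`. Because those choices are assumptions, the results need not match the printed table entry for entry.

```python
import math

def max_change(D, n_obs, low, high):
    """0.5 / sum(P_n * (1 - P_n)) for an item of difficulty D.

    The n_obs person measures B_n are placed at the midpoints of n_obs
    equal intervals spanning [low, high] (an assumed stand-in for a
    uniform distribution).
    """
    width = (high - low) / n_obs
    total = 0.0
    for k in range(n_obs):
        b = low + (k + 0.5) * width
        p = math.exp(b - D) / (1.0 + math.exp(b - D))
        total += p * (1.0 - p)
    return 0.5 / total

# An item 3 logits from the center of 25 persons spread over -3..+3 logits,
# compared with an item at the center of the same persons.
print(max_change(3.0, 25, -3.0, 3.0))
print(max_change(0.0, 25, -3.0, 3.0))
```

The off-center item yields the larger figure, since persons far from the item contribute little to the sum in the denominator.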
Logit change per iteration corresponding to less than 0.5 score points difference between observed and expected scores

Rows: number of observations, n. Columns: logit distance (half-range) of the most extreme person from the mean of the items, or of the most extreme item from the mean of the persons.

| n | .5 | 1 | 1.5 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| 5 | .4 | .4 | .5 | .5 | .7 | 1.0 | 1.2 |
| 10 | .2 | .2 | .2 | .2 | .3 | .4 | .5 |
| 25 | .08 | .08 | .09 | .1 | .1 | .1 | .2 |
| 50 | .04 | .04 | .04 | .05 | .06 | .08 | .1 |
| 75 | .02 | .02 | .03 | .03 | .04 | .05 | .06 |
| 100 | .02 | .02 | .02 | .02 | .03 | .04 | .05 |
| 250 | .008 | .008 | .009 | .01 | .01 | .01 | .02 |
| 500 | .004 | .004 | .004 | .005 | .006 | .008 | .01 |
| 750 | .002 | .002 | .003 | .003 | .004 | .005 | .006 |
| 1000 | .002 | .002 | .002 | .002 | .003 | .004 | .005 |
| 2000 | .001 | .001 | .001 | .001 | .001 | .002 | .002 |
| 4000 | .0005 | .0005 | .0005 | .0006 | .0008 | .001 | .001 |
| 10000 | .0002 | .0002 | .0002 | .0002 | .0003 | .0004 | .0005 |
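For readers who prefer to automate the look-up, here is a small sketch that stores the table and applies the "smaller of the two numbers" rule. The helper names (`effective_n`, `lookup`, `convergence_criterion`) are invented for this illustration, and snapping to the nearest tabulated row and column is an assumption; the table itself does not say how to interpolate.

```python
HALF_RANGES = [0.5, 1, 1.5, 2, 3, 4, 5]
TABLE = {
    5:     [.4, .4, .5, .5, .7, 1.0, 1.2],
    10:    [.2, .2, .2, .2, .3, .4, .5],
    25:    [.08, .08, .09, .1, .1, .1, .2],
    50:    [.04, .04, .04, .05, .06, .08, .1],
    75:    [.02, .02, .03, .03, .04, .05, .06],
    100:   [.02, .02, .02, .02, .03, .04, .05],
    250:   [.008, .008, .009, .01, .01, .01, .02],
    500:   [.004, .004, .004, .005, .006, .008, .01],
    750:   [.002, .002, .003, .003, .004, .005, .006],
    1000:  [.002, .002, .002, .002, .003, .004, .005],
    2000:  [.001, .001, .001, .001, .001, .002, .002],
    4000:  [.0005, .0005, .0005, .0006, .0008, .001, .001],
    10000: [.0002, .0002, .0002, .0002, .0003, .0004, .0005],
}

def effective_n(n_items, n_categories=2):
    """Rating scale adjustment: n times (number of categories less one)."""
    return n_items * (n_categories - 1)

def lookup(n_obs, distance):
    """Table entry for the nearest tabulated n and half-range (assumed rule)."""
    row = min(TABLE, key=lambda n: abs(n - n_obs))
    col = min(range(len(HALF_RANGES)),
              key=lambda k: abs(HALF_RANGES[k] - distance))
    return TABLE[row][col]

def convergence_criterion(n_items, person_distance, n_persons, item_distance):
    """Smaller of the person look-up and the item look-up."""
    return min(lookup(n_items, person_distance),
               lookup(n_persons, item_distance))

# Hypothetical case: 50 dichotomous items, most extreme person 3 logits from
# the item mean; 250 persons, most extreme item 4 logits from the person mean.
print(convergence_criterion(50, 3, 250, 4))  # 0.01
print(effective_n(25, 4))                    # 75
```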
Rasch estimation: iteration and convergence. Linacre JM. … Rasch Measurement Transactions, 1987, 1:1 p.7-8
The URL of this page is www.rasch.org/rmt/rmt11b.htm
Website: www.rasch.org/rmt/contents.htm