The correspondence between a data set and a statistical model defines the notion of `fit'. In psychometric models of the Rasch class, all aspects of fit cannot be judged from a single statistic. On the other hand, all that can be learned about the fit of data to a Rasch model must be contained in the residuals or magnitudes of departures from the model. Various tests based directly on these residuals are reviewed.
INTRODUCTION
A key aspect of scientific development is the construction and verification of correspondence between observed data and an abstract model designed to represent the data. Modern psychometrics employs statistical models, taking the form of probability distributions, to describe, more or less, the results of mental tests. The degree of correspondence between empirical observations and those predicted through the operation of the model is generally known as `fit between the data and the model'.
The utility of the results of measurement in the social sciences rests to a large extent on a judicious choice of model. An investigation of the degree of fit is essential since conclusions are derived through the model properties and specifications and not directly from the particular observed data. For example, in the analysis of items within a test, it is the item difficulty parameter that is of major importance, not the specific set of observations which might lead to one particular estimate of that parameter. The purpose of using a model is thus to replace any particular data set with the more general model. The confidence in this replacement rests in part on the psychometric tests of fit.
Models are not expected to predict outcomes perfectly according to every conceivable criterion of accuracy. What ultimately determines the extent of `fit of data to the model' is whether the degree of correspondence between data and model is tolerable in terms of utility for a given purpose. Thus any particular test of fit between a data set and the model is never complete.
To put it another way, the decision actually to use measures depends not only on their quality according to psychometric criteria, but also on non-psychometric factors such as economics, time restrictions and politics. This latter point is emphasized since recent articles (Gustafsson, 1980; van den Wollenberg, 1981) appear to ignore this fundamental issue when reporting fit or when criticizing the statistics proposed to indicate degree of fit.
An aim of the following discussion is to draw a balance between the two extreme approaches to the relationship of test data to statistical models which are supposed to describe those data. On the one hand there is the point of view which sees the collected data as sacrosanct. Hence if a given data set does not fit the originally proposed model, one is expected to change or modify the model, most often by including further parameters to account for patterns in the data, or by deleting some parameters because they are superfluous.
On the other hand, there is the point of view that the model is `perfect'; it is usually argued on logical and/or measurement principles that the nature of measurement should comply with certain fundamental axioms and from this stance one establishes the necessity of a specific model. Since the model is argued on these grounds, it is the data which must be manipulated if there is any evidence of lack of fit. Most practitioners work somewhere between these two extremes, although those who advocate Rasch models tend to work towards the latter framework whereas those who work with the models of Lord and Birnbaum (1980) tend to work from the former framework.
It is worth noting that when it is the data which must be edited in some way, tradition has it that items and not persons need attention in order to enhance fit. Recent arguments center on a more symmetrical approach to fit via the analysis of misfitting persons as well as misfitting items. There is a sense in which the adoption of these psychometric models by mathematical statisticians has resulted in a lessening of importance attached to the work contributed by practicing psychometricians. It is for these reasons and the fact that an increasing variety of `tests' are being proposed for at least one large class of models that this article has been written; the discussion has been restricted to the class of models known as Rasch models since little work on fit has been forthcoming with respect to other models. These points are illustrated by the various tests of fit described in this article.
THE STATISTICAL ASPECTS OF FIT
It is informative to describe in some detail the way in which statisticians generally test the fit between data and discrete probability distributions from primarily a statistical point of view. The following description is necessarily simplified but does capture the essential logic. In the first instance, the probability distribution (i.e., the model), must be fully specified; that is, the algebraic form, the parameter(s) and the sample space of possible events must be stated. 'Fit' is the correspondence, then, for a given set of real data of sample size N, between the observed frequencies (of each element of the sample space) and those frequencies predicted by the particular model. There are various ways of calculating the 'correspondence'.
TABLE I | ||
---|---|---|
x | n_{x} | E[n_{x}] |
0 | 109 | 108.7 |
1 | 65 | 66.3 |
2 | 22 | 20.2 |
3 | 3 | 4.1 |
4 | 1 | 0.7 |
4+ | 0 | 0.0 |
λ (hat) = 0.61 | ||
The basics of fit from the common statistical viewpoint will be illustrated by reference to the often quoted data set arising from records kept of the number of deaths by horse-kick in the Prussian army and analyzed via the discrete Poisson distribution by Bortkiewicz (1898). In these data, ten army corps were each observed over 20 years, giving N = 200 corps-years. The relevant data and calculations for fit appear in Table I, where x is the number of deaths in a year and n_{x} is the number of corps-years with x deaths.
The Poisson model states that:
p_{x} = P{X = x; λ} = e^{-λ}λ^{x}/x!, x = 0, 1, 2, ... | (1) |
where the parameter of the model, λ, is estimated (maximum likelihood or otherwise) from the data. The estimated probability of each event, p_{x} (hat), can be found by substituting λ (hat) for λ. Then an estimate of the expected value is found from E[n_{x}] (hat) = Np_{x} (hat) [where (hat) means "estimated from the data"]. The observed and expected n_{x} columns may be compared for correspondence, usually by some form of Chi-square statistic. It is also possible to calculate raw residuals for these data, that is, the difference between a single specified observation and its expected value. This is given by
X - E[X] = X - λ | (2) |
An estimated residual is found by substituting λ (hat) for λ. There are as many residuals as there are observations, in this case, 200. For example, if in 1880 in army corps A the number of deaths was two, then the residual for that observation would be 2 - λ (hat) = 2 - 0.61 = 1.39. Residuals for these data range from -0.61 to 3.39.
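The calculations behind Table I can be retraced in a few lines. The following is a minimal sketch rather than the original computation: it assumes that λ (hat) is the sample mean (the maximum likelihood estimate for a Poisson model), and the variable names are illustrative only.

```python
import math

# Observed counts n_x from Table I: number of deaths x -> number of corps-years.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
N = sum(observed.values())  # 200 corps-years

# Assumption: lambda is estimated by the sample mean (the Poisson MLE).
lam_hat = sum(x * n for x, n in observed.items()) / N

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events, as in equation (1)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Estimated expected frequencies E[n_x] (hat) = N * p_x (hat).
expected = {x: N * poisson_pmf(x, lam_hat) for x in observed}

# Raw residual X - lambda (hat) for an observation with x deaths, equation (2).
residuals = {x: x - lam_hat for x in observed}

# Pearson chi-square over the listed cells. Sparse cells are not pooled here,
# which a formal test would normally require.
chi_square = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)

for x in observed:
    print(x, observed[x], round(expected[x], 1), round(residuals[x], 2))
```

Under these assumptions the estimate is 0.61, and the expected frequencies for x = 0 to 3 agree with Table I after rounding to one decimal place.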
In order to illustrate a point which arises frequently in the discussion of fit to psychometric models, we note that there were four deaths in one army corps in a single year. On the basis of these data, the Poisson model, and the sample size of 200, it is not surprising that there was one army corps with four deaths in one year. It certainly would be concluded that the four deaths are in accord with the model and in fact the standard test of fit applied to these data results in an affirmation of fit which is so good that some statisticians have raised questions of 'over-fit'. However, deaths by horse-kick are rare events, and to have four of them in one year in one army corps might suggest an examination of that army corps. Perhaps a new captain was somehow implicated, or all four people were killed by the same horse, or some other empirical explanation may be hypothesized for why this particular corps had four deaths.
If a global test of fit is the limit of investigations, failure to understand important aspects of the data may occur; the purpose of fit is not just to make a simple 'yes' or 'no' declaration that the model and data accord, but to have a greater understanding of the way in which the data arose. This perspective has led practitioners to think in terms of 'psychological' fit in addition to statistical fit. These notions will be elaborated in the next section.
FIT AND THE RASCH MODEL
The fit-logic from a statistical point of view is now applied to the model of Rasch (1960, 1980). The probability distribution has the algebraic form
P{X; β_{v}, δ_{i}} = exp[(β_{v} - δ_{i})X]/{1 + exp[β_{v} - δ_{i}]}, X = 0,1 | (3) |
in which there are two parameters and two points in the sample space. Because there are only two outcomes, this distribution describes a Bernoulli random variable. To estimate the two parameters, it is necessary to have replications. It is impossible, however, to replicate the observations without introducing certain types of dependency conditions so that the standard statistical rules for IID [independent and identically distributed] random variables would not apply. Thus each person v generally answers more than one item to provide 'pseudoreplications'; so for person v there is a compounding of Bernoulli random variables, but because each item has a different difficulty parameter, these variables are not identically distributed.
Despite these complications a genuine probability distribution arises if, with L items, the probability of the response pattern, (X_{1}, ..., X_{L}), is derived. This probability, after suitable algebraic manipulation, is written as
P{(X_{v}); β_{v}, (δ)} = exp[Σ_{i} X_{vi}(β_{v} - δ_{i})]/Π_{i}{1 + exp[β_{v} - δ_{i}]} | (4) |
In practice, at least for an item calibration exercise, an even more complicated version of the model is dealt with, since the probability distributions are simultaneously replicated over a sample of persons. The probability distribution is then stated as if the number of items were fixed at L. This permits an interpretation whereby N (the number of persons tested) has the same connotation as it did with the horse-kick data. Hence
(5) |
with the sample space of possible events equal to the 2^{L} potential patterns of responses. The sample size is N and there are N + L parameters (questions of identifiability and independence of parameters are ignored for the present).
The data in Table II below are fictitious but serve to highlight some of the problems regarding fit. They describe the response patterns of N = 300 persons taking an L = 3 item `test'. It is assumed that the item parameters are known with δ_{1} = -2.0, δ_{2} = 0.0, δ_{3} = 2.0, and that all 300 subjects have the same ability β_{v} = -1.0.
TABLE II ILLUSTRATIVE RASCH DATA | ||
---|---|---|
(X_{v}) | N_{x} | E[N_{x}] |
(100) | 153 | 151.89 |
(000) | 58 | 56.16 |
(110) | 54 | 56.16 |
(010) | 19 | 20.79 |
(101) | 9 | 7.98 |
(111) | 3 | 2.97 |
(001) | 3 | 2.97 |
(011) | 1 | 1.08 |
Total | 300 | 300 |
It is possible to provide a residual in the form of a vector of observed responses versus predicted responses. Since
E[(X_{v})] = (E[X_{vi}]) = (p_{v1}, ..., p_{vL}) | (6) |
(X_{1}, ..., X_{L}) is compared with (p_{1}, ..., p_{L}), to obtain a vector of residuals for each observation. For example, 153 persons scored (100); that is, these 153 persons had only the easiest item correct. The residual vector for any one of these persons (all of them have the same ability of -1.0), is
(1 0 0) - (.73 .27 .05) = (.27 -.27 -.05).
Now consider the three persons with the pattern (001). These three persons had only the hardest item correct, and the residual for each of these three is
(0 0 1) - (.73 .27 .05) = (-.73 -.27 .95).
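The expected frequencies of Table II and the two residual vectors above can be retraced as follows. This is a minimal sketch that takes the rounded success probabilities (.73 .27 .05) quoted in the residual vectors as given and treats the three responses as independent; the function names are illustrative.

```python
from itertools import product

# Rounded success probabilities for items 1 to 3, as quoted in the residual vectors.
p = (0.73, 0.27, 0.05)
N = 300

# Observed pattern frequencies N_x from Table II.
observed = {
    (1, 0, 0): 153, (0, 0, 0): 58, (1, 1, 0): 54, (0, 1, 0): 19,
    (1, 0, 1): 9, (1, 1, 1): 3, (0, 0, 1): 3, (0, 1, 1): 1,
}

def pattern_probability(pattern, probs):
    """Probability of a 0/1 response pattern under independent Bernoulli responses."""
    result = 1.0
    for x, prob in zip(pattern, probs):
        result *= prob if x == 1 else 1.0 - prob
    return result

def residual_vector(pattern, probs):
    """Observed responses minus expected responses, item by item."""
    return tuple(round(x - prob, 2) for x, prob in zip(pattern, probs))

# Expected frequency of each of the 2^3 possible patterns.
expected = {pat: N * pattern_probability(pat, p) for pat in product((0, 1), repeat=3)}

for pat, n in observed.items():
    print(pat, n, round(expected[pat], 2), residual_vector(pat, p))
```

With these rounded probabilities the expected frequencies agree with the E[N_{x}] column to within about 0.02, which suggests the tabled values were computed from the rounded probabilities. Note that the pattern (001) has the same expected frequency as (111), so its statistical acceptability says nothing about whether it is psychologically plausible.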
The point illustrated in the above example has been somewhat labored in order to make a distinction between attention to the psychological (or process) model and the statistical model. Some events acceptable under the statistical model are psychologically questionable. A thorough analysis of the data would warrant an investigation of the reasons why even one person had such a peculiar answer pattern, since ultimately the question will be asked whether or not this person had been measured on the construct of interest.
TESTS OF FIT FOR THE RASCH MODEL
In the succeeding sections the various suggestions that have been made with respect to fit of data to the Rasch model are discussed. Although an exact chronological order will not be observed, some attempt will be made to demonstrate the historical development of the idea of fit. Since the first papers dealing with this topic were published there have been disagreements regarding the 'correct' degrees of freedom, the extent of bias in the fit statistics, the use of conditional (CMLE) or unconditional (JMLE) probabilities, and so on. Such debates on statistical criteria for fit are still popular in both the published literature and at major psychometric conferences.
It is the contention of this writer, however, that for the analysis of fit of most sets of real data (in which at least 12 items form a test), the arguments for one statistic versus another lose their impact. For example, van den Wollenberg (1981) claims that the Wright and Panchepakesan [WP] statistic (1969) is 'heavily at fault'. However, for data from tests of 12 or more items, van den Wollenberg's 'new' statistic, Q_{1}, which tests the same violation of the model as the Wright and Panchepakesan statistic, is indistinguishable from it. Similar features can be demonstrated among other competing statistics.
It seems more profitable, therefore, to concentrate on procedures which genuinely differ in principle from one another.
Rasch (1960, 1980) used the term 'control of the model' to describe what has been called 'fit of the model'. In his mathematical development of the model via conditional probabilities, the pertinent probability distribution is the conditional distribution of response pattern conditioned on the sufficient statistic (which is the raw score arising from the pattern). Obviously many different patterns lead to the same raw score. The probability of each of these patterns, conditional on that raw score r_{v} is given by
P{(X_{v}) | r_{v}} = exp[-Σ_{i} X_{vi}δ_{i}]/γ_{rv} | (7) |
in which the γ_{rv} are elementary symmetric functions of the item parameters, (δ). A double conditioning of the total data matrix leads to
p{((X_{vi}))|(r), (S)} = 1/C | (8) |
in which (S) is the vector of item counts, (r) is the vector of N raw scores and C is a combinatorial number (the number of different 0/1 data sets which could produce the marginals (S) and (r)). Rasch says that this probability serves as a 'basis for parameter-free controls of the model'; since no estimation of parameters is required, exact tests of fit (in the sense of R.A. Fisher), are forthcoming. Rasch himself however, was quick to realize the near impossibility of determining C and to date no-one has been able to follow up those suggestions. [With modern computer-power this could be done, but it would serve no real purpose.]
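The conditional distribution of response patterns given the raw score can be illustrated numerically. The sketch below assumes the usual form of that distribution, exp[-Σ X_{vi}δ_{i}] divided by the elementary symmetric function of order r_{v} in the quantities exp(-δ_{i}). The item difficulties are hypothetical, and the summation recursion is one common way of obtaining the γ's, not necessarily the one Rasch used.

```python
import math
from itertools import product

def elementary_symmetric(deltas):
    """Return [gamma_0, ..., gamma_L] computed from epsilon_i = exp(-delta_i)."""
    gammas = [1.0] + [0.0] * len(deltas)
    for k, delta in enumerate(deltas, start=1):
        eps = math.exp(-delta)
        # Update from high order to low so that each item enters a product once.
        for order in range(k, 0, -1):
            gammas[order] += eps * gammas[order - 1]
    return gammas

def conditional_pattern_probability(pattern, deltas):
    """Probability of a response pattern given its own raw score (no person parameter)."""
    gammas = elementary_symmetric(deltas)
    raw_score = sum(pattern)
    numerator = math.exp(-sum(x * d for x, d in zip(pattern, deltas)))
    return numerator / gammas[raw_score]

# Hypothetical item difficulties for a three-item test.
deltas = [-1.0, 0.0, 1.0]

# All patterns with raw score 1 and their conditional probabilities.
score_one = {
    pat: conditional_pattern_probability(pat, deltas)
    for pat in product((0, 1), repeat=len(deltas))
    if sum(pat) == 1
}
for pat, prob in score_one.items():
    print(pat, round(prob, 3))
```

The person parameter does not appear anywhere in the calculation, which is the sense in which the resulting controls are parameter-free with respect to persons. The probabilities within a score group sum to one.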
As a result of the practical problems with the parameter-free tests, Rasch suggested approximations based on the observed proportions S_{gi}/N_{g} of the number of persons in score group g who had item i correct, such that the ratio S_{gi}/(N_{g} - S_{gi}) could be used in score group g to estimate exp(β_{g} - δ_{i}). Rasch suggested that the proportions be used as a basis for a test of fit by looking at the G = L - 1 different estimates. He suggested plots both for groups of items (grouped by their difficulties) and for persons (grouped by their raw scores). Wright (1967) elaborated on these plots and gave added insight into their utility for determining fit. There is a very real sense in which all tests of fit to the Rasch model which have been proposed in the last 20 years are simply variations of these original suggestions.
In 1969 Wright and Panchepakesan described a more formal test of fit based on the notions of splitting people into exclusive groups. Repeated reference and use of this statistic has led to its naming as the WP statistic. Persons are split into G groups on the basis of their raw scores and the observed number in each group with each item correct, S_{gi}, is compared to the expected number, the latter arrived at via the model after all item and person parameters have been suitably estimated. This comparison between observed and expected statistics may be accumulated over items to produce a total WP statistic. It is noted that the WP procedure uses unconditional (JMLE) probabilities for determining p_{vi}.
(9) |
Formal significance testing may be carried out by noting the approximate Chi-square distribution of WP in practical testing situations. As noted earlier, van den Wollenberg's Q_{1} statistic (1981) is indistinguishable from WP when 12 or more items are involved.
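The comparison of observed and expected counts by score group can be sketched as follows. The exact expression of equation (9) is not reproduced here; the code assumes the common standardized-residual form, in which each squared difference between S_{gi} and its expectation N_{g}p_{gi} is divided by the binomial variance N_{g}p_{gi}(1 - p_{gi}). All data values and parameter estimates below are hypothetical.

```python
import math

def rasch_probability(beta, delta):
    """Unconditional probability of a correct response, as in equation (3)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def wp_statistic(observed_counts, group_sizes, group_abilities, item_difficulties):
    """Sum of squared standardized residuals over score groups and items.

    Assumed form: (S_gi - N_g * p_gi)^2 / (N_g * p_gi * (1 - p_gi)).
    Returns the total and the per-item contributions.
    """
    per_item = [0.0] * len(item_difficulties)
    for g, (n_g, beta_g) in enumerate(zip(group_sizes, group_abilities)):
        for i, delta_i in enumerate(item_difficulties):
            p = rasch_probability(beta_g, delta_i)
            expected = n_g * p
            variance = n_g * p * (1.0 - p)
            per_item[i] += (observed_counts[g][i] - expected) ** 2 / variance
    return sum(per_item), per_item

# Hypothetical example: two score groups, three items.
group_sizes = [40, 60]
group_abilities = [-0.5, 0.8]          # assumed already estimated
item_difficulties = [-1.0, 0.0, 1.0]   # assumed already estimated
observed_counts = [[27, 14, 8], [52, 41, 27]]

total, per_item = wp_statistic(
    observed_counts, group_sizes, group_abilities, item_difficulties
)
print(round(total, 3), [round(v, 3) for v in per_item])
```

Keeping the per-item contributions alongside the total is what allows the statistic to point to individual aberrant items as well as to overall misfit.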
In the early 'seventies a modification of the WP approach was proposed by two groups of people on either side of the Atlantic. In Austria, Fischer and Scheiblechner, and in the USA, Wright and colleagues, suggested that instead of predicting expected frequencies via the model, the item difficulties should actually be re-estimated in each group upon which the split had been made. The program MLTBIN of Andrich (1975) uses a median split although there is no logical barrier to a split based on G groups. The test of fit is an application of a statistical test of the homogeneity of a number of estimates of a model parameter and is described fully in Rao (1973). The statistic admits of an interpretation both for each item and for the collection of all items as a whole. The statistic H_{i} is given by
(10) |
in which the pooled estimate of the item difficulty is
(11) |
and its variance is
(12) |
H_{i}, which is distributed as Chi-square on G-1 df, may be accumulated over i to form a global test. Asymptotically this test has a similar distribution to that of WP.
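A homogeneity test of this kind can be sketched in a few lines. The code assumes the standard inverse-variance weighted form: the pooled estimate is the weighted mean of the group estimates, its variance is the reciprocal of the summed weights, and H_{i} is the weighted sum of squared deviations from the pooled estimate. The group estimates and variances shown are hypothetical.

```python
def homogeneity_statistic(estimates, variances):
    """Test whether G estimates of one item difficulty agree.

    estimates: difficulty estimates of one item, one per group.
    variances: estimated sampling variances of those estimates.
    Returns (H, pooled estimate, variance of pooled estimate).
    """
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / weight_sum
    pooled_variance = 1.0 / weight_sum
    h = sum(w * (d - pooled) ** 2 for w, d in zip(weights, estimates))
    return h, pooled, pooled_variance

# Hypothetical median split (G = 2): one item calibrated in a low and a high group.
h_i, pooled, pooled_variance = homogeneity_statistic(
    estimates=[0.40, 0.10], variances=[0.04, 0.04]
)
print(round(h_i, 3), round(pooled, 3), round(pooled_variance, 3))
```

With G = 2 the result would be referred to a Chi-square distribution on 1 df; summing the H_{i} over items gives the global version.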
A likelihood-ratio test was devised by Andersen (1973); it used as its guiding principle the logic outlined for the Fischer/Scheiblechner approach. Instead of adopting the Rao test, Andersen formed the (conditional) likelihood of the data based on the overall item estimates and also the (conditional) likelihood of the data for each subgroup. Thus
(13) |
was shown to be Chi-square, in which
(14) |
This test follows in the spirit of maximum likelihood estimation; it is noted that a similar LR test could be devised on the basis of the unconditional (JMLE) likelihoods, and some recent developments of Rost (1982, this issue) demonstrate the power and utility of such an approach. However, because there is no sensible partitioning of the Chi-square statistic, LR tests of any description give us no information about aberrant items. For this reason they provide little practical advantage. We may further note that the LR test is asymptotically equivalent to the WP statistic, so in one sense WP, Q_{1}, H and LR will all lead to similar conclusions about a given data set, whether instituted conditionally or unconditionally.
In a slightly different context, Leunbach (1976) devised various tests of the hypothesis that two mental tests measure the same variable. The tests adhere to Rasch's principles in that they arise out of a conditional argument and lead to a probability distribution of the general form
(15) |
in which (δ^{(1)}) and (δ^{(2)}) are the two sets of item estimates and for which the sufficient statistics are the marginal counts of persons, n_{r1r2}, with each combination of raw scores on the two tests (the actual person parameters having been eliminated as usual by the conditioning). Since relatively extensive data sets are required to ensure that no n_{r1r2} are zero, the tests appear to have limited practical application in their present form.
Another innovation from 1976 may be found in the dissertation of Mead; it was later resurrected by Divgi (1981). It is based on well-known principles of simple linear regression and indirectly provides an estimate of the slope of an item's characteristic curve. Some psychometricians refer to this property as the item's discrimination and actually parameterize it in their models. Working from general linear model theory, Mead postulated that a residual for person v on item i, written
(16) |
may be further explicated in terms of the linear form
y_{vi} = a_{0i} + a_{1i}b_{v} + a_{2i}b_{v}^{2} , | (17) |
where (i) a_{0i} is zero if the group of persons involved is actually the calibration group (otherwise a_{0i} acts as a `difficulty shift'),
(ii) a_{1i}, the linear coefficient, is the index of item discrimination, and
(iii) a_{2i}, the quadratic coefficient, relates the extent of `guessing' or 'indifference'.
A formal test of fit would proceed as an analysis of variance with the nullity of a_{2i} considered first, and upon acceptance of that hypothesis, the nullity of a_{1i} also investigated. Fit to the Rasch model is claimed when the latter hypothesis is accepted also. It should be realized that these tests of fit, directed as they are to quite specific hypotheses, are relatively powerful when compared with the more global tests considered previously (there are far fewer degrees of freedom to account for), but on the other hand are likely to be less powerful for detecting departures arising from factors unrelated to guessing and varying discriminations. Perhaps Mead's major contribution was his application of an identical argument to the derivation of a test of `person fit', based on the same residuals y_{vi}, in which case
y_{vi} = a_{1v}d_{i} + a_{2v}d_{i}^{2} , | (18) |
Hence the test of a person's `linear fit' is contingent upon the nullity of a_{2v}.
The term `person fit' has been coined to describe those Rasch analysis activities which focus attention on aberrant patterns of responses for individuals taking a test. Aberrant patterns are those of very small probability, even though they might be `expected' in large enough samples according to the specification of a probability model. It is one thing to note that patterns of small probability will occur; it is quite another to realize that the patterns are created by persons taking tests and that a responsibility exists to investigate these situations since it is difficult to believe that such persons have been measured on the variable. A description of person-fit in practice is to be found in Wright and Stone (1979).
Most person-fit analyses calculate the probability of each person's response pattern and flag those of very small probability. Additional information is available for diagnosis if both observed and expected patterns are displayed in the analysis. For example, it is difficult for the psychometrician to believe, and even harder to explain, the observable fact that a person whose raw score is 2 obtained that score by answering correctly the two most difficult of 30 items. It is harder still to argue that this score of 2 represents as valid a measure on the variable as does the 2 of the person whose correct items are the two easiest on the test.
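The contrast between these two persons can be made concrete. The sketch below is illustrative only: the 30 item difficulties are hypothetical (evenly spaced from -2.9 to 2.9 logits and treated as known), the ability for a raw score of 2 is found by solving Σ p_{vi} = 2 by bisection, and misfit is summarized by an unweighted mean of squared standardized residuals, which is one of several possible person-fit summaries.

```python
import math

def rasch_probability(beta, delta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def ability_for_raw_score(raw_score, deltas, lo=-10.0, hi=10.0):
    """Solve sum_i p_vi = raw_score for beta by bisection (items treated as known)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(rasch_probability(mid, d) for d in deltas) < raw_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def log_pattern_probability(pattern, beta, deltas):
    """Log of the probability of the whole response pattern."""
    total = 0.0
    for x, d in zip(pattern, deltas):
        p = rasch_probability(beta, d)
        total += math.log(p if x == 1 else 1.0 - p)
    return total

def mean_square(pattern, beta, deltas):
    """Unweighted mean over items of squared standardized residuals."""
    total = 0.0
    for x, d in zip(pattern, deltas):
        p = rasch_probability(beta, d)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(deltas)

# Hypothetical test: 30 items ordered from easiest to hardest.
deltas = [-2.9 + 0.2 * i for i in range(30)]
beta = ability_for_raw_score(2, deltas)  # same estimate for both persons

easiest_two = [1, 1] + [0] * 28  # correct on the two easiest items only
hardest_two = [0] * 28 + [1, 1]  # correct on the two hardest items only

for name, pattern in (("easiest two", easiest_two), ("hardest two", hardest_two)):
    print(name,
          round(log_pattern_probability(pattern, beta, deltas), 2),
          round(mean_square(pattern, beta, deltas), 2))
```

Both persons receive the same ability estimate because the raw score is the sufficient statistic, yet the pattern probability and the mean square separate them sharply. That difference is the information a person-fit analysis is meant to surface.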
In practice the person-fit statistics used to highlight the misfit are some variations of the probability of the response pattern; most often they involve a summation over items of the person's standardized residuals and the conversion of this sum to a Chi-square or mean-square statistic with suitable distribution properties. For example, a person fit statistic used by Andrich (1980) has the following form:
(19) | |
where (i) is the degrees of freedom, and | |
(ii) is the sum of squared residuals. |
The purpose of the logarithmic transformation is to stabilize the variance of the statistic T so that it has an approximate normal distribution.
Some interesting variations on existing fit statistics and the exposition of a new statistic have been devised by two Europeans, van den Wollenberg (1981) and Molenaar (1981). Their arguments derive from the original work of Rasch and unpublished work of Martin-Löf. In order to understand the rationale of these tests, one needs to have a grasp of the differences between conditional (CMLE) and unconditional (JMLE) Rasch analysis. When Wright and Panchepakesan devised their WP statistic, they used as the expected value of a response by person v to item i the unconditional probability
P{X_{vi}; β_{v}, δ_{i}} = p_{vi} = exp[(β_{v} - δ_{i})X]/{1 + exp[β_{v} - δ_{i}]} | (20) |
Since responses by person v to a set of L items are statistically independent, the covariances, Cov[X_{vi}, X_{vj}] are zero, and thus play no part in the tests of fit. Most variations on the basic tests of fit, however, as devised by European psychometricians, rely on the conditional probability of the response, given the raw score r_{v}. In this case the expected value has the form
π_{ri} = exp[-δ_{i}]γ_{r- 1,i}/γ_{r} , | (21) |
and does not involve person parameters as does the unconditional form shown above. (γ_{r-1,i} and γ_{r} are elementary symmetric functions of the δ's only). Although even Rasch found it difficult to write out explicitly the bivariate distribution of X_{vi} and X_{vj} (given r_{v}), it is not difficult to show that the covariance is given by
Cov[X_{vi}, X_{vj} | r] = exp[-δ_{i} - δ_{j}]γ_{r-2,ij}/γ_{r} - π_{ri}π_{rj} | (22) |
where γ_{r-2,ij} is also a symmetric function in all δ's except δ_{i} and δ_{j}. Clearly, the conditional responses are not independent and any test of fit should take this into account if the dependence is likely to play a part in the ultimate fit decisions.
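The dependence among conditional responses can be checked numerically. The following sketch computes the conditional expected values of equation (21) and a conditional covariance from elementary symmetric functions, then verifies both against a brute-force enumeration of all patterns with the given raw score. The covariance is assumed to take the form E[X_{vi}X_{vj} | r] - π_{ri}π_{rj}, with the joint term built from γ_{r-2,ij}; the item difficulties are hypothetical.

```python
import math
from itertools import product

def esf(epsilons):
    """Elementary symmetric functions gamma_0..gamma_n of the given values."""
    gammas = [1.0] + [0.0] * len(epsilons)
    for k, eps in enumerate(epsilons, start=1):
        for order in range(k, 0, -1):
            gammas[order] += eps * gammas[order - 1]
    return gammas

def gamma_at(gammas, order):
    """Gamma of the given order, taken as 0 when the order is out of range."""
    return gammas[order] if 0 <= order < len(gammas) else 0.0

def conditional_moments(deltas, r, i, j):
    """Return (pi_ri, pi_rj, covariance) for distinct items i and j, given raw score r."""
    eps = [math.exp(-d) for d in deltas]
    gamma_r = esf(eps)[r]
    g_i = esf([e for k, e in enumerate(eps) if k != i])
    g_j = esf([e for k, e in enumerate(eps) if k != j])
    g_ij = esf([e for k, e in enumerate(eps) if k not in (i, j)])
    pi_ri = eps[i] * gamma_at(g_i, r - 1) / gamma_r
    pi_rj = eps[j] * gamma_at(g_j, r - 1) / gamma_r
    pi_rij = eps[i] * eps[j] * gamma_at(g_ij, r - 2) / gamma_r
    return pi_ri, pi_rj, pi_rij - pi_ri * pi_rj

def brute_force_moments(deltas, r, i, j):
    """Same quantities by enumerating every pattern with raw score r."""
    patterns = [p for p in product((0, 1), repeat=len(deltas)) if sum(p) == r]
    weights = [math.exp(-sum(x * d for x, d in zip(p, deltas))) for p in patterns]
    total = sum(weights)
    e_i = sum(w * p[i] for p, w in zip(patterns, weights)) / total
    e_j = sum(w * p[j] for p, w in zip(patterns, weights)) / total
    e_ij = sum(w * p[i] * p[j] for p, w in zip(patterns, weights)) / total
    return e_i, e_j, e_ij - e_i * e_j

deltas = [-1.0, 0.0, 1.0]  # hypothetical item difficulties
formula = conditional_moments(deltas, r=2, i=0, j=1)
check = brute_force_moments(deltas, r=2, i=0, j=1)
print([round(v, 4) for v in formula], [round(v, 4) for v in check])
```

In this example the covariance is negative: once the raw score is fixed, a correct response to one item leaves fewer correct responses available for the others.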
Martin-Löf provided a formal test of fit which incorporates the covariances. His statistic, in matrix notation, is
(23) |
where δ'_{g} (hat) is the transpose of the g x 1 vector of difficulty estimates in group g, and V_{g}^{- 1} is the inverse of the covariance matrix of these estimates. Van den Wollenberg has also demonstrated that when all item estimates are considered equal (an equivalent items test), T approximates the WP statistic and is in fact algebraically identical to van den Wollenberg's conditional version of WP, the Q_{1} statistic. The most recent effort of van den Wollenberg and Molenaar (1981) has been to effect a compromise between the excessive computations of T and the approximate nature of Q_{1}. The new statistic, Q_{2}, builds upon 'second-order' frequencies and appears to be quite powerful as a test of dimensionality. For the group with score r, observed 2 x 2 tables are constructed as follows:
 | ITEM j correct | ITEM j incorrect | Total |
---|---|---|---|
ITEM i correct | S_{rij} | S_{ri~j} | S_{ri} |
ITEM i incorrect | S_{r~ij} | S_{r~i~j} | S_{r~i} |
Total | S_{rj} | S_{r~j} | n_{r} |
where S_{ri~j}, for example, means the number of people with score r who have item i correct and item j incorrect. These observed tables are to be compared with expected 2 x 2 tables in which the entries are obtained from
(24) |
and in which item estimates have to be obtained from each score group r. The statistic Q_{2}, summed over all score groups and all item combinations has an approximate Chi-square distribution; little evidence is available concerning its practicability with respect to real data.
A recent contribution of Molenaar (1981) has been the introduction of what he terms `splitter' items to test unidimensionality. The sample is split into two subsets, G_{i}^{+} and G_{i}^{-}, of those who answered item i correctly and those who answered it incorrectly. After separate calibrations (in each group), of the remaining items, evidence of multidimensionality would be forthcoming when the items easy for G_{i}^{+} and hard for G_{i}^{-} form one dimension and the reverse set the other dimension. WP (or Q_{1}) would be determined for the two groups and a formal test of fit applied as often as liked to select different `splitters'. It would be informative as well to plot item estimates for G_{i}^{+} and G_{i}^{-}.
CONCLUSION
The most valuable contribution to the area of tests of fit for Rasch models in recent years has been the recognition by some psychometricians that there is no such thing as a final `fit' of data to the model and hence that no one test is ever likely to be complete. Appreciation of this point still needs to be given much wider circulation among workers in the field. Then there will be less of a tendency to reject data sets (or the model) outright, simply because one test failed to show `fit'. Implicit in this perspective is the assumption that there is as much to be learnt about a data set from the responses which misfit as there is from those which do fit.
Issues in the Fit of Data to Psychometric Models, Graham Douglas
Education Research and Perspectives, 9:1, 1982, 32-43.
Reproduced with permission of The Editors, The Graduate School of Education, The University of Western Australia. (Clive Whitehead, Oct. 29, 2002)
REFERENCES
Andersen, E. B. A goodness of fit test for the Rasch model. Psychometrika, 1973, 38, 123-40.
Andrich, D. The Rasch Multiplicative Binomial Model: Applications to Attitude Data, Research Report Number 1, Measurement and Statistics Laboratory, Department of Education, University of Western Australia, 1975.
Bortkiewicz, L. V. Das Gesetz der kleinen Zahlen. Leipzig, Teubner, 1898.
Divgi, D. Does the Rasch model really work? Paper presented at Annual Meeting of the National Council on Measurement in Education, Los Angeles, 1981.
Leunbach, G. A probabilistic measurement model for assessing whether two tests measure the same personal factor. Unpublished paper, 1976.
Lord, F. M. Applications of Item Response Theory to Practical Testing Problems. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1980.
Mead, R. Assessment of Fit of Data to the Rasch Model Through Analysis of Residuals. Unpublished Doctoral Dissertation, University of Chicago, 1976.
Molenaar, I. Some Improved Diagnostics for Failure of the Rasch Model. Heymans Bulletins Psychologische Instituten. R. J. Groningen, HB-80-482-EX, 1981.
Rao, C. R. Linear Statistical Inference and its Applications. (2nd ed.) John Wiley & Sons, N.Y., 1973.
Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. (Copenhagen, Danish Institute for Educational Research, 1960), Chicago, University of Chicago Press, 1980.
van den Wollenberg, A. On the Wright-Panchepakesan goodness of fit test for the Rasch model. (In press), 1981. [Probably published in van den Wollenberg's 1982 papers.]
Wright, B. D. Sample-free test calibration and person measurement. Proceedings of the 1967 Invitational Conference on Testing Problems. Princeton, N.J.: E.T.S., 1967.
Wright, B. D. & N. Panchepakesan. A procedure for sample-free item analysis. Educational and Psychological Measurement, 1969, 29, 23-57.
Wright, B. D. and M. H. Stone. Best Test Design. MESA Press, Chicago, 1979.