Debate about Item Response Theory (IRT) sometimes verges on the nonsensical, and certainly on the irascible, because protagonists are using the term in very different senses. Scanning the psychometric literature reveals at least three tentative definitions:
(1) IRT encompasses any model "relating the probability of an examinee's response to a test item to an underlying ability" (HMIRT, p. v). This definition is so broad that it includes everything from Classical Test Theory (CTT) to non-parametric Mokken scaling (see RMT 13:2 p. 690), including Rasch models.
or (2) IRT encompasses any mathematical model which attempts to predict observations from locations on a latent variable. This is also called "Latent Trait Theory". It includes logistic models of all types, normal ogive models, log-log models, Rasch models, Samejima's Graded Response model, etc.
or (3) IRT centers on the particular models advocated by Frederic M. Lord, particularly 2-PL and 3-PL, but also 1-PL, Normal Ogive and, recently and by extension, Generalized Partial Credit models. Samejima's Graded Response model is sometimes also included. This definition excludes Rasch models. Though Lord admitted the utility of Rasch models under special circumstances (Lord, 1983), in general he did not advocate them.
What is disconcerting is that deconstruction is needed in order to determine the IRT definition intended by an author and precisely what models lie within that definition.
A Taxonomic Adventure
Consider the typical, apparently straightforward, statement that
"IRT item parameters are not dependent on the sample used to generate the parameters, and are assumed to be invariant (within a linear transformation) across divergent groups within a research population and across populations" (Reeve, 2002).
This is not true of CTT, because item p-values are highly sample-dependent, and no linear transformation can reconcile p-values obtained from samples of different ability. So this statement does not apply, in general, to (1) above.
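To see why p-values resist linear comparison across samples, here is a minimal sketch. It assumes, purely for illustration, that each sample is concentrated at a single ability level (0 and 1 logits) and that responses follow the Rasch dichotomous model; the item difficulties are hypothetical. Moving from one sample to the other shifts the expected p-values by different amounts for different items, while their log-odds shift by the same constant.

```python
import math

def p_correct(ability, difficulty):
    """Rasch / 1-PL probability of success, locations in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# Hypothetical item difficulties (logits).
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Simplifying assumption: each sample sits at a single ability level.
p_sample_a = [p_correct(0.0, d) for d in difficulties]
p_sample_b = [p_correct(1.0, d) for d in difficulties]

# Raw p-value differences vary from item to item ...
p_shifts = [pb - pa for pa, pb in zip(p_sample_a, p_sample_b)]
# ... but log-odds differences are the same constant for every item.
logit_shifts = [logit(pb) - logit(pa) for pa, pb in zip(p_sample_a, p_sample_b)]

for d, dp, dl in zip(difficulties, p_shifts, logit_shifts):
    print(f"difficulty {d:+.1f}: p-value shift {dp:.3f}, logit shift {dl:.3f}")
```

With real samples spread over many ability levels the p-values are averages over persons, but the same non-linearity remains.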
Consider (3). Is this true of conventional 1-PL? Not exactly. Under 1-PL, the person sample mean is set at 0. Consequently, the item parameter estimates change depending on the person sample's ability distribution. But perhaps this is what is intended by "linear transformation". So let us grant this.
Note: 1-PL is an approximation to the normal ogive model, expressed in logit terms with fixed discrimination and no guessing. Usually the person mean is set at 0, i.e., it is norm-referenced.
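The sense in which a logit-metric model approximates the normal ogive can be checked numerically. A minimal sketch, assuming the conventional scaling constant D = 1.702 (a standard choice in the IRT literature, not one stated in this note): a logistic curve with its slope multiplied by D stays within 0.01 of the cumulative normal everywhere, while the unscaled logistic curve does not. Equivalently, one probit is roughly 1.7 logits.

```python
import math

D = 1.702  # assumed conventional logistic-to-normal-ogive scaling constant

def normal_ogive(x):
    """Standard normal CDF: the normal ogive, in probit units."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x, scale=1.0):
    """Logistic curve; scale=D stretches logits to mimic probits."""
    return 1.0 / (1.0 + math.exp(-scale * x))

grid = [i / 100.0 for i in range(-600, 601)]  # -6.00 to +6.00 probits

gap_scaled = max(abs(normal_ogive(x) - logistic(x, D)) for x in grid)
gap_unscaled = max(abs(normal_ogive(x) - logistic(x)) for x in grid)
print(f"max gap with scale {D}: {gap_scaled:.4f}")
print(f"max gap with scale 1: {gap_unscaled:.4f}")
```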
The Rasch dichotomous model is derived from measurement axioms. It has nothing to do with the normal ogive model. The item mean is set at 0, i.e., it is criterion-referenced. By coincidence, 1-PL and the Rasch dichotomous model are, in principle, algebraically equivalent.
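The algebraic equivalence, and the practical difference between norm-referencing and criterion-referencing, can be shown in a few lines. A minimal sketch with hypothetical person and item locations, writing ability as B and difficulty as D: re-centering the same locations either on the person mean (the usual 1-PL convention) or on the item mean (the usual Rasch convention) shifts every location by one additive constant and leaves every predicted probability unchanged.

```python
import math
from statistics import mean

def p_success(ability, difficulty):
    """Dichotomous Rasch / 1-PL probability: exp(B - D) / (1 + exp(B - D))."""
    return math.exp(ability - difficulty) / (1.0 + math.exp(ability - difficulty))

def recenter(abilities, difficulties, origin):
    """Shift every location so that `origin` becomes the zero point of the scale."""
    return [x - origin for x in abilities], [d - origin for d in difficulties]

# Hypothetical locations on an arbitrary logit scale.
abilities = [0.5, 1.2, 2.0, 3.1]
difficulties = [0.0, 1.0, 1.5, 2.5]

# Norm-referenced frame (person mean = 0) versus criterion-referenced frame (item mean = 0).
norm_b, norm_d = recenter(abilities, difficulties, mean(abilities))
crit_b, crit_d = recenter(abilities, difficulties, mean(difficulties))

# Item locations in the two frames differ by one additive constant ...
offset = mean(abilities) - mean(difficulties)
shifts = [c - n for n, c in zip(norm_d, crit_d)]
print("offset:", round(offset, 6), "item shifts:", [round(s, 6) for s in shifts])

# ... while every person-item probability is identical in both frames.
same_probabilities = all(
    math.isclose(p_success(nb, nd), p_success(cb, cd))
    for nb, cb in zip(norm_b, crit_b)
    for nd, cd in zip(norm_d, crit_d)
)
print("identical probabilities:", same_probabilities)
```

Which origin is chosen is a matter of convention; what differs is whether the reported item locations depend on the person sample.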
Is this true of 3-PL? We encounter another nicety. What does "IRT item parameters" mean? Does it mean their true values as expressed in the mathematical model, or their estimated values from data? If it means their true values, then the statement is a tautology because the true values are, by definition, not dependent on any sample. So does it mean "parameter estimates"? This takes us into another level of complexity.
"... not dependent on the sample used ..." implies that any reasonable sample of the same kind of subjects with any mean, variance, skewness, kurtosis, modality, discreteness, etc. yields statistically equivalent parameter estimates. The description that Lord (1980, p. 180) gives of his estimation procedure implies this is true. But that procedure cannot work. It diverges. Constraints must be placed on the sample distribution and on other parameter values. These constraints compress and expand the latent variable so that it loses its intended linear form and becomes a local description. But perhaps this is what the statement means by "assumed".
If analysts and decision-makers are prepared to assume that the constraints on sample distribution, etc., will always match their empirical data, then they are justified in assuming that their parameter estimates will not depend on the sample used. But their assumptions will always be insecure. Once the assumption, or rather assertion, that the sample has any particular distribution is imposed on the estimation process, that process will yield a sample distribution that matches the assumption. The estimation process becomes a self-fulfilling prophecy. Surely this is not what the statement intends.
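A minimal sketch of how an imposed distribution steers the results, assuming expected a posteriori (EAP) scoring with a normal prior (one common way such an assumption enters estimation, not necessarily the procedure any particular author used), a hypothetical five-item Rasch test with every item at 0 logits, and simple grid quadrature: the same response string yields a different ability estimate depending only on the population mean the analyst assumes.

```python
import math

def rasch_p(theta, difficulty):
    """Rasch probability of success."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def eap(responses, difficulties, prior_mean=0.0, prior_sd=1.0):
    """Expected a posteriori ability under an assumed normal population (grid quadrature)."""
    grid = [i / 100.0 for i in range(-600, 601)]  # -6 to +6 logits
    weights = []
    for theta in grid:
        # Unnormalized normal prior: the assumed sample distribution.
        prior = math.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
        likelihood = 1.0
        for x, d in zip(responses, difficulties):
            p = rasch_p(theta, d)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    return sum(theta * w for theta, w in zip(grid, weights)) / total

# Hypothetical five-item test, every item at 0 logits.
difficulties = [0.0] * 5
responses = [1, 1, 1, 0, 0]  # identical data in both runs

for assumed_mean in (0.0, 1.0):
    estimate = eap(responses, difficulties, prior_mean=assumed_mean)
    print(f"assumed population mean {assumed_mean:+.1f}: EAP estimate {estimate:+.3f}")
```

Raising the assumed mean raises the estimate for identical data: the reported location follows the assumption as well as the responses.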
We now see that a statement intended to characterize all IRT models in fact characterizes only a limited set, and not even all of the models in (3) that Fred Lord advocated.
Wright's Bifurcation
Consider (2) above. Ben Wright (e.g., 1984) divides it into two sub-classes according to the axiomatic basis of the psychometric models.
In one sub-class are what Wright labels "IRT models." These accord with Lord (1980, p. 14), where he writes, "The reader may ask for a priori justification of [3-PL]. No convincing a priori justification exists .... The model must be justified on the basis of the results obtained, not on a priori grounds." Here is Martha Stocking's summary of Lord's statistical methodology: "Building statistical models is just like this. You take a real situation with real data, messy as this is, and build a model that works to explain the behavior of real data." (New York Times, 2-10-2000). In other words, if the model doesn't fit a particular data set, change the model!
In the other sub-class are what Wright labels "fundamental measurement models," based on measurement axioms. These include Rasch models. Such models embody mathematical ideals (analogous to parallel lines and Pythagorean triangles) that can never be realized empirically. They can only be approximated. But a good approximation is all that is required for utility. Accordingly, if the data don't approximate the desired model, the data are not immediately useful for measurement, and so must be changed or replaced.
The Reeve (2002) statement applies most exactly to this "fundamental measurement" sub-class of definition (2), and not, in general, to (1), (3) or the sub-class of (2) that Wright labels "IRT".
Those new to IRT have good reason to be confused.
Now it's your turn ....
Here is another statement: "Rasch scaling transforms the ordinal items to the logit scale and, thus, to interval-level measurement. It should be noted that this metric is characteristic of all IRT models, not just the Rasch model" (Cook et al., 2003). Please deconstruct this statement. How does it relate to the three tentative definitions of IRT? Is it robust against Wright's bifurcation?
[Much later addition: What is "this metric" in the Cook et al. statement? If "this metric" is the "logit scale", then the statement does not apply to classical test theory (raw-score metric), nor to the normal ogive (probit metric) or Mokken (no metric) models. Nor does it even apply to Lord's IRT models; these are usually expressed in an approximate probit metric. So "all IRT models" has reduced back to Rasch models! On the other hand, if "this metric" means "interval-level measurement", it can be demonstrated that only Rasch models have the measurement property that "one more unit" means the same amount extra anywhere on the latent variable. For other models, this may be assumed, but cannot be demonstrated. In fact, it can usually be falsified.]
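The "one more unit" property can be checked numerically for a single item. A minimal sketch with a hypothetical item (b = 0, a = 1.5, c = 0.2) and two hypothetical pairs of persons, each pair one unit apart: under the Rasch model the log-odds gap is exactly one logit in both regions of the variable, whereas under the 3-PL the gap depends on where the pair sits. This illustrates the claim for one model and one item; it is not a general proof.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def p_rasch(theta, b):
    """Rasch probability of success."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_3pl(theta, b, a, c):
    """3-PL: lower asymptote c (guessing), discrimination a, difficulty b."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item and two pairs of persons, each pair one unit apart.
b, a, c = 0.0, 1.5, 0.2
pairs = {"low region": (-2.0, -1.0), "high region": (1.0, 2.0)}

gaps = {}
for region, (lo, hi) in pairs.items():
    rasch_gap = logit(p_rasch(hi, b)) - logit(p_rasch(lo, b))
    threepl_gap = logit(p_3pl(hi, b, a, c)) - logit(p_3pl(lo, b, a, c))
    gaps[region] = (rasch_gap, threepl_gap)
    print(f"{region}: Rasch gap {rasch_gap:.3f}, 3-PL gap {threepl_gap:.3f}")
```

Under the 3-PL the guessing floor compresses differences at the low end, so one unit there is worth less, in log-odds, than one unit at the high end.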
John Michael Linacre
Cook K.C., Monahan P.O., McHorney C.A. (2003) Delicate balance between theory and practice: Health status assessment and Item Response Theory. Medical Care 41:5, 571-4.
HMIRT: Van der Linden W.J., Hambleton R.K. (Eds.) (1997) Handbook of Modern Item Response Theory. New York: Springer-Verlag.
Lord F.M. (1980) Applications of Item Response Theory to Practical Testing Problems. Hillsdale NJ: Erlbaum.
Lord F.M. (1983) Small n Justifies the Rasch Model. In: Weiss D.J. (Ed.) New Horizons in Testing. New York: Academic Press, 51-61.
Reeve, B.B. (2002) An Introduction to Modern Measurement Theory. National Cancer Inst.
Wright B.D. (1984) Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288.
What is Item Response Theory, IRT? A tentative taxonomy. J.M. Linacre … 17:2 p. 926-927
The URL of this page is www.rasch.org/rmt/rmt172g.htm