Nandakumar and Stout (1993, p.64) write "Stout's [DIMTEST] procedure seems very promising for assessing the dimensionality underlying a set of items. It is an outgrowth of the conceptual definition of essential unidimensionality and was developed to be sensitive to dominant dimensions and insensitive to transient or minor dimensions. The procedure is nonparametric (thus avoiding parametric model-data problems), supported by asymptotic theory, and is computationally simplistic."
DIMTEST is a creative and mathematically ambitious attempt to assess unidimensionality. Nevertheless, despite its award-winning status, DIMTEST is severely flawed.
The indispensable step in DIMTEST is the partitioning of the test items into three subtests: AT1, a "unidimensional" subtest; AT2, a subtest with the same difficulty distribution as AT1 (in DIMTEST 2, AT2 is simulated); and PT, a subtest used for partitioning persons by ability. Then some simple (for a computer) calculations produce Stout's T statistic, which assesses departure from essential unidimensionality. But hazards and defects abound!
1. All person response strings with missing data must be omitted. All non-dichotomous items must be dropped. This is because items are to be characterized by dichotomous p-values and persons by total test raw scores. Obviously DIMTEST is intended for conventional MCQ tests. In other situations, large proportions of the data will be lost. Adaptive testing and rated performances are excluded.
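The data loss described in point 1 can be illustrated with a minimal Python sketch. It is not DIMTEST's own code: the function names and the tiny data set are hypothetical, and missing responses are assumed to be coded as None.

```python
# Illustrative sketch only: hypothetical helper names, made-up data.
# Missing responses are assumed to be coded as None.

def complete_cases(responses):
    """Keep only response strings that are entirely 0/1 with nothing missing."""
    return [list(r) for r in responses if all(x in (0, 1) for x in r)]

def item_p_values(data):
    """Proportion of correct responses on each item."""
    n = len(data)
    return [sum(row[i] for row in data) / n for i in range(len(data[0]))]

def raw_scores(data):
    """Total number of correct responses for each person."""
    return [sum(row) for row in data]

responses = [
    [1, 1, 0, 1],
    [1, None, 0, 0],  # one missing response: whole string is dropped
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, None, 1],  # one missing response: whole string is dropped
]

kept = complete_cases(responses)
proportion_lost = 1 - len(kept) / len(responses)
p_values = item_p_values(kept)
scores = raw_scores(kept)
```

In this made-up example two missing responses out of twenty cost two of the five persons, which is how a modest rate of missing data turns into a large loss of response strings.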
2. An item difficulty continuum must be constructed in order to get started. Then subset AT1 is specified to reasonably cover the difficulty continuum. But how is difficulty to be identified? DIMTEST requires that item characteristic curves be monotone, i.e., that any increase in person ability is always accompanied by an increase in the probability of a correct response on any item. The form of the monotonic function (logistic, normal ogival, with/without guessing, etc.) is not specified. This means that for given data there is no unique hierarchy of item difficulty (see example, RMT 6(1) p. 199 Figure 5). DIMTEST actually uses item p-values to define the continuum. This is equivalent to requiring monotonicity, on average, of person characteristic curves, i.e., that stochastic Guttman ordering prevail. This, in turn, implies that the data must approximate the Rasch model (RMT 6(3) p. 232)!
3. A thought-to-be "unidimensional" subset, AT1, must be chosen. AT1 is to contain up to ¼ of the items (but at least 4) chosen by "expert opinion" or analytical technique (e.g., factor analysis) to have the same dominant trait, i.e., to be unidimensional. In addition, this "dominant" trait is to be as different as possible from any other traits there may be in the test. This implies that the analyst can identify and contrast all the dimensions in a test a priori - in which case what could be the motive for a test of dimensionality? Finally, AT1 must otherwise contain the same types and amounts of response variance as the other items (another impossible requirement), and also provide reasonable coverage of the difficulty continuum. Since AT1 becomes the criterion for determining unidimensionality, a poor choice of items for AT1 makes DIMTEST results undetectably meaningless! In general, the analyst cannot know from DIMTEST results whether the choice is good or poor.
Rasch analysis is almost always more effective than factor analysis for identifying the dominant trait (Smith & Miao, 1994), and Rasch item maps certainly provide a useful guide to expert opinion.
4. Linear arithmetic is perpetrated on non-linear p-values during the selection of AT2. AT2 is selected after AT1 and must have the same number of items and the same item difficulty distribution as AT1, but contain the same dimensional structure as the PT items. This assumes that the raw score metric is linear enough so that the AT1 and AT2 distributions can be compared with the usual summary statistics. AT2 must also be as noisy and multidimensional as the remaining items (of whatever quality they happen to be)?! Here too, a poor selection of items for AT2 makes DIMTEST results undetectably meaningless! [In DIMTEST 2, AT2 is simulated. The claim is "Research has shown this to result in a more powerful hypothesis-testing statistic" (www.assess.com). It would be a considerable accomplishment to construct a simulated AT2 that matches the noise and multidimensionality of PT.]
Since Rasch is more realistic about the metric, and better reports noise distribution, the best way to select AT2 is to use Rasch difficulties and fit mean-squares, not p-values and point-biserials or factor loadings, as the selection basis.
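The non-linearity of p-values in point 4 can be seen with a small Python sketch. It uses the log-odds of the p-value as a rough stand-in for a linear metric; this is an assumption made for illustration, not a Rasch estimate, and the p-values are invented.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Two pairs of items, each pair 0.10 apart on the p-value scale (invented values).
mid_pair = (0.50, 0.60)
extreme_pair = (0.85, 0.95)

gap_mid = logit(mid_pair[1]) - logit(mid_pair[0])
gap_extreme = logit(extreme_pair[1]) - logit(extreme_pair[0])

print(f"p-value gap 0.10 near the middle:  {gap_mid:.2f} logits")
print(f"p-value gap 0.10 near the extreme: {gap_extreme:.2f} logits")
```

The same 0.10 difference in p-value corresponds to roughly 0.41 logits near the middle of the range and roughly 1.21 logits near the top, so means and standard deviations of p-values for AT1 and AT2 can agree while the two difficulty distributions differ on a linear scale.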
5. Consistency of person performance (i.e., person fit) across items is required. The PT subtest contains all remaining items, exhibiting all types of response variance, including multidimensionality, when present. The persons are stratified by raw score on PT into subgroups. The raw score on items of intended heterogeneity is treated as a good enough (i.e., sufficient) statistic for subgrouping persons of similar ability on the trait. Since the subgroupings on PT are to be carried back to AT1 and AT2, it follows that these persons must perform at the same level on all three subtests. But this performance consistency is not verified. Thus, for example, response effects at the end of a speeded test would skew DIMTEST results, depending on how the last few items are allocated to AT1, AT2 and PT. Finally, subgroups with fewer than 20 persons are dropped, so that DIMTEST can even drop complete, non-extreme response vectors.
Rasch-analyze the PT data set and drop inconsistent persons before stratifying by raw score.
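The stratification step in point 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not DIMTEST's implementation: the function name, the item indices and the data are hypothetical, and only the rule that subgroups of fewer than 20 persons are dropped comes from the description above.

```python
from collections import defaultdict

def stratify_by_pt_score(data, pt_items, min_size=20):
    """Group persons by raw score on the PT items; drop small subgroups.

    Returns (kept, dropped): kept maps each PT raw score to a list of person
    indices; dropped lists the persons whose subgroup had fewer than
    min_size members.
    """
    groups = defaultdict(list)
    for person, row in enumerate(data):
        groups[sum(row[i] for i in pt_items)].append(person)
    kept = {s: g for s, g in groups.items() if len(g) >= min_size}
    dropped = sorted(p for g in groups.values() if len(g) < min_size for p in g)
    return kept, dropped

# Hypothetical 5-item test: items 0-1 belong to AT1/AT2, items 2-4 form PT.
data = (
    [[1, 0, 1, 1, 0]] * 25    # PT raw score 2
    + [[0, 1, 1, 0, 0]] * 20  # PT raw score 1
    + [[1, 1, 1, 1, 1]] * 3   # PT raw score 3: only 3 persons
)

kept, dropped = stratify_by_pt_score(data, pt_items=[2, 3, 4])
print(sorted(kept), dropped)
```

Here the three persons with a PT raw score of 3 are discarded even though their response strings are complete, simply because their raw score was infrequently observed.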
6. Essential unidimensionality is not strictly unidimensional. DIMTEST's model raw score variance for each raw score subgroup is the sum of the binomial variances of the p-values, p_i(1 - p_i), regardless of the supposed form of the monotonic ICCs. Essential unidimensionality then requires that the unmodelled noise level across subgroups on AT1, the purportedly "unidimensional" item subset, be statistically the same as on AT2, the purportedly "typical" item subset. This is a much more relaxed requirement than that of local unidimensionality, which requires that all item covariances be statistically zero. Essential unidimensionality is thus in pretension more accommodating to multidimensional data than is the Rasch model specification.
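The comparison of modelled and observed variance in point 6 can be sketched in Python. This is not Stout's T statistic, which involves standardization and further corrections not shown here; it only illustrates, for one hypothetical subgroup and invented responses, how the sum of binomial variances p_i(1 - p_i) relates to the observed subscore variance.

```python
def binomial_model_variance(rows):
    """Sum over items of p_i(1 - p_i), with p_i computed within the subgroup."""
    n = len(rows)
    ps = [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]
    return sum(p * (1 - p) for p in ps)

def observed_score_variance(rows):
    """Population variance (divisor n) of the subgroup's raw subscores."""
    scores = [sum(r) for r in rows]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

# One hypothetical subgroup answering two assessment items (invented data).
independent = [[1, 1], [1, 0], [0, 1], [0, 0]]  # item covariance is zero
dependent = [[1, 1], [1, 1], [0, 0], [0, 0]]    # items rise and fall together

excess_independent = (observed_score_variance(independent)
                      - binomial_model_variance(independent))
excess_dependent = (observed_score_variance(dependent)
                    - binomial_model_variance(dependent))
print(excess_independent, excess_dependent)
```

With the population divisor, the excess of observed over modelled variance is exactly twice the sum of the item covariances within the subgroup. It is zero in the first case and 0.5 in the second. Requiring only that this excess be similar on AT1 and AT2 is therefore weaker than requiring every item covariance to be zero.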
DIMTEST is easy-going on theory. It is based on a relaxed form of unidimensionality, defined in terms of a criterion subtest (AT1) of researcher-contrived "unidimensionality". DIMTEST deliberately overlooks bad items and misperforming persons. The resulting statistic has no clear meaning. But DIMTEST is demanding on data collection, rejecting person response strings containing missing data and also raw scores infrequently observed. A variable pronounced unidimensional by Rasch will always be essentially unidimensional by DIMTEST. A variable declared essentially unidimensional by DIMTEST may be far from unidimensional by Rasch criteria. Since essential unidimensionality is easier to obtain (and manipulate by the arbitrary choice of items allowed in AT1 and AT2) than strict unidimensionality, expect Stout's T to become a test constructor's statistic of choice!
Nandakumar R., Stout W. (1993) Refinements of Stout's Procedure for Assessing Latent Trait Unidimensionality. Journal of Educational Statistics, 18(1), 41-68.
Smith R.M., Miao C.Y. (1994) Assessing unidimensionality for Rasch measurement. Chapter 18 in M. Wilson (ed.) Objective Measurement: Theory into Practice, Vol. 2. Norwood, NJ: Ablex.
DIMTEST diminuendo. Linacre JM. Rasch Measurement Transactions, 1994, 8:3 p.384
The URL of this page is www.rasch.org/rmt/rmt83n.htm