Editor's note: There are indications that the random-number generator malfunctioned during these simulations. Please do not rely on these findings without verifying them for your own situation. See also Rasch First or Factor First?
Introduction
The Rasch measurement model is a unidimensional measurement model, and this attribute has been the subject of much discussion in the Transactions (Stahl J 1991; Wright BD 1994; Linacre JM 1994; Fisher WP 2005). In an early article Wright and Linacre tell us that "whether a particular set of data can be used to initiate or to continue a unidimensional measuring system is an empirical question" (Wright BD, Linacre JM 1989). The only way it can be addressed, they argue, is to
1) "analyze the relevant data according to a unidimensional measurement model,
2) find out how well and in what parts these data do conform to our intentions to measure and,
3) study carefully those parts of the data which do not conform, and hence cannot be used for measuring, to see if we can learn from them how to improve our observations and so better achieve our intentions" (MESA Memo 44, reprinted from Wright BD, Linacre JM 1989).
Smith used simulation to investigate whether factor analysis or Rasch analysis is better at discovering dimensionality (Smith RM, 1996). A review of these findings in RMT (9:4, 1996) argues that the conclusions are simple: "When the data are dominated equally by uncorrelated factors, use factor analysis. When they are dominated by highly correlated factors, use Rasch. If one factor dominates, use Rasch."
Table 1. Details of Datasets
Dataset | Structure | Contents |
---|---|---|
1 | Unidimensional | 20 items. |
2 | Two orthogonal dimensions (r<.05) | 10 items in each dimension. Items generated in difficulty order (1=easiest, 20=hardest). Interlaced items with item 1 assigned to dimension 1, item 2 assigned to dimension 2 … to ensure equal difficulty for each dimension. |
3 | Two orthogonal dimensions (r<.05) | 10 items in each dimension. Items generated in difficulty order (1=easiest, 20=hardest). Dimensions stacked with easiest items 1-10 in Dimension 1, and hardest items 11-20 in Dimension 2. |
4 | Two orthogonal dimensions (r<.05) | 16 items in Dimension 1 and 4 items in Dimension 2 (items 5, 10, 15, 20). Items generated in difficulty order (1=easiest, 20=hardest). |
5 | Two correlated dimensions (r=.70) | 10 items in each dimension. Items generated in difficulty order (1=easiest, 20=hardest). Interlaced items with item 1 assigned to dimension 1, item 2 assigned to dimension 2 … to ensure equal difficulty for each dimension. |
6 | Two correlated dimensions (r=.70) | 10 items in each dimension. Items generated in difficulty order (1=easiest, 20=hardest). Dimensions stacked with easiest items 1-10 in Dimension 1, and hardest items 11-20 in Dimension 2. |
In Rasch analysis, the understanding and detection of unidimensionality in the context of medical and psychological studies has developed and changed much in the past 15 years. Early published articles subscribed to the notion that fit to the model supported the unidimensionality of the scale, and little else was done to confirm that assumption (Tennant A, et al. 1996). In the 1990s Wright put forward a Unidimensionality Index (Wright BD, 1994), and gradually greater emphasis was placed on analysis of the residuals, particularly a Principal Component Analysis (PCA) of the residuals to detect second factors after the 'Rasch Factor' was removed. Originally this was difficult to interpret because the proportion of variance attributable to the first residual factor was reported, but the total variation in the data was unknown. Subsequently Winsteps (Linacre JM, 2006) has incorporated the total variation into its reporting, so the magnitude of the first residual factor against the Rasch factor can be determined. In 2002 Smith reported an independent t-test approach to testing for unidimensionality (Smith EV, 2002), which is being incorporated into the latest RUMM2020 software (Andrich D, Lyne A, Sheridan B, Luo G, 2003). Elsewhere, others have used classical factor analytical approaches to test for unidimensionality prior to fitting data to the Rasch model (Bjorner JB, Kosinski M, Ware JE Jr, 2003).
A review of the literature suggests that there are three main approaches to assessing dimensionality:
a) prior testing using classical approaches, such as factor analysis;
b) those which hold to the assumption of fit equals unidimensionality - a fit only approach;
c) those which involve post-hoc testing, having undertaken the Rasch analysis and supposing fit to the Rasch model (e.g., PCA of the residuals).
Thus it is possible to conceive of a broad selection of tests which may be undertaken for any given data set. For the everyday user of Rasch software working in the health and social sciences, how can they be sure that they are truly dealing with a unidimensional construct? How far do these various tests detect multidimensionality in the data?
Methods
The aim of the present study is to contrast commonly used techniques from each of the three main approaches identified above by applying them to a set of simulated datasets with known dimensionality characteristics. Each data set is based upon 20 polytomous items with 5 response options (0-4) and 400 cases. Details of the datasets are outlined in Table 1. A series of analyses was conducted on each of the 6 data files to assess dimensionality (Table 2). SPSS Version 14.0 was used to conduct factor analysis, and both Winsteps and RUMM2020 were used to conduct Rasch analysis. The data were simulated using SIMMsDepend (Marais I, 2006).
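To make the dataset design concrete, the following is a rough sketch of how data with this shape (400 cases, 20 items scored 0-4, two dimensions correlated at 0.7, items interlaced) might be generated under a partial credit model. It is not a description of SIMMsDepend: the function name, the item locations, the threshold spacing, and the normal ability distribution are all assumptions made for illustration.

```python
import numpy as np

def simulate_two_dimensions(n_cases=400, n_items=20, n_cats=5, rho=0.7, seed=0):
    """Simulate polytomous responses from two correlated abilities (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Two person abilities with correlation rho (assumed bivariate normal).
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_cases)
    # Item locations in difficulty order; the logit range is hypothetical.
    locations = np.linspace(-2.0, 2.0, n_items)
    # Hypothetical category thresholds, centred on each item location.
    steps = np.linspace(-1.5, 1.5, n_cats - 1)
    # Interlaced assignment: item 1 -> dimension 1, item 2 -> dimension 2, ...
    dims = np.arange(n_items) % 2
    data = np.empty((n_cases, n_items), dtype=int)
    for i in range(n_items):
        # Partial credit model: category logits are cumulative sums of
        # (ability - threshold), with category 0 fixed at zero.
        diffs = theta[:, dims[i], None] - (locations[i] + steps)[None, :]
        logits = np.concatenate(
            [np.zeros((n_cases, 1)), np.cumsum(diffs, axis=1)], axis=1
        )
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # Inverse-CDF sampling: count how many category boundaries u exceeds.
        u = rng.random((n_cases, 1))
        cum = np.cumsum(probs, axis=1)[:, :-1]
        data[:, i] = (u > cum).sum(axis=1)
    return data, theta, dims

data, theta, dims = simulate_two_dimensions()
print(data.shape, data.min(), data.max())
```

Setting `rho` close to zero would approximate the orthogonal designs, and replacing the interlaced `dims` with a block assignment would approximate the stacked designs in Table 1.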
We have chosen procedures from SPSS because it is widely available and easy to use. Principal components analysis (PCA) was used to extract the factors, followed by oblique rotation of factors using Oblimin rotation (delta = 0). Kaiser's criterion, which retains eigenvalues above 1, was used in Procedure 1.1 to guide the identification of relevant factors. In Procedure 1.2 Horn's parallel analysis (Horn JL, 1965), which has been identified as one of the most accurate approaches to estimating the number of components (Zwick & Velicer, 1986), was used. The sizes of the eigenvalues obtained from PCA are compared with those obtained from a randomly generated data set of the same size. Only factors with eigenvalues exceeding the values obtained from the corresponding random data set are retained for further investigation. Parallel analysis was conducted using the software developed by Watkins (2000). Analyses were also conducted using a non-linear factor analysis (HOMALS) available in SPSS. Using curve estimation and a quadratic function, the values exported from the HOMALS procedure can be tested to determine the number of dimensions in the data.
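The comparison at the heart of parallel analysis can be sketched in a few lines. This is a minimal illustration, not the Watkins program or the SPSS procedure: the function name is hypothetical, the random data are drawn from a standard normal distribution, and the mean eigenvalue across simulations is used as the reference value, all of which are assumptions.

```python
import numpy as np

def parallel_analysis(data, n_sims=50, seed=0):
    """Count leading components whose eigenvalues exceed random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Eigenvalues from random data sets of the same size.
    sims = np.empty((n_sims, n_items))
    for s in range(n_sims):
        random_data = rng.standard_normal((n_cases, n_items))
        sims[s] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]
    reference = sims.mean(axis=0)
    # Retain components from the top until the first one fails the comparison.
    exceeds = observed > reference
    n_retained = n_items if exceeds.all() else int(np.argmin(exceeds))
    return n_retained, observed, reference

# Illustrative data: 400 cases, 20 continuous items driven by a single factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((400, 1))
items = 0.7 * factor + 0.7 * rng.standard_normal((400, 20))
n_retained, observed, reference = parallel_analysis(items)
print(n_retained)
```

Kaiser's criterion would instead count `observed > 1`, which tends to retain more components because sampling error alone pushes some eigenvalues above 1.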
For the Rasch procedures we set both Winsteps and RUMM2020 to have identical convergence criteria. As none of the data sets satisfied the assumptions of the rating scale model, we used the unrestricted (partial credit) polytomous model. A number of different fit statistics are reported. OUTFIT ZSTD in Winsteps and Residuals in RUMM are equivalent, with any variation reflecting the difference in the underlying estimation procedures. We treated values of 2.5 and above on either statistic (approximately 99% significance) as misfit to model expectation. Usually the two statistics provide similar magnitudes of fit to the model.
INFIT and OUTFIT MNSQ (Winsteps) are also reported with acceptable ranges of 0.9-1.1 and 0.7-1.3 respectively, following Smith's recommendations for sample size adjustment (Smith RM et al, 1998). RUMM Chi-Square probabilities are also reported, both unadjusted and Bonferroni adjusted to 0.0025 (0.05 divided by the 20 items). We also report the RUMM Chi-Square Interaction Fit Statistic, which is a summary fit statistic widely used to indicate overall fit to the model, and Wright's Unidimensionality Index, which is the person separation using model standard errors divided by the person separation using real (misfit inflated) standard errors (Wright BD, 1994). A value above 0.9 is indicative of unidimensionality; 0.5 and below of multidimensionality; and everything between is the usual grey area of uncertainty!
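The two numeric rules in this paragraph can be written out as a minimal sketch. The function names are hypothetical, the 0.05 family-wise level is an assumption inferred from the 0.0025 threshold for 20 items, and the separation values in the usage lines are invented for illustration rather than taken from this study.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests

def unidimensionality_index(model_separation, real_separation):
    """Person separation from model SEs divided by separation from real SEs."""
    return model_separation / real_separation

def interpret_index(index):
    """Apply the cut-offs quoted in the text: >0.9, <=0.5, and the grey area between."""
    if index > 0.9:
        return "unidimensional"
    if index <= 0.5:
        return "multidimensional"
    return "uncertain"

print(bonferroni_alpha(0.05, 20))
index = unidimensionality_index(3.0, 2.5)  # hypothetical separation values
print(round(index, 2), interpret_index(index))
```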
We report the usual Principal Component Analysis (PCA) of the residuals, including the percentage of variance attributable to the Rasch factor and the first residual factor (usually identical in Winsteps and RUMM), and the percentage of variance attributable to the first residual factor out of total variance (Winsteps).
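As a rough illustration of what a PCA of residuals computes, the sketch below takes a matrix of standardized residuals (persons by items) and returns the share of residual variance carried by the first component together with the item loadings on it. It is an assumption-laden sketch, not the Winsteps or RUMM implementation: the function name is hypothetical and the residuals here are synthetic, built so that two item clusters pull in opposite directions.

```python
import numpy as np

def first_residual_component(residuals):
    """Share of residual variance and item loadings for the first principal component."""
    corr = np.corrcoef(residuals, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    first_val = eigvals[-1]
    # Loadings are the correlations between each item and the component.
    loadings = eigvecs[:, -1] * np.sqrt(first_val)
    share = first_val / eigvals.sum()
    return share, loadings

# Synthetic residuals: items 1-5 and items 6-10 share a leftover factor
# with opposite signs, mimicking a second dimension.
rng = np.random.default_rng(1)
leftover = rng.standard_normal((400, 1))
signs = np.array([1.0] * 5 + [-1.0] * 5)
residuals = 0.6 * leftover * signs + rng.standard_normal((400, 10))
share, loadings = first_residual_component(residuals)
print(round(100 * share, 1), np.round(loadings, 2))
```

The sign of the loadings is arbitrary; what matters is that the two clusters load in opposite directions, which is what allows positive and negative item subsets to be formed.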
Finally, we report on a comparison of person estimates based upon subsets of items. In practice, where there is a conceptual basis for multidimensionality, estimates are made from the a-priori dimensions. In the present case, with these simulated data, we use the item loadings on the first factor of the PCA of the residuals. Person estimates derived from the highest positive set of items (correlated at 0.3 and above with the component) are contrasted against those derived from the highest negative set. A series of independent t-tests is undertaken to compare the two estimates for each person, and the percentage of tests outside the range ±1.96 is computed, which follows Everett Smith's general approach (Smith EV, 2002). A Binomial Proportions Confidence Interval can be calculated for this percentage. The Binomial CI should overlap 5% for a non-significant test. The results of these analyses are reported in Table 3.
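The person-by-person comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each t statistic is taken as the difference between the two estimates divided by the square root of the sum of their squared standard errors, and the confidence interval uses the normal approximation to the binomial. The person estimates in the usage lines are invented.

```python
import math

def binomial_ci(proportion, n, z=1.96):
    """Normal-approximation confidence interval for a binomial proportion."""
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)

def t_test_summary(est_a, se_a, est_b, se_b, critical=1.96):
    """Proportion of persons whose two estimates differ significantly, with its CI."""
    outside = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            outside += 1
    proportion = outside / len(est_a)
    low, high = binomial_ci(proportion, len(est_a), critical)
    return proportion, low, high

# Hypothetical estimates (logits) for four persons from two item subsets.
subset_a = [0.0, 1.0, -0.5, 2.0]
subset_b = [0.1, -1.0, -0.4, 2.1]
errors = [0.5, 0.5, 0.5, 0.5]
print(t_test_summary(subset_a, errors, subset_b, errors))
print(binomial_ci(0.07, 400))
```

With 7% of tests significant and assuming all 400 cases contribute a test, this approximation gives an interval of roughly 4.5% to 9.5%, close to the 5-9% shown for Set 1 in Table 3; the software may use a different interval method.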
Table 2. Details of Procedures
Approach | Procedure |
---|---|
Prior testing | 1.1 Default SPSS Principal Components Analysis using Kaiser's criterion, retaining eigenvalues above 1. |
 | 1.2 Default SPSS Principal Components Analysis with Horn's parallel analysis to determine significant eigenvalues. |
 | 1.3 HOMALS non-linear factor analysis. |
Fit to the Rasch model | 2.1 Percentage of items which misfit the (polytomous) model OUTFIT ZSTD (Winsteps). |
 | 2.2 Percentage of items which misfit the (polytomous) model Residuals (RUMM). |
 | 2.3 Percentage of items showing INFIT MNSQ misfit (Winsteps). |
 | 2.4 Percentage of items showing OUTFIT MNSQ misfit (Winsteps). |
 | 2.5 Percentage of items showing Chi-Square misfit (RUMM). |
 | 2.6 Percentage of items showing Chi-Square misfit (RUMM), Bonferroni corrected. |
 | 2.7 Summary Fit statistics. |
 | 2.8 Wright's Unidimensionality Index. |
 | 2.9 Person Separation Index (RUMM) (= Rasch reliability). |
 | 2.10 Person Separation (real) (Winsteps). |
Post Hoc tests | 3.1 Percentage of variance attributable to the Rasch factor. |
 | 3.2 Percentage of variance attributable to the first residual factor. |
 | 3.3 Ratio of variance attributable to first residual factor compared with Rasch factor (Winsteps). |
 | 3.4 Percentage of individual t-tests outside the range ± 1.96 (RUMM2030) with Binomial Test for Proportion confidence intervals where appropriate. |
Results
The default factor analysis (1.1) failed to identify the single dimension in Set 1, instead identifying two 'difficulty' dimensions. The HOMALS procedure failed to detect the situation (specified in Set 4) where only four items belonged to a second dimension, and consistently failed where the correlation between factors was ~ 0.7. The Rasch model fit statistics performed poorly where dimensions were interlaced and where the correlation between factors was ~ 0.7. Wright's Unidimensionality Index appeared insensitive to multidimensionality. Little can be gleaned from the percentage of variance attributable to the Rasch factor, as this seems consistently high, irrespective of the underlying dimensionality. In Set 1 the percentage of variance attributable to the first residual factor was substantially lower than in the other sets, but expressed as a percentage of the total variance it was low in every set except the orthogonal ones. The independent t-test approach consistently identified the unidimensional and multidimensional data sets.
These results have a number of implications for everyday practice of Rasch analysis. In the construction of a new polytomous scale where the intention is to create a unidimensional construct, Rasch fit statistics may mislead if there are two dimensions where the items are interlaced in difficulty. Supporting Richard Smith's (1996) recommendation, exploratory factor analysis should be undertaken at the outset to make sure that dimensionality is not going to be a problem, or to identify which items may be problematic so as to inform the iterative Rasch analysis procedure. As we cannot know in advance whether or not two interlaced dimensions may exist, this analysis should be undertaken as a matter of routine. The simplest way to undertake this is with the default factor analysis procedure using the parallel analysis to determine the number of significant eigenvalues.
Although the PCA of the residuals may give clues to multidimensionality in the data, its interpretation is not straightforward. The percent of variance of the first residual factor (out of total variance in the residuals) does show a clear increasing trend from the unidimensional data, through the correlated factors to the orthogonal factors. However, at what point does this figure shift from a unidimensional indicator to a multidimensional indicator?
The individual t-test approach proposed by Everett Smith seems the most robust in that it clearly identifies dimensionality. This test has importance not just for the interpretation of unidimensionality, but also for the meaning of multidimensionality in the data. Note that the proportion of t-tests outside the range is high across Sets 2-6, even when the factors are correlated at ~ 0.7. In practice this means that person estimates differ by between 1 and 2 logits, depending upon which set of items is being used for that estimate. This variability in person estimate is unsustainable when scales are to be used for individual clinical use, for example where cut points are often used to determine clinical pathology. The variability of person estimates where multidimensionality exists also raises fundamental questions about Computer Adaptive Testing approaches which rely upon estimates based upon just a few variables. Clearly, the strictest form of unidimensionality must be required to avoid significantly different person estimates driven by multidimensionality.
The analysis we have undertaken is only at the simplest level, reflecting what is most likely to be used in everyday research practice in the health and social sciences. We have, for example, not used Monte Carlo simulation or other methods to look at ranges of variance explained. Neither have we looked at different sample sizes or different test lengths. We have not addressed dichotomous items, which bring their own set of problems to factor analysis. Nevertheless, we believe that this simple analysis has shown that great care needs to be taken in confirming the assumption of unidimensionality of data when fitted to the Rasch model. Perhaps others may pursue some of the issues we have omitted.
Conclusion
When developing new polytomous scales, an exploratory factor analysis used a priori, with parallel analysis to indicate significant eigenvalues, should give early indications of any dimensionality issues prior to exporting data to Winsteps or RUMM [Editor: but see also Rasch Analysis First or Factor Analysis First?]. This should identify the situation of equal numbers of items on two factors, which will not be detected by the Rasch analysis fit statistics and where the PCA of the residuals may be indeterminate. After fit of data to the Rasch model, careful examination of the PCA of the residuals should provide clues to any remaining multidimensionality. Comparison of person estimates derived from the subsets of items identified by that PCA, using the independent t-test approach, should confirm or reject the unidimensionality of the scale.
Alan Tennant PhD, Academic Unit of Musculoskeletal & Rehabilitation Medicine, Faculty of Medicine and Health, The University of Leeds, UK.
Julie F. Pallant PhD, Faculty of Life and Social Sciences, Swinburne University of Technology, Hawthorn, Victoria 3122, Australia
Table 3. Summary of Results of Analyses
Test | Dataset: | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Prior Tests - Number of Factors | |||||||
1.1 | EFA with eigenvalue>1. (% Variance 2nd factor) | 2(6%) | 2(30%) | 2(31%) | 2(14%) | 2(63%) | 2(63%) |
1.2 | EFA with parallel analysis | 1 | 2 | 2 | 2 | 2 | 2 |
1.3 | HOMALS - number of factors | 1 | 3 | 2 | 1 | 1 | 1 |
Rasch Fit | |||||||
2.1 | % OUTFIT ZSTD out of range | 0 | 0 | 0 | 100 | 5 | 0 |
2.2 | % Residuals outside range | 0 | 0 | 0 | 85 | 0 | 0 |
2.3 | % INFIT MNSQ out of range | 5 | 0 | 5 | 100 | 20 | 15 |
2.4 | % OUTFIT MNSQ out of range | 0 | 0 | 0 | 60 | 0 | 0 |
2.5 | % Chi-Square significant | 0 | 5 | 70 | 100 | 0 | 0 |
2.6 | % Chi-Square significant (Bonferroni adjusted) | 0 | 0 | 35 | 70 | 0 | 0 |
2.7 | Item-Trait Interaction Fit statistic | 0.74 | 0.09 | 0.00 | 0.00 | 0.97 | 0.12 |
2.8 | Wright's Unidimensional Index | 1.08 | 1.11 | 1.11 | 1.12 | 1.07 | 1.08 |
2.9 | Person Separation Index (= Rasch Reliability) ~ α | 0.91 | 0.88 | 0.89 | 0.93 | 0.95 | 0.95 |
2.10 | (Real) Person Separation | 3.12 | 2.44 | 2.56 | 3.59 | 4.04 | 4.09 |
Post Hoc tests | |||||||
3.1 | % variance attributable to the Rasch factor. | 82.0 | 70.0 | 70.6 | 76.9 | 85.2 | 84.8 |
3.2 | % variance attributable to first residual factor | 7.4 | 48.8 | 47.5 | 25.4 | 26.3 | 23.8 |
3.3 | % variance attributable to first residual factor out of total variance | 1.4 | 14.3 | 14.1 | 6.4 | 3.8 | 3.7 |
3.4 | Percentage of individual t-tests outside range ± 1.96 (95% CI) where needed | 7.0(5-9%) | 55.0 | 51.5 | 45.3 | 38.8 | 35.0 |
References
Andrich D, Lyne A, Sheridan B, Luo G. (2006). RUMM 2020. Perth: RUMM Laboratory
Bjorner JB, Kosinski M, Ware JE Jr. Calibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT). Quality of Life Research 2003; 12: 913-933.
de Bonis M, et al. The Severity of Depression. Rasch Measurement Transactions, 1992; 6:3 p. 242-3
Fisher W.P. Jr. Meaningfulness, Measurement and Item Response Theory (IRT). Rasch Measurement Transactions, 2005, 19:2 p. 1018-20
Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika 1965; 30:179-185.
Linacre JM. DIMTEST diminuendo. Rasch Measurement Transactions, 1994, 8:3 p.384
Linacre JM. Winsteps Rasch measurement computer program. Chicago: Winsteps.com, 2006.
Marais I. SIMMsDepend. Murdoch University, Western Australia, 2006.
Raîche G. Critical Eigenvalue Sizes (Variances) in standardized residual Principal Components Analysis. Rasch Measurement Transactions, 2005, 19:1 p. 1012
Schumacker RE, Linacre JM. Factor analysis and Rasch. Rasch Measurement Transactions 1996, 9:4 p. 470.
Smith EV. Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement 2002; 3:205-231.
Smith RM. A Comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling 1996; 3:25-40.
Smith RM et al. Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement 1998; 2:66-78
Stahl J. Lost in the Dimensions, Rasch Measurement Transactions, 1991; 4(4):120
Tennant A, Hillman M, Fear J, Pickering A, Chamberlain MA. Are we making the most of the Stanford Health Assessment Questionnaire? Brit J Rheum 1996; 35: 574-578.
Watkins MW: Monte Carlo PCA for Parallel Analysis [software]. State College, PA: Ed & Psych Associates; 2000.
Wright BD. Unidimensionality coefficient. Rasch Measurement Transactions, 1994; 8:3 p.385
Wright B.D. Rank-ordered raw scores imply the Rasch model. Rasch Measurement Transactions, 1998, 12:2 p. 637-8.
Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Archives of Physical Medicine and Rehabilitation 1989; 70: 857-860.
Zwick WR, Velicer WF. Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 1986; 99: 432-442.
Unidimensionality Matters! (A Tale of Two Smiths?), Tennant A., Pallant J.F. … Rasch Measurement Transactions, 2006, 20:1 p. 1048-51
The URL of this page is www.rasch.org/rmt/rmt201c.htm
Website: www.rasch.org/rmt/contents.htm