Theoretical Complexity vs. Rasch Item Difficulty in Reading Tests

The concept of unidimensionality of reading comprehension (Weir & Porter, 1994) has led scholars to believe that there might be a one-to-one correspondence between item difficulty and the level of cognition the item measures (Alderson, 1990). It is commonplace among reading specialists to divide reading ability into different layers of cognition such that hypothetically labeled lower layers are assumed to be followed by higher ones (Alderson, 1990). The hierarchy assumption is so appealing that test developers usually calibrate items solely in terms of item difficulty, while ignoring issues related to their level of cognition. Yet, it is often the case that more difficult items represent lower order abilities (at least as predicted by theory) than do easier ones (Weir & Porter, 1994). Paradoxically, harder items seem to contribute less to reading ability than do easier ones (Meyer, 1975, cited in McNamara, 1996).

Weir and Porter (1994) suggest that the main reason for limiting the reproducibility assumption to item difficulty in test construction is 'practical expediency rather than ... a principled view of unidimensionality' (p. 9). Because empirical item hierarchies sometimes contradict theoretical notions of reading comprehension (Alderson, 1990; McNamara, 1996; Weir & Porter, 1994), we approach the issue from a qualitative as well as a quantitative perspective:

1. Does there exist a one-to-one correspondence between item difficulty and the nature of the latent ability the item measures?

2. To what extent do variations in item difficulty reflect qualitative rather than quantitative item differences?


Figure 1. Theoretical Complexity vs. Rasch Difficulty.

To address these questions we used the SBRT - Forms a and b - which are (mostly) multiple-choice item language tests. The SBRT was developed at the Iran University of Science and Technology (IUST) (Daftarifard, 2000) using over 200 intermediate students for each form. As is shown in Table 1, the SBRTa contains 39 questions that address twenty-four abilities that are frequently referred to in the literature. Items' hypothetical cognitive complexity is indicated by the ordinal number in the last column of this table. The classification of some items is uncertain (e.g., answering factual questions might either be classified as perception or speed reading).

Reading ability as a hierarchy

The results in Table 1 and Figure 1 reveal a clear lack of correspondence between items' empirical difficulty and their hypothetical level of cognition. Some supposedly cognitively demanding abilities turned out to be less difficult than less cognitively demanding abilities, and some item types are out of order. This is summarized by the finding that the Spearman rank correlation between items' Rasch locations and their hypothetical complexity is just 0.22. Moreover, the average locations for items in complexity groups 1 or 2, 2, 3, 3 or 4, 4, 4 or 5, and 5 are -2.8, -0.2, 0.4, 0.4, 0.8, 1.2, and -0.1 logits, respectively.
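The group averages and the rank correlation can be recomputed from the Table 1 values. The following is a minimal sketch rather than the authors' procedure: it assumes that an uncertain label such as "3 or 4" is scored at its midpoint (3.5) and that ties receive average ranks, neither of which is stated in the text, so the resulting coefficient need not match the reported 0.22 exactly. The function names are illustrative.

```python
from collections import defaultdict

# (item code, Rasch difficulty in logits, hypothesized complexity label),
# copied from Table 1 (SBRTa).
ITEMS = [
    ("SCB", -0.92, "2"), ("SCE", -0.20, "2"), ("SK1", 0.57, "2"), ("SK2", -0.23, "2"),
    ("GU2", 0.12, "3"), ("FQ1", -3.37, "1 or 2"), ("FQ2", -2.28, "1 or 2"),
    ("CD1", 0.84, "3"), ("CD2", -2.32, "3"), ("PA1", -0.03, "3"), ("PA2", 0.98, "3"),
    ("DFH1", 0.98, "3"), ("DFH2", 1.71, "3"), ("CE1", 0.63, "3"),
    ("DE1", -0.23, "4"), ("DE2", 0.66, "4"), ("PO2", 1.07, "4"), ("TR2", 0.45, "4"),
    ("TO1", 1.87, "4"), ("TO2", 0.80, "4"),
    ("SI1", 0.45, "5"), ("SI2", 0.43, "5"), ("RF1", -0.40, "5"), ("RF2", -0.19, "5"),
    ("AU1", -1.65, "5"), ("AU2", 0.39, "5"), ("O1", 0.00, "5"), ("O2", -0.26, "5"),
    ("CT1", -1.24, "5"), ("IN1", 0.14, "5"), ("IN2", -0.40, "5"),
    ("TP1", 1.07, "5"), ("TP2", -0.42, "5"), ("MI1", -0.52, "5"), ("MI2", 0.37, "5"),
    ("LT1", 0.74, "3 or 4"), ("LT2", 0.20, "3 or 4"), ("TD2", 0.23, "3 or 4"),
    ("SU2", 1.26, "4 or 5"),
]

# Assumption (not from the article): uncertain labels are scored at their midpoint.
SCORE = {"1 or 2": 1.5, "2": 2.0, "3": 3.0, "3 or 4": 3.5,
         "4": 4.0, "4 or 5": 4.5, "5": 5.0}


def group_means(items):
    """Mean Rasch difficulty for each complexity label."""
    groups = defaultdict(list)
    for _, difficulty, label in items:
        groups[label].append(difficulty)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    means = group_means(ITEMS)
    for label in sorted(means, key=SCORE.get):
        print(f"complexity {label:>6}: mean difficulty {means[label]:+.2f} logits")
    rho = spearman([d for _, d, _ in ITEMS], [SCORE[c] for _, _, c in ITEMS])
    print(f"Spearman rho = {rho:.2f}")
```

Under a strict hierarchy the printed group means would rise steadily with the complexity label; the drop for group 5 is what the low rank correlation summarizes.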

The existence of a one-to-one relation between empirical (i.e., Rasch) and hypothetical complexity is contradicted in many ways. For instance, DFH2 (distinguishing between fact and hypothesis) is harder than IN2 (inferencing), while RF2 (understanding the rhetorical function of the text) is easier than LT1 (understanding the literal meaning). Similarly, the presumably more complex skill of understanding the factual question (here FQ1) is much easier than mere text scanning (both SCB and SCE). Also, skimming item SK1 turns out to be more difficult than SK2 (Rasch measure -0.23), although both belong to the speed reading category. Certain items that hypothetically measure a higher ability, such as interpretation, turn out to be much easier than lower-level items such as speed reading (here SK1, with a Rasch measure of 0.57). These include AU1 (-1.65), CT1 (-1.24), MI1 (-0.52), TP2 (-0.42), RF1 (-0.40), and IN2 (-0.40).

Another problem in the data pattern concerns items that share the same operational definition but differ considerably in difficulty. These include AU1 and AU2 (understanding the audience of the text), with Rasch measures of -1.65 and 0.39; CD1 and CD2, with 0.84 and -2.32; DE1 and DE2, with -0.23 and 0.60; PA1 and PA2, with -0.03 and 0.98; and TP1 and TP2, with 1.07 and -0.42. Only a few items that operationally belong to one category turn up with nearly identical measures, such as SI1 and SI2 with Rasch measures of 0.45 and 0.43.
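The size of these within-skill discrepancies can be listed directly from Table 1. The sketch below is illustrative only: it takes every skill that Table 1 measures with two items, uses the table's difficulty values, and sorts the skills by the absolute gap between their two items. The function name is hypothetical.

```python
# Rasch difficulties in logits, copied from Table 1 (SBRTa), for every skill
# that is measured by two items.
PAIRED_ITEMS = {
    "Scanning and information search": {"SCB": -0.92, "SCE": -0.20},
    "Skimming": {"SK1": 0.57, "SK2": -0.23},
    "Understanding the factual questions": {"FQ1": -3.37, "FQ2": -2.28},
    "Interpreting cohesive devices": {"CD1": 0.84, "CD2": -2.32},
    "Paraphrasing": {"PA1": -0.03, "PA2": 0.98},
    "Distinguishing between the facts and hypothesis": {"DFH1": 0.98, "DFH2": 1.71},
    "Deduction": {"DE1": -0.23, "DE2": 0.66},
    "Text organization": {"TO1": 1.87, "TO2": 0.80},
    "Understanding the source of the text": {"SI1": 0.45, "SI2": 0.43},
    "Understanding the function of the text": {"RF1": -0.40, "RF2": -0.19},
    "Understanding the audience of the text": {"AU1": -1.65, "AU2": 0.39},
    "Understanding the opinion of the author": {"O1": 0.00, "O2": -0.26},
    "Inference": {"IN1": 0.14, "IN2": -0.40},
    "Choosing title for paragraph": {"TP1": 1.07, "TP2": -0.42},
    "Choosing the main idea of the text": {"MI1": -0.52, "MI2": 0.37},
    "Understanding the propositional meaning": {"LT1": 0.74, "LT2": 0.20},
}


def within_skill_gaps(paired_items):
    """Return (skill, gap in logits) tuples, largest gap first."""
    gaps = []
    for skill, items in paired_items.items():
        low, high = sorted(items.values())
        gaps.append((skill, high - low))
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for skill, gap in within_skill_gaps(PAIRED_ITEMS):
        print(f"{gap:5.2f}  {skill}")
```

Sorting by gap puts the cohesive-devices pair (CD1, CD2) at the top and the source-of-text pair (SI1, SI2) at the bottom, which matches the contrast drawn in the paragraph above.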

We point out that the findings summarized above cannot be attributed to the particular set of items being used. First, the SBRTa items fit the Rasch model adequately (only one item's outfit exceeds 1.3), thus establishing this test form's measurement validity. Second, in support of unidimensionality, factor analysis of items' Rasch residuals indicates that just three items (SI1, AU2 and SK2) loaded higher than 0.5 on the most prominent residual factor. Third, the SBRTa shows acceptable classical reliability (coefficient alpha = 0.82). Fourth, students' SBRTa measures are highly correlated with their measures on the widely used IELTS (exemplar, 1994, the academic version of module C; r = 0.71, p < .001). Finally, the lack of correlation between items' hypothetical and empirical difficulties is replicated for the second test form, the SBRTb. Similar to the value observed for the SBRTa, the rank correlation for the SBRTb is just 0.23.
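For readers less familiar with the statistics cited here, the sketch below writes out the standard textbook definitions of the dichotomous Rasch probability, an item's outfit mean-square, and coefficient alpha. It is not the authors' analysis: the study's response data are not available here, person and item measures are treated as already known, and all function names are illustrative.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_outfit(responses, abilities, difficulty):
    """Outfit mean-square for one item: the mean squared standardized residual.

    responses: 0/1 scores of each person on the item.
    abilities: the same persons' measures in logits (assumed already estimated).
    """
    total = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def coefficient_alpha(matrix):
    """Cronbach's coefficient alpha for a persons-by-items score matrix."""
    k = len(matrix[0])
    item_vars = [sample_variance([row[j] for row in matrix]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in matrix])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)
```

Outfit values near 1.0 indicate responses about as noisy as the model expects, which is why a threshold such as 1.3 serves as a flag for misfit.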

The present findings thus indicate that while reading is unidimensional and hierarchical, this hierarchy disagrees with theoretical predictions in the literature (for an overview, see e.g., Alderson, 1990). Given this lack of correspondence, we propose that notions of item complexity require careful distinctions between the qualitative and quantitative aspects of reading theory. For instance, it may be necessary to distinguish between the complexity of a concept and the complexity of the question designed to assess this concept. Rasch scaling is likely to remain the tool of choice in this research, but it seems likely that multi-faceted approaches will be needed to accommodate both types of complexity simultaneously.

Parisa Daftarifard

Rense Lange, Integrated Knowledge Systems

References

Alderson, J. C. (1990). Testing reading comprehension skills (part one). Reading in a Foreign Language, 6(2), 425-438.

Daftarifard, P. (2002). Scalability and divisibility of the reading comprehension ability. Unpublished master's thesis. Tehran, Iran: IUST.

McNamara, T. (1996). Measuring Second Language Performance. New York: Addison Wesley Longman.

Weir, C. J., & Porter, D. (1994). The Multi-Divisible or Unitary Nature of Reading: The language tester between Scylla and Charybdis. Reading in a Foreign Language, 10(2), 1-19.

Editor's Note: These findings contrast with the remarkable success of the Lexile system at predicting the Rasch item difficulty of reading-comprehension items. See Burdick B., Stenner A.J. (1996) Theoretical prediction of test items. Rasch Measurement Transactions, 1996, 10:1, p. 475. www.rasch.org/rmt/rmt101b.htm

Table 1
Items' Rasch Difficulty and Hypothetical Difficulty (SBRTa)

No. Skill to be measured                             Code  Rasch Difficulty  Cognitive Complexity**
1.  Scanning and information search                  SCB   -0.92             2
                                                     SCE   -0.20             2
2.  Skimming                                         SK1    0.57             2
                                                     SK2   -0.23             2
3.  Guessing                                         GU2    0.12             3
4.  Understanding the factual questions              FQ1   -3.37             1 or 2
                                                     FQ2   -2.28             1 or 2
5.  Interpreting cohesive devices                    CD1    0.84             3
                                                     CD2   -2.32             3
6.  Paraphrasing                                     PA1   -0.03             3
                                                     PA2    0.98             3
7.  Distinguishing between the facts and hypothesis  DFH1   0.98             3
                                                     DFH2   1.71             3
8.  Distinguishing between cause and effect          CE1    0.63             3
9.  Deduction                                        DE1   -0.23             4
                                                     DE2    0.66             4
10. Paragraph organization                           PO2    1.07             4
11. Transcoding information                          TR2    0.45             4
12. Text organization                                TO1    1.87             4
                                                     TO2    0.80             4
13. Understanding the source of the text             SI1    0.45             5
                                                     SI2    0.43             5
14. Understanding the function of the text           RF1   -0.40             5
                                                     RF2   -0.19             5
15. Understanding the audience of the text           AU1   -1.65             5
                                                     AU2    0.39             5
16. Understanding the opinion of the author          O1     0.00             5
                                                     O2    -0.26             5
17. Choosing the best title for the text             CT1   -1.24             5
18. Inference                                        IN1    0.14             5
                                                     IN2   -0.40             5
19. Choosing title for paragraph                     TP1    1.07             5
                                                     TP2   -0.42             5
20. Choosing the main idea of the text               MI1   -0.52             5
                                                     MI2    0.37             5
21. Understanding the propositional meaning          LT1    0.74             3 or 4
    (syntactical meaning or literal meaning)         LT2    0.20             3 or 4
22. Text diagrams                                    TD2    0.23             3 or 4
23. Summarizing ability                              SU2    1.26             4 or 5

** Numbers in the last column stand for the following, in increasing complexity:
(1) Perception, (2) Speed Reading, (3) Word-based reading, (4) Analyzing, (5) Interpretation.


Daftarifard P., Lange R. (2009) Theoretical Complexity vs. Rasch Item Difficulty in Reading Tests, Rasch Measurement Transactions, 2009, 23:2, 1212-3


The URL of this page is www.rasch.org/rmt/rmt232e.htm

Website: www.rasch.org/rmt/contents.htm