Designing a test that evaluates the quality of students' responses to literature puts one on risky theoretical ground, for it is not a popular position to say that some responses to literature are better than others. However, the insights Rasch analysis of such a test can give more than justify the risk. I'd like to explain the insights I gained in three important areas: test construction, understanding the variable, and understanding the quality of instruction. But because practical benefits alone may not be enough to justify a research practice, I'd like to begin with a brief theoretical justification of my decision to design a test to evaluate the quality of students' responses.
A Rationale for Measuring the Quality of Students' Responses
The need to make this defense may seem odd to some of you, but prevailing critical theories challenge the legitimacy of my decision. As Booth (1974) explains, "One hears it said these days that understanding is not possible in any normative sense; [every person] constructs [his or her] own meanings and the more variety we have the richer we are" (p. ix). The work of Holland (1975), Bleich (1975), Fish (1980), and many others would support such a view.
That is why I decided to consider irony, for irony is one domain in which a strong justification for judging the quality of responses can clearly be made. Booth explains that "we should marvel, in a time when everyone talks so much about the breakdown of values and the widening of communication gaps, at the astonishing agreements stable ironies can produce among us" (p. 82). We see such agreement on a daily basis. When I spill soup on my tie and say, "Nice job, Michael," I expect that my hearers will realize that I am criticizing my clumsiness rather than praising my artistic ability. As Booth explains, we behave similarly when we read ironic literature. When the speaker of "The Unknown Citizen" asks, "Was he free? Was he happy?" I expect that every experienced reader understands that Auden would answer, "No." And I would further expect that every teacher would agree that a treatment that helps students come to such an understanding would be desirable. Because I chose a type of literature "designed precisely to demand flat and absolute choices" (Booth, p. 128), I was theoretically justified in writing a test to evaluate students' responses to that literature.
The test I designed made four statements about each of seven poems, five of which are ironic, and asked respondents to "strongly agree," "agree," "disagree," or "strongly disagree" with each statement. I intended that the correct response on each item would be either "strongly agree" or "strongly disagree." I designed the test to evaluate three instructional treatments: (1) a direct method, based on research on metacognition in reading, which attempts to give students conscious control of the interpretive strategies experienced readers use to understand irony, (2) a tacit method, which seeks to have students develop their own strategies through extended practice with the genre, and (3) no treatment. (See Smith for an explanation of the treatments.) Three ninth-grade, six tenth-grade, and three eleventh-grade classes participated in the study. In all, 261 students responded to the 28-item test before they received instruction, and 253 students took a revised posttest that included two additional ironic poems.
The test proved to be a reliable measure of students' understanding of irony in poetry. Most important, the person fit statistics suggest that the test fulfills the conditions of the model when I scored it by marking students' responses as either right or wrong. That is, when the desired response was "strongly agree", I counted both "strongly agree" and "agree" as correct and when the desired response was "strongly disagree", I counted both "strongly disagree" and "disagree" as correct. When I used this scoring system to examine the 514 total responses to the test, only fifteen students had infit statistics of more than two, thus signaling performance that would not be predicted by the probability model.
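The idea behind a person infit statistic can be illustrated with a minimal sketch. It computes the information-weighted mean-square for one person under the dichotomous Rasch model, assuming the person measure and item calibrations are already known. The five item difficulties, the person measure of 0.0, and the two response patterns are hypothetical. The threshold of two mentioned above most plausibly refers to a standardized form of this statistic, which the sketch does not compute.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def infit_mean_square(responses, ability, difficulties):
    """Information-weighted mean-square fit for one person.

    responses: 0/1 scores, one per item
    difficulties: item calibrations in logits, in the same order
    """
    squared_residuals = 0.0
    information = 0.0
    for score, difficulty in zip(responses, difficulties):
        p = rasch_probability(ability, difficulty)
        squared_residuals += (score - p) ** 2  # observed minus expected, squared
        information += p * (1.0 - p)           # model variance of the response
    return squared_residuals / information


# Hypothetical five-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [1, 1, 1, 0, 0]    # succeeds on easy items, fails on hard ones
surprising_pattern = [0, 0, 1, 1, 1]  # fails easy items, succeeds on hard ones

print(infit_mean_square(expected_pattern, 0.0, difficulties))
print(infit_mean_square(surprising_pattern, 0.0, difficulties))
```

Values near one indicate responses about as variable as the model predicts. The surprising pattern yields a value well above one, which is the kind of performance the probability model would not predict.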
This was not true for all of the scoring systems that I tried. I designed the test to use step scoring. This decision was based on the assumption that the response that students chose depended only on their understanding of the poems. I assumed that students made extreme responses only when they totally understood (or totally misunderstood) the poem. However, that assumption proved to be unjustified. Whereas fewer than 3% of the students misfit with the dichotomous scoring, over 10% of the students misfit when I used step scoring. I hypothesized that the interaction between students' confidence or lack of confidence and their understanding accounted for these misfits.
To test this hypothesis, I scored the responses in yet another way. I assigned the extreme responses ("strongly agree" and "strongly disagree") a one and assigned the moderate responses ("agree" and "disagree") a zero. Only 12% of the students received positive measures in this scoring system.
This group of 12% included three-fourths of the students who misfit. These analyses suggest that the determining factor in some students' responses was their self-confidence rather than their understanding of irony in poetry. Therefore, I decided to use the dichotomous scoring in analyzing my results. The most important insight into test construction I gained from developing the test is that interaction between confidence and understanding could spoil step scoring.
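The three scoring systems described above can be sketched side by side. The function names are hypothetical, and the 0 to 3 coding used for step scoring is an assumption, since the article does not give the exact values assigned to each step.

```python
EXTREME = {"strongly agree", "strongly disagree"}
AGREE_SIDE = {"strongly agree", "agree"}


def same_side(response, keyed):
    """True when the response falls on the same side of the scale as the key."""
    return (response in AGREE_SIDE) == (keyed in AGREE_SIDE)


def dichotomous_score(response, keyed):
    """Right/wrong scoring: either response on the keyed side counts as correct."""
    return 1 if same_side(response, keyed) else 0


def step_score(response, keyed):
    """Step scoring, assuming a 0-3 coding (the article does not specify it).

    3 = extreme and correct, 2 = moderate and correct,
    1 = moderate and wrong,  0 = extreme and wrong.
    """
    if same_side(response, keyed):
        return 3 if response in EXTREME else 2
    return 0 if response in EXTREME else 1


def confidence_score(response):
    """Confidence scoring: 1 for an extreme response, 0 for a moderate one."""
    return 1 if response in EXTREME else 0


# One hypothetical response to an item keyed "strongly agree".
response, keyed = "agree", "strongly agree"
print(dichotomous_score(response, keyed), step_score(response, keyed), confidence_score(response))
```

Note that the confidence scoring ignores the key entirely, so it measures only how extreme a student's responses were, not whether they were correct.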
Having developed a test and a scoring system that I could use with confidence, I could address instructional issues. Chief among them was describing what it means to understand irony in poetry. The data supported my hypothesis that, in general, the items that were not ironic were easier than those that were ironic. There was a statistically significant difference between the item groups (t = 4.25, p < .001). However, the item calibrations establish that within the ironic items there were substantial variations in difficulty. Initially, I theorized that the difficulty of ironic items would depend on the nature of the poem, its syntax, and imagery. However, an analysis of the item calibrations and of the possible basis for students' responses gave empirical support for Booth's claim that "Every reader will have . . . difficulty in detecting irony that mocks his own beliefs or characteristics" (1974, p. 81). Rasch analysis of the test suggests that as the ability to understand irony increases, readers are better able to reconstruct ironic meanings in increasingly difficult poems and to reconstruct ironic meanings that challenge their own beliefs and behaviors.
Understanding the Instruction
Rasch analysis also provides unique benefits in evaluating the instruction. One advantage is that logit measures give a clear sense of the magnitude of students' change. The students who received the direct instruction improved, on average, .69 logits from the pretest to the posttest, which means the probability of success on the average ironic item for the average student increased from 50 to 67 percent. The students who received the tacit instruction improved .24 logits, which means the probability of success on the average ironic item for the average student increased from 50 to 56 percent. The students who received no treatment performed worse by .04 logits.
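The conversion from a logit gain to a change in the probability of success follows from the logistic form of the dichotomous Rasch model. A minimal sketch, with hypothetical function names, assuming a student who starts at a 50 percent chance of success on the item:

```python
import math


def success_probability(logit_advantage):
    """Probability of success when ability exceeds difficulty by this many logits."""
    return 1.0 / (1.0 + math.exp(-logit_advantage))


def probability_after_gain(gain, starting_probability=0.5):
    """Probability of success after a gain in logits from a given starting point."""
    starting_logit = math.log(starting_probability / (1.0 - starting_probability))
    return success_probability(starting_logit + gain)


# Illustrative gains only; substitute the reported logit changes as needed.
for gain in (0.0, 0.5, 1.0):
    print(gain, round(probability_after_gain(gain), 2))
```

Because the logistic curve is steepest at 50 percent, the same logit gain produces a smaller change in probability for a student who starts well above or below that point.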
In addition to giving a clear sense of how much students improved, Rasch analysis also allows one to take a closer look at the effects of a treatment. Although there were no statistically significant differences between the performance of the direct and tacit groups on a repeated-measures analysis, one treatment may still be more desirable than the other. Because it calculates a standard error of measurement for each individual, Rasch analysis allows researchers to examine the performance of individual students. To be sure, we want to know how a curriculum is affecting groups of students. However, we never want to lose sight of how it is affecting the individuals within these groups. Therefore, I examined how the direct and tacit treatments affected individual students at each grade level.
Seven of the 26 ninth-grade students who received direct instruction improved by two or more standard errors. Seven additional students improved by one or more standard errors. Two students performed worse by two or more standard errors. Given the assumption that the posttest population is likely to have a higher mean than the pretest group, a change greater than one standard error indicates that there is approximately an 84% likelihood that the change is not a result of chance. While this is not a traditional level of statistical significance, it is a noteworthy change for an individual, particularly since the standard error of a short test is relatively high, especially at the top of the scale. A change of two or more standard errors indicates that there is approximately a 98% likelihood that the change is not a result of chance, a change almost always regarded as statistically significant.
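The 84% and 98% figures correspond to the one-tailed area under the standard normal curve below one and two standard errors. A minimal sketch of that calculation follows. The function names are hypothetical, and dividing the change by a single standard error supplied by the caller is an assumption, since the article does not say whether the pretest, posttest, or a combined standard error was used.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def change_in_standard_errors(pre_measure, post_measure, standard_error):
    """Change in logits expressed in standard-error units.

    Assumption: one standard error per student is supplied by the caller.
    """
    return (post_measure - pre_measure) / standard_error


# One-tailed likelihoods for changes of one and two standard errors.
for z in (1.0, 2.0):
    print(z, round(normal_cdf(z), 3))

# Hypothetical student: pretest 0.2 logits, posttest 1.1 logits, standard error 0.45.
print(change_in_standard_errors(0.2, 1.1, 0.45))
```

The one-tailed reading depends on the stated assumption that posttest measures are expected to be higher. Under a two-tailed reading the corresponding likelihoods would be lower.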
The ninth graders showed by far the largest differences between the two methods. This makes sense, for ninth-grade students were the least experienced readers in the study. Consequently, they had had fewer chances to develop their own interpretive strategies. If a task is easy, one can do it naturally. Reflection on the process is not necessary. However, when a task is problematic, it is important to have conscious control of the strategies one may use to accomplish it, something the direct instruction provided.
The direct method also appears to carry additional risk. Two ninth-graders and two other students who received the direct instruction scored more than one standard error worse on the posttest, whereas this was true for only one student who received the tacit instruction. Because readers have a limited capacity for attention, drawing attention to interpretive strategies may take away from the attention some students pay to the critical details of a text.
Logit measures help one understand the magnitude of a treatment's effect, and a standard error of measurement for individuals allows one to examine the effect of a treatment on individual students. Another advantage of Rasch analysis is that fit statistics can provide insight into the effectiveness of a treatment. Only 3% of the students misfit when I used dichotomous scoring. When I analyzed these misfits, I found that 87% of them were on posttests. I also found that the highest residuals came on the two poems that were not ironic. This made sense. I hypothesized that some students who had just finished a unit on irony might be expected to read ironically even when the text does not call for such a reading. To test this hypothesis, I recalibrated after eliminating the questions on the two poems that were not ironic. All but two of the misfits were eliminated by this procedure. This suggests that the instruction should devote more time to what Booth (1974) calls "knowing when to stop."
Rasch analysis provided me a number of important benefits. It helped me design a reliable instrument and determine an effective scoring system for that instrument. It helped me gain important insights into the variable I was measuring. And it helped me evaluate the effectiveness of my experimental treatments by giving me a clear sense of the magnitude of students' change and allowing me to look closely at the performance of individual students. Evaluating the quality of students' responses may not be a popular practice, but Rasch analysis makes it a profitable one.
Bleich, D. (1975). Readings and feelings: An introduction to subjective criticism. Urbana, IL: NCTE.
Booth, W. (1974). A rhetoric of irony. Chicago: University of Chicago Press.
Fish, S. (1980). Is there a text in this class? Cambridge: Harvard University Press.
Holland, N. (1975). Five readers reading. New Haven: Yale University Press.
Smith, M. Teaching the interpretation of irony in poetry. Research in the Teaching of English, 23, 254-272.
"It is in Socrates that the concept of irony has its inception in the world.
Concepts, like individuals, have their histories and are just as incapable of
withstanding the ravages of time as are individuals."
Soren Kierkegaard in The Concept of Irony, 1841
Understanding of Irony in Poetry. M. W. Smith. Rasch Measurement Transactions, 1990, 4:1, p. 89-91
The URL of this page is www.rasch.org/rmt/rmt41a.htm