The following letter was sent from George Ingebo to Ben Wright:
I had a chance to read the statements by Matthew Schulz in the SIG newsletter (RMT 1:2, p. 10-11) describing what he called one-step (concurrent) item banking, also known as concurrent equating, and I'm a little worried. I need your thinking, as a scientist, to help me with an important operating principle to rely on in proceeding with research on Rasch technology. Your opinion has been valuable in two other similar situations in the past.
It seems to me, although I'm no physicist, that the situation we face in achievement test data is similar to that described by Heisenberg's Uncertainty Principle with each single data point existing as such only after it is observed.
Before we get to the process for linking groups, let's go over what happens when we define a scale using one group. We can understand that as a student perceives a test item the information that he recalls because it is associated with the item is a part of the apperceptive mass of the student and is processed in a way learned by the student for an item of that kind. When "readiness", both in associative information and information processing, exists to some extent the observation establishes a bit of data useful for defining a scale and for measuring a student. As long as we stay in the area of basic skills, it seems that the latitude in both knowledge and process is wide for practical item calibrating purposes. But it can be abused.
Calibrating items for use in an item bank is quite different from scaling an established test for a defined population, such as the SATs and other published tests for particular grades, because the purposes are different. The item calibration irrespective of any population becomes the focus in item banking. In pinpointing calibration values each instance of non-readiness for an item on the part of a student becomes noise in the information stream. Unfortunately, the computer does not recognize counterproductive data; it always forces a set of numbers regardless of data quality.
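As an illustration of how such "noise" might be identified, here is a minimal sketch under stated assumptions. It uses the standard dichotomous Rasch model, where the probability of success depends only on the difference between student measure and item calibration in logits. The function names, the tuple layout, and the `max_gap` threshold are hypothetical choices for this example; the letter does not specify a screening rule.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standardized_residual(score, ability, difficulty):
    """(observed - expected) / sqrt(model variance) for one scored response (0 or 1)."""
    p = rasch_probability(ability, difficulty)
    return (score - p) / math.sqrt(p * (1.0 - p))

def flag_off_target(responses, max_gap=2.0):
    """Return (student, item) pairs where the item is far harder than the student's measure.

    responses: iterable of (student_id, item_id, ability, difficulty, score).
    max_gap: hypothetical threshold in logits, chosen only for illustration.
    """
    return [(s, i) for s, i, b, d, _ in responses if d - b > max_gap]

if __name__ == "__main__":
    data = [
        ("s1", "i1", 0.0, 0.0, 1),   # well-targeted item
        ("s1", "i2", 0.0, 3.0, 1),   # item 3 logits above the student
    ]
    print(flag_off_target(data))
    print(round(standardized_residual(1, 0.0, 3.0), 2))
```

A large positive residual on an off-target item (a success that the model regards as very unlikely) is one sign that the response may be telling us about something other than the calibrated skill.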
Now to my problem, Ben. Just as calibrations generated in a group are influenced by responses from students for whom some of the questions are inappropriate, it has been established repeatedly that links between groups are seriously influenced by instances where an item operates quite differently in one test than it does in the other. Just as students exhibit individual differences in so many ways that we cannot consider replication at the individual level, we also find that many differences exist between groups in instructional backgrounds, in chance discussion the previous day, in test administration atmosphere or logistics, and so on. For such reasons acceptable items can work well in two different tests as determined by the item characteristic curves and yet be counterproductive in linking because they provide information differently in the two tests -- item interaction is only one plausible explanation.
Again, linking for a bank is different from linking two established tests to establish a scale for some special purpose even when the tests are appropriate for the students taking them. Bank calibrations need to be all purpose values, precisely and uniformly linked in a relationship that holds with other similar calibrations and with other item calibrations several logits away. As you once told us, one item is enough to establish an accurate link if it is the right item. Detecting "right" items, then, becomes the critical factor in the linking equation. Those items far afield in the linking pattern could be the "right" ones, but are most likely adding more noise to the information stream.
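The idea of detecting the "right" link items can be sketched in code. The example below is an assumption-laden illustration, not the Portland procedure or any published program: it takes separate calibrations of the common items from two tests, estimates the link constant as the mean calibration difference, and repeatedly drops the item whose difference is farthest from that constant. The function names and the `tolerance` cutoff are hypothetical.

```python
from statistics import mean

def link_constant(form_a, form_b, items):
    """Mean calibration difference (B - A), in logits, over the given common items."""
    return mean(form_b[i] - form_a[i] for i in items)

def screen_link_items(form_a, form_b, tolerance=0.3, max_rounds=10):
    """Iteratively drop common items that disagree with the link constant.

    form_a, form_b: dicts mapping item id -> calibration (logits) obtained
    from separate calibrations of the two tests.
    tolerance: hypothetical cutoff on |difference - link constant|.
    Returns (link constant, kept items, dropped items).
    """
    kept = sorted(set(form_a) & set(form_b))
    if not kept:
        raise ValueError("the two tests share no items")
    dropped = []
    for _ in range(max_rounds):
        shift = link_constant(form_a, form_b, kept)
        # Identify the single worst-fitting item relative to the current shift.
        worst = max(kept, key=lambda i: abs(form_b[i] - form_a[i] - shift))
        if len(kept) <= 1 or abs(form_b[worst] - form_a[worst] - shift) <= tolerance:
            break
        kept.remove(worst)
        dropped.append(worst)
    return link_constant(form_a, form_b, kept), kept, dropped

if __name__ == "__main__":
    test_a = {"i1": -1.0, "i2": 0.0, "i3": 1.0, "i4": 0.5}
    test_b = {"i1": -0.5, "i2": 0.5, "i3": 1.5, "i4": 2.5}
    print(screen_link_items(test_a, test_b))
```

Dropping one item at a time and re-estimating the constant keeps a single badly behaved item from distorting the judgment of the others. Note that the sketch removes items only because they sit far afield in the linking pattern; as the letter observes, such an item could in principle be the "right" one, so the dropped list is a prompt for review rather than a verdict.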
Now it seems to me that Matthew Schulz's one-step (concurrent) banking program MFORMS, for all its elegance and efficiency, accepts data uncritically. If calibrating and linking were situations calling for central tendencies of all instances of reality there would be no objection, but in the effort to develop a scale against which these instances of reality are ultimately to be judged it seems that instances of misapplication of data in both calibrating and linking processes must be detected and deleted to achieve valid linking.
It is likely that I have overlooked something important in this line of thinking, Ben, and if so you will be able to point out where. Otherwise, since we both dislike data-free argument, you would be the one to help set up an experiment that would determine quantitatively what, if any, difference results from using MFORMS or the technology that discards questionable data. Two experiments could be made with existing data: running MFORMS with raw data from a testing session in Portland, and using the predetermined calibrations, without setting grade differences, for another MFORMS run. However, since the main attribute of the Rasch model is that it provides symmetry to support item response theory, the crucial experiment has to test the measurement logit - calibration logit relationship, their interchangeability in the technology creating them. This experimenting can only be done in the basic skills, I think, but the design used in Portland's Area III in 1979 to test the flexibility of the item bank and the expectations of the Levels Tests seems both necessary and sufficient to establish the capability of a methodology to produce theoretically specified results.
The sour note in the symphony is logistical. Can you lead the charge to get an MFORMS constructed bank and levels tests developed from it introduced into a large district's testing program? The SIG group might be the place to plant the idea and you are the one to encourage us.
Does test-item interaction invalidate one-step (concurrent) item banking? Ingebo G. Rasch Measurement Transactions, 1988, 2:2 p.17-18
The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
The URL of this page is www.rasch.org/rmt/rmt22a.htm