Does Test-Item Interaction Invalidate One-Step (Concurrent) Equating and Item Banking?

Dear Ben:
I had a chance to read the statements by Matthew Schulz in the SIG newsletter (RMT 1:2, p. 10-11) describing what he called one-step (concurrent) item banking, also known as concurrent equating, and I'm a little worried. I need your thinking, as a scientist, to help me with an important operating principle to rely on in proceeding with research on Rasch technology. Your opinion has been valuable in two similar situations in the past.

It seems to me, although I'm no physicist, that the situation we face in achievement test data is similar to that described by Heisenberg's Uncertainty Principle with each single data point existing as such only after it is observed.

Before we get to the process for linking groups, let's go over what happens when we define a scale using one group. We can understand that, as a student perceives a test item, the information that he recalls because it is associated with the item is a part of the apperceptive mass of the student and is processed in a way learned by the student for an item of that kind. When "readiness", both in associative information and in information processing, exists to some extent, the observation establishes a bit of data useful for defining a scale and for measuring a student. As long as we stay in the area of basic skills, it seems that the latitude in both knowledge and process is wide for practical item calibrating purposes. But it can be abused.

Calibrating items for use in an item bank is quite different from scaling an established test for a defined population, such as the SATs and other published tests for particular grades, because the purposes are different. The item calibration, irrespective of any population, becomes the focus in item banking. In pinpointing calibration values, each instance of non-readiness for an item on the part of a student becomes noise in the information stream. Unfortunately, the computer does not recognize counterproductive data; it always forces a set of numbers regardless of data quality.

Now to my problem, Ben. Just as calibrations generated in a group are influenced by responses from students for whom some of the questions are inappropriate, it has been established repeatedly that links between groups are seriously influenced by instances where an item operates quite differently in one test than it does in the other. Just as students exhibit individual differences in so many ways that we cannot consider replication at the individual level, we also find that many differences exist between groups: in instructional backgrounds, in chance discussion the previous day, in test administration atmosphere or logistics, and so on. For such reasons acceptable items can work well in two different tests, as determined by the item characteristic curves, and yet be counterproductive in linking because they provide information differently in the two tests -- item interaction is only one plausible explanation.

Again, linking for a bank is different from linking two established tests to establish a scale for some special purpose, even when the tests are appropriate for the students taking them. Bank calibrations need to be all-purpose values, precisely and uniformly linked in a relationship that holds with other similar calibrations and with other item calibrations several logits away. As you once told us, one item is enough to establish an accurate link if it is the right item. Detecting "right" items, then, becomes the critical factor in the linking equation. Those items far afield in the linking pattern could be the "right" ones, but are most likely adding more noise to the information stream.
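
The screening of link items described above can be sketched as follows. This is a minimal illustration only, not the procedure of any particular banking program: it assumes each common item has a difficulty calibration (in logits) from each of two tests, estimates the link as the mean difference, and sets aside, one at a time, the item that departs furthest from that link until all remaining items fall within a tolerance. The function name `screen_link_items`, the item labels, the calibration values, and the 0.5-logit tolerance are all hypothetical.

```python
from statistics import mean


def screen_link_items(calib_a, calib_b, tolerance=0.5):
    """Estimate a link constant from common items and flag outlying items.

    calib_a, calib_b: dicts mapping item id -> difficulty in logits.
    tolerance: hypothetical cut-off (logits) for departure from the link.
    Returns (link_constant, kept_items, flagged_items).
    """
    kept = sorted(set(calib_a) & set(calib_b))
    if not kept:
        raise ValueError("the two tests share no items")
    flagged = []
    while True:
        # Link constant: mean shift of the kept items from test A to test B.
        link = mean(calib_b[i] - calib_a[i] for i in kept)

        def departure(item):
            # How far this item's own shift lies from the current link.
            return abs(calib_b[item] - calib_a[item] - link)

        worst = max(kept, key=departure)
        if departure(worst) <= tolerance or len(kept) == 1:
            return link, kept, flagged
        # Set aside the single worst item, then re-estimate the link.
        kept.remove(worst)
        flagged.append(worst)


# Hypothetical calibrations: item "i4" behaves differently in test B.
test_a = {"i1": -1.0, "i2": 0.0, "i3": 1.0, "i4": 2.0}
test_b = {"i1": -0.5, "i2": 0.5, "i3": 1.5, "i4": 4.0}
link, kept, flagged = screen_link_items(test_a, test_b)
print(link, kept, flagged)
```

With these made-up values the link is first estimated from all four items, "i4" is set aside as the furthest from it, and the link is then re-estimated from the remaining three. Whether a flagged item is noise or the one "right" item is a judgment the arithmetic cannot make.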

Now it seems to me that Matthew Schulz's one-step (concurrent) banking program MFORMS, for all its elegance and efficiency, accepts data uncritically. If calibrating and linking were situations calling for central tendencies of all instances of reality, there would be no objection. But in the effort to develop a scale against which these instances of reality are ultimately to be judged, it seems that detecting and deleting instances of misapplication of data in both calibrating and linking processes cannot be avoided if valid linking is to be achieved.

It is likely that I have overlooked something important in this line of thinking, Ben, and if so you will be able to point out where. Otherwise, since we both dislike data-free argument, you would be the one to help set up an experiment that would determine quantitatively what, if any, difference results from using MFORMS rather than the technology that discards questionable data. Two experiments could be made with existing data: one running MFORMS with raw data from a testing session in Portland, and another MFORMS run using the predetermined calibrations, without setting grade differences. However, since the main attribute of the Rasch model is that it provides symmetry to support item response theory, the crucial experiment has to test the measurement logit - calibration logit relationship, that is, their interchangeability in the technology creating them. This experimenting can only be done in the basic skills, I think, but the design used in Portland's Area III in 1979 to test the flexibility of the item bank and the expectations of the Levels Tests seems both necessary and sufficient to establish the capability of a methodology to produce theoretically specified results.
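
One way to quantify the difference between two sets of calibrations in such an experiment is sketched below, as an illustration only. It assumes that each method yields a difficulty (in logits) and a standard error for the same items, centers both sets on their own means so that only relative item positions are compared, and reports an approximate standardized difference for each item. The function name `compare_calibrations` and all the numbers are hypothetical; nothing here is taken from MFORMS or from Portland data.

```python
from math import sqrt
from statistics import mean


def compare_calibrations(set_1, set_2):
    """Compare two calibrations of the same items.

    set_1, set_2: dicts mapping item id -> (difficulty in logits, standard error).
    Returns (rms_difference, {item id: approximate standardized difference}).
    """
    items = sorted(set(set_1) & set(set_2))
    if not items:
        raise ValueError("no items in common")
    # Center each set on its own mean so the arbitrary origin drops out.
    mean_1 = mean(set_1[i][0] for i in items)
    mean_2 = mean(set_2[i][0] for i in items)
    diffs = {i: (set_1[i][0] - mean_1) - (set_2[i][0] - mean_2) for i in items}
    # Root-mean-square difference across items, in logits.
    rms = sqrt(mean(d * d for d in diffs.values()))
    # Difference divided by the joint standard error of the two estimates.
    z = {i: diffs[i] / sqrt(set_1[i][1] ** 2 + set_2[i][1] ** 2) for i in items}
    return rms, z


# Hypothetical calibrations of two items under two methods.
method_1 = {"a": (-1.0, 0.3), "b": (1.0, 0.3)}
method_2 = {"a": (-0.5, 0.4), "b": (0.5, 0.4)}
rms, z = compare_calibrations(method_1, method_2)
print(rms, z)
```

Large standardized differences would point to the items on which the two methods disagree by more than their estimation error can explain.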

The sour note in the symphony is logistical. Can you lead the charge to get an MFORMS-constructed bank, and levels tests developed from it, introduced into a large district's testing program? The SIG group might be the place to plant the idea and you are the one to encourage us.

Does test-item interaction invalidate one-step (concurrent) item banking? Ingebo G. … Rasch Measurement Transactions, 1988, 2:2 p.17-18

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
