Item Specification vs. Item Banking

Our thesis is simple and straightforward. It is not necessary to have a bank of items for measuring a construct when we possess an algorithm for writing an item at any desired level of difficulty. The algorithm is the key to the bank, so to speak. If one has the key, the bank is open.

Bruce Choppin (1968) was an early Rasch pioneer who promoted item bank development. Items representative of the variable of interest are banked and selected for use as required. Leveled paper-and-pencil tests can be quickly assembled from the bank on the basis of item calibrations and item-use histories, and computer-based adaptive tests can be assembled electronically and targeted to each examinee. As useful as item banking has proven to be, it is possible to move beyond banking individual items and their associated item statistics.
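The targeting step of an adaptive test can be sketched in Python. This is a minimal illustration under stated assumptions, not Choppin's procedure. It assumes dichotomous Rasch items, a bank held as a hypothetical `bank` dict that maps item IDs to calibrations in logits, and a provisional ability estimate `theta` that is already in hand. It selects the unused item with the most Fisher information at `theta`. Under the Rasch model that information is p(1 - p), which peaks where the item's calibration equals `theta`.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_next_item(bank: dict[str, float], used: set[str], theta: float) -> str:
    """Return the unused item with maximum information at theta.

    Under the Rasch model this is the item whose calibration is nearest theta.
    `bank` maps hypothetical item IDs to calibrations in logits.
    """
    candidates = [item for item in bank if item not in used]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda item: item_information(theta, bank[item]))


# Hypothetical bank of four calibrated items (logits).
bank = {"A": -1.5, "B": -0.2, "C": 0.4, "D": 1.8}
print(select_next_item(bank, used=set(), theta=0.3))
print(select_next_item(bank, used={"C"}, theta=0.3))
```

A working adaptive test would also re-estimate `theta` after each response and stop when the measure is precise enough. Those steps are left out here.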

When enough is known about what causes item difficulty, a specification equation can be written that yields a theory-based calibration for any item the computer software designs. An item's calibration is then the consequence of the decisions the software makes in constructing the item. This process mimics the steps a human item writer takes, albeit with more control over the causal recipe for item difficulty. A thesis of this paper is that, when asserting that a measure possesses construct validity, there is no better evidence than demonstrated experimental control over the causes of item difficulty.
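In its simplest linear form, which is the form tested in the KCT work below, a specification equation can be written as follows. The notation is ours and is offered only as an illustration. The left side is the theory-based calibration of item i. x_ik is the value of design feature k that the software chose for that item. The weights beta_k are estimated by regressing observed calibrations on the features. For the KCT, the two retained features are number of taps and distance covered.

```latex
\hat{\delta}_i = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ik},
\qquad
\hat{\delta}_i^{\mathrm{KCT}} = \beta_0 + \beta_1 \, \mathrm{taps}_i + \beta_2 \, \mathrm{distance}_i
```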

A measurement instrument embodies a construct theory: a story about what it means to move up and down a scale (Stenner, Smith & Burdick, 1983). Such a theory should be vigorously tested. In a demonstration of these methods, Stone (2002) theorized that the difficulty of short-term memory and attention items (Knox Cube Test, KCT) was caused by (1) the number of taps, (2) the number of reversals in the direction of the tapping pattern, and (3) the total distance covered by the tapping pattern. This theory was tested by regressing the observed item difficulties on these three variables. The Figure plots the correspondence between predicted (theoretical) and observed item difficulties. Ninety-eight percent (98%) of the variation in observed item difficulties was explained by number of taps (standardized Beta = .80) and distance covered (standardized Beta = .20). Given these two predictors, number of reversals made no independent contribution. An earlier study (Stenner & Smith, 1982), using different samples of items and persons, found that an equation employing the same two variables explained 93% of the item difficulty variance. Finally, Stone (2002) re-analyzed KCT-like items developed over the last century and found a striking correspondence between the two-variable theory and observation. The observed item difficulties analyzed in these studies contain some measurement error. This suggests that the correlation between theory and observation, once disattenuated for that error, approaches unity.

When item difficulties, and by implication person measures, are under the control of a construct theory and its associated specification equation, it becomes possible to engineer items on demand. There is no need to write more items than are needed, pilot test them, estimate their calibrations, and then bank the best of them for use in future instruments. Rather, when an instrument is needed, an algorithm generates items to a target test specification, together with a calibration for each item.
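The following is a minimal sketch of engineering KCT-like tapping items on demand. Everything specific in it is an assumption made for illustration:

- there are four cubes, numbered 1 to 4;
- no cube is tapped twice in a row;
- "distance" is taken as the sum of cube-to-cube steps between consecutive taps;
- the specification-equation weights (`INTERCEPT`, `W_TAPS`, `W_DISTANCE`) are hypothetical, not the published estimates.

Given a list of target difficulties, `engineer_test` returns a distinct pattern for each target, together with its theory-based calibration.

```python
import itertools

# HYPOTHETICAL specification-equation weights (logits); not published values.
INTERCEPT = -5.0
W_TAPS = 1.0
W_DISTANCE = 0.2


def features(pattern):
    """Number of taps and total distance (sum of steps between consecutive cubes)."""
    taps = len(pattern)
    distance = sum(abs(b - a) for a, b in zip(pattern, pattern[1:]))
    return taps, distance


def predicted_difficulty(pattern):
    """Theory-based calibration from the hypothetical linear specification equation."""
    taps, distance = features(pattern)
    return INTERCEPT + W_TAPS * taps + W_DISTANCE * distance


def candidate_patterns(n_cubes=4, min_taps=2, max_taps=7):
    """Enumerate tapping patterns, shortest first."""
    for length in range(min_taps, max_taps + 1):
        for pattern in itertools.product(range(1, n_cubes + 1), repeat=length):
            # Assumption: the same cube is never tapped twice in a row.
            if all(a != b for a, b in zip(pattern, pattern[1:])):
                yield pattern


def engineer_item(target, used=frozenset()):
    """Return the unused pattern whose predicted difficulty is closest to target."""
    best = min(
        (p for p in candidate_patterns() if p not in used),
        key=lambda p: abs(predicted_difficulty(p) - target),
    )
    return best, predicted_difficulty(best)


def engineer_test(targets):
    """Build one distinct item per target difficulty, with its calibration."""
    used = set()
    items = []
    for target in targets:
        pattern, calibration = engineer_item(target, used)
        used.add(pattern)
        items.append((pattern, calibration))
    return items


for pattern, calibration in engineer_test([-2.0, -1.0, 0.0, 1.0]):
    print(pattern, round(calibration, 2))
```

Exhaustive search is adequate here because the item space is small (patterns of up to seven taps). A larger item space would call for a constructive or randomized generator.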

Applications that incorporate these ideas are under development for the next KCT revision and for an online reading program that builds reading items in real time as the reader progresses through an electronic text.

Some practical benefits of what might be called theory-referenced measurement are:

(1) if the process yields reproducible person measures, the evidence for construct validity is strong;
(2) test security is enhanced, because there is no extant instrument that could be compromised upon release; and
(3) a fully computerized procedure keeps the process under tight quality control at a fraction of the cost of traditional item standardization procedures.

Finally, one well-recognized means of supporting an inference about what causes item difficulty is to experimentally manipulate the variables in the specification equation and observe whether the predicted item difficulties materialize when examinees take the items. When the latest version of the KCT was built, one part of the scale had too few items. The specification equation was used to engineer candidate items to fill the gap. Subsequent data collection confirmed that these items behaved in accord with theoretical predictions (Stone, 2002). Although this exercise involved only four items, it suggests that the construct specification equation is a causal (rather than merely descriptive) representation of the construct variance.
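Checking whether engineered items behave in accord with theoretical predictions can be made explicit with a standardized residual. In this hypothetical Python sketch, each engineered item carries three values: its theory-based calibration, the calibration later estimated from response data, and that estimate's standard error. An item is treated as confirmed when |z| does not exceed a chosen critical value. The value of 2.0 used here is an assumption. The two items in the usage lines are made-up numbers, not Stone's (2002) results.

```python
import math
from dataclasses import dataclass


@dataclass
class EngineeredItem:
    name: str
    predicted: float    # theory-based calibration from the specification equation (logits)
    observed: float     # calibration estimated after administering the item (logits)
    se_observed: float  # standard error of the observed calibration


def standardized_residual(item, se_predicted=0.0):
    """(observed - predicted) divided by the combined standard error."""
    se = math.sqrt(item.se_observed ** 2 + se_predicted ** 2)
    return (item.observed - item.predicted) / se


def confirm_predictions(items, z_crit=2.0):
    """Map each item name to True if |z| <= z_crit (prediction confirmed)."""
    return {it.name: abs(standardized_residual(it)) <= z_crit for it in items}


# Made-up illustration: one item lands near its prediction, one does not.
items = [
    EngineeredItem("new_1", predicted=1.20, observed=1.35, se_observed=0.30),
    EngineeredItem("new_2", predicted=2.00, observed=2.90, se_observed=0.35),
]
print(confirm_predictions(items))
```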

This extraordinary agreement between observation and theory suggests two conclusions:

(1) the specification equation affords a nearly complete account of what makes items difficult; and
(2) the Rasch model, used to linearize the ratio of counts correct to counts incorrect, must be producing an equal-interval scale. Otherwise a linear equation could not account for such a high proportion of the reliable variation in item difficulties.
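The linearization mentioned in point (2) starts from the log of the ratio of counts. The Python sketch below computes that crude log-odds difficulty for each item and centers the values at zero. It is only the starting point of a Rasch calibration. A full analysis estimates item and person parameters jointly, for example by joint or conditional maximum likelihood. The counts are hypothetical.

```python
import math


def logit_difficulties(correct_counts, n_persons):
    """Crude item difficulties: log(count incorrect / count correct), centered at 0.

    Items answered correctly by everyone or by no one have no finite logit.
    """
    raw = []
    for correct in correct_counts:
        if correct <= 0 or correct >= n_persons:
            raise ValueError("extreme score: difficulty not estimable")
        raw.append(math.log((n_persons - correct) / correct))
    mean = sum(raw) / len(raw)
    return [d - mean for d in raw]


# Hypothetical counts correct for three items, each taken by 100 persons.
print([round(d, 3) for d in logit_difficulties([80, 50, 20], 100)])

# Equal 10-point steps in percent correct are unequal steps in logits:
print(math.log(60 / 40) - math.log(50 / 50), math.log(95 / 5) - math.log(85 / 15))
```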

Measurement of constructs evolves along a predictable course. Early in a construct's history, measurements are subjective, awkward to implement, inaccurate, and poorly understood. The king's foot as a measure of length is an illustration. With time, standards are introduced, common metrics are imposed, artifacts are adopted (e.g., the meter bar), precision is increased, and use becomes ubiquitous. Finally, abstraction leaps forward again. The concrete, artifact-based framework is left behind in favor of a theoretical process for defining and maintaining a unit of length: the distance light travels in vacuum in 1/299,792,458 of a second, with the second itself defined by the oscillations of the cesium-133 atom. Human science instrumentation evolves along a similar pathway of increasing abstraction. In the early stages, a construct and its unit of measurement are inseparable from a single instrument. In time, multiple instruments come to share a common metric, item banking becomes commonplace, and finally the construct is specified. Suppose a specification equation exists for a construct and accounts for a high percentage of the reliable variance in item difficulties (or in item-ensemble difficulties). Then the construct is no longer operationalized by a bank of items but by the causal recipe for generating items with pre-specified attributes.

Choppin, B. (1968). Item banking using sample-free calibration. Nature, 219 (5156), 870-872.

Stenner, A. J. & Smith, M. (1982). Testing construct theories. Perceptual and Motor Skills, 55, 415-426.

Stenner, A. J., Smith, M. & Burdick, D. S. (1983). Toward a theory of construct definition. Journal of Educational Measurement, 20 (4), 305-315.

Stone, M. H. (2002). Quality control in testing. Popular Measurement, 4 (1), 15-23.

Item Specification vs. Item Banking, Stenner A.J. & Stone M.H. … Rasch Measurement Transactions, 2003, 17:3 p.929-930

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
