"If" or "When" to Assess?

"No single assessment or combination of assessments offers the whole picture. The greater question to me is not if we use a given assessment tool (be it testing or other) but when to do so." (John Roope)

True. We never get the whole picture, but multiple methods shine different kinds of light from different angles and can provide real illumination. But what does it really mean to say that "No single assessment or combination of assessments offers the whole picture"? It means that any sample of test questions or assessment criteria is inherently incomplete. We can never imagine and pose every conceivable kind of problem that a student could someday encounter. The assessment problem is then one of sampling from an infinite universe of possible questions and criteria: making sure that the ones chosen actually belong to that common universe (typically called a population), that they adequately represent it, and then calibrating them so that they measure in the same quantitative unit as any other similar sample from the same population.

This is what Rasch measurement is all about, and it is what gets people like me excited, because we have seen it work in practice over and over. Traditional methods focus on counts of right answers or on sums of ratings, but those are only a preliminary step in the process.

Statistical models are usually chosen to describe the raw data. But what is the point of describing data that will never recur in their specific detail? The raw data depend inherently on the particular questions asked, and as they stand they cannot be compared with the scores the students would likely obtain on another, or even the same, sample of questions. We are then deliberately and prematurely restricting ourselves to the limited picture we have in hand, without even checking whether a better picture can be brought into focus.
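The contrast between raw counts and calibrated measures can be made concrete with a minimal sketch. It assumes the dichotomous Rasch model, in which the probability of a right answer is exp(θ - b) / (1 + exp(θ - b)) for a student measure θ and an item difficulty b, both in logits, and it assumes the items have already been calibrated on one scale. The item sets, the student's measure of 0.5 logits, and the helper names are hypothetical. The same student's expected raw score is very different on an easier and a harder set of questions, yet solving the Rasch score equation (here by Newton-Raphson, one standard approach) on either set recovers the same measure.

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch model: probability of success for measure theta on item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Expected raw score (count of right answers) on a set of calibrated items."""
    return sum(p_correct(theta, b) for b in difficulties)

def measure_from_score(score, difficulties, theta=0.0, iters=100):
    """Solve sum_i P(theta, b_i) = score for theta by Newton-Raphson.

    A finite solution exists only for 0 < score < number of items.
    """
    if not 0 < score < len(difficulties):
        raise ValueError("extreme scores have no finite maximum-likelihood measure")
    for _ in range(iters):
        ps = [p_correct(theta, b) for b in difficulties]
        residual = score - sum(ps)
        information = sum(p * (1 - p) for p in ps)
        # Clamp each step to 1 logit so the iteration cannot overshoot wildly.
        theta += max(-1.0, min(1.0, residual / information))
    return theta

# Hypothetical item sets, both calibrated on the same logit scale.
easy_items = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_items = [0.5, 1.0, 1.5, 2.0, 2.5]
true_measure = 0.5

for name, items in [("easy", easy_items), ("hard", hard_items)]:
    raw = expected_score(true_measure, items)
    print(f"{name}: expected raw score {raw:.2f} of {len(items)}, "
          f"measure {measure_from_score(raw, items):.2f} logits")
```

Run as written, the expected raw scores come out at about 3.98 and 1.45 of 5, while both estimated measures are 0.50 logits. With real responses the two estimates would differ by measurement error, but they would estimate the same quantity in the same unit.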

Rasch models are chosen so as to obtain generalizable comparability across samples of items and/or examinees/respondents. Instead of fitting models to data, and rejecting models that don't describe the data well enough, Rasch models prescribe data structures general enough to think with, and non-constructive data are rejected. This process much more closely approximates what has historically worked in experimental science.
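Checking whether data conform to the model is commonly done with mean-square fit statistics built from standardized residuals, the differences between each observed response and its Rasch-expected probability. This is a minimal sketch assuming the person's measure and the item difficulties are already known; all values and names are illustrative. It compares a response string that succeeds on the easier items and misses the harder ones with an erratic string that has the same raw score but the reverse pattern. Mean squares near 1.0 are what the model expects; values well above 1.0 flag responses to examine before they are used to build measures, and the exact cut-off is a judgment call not specified here.

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch probability of success (theta and b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fit_mean_squares(theta, difficulties, responses):
    """Return (infit, outfit) mean-square statistics for one person's responses."""
    ps = [p_correct(theta, b) for b in difficulties]
    variances = [p * (1 - p) for p in ps]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, ps)]
    # Outfit: unweighted mean of squared standardized residuals (sensitive to outliers).
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean square (sensitive to misfit near the person's level).
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit

# Illustrative items and a measure of 0.6 logits, roughly the ML measure
# for a raw score of 3 on these five items.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.6
consistent = [1, 1, 1, 0, 0]  # right on easy items, wrong on hard ones
erratic = [0, 0, 1, 1, 1]     # same raw score, reversed pattern

for label, pattern in [("consistent", consistent), ("erratic", erratic)]:
    infit, outfit = fit_mean_squares(theta, difficulties, pattern)
    print(f"{label}: infit {infit:.2f}, outfit {outfit:.2f}")
```

Both strings have the same raw score, so a count of right answers cannot tell them apart; only the comparison with the model's expectations shows that the second one does not support a measure.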

If we can't generalize from our data, no amount of statistical hocus-pocus is going to construct meaningful results. But if we start from a strong sense of how meaningful results are constructed, and we carefully monitor the process, we stand a pretty good chance of mediating past and future. By that last phrase, I mean that the past data we have in hand are useful only as something to learn from in managing the future. We can't do anything about the past. Those data are history. But maybe we can extract from them a general structure that we see applying over and over again across data sets. So we try to take advantage of that structure by building it into a measuring instrument that will tell us what is going on with a child, in detailed quantitative and qualitative terms, at the very moment that the measure is made.

And that brings us to John's second point: "The greater question to me is not if we use a given assessment tool (be it testing or other) but when to do so." Right here is the crux of my passion. I start from the observation that everyday conversational language is an assessment tool that precedes formal testing, and it has a lot still to teach us about what testing could be.

Before getting to the if and when, think again about what assessment is supposed to do. It is supposed to let us know how we're doing, right? So how do we know when we know? One criterion for whether someone knows something for themselves is whether they can explain it in their own words. As Albert Einstein is reported to have said, "You do not really understand something unless you can explain it to your grandmother."

Well then, showing that different assessments and tests intended to measure the same thing actually do so, and do so in the same amounts, is just another way of showing that we know what we're talking about, right? And because we have, by and large, hardly even started to assess the quality of our assessments and tests experimentally, we don't really have a clue whether we know what we're talking about, do we?
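One standard way to check whether two instruments measure the same thing in the same unit is common-item linking: items shared by both instruments should keep their relative difficulties, so their separately calibrated values should differ only by a constant shift of origin. The sketch below assumes two independent Rasch calibrations (all numbers invented for illustration), estimates the shift as a mean difference, flags shared items whose displacement exceeds an assumed tolerance of 0.5 logits, and re-estimates the shift without them. The function name, the tolerance, and the single re-estimation pass are illustrative choices, not a fixed rule.

```python
from statistics import mean

def link_forms(common_a, common_b, tolerance=0.5):
    """Estimate the origin shift between two calibrations from shared items.

    common_a[i] and common_b[i] are the difficulties (logits) of the same item
    as calibrated on instrument A and instrument B. Returns (shift, flagged),
    where adding shift to a B-scale value expresses it on the A scale.
    """
    diffs = [a - b for a, b in zip(common_a, common_b)]
    shift = mean(diffs)
    # Items displaced too far from the common shift do not behave
    # the same way on both instruments.
    flagged = [i for i, d in enumerate(diffs) if abs(d - shift) > tolerance]
    kept = [d for i, d in enumerate(diffs) if i not in flagged]
    if kept:
        shift = mean(kept)
    return shift, flagged

# Invented calibrations of five shared items; the last one has drifted.
form_a = [-1.0, 0.0, 0.8, 1.5, 2.0]
form_b = [-1.7, -0.72, 0.12, 0.78, 0.6]
shift, flagged = link_forms(form_a, form_b)
print(f"shift = {shift:.3f} logits, flagged items: {flagged}")
```

Here four shared items agree on a shift of about 0.7 logits and the fifth is flagged. If most shared items failed to agree, that would be evidence that the two instruments are not measuring the same thing in the same unit.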

Plato's dictum: "Let no one a-geometric enter!" (from above the door to his Academy), as displayed above the door of Geoffrey Opat, late Professor of Physics at the University of Melbourne, Australia.

We might start by further considering where we're coming from. The ancient Greek term ta mathemata (the mathematical) was used to refer to anything teachable and learnable. Mathematical thinking is characterized by its abstract generality and lack of dependence on concrete particulars, not primarily by an association with number. So conversation can function as a test that has qualitatively mathematical consequences in the form of what is found to be teachable and learnable. In fact, it seems that in order to read or hear, and to learn from what is read or heard, one must implicitly have in mind the question to which the thing learned is an answer.

Developmentally, don't infants and kids also test themselves against the things they come up against, learning at the level they're at by means of the challenges posed by their environments? Then shouldn't their tests and assessments be sampling that same population of questions in determining whether children know something or not, or in assessing their developmental readiness?

And now we're at the if/when issue. The current fashion for high-stakes tests poses hoops that are jumped through just once and then left behind forever. Teachers try to help their students get through those hoops by fashioning their own hoops to as nearly the same specifications as the high-stakes ones. The results are then used to decide whether the kid advances a grade or gets into college, or whether the school gets praise, money, or a new principal.

But didn't we just see that actual daily life is a process of constant testing? Life doesn't pose rare high-stakes tests, and it doesn't prepare us for them by posing other tests as similar as possible to the "real" one. Instead, it poses a constant random sampling of problems from each particular domain or construct. Some kinds of problems come up fairly frequently, and we develop routines for dealing with them.

But the point is, wouldn't a superior testing and assessment environment be built the way life tests us? We would want at least a small sample of problems to be posed daily, as, in fact, they already are in many textbooks and classrooms. The key difference is to calibrate these problems so that 1) the teacher, the student, the parent, and anyone who cares to look can see that the challenge posed is relevant to the student's level of ability, and 2) success on a new, more difficult challenge can be immediately related to a likely measure on the high-stakes assessment. In fact, given frequent and reproducible daily measures, there might be no further need for the high-stakes assessments.
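If the daily problems and the high-stakes form were calibrated on the same logit scale (an assumption; the form's difficulties and the measures below are invented), a measure estimated from daily work would translate directly into an expected raw score on the high-stakes form. Under the dichotomous Rasch model that expected score is the sum of the success probabilities over the form's items, often called the test characteristic curve. A minimal sketch, with hypothetical helper names:

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch probability of success (theta and b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_form_score(theta, form_difficulties):
    """Test characteristic curve: expected number correct on a calibrated form."""
    return sum(p_correct(theta, b) for b in form_difficulties)

# Hypothetical 10-item high-stakes form calibrated on the same scale as daily items.
high_stakes_form = [-1.5, -1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 1.5, 2.0]

# Measures (logits) as they might be estimated from short daily sets of problems.
for daily_measure in (-0.5, 0.0, 0.8):
    score = expected_form_score(daily_measure, high_stakes_form)
    print(f"measure {daily_measure:+.1f} -> expected {score:.1f} of {len(high_stakes_form)}")
```

Because the curve increases monotonically with the measure, the mapping also runs in reverse: a target score on the form corresponds to a single measure, which shows teacher and student how far the daily measures still have to move.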

Rasch models help us implement tests and assessments patterned after the way that life itself challenges us and promotes growth and development. Rasch's models are the tools we need to check the offspring of our assessments and tests for viability, because what they provide are tests of sample- and scale-independence. Because they are probabilistic, they support adaptive administration, meaning that short tests could be administered daily, with results obtained in a standard, uniform metric that would show the student and teacher the most relevant point of entry into the curriculum. These processes could help to integrate teaching and testing in a way that brings back to life the ancient Greek connection between the curriculum and the mathematical in ta mathemata.
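Adaptive administration can be sketched as a loop: give the unused item whose difficulty is closest to the current measure (for a Rasch item, that is where its information p(1 - p) is greatest), score the answer, and re-estimate. This minimal sketch assumes a pre-calibrated item bank and a maximum-likelihood measure, falling back on a provisional 1-logit step while the answers are all right or all wrong, since no finite ML estimate exists then. The bank, the five-item length, the step rule, the helper names, and the toy student (who answers every item easier than 1.2 logits correctly) are all hypothetical choices for illustration, not a recommended design.

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch probability of success (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_measure(difficulties, responses, theta=0.0, iters=100):
    """Newton-Raphson ML measure; needs at least one right and one wrong answer."""
    for _ in range(iters):
        ps = [p_correct(theta, b) for b in difficulties]
        residual = sum(responses) - sum(ps)
        information = sum(p * (1 - p) for p in ps)
        theta += max(-1.0, min(1.0, residual / information))
    return theta

def adaptive_session(bank, answer, n_items=5, start=0.0):
    """Administer up to n_items from a calibrated bank, one at a time.

    answer(b) returns 1 (right) or 0 (wrong) for an item of difficulty b.
    """
    theta = start
    used, given, responses = set(), [], []
    for _ in range(min(n_items, len(bank))):
        # Most informative unused item: difficulty closest to the current measure.
        idx = min((i for i in range(len(bank)) if i not in used),
                  key=lambda i: abs(bank[i] - theta))
        used.add(idx)
        given.append(bank[idx])
        responses.append(answer(bank[idx]))
        if 0 < sum(responses) < len(responses):
            theta = ml_measure(given, responses, theta)
        else:
            # All right or all wrong so far: no finite ML measure yet, so step 1 logit.
            theta += 1.0 if responses[-1] == 1 else -1.0
    return theta, given, responses

# Hypothetical item bank (logits) and a toy student with a sharp 1.2-logit threshold.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
theta, given, responses = adaptive_session(bank, lambda b: 1 if b < 1.2 else 0)
print(f"items given: {given}")
print(f"responses:   {responses}")
print(f"measure:     {theta:.2f} logits")
```

On this bank the session gives items at 0.0, 1.0, 2.0, 1.5, and 0.5 logits and ends with a measure of about 1.45 logits, in the neighborhood of the toy student's threshold after only five items, because each item was chosen near the student's current level rather than spread across the whole range.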

The biggest obstacle to actually implementing an approach like this is that we have not yet created the measurement-friendly environment, the ecological niche, in which Rasch-born constructs can thrive. Many carefully designed instruments have been closely studied and precisely calibrated, but they exist in complete isolation from 1) other similar instruments that in all likelihood could measure in the same unit and could throw considerable new light on the theory of the construct, and 2) the communities of practitioners and researchers who could benefit from sharing common languages for exchanging qualitative and quantitative value. Let us push our own work in the direction of creating these niches.

William P. Fisher, Jr.


If and when to assess? Fisher WP Jr. … Rasch Measurement Transactions, 2002, 16:1 p.864-5.


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.



The URL of this page is www.rasch.org/rmt/rmt161i.htm
