Georg Rasch and Measurement

Informal Remarks by Ben Wright at the Inaugural Meeting of the AERA Rasch Measurement SIG, New Orleans - April 8, 1988

"I'm overjoyed to see so many of you here today. [There were 95 persons in the room.]

There is a fundamental importance to the topic, measurement. In my experience this importance corresponds to the statements of the philosophers of science who say, "There can be no science and no scientific research without measurement." Given our topic, I wonder what the other 6980 participants in this conference do for a living.

I'm going to talk about Georg Rasch and measurement. I have [here] an article that Georg sent me in 1978. On the front he says, "To my first student outside of Denmark." The odd thing is that in 1960 I was not only his first student outside of Denmark, but his only real student. In the years that followed, only one other scholar understood what Georg was talking about and did anything about it. That was Gerhard Fischer who met Rasch about 1967. Rasch had a few students in Denmark. But when you look at their subsequent work, it seems that either they didn't understand what he was talking about or it didn't interest them. At any rate, their work has not advanced Georg's ideas in the way he hoped it would.

My own career led to an identity confusion. As a young physicist in the 1940s, I did a lot of measuring. That's all most physicists do in their whole careers - measure. Physicists think up variables and then they devise ways of measuring those variables. In physics, you keep collecting data until you get the data you want. You don't fit your theory or your ideas to the data that happens to be convenient. You may take a thousand observations and only keep the last four, because you see that the other 996 are no good. You have demanding expectations about what you're doing. The aim is to find data to support your theory, not to find a theory that might fit your data.

I found physics kind of boring and opted for a livelier life. I was 22 years old. It was a brilliant spring morning. The birds were chirping. The girls and boys were flirting and I was copying giant quantum mechanics equations from Willi Zachariasen's blackboard. So I put down my pencil, left the class and left physics. I worked as a laboratory physicist for a few more years, because I needed the money and had the skills, but I went in search of life.

I looked into the English department and the history department. I finally got caught up in something called Human Development and learned how to do thematic apperception and sentence completion tests. Then I wrote a short book on how to analyze the Draw-a-Person Test. But for all the fun I was having, this wasn't science and that bothered me. So I kept trying to make it science. In analyzing the Thematic Apperception Test I counted words, but that wasn't any good. If you've worked with that test, you know that making quantitative variables out of it is a mystery. I guess it must be possible, but I couldn't do it. Efforts to quantify the Rorschach are just as fraught with difficulties. The only person who has made any progress is Gerhard Fischer and his Rorschach research has not reached the mainstream. You can see my predicament. I had an ethical problem about what I was doing. It was fun but it was more like gossip or reading tea leaves than it was like science. So I got into factor analysis.

I first encountered factor analysis in 1948 when I met Louis Thurstone at a Sigma Xi meeting. I did some analyses by hand, which was tedious. But I got fast with adding machines and cranking calculators. Then I did some work with IBM cards and tabulators and sorters. That was a stevedore's job. You push a cart back and forth in a big room from one card counting machine to another and it takes all night. The machines make a terrible clatter, so your hearing goes. But it was a lot faster than hand calculation and made a lot fewer mistakes, hardly any at all in fact.

UNIVAC I with Alex Orden and Ben Wright (right)

In 1959 we got a Univac I computer and I got hold of a program that did principal components analysis. Now I had a tremendous advantage. The elder experts in factor analysis had been forced to rely on hand methods. Few had done more than 20 or 30 analyses in their entire careers. I worked one day a week as a consultant for a market research firm. I started there by doing Thematic Apperception Tests for $15.00 each. At 45 minutes a test, this was tremendous pay in 1952. One day they asked me to do a factor analysis for one of their clients. So I did.

Soon I was doing 10 to 20 factor analyses a week. Later we got an IBM 7090 computer and I wrote a super factor and regression analysis program in FORTRAN, still on cardboard, however. After a couple of years I had done hundreds of analyses. Not only had I done them, but, even more important, I had reported results to demanding clients, not six years later in a journal, but two weeks later so they would pay the bill. In this environment it wasn't enough to say, "This is modern." or "This is wonderful." or "We are not quite sure what these results mean, but we used a great technique." They wanted to know, "What is this worth to us?", "Are you sure these results are solid?" and, most demanding of all, "If we analyze more data, will we get the same result again?" Some clients did the same study every couple of weeks for a period of years. I ran one study a hundred times with the same instruments, and we factored it every time.

The question was, "Was the factor structure stable?" Well, if you've worked with factor analysis, you know the results aren't stable. When you explore rotations and communalities, you get all kinds of different results from the same data, depending on your choice of methods. Even when you stick with principal components, which a mathematician would tell you to do, the results are sample dependent. When a client asks why this week's results don't look the same as last week's, you say, "That's how it is with statistics." But by and by that client goes to some other consultant because he wants something steady, not something that varies from week to week.

All this put me in considerable distress and I didn't know what to do. I had a neighbor and good friend, Jimmy Savage, who was a great statistician. We used to discuss all kinds of things, and I tried to convince him that factor analysis was a good idea. He selected David Wallace and Raj Bahadur, two young statisticians, to investigate the problem. (There were a lot of raving factor analysts on the University of Chicago campus in those days.) A series of discussions and debates on factor analysis followed. The great factor analysis war was fought. And the social scientists lost. These two statisticians proved that factor analysis was not a stable enough technique to build results that would hold up.

But there I was making a living doing factor analysis! I had the best computer programs around, at least in the Chicago area, and I could do the analysis faster and cheaper than anyone else. But I felt like a con man one jump ahead of the Sheriff. I smiled and talked like I'm talking to you now, but I grieved in private because I felt like a crook. I didn't see any alternative. This was the way I made my living. This was what clients wanted to buy.

Then Jimmy said he had met a funny Dane named Rasch who had an idea that he claimed would revolutionize educational measurement. Jimmy asked if I would have some students for this Dane if he invited him to come to Chicago for three months and give some lectures. I said, "At the University of Chicago you can't tell students to do anything but I'll invite my classes." So Jimmy invited Georg Rasch and he came [in the spring of 1960].

Georg gave his first lecture to 30 or 40 sociologists, statisticians, psychologists, faculty and students. He gave two lectures a week, [soon there were only two students, John Ginther and me,] and after about two or three weeks, I was the only one left in his class. The social scientists found it too complicated, the psychometricians found it too simple, and the statisticians decided it wasn't their job. They said, "We assume you have these measures before we get involved. Don't tell us your numbers aren't real numbers because we don't know what to do about that. Tell us that you think they're numbers and we'll calculate the mean, standard deviation and do an analysis of variance and everything will be fine."

But, of course, if the numbers aren't really numbers, then everything isn't fine. Even though you follow the right statistical procedures, someone else can change a little irrelevant thing and get different results from the same situation. Then no one knows what the real findings are. And of course, most results do not replicate unless things are so obvious that you don't need to replicate them.

If you used factor analysis to distinguish between arithmetic items and spelling items, and you claimed that you found out which were which by factor analysis, you would look foolish. But that's the level of the results that could prove that factor analysis was stable in terms that a physicist would accept. When we got into Osgood's semantic differential analyses, which we did by the ton, we couldn't keep the evaluation, potency and activity dimensions separate. Some of the statements that were supposed to be on one factor would jump to another factor every now and then. The statements jumped around. We ended up saying, "We don't really know how to define this, but we can kind of hint at what it is." And we would make up some new words to christen the factor. But they could never capture quite the right concept, so it was very embarrassing.

When Georg talked, it was marvelous. He was talking about a one-dimensional factor analysis: no rotation, no secondary factors which are based on a sample-dependent error distribution that just happens to be in the data, and with an explicit stochastic basis. (That was another problem with factor analysis: it had no stochastic frame of reference, so there was no standard error for the factor loadings.) Georg's model produced a number which was a measure that you could do arithmetic with. And it did it for both persons and items simultaneously and put them on the same scale. Now you see why I was overjoyed. It was one simple little recipe that solved all my problems. I could stop going to the psychoanalyst to have my schizophrenia mended week by week.

Of course there was no money in this new kind of analysis. Nobody wanted anyone to do these kinds of things. In fact, I was the only one left in the room, so I would have to hire myself. But nevertheless, I felt very lucky.

Georg went back to Denmark, time passed, and Bruce Choppin showed up; I don't know why. It was 1963. There was Bruce with his M.A. from Cambridge in Mathematical Physics. I suppose he wanted to be an educator because he had been a high school math teacher. Somehow I got him interested, or he got me reinterested. Maybe we liked to write computer programs. Pretty soon we were down in the basement of the computation center with our cardboard IBM cards with holes in them, writing these programs. Bruce was good at math and quick, so by 1964 we had written Georg's loge program, which he used in his 1960 book. We also wrote his pair-wise program which he hints at in the back of his book, and his conditional program by calculating his symmetric functions recursively.
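The conditional method works because, under the Rasch model, the probability of a response pattern given its raw score depends only on the items, through the elementary symmetric functions of the item "easiness" values exp(-Di). The Python below is a minimal sketch of that standard formulation, not a reconstruction of the 1964 program; the function names are hypothetical. It builds the symmetric functions recursively, one item at a time, and uses them to compute the conditional probability of a pattern.

```python
import math

# Illustrative sketch; function names and values are hypothetical.


def symmetric_functions(easiness):
    """Elementary symmetric functions gamma_0 .. gamma_L of the item easiness
    values eps_i = exp(-D_i), built recursively by adding one item at a time."""
    gamma = [1.0]  # with no items, only a score of 0 is possible
    for eps in easiness:
        updated = [0.0] * (len(gamma) + 1)
        for r, g in enumerate(gamma):
            updated[r] += g            # new item answered wrong: score stays r
            updated[r + 1] += g * eps  # new item answered right: score becomes r + 1
        gamma = updated
    return gamma


def conditional_pattern_prob(pattern, difficulties):
    """P(pattern | raw score) under the dichotomous Rasch model.
    No person parameter appears anywhere in this calculation."""
    easiness = [math.exp(-d) for d in difficulties]
    gamma = symmetric_functions(easiness)
    score = sum(pattern)
    numerator = math.prod(eps for eps, x in zip(easiness, pattern) if x == 1)
    return numerator / gamma[score]


# gamma for easiness 1, 2, 3 are the coefficients of (1 + t)(1 + 2t)(1 + 3t).
print(symmetric_functions([1.0, 2.0, 3.0]))  # [1.0, 6.0, 11.0, 6.0]
```

Summing `conditional_pattern_prob` over all patterns with the same score gives 1, whatever the ability of the person who produced them.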

We analyzed all the data we could get hold of and things worked beautifully. Not only that, but the three methods gave the same results! So we could show that it didn't matter which method we used unless the data was really dreadful, in which case none of the methods would really handle it. Since I was still in factor analysis a bit and since the fit lines varied in slope, I said "Let's estimate the slopes too."

Georg was very much against this bright idea. Nevertheless, I wrote a two-parameter program in 1964, but couldn't get it to converge. Bruce and I spent night after night with the damn thing trying to figure out what was wrong. Finally, we checked every iteration of every single parameter estimate. Then it was obvious what was wrong. Either a person ability or an item discrimination always went off to infinity. So we thought we would just take out that person or item and continue. But then another person or item would go off to infinity. There was no way to get sensible results. Finally I consulted my mathematical friends, including Adrian Albert who had written an important paper in the '40s showing that factor analysis couldn't be a satisfactory method for constructing knowledge. He helped me see there was no way that this two-parameter procedure would converge unless I introduced some inevitably arbitrary constraint. The choice of the constraint would always alter the results. Perhaps this alteration would not be enough to notice in some cases, but there would always be a dependence on the arbitrary choice of constraint.

You might not care about the effect of reasonable constraints on the results. For example, you might choose some ceiling for the estimates, or force a normal distribution on the original scores. Whichever you choose, a subset of the data might produce stable results, providing the data was not "stressed" too much, but it was all just fooling around.
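One mechanism behind the divergence described above can be seen in miniature. In this Python sketch (illustrative numbers, not Wright's data), the abilities are treated as known and one item's responses happen to be perfectly ordered by ability. The two-parameter log-likelihood for that item then rises every time the discrimination a is doubled, so there is no finite maximum to converge to unless a constraint, such as a ceiling on the estimates, is imposed. The helper names are hypothetical.

```python
import math

# Illustrative sketch; function names and data are hypothetical.


def softplus(t):
    """log(1 + e^t), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def item_loglik_2pl(a, b, abilities, responses):
    """Log-likelihood of one item's responses under a two-parameter logistic
    model with discrimination a and difficulty b, abilities held fixed."""
    total = 0.0
    for theta, x in zip(abilities, responses):
        z = a * (theta - b)
        # log P(right) = -log(1 + e^-z); log P(wrong) = -log(1 + e^z)
        total -= softplus(-z) if x == 1 else softplus(z)
    return total


# Everyone above the item's difficulty got it right, everyone below got it wrong.
abilities = [-1.5, -0.5, 0.5, 1.5]
responses = [0, 0, 1, 1]
for a in [1, 2, 4, 8, 16]:
    print(a, round(item_loglik_2pl(a, 0.0, abilities, responses), 6))
```

The printed log-likelihood keeps climbing toward 0 without reaching it, which is what "going off to infinity" looks like from inside an iterative estimator.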

We used simulated data for these analyses. You can't test a method with real data, because real data is full of junk, and that junk is what the analysis is supposed to identify and see through. To test a new method, you must construct data that you know are right. That way you know the stochastic process and its magnitude and you know the parameters. Then you check to see if your program will recover those things you know. If it doesn't, don't bother analyzing any real data. It's not time yet, because your program isn't working. There is no sense in applying it to real data until you've seen it work on simulated data. Since I couldn't make the two-parameter program work, I discarded it. (I may have an old copy in my closet.)
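The advice to recover known parameters from simulated data first is easy to follow today. The Python sketch below simulates Rasch data and then checks that a simple pairwise estimator recovers the item difficulties. The seed, sample size and difficulties are arbitrary assumptions, and the estimator is a textbook pairwise approach in the spirit of Rasch's pair-wise method, not a reconstruction of the original program; all names are hypothetical.

```python
import math
import random

# Illustrative sketch; function names, seed and parameters are hypothetical.


def simulate_rasch(abilities, difficulties, rng):
    """Dichotomous responses generated from the Rasch model,
    P(right) = 1 / (1 + exp(D - B)), with known parameters."""
    return [
        [1 if rng.random() < 1.0 / (1.0 + math.exp(d - b)) else 0 for d in difficulties]
        for b in abilities
    ]


def pairwise_difficulties(data):
    """Pairwise item difficulty estimates, centered on zero.

    n[i][j] counts persons with item i right and item j wrong. Under the Rasch
    model log(n[j][i] / n[i][j]) estimates D_i - D_j whatever the abilities are.
    Assumes every off-diagonal count is positive.
    """
    k = len(data[0])
    n = [[0] * k for _ in range(k)]
    for row in data:
        for i in range(k):
            for j in range(k):
                if row[i] == 1 and row[j] == 0:
                    n[i][j] += 1
    # Averaging D_i - D_j over all j (with D_i - D_i = 0) gives D_i minus the mean difficulty.
    return [
        sum(0.0 if i == j else math.log(n[j][i] / n[i][j]) for j in range(k)) / k
        for i in range(k)
    ]


rng = random.Random(1988)
true_difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
abilities = [rng.gauss(0.0, 1.0) for _ in range(5000)]
data = simulate_rasch(abilities, true_difficulties, rng)
estimates = pairwise_difficulties(data)
print([round(e, 2) for e in estimates])
```

Only when the estimates land close to the known difficulties is it worth pointing the program at real data.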

I went to Denmark in 1964. I spent a couple of months with Georg there. He was a very enthusiastic man. He loved to joke and carouse. He was gracious, gregarious, and witty, a marvelous man! But he went his own way, not connected to any particular psychometric or statistical tradition. Anyway, I showed him the different programs we had written.

In April 1965 Bruce and I gave a nice seminar in Chicago for the Midwest Psychological Association. Paul Blommers from Iowa and two of his students, Gary Ramseyer and Richard Brooks, gave papers. Dave Wallace and Jane Loevinger were the discussants.

Bruce and I gave papers showing the three different estimation algorithms, how they worked with simulated data, and how they compared with one another. There were about as many people there as here today, but we never heard anything about it from them that I can recall. Nevertheless, we had a good time presenting the work and we felt good about it.

Bruce and I went to Denmark in 1965. And we had more fun with Georg. He loved to teach and lecture to a small group. He had a blackboard in his bedroom. We would go to his house in Holte, a suburb of Copenhagen. We'd go into his bedroom and he would lecture. After a couple of hours we'd hear the bottles clinking, and that was Nille, his wife, with the cocktails; she loved cocktails. About 12 o'clock she would come in with all these bottles and canapes and then we would start lunch, which would last two or three hours. We would consume several thousand calories and Aquavit and beer and sleep the rest of the day because we were dying from all the food and drink. But it was a lot of fun.

Then, in 1965, Nargis Panchapakesan came along. She and her husband were nuclear physicists but she couldn't get a job in Chicago right then because we were filled up with physicists. Someone asked me if I couldn't find something for this nice woman to do. So I hired her, and she wrote some computer programs. Then she wrote the JMLE (UCON) program, the unconditional analysis program, which Bruce and I hadn't gotten into yet. We got a nice grant from the National Science Foundation to pay her salary. She ended up getting a Ph.D. in Education to put with her Ph.D. in Physics.

In those years we gave a few more reports to various societies. At the 1967 Psychometric Society's spring meeting in Madison, I gave a paper showing the application of the conditional method to some law school achievement test data from Educational Testing Service. I thought the presentation went well but the audience wasn't very interested. It wasn't complicated enough for Fred Lord. The only person who was interested, besides Lou Bashaw, was Ledyard Tucker.

Tucker said, "You're going to be disappointed Ben, because you're going to try to sell these things and nobody is going to pay any attention to you. Let me tell you, I've been there. I've been saying things like this for years and nobody pays any attention to me." But I was young, and I thought maybe Tucker didn't talk loudly enough or jump around enough. I thought if I went to enough meetings and made enough noise, it would work out all right.

Then I gave that paper to the ETS Invitational Test Conference in October 1967. The main comment about the paper came from the way I spoke about dividing the sample into two groups, the "dumb" ones and the "smart" ones. Several nice old ladies came to me afterwards and said, "We don't like to refer to students as 'dumb.' You need to label those groups differently." That was the main reaction to my talk at the time.

But this did lead Norman Uhl to dragoon me into giving one of the first AERA Presessions for advanced professional training. It was on the Rasch model. The presession was held in Los Angeles in the spring of 1969. There were five days of lectures at a place and time nowhere near the annual meeting. Georg Rasch came to town and he gave the last lecture.

Dick Woodcock was there, and during the session he developed a beautiful example, the KEYMATH scoring and reporting form, which I have used ever since to show the value of the Rasch model to teachers, children and parents. Dick brought his data along and Clarence Bradford had a Rasch program going at UCLA. Dick analyzed his data and made the KEYMATH picture while we were lecturing. (I could see him working at his desk, looking up so he didn't miss too much of the lecture.) When he was done, he showed me the results.

If you want a good Rasch measurement teaching device, write to American Guidance Service and ask them to send you a bunch of the original KEYMATH forms. John Yackel has been very generous in sharing this useful device for helping people understand why all schools would want to use the Rasch model.

Here is something else we did way back then. It is a piece of computer output labeled March 12, 1967, for a rating scale analysis. This program, called BIGPAR, by Bruce Choppin, estimates as many dimensions for the data as there are steps in the response format. If there are five categories with four steps, you get four components. The program uses a pair-wise method that estimates a matrix of parameters for the items and then decomposes this matrix by principal components. We applied it to some semantic differential data and things like that. That was nice but we didn't pursue it further.

The reason I mention this analysis is that David Andrich's current work, which he reported here a couple of days ago, is a great improvement over our 1967 approach. I believe David's approach will become extremely valuable, important and convenient for people who have data in more than two categories.

The difference between David's approach and our principal component analysis is that David intends his components. He decides what the structure of the item response format should be, according to his theoretical requirements. Then he estimates coefficients for that structure. I think that is more in the spirit of measurement construction than letting a principal component analysis adapt to your data, however strange they may turn out to be, and then mistaking those haphazard results as having some long run significance which, by definition, they can't have because they are local descriptions of a passing situation. Even if they should seem invariant, it would be an accident which would eventually become clear to you as you continued to work with them. Each time you apply the principal component technique you must get slightly different structures. But then, which one is the right structure? You can't get away with that. You have to say ahead of time the kind of structure you want. And then try to make the data serve that structure.

Turning to our situation here today, it is likely that those of us who use Rasch measurement are in the minority as far as American psychometric theory is concerned. That is not so for European theory, however. The Europeans are very sensible and intelligent. They use only Rasch measurement. In terms of actual school practice, however, I think we are way in the majority. More tests are organized and kept track of by the Rasch method or its disguised equivalents than by any other method.

Still, at conferences like this, we feel a bit in the minority, and it's astonishing! The history of educational measurement shows that Rasch's model is a decisive culmination of years of seeking and searching. Edward Thorndike in 1905 laid down what he considered necessary for measurement, and that is exactly what the Rasch model provides. Louis Thurstone in the 1920s spelled out his measurement ideals of invariance and linearity which he approximated through his idea of people being normally distributed. But his good ideas weren't used much or much remembered even by those claiming to be Thurstone experts. You ask an expert on Thurstone what he had to say about invariance and linearity of measurement and he will reply, "Oh I just know about the factor analysis part."

Then Louis Guttman in 1945 set out his requirements for scalability. Well, the Rasch model is the stochastic representation of exactly that requirement. So here we have the best, most revered thinkers in educational measurement and their best ideas. And here we have in the Rasch model a practical realization of exactly these best ideas and yet it is treated as though it were strange, alien and ugly, both too simple and too complicated, something that couldn't possibly work. I don't know whether these critics of the Rasch model don't read, or don't think or didn't start reading until just recently, but it boggles my mind.
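Guttman's requirement can be checked directly on a response matrix: once the items are ordered from easiest to hardest, each raw score must correspond to exactly one pattern. The deterministic check below is a minimal Python sketch; the function name and example data are hypothetical. A stochastic model such as Rasch's expects some of these violations in real data rather than forbidding them.

```python
# Illustrative sketch; the function name and data are hypothetical.


def guttman_violations(data):
    """Count persons whose response pattern cannot be reconstructed from
    their raw score, as a perfect Guttman scale requires.

    Items are first ordered from easiest (most often right) to hardest,
    with ties kept in their original order; a person with score r must
    then have exactly the first r items right.
    """
    n_items = len(data[0])
    counts = [sum(row[i] for row in data) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -counts[i])
    violations = 0
    for row in data:
        ordered = [row[i] for i in order]
        score = sum(ordered)
        if ordered != [1] * score + [0] * (n_items - score):
            violations += 1
    return violations


data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],  # right on a harder item, wrong on an easier one
]
print(guttman_violations(data))  # 1
```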

Let's turn to the history of quantitative inference. In 1795, when Gauss turned his attention to observation error, he invented least squares and, in doing so, discovered the real significance of the mean.

In the 1920s Ronald Fisher saw that Gauss's minimization of discrepancies could be implemented as a maximization of the probability that the data observed came from the intended model. This was a decisive step. Gauss minimized the residuals, i.e., adapted his inference to the data. Fisher said, "No, what Gauss meant to do is to maximize the probability that these data come from the intended model." But that puts the model ahead of the data. Fisher didn't put it just that way, and I don't have any evidence from him for interpreting his derivation this way, but that's what it means to me. In doing this, Fisher invented the maximum likelihood method and discovered estimation sufficiency. This was a great discovery even though some of my statistical colleagues don't fully appreciate it as such. Georg Rasch thought so. He studied with Fisher in 1934. What interested him in Fisher's work was not analysis of variance, not even maximum likelihood, but the consequence of sufficiency.
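For the simplest case, repeated measurements of one quantity with independent normal errors, the two views coincide: the normal log-likelihood is a constant minus the sum of squared residuals divided by 2σ², so maximizing one is minimizing the other, and both land on the mean. The Python sketch below checks this on a grid. The normal error model is the assumption that makes the two agree; the data and names are hypothetical.

```python
import math

# Illustrative sketch; function names and data are hypothetical.


def sum_squared_residuals(mu, xs):
    """Gauss's criterion: the quantity least squares minimizes."""
    return sum((x - mu) ** 2 for x in xs)


def normal_loglik(mu, xs, sigma=1.0):
    """Fisher's criterion: log-likelihood of xs under independent Normal(mu, sigma)
    errors. It equals a constant minus SSR / (2 sigma^2)."""
    n = len(xs)
    const = -n * math.log(sigma * math.sqrt(2.0 * math.pi))
    return const - sum_squared_residuals(mu, xs) / (2.0 * sigma ** 2)


xs = [9.8, 10.1, 10.4, 9.9, 10.3]  # hypothetical repeated observations
grid = [9.0 + k * 0.01 for k in range(201)]
least_squares = min(grid, key=lambda mu: sum_squared_residuals(mu, xs))
max_likelihood = max(grid, key=lambda mu: normal_loglik(mu, xs))
print(least_squares, max_likelihood, sum(xs) / len(xs))
```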

In 1951 Georg was confronted with the problem of equating different reading tests taken at different times by different people. He saw right away that the only way to equate those tests was to find a way to characterize them so that it didn't matter who was taking the tests to produce the data for the characterization. He saw that he had to have a sample-free calibration of those tests, or they could not be equated. This meant he had to have a sufficient statistic for the person measure, so that he could condition out the person measure and then learn something independent about the tests which would make it possible to equate them. To do this he invented the Rasch model, and discovered practical objectivity as a result.

At first he was just trying to solve this one problem and make a living. For 30 years he worked as a mathematical consultant in Denmark, taking in all kinds of problems. So he just dug into this one when it came his way and did the best he could. Then it worked. He showed the results to Ragnar Frisch, Norwegian Nobel Laureate in Economics, one of his early teachers. Frisch was surprised at the "disappearance" of the person parameter in Rasch's equations, and since Georg admired Frisch, he started thinking about what had happened from another point of view and then the importance dawned on him. That, I think, was Georg's crowning achievement: his discovery of what he called "specific objectivity."
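Conditioning out the person measure can be checked numerically with just two items. Given that a person got exactly one of items i and j right, the Rasch probability that it was item i is 1 / (1 + exp(Di - Dj)), with no B in it. The Python sketch below computes that conditional probability from the full model at several abilities; the names and values are hypothetical.

```python
import math

# Illustrative sketch; function names and values are hypothetical.


def p_right(b, d):
    """Rasch probability that a person of ability b gets an item of difficulty d right."""
    return 1.0 / (1.0 + math.exp(d - b))


def p_i_given_one_right(b, d_i, d_j):
    """P(item i right, item j wrong | exactly one of the two right), from the full model."""
    i_only = p_right(b, d_i) * (1.0 - p_right(b, d_j))
    j_only = p_right(b, d_j) * (1.0 - p_right(b, d_i))
    return i_only / (i_only + j_only)


d_i, d_j = -0.5, 1.0
for b in [-2.0, 0.0, 3.0]:
    print(b, p_i_given_one_right(b, d_i, d_j))
print("person-free value:", 1.0 / (1.0 + math.exp(d_i - d_j)))
```

The same value appears for every ability, which is what lets the item comparison be made without knowing who took the test.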

Now as we look back, objectivity runs through almost everything and emerges as a concern of major interest not only to scientists but also to philosophers. Bill Fisher [William P. Fisher, Jr.] has just written a philosophy thesis on the significance of objectivity in the theory of knowledge. Bill provides a detailed review of objectivity from Socrates and Plato forward, showing how fundamental the implementation of this idea, which Georg stumbled on while trying to equate reading tests, is in all thinking and all logic.

As for the history of measurement theory and the foundations of measuring, in 1920 the physicist, Norman Campbell, in his writing on physics measurement, identified concatenation as the fundamental property of situations that could be used to construct measures. You have to be able to glue things together or pile them on top of each other to do it. Campbell concluded, as did Kant 50 years earlier, that measurement in social science would never be possible because there was no useful way to glue people together or pile them on top of each other in a way that would then, somehow, manifest their combined intelligence. You might get something else, if you piled people on top of each other, but you wouldn't get an addition of some mental variable created by that concatenation.

Then in the 1960s, Duncan Luce and John Tukey, and a few others, showed that if concatenation had any meaning, it was as an abstraction and not merely a physical act; an abstraction which depended on an additivity which must be conjoint. They saw that data matrices which could support measurement would require double cancellation or composite transitivity. This is a requirement which we now recognize as identical to Guttman's scalability, although I don't recall Tukey or the others mentioning this in their work.
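Double cancellation can be tested mechanically on any matrix of values. The Python sketch below uses the standard formulation from the conjoint measurement literature: for rows a, f, b and columns p, q, x, if m[a][p] ≥ m[f][q] and m[f][x] ≥ m[b][p], then m[a][x] ≥ m[b][q] must hold. A matrix of Rasch success probabilities passes, because its entries are an increasing function of B - D; the second, hand-made matrix fails. The abilities, difficulties and names are hypothetical.

```python
import math
from itertools import product

# Illustrative sketch; function names and values are hypothetical.


def satisfies_double_cancellation(m):
    """True if the matrix m (list of rows) never violates double cancellation.

    For rows a, f, b and columns p, q, x: whenever m[a][p] >= m[f][q] and
    m[f][x] >= m[b][p], it must also be true that m[a][x] >= m[b][q].
    """
    rows, cols = range(len(m)), range(len(m[0]))
    for a, f, b in product(rows, repeat=3):
        for p, q, x in product(cols, repeat=3):
            if m[a][p] >= m[f][q] and m[f][x] >= m[b][p] and m[a][x] < m[b][q]:
                return False
    return True


abilities = [-1.3, 0.2, 1.7, 2.9]
difficulties = [-0.8, 0.45, 1.1, 2.35]
rasch = [[1.0 / (1.0 + math.exp(d - b)) for d in difficulties] for b in abilities]
counterexample = [[0, 5, 0], [5, 0, 5], [9, 5, 0]]
print(satisfies_double_cancellation(rasch))           # True
print(satisfies_double_cancellation(counterexample))  # False
```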

Guttman's scalability requirement and the composite transitivity requirement are stochastically identical to Fisher's sufficiency. Is that a surprise? Think about it. Fisher's sufficiency says, this statistic contains all of the information in the data that can be used to estimate the parameter. In other words, it's sufficient with respect to the way the data bears on the conclusion. What is Guttman's scalability requirement? That from the score you can reconstruct exactly how the person answered each item. It is the same condition, isn't it, just approached from the other direction. Guttman's requirement is deterministic and Fisher's is stochastic, otherwise they are the same. I wouldn't have realized this identity without looking at Georg's work. Georg's liberation of Guttman's idea from the chains of determinism put both the Guttman scale and conjoint additivity on the stochastic basis which is the necessary and sufficient consequence of the requirement for objectivity.
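In the Rasch case, Fisher's sufficiency means that once the raw score is known, the remaining variation in the response pattern carries no information about ability. The Python sketch below checks this by brute-force enumeration for four hypothetical items: the conditional distribution of patterns with a score of 2 is the same for a low-ability and a high-ability person. It also shows the Guttman connection stochastically: the most likely pattern for that score is the Guttman pattern (the two easiest items right), but the other patterns are not impossible. Names and values are illustrative assumptions.

```python
import math
from itertools import product

# Illustrative sketch; function names and values are hypothetical.


def pattern_prob(pattern, b, difficulties):
    """Rasch probability of a whole response pattern for a person of ability b."""
    prob = 1.0
    for x, d in zip(pattern, difficulties):
        p = 1.0 / (1.0 + math.exp(d - b))
        prob *= p if x == 1 else 1.0 - p
    return prob


def patterns_given_score(score, b, difficulties):
    """Conditional distribution of response patterns among those with this raw score."""
    patterns = [pat for pat in product((0, 1), repeat=len(difficulties)) if sum(pat) == score]
    probs = [pattern_prob(pat, b, difficulties) for pat in patterns]
    total = sum(probs)
    return {pat: pr / total for pat, pr in zip(patterns, probs)}


difficulties = [-1.0, 0.0, 0.5, 2.0]  # ordered easiest to hardest
low = patterns_given_score(2, -2.0, difficulties)
high = patterns_given_score(2, 3.0, difficulties)
for pat in low:
    print(pat, round(low[pat], 4), round(high[pat], 4))
print("most likely:", max(low, key=low.get))  # (1, 1, 0, 0)
```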

Now I want to draw your attention to a couple of things in the paper I've handed out. The first page (Rasch model derived from Campbell concatenation [original version]) is a meditation on concatenation from a probabilistic point of view. There are three considerations. The first consideration is, "Are two heads better than one?" Campbell makes a lot of the requirement that when you combine two amounts of something, you must always end up with more of it than you had before. So, then, what's the probability when two people try one item that one or the other gets it right? That would be an addition of their intelligences in their effort to deal with this item. The first section shows that the probability of one or the other person getting the item right is going to be more than the probability of either of them getting it right working alone. That's obvious. Now, if you plug in Rasch's model, and go down to the last line of section one, you discover what you get if you are willing to have two kinds of additivity: additivity on an interval scale and additivity on a ratio scale.
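Section one's arithmetic can be reproduced directly. Assuming the two persons attempt the item independently (an assumption of this sketch), the probability that at least one succeeds is 1 - (1 - Pni)(1 - Pmi), and under Rasch's model its odds work out to exp(Bn - D) + exp(Bm - D) + exp((Bn - D) + (Bm - D)). One reading of the "two kinds of additivity" is visible here: the first two terms add odds (a ratio scale), while the last term adds logits (an interval scale). The Python below, with hypothetical values and names, checks the identity numerically.

```python
import math

# Illustrative sketch; function names and values are hypothetical.


def p_right(b, d):
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def team_success(b_n, b_m, d):
    """Probability that at least one of two persons gets the item right,
    assuming the two attempts are independent."""
    return 1.0 - (1.0 - p_right(b_n, d)) * (1.0 - p_right(b_m, d))


def odds(p):
    return p / (1.0 - p)


b_n, b_m, d = 0.5, -0.3, 0.2  # hypothetical values
team = team_success(b_n, b_m, d)
# The team's odds are the sum of the two individual odds (ratio-scale addition)
# plus a cross term whose logit is (b_n - d) + (b_m - d) (interval-scale addition).
decomposed = math.exp(b_n - d) + math.exp(b_m - d) + math.exp((b_n - d) + (b_m - d))
print(team, odds(team), decomposed)
```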

Of course, if it works that way, it has to work the other way (see section two of page one). Are two items harder than one? The answer is, Yes. And the way they get harder is the sum of their difficulties, but you must also add the sum of the exponentials of their difficulties. Why it appears in both scales, I don't know. But I hope some of you will explain this to me. [David Andrich did, and his explanation is now known as "Pack-work" RMT 9:2 p. 432]
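Section two's claim can be checked the same way. Assuming a person's responses to the two items are independent (an assumption of this sketch), the odds of not getting both items right are exp(Di + Dj - 2B) + (exp(Di) + exp(Dj)) / exp(B): the difficulties enter once as a sum inside an exponent and once as a sum of exponentials. The Python below uses hypothetical values and names.

```python
import math

# Illustrative sketch; function names and values are hypothetical.


def p_right(b, d):
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def failure_odds_on_pair(b, d_i, d_j):
    """Odds that a person of ability b does not get both items right,
    assuming independent responses to the two items."""
    both = p_right(b, d_i) * p_right(b, d_j)
    return (1.0 - both) / both


b, d_i, d_j = 0.4, -0.6, 1.1  # hypothetical values
pair = failure_odds_on_pair(b, d_i, d_j)
# Difficulties enter twice: as a sum inside one exponent (D_i + D_j - 2B)
# and as a sum of exponentials, (e^D_i + e^D_j) / e^B.
decomposed = math.exp(d_i + d_j - 2 * b) + (math.exp(d_i) + math.exp(d_j)) / math.exp(b)
print(pair, decomposed)
```

The pair's failure odds exceed the failure odds exp(D - B) of either item alone, which is the sense in which two items are harder than one.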

Section three is easy to see. If you want to make two people appear the same, you must administer to one of them an item of difficulty just different enough from the item the other one is taking to reach a balance. That is, (Bn - Di) has to equal (Bm - Dj). Then the two persons will appear stochastically equivalent. They will have the same probability for success. That's the way we equate people, by adding to the smarter person an item enough harder than the easy item given to the dumber person to cancel how much the first person is smarter than the second. This is conjoint additivity in its simplest form.
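The balance in section three is a single subtraction: the harder item's difficulty must exceed the easier one's by exactly the ability difference, Di = Dj + (Bn - Bm). A minimal Python sketch with hypothetical values and names:

```python
import math

# Illustrative sketch; function names and values are hypothetical.


def p_right(b, d):
    """Rasch probability of success for ability b on an item of difficulty d."""
    return 1.0 / (1.0 + math.exp(d - b))


def balancing_difficulty(b_n, b_m, d_j):
    """Difficulty D_i to give person n so that (B_n - D_i) equals (B_m - D_j)."""
    return d_j + (b_n - b_m)


b_n, b_m, d_j = 1.2, -0.4, -1.0  # hypothetical abilities and difficulty
d_i = balancing_difficulty(b_n, b_m, d_j)
print(d_i, p_right(b_n, d_i), p_right(b_m, d_j))
```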

Now to come to my last point. Here is a simple algebraic derivation of the Rasch model from the requirement of objectivity (RMT 1:1, 2:1). The fact that it's a derivation means that if you understand this algebra, then you must arrive at the conclusion that the Rasch model is not only sufficient but also necessary for measurement. That conclusion contains in it the exclusion of all alternative models, all models that don't have the simple structure of the Rasch model. All models that mix parameters, exponential and non-exponential, are excluded. There are two such mixtures in the three-parameter model and one in the two-parameter model. This derivation shows that neither two- nor three-parameter models can produce objectivity. It follows that every thoughtful person must realize that those other models cannot produce measurement.

I hope you'll study this derivation because I believe in my heart that everyone here can master it and teach it to their students or even to their grandmothers. It's that simple.

The key comes towards the bottom of the left-hand column, where we compare any two persons, say Richard and me. First we must turn this into a stochastic comparison because we are going to base the comparison on some kind of evidence which will then be only a sample. The probability that Richard does right and I do wrong is shown on the top. That will be Pni for Richard and (1-Pmi) for me. The probability that I do right and Richard does wrong is on the bottom. The ratio of these two probabilities is the comparison of our two abilities. That's the comparison.

Now if we want to measure Richard and me with any generality, that comparison of our abilities must not depend on which item is used. That realization takes us to the last point.

The equation at the bottom of the left column of page two, which sets this comparison on one item equal to the same comparison on a different item, must hold. That equation is what objectivity means in this case.
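The handout's equation is not reproduced in this transcript. Assuming it is the comparison ratio for Richard (n) and the speaker (m) written for two items i and j, the condition and the algebra that follows from it can be sketched as below. The odds notation O and the reference person 0 and reference item 0 are introduced here for illustration only.

```latex
% Objectivity: the comparison of persons n and m must not depend on the item
\frac{P_{ni}(1-P_{mi})}{P_{mi}(1-P_{ni})} = \frac{P_{nj}(1-P_{mj})}{P_{mj}(1-P_{nj})}
\quad\text{for all items } i, j.

% With odds O_{ni} = P_{ni}/(1-P_{ni}), this reads
\frac{O_{ni}}{O_{mi}} = \frac{O_{nj}}{O_{mj}}
\quad\Longrightarrow\quad
O_{ni} = \frac{O_{nj}\,O_{mi}}{O_{mj}}.

% Fix a reference person m = 0 and a reference item j = 0:
O_{ni} = \underbrace{O_{n0}}_{f(n)} \cdot \underbrace{\frac{O_{0i}}{O_{00}}}_{g(i)}.

% Since f and g are positive, define B_n = \log_e f(n) and D_i = -\log_e g(i):
\log_e O_{ni} = B_n - D_i,
\qquad
P_{ni} = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}}.
```

The odds factor into one function of the person alone and one function of the item alone, and taking the natural logarithm turns that product into a difference of person and item parameters.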

If you want objectivity, you must be able to compare Richard and Ben in height without saying which ruler was used, or in weight without saying we used the green scale marked Johnson and Johnson. Those conditions would make nonsense out of the measurements of height or weight. To measure, we must construct a class of items within which it doesn't matter which items we use. We must always get the same comparison. It is a very simple, but absolutely necessary, requirement.

From that requirement, and algebraic manipulation, we arrive at the conclusion that the odds that a person gets any particular item right, that is, the probability right over the probability wrong, must be the product of two single-valued functions: one caused by the person (there may be a million parameters inside that single-valued person function) and one caused by the item. If you can make item guessing, discrimination and difficulty all come together into a single-valued function, then you can use the single value of that function in the model. Otherwise, forget it. You can decompose your single item parameter as much as you want. The Dutch and Austrians have been doing that with good and interesting results. But when the item effect gets to the model, it has to be a single-valued function, i.e., one dimension, because the probability must depend entirely on the product of a single-valued person function and a single-valued item function. Then we can take the natural logarithm (log base e) and establish an additive frame of reference with a zero point.
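A minimal sketch of the product form and the additive frame, assuming the conventional choice exp(B) for the person function and exp(-D) for the item function; the talk requires only that each be single-valued. All names and values are hypothetical.

```python
import math

def odds(b: float, d: float) -> float:
    """Rasch odds of success: a person term times an item term."""
    person_term = math.exp(b)   # single-valued function of the person
    item_term = math.exp(-d)    # single-valued function of the item
    return person_term * item_term

def log_odds(b: float, d: float) -> float:
    """Natural log of the odds, which reduces to the difference b - d."""
    return math.log(odds(b, d))

persons = {"n": 1.0, "m": -0.5}  # hypothetical abilities
items = {"i": 0.3, "j": -1.2}    # hypothetical difficulties

for p, b in persons.items():
    for it, d in items.items():
        print(p, it, round(log_odds(b, d), 6), round(b - d, 6))

# The zero point belongs to the frame of reference: shifting every person and
# item parameter by the same constant leaves every log-odds, and so every
# probability, unchanged.
shift = 5.0
original = [round(log_odds(b, d), 6) for b in persons.values() for d in items.values()]
shifted = [round(log_odds(b + shift, d + shift), 6) for b in persons.values() for d in items.values()]
print(shifted == original)  # True
```

The log turns the product of person and item functions into the sum B + (-D), and because only differences enter, where to put zero is a choice we make rather than something we find in the data.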

By the way, there are no natural zero points. Zero is an idea we use all the time, but you can't find a zero anywhere in nature. In fact, since zero is "nothing" and you can't see "nothing," the only way you know what you decide to call "nothing" is to know what it was before it wasn't. Zero is necessarily an abstraction, part of a frame of reference. You always need it, but it is not a fact of nature. Indeed nature itself is our own invention. There is hardly anything natural in this room. All of the things here, that chair you are sitting on, the clothes you are wearing, these are all invented. They weren't here before we made them up. Goodness knows what we'll invent next!

So we come to my last words. The Rasch model is not a data model at all. You may use it with data, but it's not a data model. The Rasch model is a definition of measurement, a law of measurement. Indeed it's the law of measurement. It's what we think we have when we have some numbers and use them as though they were measures. And it's the way numbers have to be in order to be analyzed statistically. The Rasch model is the condition that data must meet to qualify for our attention. It's our guide to data good enough to make measures from. And it's our criterion for whether the data with which we are working can be useful to us."

This careful transcript is the loving work of Fred and Shirley Forster. We are deeply grateful to them. This version includes a few corrections by Ben Wright.

Georg Rasch and measurement. Wright BD. … Rasch Measurement Transactions, 1988, 2:3 p.25-32

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website.