The New Rules of Measurement

What Every Psychologist and Educator Should Know

is the striking title of a recent book edited by Susan E. Embretson and Scott L. Hershberger (Mahwah, NJ: Lawrence Erlbaum, 1999). There are 11 informative chapters packed with real-life Rasch-related applications. Solid theory is presented, graphically and through practical implications, rarely as bald algebra.

But how I wish my copy had a global replace feature! In almost every instance where the letters IRT appear, one must replace them with Rasch. For instance, "IRT item parameters are not biased by the population ability distribution" (p. 2). As has been demonstrated repeatedly (e.g., RMT 6:2, 217), this is a characteristic of only the Rasch model and not at all a general characteristic of IRT models.
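A small numerical sketch makes the distinction concrete. It is only an illustration under stated assumptions, not the demonstration cited above: the two-parameter logistic (2PL) model stands in for a non-Rasch IRT model, and the item difficulties and discriminations are hypothetical. For a pair of items, it computes the log odds of succeeding on the first item only versus the second item only, at several ability levels.

```python
import math

def p_correct(theta, b, a=1.0):
    """Probability of success under a logistic model; a=1 gives the Rasch model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pairwise_log_odds(theta, item1, item2):
    """Log odds of succeeding on item1 only versus item2 only, at ability theta.
    Each item is a hypothetical (difficulty, discrimination) pair."""
    p1 = p_correct(theta, *item1)
    p2 = p_correct(theta, *item2)
    return math.log((p1 * (1 - p2)) / ((1 - p1) * p2))

abilities = [-2.0, 0.0, 2.0]

# Rasch: both discriminations are 1, difficulties are -0.5 and 1.0 logits.
rasch = [pairwise_log_odds(t, (-0.5, 1.0), (1.0, 1.0)) for t in abilities]

# 2PL: same difficulties, but unequal (hypothetical) discriminations.
two_pl = [pairwise_log_odds(t, (-0.5, 0.6), (1.0, 1.8)) for t in abilities]

for t, r, g in zip(abilities, rasch, two_pl):
    print(f"ability={t:+.1f}  Rasch log-odds={r:.3f}  2PL log-odds={g:.3f}")
```

Under the Rasch model the comparison equals the difference between the two item difficulties at every ability level, so the ability distribution of the sample cannot disturb it. With unequal discriminations the same comparison drifts as ability changes.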

So what are Susan Embretson's New Rules?

Rule 1. The Standard Error of Measurement

Old Rule 1. The standard error of measurement applies to all scores in a particular population.

New Rule 1. The standard error of persons differs between persons with different response patterns, but generalizes across [similar] populations.

Of course, theorists in the classical tradition know that different raw scores have different standard errors. Nevertheless, "if the score distribution approaches normality, and if obtained scores do not extend over the entire possible range, the standard error of measurement is probably uniform at all score levels" (Guilford, 1965, p. 445). Indeed, a plot on p. 50 (reprinted below) of New Rules confirms that S.E.s can be reasonably uniform across most of the range of raw scores. Also, since the easiest way to compute raw score standard errors is from reliability coefficients, most classical analysts never go beyond computing one global standard error estimate.

So what are the real implications of Rule 1? As New Rules points out, standard errors of measures increase to infinity as scores become extreme. Standard errors of raw scores decrease to zero, misleading the analyst into believing that zero and perfect scores imply exact knowledge of the location of examinees on the latent variable. Further, examinee measures (as opposed to raw scores) are each identified with their own standard error, irrespective of who, if anyone, takes the same test. Decisions can be made on an individual rather than group basis.
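The opposite behavior of the two standard errors can be sketched numerically. The following is a minimal illustration, assuming a dichotomous Rasch model and a hypothetical 20-item test; it takes the raw-score standard error to be the model standard deviation of the score at a given ability, and the measure standard error to be the reciprocal of the square root of the test information. The function names and item difficulties are invented for the example.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_errors(theta, difficulties):
    """Return (expected raw score, raw-score SE, measure SE in logits)."""
    ps = [rasch_p(theta, b) for b in difficulties]
    info = sum(p * (1 - p) for p in ps)  # raw-score variance = test information
    return sum(ps), math.sqrt(info), 1.0 / math.sqrt(info)

# Hypothetical 20-item test, difficulties evenly spaced from -2 to +2 logits.
difficulties = [-2 + 4 * i / 19 for i in range(20)]

abilities = (0, 2, 4, 6)
rows = [standard_errors(theta, difficulties) for theta in abilities]

for theta, (score, se_raw, se_measure) in zip(abilities, rows):
    print(f"ability={theta}: expected score={score:5.2f}  "
          f"SE(raw score)={se_raw:.2f}  SE(measure)={se_measure:.2f}")
```

As the expected score approaches the maximum of 20, the raw-score standard error shrinks toward zero while the standard error of the measure grows without limit.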

Rule 2. Test Length and Reliability

Old Rule 2. Longer tests are more reliable than shorter tests.

New Rule 2. Shorter tests can be more reliable than longer tests.

No, as New Rules clarifies, the Spearman-Brown prophecy formula is not revoked. Provided everything stays the same, a longer test of the same sort of items is more reliable than a shorter test. But a longer test is not necessarily more reliable than a different, shorter test. Of course, classicists know this: "Internal-consistency reliability is the greatest when ... the variance of items is greatest. This is when the proportion passing an item is .50" (Guilford, 1965, p. 464). But classicists couldn't do much with this knowledge, because everyone had to take the same test, and test content was fixed. Now there are item banks and computer-adaptive testing. For instance, a 20-item on-target test can measure more reliably than a 30-item test on which an examinee achieves 80% success, and that can be more reliable than a 50-item test with 90% success.

Rule 3. Interchangeable Test Forms

Old Rule 3. Comparing test scores across multiple forms depends on test parallelism or test equating.

New Rule 3. Comparing test scores across multiple forms is optimal when test difficulty levels vary between persons.

What? Is test equating abolished? No - the emphasis has shifted. The goal is no longer to match the new test to the old test; it is to match the new test to the new person. Item banks are the key. (How did a reference to Wright & Bell, 1984, escape the editors of New Rules?) With pre-calibrated items, parallel forms and equi-percentile equating are obsolete.

Rule 4. Unbiased Assessment of Item Properties

Old Rule 4. Unbiased assessment of item properties depends on representative samples from the target population.

New Rule 4. Unbiased estimates of item properties may be obtained from unrepresentative samples.

What does bias mean? It means incorrect decisions due to poor test-to-sample targeting. What does representative mean? It means the sample ability distribution matches that of the population. Classical item selection criteria, such as p-value for item difficulty and discrimination index for item quality, are optimal for items targeted on the sample. If the distribution of the pilot sample does not match the distribution of the test population, replacing "bad" items could make the test worse, not better! But even under the best of circumstances, classical analysis is biased against those items which best measure the high and low performers.

Now items are assessed on their own merits. Each item is chosen for the role it plays in constructing measures for those examinees on whom it is targeted, without giving misleading information about others who might happen to encounter it. Each item is designed to be as similar to the other items as possible, in the sense of measuring the same construct and eliciting the same type of behavior from respondents. Each item is also designed to be as different from the other items as possible, in the sense of obtaining its own share of brand-new information about the performance level of respondents.
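The contrast drawn under Rule 4 can be illustrated with a short sketch. It assumes a dichotomous Rasch model; the item difficulty, the two samples, and the function names are all hypothetical. It computes the classical p-value of one hard item in a pilot sample and in a more able sample, then the information that item gives a high performer compared with an item targeted on the pilot sample.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_value(sample, b):
    """Classical item p-value: expected proportion of the sample succeeding."""
    return sum(rasch_p(theta, b) for theta in sample) / len(sample)

def item_information(theta, b):
    """Information one item contributes to the measure of a person at theta."""
    p = rasch_p(theta, b)
    return p * (1 - p)

hard_item = 2.0                              # hypothetical difficulty, in logits
pilot_sample = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical pilot abilities
able_sample = [1.0, 1.5, 2.0, 2.5, 3.0]      # hypothetical able examinees

pilot_p = p_value(pilot_sample, hard_item)
able_p = p_value(able_sample, hard_item)

# Information for a high performer (ability 2.0) from the hard item
# and from an item centered on the pilot sample (difficulty 0.0).
info_on_target = item_information(2.0, hard_item)
info_off_target = item_information(2.0, 0.0)

print(f"p-value in pilot sample: {pilot_p:.2f}")
print(f"p-value in able sample:  {able_p:.2f}")
print(f"information for a high performer: hard item {info_on_target:.3f}, "
      f"pilot-targeted item {info_off_target:.3f}")
```

The item's p-value moves with the sample although its difficulty never changes, and the item that looks "too hard" in the pilot data is the one that tells the most about the high performer.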

These four rules are those identified by Susan Embretson (pp. 11-14). But New Rules reaches much farther. For instance, a new rule is that raw scores have substantive implications (pp. 247-8). Another new rule is that the hierarchy of item difficulty reflects a meaningful, valid construct (pp. 248-9). An additional new rule is that examinee response patterns have diagnostic meaning (pp. 250-252). And still more rules emerge in chapter after chapter.

John Michael Linacre

Guilford JP. 1965. Fundamental Statistics in Psychology and Education. New York: McGraw-Hill.

Wright BD, Bell SR. 1984. Item banks: what, why, how? Journal of Educational Measurement, 21:4, 331-345.

The New Rules for Measurement Embretson S.E. commented by Linacre, J.M. … Rasch Measurement Transactions, 1999, 13:2 p. 692
