Miscellaneous Rasch, Measurement and Related References

Ackerman T.A. (1992) A Didactic Explanation of Item Bias, Item Impact, and Item Validity from a Multidimensional Perspective. Journal of Educational Measurement 29(1): 67-91.

Ackerman T.A. (1994) Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7, 255-278.

Ackerman T.A. (1996) Developments in multidimensional item response theory. Applied Psychological Measurement 20, 309-310.

Ackerman T.A. (1996) Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement 20, 311-329.

Adams R.J. & Khoo, S-T. (1996) Quest: The interactive test analysis system, Version 2. Camberwell: ACER.

Adams R.J., Wilson M.R. & Wang W.C. (1997) The Multidimensional Random Coefficients Multinomial Logit Model. Applied Psychological Measurement, 21, 1, 1-23.

Adams R.J., Wilson M.R. & Wang W.C. (1997) The random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1, 1-25.

Adams R.J., Wilson M.R. & Wu M.L. (1997) Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioural Statistics, 22(1), 46-57.

Agresti A. & Lang, Joseph B. (1993) A Proportional Odds Model With Subject-specific Effects for Repeated Ordered Categorical Responses, Biometrika, 80, 527-534.

Agresti A. (1992) Modelling Patterns of Agreement and Disagreement, Statistical Methods in Medical Research, 1, 201-218.

Agresti A. (1993) Computing Conditional Maximum Likelihood Estimates CMLE for Generalized Rasch Models Using Simple Loglinear Models With Diagonals Parameters, Scandinavian Journal of Statistics, 20, 63-71

Akkermans W. & Muraki E. (1997) Item information and discrimination functions for trinary PCM items. Psychometrika, 62, 569-578.

Albers W., Does R.J.M.M., Imbos, T.J. & Janssen, M.P.E. (1989) A Stochastic Growth Model Applied to Repeated Tests of Academic Knowledge, Psychometrika, 54, 451-466.

Alderson Charles J. (1990) Testing reading comprehension skills (part one). Reading in a Foreign Language, 6(2) 425-438.

Allalouf A., Hambleton, R.K. & Sireci, S.G. (1999) Identifying the cause of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.

Allen M. & Yen, W.M. (1979) Introduction to measurement theory. Monterey, CA: Brooks/Cole.

Allerup P., Bech, P., Loldrup, D., Alvarez, P., Banegil, T., Styles, I. & Tenenbaum, G. (1994) Psychiatric, business, and psychological applications of fundamental measurement models. International Journal of Educational Research, 21(6), 611-22.

Altman D.G., Bland J.M. (1983) Measurement in Medicine: The analysis of method comparison studies. The Statistician 32: 307-317.

Alvarez P. & Galera, C. (2001) Industrial Marketing Applications of Quantum Measurement Techniques. Industrial Marketing Management 30, 13-22.

Alvarez P., Blanco, M. (2000) Reliability of the sensory analysis of a panel of tasters. Journal of the Science of Food and Agriculture 80, 409-418.

Alvarez P., Escalona, I. & Pulgarin, A. (2000) What is wrong with Obsolescence? Journal of the American Society for Information Science 51 (9), 812-815.

Alvarez P., Jaen, J., Roman, P., Alonso, E., Salas, C., Bayo, E., Peña, M.D., Pimentel, F.L. (1996) Quality of life as a latent variable measured by Rasch model in cancer patients. Journal d'Economie Medicale. Numero hors serie-14 année.

Alvarez P., Pulgarin, A. (1996) The Rasch model; measuring information from keywords: the diabetes field. Journal of the American Society for Information Science. 47, 468-76.

Alvarez P., Pulgarin, A. (1996) The Rasch model; measuring the impact of scientific journals: analytical chemistry (using SCI journal citation reports) Journal of the American Society for Information Science. 47, 458-67.

Alvarez P., Pulgarin, A. (1997) Application of the Rasch Model to Measuring the Impact of Scientific Journals. Publishing Research Quarterly. Volume 12, Number 4, 57-64 Winter.

Alvarez P., Pulgarin, A. (1997) The Diffusion of Scientific Journals Analyzed through Citations. Journal of the American Society for Information Science. 48 (10), 953-958.

Alvarez P., Pulgarin, A. (1998) Equating Research Production in Different Fields. Information Processing and Management. Vol. 34, No 4, pp 465-470.

Alvarez P., Pulgarin, A. (1999) Measuring information through the topical subheadings of the Medline database. Journal of Information Science. 25 (5), pp. 395-402.

Ames C. (1992) Classrooms: Goals, structures and student motivation. Journal of Educational Psychology, 84, 261-271.

Andersen Erling B. (1972) The Numerical Solution of a Set of Conditional Estimation Equations, Journal of the Royal Statistical Society, Series B, Methodological, 34, 42-54.

Andersen Erling B. (1973) Conditional inference and models for measuring. Copenhagen: Mentalhygiejnisk Forskninginstitut.

Andersen Erling B. (1973) Conditional inference for multiple choice questionnaires. British Journal of Mathematical and Statistical Psychology, 26, 31-44.

Andersen Erling B. (1973) A goodness-of-fit test for the Rasch model. Psychometrika, 38, 123-140.

Andersen Erling B. (1977) Sufficient statistics and latent trait models. Psychometrika, 42, 69-81.

Andersen Erling B. (1982) Obituary: Georg Rasch, 1901-1980, Psychometrika, 47, 375-376

Andersen Erling B. (1983) A general latent structure model for contingency table data. In H. Wainer & S. Messick (Eds.), Principles of Modern Psychological Measurement (pp. 117-139) New Jersey : Lawrence Erlbaum Associates.

Andersen Erling B. (1985) Estimating latent correlations between repeated testings. Psychometrika 50, 3-16.

Andersen Erling B. (1986) Georg Rasch, Encyclopedia of Statistical Sciences (9 vols. plus Supplement), 7, Wiley (New York), 626-627

Andersen Erling B. (1997) The rating scale model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 67 - 84) New York: Springer.

Andersen Erling B. & Madsen, Mette. (1977) Estimating the Parameters of the Latent Population Distribution, Psychometrika, 42, 357-374.

Andiel C. (1995) Rasch Analysis : A description of the model and related issues. Canadian Journal of Rehabilitation. 9(1) : 17-25.

Andrich D. & Luo G. (1993) A Hyperbolic Cosine Latent Trait Model for Unfolding Dichotomous Single-stimulus Responses, Applied Psychological Measurement, 17, 253-276.

Andrich D. (1978) A rating scale formulation for ordered response categories. Psychometrika, 43, 561-573.

Andrich D. (1978) Application of a psychometric rating model to ordered categories which are scored with successive integers. Applied Psychological Measurement, 2(4) 581-594.

Andrich D. (1982) An Extension of the Rasch Model for Ratings Providing Both Location and Dispersion Parameters, Psychometrika, 47, 105-113

Andrich D. (1982) An index of person separation in latent trait theory, the traditional KR.20 index and the Guttman scale response pattern. Education Research and Perspectives, 9, 95-104.

Andrich D. (1982) Using latent trait measurement to analyse attitudinal data: a synthesis of viewpoints. In D. Spearritt (Ed.), The improvement of measurement in education and psychology, pp 89-126. Melbourne: ACER.

Andrich D. (1985) A latent trait model for items with response dependencies: Implications for test construction and analysis. In S.E. Embretson (Ed.), Test design: developments in psychology and psychometrics (245-275) Orlando: Academic Press.

Andrich D. (1985) An Elaboration of Guttman Scaling With Rasch Models for Measurement, Sociological Methodology, 33-80.

Andrich D. (1988) A General Form of Rasch's Extended Logistic Model for Partial Credit Scoring. Applied Measurement in Education,1 (4), 363-378.

Andrich D. (1988) Rasch models for measurement. Sage university paper on quantitative applications in the social sciences, series number 07/068. Newbury Park, CA: Sage

Andrich D. (1988) The application of an unfolding model of the PIRT Type to the measurement of attitude. Applied Psychological Measurement, 12, 33-51.

Andrich D. (1989) Distinctions between assumptions and requirements in measurement in the social sciences. In J.A.Keats, R.Taft, R.A.Heath & S.Lovibond (Eds.), pp 7-16, Mathematical and Theoretical Systems, Elsevier Science Publishers: Amsterdam (North-Holland)

Andrich D. (1989) Statistical reasoning in psychometric models and educational measurement. Journal of Educational Measurement, 26(1), 81-90.

Andrich D. (1995) Distinctive and incompatible properties of two common classes of IRT models for graded responses. Applied Psychological Measurement, 19, 101-119.

Andrich D. (1996) A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thurstone and Likert methodologies. British Journal of Mathematical and Statistical Psychology, 49, 347-365.

Andrich D. (Auth) & Chalmers C.P. (Rev) (1991) Review of "Rasch Models for Measurement", The Statistician, 40, 119-120

Andrich D. (Auth) & Engelhard, G. Jr (Rev) (1988) Review of "Rasch Models for Measurement", Applied Psychological Measurement, 12, 435-436

Andrich D., Sheridan, B., Lyne, A. & Luo, G. (1998) RUMM: A windows-based item analysis program employing Rasch unidimensional measurement models. Perth: Murdoch University

Angoff W.H. (1971) Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.), Educational measurement (2nd ed.) (pp. 508-600) Washington, DC: American Council on Education.

Angoff W.H. (1993) Perspectives on differential item functioning methodology. In P.W. Holland & H. Wainer (Eds.), Differential Item Functioning, (pp.3-23) Lawrence Erlbaum Associates, Hillsdale, NJ.

Arnold B.C. & Strauss, D. (1991) Pseudolikelihood Estimation: Some Examples, Sankhya, Series B, Indian Journal of Statistics, 53, 233-243.

Arnold S.F. (1985, September) Sufficiency and invariance. Statistics & Probability Letters, 3, 275-279.

Avlund K., Schultz-Larsen, K. & Kreiner, S. (1993) The measurement of Instrumental ADL: Content validity and construct validity. Aging Clinical Experimental Research. 5 :371-383.

Bachman L., Lynch B. & Mason M. (1995) Investigating variability in tasks and rater judgements in a performance test of foreign language speaking. Language Testing 12, 238-258.

Bachman L., Savignon S. (1986) The evaluation of communicative language proficiency: a critique of the ACTFL oral interview. The Modern Language Journal 70, 380-390.

Bachman L.F. (1990) Fundamental considerations in language testing. Oxford: Oxford University Press.

Baek S-G. (1997) Computerized adaptive testing using the partial credit model for attitude measurement. In M. Wilson, G. Engelhard, Jr. & K. Draney (Eds.), Objective measurement: Theory into practice, Vol. 4. Norwood, NJ: Ablex.

Baker F.B. (1992) Item response theory, parameter estimation techniques. New York, Basel, Hong Kong: Marcel Dekker, Inc.

Baker J.G. & Granger, C.V. (1997) Application of Rasch analysis in the development of the Medical Rehabilitation Follow Along Measure (MRFA) Physical medicine and rehabilitation: State of the Art Reviews. 11(2) : 305-313.

Baker J.G., Granger, C.V. & Fiedler, R.C. (1997) A brief outpatient functional assessment measure. Validity using Rasch measures. American Journal of Physical Medicine and Rehabilitation. 76 : 8-13.

Baker, F.B. & Al-Karni, A. (1991) A Comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28 (2), 147-162.

Bartolucci F., Forcina, A. (????) A likelihood ratio test for MTPR within binary variables. Annals of Statistics.

Barton, Marc A & Lord, Frederic M. (1981) An upper asymptote for the three-parameter logistic item-response model. Princeton, N.J.: Educational Testing Service.

Bayley S. (2000) Measuring customer satisfaction: Comparing traditional and latent trait approaches using the Auditor General's survey. Evaluation Journal of Australasia, 1:1, 8-17.

Beale E.M.L. & Little, R.J.A. (1975) Missing data in multivariate analysis. Journal of the Royal Statistical Society (B), 129-145.

Beaton A., Martin M., Mullis I.V.S., Gonzalez E.J., Kelly D. & Smith T. (1996) Science achievement in the middle school years: IEA's Third International Mathematics and Science Study. Chestnut Hill, MA: Boston College.

Beguin A.A., Glas, C.A.W. (1999) MCMC estimation of multidimensional IRT models. (Research Report 98-14) Department of Educational Measurement and Data Analysis, University of Twente, the Netherlands.

Bell RC, Low LH, Jackson HJ, Dudgeon PL, Copolov DL & Singh BS (1994) Latent trait modelling of symptoms of schizophrenia. Psychological Medicine, 34, 335-345.

Bentler P.M. & Bonett D.G. (1980) Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.

Bentler P.M. (1990) Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.

Berger M.F. & Veerkamp W.J.J. (1997) Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-26.

Bergner M., Bobbitt, R.A., Carter, W.B. & Gilson, B.S. (1981) The Sickness Impact Profile: Development and final revision of a health status assessment measure in clinical settings. Medical Care. 19(8) : 787-800.

Bernardin H.J. & Pence, E.C. (1980) Effects of rater training: Creating new response sets and decreasing accuracy. Journal of Applied Psychology, 65, 60-66.

Besag J. & Clifford P. (1989) Generalized Monte Carlo Significance Tests, Biometrika, 76, 633-642.

Bielinski J. & Davison M.L. (1998) Gender differences by item difficulty interactions in multiple-choice items. American Educational Research Journal, 35, 455-476.

Binkley M., Atash, M.N. & Bourque, M. (in press) Standard setting and reporting. In T. Husen and N. Postlethwaite (Eds.), The International Encyclopedia of Education, 2nd ed.

Birnbaum A. (1968) Some latent trait models and their uses in inferring an examinee's ability. In F.M. Lord & M.R. Novick, Statistical theories of mental test scores (pp. 395-479) Reading, MA: Addison-Wesley.

Blackwood Larry G. & Bradley, Edwin L. (1989) The Equivalence of Two Methods of Parameter Estimation for the Rasch Model, Psychometrika, 54, 751-754.

Blais J.-G., Laurier, M.D. (1995) The dimensionality of a placement test from several analytical perspectives. Language Testing, 12, 72-98.

Blais J.-G., Laurier, M.D. (1997) La determination de l'unidimensionalite de l'ensemble des scores a un test. Mesure et evaluation en education, 20, 65-90.

Blinkhorn S. (1997) Past imperfect, future conditional: Fifty years of test theory. British Journal of Mathematical and Statistical Psychology, 50, 2, 175-186.

Bock R.D. (1997) A brief history of item response theory. Educational Measurement: Issues and Practice, 4, 21-33.

Bock R.D. & Aitkin M. (1981) Marginal maximum likelihood estimation of item parameters: An application of the EM algorithm. Psychometrika 46, 443-459.

Bock R.D. (1972) Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.

Bock R.D., Gibbons R.D. & Muraki E. (1988) Full-information factor analysis. Applied Psychological Measurement 12, 261-280.

Bock R.D., Zimowski, M.F. (1997) Multi-group IRT. In W.J. Van der Linden, R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433-448) New York: Springer Verlag.

Bode RK, Heinemann AW, Semik P. (2000) Measurement properties of the Galveston Orientation and Amnesia Test (GOAT) and improvement patterns during inpatient rehabilitation. Journal of Head Trauma Rehabilitation 13:637-633.

Bollen K.A. (1990) Overall fit in covariance structure models: Two types of sample size effects. Psychological Bulletin, 107, 256-259.

Bollinger Guenter and Hornke, Lutz F. (1978) On the Relation of Item Discrimination and Rasch Scaling (German), Archiv für Psychologie, 130, 89-96

Bradlow E.T., Weiss R.E. & Cho M. (1998) Bayesian identification of outliers in computerized adaptive tests. Journal of the American Statistical Association, 93, 910-919.

Bradlow E.T. & Thomas N. (1998) Item response theory models applied to data allowing examinee choice. Journal of Educational and Behavioral Statistics 23, 236-243.

Bradlow E.T., Wainer, H., Wang, X. (1999) A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.

Braun H.I. & Wainer, H. (1989) Making essay test scores fairer with statistics (ETS Program Statistics Research Technical Report No. 89-90) Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED395028)

Braun H.I. (1988) Understanding scoring reliability: Experiments in calibrating essay readers. Journal of Educational Statistics, 13(1), 1-18.

Bridgeman B., Morgan, R. & Wang, M. (1996) Reliability of Advanced Placement examinations (ETS Research Report RR-96-3) Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED400331)

Brogden H.E. (1977) The Rasch Model, the Law of Comparative Judgment and Additive Conjoint Measurement, Psychometrika, 42, 631-634.

Brown A. (1995) The effect of rater variables in the development of an occupation-specific language performance test. Language Testing 12: 1-15.

Bryk A., Raudenbush, S. & Congdon, R. (1996) Hierarchical Linear and Nonlinear Modeling with the HLM/2L and HLM/3L Programs. Scientific Software International, Inc. Chicago, IL.

Buja A. & Eyuboglu, N. (1992) Remarks on parallel analysis. Multivariate Behavioral Research, 27, 509-540.

Burstein J. & Boodoo, G.M. (in preparation) Automated essay scoring for Advanced Placement. Princeton, NJ: Educational Testing Service.

Burstein J.C. & Chodorow, M. (1999) Automated essay scoring for nonnative English speakers. Proceedings of the Workshop on Computer-Mediated Language Assessment and Evaluation of Natural Language Processing, Joint Symposium of the Association of Computational Linguistics and the International Association of Language Learning Technologies, College Park, Maryland.

Burstein J.C., Kukich, K., Wolff, W., Lu, C. & Chodorow, M. (1998) Enriching automated scoring using discourse marking. Proceedings of the Workshop on Discourse Relations and Discourse Marking, Annual Meeting of the Association of Computational Linguistics, Montreal, Canada, 206-210.

Camilli G. (1992) A conceptual analysis of differential item functioning in terms of a multidimensional item response model. Applied Psychological Measurement, 16, 129-147.

Candell G.L. & Drasgow, F. (1988) An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.

Carstensen C.H. & Rost J. (1999) MULTIRA - Computerprogram. Kiel: IPN - Institute for Science Education.

Cason G.J. & Cason, C.L. (1984) A deterministic theory of clinical performance rating. Evaluation and the Health Professions, 7, 221-247.

Caulkins J.P., Larkey P.D. & Wei J. (1996) Adjusting GPA to Reflect Course Difficulty. The Heinz School of Public Policy and Management, Carnegie Mellon University.

Chang C.H. & Cella, D. (1997) Equating health-related quality of life instruments in applied oncology setting. Physical medicine and rehabilitation : State of the Art Reviews. 11(2) : 397-406.

Chang H.H. & Mazzeo J. (1994) The unique correspondence of the item category response functions in polytomously scored item response models. Psychometrika, 59, 391-404.

Chang H.H. & Ying Z. (1996) A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.

Chang H.H. & Ying Z. (1999) A-stratified Multistage Computerized Adaptive Testing. Applied Psychological Measurement, 23, 211-222.

Chang H.H., Mazzeo J. & Roussos L. (1996) Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.

Chang W. & Chan, C. (1995) Rasch analysis for outcomes measures : Some methodological considerations. Archives of Physical Medicine and Rehabilitation. 76 : 934-939.

Chase C.I. (1986) Essay test scoring: Interaction of relevant variables. Journal of Educational Measurement, 23(1), 33-41.

Chira-Adisai W, Yan K, Shahani BT. (2001) Changes in serial correlation coefficients and fractional parameters during functional recovery in stroke patients. Electromyography & Clinical Neurophysiology 41:79-86.

Choppin B. (1968) Item banking using sample-free calibration. Nature, 219(5156), 870-872.

Choppin B. (1985) A fully conditional estimation procedure for Rasch Model parameters. Evaluation in Education (now International Journal of Educational Research), 9, 29-42.

Christiansen H. Dalgas. (1974) Distribution-free Reliability Analysis of Two-component Systems By Means of Rasch's Item Analysis Model, Scandinavian Journal of Statistics, 1, 33-35.

Cizek G.J. (1996) Setting passing scores. Educational Measurement: Issues and Practice, 15(2), 20-31.

Cizek G.J. (1996) Standard setting guidelines. Educational Measurement: Issues and Practice, 15(1), 12-21.

Clark A., Oswald, A. & Warr, P. (1996) Is job satisfaction U-shaped in age? Journal of Occupational & Organizational Psychology, 69(1), 57-81.

Clauser B., Clyman S., Swanson D. (1997) Components of rater error in a complex performance assessment. National Board of Medical Examiners, 29-45.

Cliff N. (1992) Abstract Measurement Theory and the revolution that never happened. Psychological Science, 3, 3, 186-190.

Cliff N., Donoghue, J.R. (1992) Ordinal test fidelity estimated by an item sampling model. Psychometrika, 57, 217-236.

Clogg Clifford C. (1988) Latent Class Models for Measuring. In Latent Trait and Latent Class Models, Plenum (New York; London), 173-205.

Coffman W.E. & Kurfman, D. (1968) A comparison of two methods of reading essay examinations. American Educational Research Journal, 5(1), 99-107.

Cohen A.S., Kim S.-H. & Baker F.B. (1993) Detection of differential item functioning in the graded response model. Applied Psychological Measurement, 17, 335-350.

Cohen Leslie. (1979) Approximate Expressions for Parameter Estimates in the Rasch Model, The British Journal of Mathematical and Statistical Psychology, 32, 113-120

Cole B., Finch, E., Gowland, C. & Mayo, N. (1994) Physical Rehabilitation Outcome Measures, Editor John Basmajian, Toronto, 222 p.

Colonius Hans. (1977) On Keats' Generalization of the Rasch Model, Psychometrika, 42, 443-446.

Colvez A. & Gardent, H. (1990) Les indicateurs d'incapacite fonctionnelle en gerontologie- information, validation, utilisation. Paris : CTNERHI-INSERM.

Conaway Mark R. (1989) Analysis of Repeated Categorical Measurements With Conditional Likelihood Methods, Journal of the American Statistical Association, 84, 53-62.

Conaway Mark R. (1990) A Random Effects Model for Binary Data, Biometrics, 46, 317-328.

Cressie Noel and Holland, Paul W. (1983) Characterizing the Manifest Probabilities of Latent Trait Models, Psychometrika, 48, 129-141.

Cristante F. (1991) La misurazione della dimensionalità degli atteggiamenti. In Trentin R. (Ed.) Gli atteggiamenti sociali, Bologna, Il Mulino.

Crocker L. & Algina, J. (1986) Introduction to classical & modern test theory. Fort Worth, TX: Harcourt Brace Jovanovich College Publishers.

Daigon A. (1966) Computer grading of English composition. English Journal, 55, 46-52.

Darragh AR. Sample PL. Fisher AG. (1998) Environment effect of functional task performance in adults with acquired brain injuries: use of the assessment of motor and process skills. Archives of Physical Medicine & Rehabilitation. 79(4):418-23, Apr.

David H.A. (1988) The method of paired comparisons. New York, NY: Oxford University Press.

Davis L.L. & Dodd, B.G. (2001) An examination of testlet scoring and item exposure constraints in the verbal reasoning section of the MCAT. MCAT Monograph Series: Association of American Medical Colleges.

Davis L.L., Pastor, D.A., Dodd, B.G., Chiang, C. & Fitzpatrick, S. (in press) An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model. Journal of Applied Measurement.

De Ayala R.J. (1993) An introduction to polytomous item response theory models. Measurement and Evaluation in Counseling and Development, 25, 172-189.

de Gruijter, Dato N.M. (1985) A Note on the Asymptotic Variance-covariance Matrix of Item Parameter Estimates in the Rasch Model, Psychometrika, 50, 247-249.

de Gruijter, Dato N.M. (1987) On the Robustness of the 'minimum-chi-square' Method for the Rasch Model, Tijdschrift Voor Onderwijs Research, 12, 225-232

de Gruijter, Dato N.M. (1990) A Note on the Bias of JMLE (UCON) Item Parameter Estimation in the Rasch Model, Journal of Educational Measurement, 27, 285-288

de Gruijter, Dato N.M. (1990) The Threat of Scale Drift in Item Banks, Tijdschrift Voor Onderwijs Research, 15, 104-109.

de Leeuw, Jan and Verhelst, Norman. (1986) Maximum Likelihood Estimation in Generalized Rasch Models, Journal of Educational Statistics, 11, 183-196.

DeBruin A.F., de Witte, L.P., Stevens, F. & Diederiks, J.P.M. (1992) Sickness Impact Profile : the state of the art of a generic functional status measure. Social Science Medicine. 35(8) : 1003-1014.

DeGruijter D.N.M. (1984) Two simple models for rater effects. Applied Psychological Measurement, 8(2), 213-218.

Deno S.L. (1985) Curriculum Based Measurement: The emerging alternative. Exceptional Children, 52, 219-232.

Deno S.L. (1992) The nature and development of curriculum-based measurement. Preventing School Failure, 36 (2), 5-10.

DiBello L., Jiang, H., Stout, W.F. (????) A multidimensional IRT model for practical cognitive diagnosis. Applied Psychological Measurement.

Dickerson A E. Fisher AG. (1997) Effects of familiarity of task and choice on the functional performance of younger and older adults. Psychology & Aging. 12(2):247-54, Jun.

Dickson H.G. & Kohler, F. (1996) The multi-dimensionality of the FIM motor items precludes an interval scaling using Rasch analysis. Scandinavian Journal of Rehabilitation Medicine. 26 : 159-162.

Dijkers MP. Yavuzer G. (1999) Short versions of the telephone motor Functional Independence Measure for use with persons with spinal cord injury. Archives of Physical Medicine & Rehabilitation 80:1477-1494.

Divgi D.R. (1986) Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23 (4), pp 283-298

Doble S.E. & Fisher A.G. (1998) The dimensionality and validity of the Older Americans Resources and Services (OARS) Activities of Daily Living (ADL) Scale. Journal of Outcome Measurement, 2(1), 4-24.

Doble S.E., Fisk JD., MacPherson KM., Fisher AG. & Rockwood K. (1997) Measuring functional competence in older persons with Alzheimer's disease. International Psychogeriatrics. 9(1):25-38, Mar.

Dodd B.G., De Ayala R.J. & Koch W.R. (1995) Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19, 5-22.

Dolan P. & Kind, P. (1996) Inconsistency and health state valuations. Social Science Medicine. 42(4) : 609-615.

Donoghue J.R. (1994) An empirical examination of the IRT information function of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 31, 295-311.

Dorman P.J., Waddell, F., Slattery, J., Dennis M. & Sandercock, P. (1997) Is the EuroQol a valid measure of health-related quality of life after stroke? Stroke. 28 : 1876-1882.

Draney K.L., Pirolli, P., Wilson, M. (1995) A measurement model for a complex cognitive skill. In P.D.

Drasgow F., Levine M.V. & Williams E.A. (1985) Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86.

Drasgow F. & Levine, M.V. (1986) Optimal detection of certain forms of inappropriate test scores. Applied Psychological Measurement, 10, 59-67.

Drasgow F. & Lissak, R.I. (1983) Modified parallel analysis: A procedure for examining the latent dimensionality of dichotomously scored item responses. Journal of Applied Psychology, 68, 363-373.

Drasgow F., Levine M.V., Tsien, S., Williams B.A. & Mead, A.D. (1995) Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143-165.

Du Y. & Wright, B.D. (1997) Effects of student characteristics in a large-scale direct writing assessment. In M. Wilson, G. Engelhard, Jr. & K. Draney (Eds.), Objective Measurement: Theory into Practice (Vol. 4, pp. 1-24) Stamford, CT: Ablex Publishing Company.

Dumont C., Trudel, L. & Fougeyrollas, P. (1997) La mesure des resultats des programmes de readaptation. La Revue canadienne d'evaluation de programmes. 12(2) : 35-59.

Dunbar S.B., Koretz D.M. & Hoover H.D. (1991) Quality control in the development and use of performance assessment. Applied Psychological Measurement, 4, 289-303.

Duran L.J. Fisher A.G. (1996) Male and female performance on the assessment of motor and process skills. Archives of Physical Medicine & Rehabilitation. 77(10):1019-24, Oct.

Ebel R.L. (1951) Estimation of the reliability of ratings. Psychometrika, 16, 407-424.

Ebel R.L. (1979) Essentials of educational measurement, 3rd ed. Englewood Cliffs, NJ: Prentice Hall.

Ebrahim S. (1995) Clinical and public health perspectives and applications of health-related quality of life measurement. Social Science Medicine. 41(10) : 1383-1394.

Edgeworth Francis Y. (1890) The element of chance in competitive examinations. Journal of the Royal Statistical Society, 53, 460-475.

Efron B. & Tibshirani, R.J. (1993) An introduction to the bootstrap. New York, NY: Chapman & Hall.

Eggen T.J.H.M. & Straetmans G.J.J.M. (2000) Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60, 713-734.

Eggen T.J.H.M. (1999) Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.

Eggen Theo J.H.M. & Kelderman, Henk. (1987) Reporting Parameters of the Rasch Model (German), Tijdschrift Voor Onderwijs Research, 12, 121-132

Elliott S.N. & Fuchs, L.S. (1997) The utility of curriculum-based measurement and performance assessment as alternatives to traditional intelligence and achievement tests. School Psychology Review, 26(2) 224-233

Ellis J.L. (1994) Foundations of monotone latent variable models. Nijmegen: Nijmegen Institute for Cognition and Information.

Ellis J.L., Junker, B.W. (1997) Tail-measurability in monotone latent variable models. Psychometrika, 62, 495-523.

Elsas Donald A. (1990) The Scheiblechner Model: A Loglinear Analysis of Social Interaction Data, Social Networks, 12, 57-82.

Embretson S.E. & Reise, S.P. (2000) Item response theory for psychologists. Mahwah, NJ, US: Lawrence Erlbaum Associates, Inc., Publishers.

Embretson S.E. (1985) Multicomponent latent trait models for tests design. In S.E. Embretson (Ed.), Test design: developments in psychology and psychometrics (pp. 195-218) New York: Academic Press.

Embretson S.E. (1991) A multidimensional latent trait model for measuring learning and change. Psychometrika, 56, 495-515.

Embretson S.E. (1995) Developments toward a cognitive design system for psychological tests. In D. Lubinski & R.V. Dawis (Eds.), Assessing individual differences in human behavior: new concepts, methods and findings (pp. 17-48) Palo Alto, CA: Davies-Black Publishing.

Embretson S.E. (1996) The new rules of measurement. Psychological Assessment, 8, 341-349.

Embretson S.E. (1997) Multicomponent response models. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 305-321) New York: Springer Verlag.

Embretson S.E. (1998) A cognitive-design system approach to generating valid tests: applications to abstract reasoning. Psychological Methods, 3, 380-396.

Embretson S.E. (1999) Generating items during testing: Psychometric Issues and Models. Psychometrika, 64, 4, 407-433.

Embretson S.E. (2000) Generating abstract reasoning items with cognitive theory. In S.Irvine and P.Kyllonen (eds.) Item Generation for Test Development. Lawrence Erlbaum.

Engelhard G., Jr. & Anderson D. (1998) A binomial trials model for examining the ratings of standard-setting judges. Applied Measurement In Education, 3, 209-230.

Engelhard G., Jr. (1984) Thorndike, Thurstone, and Rasch: A Comparison of Their Methods of Scaling Psychological and Educational Tests, Applied Psychological Measurement, 8, 21-38

Engelhard G., Jr. (1990) Gender differences in performance on mathematics items: Evidence from the United States and Thailand. Contemporary Educational Psychology, 15, 13-26.

Engelhard G., Jr. (1992) The measurement of writing ability with a many-faceted Rasch model. Applied Measurement in Education, 5(3), 171-191.

Engelhard G., Jr. (1994) Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.

Engelhard G., Jr. (1996) Evaluating rater accuracy in performance assessments. Journal of Educational Measurement, 33(1), 56-70.

Engelhard G., Jr. (1997) Constructing rater and task banks for performance assessments. Journal of Outcome Measurement, 1(1), 19-33.

Engelhard G., Jr. (2000) Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for ALL students: Development, implementation, and analysis. Mahwah, NJ: Lawrence Erlbaum Associates, Pub.

Engelhard G., Jr., Gordon, B. & Gabrielson, S. (1992) The influences of mode of discourse, experimental demand and gender on the quality of student writing. Research in the Teaching of English, 26(3), 120-142.

Engelhard G., Jr., Gordon, B., Siddle-Walker, E.V. & Gabrielson, S. (1994) Writing tasks and gender: Influences on writing quality of black and white students. The Journal of Educational Research, 87(4), 197-209.

Engelhard G., Jr., Myford, C.M. & Cline, F. (2000) Investigating assessor effects in NBPTS assessments for Early Childhood/Generalist and Middle Childhood/Generalist (ETS Research Report RR-00-13) Princeton, NJ: Educational Testing Service.

Evans M., Hastings, N. & Peacock, B. (1993) Statistical distributions (2nd ed.) NY: John Wiley.

Fan X. (1998) Item Response Theory and Classical Test Theory: an empirical comparison of their item/person characteristics. Educational and Psychological Measurement, 58, 3, 357-381.

Fay Robert E. & Carter, Woody and Dowd, Kathryn. (1990) Multiple Causes of Nonresponse: Analysis of the Survey of Census Participation (Disc: P531-533), 1991, ASA Proceedings of the Social Statistics Section, American Statistical Association (Alexandria, VA), 525-530.

Fechner G.T. (1860) Elemente der Psychophysik. Leipzig: Breitkopf & Härtel. [Translation: Adler H.E. (1966) Elements of Psychophysics. New York: Holt, Rinehart & Winston.]

Fienberg Stephen E. (1986) The Rasch Model, Encyclopedia of Statistical Sciences (9 vols. plus Supplement), 7, Wiley (New York), 627-632

Fischer G.H. & Parzer, P. (1991) An Extension of the Rating Scale Model With An Application to the Measurement of Change, Psychometrika, 56, 637-651.

Fischer G.H. & Tanzer, N. (1994) Some LBTL and LLTM relationships. In G.H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology (pp. 277-303) New York, NY: Springer Verlag.

Fischer G.H. (1973) Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.

Fischer G.H. (1974) Einführung in die Theorie psychologischer Tests: Grundlagen und Anwendungen. Berlin: Springer.

Fischer G.H. (1983) Logistic latent trait models with linear constraints. Psychometrika, 48, 3-26.

Fischer G.H. & Molenaar, I.W. (Eds.) (1995) Rasch models: foundations, recent developments, and applications. New York: Springer-Verlag.

Fischer G.H. (1981) On the Existence and Uniqueness of Maximum-likelihood Estimates in the Rasch Model, Psychometrika, 46, 59-77.

Fischer G.H. (1989) An IRT-based Model for Dichotomous Longitudinal Data, Psychometrika, 54, 599-624.

Fischer G.H. & Ponocny, I. (1994) An extension of the partial credit model with an application to the measurement of change. Psychometrika, 59, 177-192.

Fisher A.G. (1993, April) The assessment of IADL motor skills: An application of many-faceted Rasch analysis. American Journal of Occupational Therapy, 47(4), 319-329.

Fisher A.G. (1994) Development of a functional assessment that adjusts ability measures for task simplicity and rater leniency. In M. Wilson (Ed.), Objective measurement: Theory into practice. Vol II (pp. 145-175) Norwood, New Jersey: Ablex Publishing Corporation.

Fisher A.G. (1997) Multifaceted measurement of daily life task performance: Conceptualizing a test of instrumental ADL and validating the addition of personal ADL tasks. Physical Medicine and Rehabilitation: State of the Art Reviews, 11(2): 289-303.

Fisher A.G. (1998) Uniting practice and theory in an occupational framework. 1998 Eleanor Clarke Slagle Lecture. American Journal of Occupational Therapy. 52(7):509-21, Jul-Aug.

Fisher A.G., Bryze K.A., Granger C.V., Haley S.M., Hamilton B.B., Heinemann A.W., Puderbaugh J.K., Linacre J.M., Ludlow L.H., McCabe M.A. & Wright B.D. (1994) Applications of conjoint measurement to the development of functional assessments. International Journal of Educational Research, 21(6), 579-593.

Fisher W.P., Jr. & Fisher A.G. (1993) Applications of Rasch analysis to studies in occupational therapy. In C.V. Granger & G.E. Gresham (Eds.), New developments in functional assessment. Physical Medicine and Rehabilitation Clinics of North America, 4(3), 551-569.

Fisher W.P., Jr. (1988) Truth, method, and measurement: the hermeneutic of instrumentation and the Rasch model [diss]. Dissertation Abstracts International 1988; 49:0778A. Chicago, Illinois: University of Chicago. 376 pages, 23 figures, 31 tables.

Fisher W.P., Jr. (1990, April) Conversing, testing, questioning. ERIC Document #TM 016 413 presented at the American Educational Research Association. Boston.

Fisher W.P., Jr. (1991, April) The hermeneutic of additive conjoint measurement in educational research. [ERIC Document #TM 016 361] presented at the American Educational Research Association. Chicago.

Fisher W.P., Jr. (1992) Objectivity in measurement: A philosophical history of Rasch's separability theorem. In M. Wilson (Ed.), Objective measurement: Theory into practice. Vol. I (pp. 29-58) Norwood, New Jersey: Ablex Publishing Corporation.

Fisher W.P., Jr. (1993) Measurement-related problems in functional assessment. The American Journal of Occupational Therapy, 47(4), 331-338.

Fisher W.P., Jr. (1994) The Rasch debate: Validity and revolution in educational measurement. In M. Wilson (Ed.), Objective measurement: Theory into practice. Vol. II (pp. 36-72) Norwood, New Jersey: Ablex Publishing Corporation.

Fisher W.P., Jr. (1997) Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher W.P., Jr. (1997) What scale-free measurement means to outcomes research. Physical Medicine and Rehabilitation: State of the Art Reviews, 11(2): 357-374.

Fisher W.P., Jr. (2000) Objectivity in psychosocial measurement: What, why, how. Journal of Outcome Measurement, 4, in press.

Fisher W.P., Jr., Harvey R.F., Taylor P., Kilgore K.M. & Kelly C.K. (1995) Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76, 113-122.

Fitzpatrick A.R., Ercikan, K., Yen, W.M. & Ferrara, S. (1998) The consistency between raters scoring in different test years. Applied Measurement in Education, 11(2), 195-208.

Fitzpatrick A.R., Link, V., Yen, W.M., Burket, G.R., Ito, K. & Sykes, R. (1996) Scaling performance assessments: A comparison of one-parameter and two-parameter partial credit models. Journal of Educational Measurement.

Fleiss J.L. (1981) Balanced incomplete block designs for inter-rater reliability studies. Applied Psychological Measurement, 5, 105-112.

Folland D. & Robertson D. (1976) Towards objectivity in group oral testing. English Language Teaching Journal 30: 156-167.

Follmann Dean. (1988) Consistent Estimation in the Rasch Model Based on Nonparametric Margins, Psychometrika, 53, 553-562.

Formann Anton K. & Rop, Ilse. (1987) On the Inhomogeneity of a Test Compounded of Two Rasch Homogeneous Subscales, Psychometrika, 52, 263-267.

Formann Anton K. & Spiel, Christiane. (1989) Measuring Change By Means of a Hybrid Variant of the Linear Logistic Model With Relaxed Assumptions, Applied Psychological Measurement, 13, 91-103.

Formann Anton K. (1986) A Note on the Computation of the Second-order Derivatives of the Elementary Symmetric Functions in the Rasch Model, Psychometrika, 51, 335-339.

Forsyth Robert and Saisangjan, Upatham and Gilmer, Jerry. (1981) Some Empirical Results Related to the Robustness of the Rasch Model, Applied Psychological Measurement, 5, 175-186.

Fougeyrollas P., et al. (1998) Social consequences of long term impairments and disabilities: Conceptual approach and assessment of handicap. International Journal of Rehabilitation Research, 21(2): 127-141.

Fox C. & Jones J. (1998) Uses of Rasch modeling in counseling psychology research. Journal of Counseling Psychology 45/1: 30-45.

Fox G.J.A. & Glas C.A.W. (1998) A multi-level IRT model with measurement error in the predictor variables. (Research Report 98-16) Department of Educational Measurement and Data Analysis, University of Twente, the Netherlands.

Fraser C. & McDonald, R.P. (1988) NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.

Fred Li M. & Olejnik S. (1997) The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3) 215-231.

Frederiksen N., Mislevy R.J. & Bejar I. (1993) Test theory for a new generation of tests. Lawrence Erlbaum. ISBN: 0-8058-0593-1.

Frederiksen N., Glaser R., Lesgold A. & Shafto M.G. (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 453-488) Hillsdale, NJ: Lawrence Erlbaum Associates.

Frenette & Bertrand, R. (2000) Assessing dimensionality with TESTFACT and DIMTEST using large-scale assessment data sets. Communication presented at the Annual Meeting of the American Educational Research Association. New Orleans, LA.

Friedlmeier Wolfgang and Meyer, Harald. (1991) Application of the Rasch Model for a Posterior Analysis of Qualitative Data (German), Zeitschrift für Experimentelle und Angewandte Psychologie, 38, 26-42

Froberg D.G. & Kane, R.L. (1988) Methodology for measuring health-state preferences-II: Scaling methods. Journal of Clinical Epidemiology, 42(5): 459-471.

Froberg D.G. & Kane, R.L. (1988) Methodology for measuring health-state preferences-I: Measurement strategies. Journal of Clinical Epidemiology, 42(4): 345-354.

Fulcher G. (1996) Testing tasks: Issues in task design and the group oral. Language Testing 13: 23-51.

Gardner H. (1992) Assessment in context: the alternative to educational testing. In B.R. Gifford, M.C. O'Connor (Eds.), Changing assessments: alternative views of aptitude, achievement, and instruction (pp. 77-119) Norwell, MA: Kluwer Academic Publishers.

Garner M. & Engelhard G. (2000) Rasch measurement theory, the method of paired comparisons and graph theory. In M. Wilson & G. Engelhard (Eds.), Objective measurement: Theory into practice (Vol. 5, pp. 259-286) Norwood, NJ: Ablex.

Garner M. & Engelhard, G. (2000) The method of paired comparisons, Rasch measurement theory, and Graph Theory. In M. Wilson, G. Engelhard & M. Stone (Eds.), Objective measurement: Theory into practice. Vol.5.

Garner M. & Engelhard, G., Jr. (1999) Gender differences in performance on multiple-choice and constructed response mathematics items. Applied Measurement in Education, 12(1), 29-51.

Gelman A., Carlin, J.B., Stern, H.S., Rubin, D.B. (1995) Bayesian data analysis. New York: Chapman and Hall.

George J.D., Fellingham G.W., Fisher A.G. (1998) A modified version of the Rockport Fitness Walking Test for college men and women. Research Quarterly for Exercise & Sport. 69(2):205-9.

Gibbons R.D., Hedeker, D.R. (1992) Full-information item bi-factor analysis. Psychometrika, 57, 423-436.

Gibbons R.D., Hedeker, D.R. (1997) Random effects probit and logistic regression models for three-level data. Biometrics, 53, 1527-1537.

Glas C.A.W. & Verhelst N.D. (1995) Testing the Rasch Model. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch Models: Foundations, recent developments, and applications (pp. 69-95) New York: Springer-Verlag.

Glas C.A.W. & Verhelst, N.D. (1989) Extensions of the Partial Credit Model, Psychometrika, 54, 635-659.

Glas C.A.W. (1988) The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525-546.

Glas C.A.W. (1988) The Rasch Model and Multistage Testing, Journal of Educational Statistics, 13, 45-52.

Gluck J. & Indurkhya, A. (2001) Assessing changes in the longitudinal salience of items within constructs. Journal of Adolescent Research, 16, 169-187.

Goldstein H. (1980) Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, pp 234-246.

Golembiewski R.T., Billingsley, K. & Yeager, S. (1976) Measuring change and persistency in human affairs: Types of change generated by OD designs. Journal of Applied Behavioral Science, 12, 133-57.

Gonin R. (1996) Establishing equivalence between scaled measures of quality of life. J. Quality of Life Research, 1996, 5, 20-26.

Goodman Leo A. (1990) Total-score Models and Rasch-type Models for the Analysis of a Multidimensional Contingency Table, Or a Set of Multidimensional Contingency Tables, With Specified And/or Unspecified Order for Response Categories, Sociological Methodology, 20, 249-294

Gorsuch R.L. (1988) Psychology of religion. Annual Review of Psychology, 39, 201-221.

Goto S., Fisher AG., Mayberry WL. (1996) The assessment of motor and process skills applied cross-culturally to the Japanese. American Journal of Occupational Therapy. 50(10):798-806, Nov-Dec.

Graham S. & Dwyer, A. (1987) Effects of the learning disability label, quality of writing performance, and examiner's level of expertise on the evaluation of written products. (ERIC Document Reproduction Service No. ED294351)

Granger C.V., Deutsch, A. & Linn, R.T. (1998) Rasch analysis of the Functional Independence Measure (FIM) mastery test. Archives of Physical Medicine and Rehabilitation, 79: 52-57.

Granger C.V., Hamilton, B.B., Linacre, J.M., Heinemann, A.W. & Wright, B.D. (1993) Performance profiles of the functional independence measure. American Journal of Physical Medicine and Rehabilitation, 72: 84-89.

Granger C.V., Ottenbacher, K.J., Baker, J.G. & Sehgal, A. (1995) Reliability of a brief outpatient functional outcome assessment measure. American Journal of Physical Medicine and Rehabilitation, 74: 469-475.

Grayson D.A. (1988) Two-group classification in latent trait theory: Scores with monotone likelihood ratio. Psychometrika, 53, 383-392.

Green B. (1995) Compatibility of scores from performance assessment. Educational Measurement, Winter, 13-15.

Green B.F., Bock, R.D., Humphreys, L.G., Linn, R.L. & Reckase, M.D. (1984) Technical guidelines for assessing computerized adaptive tests, Journal of Educational Measurement, 21, 347-360.

Green D.R., Yen, W.M. & Burket, G.R. (1989) Experiences in the application of item response theory in test construction. Applied Measurement in Education, 2, 297-312.

Green K. (1986) Fundamental measurement: A review and application of additive conjoint measurement in educational testing. Journal of Experimental Education, 54(3), 141-147.

Greenleaf E. (1992) Improving rating scale measures by detecting and correcting bias components in some response styles. Journal of Marketing Research, 29, 176-188.

Grego John M. (1993) PRASCH: A Fortran Program for Latent Class Polytomous Response Rasch Models, Applied Psychological Measurement, 17, 238-238

Griffith Priscilla L. et al. (1992) Student-Curriculum Maps: Applying the Rasch Model to Curriculum and Instruction. Journal of Research in Education, 2(1), 13-22

Grimby G., Andren E., Daving Y., and Wright B.D. (1998) Dependence and perceived difficulty in daily activities in community-living stroke survivors 2 years after stroke - a study of instrumental structures. Stroke, 29(9):1843-1849.

Grosse M.E. & Wright B.D. (1986) Setting, evaluating, and maintaining certification standards with the Rasch Model. Evaluation & the Health Professions, 9: 267-285.

Gulliksen H. (1945) The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10, 79-91.

Gumpel T., Wilson M. & Shalev R. (1998) An item response theory analysis of the Conner's Teachers Rating-Scale. Journal of Learning Disabilities, 31, 525-532.

Gustafsson J. (1980) Testing and obtaining fit of data to the Rasch model. British Journal of Mathematical and Statistical Psychology, 33, 205-233.

Gustafsson Jan-Eric. (1980) A Solution of the Conditional Estimation Problem for Long Tests in the Rasch Model for Dichotomous Items, Educational and Psychological Measurement, 40, 377-385.

Guttman L. (1944) A basis for scaling qualitative data. American Sociological Review, 9, 139-150.

Guttman L. (1950) The basis for scalogram analysis. In Stouffer et al. Measurement and Prediction, Volume 4. Princeton N.J.: Princeton University Press, 60-90.

Haberman S. (1977) Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5, 815-841.

Haertel E.H. (1989) Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301-321.

Haladyna T.M. (1994, Jun) A research agenda for licensing and certification testing validation studies. Evaluation & the Health Professions, 17(2), 242-256.

Haley S.M., Ludlow L.H. & Coster W.J. (1993) Pediatric Evaluation of Disability Inventory: Clinical interpretation of summary scores using Rasch rating scale methodology. Physical Medicine and Rehabilitation Clinics of North America, 4(3), 529-540.

Haley SM, McHorney CA, Ware JE Jr. (1994) Evaluation of the MOS SF-36 physical functioning scale (PF-10): I. unidimensionality and reproducibility of the Rasch item scale. Journal of Clinical Epidemiology 47(6):671-84.

Hall W.J., Wijsman, R.A. & Ghosh, J.K. (1965) The relationship between sufficiency and invariance with applications in sequential analysis. Annals of Mathematical Statistics, 36, 575-614.

Hambleton R. (Editor) (1983) Applications of Item Response Theory. Vancouver, BC: Educational Research Institute of British Columbia.

Hambleton R., Swaminathan H. & Rogers H. (1991) Fundamentals of item response theory. Newbury Park, California: Sage Publications.

Hambleton R.K. & Jones R.W. (1993) Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 13, 38-47.

Hambleton R.K. & Murray L. (1983) Some goodness of fit investigations for item response models. In R.K. Hambleton (Ed.) Applications of item response theory. (pp. 71-94) Vancouver B.C.: Educational Research Institute of British Columbia.

Hambleton R.K. & Swaminathan, H. (1985) A look at psychometrics in the Netherlands. Nederlands Tijdschrift voor de Psychologie en Haar Grensgebieden, 40(7), 446-451.

Hambleton R.K. (1989) Principles and selected applications of item response theory. In R.L. Linn (Ed.), Educational Measurement 3rd ed. (pp. 147-200) New York: Macmillan Publishing Company.

Hambleton R.K. (1990) Item response theory: introduction and bibliography. Psicothema, 2(1) 97-107.

Hambleton R.K., and Swaminathan, H. (1985) Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.

Hambleton R.K., Swaminathan H. & Rogers H.J. (1991) Fundamentals of item response theory. Newbury Park, CA: Sage.

Hamerle A. (1979) Foundations of Measurement in Latent Trait Models (German), Archiv für Psychologie, 132, 19-39.

Hamilton L.S. (1998) Gender differences on high school science achievement tests: Do format and content matter? Educational Evaluation and Policy Analysis, 20, 179-195.

Handbook of Research on Educational Communications and Technology (pp. 570-600) New York: Macmillan Press.

Harman H.H. (1976) Modern factor analysis (3rd ed., revised) Chicago: University of Chicago Press.

Harnish D.L. (1983) Item response patterns: applications for educational practice. Journal of Educational Measurement, 20, 191-206.

Harris D. (1989) Comparison of 1-, 2-, and 3-parameter IRT models. Educational Measurement: Issues and Practice, 9, 35-41.

Hart DL, Wright BD. (2002) Development of an index of physical functional health status. Arch Phys Med Rehabil. 83(5) 655-665.

Harvey RL, Roth EJ, Heinemann AW, Lovell LL, McGuire JR, Diaz S. (1998) Stroke rehabilitation clinical predictors of resource utilization. Archives of Physical Medicine & Rehabilitation 79:1349-1353.

Hattie J.A. (1985) Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.

Hattori T. (1986) Method of Simultaneous Equating for Scales With the Rasch Model Using Common-item Technique (Japanese), Japanese Journal of Behaviormetrics, 14, 39-46

Hatzinger Reinhold. (1989) The Rasch Model, Some Extensions and Their Relation to the Class of Generalized Linear Models, Statistical Modelling, Springer-Verlag (Berlin; New York), 172-179.

Hawley CA, Taylor R, Hellawell DJ, Pentland B. (1999) Use of the functional assessment measure (FIM+FAM) in head injury rehabilitation - a psychometric analysis. Journal of Neurology, Neurosurgery & Psychiatry 67:749-754.

Heckman J. (1979) Sample selection bias as a specification error. Econometrica, 47, 153-161.

Heelan P. (1983, June) Natural science as a hermeneutic of instrumentation. Philosophy of Science, 50, 181-204.

Heinemann A.W., et al. (1997) Measurement properties of the NIH Stroke Scale during acute rehabilitation. Stroke, 28: 1174-1180.

Hemker B.T. (2001) Reversibility revisited and other comparisons of three types of polytomous IRT models. In A. Boomsma, M.A.J. van Duijn & T.A.B. Snijders (Eds.), Essays on item response theory (pp. 277-296) New York: Springer.

Hemker B.T., Sijtsma K., Molenaar I.W. & Junker B.W. (1996) Polytomous IRT models and monotone likelihood ratio of the total score. Psychometrika, 61, 679-693.

Hemker B.T., Sijtsma K., Molenaar I.W. & Junker B.W. (1997) Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62, 331-347.

Hemker B.T., Sijtsma, K., Molenaar, I.W. (1995) Selection of unidimensional scales from a multidimensional item bank in the polytomous Mokken IRT model. Applied Psychological Measurement, 19, 337-352.

Hemker B.T., van der Ark L.A., Sijtsma K. (2000) On Measurement Properties of Continuation Ratio Models. Measurement and Research Department Reports 2000-6. Arnhem, the Netherlands: Citogroep, pp. 12-13.

Henning G. (1989) Does the Rasch model really work for multiple-choice items? Take another look: a response to Divgi. Journal of Educational Measurement, 26(1), 91-97.

Henning G. (1992) Dimensionality and construct validity of language tests. Language Testing 9/1: 1-11.

Henning G., Hudson, T. & Turner, J. (1985) Item response theory and the assumption of unidimensionality for language tests. Language Testing, 2(2), 141-54.

Hills J.R. (1989) Screening for potentially biased items in testing programs. Educational Measurement: Issues and Practice, 8(4), 5-11.

Hilsdon J. (1995) The group oral exam: Advantages and limitations. In Alderson J. & North B., editors, Language testing in the 1990s: The communicative legacy. Hertfordshire: Prentice Hall International, 189-197.

Hintze J.M., Shapiro, E.S., Conte, K.L. & Basile, I.M. (1997) Oral reading and authentic reading material: Criterion validity of the technical features of CBM survey-level assessment. School Psychology Review, 26(4), 535-553.

Hoijtink H. (1990) A latent trait model for dichotomous choice data. Psychometrika, 55, 641-656.

Hoijtink H. (1991) The measurement of latent traits by proximity items. Applied Psychological Measurement, 15, 153-169.

Hoijtink H., Molenaar I. & Post W. (1994) PARELLA. User's manual. IEC ProGAMMA. Groningen, The Netherlands.

Hoijtink H., Molenaar, I.W. (1997) A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.

Holland P.W. & Thayer, D.T. (1988) Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 129-145) Hillsdale, NJ: Lawrence Erlbaum.

Holland P.W. (1981) When are item response models consistent with observed data? Psychometrika, 46, 79-92.

Holland P.W. (1990) The Dutch identity: a new tool for the study of item response models. Psychometrika, 55, 5-18.

Holland P.W., Rosenbaum, P.R. (1986) Conditional association and unidimensionality in monotone latent trait models. Annals of Statistics, 14, 1523-1543.

Hood R.W. Jr. (1970) Religious orientation and the reported religious experience. Journal for the Scientific Study of Religion, 9(4) 285-291.

Hopman-Rock M, Van Buuren S, Kleijn-De Vrankrijker M. (2000) Polytomous Rasch analysis as a tool for revision of the severity of disability code of the ICIDH. Disability & Rehabilitation 22:363-371.

Hornberger J.C., Redelmeier, D.A. & Petersen, J. (1992) Variability among methods to assess patients' well-being and consequent effect on a cost-effectiveness analysis. Journal of Clinical Epidemiology, 45(5): 505-512.

Horst Paul (1933) The difficulty of a multiple choice test item. Journal of Educational Psychology, 24, 229-232.

Houston W.M., Raymond, M.R. & Svec, J.C. (1991) Adjustments for rater effects in performance assessment. Applied Psychological Measurement, 15(4), 409-421.

Huguenard B.R., Lerch, F.J., Junker, B.W., Patz, R.J., Kass, R.E. (1997) Working memory failure in phone-based interaction. ACM Transactions on Computer-Human Interaction, 4, 67-102.

Hulin C.L., Drasgow, F. & Parsons, C.K. (1983) Item response theory: Applications to psychological measurement. Homewood, IL: Dow Jones Irwin.

Huynh H. (1994) A new proof for monotone likelihood ratio for the sum of independent Bernoulli random variables. Psychometrika, 59, 77-79.

Huynh H. (1994) On equivalence between a partial credit item and a set of independent Rasch binary items. Psychometrika, 59, 111-119.

Huynh H. (1996) Decomposition of a Rasch partial credit item into independent binary and indecomposable trinary items. Psychometrika, 61, 31-39.

Huynh Huynh and Casteel, Jim. (1985) A Comparison of the Minimax and Rasch Approaches to Set Simultaneous Passing Scores for Subtests, Journal of Educational Statistics, 10, 334-344.

Huynh Huynh. (1990) Computation and Statistical Inference for Decision Consistency Indexes Based on the Rasch Model, Journal of Educational Statistics, 15, 353-368.

Huynh Huynh. (1990) Error Rates in Competency Testing When Test Retaking Is Permitted, Journal of Educational Statistics, 15, 39-52.

Jackson TR, Draugalis JR, Slack MK, Zachry III WM, D'Agostino J. (2002) Validation of authentic performance assessment: a Process suited for Rasch modeling. Am J Pharm Educ, 66(2), 233-42.

Jackson TR, Popovich NG. (2003) The development, implementation, and evaluation of a self-assessment instrument for use in a pharmacy student competition. Am J Pharm Educ, 67(2) article 59

Jaeger A.O., Suess H.-M. & Beauducel A. (1997) Berliner Intelligenzstruktur-Test. BIS-Test, Form 4. Goettingen: Hogrefe.

Jannarone Robert J. (1986) Conjunctive Item Response Theory Kernels, Psychometrika, 51, 357-373.

Jannarone Robert J., Yu, K.F. & Laughlin, J.E. (1990) Easy Bayes Estimation for Rasch-type Models, Psychometrika, 55, 449-460

Jansen M.G.H. & Snijders, T.A.B. (1991) Comparisons of Bayesian Estimation Procedures for Two-way Contingency Tables Without Interaction, Statistica Neerlandica, 45, 51-65.

Jansen Margo G.H. & van Duijn, Marijtje A.J. (1992) Extensions of Rasch's Multiplicative Poisson Model, Psychometrika, 57, 405-414.

Jansen Margo G.H. (1986) A Bayesian Version of Rasch's Multiplicative Poisson Model for the Number of Errors of An Achievement Test, Journal of Educational Statistics, 11, 147-160

Jansen Paul G.W. & Roskam, Edward E. (1986) Latent Trait Models and Dichotomization of Graded Responses, Psychometrika, 51, 69-91.

Jansen Paul G.W. (1984) Relationships Between the Thurstone, Coombs, and Rasch Approaches to Item Scaling, Applied Psychological Measurement, 8, 373-383

Janssen R., De Boeck, P. (1997) Psychometric modeling of componentially designed synonym tasks. Applied Psychological Measurement, 21, 37-50.

Jenkinson C., Fitzpatrick R, Garratt A, Peto V, Stewart-Brown S. (2001) Can item response theory reduce patient burden when measuring health status in neurological disorders? Results from Rasch analysis of the SF-36 physical functioning scale (PF-10). Journal of Neurology, Neurosurgery & Psychiatry 71, 220-224.

Jew C.L., Green, K.E. & Kroger, J. (1999) Development and validation of a measure of resiliency. Measurement and Evaluation in Counseling and Development, 32, 75-89.

Jogdeo K. (1978) On a probability bound of Marshall and Olkin. Annals of Statistics, 6, 232-234.

Johnson E.G., Mislevy, R.J., Thomas, N. (1994) Theoretical background and philosophy of NAEP scaling procedures. In E.G. Johnson, J. Mazzeo, D.L. Kline (Eds.), Technical Report of the NAEP 1992 Trial State Assessment Program in Reading (pp. 133-146) Washington, DC: Office of Educational Research and Improvement, U.S. Department of Education.

Johnson Robert A. & Woltman, Henry F. (1986) Evaluating Census Data Quality Using Intensive Reinterviews: A Comparison of U.S. Census Methods and Rasch Methods, ASA Proceedings of the Section on Survey Research Methods, American Statistical Association (Alexandria, VA), 293-298

Johnson Robert A. & Woltman, Henry F. (1987) Evaluating Census Data Quality Using Intensive Reinterviews: A Comparison of U.S. Census Bureau Methods and Rasch Methods, Sociological Methodology, 185-204,

Johnson V.E. (1997) An alternative to traditional GPA for evaluating student performance. Statistical Science, 12, 251-278.

Jolly S.J., Johnson, R., Jones, B., & Abalus, J. (1986, April). The effect of test speededness and random guessing on the validity of reading comprehension scores. Paper presented at the annual meeting of the American Educational Research Association. San Francisco, CA

Joreskog K.G. & Sorbom D. (1993) LISREL 8.0 and PRELIS 2.10 for Windows (Computer software) Chicago: Scientific Software International, Inc.

Junker B. & Stout, W.F. (1994) Robustness of ability estimation when multiple traits are present with one trait dominant. In D. Laveault, B. Zumbo, M. Gessaroli & M. Boss (Eds.), Modern Theories of Measurement: Problems and Issues (pp. 31-61) Ottawa, Canada, University of Ottawa Press.

Junker B.W. (1991) Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255-278.

Junker B.W. (1993) Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359-1378.

Junker B.W. (1998) Some remarks on Scheiblechner's treatment of ISOP models. Psychometrika, 63, 73-85.

Junker B.W., Ellis, J.L. (1997) A characterization of monotone unidimensional latent variable models. Annals of Statistics, 25, 1327-1343.

Junker B.W., Sijtsma, K. (2000) Latent and manifest monotonicity in item response models. Applied Psychological Measurement, 24, 65-81.

Kamae T., Krengel, U., O'Brien, G.L. (1977) Stochastic inequalities on partially ordered spaces. Annals of Probability, 5, 899-912.

Kamata A. (2001) Item Analysis by the Hierarchical Generalized Linear Model. Journal of Educational Measurement, 38, 79-93.

Karabatsos G. (1997) The Sexual Experiences Survey: Interpretation and validity. Journal of Outcome Measurement, 1, 4, 305-328

Karabatsos G. (1998) What's in the criminal's mind: A picture is worth a thousand words. Popular Measurement, 1, 1, 80.

Kass R.E., Tierney, L., Kadane, J.B. (1990) The validity of posterior expansions based on Laplace's method. In S. Geisser, J.S. Hodges, S.J. Press & A. Zellner (Eds.), Bayesian and likelihood methods in statistics and econometrics: Essays in honor of George A. Barnard (pp. 473-488) New York: North-Holland.

Keeves J.P. & Masters G.N. (Eds.) (1999) Advances in Measurement in Educational Research and Assessment. Pergamon.

Kelderman H. & Rijkes C.P.M. (1994) Loglinear multidimensional IRT models for polytomously scored items. Psychometrika, 59, 437-450.

Kelderman H., Rijkes, C.P.M. (1994) Loglinear multidimensional IRT models for polytomously scored items. Psychometrika, 59, 149-176.

Kelderman Hendrikus. (1984) Loglinear Rasch Model Tests, Psychometrika, 49, 223-245,

Kelderman Henk and Macready, George B. (1990) The Use of Loglinear Models for Assessing Differential Item Functioning Across Manifest and Latent Examinee Groups, Journal of Educational Measurement, 27, 307-327,

Kelderman Henk. (1988) Common Item Equating Using the Loglinear Rasch Model, Journal of Educational Statistics, 13, 319-333,

Kelderman Henk. (1989) Item Bias Detection Using Loglinear IRT, Psychometrika, 54, 681-697,

Kiefer J. & Wolfowitz J. (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics 27, 887-903.

Kim S.H., Cohen, A.S. & Park, T.H. (1995) Detection of differential item functioning in multiple groups. Journal of Educational Measurement, 32, 261-276.

Kindlon D.J., Wright B.D., Raudenbush S.W. & Earls F. (1996) The measurement of children's exposure to violence: A Rasch analysis. International Journal of Methods in Psychiatric Research 6:4 187-194.

Kingsbury G., Zara A. (1991) A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests. Applied Measurement in Education, 4(3) 241-61.

Kingsbury G.G. & Zara A.R. (1989) Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.

Kingsbury G.G. & Zara A.R. (1991) A comparison of procedures for content-sensitive item selection in computerized adaptive tests. Applied Measurement in Education, 4, 241-261.

Kirkley KN, Fisher AG. (1999) Alternate forms reliability of the assessment of motor and process skills. Journal of Outcome Measurement, 3(1), 53-70.

Klauer Karl Christoph. (1991) An Exact and Optimal Standardized Person Test for Assessing Consistency With the Rasch Model, Psychometrika, 56, 213-228,

Klauer Karl Christoph. (1991) Exact and Best Confidence Intervals for the Ability Parameter of the Rasch Model, Psychometrika, 56, 535-547,

Koch W.R. & Dodd, B.G. (1989) An investigation of procedures for computerized adaptive testing using partial credit scoring. Applied Measurement in Education, 2(4), 335-337,

Kolen M.J. & Brennan R.L. (1995) Test equating: Methods and practices. New York: Springer-Verlag.

Krantz D.H., Luce, R.D., Suppes, P. & Tversky, A. (1971) Foundations of measurement, Volume 1, Additive and Polynomial Representations, Academic Press, Inc. New York.

Krantz D.H., Luce, R.D., Suppes, P., Tversky, A. (1971) Foundations of measurement: vol. 1: Additive and polynomial representations. Academic Press, New York.

Kreitzberg C., Stocking M.L. & Swanson L. (1978) Computerized Adaptive Testing: Principles and Directions, Computers and Education. 2, 4, pp. 319-329.

Kubinger Klaus D. (1988) On a Rasch-model-based Test for Noncomputerized Adaptive Testing, Latent Trait and Latent Class Models, Plenum (New York; London), 277-289

Kuhn T.S. (1961) The function of measurement in modern physical science. Isis, 52(168), 161-193. (Rpt. in The essential tension: Selected studies in scientific tradition and change (pp. 178-224) Chicago, Illinois: University of Chicago Press)

Lai J., Fisher A., Magalhaes L. & Bundy A. (1996) Construct validity of the sensory integration and praxis tests. Occupational Therapy Journal of Research, 16(2), 75-97.

Lance C.E., LaPointe, J.A. & Stewart, A.M. (1994) A test of the context dependency of three causal models of halo rater error. Journal of Applied Psychology, 79(3), 332-340.

Landauer T.K., Foltz, P.W. & Laham, D. (1998) An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.

Langeheine R. & Rost J. (Eds.) (1988) Latent Trait and Latent Class Models. Plenum, New York.

Langeheine Rolf and Rost, Jürgen (Ed). (1988) Latent Trait and Latent Class Models, Plenum, 311,

Latour B. (1987) Science in action: How to follow scientists and engineers through society. New York, New York: Cambridge University Press.

Latour B. (1994, May) Pragmatogonies: A mythical account of how humans and nonhumans swap properties. American Behavioral Scientist, 37(6), 791-808.

Lauritzen Steffen L. (1984) Extreme Point Models in Statistics, Scandinavian Journal of Statistics, 11, 65-91,

Lauritzen, S. L. (2008). Exchangeable Rasch matrices. Rendiconti di Matematica, Serie VII, Volume 28, Roma, 83-95.

Lawley, DN. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61-A, 273-287.

Lawson S. (1991) One parameter latent trait measurement: Do the results justify the effort? In B. Thompson (Ed.) Advances in educational research: Substantive findings, Methodological developments (Vol. 1, pp. 159-168) Greenwich, CT: JAI Press.

Lazaraton A. (1996) Interlocutor support in oral proficiency interviews: The case of CASE. Language Testing 13: 151-172.

Lee Y., Nelder, J.A. (1996) Hierarchical generalized linear models. Journal of the Royal Statistical Society, Series B, 58, 619-678.

Legler Julie and Ryan, Louise. (1993) Latent Variable Models for Multiple Outcomes, ASA Proceedings of the Section on Statistics and the Environment, American Statistical Association (Alexandria, VA), 53-58,

Levine M.V. & Drasgow F. (1982) Appropriateness measurement: review, critique and validating studies. British Journal of Mathematical and Statistical Psychology, 35, 42-56.

Levine M.V. & Drasgow, F. (1988) Optimal appropriateness measurement. Psychometrika, 53, 161-176.

Levine M.V. & Rubin D.B. (1979) Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269-290.

Lewis D.M., Green, D. R., Mitzel, H. C., Baum, K., & Patz, R. J. (April, 1998). The Bookmark Standard Setting Procedure: Methodology and Recent Implementations. Paper presented at the 1998 National Council for Measurement in Education annual meeting, San Diego, CA.

Lewis D.M., Mitzel, H. C., Green, D. R. (1996). Standard Setting: A Bookmark Approach. In D. R. Green (Chair), IRT-Based Standard-Setting Procedures Utilizing Behavioral Anchoring. Symposium presented at the 1996 Council of Chief State School Officers 1996 National Conference on Large Scale Assessment, Phoenix, AZ.

Li Y.H. & Lissitz, R.W. (2000a) An evaluation of the accuracy of multidimensional IRT linking. Applied Psychological Measurement, 24, 115-138.

Lim R. & Drasgow, F. (1990) Evaluation of two methods for estimating item response theory parameters when assessing differential item functioning. Journal of Applied Psychology, 75, 164-174.

Lin M.H. (1986). The impact of time limits on test behaviors. Paper presented at the annual meeting of the American Educational Research Association. San Francisco, CA.

Linacre J.M. (1998) Detecting multidimensionality: Which residual data-type works best? Journal of Outcome Measurement, 2(3), 266-283.

Linacre J.M. (1999) Investigating rating scale category utility. Journal of Outcome Measurement, 3(2), 103-122.

Linacre J.M. (2002) A user's guide to FACETS: Rasch measurement computer program. Chicago: MESA Press.

Linacre J.M., Engelhard, G., Jr., Tatum, D.S. & Myford, C.M. (1994) Measurement with judges: Many-faceted conjoint measurement. International Journal of Educational Research, 21(6), 569-577.

Linacre J.M., Heinemann, A.W., Wright, B.D., Granger, C.V. & Hamilton, B.B. (1994) The structure and stability of the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation. 75 : 127-132.

Lindsay Bruce and Clogg, Clifford and Grego, John. (1991) Semiparametric Estimation in the Rasch Model and Related Exponential Response Models, Including a Simple Latent Class Model for Item Analysis, Journal of the American Statistical Association, 86, 96-107,

Lindsey J. (1997) Progressive Achievement Tests in Mathematics Teacher's Manual. Melbourne: ACER.

Lindsey J. (1998) Progressive Achievement Tests in Mathematics Norming Manual. Melbourne: ACER.

Linn R. & Gronlund N. (1995) Measurement and Assessment in Teaching. Englewood Cliffs: Prentice Hall.

Linn R.L. & Harnisch, D. (1981) Interactions between item content and group membership in achievement test items. Journal of Educational Measurement, 18, 109-118.

Linn R.L. (1966) Grade adjustments for prediction of academic performance: A review. Journal of Educational Measurement, 3, 313-329.

Linn R.L. (1993) Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis 15: 1-16.

Linn R.L., Baker, E. & Dunbar, S.B. (1991) Complex performance-based assessments: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Liou M. & Yu, L.C. (1991) Assessing statistical accuracy in ability estimation: a bootstrap approach. Psychometrika, 56(1), 55-67.

Liou Michelle and Chang, Chih-Hsin. (1992) Constructing the Exact Significance Level for a Pearson Fit Statistic, Psychometrika, 57, 169-181,

Liou Michelle. (1994) More on the Computation of Higher-order Derivatives of the Elementary Symmetric Functions in the Rasch Model, Applied Psychological Measurement, 18, 53-62

Lipsey M.W. (1996) Key issues in intervention research : A program evaluation perspective. American Journal of Industrial Medicine. 29 : 298-302.

Liski E. & Puntanen S. (1983) A study of the statistical foundations of group conversation tests in spoken English. Language Learning 33: 225-246.

Little R.J.A. & Rubin D.B. (1987) Statistical analysis with missing data. New York: Wiley.

Liu C., Rubin, D.B. (1998) Maximum likelihood estimation of factor analysis using the ECME algorithm with complete and incomplete data. Statistica Sinica, 8, 729-747.

Loevinger J. (1948) The technique of homogeneous tests compared with some aspects of scale analysis and factor analysis. Psychological Bulletin, 45, 507-530.

Longford N.T. (1993) Reliability of essay rating and score adjustment (ETS Technical Report No. 93-36) Princeton, NJ: Educational Testing Service.

Longford N.T. (1994) A case for adjusting subjectively rated scores in the Advanced Placement tests. (ETS Program Statistics Research Technical Report No. 94-5) Princeton, NJ: Educational Testing Service. (ERIC Document Reproduction Service No. ED380502)

Longford N.T. (1994) Reliability of essay rating and score adjustment. Journal of Educational and Behavioral Statistics, 19, 171-201.

Lopez Pina J.A. & Hidalgo M.D. (1996) Bondad de ajuste y teoria de respuesta a los items. In J. Muniz (Coord.) Psicometria. (pp. 643-703) Madrid: Universitas.

Lord F.M. & Novick M.R. (1968) Statistical Theories of Mental Test Scores. Reading, Mass: Addison-Wesley.

Lord Frederic M. (1952) A theory of test scores. Psychometric Monographs, No. 7.

Lord F.M. (1957) Do tests of the same length have the same standard error of measurement? Educational and Psychological Measurement, 17, 510-521.

Lord F.M. (1970) Some test theory for tailored testing. In W.H. Holtzman (Ed.), Computer Assisted Instruction, Testing, and Guidance. New York: Harper and Row.

Lord F.M. (1974) Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247-264.

Lord F.M. (1980) Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

Lord F.M. (1983) Maximum likelihood estimation of item response parameters when some responses are omitted. Psychometrika, 48, 477-482.

Lord F.M. (1983) Small N justifies Rasch model. In D.J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 51-61) New York, NY: Academic Press, Inc.

Lord F.M. (1984) Maximum likelihood and Bayesian parameter estimation in item response theory. Educational Document Reproduction Service ED250365.

Loyd B.H. & Hoover, H.D. (1980) Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179-193.

Luce R.D. & Tukey J.W. (1964) Simultaneous conjoint measurement. Journal of Mathematical Psychology, 1, 1-27.

Ludlow L.H. & Haley S.M. (1995, December) Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55(6), 967-975.

Ludlow L.H., Haley S. (1996) Effect of context in rating of mobility activities in children with disabilities: an assessment using the pediatric evaluation of disability inventory. Educational and Psychological Measurement, 56, 122-129.

Luecht R.M. (1996) Multidimensional computerized adaptive testing in a certification or licensure context. Applied Psychological Measurement, 20, 389-404.

Lumley T. & McNamara T.F. (1995) Rater characteristics and rater bias: implications for training. Language Testing 12: 54-71.

Lumsden J. (1978) Tests are perfectly reliable. British Journal of Mathematical and Statistical Psychology 31:19-26.

Lunz M.E. & Stahl J.A. (1993, April) The effect of rater severity on person ability measures: A Rasch model analysis. American Journal of Occupational Therapy, 47(4), 311-317.

Lunz M.E. & Stahl, J.A. (1990) Judge consistency and severity across grading periods. Evaluation and the Health Professions, 13, 425-444.

Lunz M.E. & Stahl, J.A. (1993) The impact of examiners on candidate scores: An introduction to the use of multi-facet Rasch model analysis for oral examinations. Teaching and Learning in Medicine, 5, 3.

Lunz M.E. & Wright B.D. (1997) Latent Trait Models for Performance Examinations. In J. Rost & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences. Munster: Waxmann.

Lunz M.E. (2000) Setting standards on performance examinations. In M.R. Wilson & G. Engelhard Jr. (Eds), Objective measurement: Theory into practice (Vol. 5, pp. 181-199) Stamford, Connecticut: Ablex Publishing.

Lunz M.E., Stahl, J.A. & Wright, B.D. (1994) Interjudge reliability and decision reproducibility. Educational and Psychological Measurement, 54(4), 913-925.

Lunz M.E., Stahl, J.A. & Wright, B.D. (1996) The invariance of rater severity calibrations. In G. Engelhard, Jr. & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 99-112) Norwood, NJ: Ablex.

Lunz M.E., Wright B. & Linacre J. (1990) Measuring the impact of judge severity on examination scores. Applied Measurement in Education 3: 331-345.

Lunz M.E., Wright, B.D. & Linacre, J.M. (1990) Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331-345.

Lynch B. & McNamara T.F. (1998) Using g-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing 15: 158-180.

Müller-Schneider Thomas. (1993) Different Scaling Models - Different Findings? A Comparison of the Models According to Rasch and Mokken As Well As the Classical Test Construction (German), Zeitschrift für Soziologie, 22, 371-384

MacKnight C, Rockwood K. (2000) Rasch analysis of the hierarchical assessment of balance and mobility (HABAM). Journal of Clinical Epidemiology 53:1242-1247.

Macmillan/McGraw-Hill. (1993). Reflecting Diversity: Multicultural Guidelines for Educational Publishing Professionals. New York, NY.

Malec J.F., Buffington AL, Moessner AM, Degiorgio L. (2000) A medical/vocational case coordination system for persons with brain injury: an evaluation of employment outcomes. Archives of Physical Medicine & Rehabilitation 81:1007-1013.

Malec J.F., Moessner AM, Kragness M, Lezak MD. (2000) Refining a measure of brain injury sequelae to predict post-acute rehabilitation outcome: rating scale analysis of the Mayo-Portland Adaptability Inventory. Journal of Head Trauma Rehabilitation 13:670-682.

Malec J.F. (2001) Impact of comprehensive day treatment on societal participation for persons with acquired brain injury. Archives of Physical Medicine & Rehabilitation 82 :885-893.

Mallinson T., Mahaffey, L. & Kielhofner, G. (1998) The occupational performance history interview: Evidence for three underlying constructs of occupational adaptation. Canadian Journal of Occupational Therapy, 65(4), 219-228.

Madsen H.S. (1987) Utilizing Rasch analysis to detect cheating on language examinations. ERIC Document Reproduction Service No. ED 287 284.

Mantel N. & Haenszel, W. (1959) Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.

Maris E. (1995) Psychometric latent response models. Psychometrika, 60, 523-547.

Maris E., De Boeck, P., Van Mechelen, I. (1996) Probability matrix decomposition models. Psychometrika, 61, 7-29.

Marston D. (1989) A curriculum based measurement approach to assessing academic performance: What it is and why do it. In M.R. Shinn (Ed.), Curriculum-Based Measurement: Assessing special children (pp. 18-78) New York: Guilford Press.

Marston D.B. & Deno, S.L. (1981) The reliability of simple direct measures of written expression. (Research Report No. 50) Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.

Martinez-Martin P, Grandas F, Linazasoro G, Bravo JL (1999) Conversion to controlled-release levodopa/carbidopa treatment and quality of life as measured by the Nottingham Health Profile. The STAR Study Group. Neurologia 14:338-343.

Masters G.N. et al. (1990) Profiles of Learning: The Basic Skills Testing Program in New South Wales, 1989. Camberwell, Victoria, Australia: ACER.

Masters G.N. & Evans J. (1986) Banking non-dichotomously scored items. Applied Psychological Measurement, 10(4), 355-367.

Masters G.N. & Wright B.D. (1984) The essential process in a family of measurement models. Psychometrika, 49(4) 529-544.

Masters G.N. & Wright B.D. (1997) The partial credit model. In W.J. van der Linden and R.K. Hambleton (Eds.) Handbook of modern item response theory. (pp. 101-121) New York: Springer-Verlag.

Masters G.N. (1980) A Rasch model for rating scales. Dissertation Abstracts International, 41, 215A-216A.

Masters G.N. (1982) A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Masters G.N. (1984) Constructing an item bank using partial credit scoring. Journal of Educational Measurement, 21, 19-32.

Masters G.N. (1988) Measurement models for ordered response categories. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 11-29) New York: Plenum press.

Masters G.N. (1988) Partial credit model. In J.P.Keeves, (Ed.) Educational research, methodology and measurement: an international handbook. (pp. 292-297) Elmsford N.Y.: Pergamon Press.

Masters G.N. (1988) The analysis of partial credit scoring. Applied Measurement in Education, 1(4) 279-297.

Masters G.N. (1995) Scaling and Aggregation in IEA Studies (Technical report) University of California, Berkeley: Technical Advisory Committee, International Association for the Evaluation of Educational Achievement (IEA)

Masters G.N., Adams R.J. & Lokan J. (1994) Mapping student achievement. International Journal of Educational Research, 21(6), 595-610.

Masters G.N. (1985) Common-Person Equating with the Rasch Model. Applied Psychological Measurement, 9 (1), 73-82.

Matthews M. (1990) Skill taxonomies and problems for the testing of reading. Reading in a foreign language. 7(1): 511-517.

Maraun M.D. & Rossi, N.T. (2001) The extra-factor phenomenon revisited: unidimensional unfolding as quadratic factor analysis. Applied Psychological Measurement, 25, 77-87.

McArthur D.L. (1981) Bias in the writing of prose and its appraisal (CSE Report No. CSE-RR-162) Los Angeles, CA: Center for the Study of Evaluation. (ERIC Document Reproduction Service No. 217073)

McArthur D.L., Cohen, M.J. & Shandler, S.L. (1991) Rasch analysis of functional assessment scales : An example using pain behaviors. Archives of Physical Medicine and Rehabilitation. 72 : 296-304.

McBride J.R., Martin J.T. (1983) Reliability and Validity of Adaptive Ability Tests in a military setting. in Weiss D.J. (Ed.) "New Horizons in Testing" New York: Academic Press.

McColl M.A., Davies, D., Carlson, P., Johnston, J. & Minnes, P. (2001) The community integration measure: development and preliminary validation. Archives of Physical Medicine and Rehabilitation, 82(4), 429-34.

McCullagh P., Nelder, J.A. (1989) Generalized Linear Models (2nd Edition) New York: Chapman and Hall.

McDonald R.P. (1994) Testing for approximate dimensionality. In D. Laveault, B. Zumbo, M. Gessaroli & M. Boss (Eds.), Modern theories of measurement: Problems and issues (pp. 63-85) Ottawa, Canada, University of Ottawa Press.

McDonald R.P. (1997) Normal-ogive multidimensional model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer.

McDonald R.P. (1999) Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.

McDonald R.P. (2000) Test Theory: a unified treatment. Lawrence Erlbaum. ISBN: 0-8058-3075-8.

McDowell I. & Newell C. (1996) Measuring health: A guide to rating scales and questionnaires. 2nd edition. Oxford: Oxford University Press.

McHorney C.A., Haley, S.M. & Ware, J.E. (1997) Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10) Comparison of relative precision using Likert and Rasch scoring methods. Journal of Clinical Epidemiology. 50(4) : 451-461.

McLeod L.D. & Lewis, C. (1999) Detecting Item Memorization in the CAT Environment. Applied Psychological Measurement, 23, 147-160.

McLeod L.D., Swygert, K.A., Thissen, D. (2001) Factor analysis for items scored in two categories. In D. Thissen and H. Wainer (Eds.), Test Scoring, 189-216. Mahwah NJ: Lawrence Erlbaum Associates, Inc.

McNamara T.F. & Lumley T. (1997) The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings. Language Testing 14: 140-156.

McNamara T.F. (1996) Measuring second language performance. New York: Addison Wesley Longman.

Meijer R.R. & Sijtsma, K. (2001) Methodology review: Evaluating person fit. Applied Psychological Measurement, 25, 107-135.

Meijer R.R. (1996) Person-fit research: An introduction. (Guest editor's introduction to the Special Issue: Person-fit research: Theory and applications.) Applied Measurement in Education, 9, 3-8.

Meijer R.R., Molenaar I.W. & Sijtsma K. (1994) Item, test, person and group characteristics and their influence on nonparametric appropriateness measurement. Applied Psychological Measurement, 18, 111-120.

Meijer R.R., Sijtsma, K. & Smid, N.G. (1990) Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT. Applied Psychological Measurement, 14, 283-298.

Meisels S.J. (1992) Doing harm by doing good: Iatrogenic effects of early childhood. Early Childhood Research Quarterly, 7 (2), 155-174.

Mellenbergh G.J. (1995) Conceptual notes on models for discrete polytomous item responses. Applied Psychological Measurement, 19, 91-100.

Mellenbergh Gideon J. & Vijn, Pieter. (1981) The Rasch Model As a Loglinear Model, Applied Psychological Measurement, 5, 369-376,

Meredith W. & Horn, J. (2001) The role of factorial invariance in modeling growth and change. In Collins, L.M. & Sayer, A.G. (Eds.), New methods for the analysis of change. Washington, D.C.: American Psychological Association, pp. 203-XXX.

Meredith W. (1965) Some results based on a general stochastic model for mental tests. Psychometrika, 30, 419-440.

Messick S. (1989) Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed.) New York: American Council on Education/ Macmillan.

Messick S. (1989) Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103) New York: Macmillan.

Messick S. (1995) Validity of Psychological Assessment. American Psychologist, 50(9), 741-749.

Messick S. (1995) Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.

Michalewicz Z. (1994) Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag.

Michell J. (1986) Measurement scales and statistics: A clash of paradigms. Psychological Bulletin, 100, 398-407.

Michell J. (1990) An Introduction to the Logic of Psychological Measurement. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Michell J. (1997) Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.

Michell J. (1999) Measurement in psychology: a critical history of a methodological concept. Cambridge, Cambridge University Press.

Miller T., Reckase, R., Spray, J., Luecht, R. & Davey, T. (1996) Multidimensional item response theory. Iowa City IA: ACT Publications.

Miller T.R. & Hirsch, T.M. (1992) Cluster analysis of angular data in applications of multidimensional item response theory. Applied Measurement in Education, 5, 193-211.

Mills C.N. & Stocking M.L. (1996) Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9, 287-304.

Mislevy R.J. & Bock R.D., (1996) BILOG computer program. Scientific Software International.

Mislevy R.J. & Bock, R.D. (1990) BILOG (Version 3.11) Mooresville, IN: Scientific Software, Inc.

Mislevy R.J. & Bock, R.D. (1991) BILOG users' guide. Chicago: Scientific Software.

Mislevy R.J. & Chang H.H. (2000) Does adaptive testing violate local independence? Psychometrika, 65, 149-156.

Mislevy R.J. & Wu P.K. (1996) Missing responses and IRT ability estimation: omits, choice, time limits, and adaptive testing. ETS Research Report RR-96-30-ONR. Princeton NJ: Educational Testing Service.

Mislevy R.J. (1985) Estimation of latent group effects. Journal of the American Statistical Association, 80, 993-997.

Mislevy R.J. (1996) Test theory reconceived. Journal of Educational Measurement, 33, 379-416.

Mislevy R.J., Sheehan, K.M. (1989) The role of collateral information about examinees in item parameter estimation Psychometrika, 54, 661-679.

Mislevy Robert J. (1988) Exploiting Auxiliary Information About Items in the Estimation of Rasch Item Difficulty Parameters, Applied Psychological Measurement, 12, 281-296,

Mislevy, R.J. & Stocking, M.L. (1989) A consumer's guide to LOGIST and BILOG. Applied Psychological Measurement, 13, 57-75.

Mislevy, R.J., Beaton, A.E. & Kaplan, B. (1992) Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29 (2), 133-161.

Moeller Svend Kreiner. (1976) The Rasch-Weibull Process, Scandinavian Journal of Statistics, 3, 107-115,

Mokken R.J. & Lewis C. (1982) A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6, 417-430.

Mokken R.J. (1997) Nonparametric models for dichotomous items. In W.J. Van der Linden, R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 351-368) New York: Springer Verlag.

Molenaar I.W. & Hoijtink H. (1990) The many null distributions of person fit indices. Psychometrika, 55, 75-106.

Molenaar I.W. & Hoijtink H. (1996) Person fit and the Rasch model, with an application to knowledge of logical quantors. Applied Measurement in Education, 9, 27-45.

Molenaar I.W. (1991) A weighted Loevinger H-coefficient extending Mokken scaling to multicategory items. Kwantitatieve Methoden, 37, 97-117.

Molenaar I.W. (1997) Nonparametric methods for polytomous responses. In W.J. Van der Linden, R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 369-380) New York: Springer Verlag.

Molenaar Ivo W. & Hoijtink, Herbert. (1990) The Many Null Distributions of Person Fit Indices, Psychometrika, 55, 75-106,

Molenaar Ivo W. (1983) Some Improved Diagnostics for Failure of the Rasch Model, Psychometrika, 48, 49-72,

Molenaar Ivo W. (1992) Statistical Models for Educational Testing and Attitude Measurement, Statistical Modelling. Papers from the Sixth International Workshop on Statistical Modelling, Elsevier/North-Holland (New York; Amsterdam), 249-262,

Morales L., Reise S. & Hays R.D. (2000) Evaluating the equivalence of health care ratings by whites and Hispanics. Medical Care, 38, 517-527.

Morrow D. & Goertzen, S. (1986) A commentary on gender differences. Manitoba Department of Education, Winnipeg. Planning and Research Branch. (ERIC Document Reproduction Service No. ED 301 469)

Mueller Hans. (1987) A Rasch Model for Continuous Ratings, Psychometrika, 52, 165-181,

Munger G.E. & Loyd, B.H. (1991) Effect of speededness on test performance of handicapped and non-handicapped examinees. Journal of Educational Research, 85(1), 53-57.

Muraki E. & Bock, R.D. (1997) PARSCALE: IRT item analysis and test scoring for rating-scale data. Chicago: Scientific Software International.

Muraki E. (1990) Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59-71.

Muraki E. (1992) A generalized partial credit model: application of an EM algorithm. Applied Psychological Measurement 16, 159-176.

Muraki E. (1993) Information functions of the generalized partial credit model. Applied Psychological Measurement, 17, 351-363.

Muraki E., Carlson, J.E. (1995) Full-information Factor Analysis for Polytomous Item Responses. Applied Psychological Measurement, 19, 73-90.

Muraki Eiji. (1992) A Generalized Partial Credit Model: Application of An EM Algorithm, Applied Psychological Measurement, 16, 159-176,

Murray B. (1998, August) The latest techno tool: essay-grading computers. APA Monitor, p.43.

Myford C.M. & Mislevy, R.J. (1995) Monitoring and improving a portfolio assessment system (ETS Center for Performance Assessment Report No. MS 94-05) Princeton, NJ: Educational Testing Service.

Myford C.M. & Wolfe, E.W. (2001) Detecting and measuring rater effects using many-facet Rasch measurement: An instructional module. Manuscript submitted for publication.

Myford C.M., Marr, D.B. & Linacre, J.M. (1996) Reader calibration and its potential role in equating for the Test of Written English (ETS Center for Performance Assessment Report No. MS 95-02) Princeton, NJ: Educational Testing Service.

Narahara M. (1998) Kindergarten entrance age and academic achievement. Information Analyses. (ERIC Document Reproduction Service No. ED 421 218)

Narahara M. (1998) The effects of school entry age and gender on reading and math achievement scores of second grade students. Reports - Research. (ERIC Document Reproduction Service No. ED 421 233)

Nedelsky L. (1954) Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3-19.

Nering M.L. (1995) The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19, 121-129.

Nering M.L. (1997) The distribution of indexes of person fit within the computerized adaptive testing environment. Applied Psychological Measurement, 21, 115-127.

Neyman J. & Scott E.L. (1948) Consistent estimates based on partially consistent observations. Econometrica 16, 1-32.

Nicholls J.G. (1989) The competitive ethos and democratic education. Cambridge, Mass.: Harvard University Press.

Nichols P., Sugrue, B. (1999) The lack of fidelity between cognitively complex constructs and conventional test development practice. Educational Measurement: Issues and Practice, 18, 18-29.

Nichols S.F. Chipman, R.L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 103-125) Hillsdale, NJ: Lawrence Erlbaum Associates.

Noel Y. (1999) Recovering unimodal latent patterns of change by unfolding analysis: Applications to smoking cessation. Psychological Methods, 4, 173-191.

Nordenskiold U. (1997) Daily activities in women with rheumatoid arthritis. Aspects of patient education, assistive devices and methods for disability and impairment assessment. Scandinavian Journal of Rehabilitation Medicine. Supplement. 37 : 1-72.

Nordenskiold U., Grimby, G., Hedberg, F.M., Wright, B. & Linacre, J.M. (1996) The structure of an instrument for assessing the effects of assistive devices and altered working methods in women with rheumatoid arthritis. Arthritis Care and Research. 9(5) :358-367.

Norris J.M. & Ortega L. (2000) Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. To appear in Language Learning, 50(3).

Norusis M. (1990) SPSS Introductory Statistics Student Guide, Chicago: SPSS Inc.

Nunnally J.C. & Bernstein I. (1994) Psychometric Theory. 3rd Edition. McGraw-Hill. ISBN: 0-07-047849-X.

Nunnally J.C. (1979) Psychometric Theory. 2nd Edition. McGraw-Hill. ISBN: 0-07-047465-6.

Nunnally J.C., Lemond, L.C. & Wilson, W.H. (1977) Studies of voluntary visual attention: Theory, methods, and psychometric issues. Applied Psychological Measurement, 1(2), 203-218.

Nunnally J.C. & Koplin J.H. (1967) The effects of word-relatedness on learning. Educational Document Reproduction Service No. ED 016214

O'Brien M.L. (1992) Using Rasch procedures to understand psychometric structure in measures of personality. In M. Wilson (Ed.) Objective measurement: theory into practice. (pp. 61-76) Norwood, NJ: Ablex Publishing Corporation.

O'Connell M.A., Belanger B.A., Haaland P.D. (1993) Calibration and assay development using the four-parameter logistic model. Chemometrics and Intelligent Laboratory Systems, 20, 97-114.

O'Neill T.R. & Lunz, M.E. (2000) A method to study rater severity across several administrations. In M. Wilson & G. Engelhard, Jr. (Eds.), Objective Measurement: Theory into Practice (Vol. 5, pp. 135-146) Stamford, CT: Ablex.

Orlando M. & Thissen, D. (2000) Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50-64.

Orlando M., Sherbourne, C.D. & Thissen, D. (2000) Summed-score linking using item response theory: Application to depression measurement. Psychological Assessment, 12(3), 354-359.

Owen R.J. (1975) A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.

Owens A.M. (2001, March 1) Boys should start kindergarten a year later than girls, report advises. National Post Online. Retrieved March 16, 2001 from the World Wide Web: www.nationalpost.com/search/story.html

Page E.B. & Petersen, N.S. (1995) The computer moves into essay grading: Updating the ancient test. Phi Delta Kappan, 76, 561-565.

Page E.B. (1966) The imminence of grading essays by computer. Phi Delta Kappan, 48, 238-243.

Page E.B. (1968) Analyzing student essays by computer. International Review of Education, 14, 210-225.

Palmer D., Kays, M., Smith, A. & Doig, B. (1994) Stop! Look and Lesson. Camberwell: The Australian Council for Educational Research.

Pastor D.A., Dodd, B.G. & Chang, H.H. (2002) A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model. Applied Psychological Measurement, 26 (2), 147-163.

Patz R.J., Junker, B.W. (1999a) A straightforward approach to Markov Chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146-178.

Patz R.J., Junker, B.W. (1999b) Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, xxx-xxx.

Patz R.J., Junker, B.W., Lerch, F.J., Huguenard, B.R. (1996) Analyzing small psychological experiments with item response models (CMU Statistics Department technical report #644) [Online]. Available: www.stat.cmu.edu/cmu-stats/tr/tr644/tr644.html. Accessed 28 April 2000.

Perline R., B.D. Wright, et al. (1979) "The Rasch model as additive conjoint measurement." Applied Psychological Measurement 3(2): 237-255.

Perline R., Wright, B.D. & Wainer, H. (1979) The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237-255.

Petersen N.S., Kolen, M.J. & Hoover, H.D. (1989) Scaling, norming, and equating. In R.L. Linn (Ed.), Educational Measurement, Third Edition (pp. 221-262)

Petersen N.S., Kolen, M.J. & Hoover, H.D. (1989) Scaling, norming, and equating. In R.L. Linn (Ed.), Educational Measurement (3rd ed., pp. 221-262) New York: Macmillan.

Peterson S. & Bainbridge, J. (1999) Teachers' gendered expectations and their evaluation of student writing. Reading Research and Instruction, 38(3), 255-271.

Pfanzagl J. (1993) On the Consistency of Conditional Maximum Likelihood Estimators, Annals of the Institute of Statistical Mathematics, 45, 703-719,

Pfanzagl J. (1994) On item parameter estimation in certain latent trait models. In G.H. Fischer & D. Laming (Eds.) Contributions to Mathematical Psychology, Psychometrics and Methodology. New York: Springer Verlag.

Phillips A., Holland P.W. (1986) A new estimator of the variance of the Mantel-Haenszel Log-Odds-Ratio Estimator. Technical report no. 86-67. Princeton NJ: Educational Testing Service.

Phillips Gary W. & Gedeik, Sandra S. (1984) RKAPPA: Reliability of Mastery Tests: An Application of the Rasch Model, Applied Psychological Measurement, 8, 286-286,

Pimentel F.L., Maia-Gonçalves, J.P., Mesquita, N.F., Mateus, P., Alvarez, P., Roman, P. & Melon, J. (1998) Influence of patient clinical characteristics in quality of life measured by Rasch model in cancer patients: A Portuguese experience. Quality of Life Research, 7, 649.

Post W.J. (1992) Nonparametric unfolding models. A latent structure approach. Leiden: DSWO Press, Leiden University, The Netherlands.

Post W.J., Snijders, T.A.B. (1993) Nonparametric unfolding models for dichotomous data. Methodika, 7, 130-156.

Powers D.E., Fowles, M.E. & Welsh, C.K. (1999) Further validation of a writing assessment for graduate admissions. (GRE Board Research Report No. 96-13R and ETS Research Report 99-18) Princeton, NJ: Educational Testing Service.

Prieto L., Alonso J., Lamarca R., Wright B.D. (1998) Rasch measurement for reducing the items of the Nottingham Health Profile. Journal of Outcome Measurement, 2(4):285-301.

Prieto L., Alonso, J., Ferrer, M. & Anto, J.M. (1997) Are results of the SF-36 Health Survey and the Nottingham Health Profile similar? A comparison in COPD patients. Journal of Clinical Epidemiology, 50(4), 463-473.

Pula J.J. & Huot, B.A. (1993) A model of background influences on holistic raters. In M.M. Williamson & B.A. Huot (Eds.), Validating Holistic Scoring for Writing Assessment: Theoretical and Empirical Foundations (pp.237-265) Cresskill, NJ: Hampton Press.

Raczek A.E., Ware J.E., Bjorner J.B., et al. (1998) Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: results from the IQOLA Project. International Quality of Life Assessment. Journal of Clinical Epidemiology, 51, 1203-1214.

Raju N.S., van der Linden, W.J. & Fleer, P.F. (1995) IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353-368.

Ramsay J.O. (1989) A Comparison of Three Simple Test Theory Models, Psychometrika, 54, 487-499,

Ramsay J.O. (1991) Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611-630.

Ramsay J.O. (1995) A similarity-based smoothing approach to nondimensional item analysis. Psychometrika, 60, 323-339.

Ramsay J.O. (1996) A geometrical approach to item response theory. Behaviormetrika, 23, 3-17.

Rasch G. (1960, 1980, 1992) Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research. (Reprinted by the University of Chicago Press, 1980)

Rasch G. (1961) On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 4, 321-333.

Rasch G. (1966) An individualistic approach to item analysis. In P.F. Lazarsfeld & N.W. Henry (Eds.), Readings in mathematical social science (pp. 89-107) Chicago, IL: Science Research Associates, Inc.

Rasch G. (1977) On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58-94.

Raymond M.R. & Viswesvaran, C. (1993) Least-squares models to correct for rater effects in performance assessment. Journal of Educational Measurement, 30(3), 253-268.

Raymond M.R. (1986) Missing data in evaluation research. Evaluation and the Health Professions, 9, 395-420.

Raymond M.R., Webb, L.C. & Houston, W.M. (1991) Correcting performance-rating errors in oral examinations. Evaluation and the Health Professions, 14(1), 100-122.

Reckase M.D. & McKinley, R.L. (1991) The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361-373.

Reckase M.D. (1974) An interactive computer program for tailored testing based on the one-parameter logistic model. Behavior Research Methods and Instrumentation, 6(2), 208-212.

Reckase M.D. (1979) Unifactor latent trait models applied to multi-factor tests: Results and implications. Journal of Educational Statistics, 4, 207-230.

Reckase M.D. (1985) The difficulty of items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.

Reckase M.D. (1985) The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.

Reckase M.D. (1997) A linear logistic multidimensional model for dichotomous item response data. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer.

Reise S.P. (1995) Scoring method and the detection of person misfit in a personality assessment context. Applied Psychological Measurement, 19, 213-229.

Reise S.P. & Due A.M. (1991) The influence of test characteristics on the detection of aberrant response patterns. Applied Psychological Measurement, 15, 217-226.

Reise S.P. & Due, A.M. (1991) Test characteristics and their influence on the detection of aberrant response patterns. Applied Psychological Measurement, 15, 217-226.

Reise S.P. & Waller N.G. (1993) Traitedness and the assessment of response pattern scalability. Journal of Personality and Social Psychology, 65, 143-151.

Resnick L.B., Resnick, D.P. (1992) Assessing the thinking curriculum: new tools for educational reform. In B.R. Gifford, M.C. O'Connor (Eds.), Changing assessments: alternative views of aptitude, achievement, and instruction (pp 37-75) Norwell, MA: Kluwer Academic Publishers.

Revicki D.A. & Cella D.F. (1997, Aug) Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Quality of Life Research, 6(6), 595-600.

Revuelta J. & Ponsoda, V. (1998) A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311-327.

Richardson J. (1994) Cost utility analysis: What should be measured? Social Science & Medicine, 39(1), 7-21.

Rigdon S.E., Tsutakawa, R.K. (1983) Parameter estimation in latent trait models. Psychometrika, 48, 567-574.

Rigdon Steven E. & Tsutakawa, Robert K. (1983) Parameter Estimation in Latent Trait Models, Psychometrika, 48, 567-574,

Rigdon Steven E. & Tsutakawa, Robert K. (1987) Estimation for the Rasch Model When Both Ability and Difficulty Parameters Are Random, Journal of Educational Statistics, 12, 76-86,

Roberts J.S. & Laughlin, J.E. (1996) A unidimensional item response model for unfolding responses from a graded disagree-agree response scale. Applied Psychological Measurement, 20, 231-255.

Roberts J.S. (2001b) GGUM2000: Estimation of parameters in the generalized graded unfolding model. Applied Psychological Measurement, 25, 38.

Roberts J.S., Donoghue, J.R. & Laughlin, J.E. (2000) A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3-32.

Roberts J.S., Donoghue, J.R. & Laughlin, J.E. (2002) Characteristics of MMLE/EAP parameter estimates in the generalized graded unfolding model. Applied Psychological Measurement, 26, 192-207.

Roberts J.S., Laughlin, J.E. & Wedell, D.H. (1999) Validity issues in the Likert and Thurstone approaches to attitude measurement. Educational and Psychological Measurement, 59, 211-233.

Roberts J.S., Lin, Y. & Laughlin, J.E. (2001) Computerized adaptive testing with the generalized graded unfolding model. Applied Psychological Measurement, 25, 177-196.

Robertson T., Wright, F.T., Dykstra, R.L. (1988) Order restricted statistical inference. New York: Wiley.

Rojas A.J. (1998) Aplicacion del Modelo de Credito Parcial y Modelo de Escalas de Clasificacion a la medicion de actitudes. Almeria: Servicio de Publicaciones de la Universidad de Almeria. [Edition CD-ROM].

Rosenbaum P.R. (1984) Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425-435.

Rosenbaum P.R. (1985) Comparing distributions of item responses for two groups. British Journal of Mathematical and Statistical Psychology, 38, 206-215.

Rosenbaum P.R. (1987a) Probability inequalities for latent scales. British Journal of Mathematical and Statistical Psychology, 40, 157-168.

Rosenbaum P.R. (1987b) Comparing item characteristic curves. Psychometrika, 52, 217-233.

Roskam Edward E. & Jansen, Paul G.W. (1989) Conditions for Rasch-dichotomizability of the Unidimensional Polytomous Rasch Model, Psychometrika, 54, 317-332,

Ross J. & Cliff, N. (1964) A generalization of the interpoint distance model. Psychometrika, 29, 167-176.

Ross S. (1992) Accommodative questions in oral proficiency interviews. Language Testing 9: 173-176.

Rost Jürgen. (1985) A Latent Class Model for Rating Data, Psychometrika, 50, 37-49,

Rost Jürgen. (1989) Rasch Models and Latent Class Models for Measuring Change With Ordinal Variables, Multiway Data Analysis, North-Holland/Elsevier (Amsterdam; New York), 473-483

Rost Jürgen. (1990) Rasch Models in Latent Classes: An Integration of Two Approaches to Item Analysis, Applied Psychological Measurement, 14, 271-282

Roth E.J., Heinemann, A.W., Lovell, L.L., Harvey, R.L., McGuire, J.R. & Diaz, S. (1998) Impairment and disability : Their relation during stroke rehabilitation. Archives of Physical Medicine and Rehabilitation. 79 : 329-335.

Roussos L.A., Stout, W. & Marden, J. (1998) Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1-30.

Rubin D.B. (1987) Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Rudner L.M. (1992) Reducing errors due to the use of judges. ERIC/TM Digest. (Report EDO-TM-92-10) Washington, DC: American Institutes for Research. (ERIC Document Reproduction Service No. ED355254)

Rudner Lawrence M. (1998) An On-line, Interactive, Computer Adaptive Testing Mini-Tutorial, ericae.net/scripts/cat

Ruiz R., Ortiz, R. & Alvarez, P. (2001) Dry bean cultivar characterisation by isoelectric focusing electrophoresis in polyacrylamide gel. Journal of the Science of Food and Agriculture 81, 1126-1131.

Ryan J.T., Williams, J.S. & Doig, B.A. (1998) National tests: Educating teachers about their children's mathematical thinking. In A. Olivier & K. Newstead (Eds) Proceedings of the Twenty-second Conference of the International Group for the Psychology of Mathematics Education. (Vol. IV, pp. 81-88) Stellenbosch, South Africa: University of Stellenbosch.

Saaty T.L. & Vargas, L.G. (1984) Comparison of eigenvalue, logarithmic least squares and least squares methods in estimating ratios. Mathematical Modeling, 5, 309-324.

Saaty T.L. (1990) Eigenvector and logarithmic least squares. European Journal of Operational Research. 48, 156-160.

Saaty T.L. (1996) Multicriteria decision making: The analytic hierarchy process. Pittsburgh, PA: RWS Publications.

Samejima F. (1969) Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4), 100-114.

Samejima F. (1969) Estimation of latent trait ability using a response pattern of graded scores. Psychometrika, Monograph Supplement No. 17.

Samejima F. (1972) A general model for free-response data. Psychometrika, Monograph Supplement No. 18.

Samejima F. (1995) Acceleration model in the heterogeneous case of the general graded response model. Psychometrika, 60, 549-572.

Samejima F. (1997) Departure from normal assumptions: a promise for future psychometrics with substantive mathematical modeling. Psychometrika, 62, 471-493.

Sands W.A., Waters, B.K. & McBride, J.R. (1997) Computerized Adaptive Testing : From Inquiry to Operation. Washington, DC: American Psychological Association.

Sawilowsky S.S. (2000) Psychometrics versus datametrics: Comment on Vacha-Haase's "reliability generalization" method and some EPM editorial policies. Educational and Psychological Measurement, 60, 157-173.

Schafer J.L. (1997) Analysis of incomplete multivariate data. New York: Chapman and Hall.

Schaubroeck J. & Green, S.G. (1989) Confirmatory factor analytic procedures for assessing change during organizational entry. Journal of Applied Psychology, 74, 892-900.

Scheiblechner H. (1995) Isotonic ordinal probabilistic models (ISOP) Psychometrika, 60, 281-304.

Schmitt N., Cortina J.M. & Whitney D.J. (1993) Appropriateness fit and criterion-related validity. Applied Psychological Measurement, 17, 143-150.

Schoonman W. (1989) An applied study on computerized adaptive testing. Rockland MA: Swets & Zeitlinger.

Schumacker R.E. & Lomax, R.G. (1996) A beginner's guide to structural equation modeling. Mahwah, NJ: Lawrence Erlbaum Associates.

Scott J. (1999, January 31) Looking for the tidy mind, alas. The New York Times.

Segal M.E., Heinemann, A.W., Schall, R.R. & Wright, B.D. (1997) Rasch analysis of a brief physical ability scale for long-term outcomes of stroke. Physical medicine and rehabilitation : State of the Art Reviews. 11(2) : 385-396.

Segall D.O. (1996) Multidimensional adaptive testing. Psychometrika, 61, 331-354.

Segall D.O. (2000) Principles of Multidimensional Adaptive Testing. In W.J. van der Linden & C.A.W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 53-57) Dordrecht, The Netherlands: Kluwer Academic Publishers.

Shealy R. & Stout, W. (1993) A model-based standardization approach that separates true bias / DIF from group ability differences and detects test bias / DTF as well as item bias / DIF. Psychometrika, 58, 159-194.

Shepard L.A. (1984) Setting performance standards. In R.A. Berk (Ed.) A Guide to criterion-referenced test construction. Baltimore: Johns Hopkins Press.

Shohamy E. (1983) The stability of oral proficiency assessment on the oral interview testing procedures. Language Learning 33: 527-540.

Shohamy E. (1994) The validity of direct versus semi-direct oral tests. Language Testing 11: 99-123.

Shohamy E., Gordon, C.M. & Kraemer, R. (1992) The effect of raters' background and training on the reliability of direct writing tests. The Modern Language Journal, 76, 27-33.

Shohamy E., Reves E. & Bejarano Y. (1986) Introducing a new comprehensive test of oral proficiency. ELT Journal 40: 212-220.

Shute V.J., Psotka, J. (1996) Intelligent tutoring systems: Past, Present and Future. In D. Jonassen (Ed.),

Siegmund D. (1985) Sequential Analysis: Tests and Confidence Intervals. New York: Springer-Verlag.

Sijtsma K. & Hemker B.T. (1998) Nonparametric polytomous IRT models for invariant item ordering, with results for parametric models. Psychometrika, 63, 183-200.

Sijtsma K. & Hemker B.T. (2000) A taxonomy for ordering persons and items using simple sum scores. Journal of Educational and Behavioral Statistics, 25, 391-415.

Sijtsma K. & Junker B.W. (1996) A survey of theory and methods of invariant item ordering. British Journal of Mathematical and Statistical Psychology, 49, 79-105.

Sijtsma K. & Van der Ark L.A. (2001) Progress in NIRT analysis of polytomous item scores: Dilemmas and practical solutions. In A. Boomsma, M.A.J. van Duijn & T.A.B. Snijders (Eds.), Essays on item response theory (pp. 297-318) New York: Springer.

Sijtsma K. & Verweij A.C. (1999) Knowledge of solution strategies and IRT modeling of items for transitive reasoning. Applied Psychological Measurement, 23, 55-68.

Sijtsma K. (1998) Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22, 3-31.

Sijtsma K. (1998) Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22, 3-32.

Sijtsma K., Junker, B.W. (1997) Invariant item ordering of transitive reasoning tasks. In J. Rost, R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 97-107) Munster: Waxmann Verlag.

Sijtsma K., Van der Ark, L.A. (this volume) Progress in IRT analysis of polytomous item scores: dilemmas and practical solutions. In A. Boomsma, T. Snijders, M. Van Duijn (Eds.), Essays in Item Response Modeling (pp. xxx-xxx) New York: Springer-Verlag.

Silverstein B., Fisher, W.P., Kilgore, K.M., Harley, J.P. & Harvey, R.F. (1992) Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Archives of Physical Medicine and Rehabilitation, 73, 507-518.

Smedts Diana M.P. (1987) The Rasch Model: Towards An Alternative Process of Item Selection (German), Tijdschrift Voor Onderwijs Research, 12, 355-364

Smith R.M. & Kramer, G.A. (1992) A comparison of two methods of test equating in the Rasch model. Educational and Psychological Measurement, 52, 835-847.

Smith R.M. (1997) Outcome measurement. First international outcome measurement conference, co-sponsored by rehabilitation foundation, Inc., and the MESA Psychometric Laboratory at the University of Chicago. Physical Medicine and Rehabilitation : State of the Art Reviews. June ; 11(2) : ix-x, 261-424.

Smith R.M. (1997) The relationship between goals and functional status in the Patient Evaluation and Conference System. Physical Medicine and Rehabilitation : State of the Art Reviews. June ; 11(2) : 333-343.

Smith Richard M. (1985) A Comparison of Rasch Person Analysis and Robust Estimators, Educational and Psychological Measurement, 45, 433-444

Smith Richard M. (1988) The Distributional Properties of Rasch Standardized Residuals, Educational and Psychological Measurement, 48, 657-667

Smith Richard M. (1994) A Comparison of the Power of Rasch Total and Between-item Fit Statistics to Detect Measurement Disturbances, Educational and Psychological Measurement, 54, 42-55

Snijders T. (2000) Asymptotic distribution of person-fit statistics with estimated person parameter. Psychometrika.

Snijders T.A.B. (1997) Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75-100.

Snijders T.A.B. (this volume) Two-level non-parametric scaling for dichotomous data. In A. Boomsma, T. Snijders, M. Van Duijn (Eds.), Essays in Item Response Modeling (pp. xxx-xxx) New York: Springer-Verlag.

Spearman C. (1904) "General intelligence" objectively determined and measured. American Journal of Psychology, 15, 201-293.

Spector P.E. (1985) Measurement of human service staff satisfaction: Development of the job satisfaction survey. American Journal of Community Psychology, 13(6), 693-713.

Spiel C. & Gluck, J. (1998) Item response models for assessing change in dichotomous items. International Journal of Behavioral Development, 22, 517-536.

Stankov L., Cregan A. (1993) Quantitative and Qualitative properties of an intelligence test: series completion. Learning and Individual Differences, 5, 2, 137-169.

Stansfield C. & Kenyon D. (1992) Research on the comparability of the oral proficiency interview and the simulated oral proficiency interview. System 20: 347-362.

Stegelmann W. (1983) Expanding the Rasch model to a general model having more than one dimension. Psychometrika, 48, 259-267.

Stegelmann Werner. (1983) Expanding the Rasch Model to a General Model Having More Than One Dimension, Psychometrika, 48, 259-267

Stenson Herbert H. (1986) TESTAT: Test Analysis for the PC and VAX, Psychometrika, 51, 615-616,

Stevens S.S. (1939) On the problem of scales for the measurement of psychological magnitudes. J. Unified Sci., 9, 94-99.

Stocking M.L. & Lewis C. (1998) Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57-75.

Stocking M.L. & Lord, F.M. (1983) Developing a common metric in item response theory. Psychological Bulletin, 99, 118-128.

Stocking M.L. & Swanson L. (1998) Optimal design of item banks for computerized adaptive tests. Applied Psychological Measurement, 22, 271-279.

Stocking, M.L. & Lord, F.M. (1983) Developing a common metric in item response theory. Applied Psychological Measurement, 7 (2), 201-210.

Stone G.E. A standard vision. Popular Measurement: Journal of the Institute for Objective Measurement, 3, 40-41.

Stone M. & Wright, B.D. (1988) Separation statistics in Rasch measurement (Research Memorandum No. 51) Chicago: MESA Press.

Stout W. & Roussos, L. (1996) SIBTEST manual. Statistical Laboratory for Educational and Psychological Measurement. University of Illinois at Urbana-Champaign.

Stout W. (1987) A non-parametric approach for assessing latent trait unidimensionality. Psychometrika, 52, 589-617.

Stout W. (1990) A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55(2), 293-325.

Stout W. (1990) DETECT and DIMTEST manual. Statistical Laboratory for Educational and Psychological Measurement. University of Illinois at Urbana-Champaign.

Stout W.F. (1987) A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52, 589-617.

Stout W.F. (1990) A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293-325.

Stout W.F., Habing, B., Douglas, J., Kim, H.R., Roussos, L., Zhang, J. (1996) Conditional covariance-based nonparametric multidimensionality assessment. Applied Psychological Measurement, 20, 331-354.

Streiner D.L. & Norman G.R. (1995) Health Measurement Scales: A Practical Guide to their Development and Use, 2nd edition. New York: Oxford University Press.

Stucki G., Daltroy, L., Katz, J.N., Johannesson, M. & Liang, M.H. (1996) Interpretation of change scores in ordinal clinical scales and health status measures : The whole may not equal the sum of the parts. Journal of Clinical Epidemiology. 49(7) : 711-717.

Suanthong S., Schumacker, R.E. & Beyerlein, M.M (2000) An investigation of factors affecting test equating in latent trait theory. Journal of Applied Measurement, 1(1), 25-43.

Suen H.K. (1990) Principles of Test Theory. Lawrence Erlbaum. ISBN: 0-8058-0198-7.

Swaminathan H. (1999) Latent trait measurement models. In G.N. Masters & J.P. Keeves (Eds.), Advances in Measurement in Educational Research and Assessment, pp. 43-54. Amsterdam: Pergamon.

Swaminathan Hariharan and Gifford, Janice A. (1982) Bayesian Estimation in the Rasch Model, Journal of Educational Statistics, 7, 175-191.

Swanson D.B., Dillon G.F. & Ross L.P. Setting content-based standards for National Board exams: initial research for the comprehensive Part I Examination. Academic Medicine: 65, S17-18.

Sympson J.B. & Hetter R.D. (1985) Controlling item-exposure rates in computerized adaptive testing. Proceedings of the 27th Annual Meeting of the Military Testing Association (pp. 973-977) San Diego, CA: Navy Personnel Research and Development Center.

Tanaka J.S. & Huba G.J. (1985) A fit index for covariance structure models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 38, 197-201.

Tanner M.A. (1996) Tools for statistical inference: methods for the exploration of posterior distributions and likelihood functions. 3rd Edition. New York: Springer-Verlag.

Taris T.W. (2000) A primer in longitudinal data analysis. Thousand Oaks, CA: Sage.

Tatsuoka K.K. (1984) Caution indices based on item response theory. Psychometrika, 49, 95-110.

Tatsuoka K.K. (1985) A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10, 55-73.

Tatsuoka K.K. (1990) Toward an integration of item response theory and cognitive error diagnosis. In N.

Tatsuoka K.K. (1995) Architecture of knowledge structures and cognitive diagnosis: a statistical pattern recognition and classification approach. In P.D. Nichols, S.F. Chipman, R.L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 327-359) Hillsdale, NJ: Lawrence Erlbaum Associates.

Tennant A. & Young, C. (1997) Coma to community : Continuity in measurement. Physical medicine and rehabilitation : State of the Art Reviews. 11(2) : 375-384.

Tennant A., Geddes, J.M.L. & Chamberlain, M.A. (1996) The Barthel Index : an ordinal score or interval level measure? Clinical Rehabilitation. 10 : 301-308.

Tennant A., Hilmann, M., Fear, J., Pickering, A. & Chamberlain, M.A. (1996) Are we making the most of the Stanford Health Assessment Questionnaire? British Journal of Rheumatology. 35 : 574-578.

Ter Hofstede F., Steenkamp J.-B.E.M., Wedel M. (1999) Identifying spatially contiguous international target markets. Manuscript submitted for publication.

Thissen D. (1982) Marginal Maximum Likelihood Estimation for the One-parameter Logistic Model, Psychometrika, 47, 175-186.

Thissen D. & Steinberg L. (1986) A taxonomy of item response models. Psychometrika, 51, 567-577.

Thissen D. & Wainer H. (1982) Some standard errors in item response theory. Psychometrika, 47, 397-412.

Thissen D., Pommerich, M., Billeaud, K. & Williams, V.S.L. (1995) Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39-49.

Thompson B. & Vacha-Haase, T. (2000) Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174-195.

Thorndike E.L. (1904) An introduction to the theory of mental and social measurements. New York: Teachers College.

Thorndike R.L. (1971) Concepts of culture-fairness. Journal of Educational Measurement, 8, 63-70.

Thurstone L.L. (1925) A method of scaling psychological and educational tests. Journal of Educational Psychology, 16, 433-451.

Tindal G., Marston, D. & Deno, S.L. (1983) The reliability of direct and repeated measurement (Research Report No. 109) Minneapolis, MN: University of Minnesota Institute for Research on Learning Disabilities.

Tinsley Howard E.A. & Dawis, Rene V. (1975) An Investigation of the Rasch Simple Logistic Model: Sample Free Item and Test Calibration, Educational and Psychological Measurement, 35, 325-340

Tjur Tue. (1982) A Connection Between Rasch's Item Analysis Model and a Multiplicative Poisson Model, Scandinavian Journal of Statistics, 9, 23-30.

Traub R.E. (1983) A priori considerations in choosing an item response model. In R.K. Hambleton (Ed.), Applications of item response theory (pp. 57-70).

Tsuji T, Liu M, Sonoda S, Domes K, Chino N. (2000) The stroke impairment assessment set: its internal consistency and predictive validity. Archives of Physical Medicine & Rehabilitation 81:863-868.

Tuerlinckx F. & De Boeck, P. (2001) The effect of ignoring item interactions on the estimated discrimination parameters in Item Response Theory. Psychological Methods, 6(2), 181-195.

Tutz G. (1990) Sequential item response models with an ordered response. British Journal of Mathematical and Statistical Psychology, 43, 39-55.

Tutz G. (1997) Sequential models for ordered responses. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 139-152) New York: Springer.

Tutz Gerhard. (1985) Categorical Response Models and Multiple Regression With Dummy Variables (German), Archiv für Psychologie, 137, 99-114.

Ullman S., Karabatsos G., Koss M. (1999) Alcohol and sexual assault for a national sample of college men. Psychology of Women Quarterly, 23, 673-689.

Ullman S., Karabatsos G., Koss M. (1999) Alcohol and sexual assault for a national sample of college women. Journal of Interpersonal Violence, 14, 6, 603-625.

Upshur J. & Turner C. (1999) Systematic effects in the rating of second-language speaking ability: Test method and learner discourse. Language Testing 16: 82-111.

van de Vijver, Fons J.R. (1988) Systematizing the Item Content in Test Design, Latent Trait and Latent Class Models, Plenum (New York; London), 291-307.

Van den Wollenberg A.L., Wierda F.W. & Janssen P.G.W. (1988) Consistency of Rasch model parameter estimation: a simulation study. Applied Psychological Measurement 12, 307-313.

Van den Wollenberg, A.L. (1982) Two new test statistics for the Rasch model. Psychometrika, 47, 123-140.

van den Wollenberg, Arnold L. (1982) A Simple and Effective Method to Test the Dimensionality Axiom of the Rasch Model, Applied Psychological Measurement, 6, 83-91

van den Wollenberg, Arnold L. (1982) Two New Test Statistics for the Rasch Model, Psychometrika, 47, 123-140

Van Der Flier H. (1982) Deviant response patterns and comparability of tests scores. Journal of Cross-Cultural Psychology, 13, 267-298.

van der Linden W.J. & Eggen, J.H.M. (1986) An empirical Bayesian approach to item banking. Applied Psychological Measurement, 10(4), 345-354.

Van der Linden W.J. (1994) Fundamental Measurement and the Fundamentals of Rasch Measurement. In M. Wilson (ed.) Objective Measurement: Theory into Practice Vol.2. Ablex Publishing Corp. ISBN: 0-89381-843-1.

Van der Linden W.J. (1998) Bayesian item selection criteria for adaptive testing. Psychometrika, 63, 201-216.

van der Linden W.J. (1999) Multidimensional Adaptive testing with a minimum error-variance criterion. Journal of Educational and Behavioral Statistics, 24, 398-412.

van der Linden W.J. (2000) Constrained Adaptive Testing with Shadow Tests. In W.J. van der Linden and C.A.W. Glas (eds.), Computerized Adaptive Testing: Theory and practice, 27-52. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Van der Linden, W.J., Hambleton, R.K. (eds.) (1997) Handbook of modern item response theory. New York: Springer-Verlag.

Van der Ven A.H.G.S. & Ellis, J.L. (2000) A Rasch Analysis of Raven's Standard Progressive Matrices. Personality and Individual Differences, 29 (1), 45-64.

Van Krimpen-Stoop E.M.L.A. & Meijer, R.R. (2000) Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items. Research Report 01, University of Twente, The Netherlands.

Van Lier L. (1989) Reeling, writhing, drawling, stretching, and fainting in coils: Oral proficiency interviews as conversation. TESOL Quarterly 23: 489-508.

van Schuur W.H. & Kiers, H.A.L. (1994) Why factor analysis is often the incorrect model for analyzing bipolar concepts, and what model can be used instead. Applied Psychological Measurement, 18, 97-110.

van Schuur W.H. (1984) Structure in political beliefs: A new model for stochastic unfolding with application to European party activists. Amsterdam: CT Press.

Van Schuur W.H. (1993) Nonparametric unidimensional unfolding for multicategory data. In J.R. Freeman (Ed.), Political analysis (Vol. 4, pp 41-74) Ann Arbor, MI: University of Michigan Press.

VanLehn K., Niu, Z. (????) Bayesian student modeling, user interfaces and feedback: A sensitivity analysis. International Journal of Artificial Intelligence in Education.

VanLehn K., Niu, Z., Siler, S., Gertner, A. (1998) Student modeling from conventional test data: a Bayesian approach without priors. In B.P. Goetl, H.M. Halff, C.L. Redfield, V.J. Shute (Eds.), Proceedings of the Intelligent Tutoring Systems Fourth International Conference, ITS 98 (pp. 434-443) Berlin: Springer-Verlag.

Velozo C.A., Kielhofner G. & Lai J.S. (1999, Jan-Feb) The use of Rasch analysis to produce scale-free measurement of functional ability. American Journal of Occupational Therapy, 53(1), 83-90.

Velozo C.A., Magalhaes L.C., Pan A.-W. & Leiter P. (1995) Functional scale discrimination at admission and discharge: Rasch analysis of the Level of Rehabilitation Scale-III. Archives of Physical Medicine and Rehabilitation, 76(8), 705-712.

Velozo C.A., Magalhaes, L.C., Pan, A. & Leiter, P. (1995) Functional scale discrimination at admission and discharge : Rasch analysis of the Level of Rehabilitation Scale-III. Archives of Physical Medicine and Rehabilitation. 76 : 705-712.

Velozo CA, Kielhofner G, Lai JS. (1999) The use of Rasch analysis to produce scale-free measurement of functional ability. American Journal of Occupational Therapy 53:83-90.

Ventana J., Antequera, T., Ruiz, J., Cava, R., and Alvarez, P. (1996) Measuring Sensorial Quality of Iberian Ham by Rasch Model. Journal of Food Quality, 19, 397-412.

Verguts T, De Boeck P. (2002) Some Mantel-Haenszel tests of Rasch model assumptions. British Journal of Mathematical & Statistical Psychology 34:21-37.

Verguts T. & De Boeck, P. (2000) A note on the Martin-Löf test for unidimensionality. MPR-online, 5(1), 77-82. (Internet www.mpr-online.de)

Verhelst N. & Molenaar, I.W. (1988) Logit Based Parameter Estimation in the Rasch Model, Statistica Neerlandica, 42, 273-295.

Verhelst N.D. & Glas C.A.W. (1995) The one parameter logistic model. In: Fischer G.H. & Molenaar I.W. (Eds.) Rasch models. (pp. 215-237) New York: Springer.

Verhelst N.D. & Glas, C.A.W. (1993) A Dynamic Generalization of the Rasch Model, Psychometrika, 58, 395-415.

Verhelst N.D., Glas C.A.W. & De Vries H.H. (1997) A steps model to analyze partial credit. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 123-138) New York: Springer.

Verhelst N.D., Glas C.A.W. & van der Sluis A. (1984) Estimation problems in the Rasch model: the basic symmetric functions. Computational Statistics Quarterly, 1, 245-262.

Verhelst N.D., Glas C.A.W. & Verstralen H.H.F.M. (1995) OPLM: One Parameter Logistic Model. Computer program and manual. Arnhem, The Netherlands: CITO.

Verhelst N.D., Verstralen, H.H.F.M. (1993) A stochastic unfolding model derived from the partial credit model. Kwantitatieve Methoden, 42, 73-92.

Verhelst N.D., Verstralen, H.H.F.M. (2001) IRT models for multiple raters. In A. Boomsma, T. Snijders, and M. Van Duijn (Eds.), Essays in Item Response Modeling (pp. 89-108) New York: Springer-Verlag.

Vermetten Y., Lodewijks J. & Vermunt J. (1999) The role of personality traits and goal orientations in strategy use. Manuscript submitted to Contemporary Educational Psychology.

Vermunt J.D. (1998) The regulation of constructive learning processes. British Journal of Educational Psychology, 68, 149-171.

Vorberg Dirk and Schwarz, Wolfgang. (1990) Rasch-representable Reaction Time Distributions, Psychometrika, 55, 617-632

Wainer H. & Eignor, D. (2000) Caveats, pitfalls, and unexpected consequences of implementing large-scale computerized testing. In Wainer, Howard (Ed.) Computerized adaptive testing: A primer (2nd ed.) pp. 271-299. Mahwah, NJ: Lawrence Erlbaum Associates.

Wainer H. (1993) Model-based standardized measurement of an item's differential impact. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp.123-135) Hillsdale, NJ: Erlbaum.

Wainer H., Dorans N.J., Flaugher R., Green B.F., Mislevy R.J., Steinberg L. & Thissen D. (2000) Computerized adaptive testing: A Primer. Second Edition. Hillsdale NJ: Lawrence Erlbaum.

Wainer H., Thissen D., Mislevy R.J. (2000) Computerized Adaptive Testing: A Primer. Lawrence Erlbaum Associates, Inc.

Wainer Howard and Morgan, Anne and Gustafsson, Jan-Eric. (1980) A Review of Estimation Procedures for the Rasch Model With An Eye Toward Longish Tests, Journal of Educational Statistics, 5, 35-64.

Wainer Howard and Wright, Benjamin D. (1980) Robust Estimation of Ability in the Rasch Model, Psychometrika, 45, 373-391

Wang W. (1998) Rasch analysis of distractors in multiple choice items. Journal of Outcome Measurement 2(1), 43-65.

Wang W.-C. (1999) Direct Estimation of Correlations Among Latent Traits within IRT Framework. Methods of Psychological Research Online 4(2): 47-70.

Wang W.-C., Adams, R., et al. (1998) Measuring Individual Differences in Change with Multidimensional Rasch Models. Journal of Outcome Measurement 2(3): 240-265.

Wang W.-C., Wilson, M., Adams, R.J. (1997) Rasch models for multidimensionality between items and within items, in M. Wilson, G. Engelhard Jr, K. Draney (Eds.) Objective measurement: Theory into practice, vol.4, 139-155.

Warm T.A. (1989) Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.

Waugh R.F., Hii T.K. & Islam A. (2000) An Approach to Studying scale for students in higher education: a Rasch measurement analysis. Journal of Applied Measurement, 1(1), 44-62.

Way W.D. (1998) Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17(4), 17-27.

Weigle S. (1998) Using FACETS to model rater training effects. Language Testing 15: 263-287.

Weigle S.C. (1999) Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6 (2), 145-178.

Weir C.J., Hughes A. & Porter D. (1990) Reading skills: hierarchies, implicational relationships and identifiability. Reading in a foreign language. 7(1): 505-510.

Weiss D.J. (1982) Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 473-492.

Weiss D.J. (Ed.) (1978) Proceedings of the 1977 computerized adaptive testing conference. Minneapolis, MN: University of Minnesota, Department of Psychology, Psychometric Methods Program.

Weiss D.J. (Ed.) (1983) New horizons in testing: Latent trait test theory and computerized adaptive testing. New York: Academic Press.

Weiss D.J., Kingsbury G.G. (1984) Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.

Welch C. & Hoover H.D. (1993) Procedures for extending item bias detection techniques to polytomously scored items. Applied Measurement in Education, 6, 1-19.

White P.O. (1976) A Note on Keats' Generalization of the Rasch Model, Psychometrika, 41, 405-408

Whitely Susan E. (1977) Models, Meanings and Misunderstandings: Some Issues in Applying Rasch's Theory, Journal of Educational Measurement, 14, 227-235

Whiteneck G.G., Charlifue, S.W., Gerhart, K.A. Overholser, J.D. & Richardson, G.N. (1992) Quantifying handicap : A new measure of long-term rehabilitation outcomes. Archives of Physical Medicine and Rehabilitation. 73 : 519-526.

Wigglesworth G. (1993) Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing 10: 305-335.

Wigglesworth G. (1994) The investigation of rater and task variability using multi-faceted measurement. Report for the National Centre for English Language Teaching and Research, Macquarie University.

Willingham W.W. & Cole N.S. (1997) Gender and fair assessment. Hillsdale, NJ: Lawrence Erlbaum.

Wilson D.T., Wood R. & Gibbons R. (1991) TESTFACT. Test scoring, Item statistics, and Item Factor Analysis. (Computer Software) Chicago IL: Scientific Software International Inc.

Wilson H.G. (1988) Parameter estimation for peer grading under incomplete design. Educational and Psychological Measurement, 48, 69-81.

Wilson M. & Case, H. (2000) An examination of variation in rater severity over time: A study in rater drift. In M. Wilson & G. Engelhard, Jr. (Eds.), Objective Measurement: Theory into Practice (Vol. 5, pp. 113-133) Stamford, CT: Ablex.

Wimsatt W.C. (1981) Robustness, reliability and overdetermination. In M.B. Brewer & B.E. Collins (Eds.), Scientific inquiry and the social sciences. San Francisco: Jossey-Bass.

Wolcott W., et al. (1988) Discrepancies in essay scoring (Report No. TM013018) Springfield, VA: TM Clearinghouse. (ERIC Document Reproduction Service No. ED306246)

Wolfe E.W. & Chiu, C.W.T. (1999) Measuring change across multiple occasions using the Rasch rating scale model. Journal of Outcome Measurement, 3, 360-381.

Wolfe E.W. & Chiu, C.W.T. (1999) Measuring pretest-posttest change with a Rasch rating scale model. Journal of Outcome Measurement, 3, 134-161.

Wolfe E.W., Engelhard, G., Jr. & Myford, C.M. (2001, May) Monitoring Reader Performance and DRIFT in the AP English Literature and Composition Exam Using Benchmark Essays. A proposal funded by the Advanced Placement Research and Development Committee, Educational Testing Service, Princeton, NJ.

Wood R. & Wilson, D. (1974) Evidence for differential marking discrimination among examiners of English. The Irish Journal of Education, 8(1), 36-48.

Wood R. (1978) Fitting the Rasch model: A heady tale. British Journal of Mathematical and Statistical Psychology 31:27-32.

Woodcock R.W. (1999) What can Rasch-based scores convey about a person's test performance? In S.E. Embretson & S.L. Hershberger (Eds.), The new rules of measurement: What every psychologist and educator should know. Hillsdale, NJ: Lawrence Erlbaum Associates.

Wright B.D. & Masters G.N. (1981) The measurement of knowledge and attitude (Research memorandum no. 30) Chicago: Statistical Laboratory, Department of Education, University of Chicago.

Wright B.D. & Masters G.N. (1982) Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Wright B.D. & Panchapakesan, N. (1969) A procedure for sample-free item analysis. Educational and Psychological Measurement, 29(1), 23-48.

Wright B.D. & Stone M.H. (1979) Best test design: Rasch measurement. Chicago: MESA Press.

Wright B.D. & Stone M.H. (2003 - perhaps) Directing Observations, Inventing Constructs, Crafting Yardsticks and Examining Fit. Chicago: The Phaneron Press. thephaneronpress.com

Wright B.D. (1968) Sample-free test calibration and person measurement. Proceedings 1967: Invitational conference on testing problems. Princeton: Educational Testing Service, 85-101.

Wright B.D. (1977) Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14, 97-166.

Wright B.D. (1980) Afterword. In Rasch G. (1960) Probabilistic models for some intelligence and attainment tests. (pp. ix-xxiii) The University of Chicago Press.

Wright B.D. (1984) Despair and hope for educational measurement. Contemporary Education Review, 3(1), 281-288.

Wright B.D. (1985) Additivity in psychological measurement. In E.E. Roskam (Ed.), Measurement and Personality Assessment. Amsterdam: North-Holland: Elsevier Science Publishers B.V. pp 101-112.

Wright B.D. (1996) Comparing Rasch measurement with factor analysis. Structural Equation Modeling, 3(1), 3-24.

Wright B.D. (1997) Fundamental measurement for outcome evaluation. Physical medicine and rehabilitation : State of the Art Reviews. 11(2) : 261-288.

Wright B.D. (1999) Fundamental measurement for psychology. In S.E. Embretson & S.L. Hershberger (Eds.), The new rules of measurement: What every educator and psychologist should know. Hillsdale, NJ: Lawrence Erlbaum Associates.

Wright B.D., Linacre J.M. & Heinemann A.W. (1993) Measuring functional status in rehabilitation. Physical Medicine and Rehabilitation Clinics of North America, 4(3), 475-491.

Wright Benjamin D. (1977) Misunderstanding the Rasch Model, Journal of Educational Measurement, 14, 219-225

Wright T.A., Bennett, K.K. & Dun, T. (1999) Life and job satisfaction. Psychological Reports, 84(3, pt.1) 1025-1028.

Wu M.L., Adams R.J., Wilson M.R. (1997) ConQuest: Generalized item response modeling software. ACER.

Wu M.L., Adams R.J., Wilson M.R. (1998) ACER ConQuest: generalized item response modelling software. Melbourne: Australian Council for Educational Research.

Yamamoto K., Gitomer, D.H. (1993) Application of a HYBRID model to a test of cognitive skill representation. In N. Frederiksen, R.J. Mislevy (Eds.), Test theory for a new generation of tests (pp. 275-295) Hillsdale, NJ: Lawrence Erlbaum Associates.

Yamauchi Kana. (1999) Comparing Many-facet Rasch Model and ANOVA model: Analysis of ratings of essays [in Japanese]. Japanese Journal of Educational Psychology. Vol 47(3), Sep., 383-392.

Yen W.M. (1981) Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.

Yen W.M. (1984) Effects of local item dependence on the fit and equating performance of the three parameter logistic model. Applied Psychological Measurement, 8, 125-145.

Yen W.M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.

Young J.W. (1990) Adjusting the cumulative GPA using item response theory. Journal of Educational Measurement, 27, 175-186.

Yuan A., Clarke, B. (1999) Manifest characterization and testing for two latent traits. Manuscript submitted for publication.

Zhu W. (1996) Should total scores from a rating scale be used directly? Research Quarterly for Exercise and Sport, 67(3), 363-372.

Zhu W., Updyke W.F., Lewandowski C. (1997) Post Hoc Rasch analysis of optimal categorization of an ordered response scale. Journal of Outcome Measurement, 1(4), 286-304.

Zimowski M.F., Muraki, E., Mislevy, R.J. & Bock, R.D. (1999) BILOG-MG: Multiple-Group IRT Analysis and Test Maintenance for Binary Items. Scientific Software International, Inc. Chicago, IL.

Zimowski M.F., Muraki, E., Mislevy, R.J., Bock, R.D. (1997) BILOG-MG. [Computer program]. Chicago: Scientific Software Inc. Online description available: ssicentral.com. Accessed 28 April 2000.

Zwick R. (1992) Special issue on the National Assessment of Educational Progress. Journal of Educational Measurement, 17, 93-94.

Zwick R., Donoghue J.R. & Grima A. (1993) Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30, 233-251.

Zwinderman A.H. & van den Wollenberg, Arnold L. (1990) Robustness of Marginal Maximum Likelihood Estimation in the Rasch Model, Applied Psychological Measurement, 14, 73-81

Zwinderman A.H. (1991) A Generalized Rasch Model for Manifest Predictors, Psychometrika, 56, 589-600.

Zwinderman A.H. (1995) Pairwise parameter estimation in Rasch models. Applied Psychological Measurement, 19(4), 369-375.

Zwinderman A.H. (1997) Response models with manifest predictors. In van der Linden W.J. & Hambleton R.K. Handbook of Modern Item Response Theory. New York: Springer.


The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
