Thinking about Validity: The Case of Functional Assessment

The word "validity" has its roots in the Latin ualere, "to be strong". Other words sharing the same root include available, convalesce, prevail, valiant, valor, and value. A valid measure's value could well be said to reside in the strength with which it makes an intended effect or phenomenon available for examination, experimental comparison, and application. Highly valid measures robustly resist tests of their strength and persistently prevail in stable states across samples, instruments, researchers, time, space, etc. Invalid measures, then, are weak and of less value because they provide less evidence that the thing measured is what is supposed to be measured, and do not hold up when subjected to the stresses of application.

For instance, a 15mm wrench fits with a small degree of error around the head of a 15mm machine screw or bolt. The strength and value of this measure are tested by the extent to which the fit of the wrench on the bolt head (and the structural integrity of the wrench handle) provides leverage for turning the bolt and screwing it in place, or removing it. The validity and practical value of the wrench as a measure of the bolt head and of the screw's leverageable capacity to function as an inclined plane follow from the extent to which it repeatedly facilitates the production of a particular effect (torque) at the point of use. The validity and value of the theory informing the process stem from the extent to which the mathematical relations of force, mass, and acceleration can be predicted for any combination of wrench, bolt, and application, anywhere and any time.

Similarly, a functional assessment adaptively targeted in a medical rehabilitation context at 350 PAR (Physical Activity Rehabits) brings the mobility and ADL skills of a 350 PAR stroke survivor into sharp focus for the informed therapist. The strength and value of this measure are tested, in one way, by the extent to which the targeting of the assessment provides leverage for moving the stroke survivor's mobility and ADL skills higher up the PAR scale. The validity and value of the assessment as a measure of physical activity follow from the extent to which it repeatedly facilitates the production of the desired effect at the point of use. In the absence of a valid qualitative or quantitative conceptual measure of physical activity, it would be possible to assess neither how much functionality the stroke survivor possesses nor how much, if any, change in functionality occurs over time.

It also follows, then, that the validity and value of the theory informing the process stem from the extent to which the mathematical relations of functional ability, task difficulty, rater harshness, and expected percent independent can be predicted for any combination of rehabilitation candidate, physical activity, therapist, and functional independence. In the absence of quantitative measures, theory remains mathematical to the extent that some degree of transparency in the relevant relations is obtained (Fisher, 2003). Therapists can (and routinely do), for instance, intuit whether any given patient will be able to perform any given task with a given degree of independence. Valid intuitions concerning the correspondence between patients' abilities and various task difficulties provide an initial degree of the mathematical clarity and proportionate rationality that enable a field of practice to take on a coherent identity as a community.
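The relations among ability, task difficulty, rater harshness, and expected percent independent can be made concrete with a small sketch. It assumes a dichotomous many-facet Rasch model in which the log-odds of independent performance equal ability minus difficulty minus harshness, all in logits. The article does not spell out this equation, so the model form, the function name prob_independent, and the numeric values are illustrative assumptions, not the author's specification.

```python
import math

def prob_independent(ability, difficulty, harshness=0.0):
    """Probability of independent performance, assuming a dichotomous
    many-facet Rasch model. All arguments are in logits (assumption)."""
    logit = ability - difficulty - harshness
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical values chosen only for illustration, not taken from any study.
patient_ability = 1.0
task_difficulty = 0.0
for rater_harshness in (-0.5, 0.0, 0.5):
    p = prob_independent(patient_ability, task_difficulty, rater_harshness)
    print(f"harshness {rater_harshness:+.1f}: expected {100 * p:.0f}% independent")
```

Under these assumptions, a harsher rater lowers the expected percent independent for the same patient and task, which is the kind of proportionate relation a therapist's intuition approximates informally.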

Locally, more advanced degrees of mathematical clarity about functional independence offer rehabilitation practitioners more value by providing a stronger, experimentally based quantitative measure of constant amounts. The validity of qualitative intuitions is compromised by their variability across therapists and by the lack of a systematic frame of reference for communicating their meaning, and the same problems are associated with functional assessments that stop with the method of summated ratings (Merbitz, Morris, & Grip, 1989; Michell, 2003). Calibrated additive representations overcome these limitations by locating patients' abilities, task difficulties, and sometimes rater harshness on a common continuum capable of providing quantitative measures (among many others, see Silverstein, Fisher, Kilgore, et al., 1992; Heinemann, Linacre, Wright, et al., 1993; Fisher, Bryze, Granger, et al., 1994; Velozo, Kielhofner, & Lai, 1999). The local validity of these measures for distinguishing between various groups of rehabilitation clients and predicting the relevant level of care is well established (Harvey, Silverstein, Venzon, et al., 1992; Heinemann, Linacre, Wright, et al., 1994).

But yet more advanced degrees of such clarity and rationality have become available as different instruments intended to measure the same physical functioning construct have been shown to do so in linearly transformed versions of the same metric (Grimby, Andrén, Holmgren, et al., 1996; Fisher, 1997; Fisher, Eubanks, & Marier, 1997; Segal, Heinemann, Schall, et al., 1997; Wolfe, Hawley, Goldenberg, et al., 2000; Wolfe, 2001; Zhu, 2001), which might be termed the Rehabit (Fisher, Harvey, Taylor, et al., 1995). Evidence strongly supports the possibility that several, if not many, of the functional assessment instruments currently in use could be equated to a common reference standard. In this context, it would become possible for all users of functional assessment measures to use the same numeric language for referring to demonstrably constant amounts of more and less functionality.
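What it means for two instruments to measure in "linearly transformed versions of the same metric" can be sketched as follows. The example uses mean-sigma linking on measures of the same persons from two instruments; this is one common linking approach and is not necessarily the procedure used in the studies cited above. The function names and all numbers are hypothetical.

```python
from statistics import mean, stdev

def linking_constants(measures_a, measures_b):
    """Mean-sigma linking: slope and intercept that map instrument B's
    metric onto instrument A's, using the same persons measured on both."""
    slope = stdev(measures_a) / stdev(measures_b)
    intercept = mean(measures_a) - slope * mean(measures_b)
    return slope, intercept

def to_common_metric(measure_b, slope, intercept):
    """Express a measure from instrument B in instrument A's units."""
    return slope * measure_b + intercept

# Made-up measures for five persons on two hypothetical instruments.
# Here B is constructed as an exact linear transform of A (a = 0.1 * b - 2).
instrument_a = [-1.0, 0.0, 1.0, 2.0, 3.0]
instrument_b = [10.0, 20.0, 30.0, 40.0, 50.0]

slope, intercept = linking_constants(instrument_a, instrument_b)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"B measure 35.0 on A's metric: {to_common_metric(35.0, slope, intercept):.2f}")
```

With real data the two sets of measures would agree only within measurement error, so the quality of the linear fit would itself be part of the evidence for or against equating.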

Measurement validity is inherently a matter of the value of the practical consequences that follow from the application of an instrument. The full potential of integrated instruction or rehabilitation and assessment will be realized only when three steps are taken. First, experimental assessments of instruments must focus on establishing the existence of a single thing that adds and divides up consistently and proportionately enough to be represented by numbers (Rasch, 1960). Second, different instruments supposed to measure the same variable ought to be examined for convergence on a common construct (Fisher, 1997) and equated if the evidence supports that course of action.

Third, every class of potential and actual users of functional assessment measures, from the treatment teams and the clients to researchers, disability advocacy groups, educators, payors, accreditors, administrators, and accountants, needs to agree on basic conventions of data quality, the quantitative unit's size and range, valid applications and inferences, and systems for maintaining and improving the metric across instruments. When all three of these steps are taken, we will arrive at a system of functional metrology with the widely distributed strength and generalized value of other metrological systems, such as the one that makes it possible in principle for any metric wrench manufactured by any tool company to fit any metric bolt anywhere in the world at any hour of any day. If and when we can also arrive at a pure mathematical theory of functionometric relationships, then we will have opened the door to a new kind of scientific revolution, one like the second scientific revolution of the nineteenth century in being provoked by "the immense efficacy of quantitative experimentation undertaken within the context of a fully mathematized theory" (Kuhn, 1977, pp. 219-20).

After all, what might we expect to happen if and when everyone researching or practicing physical rehabilitation thinks about the constructs of functional assessment in a common language? What might follow from everyone repeatedly seeing the consistency with which experimentally controlled, and even everyday, variations in treatment, initial status, length of stay, etc. do, or do not, affect functional assessment measures? Research in cognitive psychology (for instance, among many others, Hutchins, 1995; Latour, 1995) suggests that we are highly likely to also see a manifestation of the collective, group-level effect characteristic of distributed thinking. The technologically-embodied cognition effected by a standardized metric gives birth to a propagation of one and the same construct through different media.

This process, and not metaphysically vapid claims about unobservable mental events, provides the only documentable evidence of representation that anyone has made available to date. Navigational charts, for instance, do not make anything observable in and of themselves. No, a tool like a chart functions only insofar as a navigator, a pilot, and the chart maker are able to make features on the landscape correspond with the features on the chart en route to achieving some change in position relative to those features. In other words, a map mediates relationships between people with different perspectives, and so validly provides practical value and supports strong inferences, only insofar as it helps them get where they want to go. Insofar as maps of functional assessment variables are valid, should we expect any less strength and value from them?

Fisher, A. G., Bryze, K. A., Granger, C. V., Haley, S. M., Hamilton, B. B., Heinemann, A. W., Puderbaugh, J. K., Linacre, J. M., Ludlow, L. H., McCabe, M. A., Wright, B. D. (1994) Applications of conjoint measurement to the development of functional assessments. International Journal of Educational Research, 21(6), 579-593.

Fisher, W. P., Jr. (1997) Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.

Fisher, W. P., Jr. (2003) Mathematics, measurement, metaphor, metaphysics: Part I. Implications for method in postmodern science. Theory & Psychology, 13(6), 753-90.

Fisher, W. P., Jr., Eubanks, R. L., Marier, R. L. (1997) Equating the MOS SF36 and the LSU HSI physical functioning scales. Journal of Outcome Measurement, 1(4), 329-362.

Fisher, W. P., Jr., Harvey, R. F., Taylor, P., Kilgore, K. M., Kelly, C. K. (1995) Rehabits: A common language of functional assessment. Archives of Physical Medicine and Rehabilitation, 76(2), 113-122.

Grimby, G., Andrén, E., Holmgren, E., Wright, B., Linacre, J. M., Sundh, V. (1996) Structure of a combination of Functional Independence Measure and Instrumental Activity Measure items in community-living persons: A study of individuals with spina bifida. Archives of Physical Medicine and Rehabilitation, 77(11), 1109-1114.

Harvey, R. F., Silverstein, B., Venzon, M. A., Kilgore, K. M., Fisher, W. P., Jr., Steiner, M., Harley, J. P. (1992, October) Applying psychometric criteria to functional assessment in medical rehabilitation: III. construct validity and predicting level of care. Archives of Physical Medicine and Rehabilitation, 73(10), 887-892.

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., Granger, C. V. (1993) Relationships between impairment and physical disability as measured by the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation, 74(6), 566-573.

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., Granger, C. V. (1994) Prediction of rehabilitation outcomes with disability measures. Archives of Physical Medicine and Rehabilitation, 75(2), 133-143.

Kuhn, T. S. (1977) The function of measurement in modern physical science. In T. S. Kuhn, The essential tension: Selected studies in scientific tradition and change (pp. 178-224). Chicago: University of Chicago Press.

Latour, B. (1995) Cogito ergo sumus! Or psychology swept inside out by the fresh air of the upper deck: Review of Hutchins' Cognition in the Wild, MIT Press, 1995. Mind, Culture, and Activity, 3(1), 54-63.

Linacre, J. M., Heinemann, A. W., Wright, B. D., Granger, C. V., Hamilton, B. B. (1994) The structure and stability of the Functional Independence Measure. Archives of Physical Medicine and Rehabilitation, 75(2), 127-132. www.rasch.org/memo50.htm

Merbitz, C., Morris, J., Grip, J. (1989) Ordinal scales and the foundations of misinference. Archives of Physical Medicine and Rehabilitation, 70, 308-312.

Michell, J. (2003) Measurement: A beginner's guide. Journal of Applied Measurement, 4(4), 298-308.

Rasch, G. (1960) Probabilistic models for some intelligence and attainment tests (Foreword, Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980). Copenhagen: Danmarks Paedogogiske Institut.

Segal, M., Heinemann, A., Schall, R. R., Wright, B. D. (1997) Extending the range of the Functional Independence Measure with SF-36 items. Physical Medicine & Rehabilitation: State of the Art Reviews, 11(2), 385-396.

Velozo, C. A., Kielhofner, G., Lai, J. S. (1999) The use of Rasch analysis to produce scale-free measurement of functional ability. American Journal of Occupational Therapy, 53(1), 83-90.

Wolfe, F. (2001) Which HAQ is best? A comparison of the HAQ, MHAQ and RA-HAQ, a difficult 8 item HAQ (DHAQ), and a rescored 20 item HAQ (HAQ20): Analyses in 2,491 rheumatoid arthritis patients following leflunomide initiation. Journal of Rheumatology, 28(5), 982-9.

Wolfe, F., Hawley, D., Goldenberg, D., Russell, I., Buskila, D., Neumann, L. (2000) The assessment of functional impairment in fibromyalgia (FM): Rasch analyses of 5 functional scales and the development of the FM Health Assessment Questionnaire. Journal of Rheumatology, 27(8), 1989-99.

Zhu, W. (2001, January) An empirical investigation of Rasch equating of motor function tasks. Adapted Physical Activity Quarterly, 18(1), 72-89.

Thinking about Validity: The Case of Functional Assessment, Fisher W.P. Jr. … Rasch Measurement Transactions, 2004, 18:1 p.964-966


