1. Overview

The recent national emphasis on outcomes as a valid means of defining value in health care has overshadowed the fact that outcomes are simply events or measurements ascertained downstream from health care-related occurrences (events, procedures, exposures, etc), whose magnitude reflects the positive or negative impact of an occurrence. Event outcomes (eg, death, return to work, institutionalization) are relatively straightforward, summative, often binary reflections of the broad impact of such occurrences. In contrast, measurement outcomes are generally more discrete assessments, usually quantitative, that evaluate relatively limited dimensions of an occurrence’s impact. For example, gait speed would provide important information regarding a patient’s mobility following limb salvage (the occurrence), but it would fail to offer information regarding other important aspects of recovery (eg, pain relief, cosmesis, health care utilization, vocational viability). The reader should bear in mind that it is only through the downstream application of these measures that they become outcomes. The same instruments and assessment techniques are often used as tools for screening, treatment determination, and clinical assessment.

Performance measures

The functional outcomes routinely collected from cancer populations are distinguished by several attributes that relate to the needs of oncologic clinicians and trialists. The former have long recognized that a patient’s functional capabilities offer valuable information regarding their prognosis and ability to tolerate toxic anticancer therapies. This recognition was formalized in the Karnofsky Performance Scale (KPS), an 11-point ordinal scale used by clinicians to rate a patient’s overall performance status and thereby determine treatment eligibility. The KPS ranges in 10-point increments from 100% (normal, no complaints, no evidence of disease) to 0% (dead). KPS scores are assigned by clinicians based on their overall impression of a patient’s performance status during a clinic or hospital visit. KPS levels that suggest a need for physiatric intervention include 70% (cares for self, unable to carry on normal activity or do active work) through 40% (disabled, requires special care and assistance). The clinician may or may not query patients regarding their functional capabilities, and the assignment of KPS scores does not require the clinician to observe a patient engaging in specific tasks. Therefore, KPS scores are highly subjective. Nonetheless, KPS scores are prognostic, accord with patients’ self-assessments in roughly one third of cases, and remain an important determinant of treatment eligibility, particularly in clinical trials. The finding that collapsing the KPS scale to 5 rather than 10 points did not lessen its prognostic capabilities led to the widespread adoption of the Eastern Cooperative Oncology Group (ECOG) performance scale, which has 5 grades for living patients, as a more succinct alternative. ECOG scores are ordinal but coarser than KPS scores in that they range from 0 (asymptomatic, fully able to carry on all predisease activities without restriction) to 5 (death). ECOG scores are currently more commonly used in clinical practice and as eligibility criteria for clinical trials. Somewhat counterintuitively, lower ECOG scores imply better function, whereas the reverse is true for KPS scores.
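
As a minimal sketch of the scale conventions just described, the following Python encodes the KPS as 11 levels (0 to 100 in steps of 10) and the ECOG scale as grades 0 to 5, and makes their opposite scoring directions explicit. The function names and the `PHYSIATRY_KPS_RANGE` constant are illustrative assumptions; the 40% to 70% range is taken from the text above and is not a validated referral rule.

```python
# Illustrative encoding of the KPS and ECOG scale conventions.
# All names here are hypothetical and chosen for this sketch only.

KPS_LEVELS = tuple(range(0, 101, 10))   # 0%, 10%, ..., 100% (11 levels)
ECOG_LEVELS = tuple(range(0, 6))        # 0, 1, ..., 5 (5 = death)

# KPS range that the text flags as suggesting a need for physiatric
# intervention (an assumption drawn from the text, not a formal rule).
PHYSIATRY_KPS_RANGE = (40, 70)


def validate_kps(score: int) -> int:
    """Return the score if it is a legal KPS level, else raise ValueError."""
    if score not in KPS_LEVELS:
        raise ValueError(f"KPS must be one of {KPS_LEVELS}, got {score}")
    return score


def validate_ecog(score: int) -> int:
    """Return the score if it is a legal ECOG grade, else raise ValueError."""
    if score not in ECOG_LEVELS:
        raise ValueError(f"ECOG must be one of {ECOG_LEVELS}, got {score}")
    return score


def kps_suggests_physiatric_need(score: int) -> bool:
    """True when a KPS score falls in the 40% to 70% range."""
    low, high = PHYSIATRY_KPS_RANGE
    return low <= validate_kps(score) <= high


def indicates_better_function(scale: str, a: int, b: int) -> bool:
    """True if score `a` implies better function than score `b`.

    Higher is better on the KPS; lower is better on the ECOG scale.
    """
    if scale == "KPS":
        return validate_kps(a) > validate_kps(b)
    if scale == "ECOG":
        return validate_ecog(a) < validate_ecog(b)
    raise ValueError(f"Unknown scale: {scale}")
```

For example, `indicates_better_function("KPS", 90, 70)` and `indicates_better_function("ECOG", 1, 3)` both return `True`, even though the first compares a higher score with a lower one and the second does the reverse.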

Functional subscales of cancer-specific quality of life measures

Clinical trialists’ need to determine the cost-to-benefit ratios of anticancer therapies has driven the development of cancer-specific quality of life (QOL) measures. The uptake of these measures, which include the family of Functional Assessment of Cancer Therapy (FACT) tools and the European Organization for Research and Treatment of Cancer QOL Questionnaire, has been largely confined to research because few are calibrated to provide clinicians with actionable data. Cancer-specific tools overwhelmingly conceptualize QOL as a multidomain construct composed of social, psychologic, and functional domains (among others, depending on the instrument). The FACT family, for example, includes function as 1 of its 4 essential QOL domains, which do not vary across the tools. Specificity to cancer type (eg, breast, colon, lung) is conferred by adding disease-specific items at the end of the 27-item, 4-domain FACT-General questionnaire. The functional domains of the FACT and other cancer-specific QOL tools have been subjected to rigorous psychometric scrutiny and display excellent characteristics (validity, reliability, responsiveness, etc). However, for the most part, they fail to provide the simultaneously broad and granular information on a patient’s functional capabilities that physiatrists require to determine treatment effectiveness in practice or research.

Functional subscales of generic QOL measures

The physical functioning subscale (PF-10) of the Medical Outcomes Study 36-Item Short-Form Health Survey is a notable exception. This 10-item tool has been used to assess varied clinical populations, including patients with cancer, for decades. Therefore, normative data are available for most clinical populations of interest. Further, the PF-10 has been translated and validated in many languages. It has fewer ceiling and floor effects than other instruments across the broad range of performance capabilities that characterize cancer populations. Each item describes an activity (eg, stair climbing, walking) and offers 3 ordinal response options regarding the difficulty that a patient experiences with the activity. A notable benefit of the PF-10 for research purposes is its broad acceptance by reviewers of grants and manuscripts and the robustness of the supportive literature.
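
The structure of the PF-10 (10 items, 3 ordinal response options each) lends itself to a short scoring sketch. The example below assumes responses coded 1 (most limited) to 3 (not limited) and applies the linear 0 to 100 transformation conventionally used for this subscale; neither the coding nor the transformation is stated in the text above, and handling of missing items is deliberately omitted. The function name `score_pf10` is hypothetical.

```python
from typing import Sequence


def score_pf10(responses: Sequence[int]) -> float:
    """Convert 10 PF-10 item responses to a 0-100 score.

    Assumes each response is coded 1 (most limited), 2, or 3 (not limited).
    The raw sum therefore ranges from 10 to 30 and is rescaled linearly so
    that 0 is the worst and 100 the best physical functioning.
    """
    if len(responses) != 10:
        raise ValueError("PF-10 requires exactly 10 item responses")
    if any(r not in (1, 2, 3) for r in responses):
        raise ValueError("Each response must be coded 1, 2, or 3")

    raw = sum(responses)               # 10 (worst) to 30 (best)
    return (raw - 10) / 20 * 100       # rescale to 0-100
```

Because the raw sum can take only 21 distinct values, this fixed-form score is inherently coarse, which is one reason a respondent at either extreme of function may be poorly distinguished from their neighbors.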

Condition- or body part-specific outcomes

Lymphedema and upper quadrant-specific questionnaires have been developed for use among cancer survivors and the general population. The Disabilities of the Arm, Shoulder, and Hand questionnaire and University of Pennsylvania Shoulder Score have been widely used to characterize disability among breast cancer cohorts [3].

Utility of functional outcomes routinely used in physical medicine and rehabilitation

Conventional clinician-rated rehabilitation functional assessment tools (eg, FIM instrument) have seen limited application in cancer populations. Several reports comparing the outcomes of patients with malignant versus traumatic spinal cord and brain injury based their conclusions on FIM scores, but they did not examine whether the FIM’s psychometric performance differed between the study groups. Theoretically, there are no compelling grounds to suspect that the psychometric characteristics of the FIM or other functional measures used in physical medicine and rehabilitation differ in cancer populations. However, many of the limitations (eg, marked ceiling effects, inconsistent precision across the relevant trait range) that have been reported for the FIM in noncancer populations constrain its utility in assessing cancer patients as well. Additionally, a recent report noted that the presence of pain led clinicians’ FIM-based assessments to differ systematically from the assessments of patients with cancer regarding their functional capabilities. Because pain is prevalent among cancer populations, it is important, when relying on clinician-rated tools (eg, FIM instrument), to appreciate that a patient’s perception of task difficulty may differ substantially from the clinician’s.

Performance tests and batteries

Objective performance tests (eg, 6-minute walk test, repeated sit-to-stand, timed up and go test) have been used to assess cancer populations. There are no empirical bases to suspect that these measures perform differently in cancer populations than in the general population. Similarly, test batteries (eg, Short Physical Performance Battery) appear to perform similarly among patients with cancer relative to the general population.

2. Cutting edge/emerging and unique concepts and practice

Item response theory-based functional outcomes

Over the last decade, item response theory (IRT) has been increasingly espoused as a more versatile and robust approach than classical test theory to the assessment of latent traits (those that cannot be directly observed or measured). IRT maintains that any item is a potentially valuable source of information and that its validity is not contingent on being embedded within a specific collection of items that must always be presented in the same order and format. IRT models order items along a unidimensional trait continuum (eg, verbal intelligence, anxiety) based on the likelihood that a respondent with a given trait level (eg, low vs high anxiety) will endorse or respond correctly to a given item. In the case of function, the trait continuum is quite broad, ranging from bed-based patients with plegia to elite, endurance-trained athletes. Therefore, measurements that use any fixed collection of items will likely fail to precisely distinguish individuals at all relevant trait levels. Estimates generated with classical test theory-based instruments are particularly vulnerable to error inflation at the scale extremes. Also, some items may not be appropriate for a given population; however, per classical test theory, these items cannot be eliminated without jeopardizing measurement validity. IRT breaks radically with the mandates of classical test theory in ways that offer highly desirable flexibility to clinicians and researchers.
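
To make the idea of ordering items along a trait continuum concrete, the sketch below uses the two-parameter logistic (2PL) model, one common IRT formulation for yes/no items. The choice of the 2PL model, the item names, and all parameter values are assumptions made for illustration; functional item banks typically use polytomous models with more than 2 response options, and none of these numbers come from a calibrated instrument.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """Probability that a respondent with trait level `theta` endorses an item.

    2PL model: `a` is the item's discrimination and `b` its difficulty,
    expressed on the same scale as `theta`.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical functional items: (discrimination, difficulty).
ITEMS = {
    "turn over in bed": (1.5, -2.0),
    "walk one block": (2.0, 0.0),
    "run one mile": (1.5, 2.0),
}

if __name__ == "__main__":
    for label, theta in (("low function", -2.0), ("high function", 2.0)):
        for name, (a, b) in ITEMS.items():
            print(f"{label:>13} | {name:<16} | P = {p_endorse(theta, a, b):.2f}")
```

A respondent whose trait level equals an item's difficulty has a 50% chance of endorsing it, so an easy item such as turning over in bed discriminates among low-functioning respondents but says little about those at the high end of the continuum.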

An IRT-based assessment tool consists of an item bank, preferably with a large number of highly discriminating items, and a means of administering the items. Administration tools may be preestablished short forms (generally composed of the most discriminating items that provide coverage of the entire trait range), items selected for a specific purpose or to evaluate a specific population (eg, ambulatory cancer survivors), or computer adaptive tests (CATs). A salient strength of IRT is that, regardless of which or how many items are administered, all resultant scores can be compared on a common scale. Therefore, 2 clinicians or researchers could administer entirely different groups of items from an IRT-modeled bank but still be able to compare the scores from their populations of interest. CATs are computer algorithms that serially administer the items that are most likely to improve the precision of a respondent’s score, based on their prior responses. CATs produce the most precise score estimates with the fewest items and are, therefore, efficient and respondent-friendly. At this juncture, however, they have seen limited integration into research and clinical activities.
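
The adaptive logic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm used by any particular instrument: it assumes yes/no items under a 2PL model, a standard normal prior, expected a posteriori (EAP) scoring on a coarse grid, and selection of the unadministered item with maximum Fisher information at the current estimate. The item bank, its parameters, and the stopping thresholds are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

GRID = [i / 10 for i in range(-40, 41)]  # trait levels from -4.0 to 4.0

# Hypothetical item bank: item id -> (discrimination a, difficulty b).
BANK: Dict[str, Tuple[float, float]] = {
    "sit_up_in_bed": (1.5, -2.0),
    "walk_across_room": (1.8, -1.0),
    "walk_one_block": (2.0, 0.0),
    "climb_two_flights": (1.8, 1.0),
    "run_one_mile": (1.5, 2.0),
}


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses: Dict[str, int], bank=BANK) -> Tuple[float, float]:
    """Posterior mean and SD of theta given 0/1 responses (normal prior)."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)          # unnormalized prior
        for item_id, x in responses.items():
            p = p_endorse(theta, *bank[item_id])
            w *= p if x else (1.0 - p)              # likelihood of response
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta: float, administered, bank=BANK) -> str:
    """Pick the unadministered item that is most informative at theta."""
    candidates = [i for i in bank if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def run_cat(answer: Callable[[str], int], bank=BANK,
            max_items: int = 5, target_sd: float = 0.4):
    """Administer items until the SD target or the item limit is reached."""
    responses: Dict[str, int] = {}
    theta, sd = eap_estimate(responses, bank)
    while len(responses) < min(max_items, len(bank)) and sd > target_sd:
        item = next_item(theta, responses, bank)
        responses[item] = answer(item)
        theta, sd = eap_estimate(responses, bank)
    order: List[str] = list(responses)
    return theta, sd, order


def simulated_respondent(true_theta: float, bank=BANK) -> Callable[[str], int]:
    """Deterministic stand-in for a patient: endorses items at or below ability."""
    return lambda item_id: int(true_theta >= bank[item_id][1])
```

Calling `run_cat(simulated_respondent(1.5))` returns a trait estimate, its posterior standard deviation, and the order in which items were administered. Because `eap_estimate` places every score on the same trait scale regardless of which items were answered, estimates obtained from entirely different subsets of the bank remain directly comparable, which is the common-scale property described above.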

Three IRT-based item banks are currently available to assess function. The Activity Measure for Post-Acute Care (AM-PAC) comprises 3 domains: mobility, daily activities, and applied cognition. The AM-PAC CAT has been found to be valid and responsive among patients with advanced cancer. It is available gratis to researchers and nonprofit organizations. The National Institutes of Health-funded Patient-Reported Outcomes Measurement Information System (PROMIS) includes a functional item bank that combines mobility and daily activities into a single domain. The PROMIS also includes a Cancer PROMIS Supplement with a functional item bank specific to cancer patients [7]. Per the developing team, both PROMIS banks are quite similar. Neither has, to date, been psychometrically vetted in a cancer population beyond the calibration cohorts.

3. Gaps in the evidence-based knowledge

IRT-based assessment tools have many appealing measurement characteristics; however, their performance in heterogeneous clinical populations, including patients with cancer, is only now being rigorously studied. Early reports suggest that these tools may not perform consistently across all populations, as was initially hoped. Therefore, clinicians and investigators electing to use these tools should proceed with an awareness that their precision may vary and should temper related conclusions accordingly.


1. Cheville AL, Basford JR, Troxel AB, Kornblith AB. Performance of common clinician- and self-report measures in assessing the function of community-dwelling people with metastatic breast cancer. Arch Phys Med Rehabil. 2009;90(12):2116-2124.

2. Yost KJ, Cheville AL, Weaver AL, Al Hilli M, Dowdy SC. Development and validation of a self-report lower-extremity lymphedema screening questionnaire in women. Phys Ther. 2013;93(5):694-703.

3. Harrington S, Michener LA, Kendig T, Miale S, George SZ. Patient-reported upper extremity outcome measures used in breast cancer survivors: a systematic review. Arch Phys Med Rehabil. 2014;95(1):153-162.

4. Cheville AL, Basford JR, Dos Santos K, Kroenke K. Symptom burden and comorbidities impact the consistency of responses on patient-reported functional outcomes. Arch Phys Med Rehabil. 2014;95(1):79-86.

5. Hoppe S, Rainfray M, Fonck M, et al. Functional decline in older patients with cancer receiving first-line chemotherapy. J Clin Oncol. 2013;31(31):3877-3882.

6. Cheville AL, Yost KJ, Larson DR, et al. Performance of an item response theory-based computer adaptive test in identifying functional decline. Arch Phys Med Rehabil. 2012;93(7):1153-1160.

7. Garcia SF, Cella D, Clauser SB, et al. Standardizing patient-reported outcomes assessment in cancer clinical trials: a patient-reported outcomes measurement information system initiative. J Clin Oncol. 2007;25(32):5106-5112.

Author Disclosure

Andrea L. Cheville, MD
Nothing to Disclose