Abstract
Background and Purpose. Pain and physical function are core outcome measures for people with osteoarthritis, and self-report questionnaires have been the preferred assessment method. There is evidence suggesting that self-reports of physical function represent what people experience when performing activities rather than their ability to perform activities. The purpose of this study was to examine the factorial validity of performance-specific assessments of pain and function.
Subjects. The sample consisted of 177 participants who had osteoarthritis of the hip (n=81) or knee (n=96) and who were awaiting total joint arthroplasty.
Methods. Through a cross-sectional design, participants performed 4 performance activities (self-paced walk test, stair test, Timed “Up & Go” Test, and Six-Minute Walk Test). Outcomes were time or distance (function) and pain ratings obtained immediately after each activity. The authors conceptualized 2 correlated factors, with pain items loading uniquely on 1 factor and functional items loading on the second factor, and uncorrelated error terms. Confirmatory factor analysis was applied.
Results. Initial analysis yielded results consistent with the conceptualized model in this study with the exception of a nonzero correlation between the stair pain and function error terms. Dropping the stair test provided results consistent with the conceptualized model.
Discussion and Conclusion. Given the limitations of self-report alone as a method of obtaining reasonably distinct assessments of pain and function, the extent to which performance-specific assessments could accomplish this goal was examined in this study. It was found that collectively the walk test, Timed “Up & Go” Test, and Six-Minute Walk Test yielded 2 factors consistent with the health concepts of pain and function. The authors believe that the application of these tests may provide clinicians and clinical researchers with more distinct impressions of pain and function that complement information from self-report measures.
- Factorial validity
- Osteoarthritis
- Outcome assessment
Patients with osteoarthritis (OA) and those progressing to arthroplasty often present with pain and limitations of physical function. The rate, pattern, and direction of change may differ for pain and function depending on the period in the natural or clinical history over which a patient is assessed.1,2 For example, at 2 months after total joint arthroplasty of the hip or knee, patients' pain ratings were shown to be comparable to or lower than their preoperative values; however, the time to complete performance tasks was substantially increased.2,3 Because pain and physical function represent different but related health concepts and interventions targeting them frequently differ, separate assessments of these attributes were recommended at the Outcome Measures in Arthritis Clinical Trials conference (OMERACT) III.4 An intriguing aspect of the OMERACT III outcome model was that although the assessment of physical function was essential, the application of performance tests was optional.4 Self-report measures of physical function also were favored over performance measures in an authoritative review of outcome measures for people with OA.5 This recommendation apparently was based on the lower cost and ease of administration associated with self-report measures; however, it assumes that self-report measures and performance measures of physical function assess the same attribute and nothing else.
Although many self-report measures profess to assess physical function, few provide an operational definition of its intended meaning. A noted exception is the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) physical function subscale, which provides the following statement: “By this we mean your ability to move around and to look after yourself.”6 We suspect that this statement captures the intended meaning of lower-extremity physical functional status left undeclared by many researchers, and it is representative of our view of lower-extremity functional status.
A body of work contradicts the belief that self-report measures and performance measures of physical function provide comparable information.2,3,7 Parent and Moffet,2 in a study of patients after total knee arthroplasty, noted improvement in self-reported physical function as measured by the WOMAC and the Medical Outcomes Study 36-Item Health Survey Questionnaire (SF-36) physical function subscales but a significant reduction in the 6-minute walking distance when assessed at 2 months after arthroplasty. Maly et al,7 in an investigation of patients with OA of the knee, reported higher correlations between pain and WOMAC and SF-36 physical function scores than between pain and 3 performance measures (Six-Minute Walk Test, Timed “Up & Go” Test, and a stair test). Using a stepwise linear regression analysis that included pain and thigh muscle strength as independent variables and WOMAC and SF-36 physical function subscales as dependent variables, these investigators also found that pain was more predictive of self-reported function than muscle strength.7 Stratford and Kennedy,3 in a study of patients after hip or knee arthroplasty, reported higher standardized regression coefficients between pain and WOMAC physical function scores and change scores than between pain and the time or distance associated with several performance tasks. Also reported in this article was the finding that self-reported Lower-Extremity Functional Scale (LEFS) scores were most strongly associated with pain preoperatively, exertion when assessed within 2 weeks of arthroplasty, and the time or distance associated with performance measures when evaluated approximately 2 months after arthroplasty.3
A further insight is provided in a study that examined the relationship between performance-rated components of pain, exertion, and function (time or distance) and LEFS scores.8 Patients with end-stage OA of the hip or knee and awaiting arthroplasty performed 3 performance tasks—40-m self-paced walk, stair test, and Timed “Up & Go” Test—and completed the LEFS. Immediately following each performance task, patients reported the amount of pain and exertion that they experienced.8 An exploratory factor analysis identified 3 factors, with pain responses loading on 1 factor, exertion loading on the second factor, and time loading on the third factor. The LEFS loaded on all 3 factors (pain=.44; exertion=.41; and time=.35).8 Recently, Terwee et al9 examined the relationship between the WOMAC and SF-36 pain and function subscales with the performance-based DynaPort Knee Test* for patients with OA of the knee before and after arthroplasty. Applying an exploratory factor analysis, these investigators found that the self-report measures of pain and function loaded on 1 factor and that the performance measure loaded on a second factor.9 The SF-36 function score loaded on both factors, with the higher loading on the factor composed of the self-report measures (.78 and .69).9 Collectively, these findings support the premise that self-report measures of physical function assess more than a patient's ability to move around.2,3,7–9 It appears that, in addition to providing patients' perceptions of their ability to move around, self-report measures of physical function also are influenced by what patients experience when moving around (eg, pain and exertion).
A further understanding of the relationship between self-report assessments of pain and physical function is offered by a number of studies that examined the factorial validity of the WOMAC. Factorial validity exists to the extent that items cluster in accordance with the specified domains to which they have been assigned by the measure's developer. The WOMAC was conceived to assess 3 domains: pain, stiffness, and physical function.10 Accordingly, factorial validity would exist if the 5 pain items loaded on 1 factor, the 2 stiffness items loaded on a second factor, and the 17 physical function items loaded on a third factor. However, there is consistent evidence demonstrating that the WOMAC pain and physical function items group more by activity than by the hypothesized domains of pain and physical function.11–14
There is no doubt that pain and physical function are related health concepts. Yet clinicians routinely inquire about pain and physical function separately during assessments, outcome measures have separate scales to assess pain and function, and authoritative groups such as OMERACT III have identified pain and physical function as 2 core outcome measures rather than 1. Investigators are therefore challenged to develop assessment methods that maximize valid information concerning the attributes of interest. It was with these challenges in mind that we undertook the present study.
Our intent was to determine whether performance test assessments of pain and physical function provided responses consistent with these 2 domains. Specifically, our goal was to evaluate the factorial validity of performance assessments of pain and physical function. Our specific hypotheses were as follows: (1) responses to the performance assessments could be explained by 2 factors, 1 consisting of pain items and the other consisting of time (distance) items; (2) each pain or performance item would be related only to the health concept that it was perceived to be assessing (each item would have a nonzero loading on the factor that it was conceived to measure and a zero loading on the other factor); (3) the factors pain and physical function would be correlated; and (4) the measurement error terms associated with the items would be uncorrelated.
Method
Subjects
Patients were eligible for this study if they were able to speak and comprehend written English; were diagnosed with end-stage OA of the hip or knee, as labeled by the surgeon and confirmed by radiographs; were scheduled to undergo primary total hip arthroplasty or total knee arthroplasty; were able to complete the performance tests; and provided written informed consent. Patients undergoing revision or bilateral arthroplasty or additional operative procedures or those demonstrating comorbidities associated with cognitive impairment were excluded. Of the 188 patients reviewed, 177 (94%) met the eligibility criteria. The study sample consisted of 81 participants with hip OA and 96 participants with knee OA. Eighty-five of the participants were women, 36 of whom received total hip arthroplasty. The mean age and body mass index of the sample were 65 years (first and third quartiles: 58.0 and 72.0) and 29.1 kg/m² (first and third quartiles: 26.4 and 33.0), respectively. A breakdown of the participants' characteristics by site of OA is shown in Table 1. The study took place at a tertiary-care orthopedic hospital in Toronto, Ontario, Canada, and data were collected from November 2001 to February 2003.
Design
We applied a cross-sectional study design. Participants completed the performance tests at a median interval of 20 days (first and third quartiles: 12 and 22) before surgery.
Measures
Participants completed 4 performance measures in the following order: self-paced walk, Timed “Up & Go” Test, stair test, and Six-Minute Walk Test. Several minutes were provided between the self-paced walk, Timed “Up & Go” Test, and stair test. A 10-minute rest interval was provided between the stair test and the Six-Minute Walk Test. With the exception of the Six-Minute Walk Test, the outcome was the time to complete the task. Time was measured on a stopwatch to the nearest one-hundredth of a second, and distance was measured to the nearest meter.
Self-paced walk
Participants walked 2 lengths of a 20-m indoor course in response to the instructions, “Walk as quickly as you can without overexerting yourself.”15 The turnaround time was excluded. An intraclass correlation coefficient (ICC) for test-retest reliability of .91 and a standard error of measurement of 1.73 seconds have been reported for this measure for patients similar to the participants in the present study.15
Timed “Up & Go” Test
Participants were instructed to rise from a standard arm chair, walk at a safe and comfortable pace to a line 3 m away, cross the line, turn, and return to a sitting position in the chair.16 An ICC for test-retest reliability of .75 and a standard error of measurement of 1.07 seconds have been reported for this measure for patients with OA and those undergoing arthroplasty of the hip or knee.15
Stair test
Participants ascended and descended 9 stairs (step height, 20 cm; step depth, 27 cm) in their usual manner at a safe and comfortable pace.15 A handrail was available. An ICC for test-retest reliability of .90 and a standard error of measurement of 2.35 seconds have been reported for this measure for patients similar to the participants in the present study.15
Six-Minute Walk Test
Participants were instructed to cover as much distance as possible during the 6-minute time frame. Standardized encouragement—“You are doing well, keep up the good work”—was provided at 60-second intervals. The test was conducted on a premeasured, 46-m, unobstructed, uncarpeted, rectangular circuit. The outcome was the distance walked in 6 minutes.15,17 An ICC for test-retest reliability of .94 and a standard error of measurement of 26.29 m have been reported for this measure for patients similar to the participants in the present study.15
Activity-specific pain rating
Participants marked the pain that they experienced on an 11-point (0–10) numeric rating scale immediately following each performance test.15 We are not aware of test-retest reliability values for patients similar to the participants in the present study; however, a reliability estimate (ICC) of .86 and a standard error of measurement of 1.04 have been reported for people with a spectrum of lower-extremity problems.18
Data Analysis
We applied confirmatory factor analysis with a maximum-likelihood estimation method (AMOS 4.0†) to assess the factorial validity of the performance tests.19–22 Unlike exploratory factor analysis, which provides all possible factor loadings, confirmatory factor analysis provides factor loadings for the specified model only. We conceptualized a measurement model with 2 factors, which we labeled pain and physical function (Fig. 1).22 We applied the following indexes to assess model fit: comparative fit index (CFI), relative fit (RF), Tucker-Lewis Index (TLI), root-mean-square error of approximation (RMSEA), and the model fit chi-square test and associated P value.22 Although no single standard exists for defining acceptable model fit, the following values are generally accepted: CFI, RF, and TLI values exceeding .95 indicate good fit; RMSEA values of less than .05 indicate good fit; and RMSEA values of less than .08 indicate reasonable fit.22,23 A significant chi-square value (eg, P<.05) indicates that the data do not fit the model. Prior to conducting the analyses, we assessed the data and found several of the underlying distributions to be nonnormal. Accordingly, we applied the bootstrap feature of AMOS 4.0 for 1,000 samples with replacement to estimate the parameter values and model fit indexes.22
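As an illustration of how the fit cutoffs quoted above can be applied, the following is a minimal Python sketch, not the AMOS computation used in the study. It assumes the commonly cited point estimate RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))); some programs divide by N rather than N − 1, and bootstrap-adjusted values may differ. The function names and the label "less than desirable" are hypothetical choices made for this example.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square.

    Assumed form: sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    Some software uses n instead of n - 1 in the denominator.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_rmsea(value: float) -> str:
    """Apply the cutoffs quoted in the text: <.05 good, <.08 reasonable."""
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "reasonable"
    return "less than desirable"


def incremental_index_is_good(value: float) -> bool:
    """CFI, RF, and TLI values exceeding .95 are read as good fit."""
    return value > 0.95


# Illustrative (made-up) inputs, not results from the study.
example = rmsea(chi_square=20.0, df=8, n=177)
label = describe_rmsea(example)
```

Whenever the chi-square value does not exceed its degrees of freedom, this estimate is 0, which falls in the "good" range under the cutoffs above.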
Figure 1. Standardized factor loadings for initial model. “e” represents the measurement error terms associated with each item. tug=Timed “Up & Go” Test, 6 mw=Six-Minute Walk Test, dist=distance.
To enhance the validity and generalizability of our final model, we performed 2 cross-validation procedures. First, we stratified by site (hip or knee) and used a random-number generator to create 2 samples with approximately equal representation of hips and knees. One group was used to generate the initial model and modifications (n=88: 48 knees, 40 hips), and the second group was used to cross-validate the model (n=89: 48 knees, 41 hips). The second cross-validation procedure repeated the steps described above; however, this time 1 group was composed of participants with knee OA and the other group was composed of participants with hip OA.
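The first cross-validation split can be sketched in Python as follows. This is an assumption-laden illustration rather than the procedure actually used: the function name, the seed, and the rule of cutting each shuffled stratum at its midpoint are hypothetical, and only the site counts (81 hips, 96 knees) come from the text.

```python
import random


def stratified_halves(sites, seed=0):
    """Split participant indexes into two groups with balanced site counts.

    `sites` is a list of labels ("hip" or "knee"), one per participant.
    Within each site, indexes are shuffled and cut at the midpoint, so the
    two groups differ by at most one participant per site.
    """
    rng = random.Random(seed)
    group_a, group_b = [], []
    for site in sorted(set(sites)):
        members = [i for i, s in enumerate(sites) if s == site]
        rng.shuffle(members)
        half = len(members) // 2
        group_a.extend(members[:half])   # smaller half when the count is odd
        group_b.extend(members[half:])
    return group_a, group_b


# Site counts taken from the study sample; the seed is arbitrary.
sites = ["hip"] * 81 + ["knee"] * 96
first, second = stratified_halves(sites, seed=2006)
```

With these counts the sketch yields groups of 88 (48 knees, 40 hips) and 89 (48 knees, 41 hips), the same sizes reported above, although which participants fall in each group depends on the random seed.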
Results
Results for the first cross-validation analysis were similar for the combined samples, which included participants with knee OA and hip OA in each group (initial sample: χ²(8)=7.7, P=.473; cross-validation sample: χ²(8)=9.6, P=.269; simultaneous test for a difference between model structures: χ²(16)=17.6, P=.351). Similar results also were obtained for the second cross-validation analysis (knee sample: χ²(8)=12.1, P=.148; hip sample: χ²(8)=13.3, P=.108; simultaneous test for a difference between model structures: χ²(16)=25.3, P=.064). Given that the cross-validation analyses supported the model for various independent subgroups of participants, we present the results for the entire sample of 177 participants.
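The degrees of freedom attached to the chi-square statistics above (8 for each sample and 16 for the simultaneous tests) can be checked by counting parameters. The Python sketch below assumes a simple-structure model in which each item loads on exactly one factor and the factor variances are fixed to 1 for identification; the function name is hypothetical, and the identification choice is an assumption (fixing one loading per factor instead gives the same count).

```python
def cfa_degrees_of_freedom(n_items, n_factors=2, n_error_covariances=0):
    """Degrees of freedom for a simple-structure CFA with correlated factors.

    Assumes one loading per item, one error variance per item, factor
    variances fixed to 1, and all factor covariances estimated.
    """
    moments = n_items * (n_items + 1) // 2            # unique variances and covariances
    loadings = n_items                                # one free loading per item
    error_variances = n_items                         # one error variance per item
    factor_covariances = n_factors * (n_factors - 1) // 2
    free = loadings + error_variances + factor_covariances + n_error_covariances
    return moments - free


# Six observed variables: pain and function for the three retained tests.
df_three_tests = cfa_degrees_of_freedom(6)
# Eight observed variables: all four tests, with or without one error covariance.
df_four_tests = cfa_degrees_of_freedom(8)
df_four_tests_corr_error = cfa_degrees_of_freedom(8, n_error_covariances=1)
```

Under these assumptions, a model with 6 observed variables has 21 − 13 = 8 degrees of freedom, and fitting the same structure to 2 groups at once without cross-group equality constraints would give 2 × 8 = 16. The counts for the 8-variable models are derived here for illustration and are not taken from the reported tables.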
Descriptive statistics for the performance measures are shown in Table 2, Figure 1 shows the standardized factor loadings for the initial measurement model (model 1), and Table 3 shows the fit statistics. The observed or measured variables in Figure 1 are shown in rectangles, and the latent variables are shown in circles. The larger circles labeled “pain” and “function” designate the factors, and the smaller circles with numbered “e” values signify the measurement error terms associated with each observed variable. The numbers between the factors and observed variables connected by single-headed arrows represent the standardized factor loadings. The negative value associated with the function component of the Six-Minute Walk Test occurs because higher functional levels are associated with greater distances, whereas shorter times reflect higher functional levels for the other 3 performance tests. The curved double-headed arrow showing a value of .48 represents the correlation between the factors pain and function. Although the CFI, RF, and TLI exceeded .90 (Tab. 3), the RMSEA indicated a less-than-desirable fit. The modification index for this model (not shown) suggested that the model could be improved by adding a correlation between the stair pain and time error terms, and we elected to address this association with 2 revised models. To ascertain the magnitude of the correlated error terms, the first revised model (model 2a) specified a correlation between the stair pain and time error terms (Fig. 2a: curved double-headed arrow showing a correlation of .41). The second revised model (model 2b) removed the stair pain and time terms (Fig. 2b). The fit statistics for both models are shown in Table 3. Both modified models improved the fit over that of the initial model. However, of the 2 modified models, only the one that removed the stair terms achieved a good fit for all indexes and was consistent with all of our initial hypotheses.
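To illustrate what a correlated error term adds, the Python sketch below computes the correlation that a standardized 2-factor model implies between one pain item and one function item. The factor correlation (.48) and error correlation (.41) are the values quoted above; the loadings of .80 and .70 are hypothetical placeholders, because the actual loadings appear only in the figures, and the function name is likewise an assumption of this example.

```python
import math


def implied_correlation(loading_pain, loading_function, factor_corr, error_corr=0.0):
    """Model-implied correlation between a pain item and a function item.

    Standardized solution: each item has variance 1, so its error variance
    is 1 - loading**2. The common part flows through the factor correlation;
    the unique part is the error correlation rescaled to a covariance.
    """
    common = loading_pain * factor_corr * loading_function
    unique = error_corr * math.sqrt(
        (1 - loading_pain ** 2) * (1 - loading_function ** 2)
    )
    return common + unique


# Hypothetical loadings; factor and error correlations quoted from the text.
without_error_link = implied_correlation(0.80, 0.70, factor_corr=0.48)
with_error_link = implied_correlation(0.80, 0.70, factor_corr=0.48, error_corr=0.41)
```

With these placeholder loadings, the implied correlation rises from about .27 to about .44 once the error terms are allowed to correlate, which shows how an item pair can share variance beyond what the factor correlation explains.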
Figure 2. (a) Standardized factor loadings for model with correlated stair pain and time error terms. (b) Standardized factor loadings for model with stair pain and time terms removed from the model. “e” represents the measurement error terms associated with each item. tug=Timed “Up & Go” Test, 6 mw=Six-Minute Walk Test, dist=distance.
Discussion
Although OMERACT III identified pain and physical function as 2 core outcome measures requiring separate assessments,4 we suspect that most investigators would agree that these health concepts are related in patients with OA and those progressing to arthroplasty. The question unanswered to this point is whether a more distinct assessment of these health concepts can be obtained than has been reported for self-report measures such as the WOMAC. Repeatedly, studies have failed to support the factorial validity of the WOMAC pain and physical function subscale (the principal self-report measure for patients with OA of the hip or knee),11–14 and these data have led some investigators to abandon the notion of separate assessments of these concepts in favor of grouping WOMAC items by activity regardless of their health domain. For example, Angst et al24 suggested combining pain and physical function items to form 4 subscales, which they labeled lying/sitting, standing/walking, bending, and ascending/descending. This approach ignores the recommendation of OMERACT III that pain and physical function should represent 2 core outcome measures. Moreover, from a clinical perspective, combining pain and physical function items into a single domain does not assist clinicians in identifying treatment goals. Is the patient's principal problem pain, physical function, or a combination of these 2 concepts? In addition, after a course of treatment targeting pain is provided, is the patient's poor score on the standing/walking domain a result of ineffective pain management or a consequence of the patient's reduced ability to move around for reasons other than pain (eg, poor balance, muscle weakness, or restricted range of motion)?
Rather than adhering to the notion that self-report measures represent the preferred method of assessing physical function, we examined whether performance-specific evaluations of pain and physical function provide a viable method for obtaining a more distinct assessment of these 2 related health concepts than has been reported for self-report measures, such as the WOMAC. Our initial model yielded a correlation of .48 between pain and function, providing support for hypotheses 1 and 3, which conceptualized 2 correlated health concepts. Moreover, our second hypothesis was sustained in that significant correlations were obtained for the specified health concepts, and no evidence of loading on the nonspecified health concept was evident. However, our fourth hypothesis was not supported in that the error terms for stair pain and function were correlated. This finding led to the exploration of 2 revised models: 1 allowed a correlation between stair pain and time error terms, and the other removed the stair terms from the model. The intent of the model that allowed a correlation between the stair pain and time terms was to examine the extent to which these components were correlated beyond the correlation between the factors pain and function. The second revised model excluded the stair test from the analysis, and this model provided results consistent with our 4 hypotheses. The correlation between the factors pain and function was .43 for the final model. This correlation is lower than that typically reported between the pain and function subscales of the WOMAC (.74–.84)7,9,25 and SF-36 (.57)9 for patients reasonably similar to the participants in the present study.
Inclusion of the stair test makes the distinction between the health concepts of pain and function less discernible. This finding is reflected in the lower correlation between pain and function noted in model 2b (r=.43) than in models 1 and 2a (r=.48). Accordingly, when the clinical goal is to obtain as distinct an assessment as possible between the health concepts of pain and function, our results suggest that the stair test not be included in a composite score. However, we are not suggesting that the stair test be excluded from a patient's assessment. It is clear that 1 of the physical therapist's responsibilities for patients similar to the participants in the present study is to ascertain their ability to safely ascend and descend stairs and to intervene when appropriate. We simply stress that if the results from the stair test are combined with the results from the other performance measures, then the impressions of pain and function will be less distinct.
Assessments of pain and function are important both to identify patients' problems at a point in time and to assess change over time. Information from these assessments is applied by clinicians to guide decisions concerning individual patients, by researchers to ascertain the relative effectiveness of competing interventions in clinical trials, and by health care policy makers to set benchmarks regarding the maximum number of patient visits and corresponding payment plans. Previous work demonstrated that self-reports of physical function after arthroplasty are strongly influenced by pain and change in pain.3 The consequences are that patients report their physical function to be higher than is demonstrated by performance tests and that health care professionals who rely on self-reports alone overestimate patients' functional status levels.2,3 The results of the confirmatory factor analysis of the present study indicate that performance-rated pain and function represent 2 factors that have not emerged in previous factor analyses of self-report measures. Accordingly, complementing existing self-report assessments of physical function with performance-rated pain and function tests may provide clinicians with a more valid assessment of these health concepts.
There are several potential limitations of the present study. First, the study sample was patients awaiting hip or knee arthroplasty. Presumably, these patients have more severe OA than the typical patient seen in general physical therapist practice. However, in considering this point, it should be remembered that the study participants were able to complete all of the performance tests. A second limitation relates to the sample size for the cross-validation portion of the present study. Although there is no standard method for estimating sample size, it is generally agreed that the sample size should be at least 10 subjects per observed variable or a minimum of 100 subjects.21,26 Although our overall sample size of 177 participants exceeded the recommended minimum sample size, the number of participants in each of the cross-validation samples was slightly smaller than the recommended sample size.
Conclusion
Our goal was to determine whether performance-specific assessments of pain and physical function could provide a more distinct evaluation of these attributes than has been found for self-report measures. We conceived a 2-factor model consisting of pain and physical function and tested the model with 4 activities (self-paced walk test, Timed “Up & Go” Test, stair test, and Six-Minute Walk Test) by use of a confirmatory factor analysis. Although the initial model appeared promising, the stair test pain and function error terms were correlated. Dropping the stair test from the analysis provided results that supported the application of performance-specific assessments of pain and function as a method of obtaining reasonably distinct assessments of these attributes. We believe that performance-specific assessments of pain and function offer a more distinct method of assessing these attributes than can be obtained by self-reports alone and that performance measures should be viewed as core measures for people with OA of the hip or knee and those progressing to arthroplasty.
Footnotes
All authors provided concept/idea/research design and writing. Ms Kennedy provided data collection, subjects, and institutional liaisons. Mr Stratford and Dr Woodhouse provided data analysis and fund procurement.
The Research and Ethics Committee of the Holland Orthopaedic & Arthritic Centre of Sunnybrook Health Sciences Centre approved this study.
* McRoberts BV, The Hague, the Netherlands.
† SmallWaters Corp, 1507 E 53rd St, Suite 452, Chicago, IL 60615.
- Received January 4, 2006.
- Accepted July 5, 2006.
- Physical Therapy