Abstract
Background The Patient-Specific Functional Scale (PSFS) has received considerable attention over the last 2 decades; however, validation studies have not examined its performance in patients after total knee arthroplasty (TKA).
Objective The purpose of this study was to investigate the ability of the PSFS to detect change in patients post-TKA by comparing PSFS change scores with Lower Extremity Functional Scale (LEFS) and pooled impairment change scores.
Methods One hundred thirty-three patients participating in a post-TKA exercise class were assessed at their initial and discharge visits. Initial assessments occurred within 28 days of arthroplasty; follow-up assessments occurred within 80 days of surgery. At both assessments, participants completed the PSFS, LEFS, and the P4 pain measure, and their knee range of motion (ROM) and extensor strength were measured. The ability to detect change was expressed as the standardized response mean (SRM) and as a correlation between the PSFS change scores and 2 reference standards: (1) LEFS change scores and (2) pooled impairment change scores. The pooled impairment measure consisted of pain, ROM, and strength change scores.
Results The SRMs were 4.60 (95% confidence interval [CI]=4.00, 5.36) for the PSFS and 2.28 (95% CI=2.04, 2.60) for the LEFS. The correlation between the PSFS and pooled impairment change scores was 0.12 (95% CI=−0.04, 0.25), and the correlation between the PSFS and LEFS change scores was 0.18 (95% CI=0.02, 0.34).
Limitations The order of measure administration was not standardized, and the fixed activity set does not reflect clinical application in many instances.
Conclusions The results suggest that the PSFS is adept at detecting improvement in patients post-TKA but that the PSFS, like other patient-specific measures, is likely to be of limited value in distinguishing different levels of change among patients.
Patient-specific measures have received increased attention and use over the past 2 decades. Examples of patient-specific measures include the MACTAR,1 the Patient-Generated Index,2 and the Patient-Specific Functional Scale (PSFS).3 Much the same as single-subject study designs were conceived to aid decisions concerning the effectiveness of an intervention on an individual patient,4,5 patient-specific measures were developed to assess outcomes most relevant to an individual patient.1–3 Unlike generic and disease-, condition-, and region-specific measures that are composed of items common to all respondents, patient-specific measures allow respondents to nominate their own activities to be assessed and followed over time. The goal of the current study was to contribute information concerning the ability of the PSFS to assess change in patients in the first several months after total knee arthroplasty (TKA), a context that, to our knowledge, had not been investigated previously.
First reported in 1995, the PSFS was conceived to provide clinicians with a formal structure for eliciting activities that are most important to an individual patient and rating the difficulty associated with each activity.3 Since its introduction, studies have contributed information concerning the ability of the PSFS to detect change when applied to patients with back,6,7 neck,8–10 and extremity11,12 problems. In addition to validation studies of the PSFS with individual patients, the PSFS also has been applied with apparent success in clinical intervention trials.13–15
Two families of study designs and analyses are available for researchers interested in evaluating a measure's ability to assess change.16,17 One design focuses on a measure's ability to assess change over time and involves a within-patient comparison.16 Husted et al17 have referred to this characterization of change as internal responsiveness. This approach is based on the theory that patients should change in the same direction (ie, either improve or worsen) over the reassessment interval. The standardized response mean (SRM=mean change/SD change) is one example of an internal responsiveness coefficient.18 The second method builds on the first method not only by examining a measure's ability to detect change but also by evaluating a measure's ability to differentiate among patients, or groups of patients, who change by differing amounts.16,17 The second approach requires a comparison with a reference standard of change, and it has been termed external responsiveness.17 Correlation coefficients and area under receiver operating characteristic curves are examples of external responsiveness coefficients. It is not uncommon for investigators to apply both methods to the same patient sample6,11,19; however, it is generally agreed that among-patient or among-group designs (ie, external responsiveness) provide a more rigorous assessment of a measure's ability to assess valid change.16,17
Regardless of the change coefficient applied, it is important to appreciate that properties such as sensitivity to change and responsiveness are not properties of a measure but rather of a measure as it performs in a specific context.20 Accordingly, to have confidence in a measure's usefulness in a particular measurement condition or circumstance, information concerning the measure's performance in that circumstance is required. The purpose of this study was to contribute information concerning the ability of the PSFS to detect change (improvement) in the context of patients initially assessed within 28 days post-TKA. Specifically, our purpose was to investigate the ability of the PSFS to detect change in patients post-TKA by comparing PSFS change scores with Lower Extremity Functional Scale (LEFS) change scores and a pooled index consisting of pain, range-of-motion (ROM), and strength change scores. We chose the LEFS because it has been shown to detect clinically important change in similar populations.21–23
Method
The current study represents a secondary analysis of data obtained from an investigation that examined patient outcomes with a new class-based model of care that involved a series of structured exercise classes for patients post-TKA. Each class lasted approximately 90 minutes, and participants typically attended 10 classes over 5 weeks. The goals of the sessions were to increase strength, ROM, and mobility. The study took place at the Sunnybrook Holland Orthopaedic & Arthritic Centre. Data were collected from December 2009 to September 2011. All participants provided written informed consent.
Participants
Patients were eligible for this study if they had a TKA secondary to osteoarthritis of the knee and had sufficient language skills to communicate in written and spoken English. Initial assessments had to occur within 28 days of arthroplasty, and follow-up assessments had to occur within 80 days of arthroplasty. Patients had to nominate at least 3 activities on the PSFS. Patients were excluded if they had neurological, respiratory, cardiac, or other conditions that would significantly compromise their functional status or if they underwent additional surgical procedures. The sample size was one of convenience and represented all eligible patients.
Design
Participants were assessed at their initial visit and upon discharge from the exercise class. At both assessments, their pain, knee ROM, and strength were measured. In addition, participants completed the PSFS and LEFS. At discharge, participants were not shown their initial PSFS and LEFS scores prior to completing the follow-up ratings on these measures. The order of self-report measure administration was not standardized. The premise for a before-after study design was based on a previous study that showed patients improved post-TKA.24
Measures
PSFS.
The PSFS consists of a standardized script for eliciting activities and a difficulty scale for scoring activities.3 Participants were asked to identify 3 to 5 activities they were having difficulty with as a result of their current problem. Each activity was scored on an 11-point numeric rating scale (0=“unable to perform activity,” 10=“able to perform activity at same level as before injury or problem”). An average PSFS score was obtained by summing the ratings for the first 3 activities nominated by a participant and dividing by 3. Thus, average scores could vary from 0 to 10, with higher scores representing higher levels of lower extremity functional status. Validation studies have supported the interpretation of PSFS scores at both individual item and total or average score levels.9,12,14
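The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration under the stated rule (average of the first 3 nominated activities, each rated 0 to 10); the function name and the example ratings are hypothetical.

```python
def psfs_average(ratings):
    """Average the ratings of the first 3 nominated activities (each 0-10)."""
    if len(ratings) < 3:
        raise ValueError("at least 3 activity ratings are required")
    first_three = ratings[:3]
    if any(not 0 <= rating <= 10 for rating in first_three):
        raise ValueError("ratings must lie between 0 and 10")
    return sum(first_three) / 3


# Hypothetical patient who nominated 4 activities; only the first 3 count.
score = psfs_average([2, 0, 1, 5])
print(score)
```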
LEFS.
The LEFS is a 20-item, region-specific, patient-report outcome measure validated at the total score level. Each item is scored from 0 to 4.25 Total scores can vary from 0 to 80, with higher scores representing higher levels of lower extremity functional status. We applied the recommended method for assigning missing values.26 Studies have supported the interpretation of LEFS scores in patients with a variety of lower extremity problems, including osteoarthritis and total joint arthroplasty.22,24 The LEFS has been shown to detect change comparable to or better than competing measures such as the Western Ontario and McMaster Universities Osteoarthritis Index.21–23
P4.
The P4 is a 4-item pain measure that asks patients to rate their typical morning, afternoon, evening, and activity pain levels on 11-point (0–10) numeric pain scales.27,28 A total pain score was obtained by summing the 4 ratings. Total scores can vary from 0 to 40, with higher scores representing higher pain levels. Support for P4 use in patients with osteoarthritis of the knee awaiting knee arthroplasty has been reported previously.29
Active knee ROM.
Knee flexion and extension were assessed with a universal goniometer. Flexion was documented in degrees of flexion, extension beyond zero was expressed as hyperextension, and extension values where patients were unable to achieve zero degrees were recorded as degrees of flexion.
Strength assessment.
Unilateral knee extensor strength of operative and nonoperative limbs was assessed using the 1-repetition maximum method. The test was conducted on a leg press machine, and participants were required to extend through a range from 90 degrees of flexion (or a participant's maximum flexion if less than 90°) to maximum extension. A participant's involved strength score was expressed as a ratio of operative to nonoperative values.
Reference standards of change.
We applied 2 reference standards for change. One was a pooled index of impairment measures, and the other was the activity-focused LEFS. Because improving impairments was an important goal during this interval post-TKA, we constructed a pooled index consisting of P4, ROM, and strength quotient change scores. Using the approach described by Smythe et al,30 we: (1) created standardized scores (ie, mean=0, SD=1) for P4, ROM, and strength quotient change scores; (2) aligned the scale orientation of the standard scores such that positive scores represented improvement; (3) summed the pooled change scores; and (4) converted the sum to a standard score (ie, mean=0, SD=1).
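The 4 steps above can be sketched as follows. This is an illustrative Python outline only, not the authors' code. It assumes that change is computed as follow-up minus initial, that a decrease in P4 pain represents improvement (so its standard score is sign-reversed), that ROM is summarized as a single change score in which positive values represent improvement, and that the sample SD is used. All names and values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert values to standard scores (mean=0, SD=1)."""
    center, spread = mean(values), stdev(values)
    return [(value - center) / spread for value in values]


def pooled_impairment_index(pain_change, rom_change, strength_change):
    # Steps 1 and 2: standardize each change score and align orientation.
    # Pain is reversed because lower P4 scores indicate less pain (assumption:
    # change = follow-up minus initial, so improvement in pain is negative).
    z_pain = [-z for z in standardize(pain_change)]
    z_rom = standardize(rom_change)
    z_strength = standardize(strength_change)
    # Step 3: sum the aligned standard scores for each patient.
    sums = [p + r + s for p, r, s in zip(z_pain, z_rom, z_strength)]
    # Step 4: convert the sum back to a standard score.
    return standardize(sums)


# Hypothetical change scores for four patients (not study data).
pain_change = [-10, -5, -15, -8]         # P4 points
rom_change = [20, 10, 30, 15]            # degrees
strength_change = [0.2, 0.1, 0.3, 0.15]  # operative/nonoperative ratio
index = pooled_impairment_index(pain_change, rom_change, strength_change)
```

In this example the third patient improves most on all 3 impairments and therefore receives the highest pooled score.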
Data Analysis
For all measures, we calculated summary statistics and the coefficient of variation (CV) for change scores. We examined the extent to which the distribution of change scores for the PSFS and LEFS was consistent with a normal distribution by calculating Shapiro-Wilk statistics and generating histograms of the data.
Based on discussions with physical therapists who supervised the knee class, we anticipated that many participants in the sample would change (improve) by truly different amounts (ie, their error-free scores would differ). Studies have demonstrated PSFS change score correlations with reference standards—most often a global rating of change—greater than 0.60.6–8,12,19 Accordingly, we anticipated that the PSFS would display internal and external responsiveness. We calculated the SRM and a correlation coefficient. The Pearson or Spearman correlation coefficient between the PSFS and reference standards for change was calculated depending on the extent to which the requisite statistical assumptions were met. We anticipated that the PSFS would display a greater correlation with the LEFS than with the pooled impairment reference standard. We estimated 95% CI values for the change coefficients by obtaining 1,000 bootstrap samples with replacement.31 The 95% confidence limits were identified as the 25th and 975th rank-ordered observations from the bootstrap samples. We applied a similar bootstrap approach that accounted for dependent data to differences in change coefficients between the PSFS and LEFS by generating paired samples and taking the pair-wise differences in coefficients. The 95% confidence limits on the differences were identified as the 25th and 975th rank-ordered observations from the bootstrap samples. All analyses were conducted using STATA 12.1 (StataCorp LP, College Station, Texas).
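The percentile bootstrap described above can be sketched in Python as follows. The analyses in this study were run in STATA; this outline is an assumption-laden illustration, not a reproduction of that code. The function names, the seed, and the change scores are hypothetical. The paired version resamples patients (rows) so that each resampled patient contributes change scores on both measures, which is one way to account for dependent data.

```python
import random
from statistics import mean, stdev


def srm(changes):
    """Standardized response mean: mean change divided by SD of change."""
    return mean(changes) / stdev(changes)


def percentile_limits(estimates):
    """Return the 2.5th and 97.5th percentile rank-ordered estimates.

    With 1,000 estimates these are the 25th and 975th ordered values.
    """
    ordered = sorted(estimates)
    n = len(ordered)
    return ordered[n * 25 // 1000 - 1], ordered[n * 975 // 1000 - 1]


def bootstrap_srm_ci(changes, n_boot=1000, seed=0):
    """95% percentile CI for the SRM from resampling with replacement."""
    rng = random.Random(seed)
    n = len(changes)
    estimates = []
    for _ in range(n_boot):
        resample = [changes[rng.randrange(n)] for _ in range(n)]
        estimates.append(srm(resample))
    return percentile_limits(estimates)


def bootstrap_srm_difference_ci(changes_a, changes_b, n_boot=1000, seed=0):
    """95% percentile CI for SRM(a) - SRM(b) using paired resampling."""
    rng = random.Random(seed)
    n = len(changes_a)
    differences = []
    for _ in range(n_boot):
        # Resample patients, keeping each patient's two change scores paired.
        rows = [rng.randrange(n) for _ in range(n)]
        srm_a = srm([changes_a[i] for i in rows])
        srm_b = srm([changes_b[i] for i in rows])
        differences.append(srm_a - srm_b)
    return percentile_limits(differences)


# Hypothetical change scores for 12 patients (not study data).
psfs_changes = [7.0, 8.5, 6.5, 9.5, 5.5, 8.0, 7.5, 9.0, 6.0, 8.0, 7.0, 9.0]
lefs_changes = [20, 35, 12, 40, 8, 25, 30, 28, 15, 22, 18, 33]

psfs_lower, psfs_upper = bootstrap_srm_ci(psfs_changes)
diff_lower, diff_upper = bootstrap_srm_difference_ci(psfs_changes, lefs_changes)
```

A difference interval that excludes zero would be read as a statistically significant difference between the two coefficients.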
Results
Participants were 133 patients with a mean age of 68.0 years (SD=8.4). Eighty-seven participants were female. The median initial and follow-up assessments occurred 14 days (1st, 3rd quartiles=11, 17) and 51 days (1st, 3rd quartiles=46, 58) postarthroplasty. The median number of days between initial and follow-up assessments was 36 (1st, 3rd quartiles=33, 41). Participants attended a median of 11 (1st, 3rd quartiles=10, 11) exercise classes.
Measure Summary Statistics
The Table provides a summary of the measures' initial and follow-up scores, change scores, and SRMs. Also reported in this table are the CVs for change scores (CV=SD change/mean change). Applied in the context of change, the CV is the reciprocal of the SRM. Figures 1 and 2 show the distributions of change scores for the PSFS and LEFS. The PSFS change distribution was markedly skewed to the left (z=3.60, P<.001), whereas the LEFS (z=−0.01, P=.50) and pooled impairment change (z=1.25, P=.10) distributions were consistent with a normal distribution.
Table. Summary Statistics and Standardized Response Means (SRMs)
Figure 1. Distribution of Patient-Specific Functional Scale (PSFS) change scores.
Figure 2. Distribution of Lower Extremity Functional Scale (LEFS) change scores.
Change Coefficients
The PSFS SRM (4.60) was approximately twice that of the LEFS (2.28), and the difference was statistically significant (difference=2.32; 95% CI=1.70, 3.06).
Given that the PSFS change score distribution was not consistent with a normal distribution, we selected the Spearman rank order correlation coefficient as our primary change coefficient. The correlations between PSFS change scores and those of the pooled impairment reference standard and LEFS were 0.12 (95% CI=−0.04, 0.25) and 0.18 (95% CI=0.02, 0.34), respectively. The difference in PSFS correlation coefficients between the pooled impairment reference standard and the LEFS (difference=0.06; 95% CI=−0.12, 0.26) was not significant. The LEFS correlation with the pooled impairment reference standard was 0.37 (95% CI=0.20, 0.50), which was significantly greater than the PSFS correlation with the reference standard (difference=0.25; 95% CI=0.03, 0.45).
Discussion
The goal of this study was to provide information concerning the ability of the PSFS to detect improvement in patients post-TKA. Patients were assessed within 80 days of arthroplasty, and the LEFS scores and change scores of our sample were consistent with those reported previously.24 Our results showed that the PSFS was not only highly internally responsive, but that it was significantly more internally responsive than the LEFS. In contrast, the PSFS displayed a low level of external responsiveness: its correlation with the pooled impairment reference standard did not differ from zero, and its correlation with LEFS change scores was marginally greater than zero.
A fundamental requirement for internal responsiveness is that patients change in the same direction. In our study, the change direction was improvement. Referring to the SRMs reported in the Table, it is evident that all measures displayed what Cohen would characterize as a large effect size (ie, >0.8).32(p26) We interpret the magnitude of these coefficients as support for the premise that our sample truly improved. An essential requirement for external responsiveness is that the change status of patients truly differs. We believe that support for this requirement is evident in the distribution of change scores for the LEFS depicted in Figure 2 and in the correlation between LEFS change scores and those of the pooled impairment reference standard. A 40-point difference in LEFS change scores, which is about the range of LEFS change scores observed in our study, is more than 4 times what would be considered an important within-patient change.22,23,25 Also, the correlation of LEFS and pooled impairment change scores of .37 is consistent with the reported relationship between impairments and activities noted at other anatomical sites.33,34
If the premise that many participants improved by different amounts is accepted, there are at least 2 possible explanations for the low correlation of the PSFS with the reference standards. One potential reason is that the PSFS is measuring something different from functional status. However, given its performance with other conditions,6,8,12 and the activities identified by participants in our study—typically walking short distances, stair climbing, and light household activities—we believe this explanation is highly unlikely. A second, and more likely, explanation is that the participants' ratings of change in our study were more homogeneous than those obtained in other investigations of the PSFS. That is, in our study, many patients identified activities that displayed a ceiling effect over the reassessment interval. All else being equal, the magnitude of a correlation coefficient is affected by the measured variability in the attribute of interest.35(p270) In the extreme case when all patients display the same score, the correlation coefficient will be zero.
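The effect of restricted variability on a rank correlation can be illustrated with a small, deterministic Python sketch. The data are entirely hypothetical: a reference standard of change for 10 patients, a measure that tracks it closely, and the same measure after a ceiling is imposed so that most patients share the maximum observable value. The ceiling is applied directly to change scores for simplicity, which is an assumption of this sketch rather than a description of how the ceiling arose in the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank order correlation computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for 10 patients (not study data).
reference_change = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
measured_change = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
ceiling = 4  # hypothetical maximum observable change
capped_change = [min(value, ceiling) for value in measured_change]

rho_full = spearman(reference_change, measured_change)
rho_capped = spearman(reference_change, capped_change)
print(f"without ceiling: {rho_full:.2f}, with ceiling: {rho_capped:.2f}")
```

In this example, 7 of the 10 patients are tied at the ceiling, and the correlation with the reference standard falls even though the underlying relationship is unchanged.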
To our knowledge, this is the first study that has investigated the PSFS post-TKA. However, Chatman et al12 examined the ability of the PSFS to detect change in patients with knee dysfunction due to a variety of problems. Chatman and colleagues' sample consisted of 38 patients attending physical therapy clinics. Twenty patients were female, and the sample's mean age was 47 years (SD=18). Chatman and colleagues' validation approach consisted of correlating PSFS change scores with a composite global rating that equally weighted patients' and clinicians' impressions of change. The correlation of the PSFS with the composite global rating of change (.77; 95% CI=.61, .89) was found to be greater than the correlation of the 36-Item Short-Form Health Survey (SF-36) physical function subscale with the global rating of change (.59; 95% CI=.30, .78).
In addition to sample composition, there are several notable differences between our sample and that of Chatman et al.12 First, the typical initial visit item mean in Chatman and colleagues' study was approximately 3.0 points (SD=2.4) compared with a mean of 0.8 points (SD=1.5) in our study. Second, the typical follow-up item mean in their study was approximately 5.8 points (SD=2.7) compared with 8.7 points (SD=1.7) in our study. Chatman et al did not provide summary statistics for change scores. A third difference between our sample and that of Chatman et al relates to activity difficulty. There was heterogeneity in activity difficulty within many participants in Chatman and colleagues' sample. For example, at the initial assessment, one participant receiving TKA identified the following activities (Jill M. Binkley; personal communication; August 8, 2013): (1) getting into my pickup truck, (2) walking up and down the river bank, and (3) dredging for gold. Another, younger participant post–anterior cruciate ligament reconstruction nominated the following activities: (1) walking to the school bus, (2) going to the mall, and (3) marching with my tuba. In contrast, participants in the current study tended to generate PSFS activities that were more homogeneous in difficulty. For example, many participants nominated walking several blocks, going up and down a flight of stairs, and a variety of activities that would be considered light household activities. At the initial assessment, participants in our study rated these activities near or at zero, whereas at follow-up, these activity ratings were near or at 10 for many participants. Lastly, the reference standard used by Chatman et al—a composite retrospective global rating of change that included the opinions of patients and clinicians—may have positively influenced the magnitude of the correlation.
In addition to the work of Chatman et al,12 other studies have shown moderate correlations (.47–.69) between reference standards of change and PSFS scores.6,7,11,19 These studies had 2 characteristics similar to those of the study by Chatman et al. First, they had baseline PSFS mean scores in the range of 3.5 to 4.3. Follow-up scores were not presented. Second, the reference standard was a retrospective global rating of change assigned by the participants.
Our primary explanation for the high PSFS SRM and low correlation of change is that many participants in our study nominated activities with relatively low difficulty levels that displayed a large improvement, resulting in a ceiling effect. A consequence of many patients changing by about the same amount was restricted variability of PSFS change scores that potentially limited the magnitude of the correlation between the PSFS and reference standards for change. In contrast, the LEFS detected a greater spectrum of change scores in our sample, which allowed for a higher correlation with the reference standard for change. The lower SRM for the LEFS was a direct consequence of the greater true variability in change scores, which is contained in the denominator of the SRM and portrayed as measurement error rather than true change.18,36
To summarize, patient-specific measures were developed to evaluate activities most relevant to individual patients at specific time points in their care. They were not originally conceived for between-patient or between-group comparisons. The high SRM obtained for the PSFS in our study supports the ability of this measure to assess improvement in patients post-TKA. The ceiling effect observed in this study has important clinical relevance. Studies modeling recovery after TKA have demonstrated that patients undergo rapid change in the first few months.24,37 In patient populations where rapid change is expected, more frequent administration of the PSFS would be warranted, with nomination of new and more challenging activities when previous goals have been met (ie, activity scores approach 10 points).
Limitations
There are several caveats when interpreting our results. First, the order of measure administration was not standardized, and we do not know the extent to which this factor may have affected our findings. A second potential limitation relates to previous studies that showed patients' self-reported impressions of functional status were influenced by pain. This limitation may have inflated the correlation between the LEFS and pooled impairment scores.21,22 Accordingly, we performed a post hoc test that correlated PSFS and LEFS change scores with a reference standard of change that excluded P4 scores. The correlation coefficients for the PSFS and LEFS were .05 (95% CI=−.12, .22) and .27 (95% CI=.11, .42), respectively. Although these correlations were smaller than those with the original reference standard, the difference in coefficients of .23 (95% CI=.01, .42) remained statistically significant. Finally, our study, like other reported investigations of the PSFS, used a fixed activity set per patient.6,7,11,19 That is, the activities nominated at the patients' initial assessment were the only activities analyzed. This approach differs from clinical practice, where patients often generate new activities when a ceiling is reached for a previously identified activity. Accordingly, our findings are not directly applicable to clinical settings where patients nominate new activities when previously identified activity goals have been met.
In conclusion, our results suggest the PSFS is adept at detecting improvement in patients post-TKA, but that the PSFS, like other patient-specific measures, is likely to be of limited value in distinguishing different levels of change among patients.
Footnotes
All authors provided concept/idea/research design and writing. Ms Kennedy and Ms Wainwright provided data collection, project management, study participants, facilities/equipment, and institutional liaisons. Mr Stratford provided data analysis. Mr Stratford and Ms Kennedy provided consultation (including review of manuscript before submission).
Ethics approval was obtained from the Ethics Board of Sunnybrook Holland Orthopaedic & Arthritic Centre.
- Received August 28, 2013.
- Accepted February 14, 2014.
- © 2014 American Physical Therapy Association