Abstract
Background. The KOOS-PS is a shortened version of the Knee Injury and Osteoarthritis Outcome Score (KOOS) Function and Sport scales. Previous investigations have not evaluated the KOOS-PS against performance measures or self-report measures composed of items that assess a broad spectrum of ability levels.
Objective. The purpose of this study was to compare the construct validity of the KOOS Function and Sport subscales with a shorter version of the measure (KOOS-PS).
Methods. Using a cross-sectional, observational design, consecutive consenting patients diagnosed with knee osteoarthritis were recruited at an assessment center visit to determine need for conservative or surgical management. Participants completed the Lower Extremity Functional Scale (LEFS), KOOS, Timed “Up & Go” Test, and Six-Minute Walk Test. A single function-sport score (KOOS FunSportsum) and the KOOS-PS were abstracted from the KOOS. Pearson correlation coefficients were compared between the reference standards' scores (performance measures and LEFS) and KOOS scores. KOOS-PSraw scores were compared with KOOS-PSRasch scores.
Results. Three hundred seventy-seven patients with a mean age of 64.4 years (SD=10.5) participated. The correlation between performance reference standard and KOOS-PSRasch scores was significantly lower than with KOOS FunSportsum scores (mean difference in r=.08 [95% confidence interval=.03, .11], z=4.45, 1-tailed P<.001). A similar finding was observed with the LEFS comparison.
Limitations. The study sample included few patients with mild or severe functional status limitations.
Conclusions. For patients with knee osteoarthritis, the KOOS-PS appears too restricted in item content to provide a comprehensive estimate of lower extremity functional status level relative to the KOOS Function and Sport subscales. Pursuit of a computer-adapted test may be a productive direction for future inquiry.
Interest in patient-reported outcome measures for people with osteoarthritis (OA) of the knee has a history that extends over several decades.1 During this period, efforts have been made to develop new measures2,3 and to modify existing measures.4,5 One such undertaking was the refinement of the Knee Injury and Osteoarthritis Outcome Score (KOOS)4 to the shortened version (KOOS-PS)5,6 that focuses only on functional status. The goal of the current study, conducted in the context of patients seeking care for knee arthritis to determine candidacy for conservative or surgical management, was to compare the construct validity of the KOOS Function and Sport subscales with that of the KOOS-PS.
The KOOS-PS was developed as part of an Osteoarthritis Research Society International (OARSI) and Outcome Measures in Rheumatology (OMERACT) initiative that sought to “create relevant tools, then to develop pain, physical function, and structure states that represent the progression from early to late disease for individuals with OA of the hip and knee.”7(p1433) The KOOS-PS developers noted 4 reasons for refining the sum of the KOOS Function and Sport items. First, they considered the potential burden that longer questionnaires place on participants in clinical trials.5,6,8 Second, they noted that the 17 KOOS Function items, which are identical to Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) Physical Function items,9 exhibit a restricted range in item difficulty. Third, they cited studies suggesting item redundancy.10,11 Fourth, KOOS-PS developers were interested in designing a psychometrically sound unidimensional measure with interval scale properties.5,6,8 Rasch analysis was used to reduce the KOOS Function and Sport items to a 7-item unidimensional measure.5,6 The KOOS-PS is composed of 4 items from the KOOS Function domain (bending to floor, rising from sitting, putting on socks/stockings, and rising from bed) and 3 items from the KOOS Sport domain (squatting, kneeling, and twisting/pivoting on injured knee).5,6 Subsequent investigations have primarily applied the KOOS-PS in the validation of other patient-reported outcome measures.12–14
Acknowledging that the validation of a measure is an ongoing process, we noted 2 areas that had not been investigated at the time of undertaking the current study. One was that the KOOS-PS had not been compared with performance measures. The second was that the KOOS-PS had not been compared with patient-reported lower extremity functional status measures with items of extremely low (eg, rolling over in bed) and high (eg, running on uneven ground, making sharp turns while running fast) difficulty levels. It was with these novel opportunities in mind that we undertook the current investigation to determine whether KOOS-PS scores displayed cross-sectional validity similar to KOOS Function and Sport subscale scores.
Two questions directed our study: (1) Does the KOOS-PS perform equivalently to the sum of KOOS Function and Sport items (KOOS FunSportsum)? and (2) Do KOOS-PSRasch scores perform better than KOOS-PSraw scores? Specifically, we hypothesized the following: (1) the correlation between performance and KOOS FunSportsum scores would not be more than .07 greater than the correlation between performance and KOOS-PS scores, (2) the correlation between Lower Extremity Functional Scale (LEFS) and KOOS FunSportsum scores would not be more than .07 greater than the correlation between LEFS and KOOS-PS scores, and (3) the correlation between performance scores and KOOS-PSRasch scores would be at least .07 greater than the correlation between performance scores and KOOS-PSraw scores.
Although the developers of the KOOS-PS did not define lower extremity functional status, we equate it with the “ability to move around” and “performance of daily activities” as expressed by the Activity component of the World Health Organization International Classification of Functioning, Disability and Health (ICF) model.15–18
Method
Participants and Setting
The study took place at the Sunnybrook Holland Orthopaedic & Arthritic Centre, a large tertiary care orthopedic facility in Toronto. The Holland Centre conducts more than 2,000 joint arthroplasty surgeries annually and offers comprehensive interprofessional arthritis care that includes centralized referral intake and an assessment center to determine optimal management.
To be eligible for the study, patients had to be referred to the assessment center with a diagnosis of knee OA. Exclusion criteria included a diagnosis of inflammatory arthritis or acute trauma. Patients with cognitive impairments affecting their ability to consent and complete the questionnaires or with neurological, respiratory, cardiac, or other conditions that would significantly compromise their ability to complete the performance measures also were excluded. All participants provided written informed consent.
Study Design
Using a cross-sectional observational design, consecutive patients who met all eligibility criteria were recruited at the assessment center visit. At the assessment center, a comprehensive physical assessment was undertaken to identify those patients who required conservative versus surgical management. As part of the assessment, patients completed a number of standardized outcome measures, including the WOMAC,1 Lower Extremity Functional Scale (LEFS),3 Six-Minute Walk Test (6MWT),19 and Timed “Up & Go” Test (TUG).20 For the purposes of this study, the KOOS was administered in place of the WOMAC because the KOOS contains all WOMAC items plus additional items.
Measures
KOOS.
The KOOS is a knee-specific measure developed to assess a patient's opinion about his or her knee and associated problems. It is self-administered and covers 5 patient-relevant dimensions: pain (9 items), other disease-specific symptoms (7 items), function in activities of daily living (ADL) (17 items), sport and recreation function (5 items), and knee-related quality of life (4 items).4,21 The KOOS items are scored from 0 to 4 and summed within subscales. The KOOS Function subscale scores can range from 0 to 68. The KOOS Sport subscale scores can range from 0 to 20. For both subscales, lower scores represent higher functional status levels. These scores are converted to percentage scores, with higher scores representing higher functional status levels.4
KOOS Function and Sport summed score scale (KOOS FunSportsum).
We created a single function-sport score by summing individual Function (n=17) and Sport (n=5) subscale items, which, in turn, were converted to a percentage score, with higher scores representing higher functional status levels.
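The scoring described above can be sketched as follows. This is a minimal illustration that assumes the conventional KOOS transformation (100 minus the raw sum expressed as a percentage of the maximum possible sum); the function name `koos_percentage` and the item responses are hypothetical and are not taken from the study data.

```python
def koos_percentage(item_scores, max_item=4):
    """Convert 0-4 item scores to a 0-100 score where higher = better function.

    Assumes the usual KOOS convention: 100 - (raw sum / maximum sum) * 100.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(s < 0 or s > max_item for s in item_scores):
        raise ValueError("item scores must lie between 0 and max_item")
    raw_sum = sum(item_scores)
    max_sum = max_item * len(item_scores)
    return 100.0 - 100.0 * raw_sum / max_sum


# Hypothetical patient: 17 Function items and 5 Sport items.
function_items = [1] * 17
sport_items = [3] * 5

function_score = koos_percentage(function_items)              # raw range 0-68
sport_score = koos_percentage(sport_items)                    # raw range 0-20
fun_sport_sum = koos_percentage(function_items + sport_items)  # raw range 0-88
```

Summing the 22 items before converting, rather than averaging the two subscale percentages, weights each item equally, which is how the single function-sport score is described in the text.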
KOOS-PS.
The KOOS-PS consists of 7 items from the KOOS Function (rising from sitting, bending to floor, putting on socks/stockings, rising from bed) and Sports (squatting, kneeling, twisting/pivoting) subscales.6 Items were selected based on a Rasch analysis. Items are scored from 0 to 4 and summed to give a raw score from 0 to 28, with lower scores representing higher levels of functional status. Raw scores were converted to Rasch-equivalent person scores using the cubic model reported by Perruccio et al.6 Rasch-equivalent person scores, in turn, were converted to a 0 to 100 scale, with lower scores again representing higher levels of functional status.6 The KOOS-PS item scores were abstracted from the full-length KOOS.
LEFS.
The LEFS is a 20-item, self-report, unidimensional, region-specific measure that inquires about perceived difficulty with a variety of activities. Each item is scored on a 5-point scale (0=extreme difficulty or unable to perform activity, 1=quite a bit of difficulty, 2=moderate difficulty, 3=a little bit of difficulty, and 4=no difficulty), and item scores are summed to yield a total score.3 Total LEFS scores can range from 0 to 80, with higher scores representing better functional status. Earlier investigations have shown the cross-sectional and longitudinal validity of the LEFS to be as good as or better than competing measures when applied to patients with OA of the hip or knee and those who have undergone total joint arthroplasty.22–24 A previous study referenced total LEFS scores with the following typical item activity ratings: 20 points=quite a bit of difficulty getting in and out of bath and performing light activities around the home, moderate difficulty walking between rooms; 40 points=moderate difficulty walking 2 blocks and with stairs; 60 points=a little bit of difficulty with heavy activities around the home and walking 1 mile (1.6 km), no difficulty walking 2 blocks or performing light activities around the home; and 80 points=no difficulty running on uneven ground or making sharp turns while running fast.25 The LEFS has displayed high levels of validity,3,26 interpretability,25 and sensitivity to change for a spectrum of lower extremity conditions, including OA,23,24 total hip and knee arthroplasty,22 ankle sprain,26 and anterior cruciate ligament reconstruction.27
TUG.
The TUG has been used to assess the mobility of patients with OA of the knee and those undergoing knee arthroplasty.18,28 It requires patients to rise from a standard armchair, walk at a safe and comfortable pace to a line 3 m away, cross the line, turn, and return to a sitting position in the chair. The measured outcome is time (in seconds).
6MWT.
The 6MWT quantifies functional status as the distance (in meters) walked during a 6-minute period. Originally conceived as an outcome measure for people with cardiorespiratory problems,19 the standardized 6MWT has become a popular measure of lower extremity functional limitation for patients with OA of the lower extremity and those progressing to joint replacement.18,29–32
Pooled performance reference standard.
The pooled performance score was obtained by combining information from the TUG and 6MWT. These tests include rising from a sitting position, turning/pivoting, and walking. Because the TUG and 6MWT are scored on different metrics, a standardization procedure was required to allow data pooling. Using the method described by Smythe et al,33 pooled scores were obtained by converting raw timed and distance scores to standard scores with a mean of zero and a standard deviation of one. Because shorter TUG times and greater 6MWT distances represent better lower extremity functional status, standardized TUG scores were multiplied by −1. This method allowed the summing of standardized scores such that larger scores represent better functional status levels. Summed performance scores were converted to T scores, with a mean of 50 and a standard deviation of 10.
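The pooling steps can be sketched in a few lines. This is an illustrative reading of the procedure rather than the authors' code: it assumes the sample standard deviation is used for standardization, and the function names and patient values are hypothetical.

```python
import statistics


def standardize(values):
    """Return z scores (mean 0, SD 1) using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pooled_performance_t_scores(tug_seconds, walk_metres):
    """Pool TUG and 6MWT results into T scores (mean 50, SD 10)."""
    if len(tug_seconds) != len(walk_metres):
        raise ValueError("each patient needs both a TUG and a 6MWT value")
    # Shorter TUG times are better, so reverse the sign of the TUG z scores.
    z_tug = [-z for z in standardize(tug_seconds)]
    z_walk = standardize(walk_metres)
    summed = [a + b for a, b in zip(z_tug, z_walk)]
    # Re-standardize the sum, then rescale to the T-score metric.
    return [50.0 + 10.0 * z for z in standardize(summed)]


# Hypothetical data for 5 patients (not study data).
tug = [8.0, 10.5, 12.0, 15.0, 20.0]             # seconds
walk = [520.0, 450.0, 400.0, 330.0, 250.0]      # meters
t_scores = pooled_performance_t_scores(tug, walk)
```

Because both components are standardized before summing, the TUG and 6MWT contribute equally to the pooled score regardless of their original units.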
Statistical Methods
We calculated descriptive statistics that summarized the sample's characteristics and outcome measures' scores. To evaluate hypotheses 1 and 2, we calculated Pearson correlation coefficients between scores for the reference standards (performance measures and LEFS) and KOOS scores. Meng and colleagues' analysis for dependent data was applied to test for differences between KOOS-PS and KOOS FunSportsum correlations with the reference standards.34 Because the scale orientation of the measures differed, we compared the absolute values of the correlation coefficients. We considered a correlation difference of .07 to be clinically important. A bootstrap procedure was applied to place confidence intervals on the difference in correlation coefficients. Specifically, 1,000 paired bootstrap samples with replacement were generated, and 95% confidence limits were identified as the 25th and 975th rank-ordered observations.
To evaluate hypothesis 3, we applied a similar approach that compared the correlation coefficients between performance reference standard scores and KOOS-PSraw and KOOS-PSRasch scores. This comparison was restricted to pooled performance reference standard scores because they possess true interval scale properties. Again, we considered a correlation difference of .07 to be clinically important.
Consistent with our directional hypotheses, statistical tests were 1-tailed, and an effect was considered statistically significant if P<.05. Analyses were performed using Stata version 13 (StataCorp LP, College Station, Texas).
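The comparison of dependent correlations and the bootstrap interval can be sketched as below. The z statistic follows the formula published by Meng et al for two correlations that share a common variable; the bootstrap follows the percentile approach described above. The function names, the default seed, and the use of plain Python rather than Stata are assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def meng_z(r1, r2, r12, n):
    """z statistic for two dependent correlations (Meng et al).

    r1, r2: correlations of each measure with the reference standard.
    r12: correlation between the two measures. n: sample size.
    """
    r_sq_bar = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - r12) / (2.0 * (1.0 - r_sq_bar)), 1.0)
    h = (1.0 - f * r_sq_bar) / (1.0 - r_sq_bar)
    return (math.atanh(r1) - math.atanh(r2)) * math.sqrt(
        (n - 3) / (2.0 * (1.0 - r12) * h)
    )


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bootstrap_ci(reference, measure_a, measure_b, n_boot=1000, seed=0):
    """Percentile 95% limits for |r(ref, a)| - |r(ref, b)| from paired resamples."""
    rng = random.Random(seed)
    n = len(reference)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        ref = [reference[i] for i in idx]
        a = [measure_a[i] for i in idx]
        b = [measure_b[i] for i in idx]
        diffs.append(abs(pearson(ref, a)) - abs(pearson(ref, b)))
    diffs.sort()
    lower_rank = max(n_boot * 25 // 1000, 1)   # 25th of 1,000
    upper_rank = n_boot * 975 // 1000          # 975th of 1,000
    return diffs[lower_rank - 1], diffs[upper_rank - 1]
```

Resampling whole patients, rather than each measure separately, preserves the dependency among the three sets of scores, which is what makes the interval appropriate for a difference between correlated coefficients.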
Sample Size
The sample size for the correlation analysis was derived from the test statistic proposed by Meng et al34 for dependent data. We estimated a sample size of approximately 300 patients based on the following assumptions: (1) the KOOS FunSportsum correlation with the reference standard would exceed the KOOS-PS correlation by .07, given a KOOS FunSportsum correlation coefficient with the reference standard of .55; (2) a correlation between KOOS FunSportsum and KOOS-PS of .85; (3) a type I error of .05 (1-tailed); and (4) a type II error of .20. Adjusting this sample size for a 20% dropout/missing value rate produced a target sample size of 375 patients.
Results
Descriptive Statistics
Data were collected between May 2011 and December 2011. During this period, 377 patients (238 women, 139 men) met the eligibility criteria. The sample's mean age was 64.4 years (SD=10.5). Of the 377 patients, data for patient-reported outcome measures were available for 374 patients (99%), and performance measure data were available for 330 patients (88%). Participants who provided performance data were younger than those who did not provide performance data (63.4 versus 72.6 years, t(375)=5.22, P<.001). Distributions by sex were similar for participants contributing and not contributing performance data (Fisher exact test, P=.51). Table 1 displays a summary of the measures' scores. Eighty-four participants (22%) provided the lowest possible score for the KOOS Sport, whereas 9 participants (2.4%) displayed the lowest possible KOOS-PS score.
Table 1. Measures' Descriptive Characteristics
Validity Analyses
Hypotheses 1 and 2.
Table 2 reports correlations among reference standard scores and KOOS and KOOS-PS scores. The correlation between performance reference standard and KOOS-PSRasch scores was significantly lower than the correlation between performance reference standard and KOOS FunSportsum scores (z=4.45, 1-tailed P<.001; mean difference in r=.08 [95% confidence interval (CI)=.03, .11]). Also, the correlation between LEFS and KOOS-PSRasch scores was significantly lower than the correlation between LEFS and KOOS FunSportsum scores (z=6.79, 1-tailed P<.001; mean difference in r=.10 [95% CI=.08, .13]). Neither result supported the proposed hypotheses.
Table 2. Correlations With Reference Standards
Hypothesis 3.
There was no significant difference in the correlations of KOOS-PSraw and KOOS-PSRasch scores with performance reference standard scores (z=1.34, 1-tailed P=.089). Accordingly, the third hypothesis was not supported.
Discussion
Our study examined 2 questions: (1) Does the KOOS-PS perform equivalently to the sum of KOOS Function and Sport items? and (2) Do KOOS-PSRasch scores perform better than KOOS-PSraw scores? Within the context of our sample, the hypotheses probing these questions were not supported. The KOOS FunSportsum scores displayed significantly greater correlations with both reference standards' scores, and there was no significant difference in KOOS-PSRasch and KOOS-PSraw correlations with the performance reference standard.
One goal of the KOOS-PS was to include items with a greater range of difficulty levels than those of the KOOS Function. We attempted to examine the potential benefits of this goal in 2 ways. First, we investigated floor and ceiling effects by counting the number of responses at the extremes of the scale range. Although the KOOS-PS did not display a strong ceiling effect, it appeared susceptible to a floor effect. Second, we chose a patient-reported outcome measures reference standard (ie, LEFS) that includes a spectrum of activity difficulty levels ranging from rolling over in bed to running over uneven ground and making sharp turns while running fast.3 However, because our sampling frame focused on patients seeking care rather than those recovering from the effects of treatment, it is possible that our sample did not provide a good test of the KOOS-PS for patients with higher functioning.
To gain insight into how the KOOS-PS performs with patients with higher functioning, we performed a post hoc correlation analysis of patients with LEFS scores ≥50. The result of this analysis, reported in Table 3, is similar in format to that presented in Table 2. Because this analysis represents post hoc inquiry, we avoided formal between-measure comparisons. That the point estimates of coefficients presented in Table 3 are smaller than those reported in Table 2 is to be expected owing to the restricted range in function levels of patients with LEFS scores ≥50. When correlated with the performance reference standard, the differences among the various KOOS and KOOS-PS correlation coefficients were unremarkable. In contrast, KOOS Sport scores appeared to correlate much more strongly with LEFS scores than did KOOS-PS scores. This finding is not entirely unexpected given that KOOS Sport items assess higher-level functional activities.
Table 3. Post Hoc Mean Scores and Correlations With Reference Standards' Scores (Subsample LEFS Scores ≥50)
When examining construct validity of the KOOS-PS, some investigations have applied the Spearman rank order correlation coefficient.12 Because this coefficient is based on rank ordering, the benefit of an interval-scale measure may be lost: Spearman correlation of raw scores will equal Spearman correlation of Rasch scores. To allow a comparison of raw and Rasch scores, we applied the Pearson correlation coefficient. The coefficients reported in Tables 2 and 3 were nearly identical for raw and Rasch scores. This finding suggests that the added complexity of converting raw scores to Rasch-equivalent scores using the reported third-degree polynomial may not be necessary.
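The point about rank-based coefficients can be demonstrated with a small sketch. The cubic used here is a hypothetical, strictly increasing transformation chosen only for illustration; it is not the published raw-to-Rasch conversion, and the scores are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def hypothetical_cubic(raw):
    """Strictly increasing cubic (derivative 1 + 0.03 * (raw - 14)**2 > 0)."""
    return raw + 0.01 * (raw - 14) ** 3


raw_scores = [2, 5, 9, 14, 14, 20, 26]        # hypothetical 0-28 raw scores
reference = [70, 66, 58, 49, 52, 40, 35]      # hypothetical reference scores
transformed = [hypothetical_cubic(r) for r in raw_scores]

rho_raw = spearman(raw_scores, reference)
rho_transformed = spearman(transformed, reference)  # same ranks, same rho
r_raw = pearson(raw_scores, reference)
r_transformed = pearson(transformed, reference)     # generally differs
```

Any strictly increasing conversion leaves the rank order, and therefore the Spearman coefficient, unchanged, so only a coefficient sensitive to spacing, such as Pearson's r, can reveal a difference between raw and transformed scores.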
The KOOS-PS was intended to provide an efficient measure of physical function from early to late disease for people with OA.7 We interpret the phrase “early to late disease” to represent mild to severe functional status limitations. Although the KOOS-PS is one-third the length of the KOOS FunSportsum items, we found its ability to assess lower extremity functional status was inferior to that of the KOOS FunSportsum. One explanation for the performance of the KOOS-PS may relate to its content validity: none of the activities require patients to take one step. If the ability to move around is considered to be an important component of lower extremity functional status, the KOOS-PS performance may well be due to its absence of items that capture this dimension. An explanation for this finding likely rests on the method used in the item reduction process. Although Rasch analysis is an excellent method for identifying unidimensional items, it does not assess the extent to which the items capture the characteristic of interest. A second explanation may be that regardless of the item-specific content, a fixed 7-item measure cannot capture the breadth of activities necessary to adequately evaluate lower extremity functional status.
It is expected that measurement properties of a well-developed measure will exceed those of a shortened version of the measure. An important question when judging the value of a shortened measure is, to what extent can a reduction in measurement properties be tolerated without compromising the validity of inferences drawn from a measured value? In part, the answer is dependent on the context in which the measure will be applied. Measures intended for group decisions can withstand greater reductions in reliability than measures proposed for individual patient use. The KOOS-PS, along with pain and structure measures, was conceived to shape a dichotomous decision that would label a patient as a candidate for total joint replacement.7 Applied in this context, measurement properties consistent with individual patient application would seem relevant. Another consideration when judging reported KOOS-PS measurement properties is that they apply to continuous rather than binary scores. Collapsing KOOS-PS scores to form a dichotomous decision (ie, surgical candidate: yes/no) will decrease the magnitude of its reported reliability and validity coefficients. When considering the adequacy of a shortened measure, it may be of greater value to focus on the confidence in inferences drawn from the measure's scores rather than on often-cited minimal acceptable reliability values.35
We have raised 2 concerns that we believe extend beyond the KOOS-PS: one addresses content validity, and the other acknowledges the inevitable reduction in validity of a shortened measure. If the goals are to maximize validity and minimize patient response burden, perhaps a better direction would be to develop a computerized adaptive test (CAT) of lower extremity functional status. A CAT contains an item bank capturing the range of ability/difficulty levels for the characteristic of interest. Items are administered sequentially, usually starting with an item of median difficulty (ie, the most informative item). Based on a patient's response to the initial item, a second item is selected approximating the difficulty level conveyed by the patient's response to the first item. This process is repeated until criteria for the stopping rule are met. For example, Hart et al36 developed a CAT based on 18 LEFS items. The first item administered to a patient was “Walking 2 blocks.” Subsequent items were selected based on responses to previous items. Item administration terminated when either the standard error for the provisional level of functional status was less than 4 out of 100 points or the change in the provisional level of functional status estimates for the last 3 items was less than 1 in 100. The CAT reported by Hart et al36 required the administration of an average of 7 items to obtain a score consistent with individual patient application. Development of a CAT would require a clear definition of lower extremity functional status, something that we believe is missing, and the identification of the anticipated activity spectrum to be assessed. We also believe that such discussions would benefit from considering whether the concept of lower extremity functional status is independent of the condition being assessed. For example, is the concept of lower extremity functional status the same for people with OA of the hip or knee, anterior cruciate ligament reconstruction, and neurological problems?
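The item selection and stopping logic of a CAT can be sketched as follows. This is a simplified illustration, not the algorithm of Hart et al: the item bank, its difficulty values, and the function names are hypothetical, the next item is chosen simply as the unused item closest in difficulty to the current estimate, and "change over the last 3 items" is interpreted here as the range of the last 3 provisional estimates. Estimation of the provisional score itself, which requires an item response model, is left out.

```python
def should_stop(estimates, standard_error, se_limit=4.0,
                change_limit=1.0, window=3):
    """Apply the two stopping criteria on a 0-100 metric.

    estimates: provisional functional status estimates, one per item given.
    standard_error: standard error of the latest provisional estimate.
    """
    if standard_error < se_limit:
        return True
    if len(estimates) >= window:
        recent = estimates[-window:]
        # Assumed reading of "change for the last 3 items": their range.
        return max(recent) - min(recent) < change_limit
    return False


def next_item(item_difficulty, administered, current_estimate):
    """Pick the unused item whose difficulty is closest to the estimate."""
    remaining = {name: d for name, d in item_difficulty.items()
                 if name not in administered}
    if not remaining:
        return None
    return min(remaining, key=lambda name: abs(remaining[name] - current_estimate))


# Hypothetical item bank with difficulties on a 0-100 metric (illustrative only).
bank = {
    "Rolling over in bed": 15.0,
    "Walking between rooms": 30.0,
    "Walking 2 blocks": 50.0,
    "Walking a mile": 65.0,
    "Running on uneven ground": 90.0,
}

first_followup = next_item(bank, {"Walking 2 blocks"}, current_estimate=62.0)
```

A fixed-length short form administers the same items to everyone, whereas this kind of loop concentrates items near each patient's level, which is why a CAT can reach a comparable precision with a similar number of items across a wider range of function.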
Our study was not without limitations. One limitation was that our sample did not contain many patients with severe or mild lower extremity functional status limitations. Owing to the sigmoidal relationship between raw and Rasch scores, which diverge most at the extremes of the scale, the full benefit of a Rasch-scored measure may not have been apparent in our sample. One direction for further inquiry would be to compare the KOOS-PS with existing measures on samples that contain a more uniform distribution of functional status limitations across the scale range.
A second potential limitation was the application of the sum of the KOOS Function and Sport subscales as a comparison. We did this comparison to gain an impression of what the maximum possible correlation to the reference standards would be. With the exception of the post hoc comparisons, the differences in correlations between the summed score and KOOS-PSRasch were nearly identical to the differences between the KOOS Function and the KOOS-PSRasch. A third potential limitation was that patients contributing performance data tended to be younger than those not providing performance data. We cannot anticipate the extent to which this limitation may have affected the relative magnitudes of the measures' validity coefficients. Another potential limitation was the arbitrary choice of .07 as an important difference in correlation coefficients. We chose this value based on the belief that .05 was too stringent and .10 was too large. Inspection of the CI on the differences may assist readers who would prefer to apply another threshold value.
This study explored the construct validity of the KOOS-PS in the context of patients seeking care for knee OA. The construct validity of the KOOS-PS was significantly less than that of KOOS FunSportsum. It may be that a fixed 7-item measure is too restricted in range to provide a precise and valid estimate of a patient's lower extremity functional status level. Rather than attempting to generate a shorter measure of lower extremity functional status, it may be more productive to pursue the development of a CAT.
Footnotes
Both authors provided concept/idea/research design, writing, and consultation (including review of the manuscript before submission). Ms Kennedy provided data collection, fund procurement, study participants, and facilities/equipment.
Ethics approval for the study was received from the Sunnybrook Holland Orthopaedic & Arthritic Centre's Research Ethics Review Board.
- Received February 26, 2014.
- Accepted July 6, 2014.
- © 2014 American Physical Therapy Association