Abstract
Background. Valid comparison of patient outcomes of physical therapy care requires risk adjustment for patient characteristics using statistical models. Because patients are clustered within clinics, results of risk adjustment models are likely to be biased by random, unobserved between-clinic differences. Such bias could lead to inaccurate prediction and interpretation of outcomes.
Purpose. The purpose of this study was to determine if including between-clinic variation as a random effect would improve the performance of a risk adjustment model for patient outcomes following physical therapy for low back dysfunction.
Design. This was a secondary analysis of data from a longitudinal cohort of 147,623 patients with lumbar dysfunction receiving physical therapy in 1,470 clinics in 48 states of the United States.
Methods. Three linear mixed models predicting patients' functional status (FS) at discharge, controlling for FS at intake, age, sex, number of comorbidities, symptom acuity, surgical history, and health care payer, were developed. The models were: (1) a fixed-effect model, (2) a random-intercept model that allowed clinics to have different intercepts, and (3) a random-slope model that allowed different intercepts and slopes for each clinic. Goodness of fit, residual error, and coefficient estimates were compared across the models.
Results. The random-effect models fit the data better and explained an additional 11% to 12% of the between-patient differences compared with the fixed-effect model. Effects of payer, acuity, and number of comorbidities were confounded by random clinic effects.
Limitations. Models may not have included some variables associated with FS at discharge. The clinics studied may not be representative of all US physical therapy clinics.
Conclusions. Risk adjustment models for functional outcome of patients with lumbar dysfunction that controlled for between-clinic variation performed better than a model that did not.
Pay-for-performance initiatives, including those for physical therapist services, are becoming more prevalent in the US health care system.1 Allen et al2 defined pay-for-performance as a payment system whereby payments are clearly linked to a quality target. Such a system gives financial incentives to providers for improving quality of care.2–4 A common approach to determining provider quality is through examining patient outcomes. To develop equitable pay-for-performance programs, valid comparison of patient outcomes is essential. Patient outcomes are associated not only with the quality of care given by health care providers but also with patient characteristics. For example, patients who are older, have more comorbidities, and have had more surgeries tend to have poorer outcomes following physical therapy for low back dysfunction.5–8 To enable valid assessment and comparison of provider quality, as defined by patient outcomes, risk adjustment for variation in patient characteristics is needed.
According to the US Department of Health and Human Services, risk adjustment is “a statistical process used to identify and adjust for variation in patient outcomes that stem from differences in patient characteristics (or risk factors) across health care organizations.”9(pp94–95) However, risk adjustment is limited by the fact that some patient characteristics are not measured and thus cannot be controlled for. These unmeasured patient characteristics could systematically vary from one clinic to another because patients are clustered within clinics and because patients in the same clinic tend to have similar characteristics. When clustering is not accounted for, comparisons of provider quality may be biased. For example, some clinics may have referrals from surgeons, and others may have referrals from chronic pain programs. Their patients may be similar in basic characteristics but might have very different prognoses. Statistically, when clustering by clinic is not accounted for, underestimation of standard errors of regression coefficients is likely to occur.10,11 Underestimation of standard errors can lead to the inappropriate conclusion that associations or treatment effects are statistically significant.
An approach that can be used to address clustering and between-clinic variation is a mixed model, also known as a multilevel or hierarchical model.10,11 Mixed models can adjust for risk factors as fixed effects and random effects. A fixed effect is the average effect of a risk factor on the outcome variable across clusters (eg, clinics), whereas a random effect estimates the deviation from the fixed effect due to being in different clusters.12 In a risk adjustment model, “clinic” can be treated as a random effect. Mixed models allow each clinic to have a different intercept (random intercept) or a different slope (random slope) for predicting patient outcomes. Clinics treating healthier patients could show that their patients have higher intercepts for functional outcomes and faster rates of recovery (ie, slope) compared with clinics treating patients with more severe conditions. In theory, accounting for random intercept or random slope across clinics should explain more variance in between-patient differences and improve the prediction of patient outcomes.
Mixed models have been recommended for risk adjustment in many medical fields, such as cardiovascular care,13 surgery,14–16 trauma care,17 and intensive care,18 to adjust for potential random clinic effects. However, studies did not consistently show that adding random clinic effects had a significant impact on risk adjustment. For example, Moore et al17 evaluated the performance of trauma centers in terms of patient survival using a traditional regression model and a mixed model. They found that the mixed model including between-clinic variation as a random intercept led to more stable effect estimates, fewer statistical outliers, and different hospital ranks compared with the traditional regression. D'Errigo et al16 evaluated the performance of cardiac surgery centers using models with and without adjusting for between-clinic variation as a random intercept. They found that the random effect accounted for an additional 10% of variance in mortality and concluded that the model including the random effect resulted in less biased outcome estimates. In contrast, Cohen et al19 assessed colorectal operation morbidity and mortality outcomes using models with and without adjusting for between-clinic variation as a random intercept. They found that the 2 models yielded similar results. These inconsistent findings suggest that not every risk adjustment model is strengthened by adding random clinic effects. To elucidate whether inclusion of random clinic effect is needed for a particular risk adjustment procedure, direct comparison across different models is needed.
Previous studies have identified a range of risk factors associated with clinical outcomes for patients with lumbar dysfunction.5–8 These factors include, but are not limited to, the patient's functional status (FS) at intake, age, sex, surgical history, number of comorbidities, health care payer, and symptom acuity. However, it remains unclear how between-clinic variation, as a random effect, will affect the risk adjustment model. Resnik et al5 included random clinic effect in a risk adjustment model for patient outcomes following physical therapy for lumbar dysfunction. However, they did not compare the results of risk adjustment with and without inclusion of the random effect. Thus, the purpose of this study was to determine if including between-clinic variation as a random effect improves the performance of a risk adjustment model for patient outcomes following physical therapy for low back dysfunction.
Method
Research Design and Data Source
The study was a secondary analysis of data from the Focus on Therapeutic Outcomes, Inc (FOTO, Knoxville, Tennessee) database from 2009 to 2012. The data set was a convenience sample of 230,648 patients with lumbar dysfunction who received physical therapy in 2,064 clinics in 48 states of the United States. All clinics that subscribed to FOTO services and had patients who completed the lumbar FS measure were included in the data set. Patient data were collected through Web-based computer surveys. The number of patients recorded in each clinic ranged from 1 to 2,135. Patients were included in the analysis when there was no missing value in the outcome measure or risk adjustment variables. Clinics were included in the analysis when they had at least 8 patients recorded in the data set. This threshold was established because estimates for each clinic's intercept and slope may be biased if the clinic has too few patients. Based on these inclusion criteria, our analyses included a total of 147,623 patients with lumbar dysfunction from 1,470 clinics. We excluded 81,258 patients from the analysis because they had missing data in one or more variables, and an additional 1,767 patients were excluded because their clinics did not have at least 8 patients in the data set.
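The inclusion rules above can be sketched as a two-step filter. This is a minimal illustration and not the authors' code: the record layout, the field names (`clinic_id`, `fs_discharge`, and so on), and the helper `apply_inclusion_criteria` are assumptions made for the example. Only the complete-case rule and the minimum of 8 patients per clinic come from the text, and the order of the two steps (complete cases first, clinic size second) is also an assumption.

```python
from collections import Counter

# Hypothetical field names for the outcome and the risk adjustment variables.
REQUIRED_FIELDS = ("fs_intake", "fs_discharge", "age", "sex",
                   "comorbidities", "acuity", "surgical_history", "payer")


def apply_inclusion_criteria(records, min_patients_per_clinic=8):
    """Keep complete cases, then keep clinics with enough remaining patients."""
    # Step 1: drop patients with a missing value in any required field.
    complete = [r for r in records
                if all(r.get(f) is not None for f in REQUIRED_FIELDS)]
    # Step 2: count the remaining patients per clinic and apply the threshold.
    counts = Counter(r["clinic_id"] for r in complete)
    return [r for r in complete
            if counts[r["clinic_id"]] >= min_patients_per_clinic]
```

Calling `apply_inclusion_criteria(records)` on a list of patient dictionaries would return only the patients who satisfy both rules; the threshold is a parameter so that sensitivity to the cutoff could be explored.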
This study was approved by the Institutional Review Board for the Protection of Human Subjects at Northeastern University. Informed consent was waived, as no intervention was applied to human participants.
Outcome Measure and Risk Factors
Patients' FS was assessed by a lumbar computerized adaptive testing (CAT) survey developed by FOTO. Development, simulation, validation, and use of the survey have been described in detail elsewhere.20–23 The CAT item bank consists of 9 Back Pain Functional Scale items24 and 16 physical functioning items developed based on the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36).25,26 The Back Pain Functional Scale items test patients' ability to perform an activity using a 6-level scale ranging from “unable to perform activity” to “no difficulty.” The physical functioning items test patients' ability to perform an activity using a 3-level scale ranging from “yes, limited a lot” to “no, not limited.” The FS score of the lumbar CAT ranges from 0 to 100. Higher scores represent better FS in patients with lumbar spine impairments. The survey was completed by patients at admission and discharge.
Functional status at discharge was the outcome variable in this study. The risk factors included in the analyses were FS at intake (on a 0–100 scale, continuous variable); age (in years, continuous variable); sex (categorical variable: male, female); number of comorbidities, assessed using a list of 30 conditions common to patients entering an outpatient rehabilitation clinic based on functional comorbidity index27 (0–30, continuous variable; see Appendix); symptom acuity, defined as the number of calendar days from the date of onset of the condition being treated in therapy to the date of initial therapy evaluation (categorical variable: 0–21 days, 22–90 days, and >90 days); surgical history (categorical variable: no surgical history, had a surgical history related to the impairments being treated); and payers (categorical variable: indemnity, litigation, Medicaid, Medicare Part A, Medicare Part B, Medicare Part C, patient, health maintenance organization [HMO], preferred provider organization [PPO], workers' compensation, no fault, other, no charge, and auto insurance).
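The 3-level coding of symptom acuity described above can be sketched as a small helper. The function name `acuity_category` and the returned labels are hypothetical; the cut points (21 and 90 days) are the ones stated in the text.

```python
def acuity_category(days_since_onset):
    """Map days from symptom onset to initial evaluation onto 3 categories."""
    if days_since_onset < 0:
        raise ValueError("days_since_onset must be non-negative")
    if days_since_onset <= 21:
        return "0-21 days"
    if days_since_onset <= 90:
        return "22-90 days"
    return ">90 days"
```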
There were 75,505 data points (32.7% of the data) missing in FS at discharge, 3 missing in age (<0.01% of the data), 3 missing in sex (<0.01% of the data), 160 missing in acuity (<0.1% of the data), 4,139 missing in payers (1.8% of the data), and 4,569 missing in surgical history (2% of the data). There were no missing values in FS at intake and number of comorbidities. Table 1 shows comparisons between patients with and without missing FS data points at discharge. In general, the 2 groups of patients were similar, although those without missing FS data points at discharge were older (mean age=55.2 years [SD=17.1] versus 51.4 years [SD=16.9]), less likely to be insured by Medicaid (3.9% versus 7.8%), and more likely to be insured by Medicare Part B (23.7% versus 17.7%).
Table 1. Comparisons Between Patients With and Without Missing FS Values at Discharge
Data Analysis
We fit the data with 3 linear mixed models: (1) fixed effect, (2) random intercept, and (3) random slope. The 3 models were nested models. The random-intercept model contained all of the terms of the fixed-effect model, with an additional term for random intercept. The random-slope model contained all of the terms of the random-intercept model, with an additional term for random slope. In the random-intercept and random-slope models, clinic identification number was used as a grouping variable for estimating random effects.
All data analyses were carried out using IBM SPSS version 21 (IBM Corp, Armonk, NY), with 2-sided tests and a type 1 error rate of 0.05. The fixed-effect model is equivalent to a standard single-level (patient-level) multiple regression. No random-effect terms were specified in this model. All independent variables were entered into the model as fixed effects to estimate how well they predicted FS at discharge.
The random-intercept model had both fixed and random terms. The fixed terms included estimates of the coefficients of all independent variables and the intercept of the model. The random term estimated the variation of the intercept across clinics. We modeled the random intercept using the “variance component” structure, allowing us to estimate how much of the total variance in the intercept was due to between-clinic variation.28,29
The random-slope model included fixed effects, a random intercept, and a random slope. We selected FS at intake as the random slope term, which enabled us to estimate how the relationship between FS at intake and FS at discharge varied across clinics. We selected FS at intake over the other risk factors based on our pilot analysis using standard multiple linear regression, which indicated that FS at intake accounted for the most variance in FS at discharge (partial eta squared=19.1%). In addition, previous studies consistently showed that FS at intake was predictive of FS at discharge.5,6 We used an “unstructured” covariance structure, which allows the variances of the random intercept and random slope, and the covariance between them, to be estimated from the observed data.28,29
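In standard mixed-model notation, the 3 nested models described in this section can be written as follows. The symbols are introduced here for illustration and do not appear in the source: $y_{ij}$ is FS at discharge for patient $i$ in clinic $j$, $x_{ij}$ is FS at intake, $\mathbf{z}_{ij}$ collects the remaining risk factors, and $u_{0j}$ and $u_{1j}$ are the clinic-level random intercept and random slope.

```latex
\begin{align*}
\text{Fixed effect:} \quad
  & y_{ij} = \beta_0 + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij} + \varepsilon_{ij} \\
\text{Random intercept:} \quad
  & y_{ij} = (\beta_0 + u_{0j}) + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij} + \varepsilon_{ij},
    \qquad u_{0j} \sim N(0, \tau_{00}) \\
\text{Random slope:} \quad
  & y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij} + \varepsilon_{ij}, \\
  & \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
    \sim N\!\left( \mathbf{0},
    \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right),
    \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{align*}
```

Under this notation, each model adds terms to the previous one, which is what makes the models nested: setting $u_{1j}=0$ recovers the random-intercept model, and setting $u_{0j}=0$ as well recovers the fixed-effect model.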
Potential bias caused by the missing data was adjusted for through inverse probability weighting.30 This approach gives different weights to individuals based on their likelihood of being selected into the study, with those more likely to be selected given less weight. The weight was calculated in 2 steps. First, we fit a logistic regression model in which the dependent variable was an indicator for FS at discharge that took the value of 1 if the observation was complete and 0 if it was missing, and in which all risk factors were the independent variables. Second, we used the inverse of the predicted probability of being complete as the weight for each patient. The weights were applied in all 3 models examined (fixed-effect, random-intercept, and random-slope).
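The 2-step weighting can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: the source fit a logistic regression on all risk factors, whereas this example assumes a single categorical predictor, in which case the fitted probabilities of a logistic model equal the observed completion proportions in each category. The function name, field names, and toy data are hypothetical.

```python
from collections import defaultdict


def inverse_probability_weights(records, stratum_key, observed_key):
    """Return (record, weight) pairs for complete records only.

    The weight is 1 / P(complete | stratum), so patients from strata
    with more missing outcomes receive larger weights.
    """
    totals = defaultdict(int)
    completes = defaultdict(int)
    for r in records:
        totals[r[stratum_key]] += 1
        if r[observed_key] is not None:
            completes[r[stratum_key]] += 1
    # Step 1: estimated probability of being complete in each stratum.
    p_complete = {s: completes[s] / totals[s] for s in totals}
    # Step 2: weight each complete record by the inverse of that probability.
    return [(r, 1.0 / p_complete[r[stratum_key]])
            for r in records if r[observed_key] is not None]


# Hypothetical toy data: payer is the only predictor of completeness.
toy = ([{"payer": "Medicaid", "fs_discharge": 55.0}] * 2
       + [{"payer": "Medicaid", "fs_discharge": None}] * 2
       + [{"payer": "PPO", "fs_discharge": 65.0}] * 4
       + [{"payer": "PPO", "fs_discharge": None}] * 1)
weighted = inverse_probability_weights(toy, "payer", "fs_discharge")
```

In the toy data, half of the Medicaid records are complete, so each complete Medicaid record is weighted 2.0, while complete PPO records (4 of 5 observed) are weighted 1.25. The weights sum to the original sample size of 9, which is the sense in which the complete cases are reweighted to represent everyone.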
Between-Model Comparisons
We compared the fixed-effect, random-intercept, and random-slope models from 3 perspectives. First, we compared the goodness of fit of the model based on Akaike's information criterion (AIC) and Schwarz's Bayesian information criterion (BIC). Both criteria are common approaches used to determine the fit of the model, with lower AIC and BIC values indicating a better-fitting model.31 Second, we calculated the percentage change in error residual before and after a random term was added. The percentage change represented the amount of between-patient variance explained by the additional random clinic effect.28,32 The percentage change was calculated as:

$$\text{Percentage change in error residual} = \frac{E_{\text{old}} - E_{\text{new}}}{E_{\text{old}}} \times 100\%$$

where $E_{\text{new}}$ is the error residual of the model with the newly added random term and $E_{\text{old}}$ is the error residual of the previous model.28 Third, we compared fixed-effect coefficients among models to determine if any risk adjustor was confounded by random clinic effect. A risk adjustor was considered to be confounded if the corresponding coefficient changed by more than 10% after a random term was added, based on Rothman and Greenland.33 The coefficient change was expressed as a percentage and was calculated as:

$$\text{Percentage change in coefficient} = \frac{C_{\text{new}} - C_{\text{old}}}{C_{\text{old}}} \times 100\%$$

where $C_{\text{new}}$ is the coefficient in the model with the newly added random term and $C_{\text{old}}$ is the coefficient of the previous model.
Lastly, we examined the 95% confidence intervals of each coefficient estimate before and after the random terms were added. The confidence interval could be regarded as a reasonable variation or precision of the estimate. When the confidence intervals were not overlapping among the models, it indicated a meaningful change in the estimate and a potential confounding effect. The evidence of confounding was considered to be stronger when the percentage change was greater than 10% and the confidence intervals were not overlapping.
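The two-part confounding check described above (a coefficient change of more than 10% and non-overlapping 95% confidence intervals) can be sketched as follows. The function names, the grading labels ("stronger", "possible", "none"), and the numbers in the example are hypothetical and chosen only for illustration. The percentage change is taken relative to the coefficient of the previous model and compared in absolute value, which is an assumption about the sign convention.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def confounding_evidence(c_old, ci_old, c_new, ci_new, threshold=10.0):
    """Grade the evidence that a risk adjustor is confounded.

    c_old / ci_old: coefficient and 95% CI before the random term is added.
    c_new / ci_new: coefficient and 95% CI after the random term is added.
    """
    change = (c_new - c_old) / c_old * 100.0
    exceeds = abs(change) > threshold
    disjoint = not intervals_overlap(ci_old, ci_new)
    if exceeds and disjoint:
        return "stronger"
    if exceeds or disjoint:
        return "possible"
    return "none"


# Hypothetical coefficient that moves from -0.60 to -0.50 with disjoint CIs.
label = confounding_evidence(-0.60, (-0.66, -0.54), -0.50, (-0.53, -0.47))
```

In this made-up example the coefficient changes by about 17% in magnitude and the intervals do not overlap, so `label` is `"stronger"`.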
Results
Descriptive Statistics
Table 2 summarizes the unweighted and weighted descriptive statistics of the outcome measure and risk factors at the patient level. Based on unweighted descriptive statistics, the patients' mean age was 55.2 years (SD=17, range=18–102). On average, patient FS improved by approximately 15 points at discharge. The majority of the patients were female (59.7%) and had symptom acuity of more than 90 days (54%). More than 80% of the patients did not have a surgical history related to the impairments being treated. Most patients were insured by PPO (34.3%), Medicare Part B (24.1%), or HMO (10.8%). The weighting procedure had minimal impact on the variables. The largest changes were a decrease in mean age (1.3 years), a decrease in the percentage of cases insured by Medicare Part B (2%), and an increase in the percentage of cases insured by Medicaid (1.3%).
Table 2. Descriptive Statistics for Patient Level (N=147,623)
Table 3 summarizes the median, 25th percentile, 75th percentile, and range of all variables at the clinic level. Based on the median, 50% of the clinics had patients whose mean FS value at intake was above 49.9, mean FS value at discharge was above 64.6, mean age was above 54.5 years, and mean number of comorbidities was above 4. In addition, 50% of the clinics had more than 59.1% of patients who were female, more than 52.9% of patients who had symptom acuity more than 90 days, and more than 83.3% of patients who did not have a surgical history. At least 75% of the clinics did not have the following payer categories: indemnity, litigation, Medicare Part C, no fault, patient, and no charge (75th percentile=0%). In addition, very few clinics had patients paid by auto insurance (75th percentile=0.9%), Medicaid (75th percentile=2%), or Medicare Part A (75th percentile=3%).
Table 3. Descriptive Statistics at the Clinic Level (n=1,470)
Fixed-Effect Model
Table 4 summarizes the estimated coefficients, error residuals, and goodness of fit of the 3 models (fixed-effect, random-intercept, and random-slope). Table 5 summarizes percentage change in goodness of fit, error residuals, and fixed-effect coefficients among the models.
Table 4. Estimates for Fixed Effects, Random Effects, Error Residuals, and Model Goodness of Fit
Table 5. Percentage Change in Model Goodness of Fit, Error Residuals, and Fixed-Effect Coefficients
In the fixed-effect model, all independent variables were significant predictors for FS at discharge (P<.001). All coefficient estimates were significant, with the exception of some of the payer parameters (Tab. 4). Based on the coefficient estimates, FS at discharge tended to be lower in patients who had lower FS values at intake, more comorbidities, or a surgical history; who were older or female; who had symptom acuity of more than 90 days; or who were insured by Medicaid or no-fault insurance. The fixed-effect model had a residual error of 277.2. The AIC value for the model was 1,249,292.6, and the BIC value was 1,249,302.5. Based on R², the model was able to explain 30.6% of the variance in FS at discharge.
Random-Intercept Model
When a random intercept was included in the model, the AIC value decreased to 1,177,589.9 and the BIC value decreased to 1,177,609.7, which were 5.7% smaller than the corresponding estimates in the fixed-effect model. In addition, the error residual decreased to 246.4, indicating that including the random intercept explained an additional 11.1% of the between-patient variance compared with the fixed-effect model. Based on estimates of covariance parameters, the random intercept was statistically significant (variance=27.5, P<.001), suggesting that risk-adjusted FS at discharge varied significantly from one clinic to another.
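As an illustrative back-of-envelope calculation that the source does not report, the two variance estimates quoted above imply an intraclass correlation, that is, the share of the remaining variance in FS at discharge that lies between clinics after adjustment. The symbols $\tau_{00}$ (between-clinic intercept variance) and $\sigma^2$ (residual variance) are notation assumed here, and the figure ignores any influence of the weighting on the estimates.

```latex
\[
\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}
             = \frac{27.5}{27.5 + 246.4}
             = \frac{27.5}{273.9}
             \approx 0.10
\]
```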
As shown in Table 5, there was a difference of less than 10% in the coefficients of FS at intake, age, sex, acuity, and surgical history in predicting FS at discharge between the fixed-effect and random-intercept models. On the other hand, the coefficients for number of comorbidities were approximately 17% different. The largest difference was observed in payers. Almost all payer types showed a >10% difference in coefficients between the fixed-effect and random-intercept models, except for Medicaid and workers' compensation. In particular, the coefficient changes for Medicare Part A and HMO both exceeded 100%. When examining the 95% confidence intervals for each coefficient estimate, we found that the confidence intervals of comorbidities, acuity between 0 and 21 days and between 22 and 90 days, and Medicare Part A were not overlapping among the models (Tab. 4).
Random-Slope Model
The AIC and BIC values for the random-slope model were 1,176,119.3 and 1,176,158.9, respectively, which were 0.1% smaller than the corresponding estimates in the random-intercept model. The residual error of the random-slope model was 242.8. Compared with the random-intercept model, the random-slope model explained an additional 1.5% of between-patient variance.
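The percentages reported in the Results can be reproduced from the residual errors and AIC values quoted in this section. The short check below is only an arithmetic illustration; the helper `percent_reduction` and the dictionary keys are names chosen for the example.

```python
def percent_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


# Values as reported in the Results for the 3 models.
residual = {"fixed": 277.2, "intercept": 246.4, "slope": 242.8}
aic = {"fixed": 1_249_292.6, "intercept": 1_177_589.9, "slope": 1_176_119.3}

print(round(percent_reduction(residual["fixed"], residual["intercept"]), 1))  # 11.1
print(round(percent_reduction(residual["intercept"], residual["slope"]), 1))  # 1.5
print(round(percent_reduction(residual["fixed"], residual["slope"]), 1))      # 12.4
print(round(percent_reduction(aic["fixed"], aic["intercept"]), 1))            # 5.7
print(round(percent_reduction(aic["intercept"], aic["slope"]), 1))            # 0.1
```

The first and third lines bracket the "additional 11% to 12%" of between-patient variance attributed to the random-effect models relative to the fixed-effect model.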
Based on estimates of covariance parameters, the random intercept UN (1,1) was significant (variance=119.5, P<.001), suggesting that risk-adjusted FS at discharge varied significantly from one clinic to another. The random slope UN (2,2) also was significant (variance=0.02, P<.001). This finding suggested that the relationship between FS at intake and FS at discharge varied significantly from clinic to clinic, although the variation was small according to the variance estimate. The covariance between intercept and slope was significant (covariance=−1.4, P<.001). The negative covariance suggested that clinics with lower intercepts had steeper slopes. In other words, clinics with lower FS values at intake tended to have greater change in FS values at discharge.
Compared with the random-intercept model, adding a random slope did not have a large impact on the estimated coefficients of the risk adjustors. The 95% confidence intervals were similar for all risk adjustors between the random-intercept and random-slope models (Tab. 4). The percentage differences in coefficients were all lower than 10%, with the exception of the payer categories patient, HMO, and no charge (Tab. 5).
Discussion
Our results suggest that risk adjustment models that control for between-clinic variation perform better than a model that does not in predicting outcomes of care for patients treated for lumbar dysfunction. Based on the AIC and BIC values, adding a random intercept improved the model fit to a small extent (AIC and BIC decreased by 5.7%), and adding a random slope yielded only minimal additional improvement (AIC and BIC decreased by a further 0.1%). In addition, the random-effect models explained an additional 11% to 12% of the between-patient variation compared with the fixed-effect model. Most of this change appeared to be due to baseline differences in the patients seen at the clinics rather than differences in the rate at which patients recovered. Furthermore, adding random terms improved the precision of fixed-effect coefficient estimates. The coefficient estimates for comorbidities, acuity between 0 and 21 days and between 22 and 90 days, and Medicare Part A in the random-intercept and random-slope models were outside of the 95% confidence intervals of those in the fixed-effect model. These findings suggested that clinics varied particularly in regard to payers, number of comorbidities, and symptom acuity.
Random clinic effects appeared to confound payers to a greater extent than comorbidities and acuity, as suggested by the higher percentage change in coefficient estimates shown in Table 5. Payer types have been used as proxy measures for multidimensional factors, including socioeconomic status, health status, access to health care resources, and health care providers' attitudes toward patients.34,35 For example, patients paid by Medicaid tend to have lower socioeconomic status, poorer health status, and fewer visits to clinics compared with those paid by private insurance.36 Patients paid by workers' compensation have expressed concerns that health care providers seem to discount their injuries, especially self-reported pain.37–39 Although outcomes of care vary by payer type, our results suggest that this effect differs from clinic to clinic. The reason for this type of variation could not be explored within the scope of this study and should be addressed in future studies.
Symptom acuity and number of comorbidities also were confounded by random clinic effects, although the confounding effect was weaker than that observed in the payer variable, as suggested by the lower percentage change in coefficient estimates shown in Table 5. Comorbidities and acuity, like FS at intake, age, and surgical history, are variables that represent patients' physiological status. Therefore, the effects of these variables on patient outcomes should follow physiological principles. That is, more comorbidities, longer symptom acuity, lower FS at intake, more prior surgeries, and older age would be associated with lower FS at discharge from therapy.5–7,11,21,40,41 Our results suggested that the impact of comorbidities and acuity on patient outcomes may vary somewhat from one clinic to another. A possible explanation is that clinics with more experienced therapists may better reduce the negative effect of comorbidities and symptom acuity on patient outcomes. This is a hypothesis that can be tested in future studies.
The results of this study can be used to inform the emerging pay-for-performance movement in rehabilitation that links compensation to quality of care. Functional status at discharge has been considered one way to measure clinics' quality and has been proposed as an important aspect of pay-for-performance models.1 To make a valid comparison on FS across clinics, risk adjustment models have been developed to control for confounding effects due to different patient characteristics and initial FS.5,6 Our study showed that confounding effects varied across clinics and affected risk adjustment coefficients, especially for the variable payer. Our results can help researchers and policy makers to optimize their choice of risk adjustment models for patients with lumbar dysfunction.
The major strength of this study was the direct comparison, using FOTO data, of risk adjustment models with and without inclusion of between-clinic variation as a random effect for patients with low back dysfunction. Resnik et al5 included random clinic effect in a risk adjustment model for patient outcomes following physical therapy for lumbar dysfunction, but they did not compare the results of risk adjustment with and without inclusion of the random effect. Similarly, Resnik and Hart8 examined effects of therapist certification on functional outcomes of patients with low back dysfunction, adjusting for between-therapist variation as a random variable. However, they did not compare the results with and without adjusting for the random variable. When examining the relationship between state regulation and the delivery of physical therapist services, Resnik et al11 used mixed models to adjust for random intercepts in state, practice, and therapist levels. Again, no comparison between models with and without inclusion of random effects was made.
This study had several limitations. First, this study included only clinics that participated in the FOTO outcome measurement system, weakening the generalizability of the results to all clinics in the United States. Second, because this study was a secondary analysis of data collected prospectively by a proprietary database management company, we had no control over potential errors made during data collection and data entry. Third, there were several potential confounders that we did not adjust for because the variables were not in the FOTO data set. One was clinic type (eg, outpatient clinics, hospitals, nursing homes); heterogeneity of clinic types would be a source of random clinic effects. In addition, information on the frequency and duration of the treatment sessions was not available in the data set. We also did not know which physical therapy treatments were provided to the patients, so we were not able to control for differences in the type and amount of physical therapy treatments delivered at different clinics. Fourth, although the data set provided the number of comorbidities, it did not identify the specific comorbidities that each patient had. Some comorbidities listed in the FOTO survey (Appendix) did not seem to relate to low back function (eg, visual impairment). The number of comorbidities may not have been as informative as specific types of comorbidities would have been because comorbidities may not all have an equal impact on functional outcome.42 Lastly, some patients may have been covered by multiple insurance providers, although each patient in the data set had only one payer of record.
The purpose of this study was to determine if including between-clinic variation as a random effect would improve the performance of a risk adjustment model for patient outcomes following physical therapy for low back dysfunction. Our analyses showed that this type of model fit the data better and explained additional between-patient variance compared with models that did not account for clinic-to-clinic differences.
Appendix.
Comorbidity Conditions
Footnotes
Dr Yen, Dr Chui, Dr Wang, and Dr Resnik provided concept/idea/research design. Dr Yen, Dr Corkery, Dr Chui, Dr Wang, and Dr Resnik provided writing. Dr Yen, Dr Chui, and Dr Manjourides provided data analysis. Dr Yen provided project management. Dr Wang provided participants. Dr Yen, Dr Corkery, Dr Manjourides, Dr Wang, and Dr Resnik provided consultation (including review of manuscript before submission).
The authors thank Focus On Therapeutic Outcomes, Inc (FOTO) for providing the patient data set.
- Received October 14, 2014.
- Accepted April 13, 2015.
- © 2015 American Physical Therapy Association