Abstract
Background The Centers for Medicare & Medicaid Services has mandated that rehabilitation professionals document patients' impairment levels. There is no evidence of the responsiveness to change of functional limitation severity modifier codes.
Objective The purpose of this study was to assess the validity of G-code functional limitation severity modifier codes in determining change in function.
Design This was a retrospective observational study.
Methods Patients completed the Activity Measure for Post-Acute Care (AM-PAC) and were assigned G-codes, with severity modifiers based on AM-PAC scores, at initial and follow-up visits. Patients were classified as having AM-PAC scores in the upper or lower range for each severity modifier. Using a change in AM-PAC score of at least 1 minimal detectable change at the 95% confidence level (MDC95) as the standard, we determined the sensitivity, specificity, and positive and negative predictive values for change in severity modifier level, as well as the odds of changing by one severity modifier level.
Results Sensitivity and specificity of change in severity modifier in determining change in function were dependent on patients' initial AM-PAC scores. Improvement in severity modifier level was 2.2 to 4.5 times more likely with scores at the higher end of the range within a severity modifier level than with scores in the lower end of the range. Decline in severity modifier level was 2.7 to 4.8 times more likely with scores at the lower end of the range within a severity modifier than with scores in the higher end of the range.
Limitations Data were from one health care system, and most patients had orthopedic conditions. The MDC95 for AM-PAC tool may not be the best standard for defining functional change.
Conclusions The G-code functional limitation severity modifier system may not be valid for determining change in function and is not recommended for determining if patients have changed over the course of outpatient therapy.
Physical therapists have long been encouraged to use well-tested, relevant standardized measures to assess patients' function and progress. Application of standardized measures enhances the ability of therapists to target interventions and to understand when meaningful change has occurred.1 Furthermore, standardized measurement has increasingly allowed therapists to demonstrate to patients and payers the value of the care they provide. Although the goal of documenting and tracking patients' function throughout an episode of care is valuable, tracking is meaningful only if the measurement tool provides information that reflects clinically relevant functional status. Additionally, therapists understand that tools used for this type of measurement need to have evidence supporting their reliability and validity, including responsiveness to change.1 The use of standardized measurement tools, however, is not universal. In a sample of 498 physical therapists responding to a questionnaire in 2008, slightly fewer than 50% reported using standardized measurement tools in their practice.1
The Centers for Medicare & Medicaid Services (CMS) directive for functional reporting mandates a universal system of documenting patients' impairment levels in an effort to increase the accountability of rehabilitation professionals. The requirement by CMS for functional limitation reporting by health care providers began in July 2013 as part of the Middle Class Tax Relief and Job Creation Act of 2012 (MCTRJCA, Section 3005[g]). The act stated, “The Secretary of Health and Human Services shall implement, beginning on January 1, 2013, a claims-based data collection strategy that is designed to assist in reforming the Medicare payment system for outpatient therapy services subject to the limitations of section 1833(g) of the Social Security Act (42 USC, 1395l[g]). Such strategy shall be designed to provide for the collection of data on patient function during the course of therapy services in order to better understand patient condition and outcomes.”2(p2) It was implemented as a way of tracking a patient's function throughout an episode of therapy and applies to several health care disciplines, including physical therapy, occupational therapy, and speech-language pathology. All practice settings that bill for outpatient therapy services under Medicare Part B must include G-codes and functional limitation severity modifier codes on claim forms. G-codes are a category of alphanumeric procedure codes assigned by CMS to be used for coding primarily nonphysician services not covered by Current Procedural Terminology (CPT-4) codes. G-codes for reporting function are nonpayable.
The categories of functional limitation that are reported reflect the World Health Organization's International Classification of Functioning, Disability and Health (ICF),3 encompassing participation restrictions and activity limitations. There are specific G-codes that relate to physical therapy and occupational therapy in 4 categories: (1) mobility; (2) changing and maintaining body position; (3) carrying, moving, and handling objects; and (4) self-care. In addition, 3 “other” categories may relate to physical therapy and occupational therapy. The clinician selects the G-code that is most clinically relevant and represents the primary limitation that is expected to change with intervention. For each G-code, therapists also select 1 of 7 functional limitation severity modifier codes to reflect a patient's percentage of impairment from 0% to 100%. One code reflects 0% impairment, another reflects 100% impairment, and the remaining 5 codes identify patients' status in 20% increments of function (Tab. 1). The severity modifier codes may be reflective of a score or range of scores from one or more standardized functional assessment tools of the therapist's choice, or they may be based on clinical judgment. Other considerations, including comorbidities, age, cognition, prognosis, and acuity, also may affect selection of severity modifier codes. Therapists document the method they use to make the code selection so that the same process can be followed at succeeding assessment intervals. The assessment tools recommended by CMS include the Activity Measure for Post-Acute Care (AM-PAC) tools.4 Each functional limitation severity modifier code has been linked to a range of scores on each outpatient short form of the AM-PAC. Therefore, the scores on the AM-PAC drive the selection of severity modifier code.5
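The 7-level scheme described above can be sketched as a simple lookup from percentage of impairment to a modifier code. The codes CH (0%) and CN (100%) are named in this article; the intermediate letters CI through CM, the handling of band boundaries, and the function name are assumptions made for illustration only.

```python
def severity_modifier(percent_impaired: float) -> str:
    """Map a percentage of impairment (0-100) to a severity modifier code.

    Assumption: the five intermediate codes cover successive 20% bands,
    each band including its lower bound and excluding its upper bound.
    """
    if not 0 <= percent_impaired <= 100:
        raise ValueError("percent_impaired must be between 0 and 100")
    if percent_impaired == 0:
        return "CH"  # 0% impaired
    if percent_impaired == 100:
        return "CN"  # 100% impaired
    bands = ["CI", "CJ", "CK", "CL", "CM"]  # assumed letters for the 20% bands
    return bands[int(percent_impaired // 20)]
```

In practice the code is assigned from a range of scores on a standardized tool or from clinical judgment, so a percentage input like this one is only a convenient stand-in.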
Table 1. Sample Characteristics
Although documentation of G-codes and functional limitation severity modifier codes has been advanced as a method for demonstrating the value of physical therapy, and the data generated may be used to reform CMS payment for outpatient therapy services,4 there is a lack of evidence for the responsiveness to change of functional limitation severity modifier codes. Evidence to support the validity of the functional limitation severity modifier codes has the potential to shape future recommendations by CMS as they relate to rehabilitation services. It is currently unclear how CMS will use information derived from G-code functional limitation severity modifiers or whether further changes will be made to the system prior to policy changes. Nevertheless, given the potential that change in function may be linked to payment in the future, it is important to begin to assess the responsiveness to change of the severity modifier codes. The primary objective of this study, therefore, was to determine the validity of G-code functional limitation severity modifier codes in determining change in function.
Given that the AM-PAC is one of the assessment tools recommended by CMS and that it has established validity, we used change in AM-PAC scores as the standard against which to determine the validity of functional limitation severity modifier codes in terms of responsiveness to change. Specifically, we examined the sensitivity and specificity of a change in functional limitation severity modifier code in reflecting change in function, as determined by an AM-PAC score change from an initial outpatient therapy visit to a follow-up visit.
True change in function was determined using a distribution-based measure (minimal detectable change at the 95% confidence level [MDC95]) for AM-PAC scores. We selected MDC as opposed to a minimal clinically important difference (MCID) based on patient report of perceived change, as there are, to our knowledge, no reports of the MCID of the AM-PAC in a sample of patients similar to ours. We found one report of minimally important difference for AM-PAC scores in patients with late-stage lung cancer6; however, we did not apply this threshold in our study due to the differences in the sample between that study and ours. Although there appears to be no consensus, Wyrwich7 has reported that approximately 2.3 to 2.6 standard errors of measurement (SEMs) estimated the MCID in studies of patients receiving physical therapy for musculoskeletal conditions. The MDC is based on the SEM and represents the variability in change scores from baseline to follow-up for a percentage of people who have truly not changed, usually 90% or 95%. The determination of which level to select is admittedly arbitrary, although some authors have suggested the choice may be related to how the measurement is used.8
We selected the MDC95 because it is the more conservative estimate of change (2.77 × SEM). Because a range of AM-PAC scores equates to each functional limitation severity modifier code, we expected that patients scoring near the upper end of that range would be likely to show improvement in severity modifier code, because a small change would move them into the score range for the next code, whereas a similar change in score for patients with initial scores in the lower part of the range would not lead to a change in code. Therefore, we examined the sensitivity and specificity of change in functional limitation severity modifier code separately for those whose initial scores were at the upper and lower portions of the range of AM-PAC scores assigned to each severity modifier code.
For example, we were concerned that if a patient's initial AM-PAC score was at the lower end of the range of scores within a severity modifier category and he or she had an improvement in function based on AM-PAC score, there would be a lower likelihood of reporting a change in severity modifier code than if the patient had the same change in AM-PAC score with an initial score at the upper end of the range. We also examined the odds of change in functional limitation severity modifier code given a change of at least 1 MDC95 on the AM-PAC.
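The concern can be illustrated with a toy example. The band boundaries and scores below are invented for illustration and are not AM-PAC cut points; the function names are likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical lower bounds of successive severity bands on a score scale
# where a higher score means better function. NOT actual AM-PAC cut points.
BAND_LOWER_BOUNDS = [0.0, 30.0, 40.0, 50.0, 60.0]

def band_index(score: float) -> int:
    """Return the index of the band that contains the score."""
    return bisect_right(BAND_LOWER_BOUNDS, score) - 1

def band_changed(initial: float, change: float) -> bool:
    """True if adding `change` to `initial` moves the score into another band."""
    return band_index(initial + change) != band_index(initial)

# Same 5-point improvement, different starting positions within the 40-50 band.
print(band_changed(41.0, 5.0))  # starts near the lower end of the band
print(band_changed(48.0, 5.0))  # starts near the upper end of the band
```

With these invented boundaries, the first call prints False and the second prints True: an identical score change is recorded as a change in band only for the patient who started near the upper boundary.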
Method
Instrument
The AM-PAC tools are validated measures based on the activity limitation domain of the ICF.9 They are designed to be used for patients receiving postacute care rehabilitation regardless of type of condition or setting. The tools measure function in 3 domains: basic mobility, daily activity, and applied cognitive. Based on item response theory (IRT) item calibrations for each tool, the conditional distributions of summed item raw scores are calculated as a function of the latent trait.10,11 Then a Bayesian (expected a posteriori) estimation is used to estimate the standardized score for each raw summed score.12 Standardized scores are transformed to a T-score scale, with a mean of 50 and a standard deviation of 10. For any given standardized score on an AM-PAC short form, a linear transformation is used to convert it to a 0 to 100 scale, representing the level of impairment. The various forms of the AM-PAC that assess the same functional domain (eg, basic mobility), including those generated from computer adapted tests, yield comparable scores. For each form of the AM-PAC, a range of standardized scores equates to each of the 7 functional limitation severity modifier codes associated with G-codes, and the patient's AM-PAC score is used to assign the severity modifier code.5
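The two rescaling steps mentioned above can be sketched as follows. This is a minimal illustration, assuming the IRT-based estimate is available as a standardized (z) score and that the lowest and highest attainable scores on a form are known; it does not reproduce the AM-PAC scoring tables, and the function and parameter names are hypothetical.

```python
def to_t_score(z: float) -> float:
    """Convert a standardized (z) score to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * z

def to_percent_scale(score: float, score_min: float, score_max: float) -> float:
    """Linearly rescale a score from [score_min, score_max] onto 0-100.

    Assumption: score_min and score_max are the lowest and highest attainable
    scores on the form. The direction of the result (impairment versus
    function) is not modeled here.
    """
    if score_max <= score_min:
        raise ValueError("score_max must exceed score_min")
    return 100.0 * (score - score_min) / (score_max - score_min)
```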
Setting and Procedure
We used a retrospective observational study design analyzing data collected by therapists at the Cleveland Clinic Health System, a nonprofit system comprising 45 outpatient rehabilitation sites. These sites include 210 physical therapists and 55 occupational therapists who manage more than 1,000 new patient evaluations per week. Therapists providing care at these sites reported G-codes and severity modifier codes and administered 1 of 5 basic mobility and daily activity outpatient short forms of the AM-PAC to their patients. Patients seen for an occupational therapy visit and those with neck-related complaints completed a daily activity form. Patients seen for a physical therapy visit, other than those with neck complaints, completed a basic mobility form. Patients 65 years of age or older and those with neurological conditions were given the adapted forms.13 At some sites, patients used computer tablets to address each item on the form given to them. At other sites, patients used paper for form completion, and therapists entered their patients' scores into the electronic medical record system.
As stipulated by CMS, G-codes and functional limitation severity modifier codes were included on claims for the first visit by a patient, at a minimum for every tenth subsequent visit, when a re-evaluation was billed, and on the last day of service if the therapist knew it was the last visit. At each designated reporting interval, therapists submitted 2 G-codes for each relevant functional limitation category: the current or discharge functional status and the goal for projected functional status.14 Therapists recorded a functional limitation severity modifier code related to each G-code based on the patient's AM-PAC score.5 The institutional review boards of the Spaulding Rehabilitation Network Research Institute and Cleveland Clinic Health System determined the study did not involve human participants.
Data Source
Our data set included 1,443 episodes of care completed between July 2013 and June 2014, during which patients completed the same AM-PAC short forms at initial and follow-up visits and were assigned G-codes with severity modifiers for current or discharge functional status at both the initial visit and at least one follow-up visit. Our data set also included basic patient demographic information.
Data Analysis
To determine the change in severity modifier code, we gave each sequential severity modifier code a number and subtracted the severity level recorded for the initial visit from the level recorded at the last follow-up visit. The last follow-up visit was sometimes identified as the discharge visit, but when a discharge visit was not recorded, we used the last follow-up visit measurement that was available. We then created 2 dichotomous variables. One variable categorized patients as having improved by at least one severity level, or not (no change or decline); the other variable categorized patients as having declined at least one severity level, or not (no change or improved). To determine the change in AM-PAC score, we subtracted the initial AM-PAC score from the AM-PAC score for the last visit for which a measurement was available. Using our sample data and the test-retest reliability previously reported,15 we calculated the MDC95. Because of a concern that the SEM might be affected by the initial score,16 we calculated the MDC95 separately for each tertile of baseline scores for each of the 5 AM-PAC forms.
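A minimal sketch of an MDC95 calculation by baseline tertile follows. It assumes the common formulas SEM = SD × √(1 − r) and MDC95 = 1.96 × √2 × SEM (about 2.77 × SEM), with the SD taken from the baseline scores in each tertile. Which SD was actually used and how ties at the tertile cut points were handled are not stated in this article, so those choices, along with the function names, are assumptions.

```python
import math
import statistics

def mdc95(scores: list[float], reliability: float) -> float:
    """MDC95 = 1.96 * sqrt(2) * SEM, with SEM = SD * sqrt(1 - reliability)."""
    sd = statistics.stdev(scores)  # sample SD; needs at least 2 scores
    sem = sd * math.sqrt(1.0 - reliability)
    return 1.96 * math.sqrt(2.0) * sem

def mdc95_by_tertile(baseline: list[float], reliability: float) -> list[float]:
    """Compute MDC95 separately within each tertile of baseline scores."""
    low_cut, high_cut = statistics.quantiles(baseline, n=3)  # 2 cut points
    groups = [
        [s for s in baseline if s <= low_cut],
        [s for s in baseline if low_cut < s <= high_cut],
        [s for s in baseline if s > high_cut],
    ]
    return [mdc95(group, reliability) for group in groups]
```

For example, scores with an SD of 10 and a test-retest reliability of 0.75 give an SEM of 5 and an MDC95 of roughly 13.9 points.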
Using these calculations, we then determined whether each patient's change in score on AM-PAC represented a change by at least 1 MDC95 (improvement or decline) for the relevant tertile of the short form on which they were assessed. We then created 2 dichotomous variables. One variable categorized patients as having improved by at least 1 MDC95, or not; the other variable categorized patients as having declined by at least 1 MDC95, or not.
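The two pairs of dichotomous variables (for severity modifier level and for AM-PAC score) can be sketched with one helper. The ordering of the modifier codes, the intermediate letters CI through CM, the sign conventions, and all names are assumptions made for illustration.

```python
# Assumed ordering from least to most impaired.
SEVERITY_ORDER = ["CH", "CI", "CJ", "CK", "CL", "CM", "CN"]

def dichotomize(change: float, threshold: float) -> dict[str, bool]:
    """Return the improved/declined flags for a signed change.

    `change` is signed so that positive means better function;
    `threshold` is the smallest magnitude that counts as change.
    """
    return {"improved": change >= threshold, "declined": change <= -threshold}

def modifier_flags(initial_code: str, final_code: str) -> dict[str, bool]:
    """Flags for a change of at least one severity modifier level."""
    # Levels are numbered by impairment, so a drop in level is an improvement.
    levels_better = SEVERITY_ORDER.index(initial_code) - SEVERITY_ORDER.index(final_code)
    return dichotomize(levels_better, 1)

def ampac_flags(initial: float, final: float, mdc95: float) -> dict[str, bool]:
    """Flags for an AM-PAC score change of at least 1 MDC95."""
    return dichotomize(final - initial, mdc95)
```

A patient who does not meet either threshold has both flags set to False, which corresponds to the "or not" category of each variable.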
We also created 2 dichotomous variables that indicated whether patients' initial AM-PAC scores were in the lower 50% or upper 50% of the range of scores within the assigned severity modifier level for the short forms that they had completed. For example, 10 AM-PAC scores (40.66–49.39) on the generic basic mobility form are associated with the CL severity modifier code; patients with the higher 5 scores were classified as being in the upper 50% category for AM-PAC baseline scores, and patients with the lower 5 scores were included in the lower 50% category. If the severity modifier code was represented by an odd number of AM-PAC scores, we randomly determined whether there were more scores in the upper or lower category. Because the CH (0% impairment) and CN (100% impairment) severity modifiers are represented by only one AM-PAC score on each form, they were not able to be categorized in this manner; therefore, these modifier levels were categorized as missing when creating the new variable.
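The split into lower and upper halves, including the random tie-break for an odd number of scores, might look like the sketch below. The function name and the use of a seeded random generator are assumptions; the list of attainable scores per severity modifier would come from the scoring tables for each form.

```python
import random
from typing import Optional

def split_band_scores(
    band_scores: list[float], rng: random.Random
) -> Optional[tuple[set[float], set[float]]]:
    """Split the attainable scores of one severity band into lower and upper halves.

    Bands with a single score cannot be split and return None (treated as
    missing). With an odd number of scores, the middle score is assigned at
    random to the lower or the upper half.
    """
    if len(band_scores) < 2:
        return None
    ordered = sorted(band_scores)
    half = len(ordered) // 2
    if len(ordered) % 2 == 1 and rng.random() < 0.5:
        half += 1  # the middle score joins the lower half
    return set(ordered[:half]), set(ordered[half:])
```

A patient's baseline category is then found by checking which of the two sets contains the initial score.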
We determined the sensitivity and specificity, and positive and negative predictive values for a change in severity modifier level using a change in AM-PAC of at least 1 MDC95 as the standard. We ran 4 separate analyses to examine sensitivity and specificity for patients with AM-PAC scores that showed improvement from initial to final measurement and for patients with AM-PAC scores that showed decline from initial to final measurement on the basic mobility forms and daily activity forms. In order to determine whether sensitivity and specificity of change in severity modifier were affected by where the initial AM-PAC score fell within the range of scores assigned to each severity modifier code, we ran additional analyses based on our categorization of patients as falling into those with scores in the lower part of the range of scores for any severity modifier code and those with scores in the upper part of the range of scores for severity modifier code.
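For reference, the four accuracy measures can be computed from a 2 × 2 table in which change in severity modifier is the test and an AM-PAC change of at least 1 MDC95 is the standard. The sketch below uses hypothetical function names and assumes every row and column of the table is nonzero.

```python
def confusion_counts(modifier_changed: list[bool], function_changed: list[bool]):
    """Count true/false positives and negatives against the AM-PAC standard."""
    tp = fp = fn = tn = 0
    for flagged, truth in zip(modifier_changed, function_changed):
        if flagged and truth:
            tp += 1  # modifier changed and function changed
        elif flagged:
            fp += 1  # modifier changed but function did not
        elif truth:
            fn += 1  # function changed but modifier did not
        else:
            tn += 1  # neither changed
    return tp, fp, fn, tn

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, and positive and negative predictive values."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Running the same functions on the subset of patients in the lower or upper part of the score range gives the stratified estimates.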
Finally, we performed 4 logistic regression analyses to determine the odds of improving or of declining by one severity modifier level given the classification of the initial AM-PAC score within severity modifier code (upper versus lower range of scores). These analyses were run separately for basic mobility and daily activity scores and were controlled for age, sex, days between measures, and diagnostic category.
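As a simplified illustration of what these odds ratios express, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2 × 2 table. It is not the covariate-adjusted logistic regression used in the analysis, and the function name and argument layout are assumptions.

```python
import math

def odds_ratio_wald(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Unadjusted odds ratio and Wald 95% CI from a 2 x 2 table.

    a: upper-range patients whose severity level changed
    b: upper-range patients whose severity level did not change
    c: lower-range patients whose severity level changed
    d: lower-range patients whose severity level did not change
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper
```

An adjusted estimate would instead come from the coefficient on the upper-versus-lower indicator in a model that also includes age, sex, days between measures, and diagnostic category.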
Results
The sample included patients who were a mean of 71.7 years of age (SD=9.9), the majority of whom had musculoskeletal conditions with mobility impairments (Tab. 1). The median time between initial visit and last follow-up visit at which assessments were completed was 38 days (interquartile range=33). Based on the AM-PAC score changes (1 MDC95), approximately 61% of the patients improved and 15% declined in basic mobility function; approximately 55% improved and 20% declined in daily activity function. As indicated by change in severity modifier, approximately 40% improved and 8% declined in basic mobility function, whereas approximately 42% improved and 13% declined in daily activity (Tab. 2).
Table 2. Changes in AM-PAC Scores and Severity Modifiers
Basic Mobility
Determining improvement in basic mobility function.
The sensitivity, specificity, and positive and negative predictive values for change in severity modifier level in determining improvement (increase in score of at least 1 MDC95) in basic mobility function are shown in Table 3. Overall, the sensitivity of severity modifier change in determining an improvement in basic mobility function was 0.63 (95% confidence interval [CI]=0.58, 0.64). The sensitivity dropped to 0.34 (95% CI=0.29, 0.39) when considering only cases in which the initial AM-PAC score was in the lower part of the range of scores for each severity modifier and increased to 0.89 (95% CI=0.85, 0.92) for cases in which the initial score was in the upper part of the range. Overall specificity of change in severity modifier for determining improvement in basic mobility function was 0.96 (95% CI=0.93, 0.97). Specificity increased to 1.00 (95% CI=0.96, 1.00) when considering only cases in which the initial AM-PAC score was in the lower part of the range of scores for each severity modifier and was 0.92 (95% CI=0.88, 0.95) when only cases with initial AM-PAC scores in the upper range of scores for each severity modifier were considered. Overall, of those patients with a positive change in severity modifier level, 96% actually had a change in basic mobility function, and of those without a positive change in severity modifier level, 62% did not have improvement in basic mobility function. Of those patients with an initial score in the lower range of AM-PAC scores for a severity modifier code who demonstrated a positive change in severity level, 100% actually had a change in basic mobility function, whereas 42% of those who did not demonstrate a positive change in severity modifier actually did not have an improvement in basic mobility function. Of those patients with an initial score in the upper end of the range, 94% with a positive change in severity modifier level demonstrated an improvement in basic mobility function, and 85% without a positive change in severity modifier did not improve in basic mobility function.
Table 3. Sensitivity and Specificity of Severity Modifier in Determining Improvement in Function
The odds of improving at least one severity level compared with the odds of not improving were 4.46 (95% CI=3.38, 5.88) times greater for those patients with initial AM-PAC scores in the upper end of the range of scores for a severity modifier level than for those with initial scores in the lower end of the range of scores (Tab. 4).
Table 4. Odds of Improving at Least One Severity Level
Determining decline in basic mobility function.
The sensitivity, specificity, and positive and negative predictive values for change in severity modifier level in determining decline in basic mobility function (decrease in score of at least 1 MDC95) are shown in Table 5. Overall, the sensitivity of severity modifier change in determining a decline in basic mobility function was 0.44 (95% CI=0.36, 0.52). The sensitivity increased to 0.55 (95% CI=0.43, 0.66) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and decreased to 0.21 (95% CI=0.13, 0.33) for cases in which the initial score was in the upper range of scores. Overall specificity of change in severity modifier for determining a decline in basic mobility function was 0.99 (95% CI=0.98, 0.99). Specificity decreased slightly to 0.97 (95% CI=0.95, 0.99) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and increased to 1.00 (95% CI=0.99, 1.00) when only cases with initial AM-PAC scores in the upper end of the range for each severity modifier were considered. Overall, of those patients with a negative change in severity modifier level, 88% actually had a decline in basic mobility function, and of those without a negative change in severity modifier level, 91% did not have a decline. Of those patients with an initial score in the lower end of the range of scores who demonstrated a negative change in severity modifier level, 81% actually had a decline in basic mobility function, and 91% of those who did not demonstrate a negative change in severity modifier actually did not have a decline in basic mobility function. Of those patients with an initial score in the upper end of the range, 100% with a negative change in severity modifier level demonstrated a decline in basic mobility function, and 90% without a negative change in severity modifier did not decline in basic mobility function.
Table 5. Sensitivity and Specificity of Severity Modifier in Determining Decline in Function
The odds of declining at least one severity level compared with the odds of not declining were 4.81 (95% CI=2.66, 8.70) times greater for those patients with initial AM-PAC scores in the lower end of the range of scores for each severity modifier level than for those with initial scores in the upper end of the range of scores (Tab. 6).
Table 6. Odds of Declining at Least One Severity Level
Daily Activity
Determining improvement in daily activity function.
The sensitivity, specificity, and positive and negative predictive values for change in severity modifier level in determining improvement in daily activity function are shown in Table 3. Overall, the sensitivity of severity modifier change in determining an improvement in daily activity function was 0.74 (95% CI=0.68, 0.80). The sensitivity dropped to 0.57 (95% CI=0.47, 0.67) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and was 0.90 (95% CI=0.83, 0.95) for cases in which the initial score was in the upper end of the range. Overall specificity of change in severity modifier for determining improvement in daily activity function was 0.98 (95% CI=0.94, 0.99). Specificity increased to 1.00 (95% CI=0.93, 1.00) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and was 0.95 (95% CI=0.87, 0.98) when only cases with initial AM-PAC scores in the upper range for each severity modifier were considered. Overall, of those patients with a positive change in severity modifier level, 97% actually had an improvement in daily activity function, and of those without a positive change in severity modifier level, 76% did not have improvement. Of those patients with an initial score in the lower end of the range who demonstrated a positive change in severity modifier level, 100% actually had improvement in daily activity function, and 61% of those who did not demonstrate a positive change in severity modifier did not have an improvement in daily activity function. Of those patients with an initial score in the upper end of the range, 96% with a positive change in severity modifier level demonstrated an improvement in daily activity function, and 88% without a positive change in severity modifier did not improve in daily activity function.
The odds of improving at least one severity level compared with the odds of not improving were 2.19 (95% CI=1.42, 3.41) times greater for those patients with initial AM-PAC scores in the upper end of the range of scores for a severity modifier level than for those with initial scores in the lower end of the range of scores (Tab. 4).
Determining decline in daily activity function.
The sensitivity, specificity, and positive and negative predictive values for change in severity modifier level in determining decline in daily activity function are shown in Table 5. Overall, the sensitivity of severity modifier change in determining a decline in daily activity function was 0.55 (95% CI=0.44, 0.67). The sensitivity was 0.71 (95% CI=0.51, 0.86) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and decreased to 0.33 (95% CI=0.20, 0.50) for cases in which the initial score was in the upper range. Overall specificity of change in severity modifier for determining a decline in daily activity function was 0.98 (95% CI=0.95, 0.99). Specificity was 0.96 (95% CI=0.90, 0.98) when considering only cases in which the initial AM-PAC score was in the lower end of the range of scores for each severity modifier and 1.00 (95% CI=0.97, 1.00) when only cases with initial AM-PAC scores in the upper range for each severity modifier were considered. Overall, of those patients with a negative change in severity modifier level, 88% actually had a decline in daily activity function, and of those without a negative change in severity modifier level, 90% did not have a decline. Of those patients with an initial score in the lower range who demonstrated a negative change in severity modifier level, 77% actually had a decline in daily activity function, whereas 94% of those who did not demonstrate a negative change in severity modifier actually did not have a decline. Of those patients with an initial score in the upper end of the range, 100% with a negative change in severity modifier level demonstrated a decline in daily activity function, and 85% without a negative change in severity modifier did not decline in daily activity function.
The odds of declining at least one severity level compared with the odds of not declining were 2.65 (95% CI=1.27, 5.55) times greater for those patients with initial AM-PAC scores in the lower end of the range of scores for each severity modifier level than for those with initial scores in the upper end of the range of scores (Tab. 6).
Discussion
To our knowledge, this is the first study to examine the validity of the G-code functional limitation severity modifier codes for assessing change over time. We found that the sensitivity and specificity of change of severity modifier codes in determining change in basic mobility and daily activity function were dependent on whether the initial AM-PAC score was in the upper or lower range of scores representing functional limitation severity modifier codes. Showing an improvement in functional limitation severity level was more likely if AM-PAC scores were at the higher end of the range of scores within a functional limitation severity level than if the initial scores were at the lower end of the range. Similarly, showing a decline in function using severity codes was more likely if AM-PAC initial scores were at the lower end of the range of scores representing a functional limitation modifier code than if the initial scores were at the higher end of the range. These findings suggest that the G-code functional limitation severity modifier system may not be a valid approach to determining improvement in function and could yield inaccurate results depending on the patient's initial AM-PAC score within the range of scores representing a severity modifier code. If we accept that standardized self-report measures of function, validated through IRT and traditional psychometric testing, such as AM-PAC, provide a reasonable measure of “true” function and change in function, our study has demonstrated that severity modifier codes lack accuracy. If the G-code system is eventually to be used for demonstrating to patients and payers the value of the therapy being provided, modifications should be considered.
We used change from initial to last visit exceeding the MDC95 for AM-PAC outpatient basic mobility and daily activity short forms as the standard for identifying functional change. These tools are standardized functional measures that have been developed using IRT methods, and they have been shown to be responsive to change in function in patients receiving outpatient rehabilitation services for musculoskeletal conditions,17 for patients with a variety of conditions following inpatient rehabilitation,18 and in patients with hip fracture receiving rehabilitation services.19 In addition to the AM-PAC instrument, CMS has recommended 3 other measurement tools for application in determining functional limitation severity modifier codes, all of which have at least some evidence to support their reliability, validity, and responsiveness to change.20–22 However, CMS does not require that therapists use the same tools to assess all of their patients' severity levels or to identify the specific measure that was used to derive the severity code for each patient. Our finding that change in functional limitation modifier code is related to where on a scale a patient's initial score lies is likely true for other measures where a range of scores has been equated with each severity modifier code. Another problem, wherein a change in modifier code does not represent meaningful change, can be demonstrated using a calculator found on one website that converts scores on standardized measures of function to severity modifier codes.5 Using the calculator for the Timed “Up & Go” Test, a difference of 1 second, from 15 to 16 seconds, resulted in a change in functional limitation modifier code, whereas in at least one previous study on patients over the age of 65 years, the MDC90 has been reported as 4 seconds.23
An additional issue is that the policy allows therapists to derive severity modifier codes from clinical impressions. As a result, it is unclear how therapists are making decisions about how to assign functional limitation severity modifier codes. Anecdotally, some physical therapists have reported equating the concepts of maximum, moderate, minimal, and stand-by assistance to the severity code modifiers. The lack of standardization of functional reporting using G-codes and severity modifier codes makes it impossible to use them to draw valid functional comparisons over time, across patients, or among different clinics. Although the goal of documenting and tracking patients' function throughout an episode of care is valuable, tracking will be meaningful only if a consistent outcome measurement tool is being used that provides reliable and valid information that reflects clinically relevant functional status and important changes in function over time.
We believe that the dependence of the sensitivity and specificity of severity modifiers on where the initial AM-PAC score falls within the range of scores assigned to a severity modifier level demonstrates a limitation of the usefulness of the functional reporting system. Although not examined in our study, an additional concern in using severity modifier codes to assess the level of patients' functional limitations is the lack of information about the reliability of therapists in assigning the codes when using clinical impressions or tools that have not been linked in a systematic way to the codes. To our knowledge, there have been no reported studies regarding the reliability of therapists in determining functional limitation G-codes and assigning functional limitation severity modifier codes. Even seemingly straightforward classifications may be open to interpretation. Beninato et al24 found poor agreement between older adult patients' and their physical therapists' assessments of improvement in walking balance. They suggest that the discrepancy may be related to the differences in perspective, as therapists see patients function in the controlled clinical setting, whereas patients use their daily mobility and activity experience in the community as a reference. This finding suggests that assessments of change in functional severity code modifier based on therapists' clinical assessments may be inaccurate. Further evidence for the validity of the severity code modifiers and their reliability within and between therapists and clinical sites is necessary before data can be legitimately aggregated and analyzed to inform decisions.
The CMS directive for functional reporting may be viewed as a useful starting point aimed at increasing the accountability of rehabilitation professionals and, because its use is mandated for patient care reimbursed by Medicare, represents a foundation for establishing a universal system of reporting with no room for reporting only favorable outcomes. We believe, however, that several changes need to be implemented before the data derived from such a system can be meaningful. First, we recommend that CMS identify and require the use of specific, well-constructed, and vetted measurement tools. Instrument selection should be based on the demonstration of ease of application and adequate levels of reliability, validity, and responsiveness to change in relevant functional domains and in relevant patient populations. The metrics used to report functional levels and changes in function must not contain flaws such as those inherent in the existing system; appropriate measures of score change on those instruments selected to identify functional impairments should be used. Therapists also should undergo training in the systematic application and use of the selected functional assessment tools. Only then will the data derived from a universal reporting system be meaningful in reflecting clinically relevant functional status and the value of therapy being provided to patients whose care is reimbursed through Medicare.
Limitations
Our data came from one health care system; however, the patients were seen at many different sites within that system. Therefore, we believe that the results may have generalizability beyond our sample. The data, however, are derived from a sample of patients with largely orthopedic conditions; therefore, we cannot say that functional limitation severity codes would not be responsive to change in other groups of patients with different conditions.
We did not examine the accuracy of change in functional limitation severity modifier codes separately for each type of condition in our sample. There may be differences in AM-PAC MDC95 by type of musculoskeletal condition and differences in the accuracy of the change in modifier codes.
We do not know if MDC95 values demarcate “true” lack of change from “true” change in function; however, the MDC95 is more than the 2.3 to 2.6 times the SEM suggested to be clinically significant.7 We also calculated different MDCs for each tertile of baseline AM-PAC scores because it has been suggested that greater changes in functional measurement scores are necessary for patients to report experiencing important change when they begin at a lower level of function.16 It is unclear whether tertiles are the best way to demarcate score categories.
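For reference, the conventional relationship between the MDC95 and the SEM can be written as follows. This assumes the standard definition of the MDC95 (a 95% confidence bound on the difference between two measurements, each carrying independent error equal to the SEM); the formula is not restated in this section, so it is offered only as an illustration.

```latex
\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM}
```

Under that assumption, the multiplier of about 2.77 exceeds the 2.3 to 2.6 range cited above.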
Another consideration in interpreting our results is that positive and negative predictive values are influenced by the prevalence of a condition. Therefore, if the prevalence in a sample is not similar to the prevalence in the population, the predictive values calculated based on the sample will not be accurate. The prevalence of improvement in function in our sample was slightly higher than that reported in previous research using the AM-PAC.17,18
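The dependence of predictive values on prevalence can be shown with a short sketch based on Bayes' theorem. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not results from this study, and the function name is an assumption.

```python
# Illustrative sketch only: inputs are hypothetical, not study results.

def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same sensitivity and specificity, two different prevalences of change.
for prevalence in (0.5, 0.2):
    ppv, npv = predictive_values(0.8, 0.9, prevalence)
    print(f"prevalence={prevalence:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, lowering the prevalence lowers the positive predictive value and raises the negative predictive value, which is why predictive values from a sample may not transfer to a population with a different prevalence of functional change.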
It is possible that the accuracy of change in functional limitation modifier codes varies by the initial code assigned to the patient. In a cursory review of our data, we were unable to find any definitive pattern; however, future researchers may want to examine this question further.
Finally, we focused on the validity of functional limitation modifier codes for determining change in individual patients. It is unclear whether summaries of aggregate data using the G-code system may be adequate for making decisions at practice, state, or regional levels.
Recommendations for Future Research
Future research should include examination of the intertester reliability of therapists' assignment of functional limitation severity modifier codes when clinical judgment is used alone or in combination with standardized functional measures. Examination of the accuracy of functional limitation severity modifier codes in determining change based on standardized patient-reported measures other than the AM-PAC also would be useful. Similarly, assessment of the validity of functional limitation severity modifier codes using standardized performance-based measures would be of value. Additional lines of inquiry might be to examine the accuracy of severity modifier codes in determining change based on MID or to determine the relationship between patients' and therapists' determination of the severity modifier.
In conclusion, the results of this study demonstrate limitations in using G-code functional limitation severity modifier codes to determine change in function over a course of rehabilitation. The main limitation is that the sensitivity of change in severity modifier code as an indicator of change in function is affected by the initial score on a well-accepted standardized measure of function. Given that mandated reporting to CMS provides the prospect for aggregating data about millions of patients' function that could be used to conduct much-needed research, using a measure with demonstrated validity and reliability is critical. Furthermore, decisions about the effectiveness of therapy services and the Medicare reimbursement for those services must be made using valid data.
Footnotes
All authors provided concept/idea/research design and writing. Dr Stilphen and Mr Ranganathan provided data collection, project management, and institutional liaisons. Dr D. Jette provided data analysis. Dr A. Jette provided consultation (including review of manuscript before submission).
- Received January 16, 2015.
- Accepted August 4, 2015.
- © 2015 American Physical Therapy Association