Abstract
Background Practice guidelines (guidelines) have an increasing role in health care delivery and are being published more frequently. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) is an approach for guideline development. The GRADE approach has been adopted by multiple national and international organizations producing guidelines related to physical therapist care.
Objective The purpose of this article is to introduce physical therapists to the GRADE approach for guideline development.
Results GRADE provides a consistent approach for guideline development and transparency in the communication of how the guidelines were developed and how the recommendations were reached, leading to informed choices by patients, clinicians, and policy makers in health care. GRADE leads to a clear distinction between the strength of the evidence and the recommendation. Both the direction (for or against) and the strength (weak or strong) of the recommendation are considered. For determining the strength of the recommendation, GRADE takes into account the quality of evidence, the balance of benefit and harm, uncertainty about or variability in patients' values and preferences, and uncertainty about whether the intervention is a wise use of resources.
Limitations The GRADE approach has been used primarily for clinical questions about interventions and less often for questions related to diagnosis and prognosis.
Conclusions The frequency of publication of guidelines is increasing. To make informed choices in the health care system, physical therapists should understand how guidelines are developed. The GRADE approach has been adopted by national and international organizations that produce guidelines relevant to physical therapist practice. Understanding the GRADE approach will enable physical therapists to make informed clinical choices.
Practice guidelines (guidelines) have an increasing role in health care delivery. Guidelines are systematically developed recommendations based on evidence and the consensus of the guideline developers.1 Guidelines inform patients' and clinicians' clinical care decisions given the specific characteristics and circumstances of patients.1 Guidelines can lead to improved care by providing standards of care that would be followed in most situations.1,2 Guideline developers producing well-developed guidelines find, appraise, and summarize evidence, thereby reducing commonly reported barriers to evidence-based practice—lack of knowledge and lack of time.3–5
Multiple professional organizations have produced guidelines. Within physical therapy, the American Physical Therapy Association and the Chartered Society of Physiotherapy actively endorse guidelines.6,7 With the increasingly interdisciplinary nature of health care, guidelines produced by other health care and public health organizations also may inform physical therapist care.8 Guidelines are being published more frequently. In 2001, Scalzitti9 reported that approximately 5,500 articles indexed in MEDLINE through the year 2000 carried the MeSH publication type “practice guideline.” In December 2013, a repeat search with the same publication type retrieved more than 18,000 citations. When the publication type “practice guideline” was combined with “physical therapy,” “rehabilitation,” or “physiotherapy,” there were 294 citations through the year 2000. By mid-December 2013, the yield of the same search had increased by 868, for a total of 1,162 citations. These results indicate that practice guidelines related to physical therapy and rehabilitation are being published in rapidly growing numbers.
The Grading of Recommendations Assessment, Development and Evaluation (GRADE)10 is an approach for guideline development. GRADE is international in scope and interdisciplinary in its development.10 Although individual organizations may have unique guideline development processes,11,12 GRADE has been adopted by numerous organizations. These organizations include the World Health Organization, British Medical Journal, Agency for Healthcare Research and Quality, and Centers for Disease Control and Prevention Healthcare Infection Control Practices Advisory Committee. Recently, the importance of the GRADE system was recognized in the Journal of Physiotherapy.13 The broad adoption of GRADE by multiple organizations that produce guidelines related to physical therapist care highlights the need for physical therapists to understand the GRADE approach for practice guideline development. The purpose of this article is to introduce physical therapists to the GRADE approach for guideline development.
The GRADE Working Group is an international network of guideline developers who began their work in 2000. Their aim was to develop a single system for guideline development that incorporates the strengths of existing systems while addressing their weaknesses, such as failing to explicitly consider the balance between health benefits and harms.14,15 The GRADE Working Group produced an approach that can be used across a wide range of international organizations and that is supported by empirical evaluation.16 GRADE provides a consistent approach for guideline development and transparency in the communication of how the guidelines were developed and how the recommendations were reached, leading to informed choices by patients, clinicians, and policy makers in health care.15 GRADE leads to a clear distinction between the strength of the evidence and the recommendation.17,18 Both the direction (for or against) and the strength (weak or strong) of the recommendation are considered.19
Factors Influencing the Direction and the Strength of the Recommendation
To determine the strength of the recommendation, GRADE takes into account 4 factors: the quality of evidence, the balance of benefit and harm, uncertainty about or variability in patients' values and preferences, and uncertainty about whether the intervention is a wise use of resources.17,19 An overview of the GRADE approach is illustrated in the Figure.17,19
Quality of Evidence
Quality of evidence is assessed per outcome from a body of evidence that may involve 1 or more studies. GRADE recognizes that research design alone (eg, randomized controlled trial [RCT], observational study) does not necessarily determine the quality of the evidence. This recognition leads to the possibility of downgrading the quality of the body of evidence. Quality of evidence may be downgraded on the basis of limitations in studies, inconsistency of results, indirectness of evidence, imprecision of results, and publication bias.19,20
Limitations in studies.
Randomized controlled trials typically are considered to provide a high level of evidence. However, an RCT may be downgraded because of limitations such as risk of bias associated with its methods.20 For example, an RCT in which the people recording subjective measures are not masked or in which allocation is not concealed may be downgraded for risk of bias. Lack of masking and lack of allocation concealment have been shown to lead to overestimation of treatment effects.21,22 If a body of evidence of RCTs has study limitations, the quality of the body of evidence is downgraded.19
Inconsistency of results.
Inconsistency of results occurs when the studies in a body of evidence for a question of interest within a guideline have inconsistent results (eg, some RCTs favor an intervention while others do not) and the reasons for the inconsistency cannot be explained.23 For example, suppose a guideline addresses interventions for people with low back pain, and the body of evidence for a given intervention includes some studies that support it and some that do not. If this inconsistency cannot be explained (eg, by differences in intervention dosage or in study populations), the quality of the body of evidence is downgraded.23
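Judging inconsistency is a matter for guideline developers, but statistical measures of heterogeneity are often consulted alongside the point estimates. The sketch below computes Cochran's Q and the I² statistic, which are not described in this article and are shown only as one common way to quantify how much study results disagree. It assumes fixed-effect inverse-variance weights, and the mean differences and standard errors are hypothetical.

```python
def heterogeneity(estimates, standard_errors):
    """Return (pooled estimate, Cochran's Q, I^2 in percent).

    Uses fixed-effect inverse-variance weights; inputs are per-study effect
    estimates (eg, mean differences) and their standard errors.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q: weighted sum of squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of the variation beyond what chance alone would produce, floored at 0
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical studies of one low back pain intervention (mean difference, SE)
consistent = heterogeneity([1.0, 1.1, 0.9], [0.2, 0.2, 0.2])
inconsistent = heterogeneity([2.0, -0.5, 1.8], [0.2, 0.2, 0.2])
print(f"I^2 consistent: {consistent[2]:.0f}%")      # 0%
print(f"I^2 inconsistent: {inconsistent[2]:.0f}%")  # 98%
```

A high I² only flags that results disagree; as the text notes, developers first look for explanations such as differences in dosage or population before downgrading.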
Indirectness of evidence.
Indirect evidence is present when the population, intervention, or setting of interest differs from the population, intervention, or setting in the body of evidence.24 For example, if the guideline being developed is intended for use with people who are middle-aged and have a diagnosis of osteoarthritis but the preponderance of evidence is for people who are older and have a diagnosis of osteoarthritis, the evidence is indirect. Evidence also is indirect when the intervention in a body of evidence is similar, but not identical, to the intervention of interest for the guideline being developed. For example, if a guideline for the treatment of people with heel pain addresses custom-made orthotic devices but the body of evidence on orthotics examines off-the-shelf devices, guideline developers must use judgment when considering whether this indirect evidence applies to custom-made devices. Evidence also is indirect when the setting in a body of evidence is similar, but not identical, to the setting of interest for the guideline being developed. For example, if the evidence for a stroke rehabilitation guideline was generated predominantly in academic or university hospital settings, the evidence is considered indirect when applied to community hospital settings.
Evidence also may be indirect when the outcomes of interest for a guideline differ from the outcomes in the body of evidence. This type of indirect evidence occurs when the body of evidence consists of surrogate outcomes rather than primary outcomes.24 Surrogate outcomes are measures that are believed to be related to measures of primary importance to patients (primary outcomes) but that do not directly measure primary outcomes.1 When considering a recommendation, guideline developers must take into account how closely related the surrogate outcome is to the primary outcome. Consider a guideline with the goal of understanding interventions for the treatment of people after stroke. If the body of evidence for improving ambulation examines surrogate outcomes, such as symmetrical weight bearing and step length, the evidence for the primary outcome of community ambulation is indirect. Guideline developers would need to determine whether the surrogate outcomes could reasonably be expected to affect the primary outcome.
Another type of indirect evidence occurs when a question of interest within a guideline compares treatment A with treatment B but the body of evidence does not directly compare treatment A with treatment B.24 For example, in developing a guideline for the treatment of people with venous wounds, suppose the body of evidence compares mechanical device compression with a control and compression via wrapping with a control. Because no study directly compares mechanical device compression with compression via wrapping, the evidence for that comparison is indirect. If a body of evidence of RCTs provides indirect evidence, the quality of the evidence is downgraded.24
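When treatments A and B have each been compared only with a common control C, one widely used way to estimate A versus B is the adjusted indirect comparison (often attributed to Bucher and colleagues), which this article does not describe. The minimal sketch below assumes effects on an additive scale (eg, mean differences) and uses hypothetical numbers for the venous wound example. It shows why such estimates are less precise: the variances of the 2 direct comparisons add. It also assumes the 2 sets of trials are similar enough for the comparison to be valid, which is itself a judgment about indirectness.

```python
import math


def indirect_comparison(diff_a_vs_c, se_a_vs_c, diff_b_vs_c, se_b_vs_c, z=1.96):
    """Adjusted indirect comparison of A vs B through a shared comparator C.

    Effects must be on an additive scale (eg, mean differences or log risk
    ratios). Returns (estimate, standard error, (lower, upper) 95% CI).
    """
    estimate = diff_a_vs_c - diff_b_vs_c
    # The variances of the two direct comparisons add, so the indirect
    # estimate is always less precise than either direct comparison
    se = math.sqrt(se_a_vs_c ** 2 + se_b_vs_c ** 2)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical data: A = mechanical device compression, B = compression via
# wrapping, C = control; outcome is a hypothetical healing score difference
est, se, ci = indirect_comparison(3.0, 1.0, 2.0, 1.0)
print(f"A vs B: {est:.1f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Here each direct comparison has a standard error of 1.0, but the indirect estimate's standard error is about 1.41, so its CI is wider than either input's.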
Imprecision of results.
The quality of evidence from RCTs may be downgraded because of imprecision. In the GRADE approach, there are 2 steps in the assessment of imprecision.25 The first is to consider whether the confidence intervals (CIs) from the body of evidence cross the minimal clinically important difference (MCID). If the CIs cross the MCID, the quality of the evidence is downgraded. If the CIs do not cross the MCID, the optimal information size (OIS) is considered.25 The OIS is the number of participants that a single adequately powered trial would require; it provides a criterion for determining whether the body of evidence includes enough participants to achieve the desired significance and power. If the evidence does not meet the OIS criterion, the body of evidence is downgraded for imprecision.25 For example, consider that a score change of 10 points is the MCID for the Oswestry Disability Index for patients with low back pain.26 If the CI from a body of evidence for an intervention for low back pain includes score changes on both sides of the MCID, the body of evidence is downgraded.25 If the CI lies entirely on one side of the MCID, the next step is to calculate the OIS to determine whether the body of evidence includes a sufficient number of participants; if the OIS criterion is not met, the body of evidence is downgraded.25
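The 2-step imprecision check can be sketched as follows. The article does not give a formula for the OIS; this sketch assumes the conventional normal-approximation sample size for comparing 2 means with equal group sizes (2-sided alpha of .05, power of 80%). The standard deviation of 15 Oswestry points is a hypothetical value chosen only for illustration, and improvement is coded as a positive score change.

```python
import math
from statistics import NormalDist


def optimal_information_size(sd, delta, alpha=0.05, power=0.80):
    """Total participants one adequately powered 2-arm trial would need.

    Conventional sample size formula for a difference in means, equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_group = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return 2 * math.ceil(per_group)


def downgrade_for_imprecision(ci_low, ci_high, mcid, total_n, ois):
    """Apply the 2 steps in order: CI versus MCID, then OIS."""
    # Step 1: a CI with values on both sides of the MCID is imprecise
    if ci_low < mcid < ci_high:
        return True
    # Step 2: the CI lies on one side of the MCID, so check the sample size
    return total_n < ois


ois = optimal_information_size(sd=15, delta=10)          # 72 participants in total
print(downgrade_for_imprecision(6, 14, 10, 500, ois))   # True: CI crosses the MCID
print(downgrade_for_imprecision(11, 16, 10, 200, ois))  # False: OIS met
print(downgrade_for_imprecision(11, 16, 10, 50, ois))   # True: OIS not met
```

Returning a yes/no keeps the sketch close to the text; GRADE guidance also allows downgrading by 2 levels when imprecision is very serious, which this sketch does not model.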
Publication bias.
Publication bias occurs when a body of evidence does not include all of the studies that could be included in it. There are multiple forms of publication bias: authors may choose not to publish nonsignificant findings, journal editors may reject studies with nonsignificant findings, and sponsors of funded studies may choose not to publish the findings.27 Even when nonsignificant findings are published, their publication may be delayed, or they may appear in a nonindexed, non-English, or limited-circulation journal.27 In these cases, although the findings are published, they may be more difficult for guideline developers to find. For example, in a systematic review, if significant findings related to a treatment are published but nonsignificant findings for the treatment remain unpublished, the treatment effect is likely to be overestimated.28 Similarly, in a guideline, if nonsignificant findings are not included in the body of evidence, the recommendation may rest on a biased estimate of the treatment effect.
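A small simulation can make this overestimation concrete. The sketch below is hypothetical throughout: it assumes a true standardized effect of 0.2, trials of varying size, and a crude rule under which only positive, statistically significant results are published. It then pools all trials, and only the published trials, with fixed-effect inverse-variance weights.

```python
import random


def pooled_estimate(estimates, standard_errors):
    """Fixed-effect inverse-variance pooled estimate."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def simulate_publication_bias(true_effect=0.2, n_trials=2000, seed=1):
    """Return (pooled estimate from all trials, pooled estimate from published trials)."""
    rng = random.Random(seed)
    all_y, all_se, pub_y, pub_se = [], [], [], []
    for _ in range(n_trials):
        n_per_group = rng.randint(20, 200)
        # Approximate standard error of a standardized mean difference
        se = (2 / n_per_group) ** 0.5
        y = rng.gauss(true_effect, se)
        all_y.append(y)
        all_se.append(se)
        # Crude publication rule: only positive, significant results appear in print
        if y / se > 1.96:
            pub_y.append(y)
            pub_se.append(se)
    return pooled_estimate(all_y, all_se), pooled_estimate(pub_y, pub_se)


everything, published_only = simulate_publication_bias()
print(f"All trials: {everything:.2f}; published trials only: {published_only:.2f}")
```

Pooling all simulated trials recovers an estimate close to 0.2, whereas pooling only the "published" ones yields a noticeably larger effect, which is the overestimation described above.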
Assessing a body of evidence for publication bias is difficult. Graphical methods, such as funnel plots, and statistical methods, such as “trim and fill,” that are used to assess publication bias have limitations.27,29 When unpublished studies with results that differ from those of published studies are available, guideline developers may be more confident that publication bias exists.27 If a body of evidence of RCTs is suspected of publication bias, the quality of the evidence likely is downgraded.27
Although the number of RCTs in the rehabilitation literature is increasing,30 observational studies also are present in the physical therapy literature.31,32 Observational studies (such as cohort and case-control studies) and nonrandomized interventional trials typically are considered to provide lower-quality evidence than RCTs and to have the potential to overestimate treatment effects.20,33 However, regardless of their design, observational studies and nonrandomized interventional trials may be upgraded for several reasons: if the magnitude of the treatment effect for the outcomes studied is large, if a dose response is present, or if a treatment effect is present even when plausible factors or biases working against the treatment are present.20,33,34
Large magnitude of the treatment effect is found.
The magnitude of a treatment effect is large when the outcome in people receiving a treatment differs substantially from the outcome in people not receiving it.20,33,34 Consider a longitudinal, observational study of children in which some children self-select for exercise and some do not. Over time, it is found that the children who self-select for the treatment (exercise) have much lower levels of obesity than the children not participating in the treatment. This study may be upgraded because, despite the lack of a controlled intervention, the magnitude of the treatment effect for the outcome of interest (obesity) is large.
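GRADE guidance outside this article has suggested rules of thumb for what counts as large: a relative risk (RR) above 2 or below 0.5 may justify upgrading by 1 level, and an RR above 5 or below 0.2 by 2 levels, subject to further conditions such as consistency across studies and no plausible confounders. The sketch applies only those thresholds, to hypothetical counts for the exercise and obesity example.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """Risk of the outcome in the treated group divided by risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)


def upgrade_levels_for_magnitude(rr):
    """Levels to upgrade for a large effect, using commonly cited GRADE thresholds."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    # Fold protective effects (RR < 1) onto the same scale as harmful ones
    ratio = rr if rr >= 1 else 1 / rr
    if ratio > 5:
        return 2
    if ratio > 2:
        return 1
    return 0


# Hypothetical counts: 20 of 400 exercising children obese vs 90 of 400 not exercising
rr = relative_risk(20, 400, 90, 400)
print(f"RR = {rr:.2f}; upgrade by {upgrade_levels_for_magnitude(rr)} level(s)")
# RR = 0.22; upgrade by 1 level(s)
```

The threshold check is only a starting point; whether the effect is large enough, and free enough of confounding, to warrant upgrading remains a judgment.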
Presence of a dose response.
A dose response is present when it is clear that there is a change in the outcome given a change in the dose of the intervention or exposure.20,33,34 Consider a longitudinal, prospective cohort study of older people who report minutes of strength training exercise per week and are monitored for risk of fracture. Group A reports 200 minutes of strength training exercise per week, group B reports 60 minutes per week, and group C reports 30 minutes per week. People are analyzed on the basis of the total amount of strength training exercise reported and risk of fracture. If the highest dose (group A) is clearly associated with a lower risk of fracture than the moderate dose (group B), and the moderate dose (group B) is associated with a lower risk of fracture than the lowest dose (group C), a dose response is present. An observational study or a nonrandomized interventional trial may be upgraded when such a clear dose response shows that higher doses of treatment are associated with better outcomes.
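The pattern in this example can be checked with a simple ordering test. The fracture risks below are hypothetical, and the check only confirms that risk falls at every step up in dose; an actual analysis would estimate the trend and its uncertainty (eg, with regression), which this sketch does not do.

```python
def has_dose_response(doses, risks):
    """True if risk strictly decreases at every increase in dose (a crude gradient check)."""
    ordered = sorted(zip(doses, risks))  # order groups from lowest to highest dose
    return all(higher[1] < lower[1] for lower, higher in zip(ordered, ordered[1:]))


# Hypothetical groups: A = 200, B = 60, C = 30 minutes of strength training per week
doses = [200, 60, 30]
fracture_risk = [0.04, 0.07, 0.10]
print(has_dose_response(doses, fracture_risk))  # True
```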
Treatment effect is present even when plausible factors working against the treatment are present.
In some instances, a treatment effect may be present despite factors or biases that decrease the potential for finding an effect.20,33,34 Consider a nonrandomized interventional trial examining fall risk in elderly people. All people in the study have a previous history of falls. People are assigned in a nonrandom fashion either to receive an exercise intervention aimed at decreasing the risk of future falls or to not receive the exercise intervention (control group). People in the control group experience falls in the next 12 months, whereas people participating in the exercise intervention program do not experience a repeat fall in the 12 months after the program. This effect is present despite plausible factors working against it: medication use is not controlled, all participants have previous falls, and some have visual and sensory deficits. An observational study or a nonrandomized interventional trial that shows a clear treatment effect despite factors that make an effect less likely may be upgraded.
After consideration of whether evidence should be upgraded or downgraded, a final rating of the evidence is assigned. Although the quality of evidence lies on a continuum, the GRADE approach uses 4 categories (high, moderate, low, and very low) for rating it. Each category reflects how well the evidence estimates the effect in the population. High-quality evidence implies high confidence that the evidence closely estimates the effect in the population. Conversely, very low-quality evidence implies that the evidence does not closely estimate the population effect.35 As previously noted, the quality of evidence is assessed per outcome.19
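The bookkeeping behind the final rating can be sketched as moving along the 4 categories. The article does not state the starting points; this sketch assumes the usual GRADE convention that a body of RCT evidence starts at high and a body of observational evidence starts at low. The `downgrades` and `upgrades` arguments are hypothetical counts of levels (GRADE guidance allows a serious concern to remove 1 level and a very serious concern to remove 2), and the result is clamped to the 4 categories. The sketch records the arithmetic only; each level change is a judgment that developers must justify.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def rate_quality(randomized, downgrades=0, upgrades=0):
    """Return the final quality rating for one outcome's body of evidence."""
    # Assumed starting points: RCT evidence at "high", observational at "low"
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    level = start - downgrades + upgrades
    # Keep the result within the 4 GRADE categories
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


print(rate_quality(randomized=True, downgrades=1))  # moderate (eg, imprecision)
print(rate_quality(randomized=False, upgrades=1))   # moderate (eg, large effect)
print(rate_quality(randomized=True, downgrades=3))  # very low
```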
Balance of Benefit and Harm
The balance of benefit and harm also influences the strength of the recommendation. Benefit and harm are considered in the context of patients' values and preferences.36 Patients may value some outcomes more than others. If there is a clear and large difference between benefit and harm, the recommendation likely is strong. If the difference between benefit and harm is small, the recommendation more likely is weak.37 Consider an intervention such as taping, with the benefit of a 50% reduction in pain during a functional step-down task and the possible harm of skin irritation. The difference between the benefit and the harm is large and addresses an outcome of high importance to patients, likely leading to a strong recommendation.36 With the GRADE approach, guideline developers should explicitly indicate how judgments pertaining to the recommendation are reached with regard to the balance of benefit and harm.36
Uncertainty About or Variability in Patients' Values and Preferences
Although empirical evidence related to understanding patients' values and preferences is limited,36 the GRADE approach allows these factors to be considered in the determination of the strength of the recommendation. Patients' values and preferences may vary across groups. Consider guidelines that recommend interventions delivered by physical therapists with specialized skills. Recommendations may vary on the basis of the access that patients have to such specialists. Examples of access include the availability of physical therapists with specialized training and the availability of transportation to physical therapists with specialized training. Variability in patients' values and preferences or uncertainty about patients' values and preferences influences the recommendation.36,37 Guideline developers may use clinical experience to make judgments about the level of uncertainty in the absence of empirical evidence.36 Greater variability in the body of evidence or a higher level of uncertainty (on the part of the guideline developers) about patients' values and preferences may result in a weak recommendation.36,37 With the GRADE approach, guideline developers should explicitly indicate how judgments regarding the impact of patients' values and preferences on the recommendation are reached.36
Uncertainty About Whether the Intervention Is a Wise Use of Resources
Resource consumption is considered by guideline developers when making judgments regarding the strength of the recommendation. Resource consumption goes beyond monetary cost; when possible, resource consumption should be presented in natural units (eg, days of hospital admission, amount of clinician time).38,39 With the GRADE approach, guideline developers should explicitly indicate their perspective when considering resource use.38 Guideline developers should identify which resources are considered critical and search for evidence indicating resource use for interventions and alternatives.38 What counts as high resource consumption is context sensitive and may vary by geographical area, socioeconomic status, and health care system. Resource consumption also may change over time.18,38 Interventions with a high fiscal cost, a high cost in personnel time, or both may be more feasible in a large health care system with multiple physical therapists in an urban setting than in a smaller health care system with one physical therapist in a rural setting. With the GRADE approach, guideline developers should explicitly state how judgments pertaining to the recommendation are reached with regard to resource consumption.38,39
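Keeping the quantities of resources used (eg, clinician hours, clinic visits) separate from their prices makes the context sensitivity described above explicit: the same resource use can carry different costs in different settings. In the sketch below, all quantities, item names, and unit costs are hypothetical, in an unspecified currency.

```python
def cost_of_resource_use(resource_use, unit_costs):
    """Multiply each quantity of resource use by its setting-specific unit cost and sum."""
    return sum(quantity * unit_costs[item] for item, quantity in resource_use.items())


# Hypothetical resource use for one course of an intervention, in natural units
course = {"therapist_hours": 6, "clinic_visits": 8}

# Hypothetical unit costs in two settings
urban_unit_costs = {"therapist_hours": 60, "clinic_visits": 20}
rural_unit_costs = {"therapist_hours": 90, "clinic_visits": 35}

print(cost_of_resource_use(course, urban_unit_costs))  # 520
print(cost_of_resource_use(course, rural_unit_costs))  # 820
```

Reporting the quantities themselves lets readers in other settings apply their own unit costs rather than relying on a single monetary total.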
GRADE Recommendation
The GRADE approach takes into account the quality of evidence, the balance of benefit and harm, uncertainty about or variability in patients' values and preferences, and uncertainty about whether the intervention is a wise use of resources. These considerations and relevant findings are presented in evidence profiles and summary of findings tables.19 Judgments made during the guideline development process, such as those related to the importance of outcomes and resource consumption, are stated explicitly. This information transparently documents the reasons for the quality of evidence rating and for the direction (for or against) and strength (weak or strong) of the recommendation.19 A strong recommendation is one that, after clinical judgment and consideration of patient values, most patients typically would agree to, most clinicians would recommend, and most policy makers could use as a marker of quality care. Conversely, a weak recommendation may require clinicians, using clinical judgment and considering patient values, to explore a number of different choices to help the patient arrive at a health care decision that meets his or her values and preferences.40
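The separation between the quality of evidence and the direction and strength of a recommendation can be mirrored in how a guideline group records its decisions. The data structure below is a hypothetical sketch, not a GRADE tool or format: direction and strength are stored separately from the quality rating, and the reasoning for each of the 4 factors is kept alongside them so that it can be reported.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    FOR = "for"
    AGAINST = "against"


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"


@dataclass
class Recommendation:
    statement: str
    direction: Direction
    strength: Strength
    quality_of_evidence: str  # final rating, eg "low"; set before the recommendation
    # Documented judgments on the 4 factors, keyed by factor name
    rationale: dict = field(default_factory=dict)

    def summary(self):
        return (f"{self.strength.value.capitalize()} recommendation {self.direction.value}: "
                f"{self.statement} ({self.quality_of_evidence}-quality evidence)")


rec = Recommendation(
    statement="hypothetical intervention X",
    direction=Direction.FOR,
    strength=Strength.WEAK,
    quality_of_evidence="low",
    rationale={
        "quality of evidence": "RCTs downgraded for risk of bias and imprecision",
        "benefit and harm": "small benefit, minor harms",
        "values and preferences": "likely to vary across patients",
        "resource use": "uncertain",
    },
)
print(rec.summary())
# Weak recommendation for: hypothetical intervention X (low-quality evidence)
```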
Additional Considerations and Limitations
Reliability of GRADE
Recently, the reliability of the GRADE approach was assessed.16 A more experienced group of raters and a novice group of raters demonstrated acceptable reliability when using the GRADE approach to assess the quality of evidence (for 2 raters, the intraclass correlation coefficient [ICC] was .72 [95% CI=.63–.79] for the more experienced group and .66 [95% CI=.56–.75] for the novice group). Training had a larger effect for the novice group (ICC increased from .11 to .66) than for the more experienced group (ICC increased from .62 to .72).16
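For readers unfamiliar with the ICC, the sketch below computes one common form, ICC(2,1) as defined by Shrout and Fleiss (2-way random effects, absolute agreement, single rater). The article does not state which ICC form the cited study used, so this illustrates the statistic rather than reproducing that analysis. Each row holds the quality ratings that 2 raters gave one body of evidence, coded 1 (very low) to 4 (high); the ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per rated item, each row holding one score per rater.
    """
    n = len(ratings)      # number of rated items
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into items, raters, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical ratings: 6 bodies of evidence, 2 raters each
ratings = [[4, 4], [3, 2], [2, 2], [1, 2], [3, 3], [2, 1]]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```

Because this form measures absolute agreement, a rater who is systematically one category harsher than the other lowers the ICC even if both rank the bodies of evidence identically.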
Application of the GRADE Approach to Diagnostic and Prognostic Studies
Although the use of the GRADE approach for guidelines on interventions is well documented,17,19,35,37,41 efforts to develop the use of the GRADE approach for guidelines related to diagnostic and prognostic evidence are ongoing.42–45 Gopalakrishna et al42 recommended additional guidance for applying the GRADE approach to diagnostic evidence. Specifically, raters had difficulty applying the concepts of inconsistency, imprecision, and publication bias to studies of diagnostic evidence. Additionally, raters had difficulty linking the risk of bias assessed with a revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2) to the GRADE rating of the quality of evidence.42 Huguet et al44 also recommended modifications to the GRADE approach for the assessment of prognostic evidence. For example, the assessment of bias associated with prognostic studies differs from the assessment of bias associated with RCTs.44
Guideline developers should consider whether the GRADE approach is appropriate for a given question. For example, if the question of interest is ambiguous or if there is insufficient direct evidence for the question of interest, the GRADE approach may not be appropriate.19
Conclusion
The frequency of publication of guidelines is increasing. To make informed choices in the health care system, physical therapists should understand how guidelines are developed. The GRADE approach has been adopted by national and international organizations that produce guidelines relevant to physical therapist practice. Understanding the GRADE approach will enable physical therapists to make informed clinical choices.
Footnotes
All authors provided concept/idea/project design, writing, and project management.
Portions of this article were presented at the American Academy of Orthopaedic Physical Therapy; October 16–20, 2013; Cincinnati, Ohio. An educational session that incorporates features of GRADE will be held at the Combined Sections Meeting of the American Physical Therapy Association; February 4–7, 2015; Indianapolis, Indiana.
- Received January 2, 2014.
- Accepted July 4, 2014.
- © 2014 American Physical Therapy Association