Abstract
A prognosis is a broad statement that predicts a patient's likely status, or degree of change, at some time in the future. Clinicians are likely to improve the accuracy of their judgments of prognosis by incorporating relevant research findings. In recent years, there has been substantial growth in the number of primary studies and systematic reviews addressing prognosis for people likely to receive physical therapy care. The purpose of this clinical update is to provide a framework for identifying, appraising, and utilizing these research findings to help make prognostic judgments.
A prognosis is a broad statement that predicts a patient's likely status, or degree of change, at some time in the future.1–4 At the level of the individual, a prognosis provides the practitioner and patient with critical information, including the patient's expected future health status, likely response to intervention, and likely duration of treatment.2 This information also is critical for family members, employers, and third-party payers who may need to plan for lost work time and consider the financial implications of the patient's condition and required treatment. At the level of specific patient populations, the formation of prognostically similar subgroups facilitates increasingly meaningful comparisons for intervention trials5,6 and the development of clinical guidelines for care.7 Considering this, the ability to generate an accurate prognosis is essential for patient management by a physical therapist.
A prognosis relates to future events and, therefore, always includes an element of uncertainty. Conceptually, one can consider a prognosis to be an “informed guess” in which a clinician juxtaposes relevant clinical information, such as the nature and severity of the patient's condition, with his or her prior experience treating people with similar characteristics.4 Prognostic judgments, therefore, are based, to various degrees, on clinical intuition. Intuition can be very important in the decision-making process; however, judgments made primarily from intuition are prone to bias.8 Clinicians are likely to improve the accuracy of their judgments of prognosis by incorporating relevant research findings. In recent years, there has been substantial growth in the number of primary studies and systematic reviews addressing prognosis for people likely to receive physical therapy care.9–13 The purpose of this clinical update is to provide a framework for identifying, appraising, and utilizing these research findings to help make prognostic judgments.
Prognostic Factors
To develop a prognosis, clinicians must consider unique characteristics of patients known as prognostic factors or predictors.1 Prognostic factors influence the likelihood of outcome, and may include a wide array of demographic traits, disease-specific factors, or comorbid conditions. For example, common demographic traits that influence prognosis often include patient age, sex, and occupation.9,10,12,13 Disease-specific factors may relate to the stage, severity, and natural history of the patient's condition,1,9,10 whereas comorbid medical and biobehavioral factors often include cardiovascular disease, obesity, elevated fear-avoidance beliefs, or depression (Figure).14–18 The wide spectrum of prognostic factors was recently illustrated by Cote and colleagues12 in a systematic review of literature on prognosis for patients with whiplash injuries. The authors reported that delayed recovery was associated with demographic factors (older age, female sex), disease-specific characteristics (high baseline pain, the presence of radicular signs), and a socioeconomic factor (hiring a lawyer early in the course of care).
Figure. Examples of patient characteristics that should be considered when determining a patient's prognosis for outcome following physical therapy intervention.
It is important to note that prognostic factors do not necessarily need to cause the outcome; they just need to have a strong enough association to be predictive of the likelihood of the outcome.3 For example, injured workers with acute low back pain who have high scores on the work subscale of the Fear-Avoidance Beliefs Questionnaire (FABQ) have a higher likelihood of delayed recovery compared with those injured workers with low scores.17 However, not all injured workers with acute low back pain and high scores on the work subscale of the FABQ will have delayed recovery. In addition, high scores on the work subscale of the FABQ may not be predictive of outcome in noninjured workers.18 Thus, prognostic factors often vary across different patient populations and under different conditions of measurement.19 Research designs that study prognostic factors, therefore, must be carefully constructed to control for these likely sources of variability. To appraise the quality of prognostic research studies accurately, it is important to understand their unique components.
Components of Prognostic Research Studies
The Cohort Design to Investigate Prognosis and Prognostic Factors
An inception cohort is the most desirable design to use to study prognosis and prognostic factors.20,21 Inception cohort studies enroll all subjects at roughly the same, well-defined time (“zero point”) during the course of their illness (eg, at the onset of symptoms) or at a specific point postoperatively. This is important because substantial error in determining prognosis is likely to occur if the observation of patients is begun at different points in the course of their condition.3
Data may be collected under different conditions in an inception cohort study. A time-based model tracks the time it takes for an individual to achieve a specific outcome such as regaining full range of motion, independent ambulation, or return to work.3 Subjects are followed until the event of interest occurs, and are evaluated by statistical procedures called survival analyses that can track the time to the event.13,22 Cohort studies related to physical therapy care often are associated with intervention that is administered over a predetermined time period.9,10,20 These studies include subjects with similar diagnoses and treatment indications who receive the same treatment over a course of care, after which outcome is assessed and prognostic factors are evaluated.
This time-based model is illustrated by McIntosh et al,13 who followed a cohort of patients filing claims for workers’ compensation benefits in Ontario, Canada, after a job-related low back injury. A total of 1,752 subjects received daily physical therapy intervention for a maximum of 30 days. The outcome measure was the number of days each subject received benefits, for up to 1 year following the date of injury. The authors reported 5 factors that predicted increased time for receiving benefits: working in the construction industry, older age, lag time from injury to treatment, pain referred to the leg, and 3 or more positive “Waddell signs.”23
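The time-to-event approach described above can be sketched with a Kaplan-Meier estimator, which is one common survival analysis procedure. The sketch below is illustrative only: the function name, the 6 subjects, and their durations are invented and are not data from McIntosh et al. Here, "survival" means the estimated probability that a subject is still receiving benefits at a given day, and a censored subject is one whose follow-up ended before the event was observed.

```python
from collections import Counter


def kaplan_meier(durations, event_observed):
    """Return [(time, survival_probability)] at each time an event occurred.

    durations: days until the event (eg, benefits ended) or until censoring.
    event_observed: True if the event occurred, False if the subject was
    censored (eg, still receiving benefits when follow-up stopped).
    """
    events = Counter(t for t, e in zip(durations, event_observed) if e)
    exits = Counter(durations)  # every subject leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the proportion of at-risk subjects with no event at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical cohort: days on benefits for 6 workers; 2 were censored.
days = [10, 20, 20, 35, 50, 60]
ended = [True, True, False, True, True, False]
curve = kaplan_meier(days, ended)

for day, probability in curve:
    print(f"day {day}: estimated proportion still on benefits = {probability:.2f}")
```

Censored subjects contribute to the number at risk until their follow-up ends, which is why this kind of estimator is preferred over simply dropping subjects with incomplete follow-up.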
To address external validity, it is critical for prognostic studies to maximize the sample's likelihood of representing the population of interest.1,19 An ideal prognosis study would guarantee adequate representation by enrolling all people with the condition of interest who live in a specific geographic area and stratifying these individuals by the potentially most important prognostic factors.3 This design is usually not practical; thus, most prognosis studies use a representative cohort in which sampling procedures are carefully constructed to include a wide array of relevant patients. This approach is practical and, when performed correctly, allows adequate representation of the population of interest. In some instances, however, this may introduce a degree of “sampling bias” (ie, entry in the study may be limited to those patients seeking care at those facilities involved in the study).3 This may result in a sample with subtle, but important, unique characteristics; for example, the subjects may have the social and financial resources to receive care at the specific facility, resources that may not be available to other people with the same condition. These studies are often called a “false cohort” or an “available patient cohort.”3 False cohort designs can provide useful information, but they are not conclusive in describing the prognosis of all people with the condition of interest.
Referral bias may occur when subjects are only recruited from a specialty care clinic. These subjects often represent unique subgroups that are likely to have a higher prevalence of unfavorable prognostic factors than patients seen in community practices.3 This makes generalizing the findings problematic. For example, prognostic factors identified in patients receiving care in an occupational rehabilitation facility may not be applicable to those patients with similar diagnoses who receive care in a community practice. To allow clinicians to determine the degree of potential sampling or referral bias in a prognosis study, it is critical for authors to provide very detailed descriptions of their subjects and sampling procedures.
Randomized Clinical Trials to Investigate Prognosis and Prognostic Factors
Inception cohort studies can be difficult to perform and are time-consuming; therefore, some authors create cohorts from patients who have been randomly assigned to groups during randomized clinical trials (RCTs).3 This approach has many advantages (ie, the patients in the groups are typically well-described, receive consistent intervention, are followed over the course of care, and have valid measures of outcome). Caution should be taken, however, because subjects in these studies often are not representative of the larger population of people with the same disorder1,3; thus, the results may have limited generalizability.19
The entry and exclusion criteria required to maintain the internal validity of an RCT are typically quite rigid and may further narrow the generalizability of the results. For example, a high number of patients with back pain who are seen in general physical therapist practices may be receiving workers’ compensation13,16,24 or have evidence of depression25; however, these factors often are exclusionary criteria for potential research subjects. In addition, many RCTs will have rigid applications of intervention that also are needed to maintain internal validity, but which may be hard to replicate (eg, specific manual therapy procedures that require special training). This may further restrict the generalizability of results to other practices.
Measures of Outcome in Prognostic Studies
The outcome of interest in a prognostic study often varies based on the specific goals of the patient who is seeking care. For example, when describing the prognosis of serious disease for which patient survival is the critical issue, single-rate measures of mortality (eg, the 5-year survival rate) often are used.3,4 The majority of patients seeking physical therapy care, however, do not have life-threatening conditions; thus, patient-related status measures, such as functional ability9,26–28 or return to work,13,17,24,29 at a given time are more useful to describe outcomes for these patients than survival and mortality. Kennedy et al9 argued that 2 main dimensions of outcome should be considered in prognosis studies: the patient's status at outcome and the degree of change from initial status that occurs at outcome.
Follow-up outcome measures must have evidence of responsiveness30–33 and should be obtained at intervals that are likely to reflect meaningful change.1 For example, a study that investigates predictors of recovery from acute ankle sprains might require several measurements in the first few weeks, whereas a study addressing prognostic factors for patients with hip joint arthritis might need measures spread over a period of years.
Reporting and Interpreting Prognostic Factors
Describing individual prognostic factors.
Several statistical representations are used to describe the relationships between individual prognostic factors and outcome.1,9,10,34–36 When using regression analyses with continuous outcome measures, authors typically will report P values to identify the presence of significant associations between the outcome measure and the prognostic factors being investigated. Although the presence of statistical significance is important, it is difficult to incorporate this information into clinical practice without some estimate of the “strength” of the factors.1 Unstandardized beta coefficients are useful for this estimation because they describe how much change occurs in the dependent variable (outcome measure) following a change of 1 unit of a prognostic factor9 (eg, an unstandardized beta coefficient of 4.00 means that a change in 1 unit of the prognostic factor will result in a change of 4 units in the outcome measure).
As an illustration, assume the prognostic factor was the presence of a workers’ compensation claim. Those subjects who do not have a workers’ compensation claim are coded as 0 and those who do are coded as 1. If the prognostic outcome was disability, measured using the Roland-Morris Scale (0–24 points), we would expect those subjects who have a workers’ compensation claim to have 4 more points on the Roland-Morris Scale at follow-up than those who do not have a workers’ compensation claim. Standardized beta coefficients are useful because they allow direct comparison between the different predictors by converting unstandardized beta coefficients to a “standardized” score.
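The interpretation of unstandardized and standardized beta coefficients can be sketched with a small least-squares fit. The scores below are invented so that the claim group averages 4 Roland-Morris points higher than the no-claim group; they are not data from any cited study, and the variable names are assumptions made for this sketch. With a single 0/1 predictor, the unstandardized beta equals the difference between the group means.

```python
from math import sqrt

# Hypothetical follow-up Roland-Morris scores (0-24 points).
claim = [0, 0, 0, 1, 1, 1]          # 0 = no claim, 1 = workers' compensation claim
disability = [6, 8, 10, 10, 12, 14]

n = len(claim)
mean_x = sum(claim) / n
mean_y = sum(disability) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(claim, disability))
sxx = sum((x - mean_x) ** 2 for x in claim)
syy = sum((y - mean_y) ** 2 for y in disability)

# Unstandardized beta: change in outcome units per 1-unit change in the factor.
beta = sxy / sxx
intercept = mean_y - beta * mean_x

# Standardized beta: beta rescaled by the ratio of the standard deviations,
# so that predictors measured in different units can be compared.
standardized_beta = beta * sqrt(sxx / syy)

print(f"unstandardized beta = {beta:.2f} points")
print(f"predicted score without claim = {intercept:.1f}")
print(f"predicted score with claim = {intercept + beta:.1f}")
print(f"standardized beta = {standardized_beta:.2f}")
```

In a model with several predictors, each beta would instead describe the expected change in outcome with the other predictors held constant.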
Another helpful way to report the strength of predictor variables is by partial R2 values that describe the amount of variance in the outcome measure explained by a prognostic factor beyond what is already explained by other prognostic factors in the model. For example, Kennedy et al9 reported a partial R2 of .20 for a high baseline score on the Disabilities of Arm, Shoulder, and Hand (DASH) measure37 in a model investigating prognosis for patients receiving physical therapy for soft tissue disorders of the shoulder. This suggests that, for this model, a high baseline DASH score explained 20% of variance in outcome.
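As a rough sketch of how a partial R2 can be computed, the example below compares a model without the factor of interest (the "reduced" model) with a model that includes it (the "full" model). The formula shown is one common definition of partial R2; whether it matches the exact computation used by Kennedy et al is an assumption. The sums of squares are invented for illustration and are not taken from that study.

```python
def r_squared(sse, sst):
    """Proportion of total variance in the outcome explained by a model."""
    return 1 - sse / sst


def partial_r_squared(sse_reduced, sse_full):
    """Proportion of the variance left unexplained by the reduced model
    that is explained once the factor of interest is added."""
    return (sse_reduced - sse_full) / sse_reduced


# Hypothetical sums of squares.
SST = 1000.0          # total variation in the outcome
SSE_REDUCED = 600.0   # unexplained variation, model without the factor
SSE_FULL = 480.0      # unexplained variation, model with the factor

r2_reduced = r_squared(SSE_REDUCED, SST)
r2_full = r_squared(SSE_FULL, SST)
partial = partial_r_squared(SSE_REDUCED, SSE_FULL)
r2_change = r2_full - r2_reduced

print(f"R2 without factor = {r2_reduced:.2f}")
print(f"R2 with factor = {r2_full:.2f}")
print(f"partial R2 = {partial:.2f}")
print(f"change in R2 = {r2_change:.2f}")
```

Note that the partial R2 and the simple change in R2 are different quantities, so it is worth checking which one a study reports before comparing values across studies.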
Likelihood ratios (LRs)34,38 describe the probability of the outcome of interest after the prognostic factor has been identified and are valuable when the outcome measure is dichotomous. A good example is provided by Riddle and Stratford,34 who calculated the validity indexes for various scores on the Berg Balance Test to predict falls among elderly people. The authors reported that individuals with a Berg Balance Test score of ≤40 were nearly 12 times more likely to be a faller than a nonfaller (positive LR=11.7). To apply this finding to individual patients, the authors emphasized that the magnitude of validity indexes is influenced by the pretest probability of the event of interest occurring. For example, an elderly person who is in good health and is ambulatory in the community has a much lower pretest likelihood of falling than does an elderly person with cognitive impairment and congestive heart failure leading to syncope. If both of these patients scored 40 on the Berg Balance Test, the posttest likelihood of falling would be much higher for the patient with cognitive impairment and congestive heart failure. This example illustrates that prognostic factors are of great value, but should not be used in isolation (ie, they must always be considered in reference to the unique characteristics of the patient).4,19,34
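The influence of pretest probability can be sketched with the standard conversion from probability to odds and back. The positive LR of 11.7 is the value reported above; the two pretest probabilities (10% and 50%) are hypothetical values chosen to represent a healthy community ambulator and a patient with substantial comorbidity, and the function name is an assumption made for this sketch.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


LR_POSITIVE = 11.7  # Berg Balance Test score of <=40, as reported above

# Hypothetical pretest probabilities of falling for two patients.
healthy_patient = posttest_probability(0.10, LR_POSITIVE)
comorbid_patient = posttest_probability(0.50, LR_POSITIVE)

print(f"healthy community ambulator: {healthy_patient:.0%}")
print(f"patient with comorbidity: {comorbid_patient:.0%}")
```

Under these assumed pretest values, the same test score yields a posttest probability of roughly 57% for the first patient and roughly 92% for the second, which is the point made in the example above.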
Describing predictive models.
Predictive models often are developed in which combinations of factors can provide a stronger estimate of prognosis than individual factors. Authors typically provide measures, such as the multiple correlation coefficient R2, that indicate the amount of variance in the outcome explained by various combinations of prognostic factors.35,36 Large values for R2 suggest that the model is a strong predictor of outcome, whereas data sets with a low R2 have large portions of the variance unexplained, suggesting that other prognostic variables that are not included in the study are strongly influencing the outcome. This is known as “susceptibility” or “assembly bias,”3 and poses a major challenge for prognosis studies (ie, the need to sample the most relevant patient characteristics from which prognostic factors may be identified).1
Prior to beginning the study, potential prognostic factors may be identified by expert panels of clinicians and, in some cases, by people who have, or have had, the condition of interest. The number of potential prognostic factors and the degree to which patients are exposed to these factors can be very large; for example, Dionne et al29 identified over 100 potential predictors of return to work for patients receiving workers’ compensation. Studies that address conditions likely to have many prognostic factors require large sample sizes. This can be a problem when addressing conditions that have a low prevalence.
Using Findings From Prognosis Studies in the Clinical Environment
Locating Research Studies That Address Prognosis and Prognostic Factors
The development of nearly universal access to computer-based literature searches has given clinicians and researchers unprecedented opportunities to obtain and appraise relevant research.39 In recent years, however, the substantial increase in the number of research publications that address prognosis and prognostic factors makes it impractical, in many cases, to review all relevant individual studies.8,23 Busy clinicians may benefit greatly from obtaining systematic reviews of prognosis studies, which have already located, appraised, and summarized the relevant research. Systematic reviews assess similar bodies of published research by using a comprehensive search strategy, and they use specific procedures to include or exclude studies and to make judgments regarding the validity and consistency of findings.6,12 These reviews provide valuable information regarding the overall strength of evidence supporting or not supporting the role of various prognostic factors. From these reviews, the most relevant primary studies can be identified and appraised.
Primary studies and systematic reviews can be located from a variety of Web sites including PubMed (www.ncbi.nlm.nih.gov/entrez/query.fcgi). To identify systematic reviews, broad search terms include “systematic review” and “meta-analysis,” while more specific terms to help identify prognosis studies include “cohort,” “follow-up studies,” “clinical course,” “prognosis,” or “predictors.”
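As a small sketch of how these terms might be combined, the helper below joins a condition with the review and prognosis terms listed above using the Boolean operators AND and OR, which PubMed accepts in its search box. The function name and the example condition ("low back pain") are assumptions made for illustration; the sketch only builds a search string and does not contact PubMed.

```python
def build_query(condition, review_terms, prognosis_terms):
    """Build a Boolean search string: condition AND (review) AND (prognosis)."""

    def any_of(terms):
        # Quote each phrase and accept any one of them.
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    return " AND ".join(
        [f'"{condition}"', any_of(review_terms), any_of(prognosis_terms)]
    )


REVIEW_TERMS = ["systematic review", "meta-analysis"]
PROGNOSIS_TERMS = [
    "cohort",
    "follow-up studies",
    "clinical course",
    "prognosis",
    "predictors",
]

# Hypothetical condition of interest.
query = build_query("low back pain", REVIEW_TERMS, PROGNOSIS_TERMS)
print(query)
```

Dropping the review terms from the string is one way to broaden the search to primary studies when no suitable systematic review is found.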
Triaging Studies Addressing Prognosis and Prognostic Factors
The inherent complexity of prognostic research creates an environment in which even well-performed studies are susceptible to bias that can lead to inappropriate application of the results.1,3,19 Considering this, studies of prognosis must be carefully evaluated before their findings are implemented in individual clinical practice. To simplify this process, systematic reviews and primary studies can be “triaged” by asking 2 questions1:
“Can the findings be generalized to my patient?” and
“Are the findings likely to make meaningful predictions of outcome?”
Table 1 provides a list of specific questions that will help readers make this determination. If the answer to either of these 2 questions is “no,” the search should be directed elsewhere. If the answer to both of these questions is “yes,” the systematic review or individual study should be carefully reviewed to determine whether the degree of potential bias in sampling or measurement is large enough to invalidate the results. It is important to note that virtually no study of prognosis will be perfect4,8,20; therefore, readers should ask, “Do the results make sense, and are the flaws minimal enough to accept the results?”1
To What Degree Does the Potential Bias in the Research Design Threaten the Results?
Hayden and Bombardier6 and other authors1,20 have published criteria for assessing the quality of prognostic studies based on the potential for bias (Table 2). These criteria include:
Study participation: The study sample should represent the population of interest in all key characteristics, and all patients should be enrolled at the same point in the course of their illness (inception cohort). The study should control for sampling and referral bias. The inclusion and exclusion criteria should be adequately described.
Study attrition: In prognostic studies, it is critical to determine the outcomes for as many patients as possible. The majority of subjects lost to follow-up in prognostic studies are lost because of disease-related factors20 (ie, their condition has worsened) or because they had an unfavorable response to treatment. Failure to account for these subjects may result in an overestimation of the likelihood of a favorable prognosis.
Prognostic factor measurement: Potential prognostic factors should be clearly defined and reliably measured in a consistent manner. When a prognostic factor is used to classify a patient, evidence of the validity of the cutoff score for that classification should be provided.34
Confounding variables: Important confounding variables, such as variations in treatments between subjects,6 need to be clearly defined and reliably measured. Controlling for potential confounding variables should be addressed in the study design and in the data analysis.
Outcome measurement: The outcome measurements must be meaningful for the patients and obtained consistently for all patients at appropriate times. Follow-up measures should be spread out over a long enough duration.1 To address measurement bias, the person who is obtaining measurements should not be familiar with the presence of the potential prognostic factors.
Analysis: The analysis should be described in detail and include an adequate rationale for the process of model building. Statistical adjustments should be made for important prognostic factors and confounding variables.1,6,20 The data should be presented in enough detail to determine the appropriateness of the findings (ie, so that there is no selective reporting of results).
Implementing the Findings: A Clinical Example
Your patient is a 55-year-old, right-handed, male accountant whose examination findings are consistent with lateral epicondylitis of his right elbow that is associated with symptoms of radial nerve entrapment. His symptoms began gradually 4 months ago after he began to take tennis lessons. Despite giving up tennis 1 month ago, his symptoms have remained.
You administered the DASH, a standardized self-report measure of upper-extremity function and symptoms,37 and calculated that his initial score was 64%, with 100% representing the highest level of disability. Your patient would like to know whether he is likely to have a complete recovery following a 2-month course of physical therapy intervention.
Your intuition is that this patient may still have symptoms and a degree of disability at that time; to assist in your judgment, you decide to check the literature. Your initial PubMed search terms are “lateral epicondylitis” and “systematic review.” This search yields 188 citations. You add the term “physical therapy,” which narrows the search to 39 citations. To refine this search even further you add the term “prognosis,” which yields 9 citations. After triaging the abstracts, you find no systematic review that specifically addresses prognosis; therefore, you go to the primary studies by removing the term “systematic review” from your search.
A recent primary study by Waugh and colleagues10 specifically investigated prognosis for patients with lateral epicondylitis and seems promising. In your triage, you review the inclusion and exclusion criteria, and it appears that this sample has very similar characteristics to your patient. Because a continuous measure of outcome was used (the DASH), the authors reported P values, beta values, and an adjusted R2 from a regression analysis to describe significance and magnitude of the predictors on outcome. Based on this information, you decide to assess the article to determine the degree of potential bias in these findings.
After carefully reading the article, you address each of the criteria.
Study participation: Were all patients enrolled at the same point in their illness? No, there was a wide range in the chronicity of symptoms. Is there likely to be sampling or referral bias? Possibly, the focus of the study is limited to the effect of physical therapy on patients seeking care. Subjects were enrolled from 9 private sports medicine clinics and 2 outpatient hospital departments. Ninety-six percent of the subjects had white-collar jobs. Were the inclusion and exclusion criteria adequately described? Yes.
Study attrition: Was complete follow-up achieved? Almost all subjects (96.5%) provided complete follow-up data.
Prognostic factor measurement: Are potential prognostic factors clearly defined and reliably measured in a consistent manner? Generally, yes. The authors used standardized questionnaires and reliable measures for most variables. The authors acknowledged that data regarding reliability for some of the clinical tests used in baseline assessment are not available.
Confounding variables: Were confounding variables clearly defined and addressed? Yes and no. Subjects received similar, but not identical, intervention. In all cases, the treatment time was equal (8 weeks).
Outcome measurement: Were meaningful outcome measures used? Yes, the DASH is a well-described instrument that yields reliable measurements. Was there likely to be measurement bias? No, the main outcome measure, the DASH, is a questionnaire and was self-administered.
Analysis: Was the analysis described in detail, and did it include an adequate rationale for the process of model building? Yes. Was there likely to be susceptibility or assembly bias? Possibly, the final model explained 61% of the variation in outcome. This means that 39% of the variation is likely to be explained by other factors that were not measured.
To summarize, you could argue that the potential exists for sampling bias, measurement error in classifying some baseline variables, and the confounding effect of variations in treatment. Considering these limitations, however, it would appear that this study provides substantial and meaningful information. To apply the findings of this study to your patient, consider the following: the mean improvement in the DASH score for all patients in this study was 13.6 points (95% confidence interval=10.6–16.3). If these data are generalized to your patient, it is likely that, following 8 weeks of similar intervention, he would have a final DASH score of between 32 and 46 points. However, the unstandardized beta coefficients indicate that a strong predictor of the final DASH score is the presence of nerve symptoms (β=7.32). This indicates that, if the baseline scores were identical between male patients, an individual who reports nerve symptoms will be likely to have a final DASH score that is ∼7 points higher than a patient who does not have nerve symptoms. Thus, your patient is likely to have a final DASH score of between 39 and 53 points, suggesting an unfavorable prognosis to be completely recovered following 8 weeks of care.
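The last step of this reasoning can be sketched as simple arithmetic. The sketch takes the expected range of 32 to 46 points stated above as given and shifts it by the reported beta coefficient for nerve symptoms; the function and variable names are assumptions made for illustration.

```python
BETA_NERVE_SYMPTOMS = 7.32          # unstandardized beta reported above
RANGE_WITHOUT_SYMPTOMS = (32, 46)   # expected final DASH range stated above


def shift_range(score_range, beta, factor_present):
    """Shift an expected score range by beta when the factor is present."""
    offset = beta if factor_present else 0.0
    return tuple(round(score + offset) for score in score_range)


range_with_symptoms = shift_range(
    RANGE_WITHOUT_SYMPTOMS, BETA_NERVE_SYMPTOMS, factor_present=True
)
print(f"expected final DASH score: {range_with_symptoms[0]} to {range_with_symptoms[1]}")
```

Because higher DASH scores indicate greater disability, the upward shift moves the expected range farther from complete recovery.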
Summary
Prognostic judgments are of central importance to patient management at all levels. Prognosis is a prediction, and it cannot be directly measured or confirmed with other tests, only inferred from previous, longitudinally obtained measures. Thus, prognostic statements inherently have much more uncertainty than do most diagnostic judgments. The appropriate use of research findings can help reduce the uncertainty of prognostic judgments greatly; however, determining an accurate prognosis, as with all clinical decisions, requires interplay among the best available research, clinical experience, intuition, and the unique characteristics of the individual patient.
Footnotes
- Both authors provided concept/idea/project design and writing. The authors thank Stacy Fritz, PT, PhD, and Claire Coyne for their kind assistance in the preparation of the manuscript.
- Received September 21, 2006.
- Accepted June 12, 2007.
- Physical Therapy