Abstract
Note from PTJ's Editor in Chief: Both investigators and readers get frustrated reading research on low back pain because of different definitions of “chronic” and different outcome measures. Lack of consensus on study methods makes it difficult to determine if contradictory findings are based on different methods or different interventions; lack of consensus also prevents synthesis across studies. Dr. Partap Khalsa, Deputy Director, National Center for Complementary and Integrative Health, announced the release of Research Standards for Chronic Low Back Pain, and the hope is that future investigations will adopt them and reduce variability in research reporting. The task force on research standards was an international, multidisciplinary team including Anthony Delitto, PT, PhD, FAPTA. Its findings have been published in leading pain journals. PTJ is among the first professional journals to share the report with its readers.
Despite rapidly increasing intervention, functional disability due to chronic low back pain (cLBP) has increased in recent decades. We often cannot identify mechanisms to explain the major negative impact cLBP has on patients' lives. Such cLBP is often termed non-specific and may be due to multiple biologic and behavioral etiologies. Researchers use varied inclusion criteria, definitions, baseline assessments, and outcome measures, which impede comparisons and consensus. Therefore, the NIH Pain Consortium charged a Research Task Force (RTF) to draft standards for research on cLBP. The resulting multidisciplinary panel recommended using 2 questions to define cLBP; classifying cLBP by its impact (defined by pain intensity, pain interference, and physical function); using a minimum dataset to describe research participants (drawing heavily on the PROMIS methodology); and reporting “responder analyses” in addition to mean outcome scores. It also offered suggestions for future research and dissemination. The Pain Consortium has approved the recommendations, which investigators should incorporate into NIH grant proposals. The RTF believes that these recommendations will advance the field, help to resolve controversies, and facilitate future research addressing the genomic, neurologic, and other mechanistic substrates of chronic low back pain. We expect that the RTF recommendations will become a dynamic document and undergo continual improvement. Perspective: A task force was convened by the NIH Pain Consortium with the goal of developing research standards for chronic low back pain. The results included recommendations for definitions, a minimum dataset, reporting outcomes, and future research. Greater consistency in reporting should facilitate comparisons among studies and the development of phenotypes.
- Low back pain
- chronic low back pain
- research standards
- minimum dataset
- NIH Task Force
The Institute of Medicine recently estimated that chronic pain affects approximately 100 million adults in the United States, with an estimated annual cost of $635 billion, including direct medical expenditures and loss of work productivity.3 Activity-limiting low back pain (LBP), in particular, has a worldwide lifetime prevalence of approximately 39% and a similar annual prevalence of 38%.61 The majority of people who have LBP experience recurrent episodes.62 The use of all interventions for treating chronic LBP (cLBP) increased from 1995 to 2010, including surgical, pharmacologic, and nonpharmacologic approaches. Despite increased utilization, however, the prevalence of symptoms and expenditures has increased.37,70,91
There is growing evidence that cLBP, like other chronic pain conditions, can progress beyond a symptomatic state to a complex condition109 involving persistent anatomic and functional changes in the central nervous system,9,93,100 in addition to structural changes in the back (eg, degenerative spinal changes, atrophy, or asymmetry of paraspinal muscles).10,11,14 Although some patients with cLBP may have clear pathoanatomic causes of pain, for many there is no clear association between pain and identifiable pathology of the spine or its associated soft tissues.26
Many patients who undergo procedures intended to correct the putative causative pathoanatomy continue to have pain. Furthermore, we often cannot identify mechanisms to account for the substantial negative impact cLBP has on the lives of many patients.20 Such cLBP is often termed nonspecific, idiopathic, mechanical, or due to instability, and may in fact be due to the contributions of different and multiple biologic and behavioral etiologies in different individuals.87
Many classes of interventions have been developed and tested in adults with cLBP, including spine surgery, injections, medications, psychological interventions, manual therapies, exercise, nutritional supplements, and lifestyle change and self-management approaches.17–20 Many of these have shown some clinical benefit, but few appear to consistently provide substantial, long-term reductions in pain with increased function.25,27–29
A critical issue for advancing research on cLBP is comparing results from the many classes of interventions. In 2009 and 2010, the National Institutes of Health (NIH) Pain Consortium convened 2 workshops on LBP research, inviting experts from the relevant scientific and clinical fields to provide research recommendations to NIH. These experts noted that clinical studies have used variable inclusion and exclusion criteria, case definitions for LBP chronicity or recurrence, baseline assessments, stratification criteria, and outcome measures. As a result, it is difficult to compare epidemiologic data and studies of similar or competing interventions, replicate findings, pool data from multiple studies, resolve conflicting conclusions, develop multidisciplinary consensus, or even achieve consensus within a discipline regarding interpretation of findings. Key recommendations from the workshops on how to advance cLBP research were to establish research standards for cLBP and to have the NIH facilitate this process.
In response, the NIH Pain Consortium established a Steering Committee for a Research Task Force (RTF) on Research Standards for cLBP. The Steering Committee was composed of representatives from the following NIH institutes/centers: National Center for Complementary and Alternative Medicine (NCCAM), National Institute on Aging, National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), National Institute of Child Health and Human Development, National Institute on Drug Abuse, National Institute of Dental and Craniofacial Research, National Institute of Neurological Disorders and Stroke (NINDS), and National Institute of Nursing Research. The Steering Committee developed goals for the RTF, identified the needed scientific and clinical expertise, selected 2 co-chairs, and invited 14 additional experts from outside NIH to join the RTF. The Steering Committee provided 2 representatives (Drs. Panagis and Khalsa) in ex-officio (ie, nonvoting) capacity to the RTF.
The NIH Pain Consortium charged the RTF with developing a set of standards for clinical research on cLBP that would address the following:
- Consider the state of existing research relevant to the development of standards
- Conduct a comprehensive review of existing case definitions, diagnostic criteria, and outcome measures that are relevant
- Develop a draft set of standards
- Engage the broader research community and representatives from relevant government agencies in developing these standards
- Chart a general plan for their incorporation into research studies and their future modification
This charge focused solely on developing standards for research and not for use in coding, billing, or general use in clinical settings.
Methods
Creating the RTF
The Steering Committee selected 2 co-chairs with complementary leadership expertise. Dr. Deyo was chosen for his expertise in LBP research and Dr. Dworkin for his prior leadership in developing research diagnostic criteria for temporomandibular disorders (TMD), another set of chronic pain conditions. The co-chairs, in consultation with the Steering Committee, selected the RTF members for their needed scientific and clinical expertise (Table 1).
Table 1. Task Force Members, Affiliations, and Expertise
Work Plan
The RTF evolved a 3-stage work plan, with each stage involving a 2-day meeting.
Stage 1. The first meeting opened with remarks by the NIAMS and NCCAM directors, Stephen Katz, MD, PhD, and Josephine Briggs, MD, respectively. The directors emphasized the nature of chronic back pain as a highly prevalent and costly public health challenge. They noted the existence of many stakeholders, including individuals with back pain, health care systems, clinicians, drug and device makers, regulatory agencies, and federal, state, and third-party payers. They emphasized the research—as opposed to clinical or administrative—focus of the task force.
Initial efforts of the RTF were directed at defining subsequent activities and products. At the initial and subsequent meetings, a consensus evolved on several important issues and strategies (Table 2).
Table 2. Key Principles Developed by the Task Force on Research Standards for Chronic Low Back Pain
The RTF noted that although the intended users of the proposed research standards would be investigators submitting grant applications to NIH, the standards would be available to and encouraged for all researchers. The research standards could potentially allow cLBP phenotypes to be uncovered based on physical and psychosocial findings.
The RTF decided that it could not respond in detail to every component of the NIH Pain Consortium's charge. For example, producing explicit evidence-based diagnostic criteria for conditions such as spinal stenosis, sciatica, or spine “instability” would be impossible given the available time and resources and the current lack of professional consensus. However, stratifying cLBP by its impact might have equally important descriptive and prognostic value and could supplement any pathophysiologic description.
Stage 2. The co-chairs conducted a series of e-mail surveys of RTF members. The surveys were based on item lists generated at the first RTF meeting and addressed key issues from the meeting. The following surveys and literature review efforts were conducted:
Survey of candidate objective findings and medical history for a minimum dataset: Members ranked the importance of potential baseline descriptors for patients with cLBP. These included items of medical history, comorbidity, physical examination, and laboratory and imaging tests.
Survey of candidate self-report measures of behavior, mood, and symptoms: Task force members were asked to rank the importance of measures of pain-related behavioral, emotional, and psychosocial domains influencing the expression of cLBP.
Survey on feasibility of developing research diagnostic criteria for subsets of nonspecific chronic low back pain: Part of the charge from the Pain Consortium was to consider developing a research diagnostic classification system based on pathophysiologic or etiologic features (ie, criteria for subsets of nonspecific cLBP). This survey asked task force members to assess the feasibility of such an effort.
Review of existing literature on back pain classification and prognosis: The task force did not undertake a systematic literature review but considered previous work on back pain taxonomy,4,6,15,24,33,34,44,52,56,74,79,83,101,104,105,117 prognostic classification,13,19,31,38,39,43,48,49,53,54,57–60,66–68,73,76,80,82,85,86,97,106,108,110,112–116,122 pain and psychosocial measures,12,31,42,45,64,65,69,71,75,77,78,81,90,94,96,103,107,115,118–121,125 and outcome assessment.5,8,21,23,32,36,40,41,50,51,55,88,89,95 These sources informed the deliberations and recommendations.
At the second RTF meeting, the most highly ranked candidate items for the minimum dataset based on survey responses were accepted with minimal disagreement or need for further discussion. Special attention was directed to the possible use of the Patient Reported Outcomes Measurement Information System (PROMIS) measures.5,21,51,55,89,95 Progress was made toward defining cLBP and its impact. There was general agreement that developing pathophysiologic diagnostic criteria for subsets of nonspecific low back pain was unfeasible at present.
The RTF also heard presentations of 2 related NIH efforts. The first was the NINDS effort to create “common data elements” for use by all NINDS-supported researchers. The second related to the NIH PROMIS effort, which includes several psychometrically sound patient-reported outcomes measures directly relevant to the task force.
Stage 3. At the third meeting, the RTF agreed on a series of recommendations to be forwarded to the NIH Pain Consortium. These included a definition of cLBP and specific measures to stratify its impact. It also reached agreement on recommending specific domains and items to be integrated into a minimum dataset for research on cLBP. There followed a discussion of outcome measures and future research needs regarding the task force recommendations.
The task force also suggested strategies for obtaining feedback and support for its recommendations. These included consultation with the NIH Pain Consortium and relevant NIH institutes, other government agencies, and relevant journal editors. It would also include presentations at meetings of research and professional organizations.
Task Force Recommendations
The principles articulated in Table 2 led the task force to several specific recommendations that are summarized in Table 3. The rationales for these recommendations are discussed next. The first three recommendations refer to the questionnaire instrument in Fig 1.
Table 3. Task Force Recommendations: Research Standards for Chronic Low Back Pain (cLBP)
Fig 1. Recommended minimum dataset.
Recommendation 1. Describe the Chronicity of Low Back Pain
The RTF recommended that “chronic low back pain” (cLBP) be defined as a back pain problem that has persisted at least 3 months and has resulted in pain on at least half the days in the past 6 months. A human figure drawing would illustrate the region defined as the low back, indicating the space between the lower posterior margin of the rib cage and the horizontal gluteal fold (Fig 1).
The RTF considered definitions based on time with pain, days with pain, severity of pain, and varying durations of pain. Minimal durations of 3 months or 6 months were considered, as was the problem of intermittent symptoms.
The RTF concluded that 2 questions should define chronicity (Questions 1 and 2 in Fig 1): (1) “How long has back pain been an ongoing problem for you?” and (2) “How often has low back pain been an ongoing problem for you over the past 6 months?” A response of “greater than 3 months” to question 1 and a response of “at least half the days in the past 6 months” to question 2 would define cLBP. A patient with pain on at least half the days in the past 6 months would have accumulated at least 3 months' worth of pain days, and the Task Force concluded that this would be the recommended definition. It was decided that pain severity would not be included in the definition of cLBP.
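The two-question rule can be written as a simple classification. The following is a minimal sketch only: the response codes and the function name `is_chronic_lbp` are hypothetical and do not reproduce the response options printed in Fig 1.

```python
# Hypothetical response codes; the actual response options appear in Fig 1.
DURATION_OVER_3_MONTHS = "more_than_3_months"
FREQUENCY_AT_LEAST_HALF = "at_least_half_the_days"


def is_chronic_lbp(duration_response: str, frequency_response: str) -> bool:
    """Return True only when both chronicity criteria are met.

    duration_response: coded answer to question 1 (how long back pain
        has been an ongoing problem).
    frequency_response: coded answer to question 2 (how often low back
        pain has been a problem over the past 6 months).
    Pain severity is deliberately not part of the rule.
    """
    return (
        duration_response == DURATION_OVER_3_MONTHS
        and frequency_response == FREQUENCY_AT_LEAST_HALF
    )
```

A study using finer-grained response options would need to map each option onto these two criteria before applying the rule.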
Recommendation 2. Stratify Chronic Low Back Pain by Impact
The RTF overwhelmingly agreed that neither adequate data nor resources were available to offer a new pathoanatomic or pathophysiologic subclassification of cLBP that was clearly superior to those currently available. Rather, the RTF recommended stratification of cLBP by the personal impact of low back pain. “Impact” was defined as a combination of pain intensity, pain interference with normal activities, and functional status, using 9 items of the 29-item PROMIS short form (marked with asterisks in Fig 1). These items have substantial research support to validate their discriminatory and prognostic importance.13,19,31,38,39,43,47–49,53,54,57–60,66–68,73,76,80,82,85,86,97,106,108,110,112–116,122
This stratification of cLBP by impact would be appropriate whether or not there appears to be contributory degenerative pathoanatomy. Even when pathoanatomic conditions are thought to contribute to symptoms and dysfunction, they often coexist and overlap, and sometimes fail to respond to specific interventions. Thus, the stratification of impact seems to be a useful addition to but not a substitute for pathoanatomic, physiologic, or symptomatic classification.
After considerable discussion about formal prognostic scales for stratification, such as the Subgroups for Targeted Treatment (STarT) Back instrument,60 the RTF decided that there remained substantial uncertainty about generalizability to subspecialty patients and older adults. Thus, the RTF recommended further research in this area and included several items from the STarT Back instrument in the minimum dataset, but chose not to require them for stratification purposes.
The recommended RTF Impact Stratification approach uses the raw PROMIS scores with the usual scoring of the Physical Function items reversed. Thus, for each item in the Impact Stratification, a score of 1 is least severe and 5 is most severe. The exception is the single item on pain intensity, which ranges from 0 (least severe) to 10 (most severe). The Impact Stratification score, summed over the 9 PROMIS-based items, therefore ranges from 8 (least impact) to 50 (greatest impact). Items in Fig 1 with an asterisk comprise the Impact Stratification score.
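The arithmetic behind the 8-to-50 range can be illustrated with a short sketch. It assumes that the 9 items are 1 pain intensity item, 4 pain interference items, and 4 physical function items (a split inferred from the stated range and the 4-items-per-domain short form), and that reversing a 1-to-5 item means subtracting it from 6. The function names are hypothetical.

```python
def reverse_function_item(raw: int) -> int:
    """Reverse a 1-5 Physical Function item so that 5 is most severe.

    Assumption: the raw item is scored 1-5 with 5 meaning best function.
    """
    if not 1 <= raw <= 5:
        raise ValueError("physical function items must be scored 1-5")
    return 6 - raw


def impact_score(pain_intensity, interference_items, function_items_reversed):
    """Sum the 9 items of the Impact Stratification.

    pain_intensity: single item, 0 (least severe) to 10 (most severe).
    interference_items: 4 items, each 1-5 (5 most severe).
    function_items_reversed: 4 items, each 1-5 after reversal (5 most severe).
    """
    if not 0 <= pain_intensity <= 10:
        raise ValueError("pain intensity must be 0-10")
    if len(interference_items) != 4 or len(function_items_reversed) != 4:
        raise ValueError("expected 4 interference and 4 function items")
    for value in list(interference_items) + list(function_items_reversed):
        if not 1 <= value <= 5:
            raise ValueError("interference and function items must be 1-5")
    return pain_intensity + sum(interference_items) + sum(function_items_reversed)


# Least possible impact: 0 + 4*1 + 4*1 = 8
lowest = impact_score(0, [1, 1, 1, 1], [1, 1, 1, 1])
# Greatest possible impact: 10 + 4*5 + 4*5 = 50
highest = impact_score(10, [5, 5, 5, 5], [5, 5, 5, 5])
```

The minimum is 8 rather than 9 because the pain intensity item starts at 0 while the other 8 items start at 1.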
Because the proposed impact score is a novel combination of 3 constructs (pain intensity, interference, and function), the RTF undertook a preliminary assessment of its validity and performance with the assistance of PROMIS investigators. The validation used existing PROMIS data from a group of patients with LBP, with or without leg pain, who underwent epidural steroid injections. This analysis was covered by an existing institutional review board approval from the University of Washington. Given the intervention, an improvement in average functional scores was expected.
The sample included 218 patients with a mean age of 54 years; 56% were females. There were 41% employed full or part time, 22% retired, and 12% receiving disability compensation, with the remainder being homemakers, students, or unemployed. The racial mix included 87% white, 3.8% African American, 4% American Indian, and 5% Asian or Pacific Islander. There were 46% with a college or more advanced degree, and 5% with less than a high school diploma.
The dataset included legacy measures of back pain–related physical function: the Roland and Morris Disability Questionnaire and the Oswestry Disability Index (collected at baseline only). The RTF Impact Stratification showed strong correlations with legacy measures. Furthermore, score changes on the RTF Impact Stratification correlated more strongly with patient satisfaction at follow-up than did change on the Roland-Morris score (Table 4).
Table 4. Performance of the Research Task Force Impact Stratification Among 218 Subjects Undergoing Epidural Steroid Injections
In this rather severely affected sample, baseline RTF Impact scores were almost equally distributed among mild, moderate, and severe impacts. Although the cutoffs used in Table 4 for mild, moderate, and severe scores were deemed potentially useful by the RTF, they are relatively arbitrary. Simply reporting actual scores is recommended, along with any categorization that investigators may choose.
As expected, scores on the Impact Stratification measure for this sample improved over time. Measures of effect size and standardized response mean for the 170 patients available for 3-month follow-up suggested that the RTF Impact Stratification was more responsive than the Roland-Morris Disability Questionnaire (Table 4).
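The report does not state the formulas used for these responsiveness statistics. The sketch below assumes the commonly used definitions: effect size as mean change divided by the standard deviation of baseline scores, and standardized response mean as mean change divided by the standard deviation of the change scores. The scores shown are made-up illustrative values, not study data.

```python
from statistics import mean, stdev


def change_scores(baseline, followup):
    """Improvement per participant; higher scores mean greater impact,
    so improvement is baseline minus follow-up."""
    return [b - f for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """Mean change divided by the SD of baseline scores (assumed definition)."""
    return mean(change_scores(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of change scores (assumed definition)."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


# Made-up scores for 4 hypothetical participants.
baseline = [30, 35, 40, 25]
followup = [20, 30, 28, 22]
es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
```

Comparing two instruments on the same sample with these statistics is one way to judge which is more responsive to change.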
The task force found the results encouraging but acknowledged that the analyses reported reflect only an initial assessment. As suggested in the recommendations below for future research, further assessment of the reliability, validity, and clinical utility of this stratification strategy is a high priority.
Recommendation 3. Report a Minimum Dataset
A minimum dataset is recommended for describing individuals participating in all research studies on cLBP (Fig 1); the minimum dataset includes items of demographics, medical history, and self-report of symptoms and function.
Medical History, Physical Examination, Diagnostic Testing. In the survey of RTF members regarding items for a minimum dataset, the most highly ranked items of medical history and examination included demographics, involvement in workers' compensation or legal claims, work status, education, various measures of comorbidity, and previous treatment history. For many of these measures, the RTF adopted the format of the Common Data Elements system implemented by the NINDS (http://www.commondataelements.ninds.nih.gov).
The key comorbid conditions were judged to be smoking status, obesity, substance abuse, and widespread pain symptoms. The two-item conjoint screen (TICS) was judged to be an adequate and suitably brief screen for substance abuse.18 The key items of treatment history were thought to be history of surgical interventions and use of opioid analgesics.
Measures from the physical examination ranked lower than items of medical history. However, the most highly ranked of these were straight leg raising for patients with leg pain; hip internal rotation as a screen for hip arthritis (a potential cause of LBP); and lower extremity strength. There was general agreement that such physical examination items could be reserved for studies of invasive interventions (straight leg raising and lower extremity strength) or of older adults (hip examination). Thus, for example, physical examination measures would not be required of all epidemiologic studies.
No laboratory or imaging tests were highly ranked, because of the widely recognized weak association between degenerative spine changes on imaging and patient symptoms or function.26 However, magnetic resonance imaging (MRI) was considered the most valuable of potential tests, and there was agreement that it should be required in studies of surgical interventions.
Self-report of Functional Status, Psychosocial Factors, and Mood Disturbance. With regard to other self-report measures, there was discussion first about the domains to be included, then the potential sources of items, and then the desirable number of items. The key domains were judged to be physical function, depression, sleep disturbance, and catastrophizing. The task force felt that these constructs were important for a wide range of patients with chronic back pain, with or without specific pathoanatomic diagnoses. For parsimony, other important constructs such as anxiety, fatigue, and satisfaction with social role were considered but not included in the minimum dataset.
Although the minimum dataset in Fig 1 is recommended for inclusion in all NIH-funded research on cLBP and is available for use by all researchers, the RTF did not in any way intend to constrain the scope of investigators' proposed scientific inquiries. On the contrary, the RTF believes that the minimum dataset represents a major advance toward standardization of research reporting by asking researchers to include, at a minimum, a set of items that evidence supports as critical to scientifically advancing our understanding of cLBP.
After considering several potential instruments for assessing these domains, the RTF concluded that the short-form PROMIS measures1 offered the best trade-off of length with psychometric validity for a minimum dataset. Therefore, it recommended use of the relevant scales from the 29-item PROMIS short-form, which includes 4 items for each domain. Investigators and patient samples with access to computer adaptive testing could use the entire PROMIS item bank to measure the domains included on the PROMIS 29 Profile version 1.0, an acceptable or even preferable alternative.22
There was agreement that it would be acceptable if investigators preferred well-validated, lengthier legacy measures of these domains. For example, if investigators wanted more extensive legacy measures of physical function, they might substitute the Oswestry or Roland-Morris disability scale for the PROMIS physical function items. If they wanted legacy measures of depression, they might substitute the Patient Health Questionnaire (PHQ-9)76 or Beck Depression Inventory.12 In Fig 1, we have labeled the PROMIS constructs to facilitate such substitution if desired, though investigators may wish to remove the labels when using the dataset. If such substitutions are made, all the other recommended domains should still be assessed.
Investigators may find it useful to consult the website PROsetta Stone, supported by NCI-funded investigators at Northwestern University (www.prosettastone.org).2 This website provides a “cross-walk” between scores on the PROMIS measures and scores on several “legacy” measures, such as the Brief Pain Inventory,31 the Center for Epidemiologic Studies Depression Scale (CES-D),90 the PHQ-9,77 and the Short Form-36.120 The resulting proposed minimum dataset is presented in Fig 1. PROMIS items are identified with a superscript 1, and STarT Back items (or very similar items) are identified with a superscript 2.
The RTF was able to obtain institutional review board approval at Stanford University (RTF member Sean Mackey, principal investigator) to conduct an internet survey of back pain patients using the RTF recommended version of the Minimum Dataset. This cross-sectional sample was distinct from the patients described above for validity testing, who underwent intervention and follow-up. There were 221 participants recruited from the San Francisco Bay Area using high-visibility ads. Participants had a mean age of 46.2 years (range, 19–81), with 53% female subjects. Participants included 72% whites, 17% Asians, 7% African Americans, and 3.8% each of American Indians and Pacific Islanders. There were 52% with at least a bachelor's degree and only a single participant with no high school diploma. Thirty-nine percent were employed, 5% were retired, and 16% described themselves as disabled. Thirty-eight percent described leg pain in addition to back pain, and the mean pain intensity (on a 0–10 scale) was 5.5. In this sample, the median time-to-completion was 7 minutes, and 75% of subjects completed the questionnaire in less than 10 minutes.
Proposed Supplemental Data for Specific Situations. For studies of invasive therapies such as spine surgery, the RTF recommended that physical examination and imaging data be added to the minimum dataset. Straight leg raising, lower extremity reflexes, and lower extremity strength as indicators of radiculopathy were recommended as a minimum physical examination. Lumbar MRI was recommended in such studies as the minimal imaging evaluation.
In older adults, there is increased likelihood of hip osteoarthritis contributing to low back pain. Thus, for studies of adults mainly over age 65, the task force recommended testing internal hip rotation to help screen for potential osteoarthritis. A screen for cognitive function also may be important in such studies, as dementia may impair the validity of assessments or of consent for research.
In studies focused on behavioral or mood correlates of cLBP, the RTF recommended that investigators be free to incorporate additional measures. These might include, for example, assessment of emotional status, physical function and pain behaviors, substance abuse, interpersonal violence, or quality of life relevant to specific study interests. Such measures should have published reliability, validity, and responsiveness data at least equal to those of the minimum dataset's PROMIS short-form items. These additional measures should have population-based normative data to be included when relevant. The IMMPACT statement can be recommended as a starting point for selection of desired supplemental measures.40
Recommendation 4. Outcome Measures
Investigators are referred to earlier consensus documents on outcome measures.16,35,40 However, the RTF recommends reporting a “responder” analysis in addition to mean scores of outcome measures.
The RTF recognized that many parts of the baseline minimum dataset, such as the PROMIS measures, were highly appropriate as outcome measures, remembering that the initial focus of the NIH PROMIS effort was on patient-reported outcomes. It was also recognized that the primary outcomes of clinical studies would vary, depending on study aims. For example, some might focus on pain relief, whereas others might focus on return to work, physical function, mood, or need for subsequent therapy. Thus, the RTF did not make a recommendation regarding a minimum outcome dataset beyond recommending consideration of the minimum dataset for standardized recording of both baseline assessment and outcomes evaluation. Investigators are referred to earlier consensus statements on outcome measures for studying chronic pain in general or back pain in particular.16,35,40
Reporting of Outcomes. An important discussion centered on reporting of outcomes. There was general agreement that for (at least theoretically) continuous measures, such as pain or function, in addition to mean scores and score changes, the proportion of participants achieving certain thresholds also should be reported. For example, the proportion of participants achieving a prespecified minimum clinically important change might be reported. Investigators have proposed minimally important differences in PROMIS short forms, at least in the context of cancer therapy.123 Calculating the percentage of study participants who achieve such landmarks is referred to by the U.S. Food and Drug Administration (FDA) as a “responder” analysis.84
For example, other expert panels have suggested that a 30% improvement in pain or function might be a clinically important difference, and recommended reporting the proportion of participants with this degree of improvement.46 Statistical analysts have suggested potential problems with the use of percentage changes,111 but the approach has clinical appeal. One might alternatively specify a certain number of points as the relevant change, or the percentage of participants reaching some threshold pain level (eg, pain score below 3 out of 10).
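A responder analysis of this kind reduces to counting participants whose improvement meets a prespecified threshold. The sketch below assumes a scale on which lower scores are better and uses the 30% figure from the text as the default; the function name and the handling of zero baselines are illustrative choices, not part of the recommendations.

```python
def responder_proportion(baseline, followup, threshold=0.30):
    """Proportion of participants improving by at least `threshold`
    (as a fraction of their baseline score) on a lower-is-better scale.

    Participants with a baseline of 0 are skipped because a percentage
    change from 0 is undefined.
    """
    responders = 0
    evaluable = 0
    for b, f in zip(baseline, followup):
        if b <= 0:
            continue
        evaluable += 1
        if (b - f) / b >= threshold:
            responders += 1
    if evaluable == 0:
        raise ValueError("no participants with a positive baseline score")
    return responders / evaluable


# Made-up 0-10 pain scores for 4 hypothetical participants.
# Improvements are 50%, about 17%, 0%, and 40%, so 2 of 4 are responders.
proportion = responder_proportion([8, 6, 5, 10], [4, 5, 5, 6])
```

The same function can report an absolute-threshold analysis by replacing the percentage test with a comparison of follow-up scores against a fixed cutoff.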
An attractive option to the RTF was reporting the “cumulative distribution function” of responses for the treatment and control groups. This is a continuous plot of the proportion of patients at each scale score who experience change at that level or better. This amounts to calculating the percentage of responders at each value of the outcome score. This approach acknowledges the lack of consensus on the approach for establishing a responder threshold and provides information for any given threshold.84
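The cumulative distribution approach can be sketched as computing, for each candidate threshold, the proportion of participants whose change is at least that large. This is an illustrative sketch with made-up change scores; the function name is hypothetical, and a real analysis would compute one curve per study arm and plot them together.

```python
def cumulative_responder_curve(changes, thresholds):
    """For each threshold, return the proportion of participants whose
    improvement is at least that large.

    changes: improvement per participant (positive means better).
    thresholds: candidate responder thresholds on the same scale.
    Returns a list of (threshold, proportion) pairs.
    """
    n = len(changes)
    if n == 0:
        raise ValueError("no change scores supplied")
    return [(t, sum(c >= t for c in changes) / n) for t in thresholds]


# Made-up improvement scores for 5 hypothetical participants in one arm.
curve = cumulative_responder_curve([0, 1, 2, 3, 4], [0, 2, 4])
```

Reading the curve at any single threshold gives the responder proportion for that threshold, which is why the approach does not depend on agreeing on one cutoff in advance.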
Composite Outcome Measures. The RTF also discussed the potential for use of composite outcome measures. One member noted that it is common in studies of osteoarthritis to require improvement in pain score and functional status and global self-assessment before judging treatment successful. Similar combinations have been proposed for evaluating back pain.17,102
Composite measures are often required in FDA trials for drug or device approval. For example, “success” in trials of artificial disc replacement required functional improvement of 15 points on the Oswestry scale, improvement in quality of life on the Short Form-36, proper radiographic placement, and absence of new neurologic deficits or revision surgery.124 Such composites offer the potential advantage of defining success in terms that are clearly clinically important, and not merely statistically significant.
However, the RTF concluded that with the paucity of data on performance of such composite measures for low back pain, it could not make a recommendation about composite outcome measures. Instead, this was recommended as an important topic for future research.
Time Frames for Outcome Measures. The RTF chose not to make specific recommendations for timing of outcome assessments because appropriate timing varies with the intervention. For some treatments (eg, analgesics or spinal manipulation), the goal may be short-term relief. For others, such as surgery, the goal more often is long-term relief. For studying patients with chronic pain, longer-term follow-up (eg, at least 6–12 months) is generally preferred.
Adverse Events. Reporting of adverse events was recognized as an important outcome measure. Because the likely adverse events vary enormously with the nature of an intervention, the RTF did not make recommendations for reporting specific adverse events. There was general agreement that for most intervention studies, it would be desirable to specify certain adverse events in advance and measure them prospectively, along with open-ended reporting of unanticipated events.
Recommendation 5. Research on the Proposed Standards
The RTF recommended new research to improve prognostic stratification of patients with cLBP; refine and test composite outcome measures for increasing the clinical importance of study results; undertake patient stakeholder assessment of relevant outcomes; and further evaluate psychometric properties of the minimum dataset.
Because the measures in the minimum dataset will often not comprise the sole measures used in a study, their widespread use will not only provide researchers a standardized set of data but also provide accumulating evidence for (or against) the reliability, validity, and clinical utility of the RTF recommendations. The potential for such an iterative approach to reevaluate scientific measures of chronic pain was successfully modeled in developing research diagnostic criteria for TMD. An iterative scientific process has successfully evolved the next generation of evidence-based measures for diagnosing and classifying the most common subtypes of TMD, including physical, behavioral, and psychosocial domains.99
Beyond viewing the present set of recommendations as appropriate topics for future research, the RTF identified several related knowledge gaps that limit our ability to define and classify critical domains and variables. These were seen as important topics for which further research should be encouraged.
Prognosis. Improving prognostic stratification of patients with cLBP is important clinically to help guide the nature and intensity of therapy, and important for researchers to adjust for confounding and to improve comparability among studies. Recent work such as the STarT Back project from the United Kingdom has made important advances in this regard,57–60 and others have systematically reviewed risk factors for the emergence of chronic back pain.30 However, the generalizability of such studies to interventions and populations outside primary care remains uncertain. Other approaches may be important for specific populations or for predicting specific treatment outcomes. Additional work in this area might improve the ability to characterize clinically important subgroups of patients with cLBP and improve our “impact stratification.”
Composite Outcome Measures. An ongoing frustration has been the seeming lack of progress in reducing back-related disability at a population level. In part, this may be a result of claiming treatment efficacy based on statistically significant but clinically trivial results. More work is needed to understand how certain outcome scores are associated with major events, such as return to work. Composite outcome measures, such as requiring simultaneous improvement in pain, function, and global self-assessment, may move us closer to important outcomes. However, more data are needed to determine the performance of such measures in terms of validity, reliability, responsiveness, and prognostic value.
Patient Stakeholder Assessment. Little work has addressed the outcomes judged most important by patients with cLBP. Such outcomes may vary with demographic features and diagnosis.
Psychometric Properties of the Proposed Minimum Dataset. Extensive effort has been made to validate the PROMIS measures,5,7,21,51,55,72,89,92,95 but there is only modest information on their performance specifically in the context of cLBP. One recent study suggested excellent performance of the PROMIS physical function item bank among patients with back and neck problems.63 Further data on the precision of the domains are important (eg, the optimal number of items), as are data on responsiveness to change and sensitivity to small differences. Creating a “cross-walk” of scores with legacy measures, such as the Oswestry and Roland-Morris disability questionnaires, is also important.
Recommendation 6. Dissemination of the Report of the NIH Task Force on Research Standards for Chronic Low Back Pain
With adoption of recommendations by the NIH Pain Consortium, the RTF recommends dissemination to the broad research community, including publication of a report in multiple professional journals and presentations at professional meetings.
The NIH Pain Consortium has accepted the RTF report (to view the full NIH-approved RTF report on Standards for Research on Chronic Low Back Pain, see painconsortium.nih.gov). The consortium is recommending that all NIH institutes and centers require grant applications proposing clinical studies of cLBP to use the research standards set forth in the RTF report. Similarly, NIH encourages all other agencies that fund research on cLBP to consider incorporating these research standards for their respective awardees or investigators, as appropriate. The RTF proposed to disseminate these recommendations in professional journals and presentations at scientific meetings.
Discussion
Consistent with its charge from NIH, the RTF strove to recommend standards for conducting research into the complex, intertwined factors that influence the onset, natural history, and clinical course of cLBP. cLBP remains one of the most important and costly of all public health conditions affecting the U.S. population. As adopted by NIH, these recommendations have the potential to standardize methods for identifying cLBP research cases, describing research subjects, and comparing published reports.
The new research standards should improve the comparability of research studies on cLBP, facilitate pooling data from multiple studies (eg, for meta-analyses), and improve the ability to define phenotypes among patients with low back pain. These standards will allow comparable core summary statistics to be included in all published reports without interfering with collection of specific measures needed to address specific research questions.
After extended review and discussion, the RTF concluded that, given the current state of scientific evidence on cLBP, it was not realistic to create operationally defined research diagnostic criteria for subsets of cLBP. While creation of research diagnostic criteria has proven beneficial to research for some other conditions (eg, TMD99 and Alzheimer's disease98), the multifactorial nature of most cases of cLBP decreased enthusiasm for attempting to do so in this condition. However, creation of an impact stratification and a uniform minimum dataset will achieve many of the same goals.
In summary, the RTF has recommended a definition of cLBP and has proposed classifying it in terms of its impact, in addition to any presumed pathoanatomic diagnosis. Impact is conceived as a combination of pain intensity, interference with activities, and physical function. The RTF also has recommended a uniform minimum dataset, with recommendations for medical history, physical examination, diagnostic tests, and self-report measures of physical function, depression, and sleep disturbance, in addition to pain intensity and interference. Finally, recommendations have been made for reporting patient outcomes, further research, and dissemination of the recommendations.
Any effort to standardize research methods is only a starting point for further testing and refinement. The final recommendations were seen as a first step toward creating standards for research in cLBP. We anticipate that further validation, refinement, and possible extension of these recommendations will require years and the efforts of many investigators. Nonetheless, the RTF believes these recommendations can advance the field, help to resolve controversies, and facilitate future research addressing the prevalence and incidence of cLBP as well as its genomic, neurologic, and other mechanistic substrates. Furthermore, they can help reveal the biologic-behavioral interfaces that confound our present-day understanding of cLBP and its evidence-based management.
It is anticipated that the RTF recommendations will become a dynamic document, and that the proposals are likely to undergo continual improvement. The proposed research agenda should facilitate this evolution.
- © 2014 by the American Pain Society. Reprinted from: Deyo RA, Dworkin SF, Amtmann D, et al. Report of the NIH task force on research standards for chronic low back pain. J Pain. 2014;15(6):569–585, with permission from Elsevier Inc/American Pain Society.