Measurement Properties of the Quebec Back Pain Disability Scale in Patients With Nonspecific Low Back Pain: Systematic Review
- Caroline M. Speksnijder,
- Tjarco Koppenaal,
- J. André Knottnerus,
- Mark Spigt,
- J. Bart Staal and
- Caroline B. Terwee
- C.M. Speksnijder, PT, PhD, Physical Therapy Science, Program in Clinical Health Sciences, University Medical Center Utrecht, Intern Post G05.122, PO Box 85.500, 3508 GA Utrecht, the Netherlands; Department of Oral-Maxillofacial Surgery, Prosthodontics and Special Dental Care, University Medical Center Utrecht, Utrecht, the Netherlands; and Radboud University Medical Center, IQ Healthcare, Radboud Institute for Health Sciences, Nijmegen, the Netherlands.
- T. Koppenaal, PT, MSc, Department of Allied Health Professions, Fontys University of Applied Sciences, Eindhoven, the Netherlands.
- J.A. Knottnerus, PhD, Department of General Practice, Care and Public Health Institute, Maastricht University, Maastricht, the Netherlands.
- M. Spigt, PT, PhD, Department of General Practice, Care and Public Health Institute, Maastricht University.
- J.B. Staal, PT, PhD, Research Group of Musculoskeletal Rehabilitation, HAN University of Applied Sciences, and Radboud University Medical Center, IQ Healthcare, Radboud Institute for Health Sciences.
- C.B. Terwee, PhD, Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands.
- Address all correspondence to Dr Speksnijder at: C.M.Speksnijder@umcutrecht.nl.
Abstract
Background The Quebec Back Pain Disability Scale (QBPDS) has been translated into different languages, and several studies on its measurement properties have been done.
Purpose The purpose of this review was to critically appraise and compare the measurement properties, when possible, of all language versions of the QBPDS by systematically reviewing the methodological quality and results of the available studies.
Method Bibliographic databases (PubMed, Embase, CINAHL, and PsycINFO) were searched for articles with the key words “Quebec,” “back,” “pain,” and “disability” in combination with a methodological search filter for finding studies on measurement properties concerning the development or evaluation of the measurement properties of the QBPDS in patients with nonspecific low back pain. Assessment of the methodological quality was carried out by the reviewers using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist for both the original language version of the QBPDS in English and French and all translated versions. The results of the measurement properties were rated based on criteria proposed by Terwee et al.
Results The search strategy resulted in identification of 1,436 publications, and 27 articles were included in the systematic review. There was limited-to-moderate evidence of good reliability, validity, and responsiveness of the QBPDS for the different language versions, but for no language version was evidence available for all measurement properties.
Conclusion For research and clinical practice, caution is advised when using the QBPDS to measure disability in patients with nonspecific low back pain. Strong evidence is lacking on all measurement properties for each language version of the QBPDS.
One of the leading causes of disability worldwide is low back pain (LBP). Most of the time, LBP is benign and self-limiting and can be considered as nonspecific LBP, as no specific musculoskeletal pathology is found.1–3 It occurs in similar proportions in all cultures, interferes with quality of life and work performance, and is the most common reason for medical consultation.4,5
To measure the construct of disability in patients with LBP, several self-report back-specific questionnaires have been developed. They are recommended by the World Health Organization as instruments to evaluate the efficacy of treatments.4 Two of the most commonly investigated questionnaires are the Roland-Morris Disability Questionnaire (RMDQ)6–10 and the Oswestry Low Back Pain Disability Index (ODI).11–15 However, previous systematic reviews on available questionnaires to measure disability in patients with LBP indicate that the Quebec Back Pain Disability Scale (QBPDS)16–18 is another well-validated and often recommended questionnaire.17,19–21 The QBPDS also is commonly used in randomized controlled trials.20,22–24
The QBPDS (Appendix) was developed in 1995 in English and French.16,17,25,26 Contrary to the RMDQ and ODI, the QBPDS is based on a conceptual model of disability.16,17,27 The developers of the QBPDS used the World Health Organization's definition of disability as “any restriction or lack of ability to perform an activity in a manner or within the range considered normal for a human being.”28,29 Disability was operationally defined in terms of difficulty experienced while performing simple tasks.17,30,31 During the development of the QBPDS, factor analysis of 46 items showed that the QBPDS had a 6- or 7-factor structure, with 53% of the variance explained by the first factor.17 The decision to include 20 items in the final instrument was based on item analysis and practical considerations, which resulted in a 6-factor structure.17 These 20 items represented 6 correlated factors and were selected based on the following requirements: (1) all types of physical activities relevant to back pain should be represented, including bed/rest, sitting/standing, ambulation, movement, bending/stooping, and handling large or heavy objects; and (2) the QBPDS should be highly reliable and discriminative over a wide range of disability levels, while also being practical and acceptable to both patients and clinicians.17
The 20 QBPDS items are scored on a 6-point scale (0=“not difficult at all,” 5=“unable to do”). The total score is calculated by a summation of the scores for each item and ranges from 0 (“not being disabled”) to 100 (“being maximally disabled”).16,17
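The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the published instrument: the function name `qbpds_total` is hypothetical, and rejecting incomplete questionnaires is an assumption, because the text does not state how missing items should be handled.

```python
from typing import Sequence

N_ITEMS = 20   # number of QBPDS items
MIN_ITEM = 0   # "not difficult at all"
MAX_ITEM = 5   # "unable to do"


def qbpds_total(item_scores: Sequence[int]) -> int:
    """Sum 20 item scores (each 0-5) into a total score from 0 to 100."""
    # Assumption: an incomplete questionnaire is rejected rather than imputed.
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not isinstance(score, int) or not MIN_ITEM <= score <= MAX_ITEM:
            raise ValueError(f"item score out of range: {score!r}")
    return sum(item_scores)
```

With 20 items scored 0 to 5, the total can only fall between 0 and 100, which matches the range given in the text.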
The QBPDS has been translated into different languages and adapted to different cultures. Studies have been performed on its measurement properties in these different adapted language versions.32,33 A systematic review on this topic could be useful because a review on cross-cultural adaptations of the McGill Pain Questionnaire showed there is often limited evidence for the measurement properties of translated or adapted language versions. Therefore, the results from translated questionnaires should be interpreted with caution.34,35 For the QBPDS, such a review has not yet been undertaken.
Studies of high methodological quality are needed to guarantee appropriate conclusions about measurement properties. The COSMIN checklist was developed to appraise the methodological quality of studies on measurement properties of health status questionnaires.36,37 The purpose of this review was to critically appraise and compare the measurement properties, when possible, of the different language versions of the QBPDS for measuring disability in patients with nonspecific LBP by systematically reviewing the methodological quality and results of the available studies.
Method
Search Strategy
The following computerized bibliographic databases were searched up to September 18, 2014: PubMed (1966–2014), Embase (1974–2014), CINAHL (EBSCOhost) (1981–2014), and PsycINFO (OvidSPhost) (1806–2014). The databases were searched with the key words “Quebec,” “back,” “pain,” and “disability” in combination with a methodological search filter for finding studies on measurement properties (eAppendix 1).38 Reference lists were screened to identify additional relevant studies.
Selection Criteria
Two reviewers (T.K., M.S.) independently assessed titles, abstracts, and reference lists of the studies retrieved by the literature search. Only full-text original articles were included, primarily concerning the development or evaluation of the measurement properties of the QBPDS. Articles in all languages were included.
For inclusion, the QBPDS had to be evaluated in adult patients (≥18 years of age) with general, nonspecific LBP. Studies in patients with sciatica without any reference to a specific cause were included as well. Studies in patients with sciatica due to a specific cause (eg, nerve root compromise) or LBP due to specific causes (eg, neurological disorder, ankylosing spondylitis, fracture) were excluded. There was no minimum sample size for inclusion.
In case of disagreement between the 2 reviewers, a third reviewer (C.B.T.) made the decision regarding inclusion of the article. Both primary reviewers (T.K., M.S.) are senior physical therapists and scientists, and the third reviewer (C.B.T.) is a senior epidemiologist, which made this an optimal team for selecting articles for this review.
Quality Assessment
Assessment of the methodological quality of the included studies was carried out using the COSMIN checklist.36,39 The COSMIN checklist consists of 9 boxes with methodological standards for how each measurement property should be assessed. Each item in a box can be scored on a 4-point scale (ie, “poor,” “fair,” “good,” or “excellent”), which is an additional feature of the COSMIN checklist.40 An overall score for the methodological quality of a study on a given measurement property was determined by taking the lowest rating of any of the items in the corresponding box. None of the studies used item response theory (IRT), so the IRT box was not used.
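The “lowest rating” rule can be sketched as follows. The sketch assumes the rule is applied to the item ratings of one COSMIN box at a time, in line with the per-property ratings reported in Table 4; the function name `box_score` and the example ratings are hypothetical.

```python
# Ratings ordered from worst to best, as on the 4-point COSMIN scale.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the lowest rating among the items of one box."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # list.index raises ValueError for a rating that is not on the scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for one box: a single "fair" item caps the box at "fair".
print(box_score(["excellent", "good", "fair", "excellent"]))
```

Taking the minimum means that one weak design element, such as a small sample, determines the rating for that measurement property regardless of how well the other items score.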
Data extraction and assessment of (methodological) quality were independently performed by 2 reviewers (T.K. and C.B.T. for 17 of the included articles* and C.M.S. and C.B.T. for 10 of the included articles†). In case of disagreement, a third reviewer made the decision (C.M.S. for data extraction and quality assessment performed by T.K. and C.B.T. and T.K. for data extraction and quality assessment performed by C.M.S. and C.B.T.). Two reviewers (C.M.S. and C.B.T.) are senior epidemiologists and, therefore, trained in psychometrics. One reviewer (C.B.T.) is one of the developers of the COSMIN checklist, and the other reviewers (C.M.S. and T.K.) were trained by the COSMIN team on quality appraisal and data extraction.
Measurement Properties
The measurement properties are divided over 3 domains: reliability (including internal consistency, reliability, and measurement error), validity (including content validity, construct validity [ie, structural validity, hypotheses testing, and cross-cultural validity], and criterion validity), and responsiveness. Hypotheses testing was done for the original version of the QBPDS developed by Kopec and colleagues16,17 by correlating the QBPDS with the RMDQ, ODI, Medical Outcomes Study 36-Item Short-Form survey (SF-36), and pain rated on a visual analog scale (VAS-pain), so we extracted data on the correlations of the QBPDS with these instruments for all language versions. Also related to pain, data on the correlation of the QBPDS with the Numeric Rating Pain Scale (NRPS) were included.
Part of cross-cultural validity testing concerns translation. The quality of the translation was determined by using items 4 to 11 of the COSMIN cross-cultural validity box.
There is no gold standard for health status questionnaires available. Consequently, no level of evidence related to criterion validity can be determined for the QBPDS. The measurement properties and interpretability have been defined and discussed in detail elsewhere.46,47 Interpretability is not a measurement property, but rather an important characteristic of a measurement instrument.46
Data Synthesis, Levels of Evidence, and Meta-analyses
When the quality of the translation was at least fair, we determined the quality of the measurement properties by applying levels of evidence, as defined in Table 1. The possible overall rating for a measurement property is “positive,” “indeterminate,” or “negative,” accompanied by a level of evidence (“strong,” “moderate,” “limited,” “conflicting,” or “unknown”). To give a positive or negative rating for the results of the measurement properties, criteria for good measurement properties were used, based on criteria proposed by Terwee et al48 (Tab. 2).
Levels of Evidence for Summary Statements on Measurement Property Based on Overall Quality59
Quality Criteria for Measurement Properties (Based on Terwee et al48)a
Meta-analyses were not performed because there were no more than 2 studies per measurement property per language version. Moreover, it is more important to evaluate whether the results are above a defined cutoff (eg, ICC>.70) than to estimate the exact pooled value of the parameter.
Results
General
The search strategy resulted in 1,436 unique publications, of which 32 articles were selected based on title and abstract. Based on the full text of these articles, 5 articles49–53 were excluded, mainly because the articles were not about the development or evaluation of the measurement properties of the QBPDS.
Reference tracking did not result in additional articles. Finally, 27 articles‡ were included (Fig.).
Flowchart of search and selection. QBPDS=Quebec Back Pain Disability Scale.
The general characteristics of these studies are presented in Table 3. One study resulted in 2 publications and, therefore, is mentioned only once in Tables 3 and 4.16,17 Rater scores for each criterion within each of the COSMIN quality appraisal boxes summarized in Table 4 can be requested from the first author.
Characteristics of Included Studiesa
Methodological Quality of Each Study per Measurement Property (COSMINa Checklist36)
Methodological Quality
Overall, the methodological quality of the studies was fair (Tab. 4). The reviewers (T.K., C.B.T., C.M.S.) came to an agreement on all quality assessments. Only 2 studies5,43 adequately described the measurement properties of the QBPDS. Twenty-four studies§ did not describe how missing items were handled, and 20 studies‖ did not describe the number or percentage of missing responses. Three studies25,42,43 had an inadequate sample size (<50).
The QBPDS was translated into Palestinian Arabic,44 Moroccan Arabic,45 Chinese,31 Dutch,33 French,17 Greek,5 Hungarian,37 Korean,35 Persian,25 Polish,24 Brazilian Portuguese,27 European Portuguese,7 Tswana,9 and Turkish.14,21,41 The original English and French versions of the QBPDS developed by Kopec and colleagues16,17 were used in 3 studies8,15,18 and 2 studies,42,43 respectively. The Dutch version was used in 5 studies.10,23,26,29,33 The European Portuguese version was used in 2 studies.7,30 All 3 studies using a Turkish version14,21,41 used different translations. All other language versions were used in only 1 study.#
The results per measurement property of the QBPDS are discussed below. The results from studies of poor methodological quality (Tab. 4 and eAppendix 2) are not mentioned because they may be biased. The data synthesis of the results and accompanying levels of evidence is presented in Table 5.
Data Synthesis48: Levels of Evidence of Overall Quality of the QBPDS Measurement Properties per Languagea
Reliability
Internal consistency.
Sixteen studies** assessed the degree of the interrelatedness among the items of the QBPDS, expressed by Cronbach α. The studies using the English-French,16,17 Hungarian,37 and European Portuguese7 language versions were of good quality, and the studies using the Greek5 and one of the Turkish21 language versions were of fair quality (Tab. 4). All 5 studies had positive results. The unidimensionality of the QBPDS was confirmed for the European Portuguese version by showing one predominant common factor explaining 52.1% of the variance.7 The Cronbach α for the whole scale in these 5 studies ranged from .895 to .96.5,16,17
We found moderate evidence for positive internal consistency of the English, French, Hungarian, and European Portuguese language versions of the QBPDS and limited evidence for positive internal consistency of the Greek and Turkish language versions.
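For readers unfamiliar with the statistic, Cronbach α can be computed from a patients-by-items score matrix as k/(k−1) × (1 − Σ item variances / variance of total scores). The sketch below is a minimal standard-library illustration of that textbook formula; the function name `cronbach_alpha` and the toy data are hypothetical, and it is not taken from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-patient lists of item scores."""
    if len(scores) < 2:
        raise ValueError("need at least 2 patients")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item (column) across patients.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of the per-patient total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: 3 patients, 2 items (hypothetical values).
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

For the QBPDS, each row would hold the 20 item scores of one patient.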
Reliability.
Reliability (proportion of the total variance in the measurements that is due to “true” differences among patients) was evaluated in 17 studies.†† All of these studies evaluated the test-retest reliability of the QBPDS. One of these studies also conducted an interrater reliability study, as the QBPDS was administered twice by 2 different researchers.27 Eleven studies‡‡ used an appropriate time interval (2–14 days) between the first and second QBPDS administrations. The reliability studies concerning the Dutch,33 English,18 Brazilian Portuguese,27 and European Portuguese7 language versions were of good quality. The studies on the Palestinian Arabic,44 Moroccan Arabic,45 Chinese,31 English,8 Greek,5 Hungarian,37 Persian,25 Polish,24 Tswana,9 and Turkish21 language versions were of fair quality. All of these studies§§ had positive results and showed ICCs for test-retest reliability ranging from .707 to .99.31 The study on the Brazilian Portuguese language version27 was of good quality and evaluated both interrater and intrarater reliability. The ICCs were .96 and .93, respectively. The study on the Palestinian Arabic language version44 was of fair quality and showed weighted kappa values of .86 and .98 for the 2 different time points in that study.
We found moderate evidence for positive reliability for the Dutch, English, Brazilian Portuguese, and European Portuguese language versions of the QBPDS and limited evidence for positive reliability for the Palestinian Arabic, Moroccan Arabic, Chinese, Greek, Hungarian, Persian, Polish, Tswana, and Turkish language versions.
Measurement error.
The measurement error consists of the systematic and random error of a patient's score, which is not attributed to true changes in the construct of disability. Measurement error is calculated using data from a test-retest reliability study and is expressed in the unit of measurement of the scale (number of points on the QBPDS).47 Measurement error was evaluated in 12 studies.‖‖ Seven studies18,27,31,33,35,37,45 used an appropriate time interval (2–14 days) between the first and second administrations of the QBPDS. The studies concerning the Dutch,33 English,18 and Brazilian Portuguese27 language versions were of good quality. In the study on the English language version,18 the smallest detectable change (SDC; 1.65 × √2 × standard error of measurement [SEM]) was 11 points (14.5% scored below 11 points, and 0.66% scored above 89 points). In the study on the Dutch language version,33 the limits of agreement (LOA) ranged from −15.6 to 16.4. In the study on the Brazilian Portuguese language version,27 the intraobserver LOA ranged from −1.4 to 2.8 points, and the interobserver LOA ranged from −1.6 to 2.7 points.
In 5 fair-quality studies8,10,29,30,37 (2 concerning the Dutch version, 1 concerning the English version, 1 concerning the Hungarian version, and 1 concerning the European Portuguese language version), SDC also was determined. In the Dutch language version study by Demoulin et al,10 the SDC (1.96 × √2 × SEM) was 15.8 points. In the Dutch study by van der Roer et al,29 the SDC (1.96 × √2 × SEM) was 32.9 points (95% confidence interval [CI]=24.6, 49.8) in patients with acute or subacute LBP and 24.6 points (95% CI=19.9, 32.4) in patients with chronic LBP. In the English study,8 the SDC (1.96 × √2 × SEM) was 19 points (95% CI=15, 31) in patients with LBP. In the Hungarian study,37 the SDC (1.96 × √2 × SEM) was 14.4 points in patients with chronic LBP. In the study on the European Portuguese language version30 in patients with acute or subacute LBP, the SDC (1.65 × √2 × SEM) was 19 points. In 2 other fair studies (Moroccan Arabic45 and Chinese31), LOA also were reported. The intraobserver LOA of the Moroccan Arabic language version45 ranged from −19.3 to 20.7 points. The intraobserver LOA of the Chinese language version31 ranged from −17.1 to 18.1 points.
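The SDC values above use two different multipliers (1.65 and 1.96), so they are not directly comparable. The sketch below shows the SDC formula quoted in the text, the inverse step that recovers the SEM from a reported SDC, and Bland-Altman limits of agreement computed as the mean test-retest difference ± 1.96 standard deviations of the differences. All function names are hypothetical, and the LOA formula is the standard Bland-Altman definition rather than a detail confirmed for each reviewed study.

```python
import math
from statistics import mean, stdev

Z_90 = 1.65  # multiplier used in the studies reporting a 90% SDC
Z_95 = 1.96  # multiplier used in the studies reporting a 95% SDC


def sdc(sem, z=Z_95):
    """Smallest detectable change: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def sem_from_sdc(sdc_value, z):
    """Recover the SEM from an SDC reported with multiplier z."""
    return sdc_value / (z * math.sqrt(2))


def limits_of_agreement(test, retest, z=Z_95):
    """Bland-Altman limits: mean difference +/- z * SD of the differences."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series of at least 2 scores")
    diffs = [second - first for first, second in zip(test, retest)]
    centre, spread = mean(diffs), stdev(diffs)
    return centre - z * spread, centre + z * spread
```

Under these assumptions, `sdc(sem_from_sdc(11, Z_90), Z_95)` rescales the 11-point SDC reported with the 1.65 multiplier to the 1.96 multiplier, giving roughly 13 points. This is an arithmetic illustration, not a result reported in the reviewed studies.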
We found moderate evidence for a negative measurement error for the Dutch language version. However, for the Moroccan Arabic, Chinese, English, Hungarian, and Brazilian Portuguese language versions of the QBPDS, we could not perform a best evidence synthesis because of the lack of a minimal important change (MIC).
Validity
Content validity (including face validity).
The aim of the study by Kopec et al17 was to develop a new scale of functional disability associated with back pain. All 20 items of the QBPDS reflect disabilities in performing activities in LBP well (Appendix). The studies on the Palestinian Arabic44 and Greek5 language versions concluded that the QBPDS assesses the intended construct. In particular, patients agreed that the scale seemed to be a reasonable test for evaluating the functional disability of patients with LBP. In all other studies, it was not reported whether patients agreed that the scale appeared to be a reasonable test for evaluating the functional disability of patients with LBP.
Three studies5,17,43 evaluated the degree to which the content of the QBPDS is an adequate reflection of construct disability. Two studies (English-French16,17 and Greek5) were of at least fair quality. From the study on the development of the QBPDS16,17 and the study using the Greek version5 of the QBPDS, it can be deduced that all items were considered relevant to measure the construct disability in a population of patients with LBP, as rated by experts and patients, and no important items were missing. We found limited evidence for positive content validity for the English, French, and Greek language versions of the QBPDS.
Construct validity (including structural validity, hypothesis testing, and cross-cultural validity).
To assess structural validity, 3 methodologically good studies (English-French,16,17 Hungarian,37 European Portuguese7) and 1 fair study (Greek5) assessed whether the scores of the QBPDS are an adequate reflection of the dimensionality of construct disability. The studies on the English-French16,17 and the European Portuguese7 language versions suggested that the 20-item scale could be considered approximately unidimensional. The study on the Hungarian language version37 showed a 4-factor structure of the 20 items. The studies on the original English-French version16,17 and the Greek language version5 showed a 6-factor structure of the 20 items.
We found moderate evidence for positive structural validity for the English, French, Hungarian, and European Portuguese language versions of the QBPDS and limited evidence for positive structural validity for the Greek language version of the QBPDS.
Twenty studies## tested hypotheses of the relation between the QBPDS and other measurement tools. However, in 17 studies,*** hypotheses were vaguely or not described. Only 12 of the 20 studies††† adequately described the construct of the comparator instrument. Two studies (English18 and European Portuguese7) were of good quality, and 7 studies (Chinese,31 Dutch,26 Greek,5 Hungarian,37 Persian,25 Brazilian Portuguese,27 and Turkish21) were of fair quality. In the studies on the Dutch,26 Greek,5 Persian,25 Brazilian Portuguese,27 and European Portuguese7 versions, the relation between the QBPDS and the RMDQ was tested. The correlations between the QBPDS and RMDQ ranged from .6025,26 to .85,27 as supposed. The relation between the QBPDS and the ODI was tested in the studies using the Chinese,31 Dutch,26 Hungarian,37 Persian,25 and Turkish21 language versions. The correlations between the QBPDS and ODI ranged, as supposed, from .6721 to .90.31 The studies using the English18 and Persian language versions25 showed correlations between the QBPDS and SF-36 ranging from .6418 to .69,25 as we expected. The studies using the Chinese,31 Hungarian,37 Brazilian Portuguese,27 European Portuguese,7 and Turkish21 language versions showed correlations between the QBPDS and VAS-pain ranging from .3721 to .87.31
Correlations with the VAS-pain were expected to be lower than the correlations with disability measures, and indeed the correlation with the VAS-pain was .37 in one study on the Turkish language version21 and .38 in the study on the European Portuguese language version.7 However, in the studies on the Chinese,31 Hungarian,37 and Brazilian Portuguese27 language versions, the correlations of the QBPDS with the VAS-pain were higher than expected (.62,37 .75,27 and .8731). Also, the correlation of the QBPDS with the bodily pain subscale of the SF-36 was .50 in the study on the English language version18 and .62 in the study on the Persian language version.25 As expected, low correlations were found between the QBPDS and the SF-36 mental health (.2518 and .4025) and role-emotional functioning (.2618 and .3725) subscales.
We found moderate evidence for positive construct validity for the English language version related to the SF-36. For the European Portuguese language version, we found moderate evidence for positive construct validity related to the RMDQ and negative construct validity related to the VAS-pain. Limited evidence was found for positive construct validity for the Chinese (ODI, VAS-pain), Dutch (RMDQ, ODI), Greek (RMDQ), Hungarian (ODI, VAS-pain), Persian (RMDQ, ODI, SF-36), Brazilian Portuguese (RMDQ, VAS-pain), and Turkish (ODI) language versions. However, we also found limited evidence for negative construct validity for the Persian and Turkish language versions related to VAS-pain.
For cross-cultural validity and translation, none of the included studies assessed whether the performance of the items on a translated QBPDS was an adequate reflection of the performance of the items on the original QBPDS (cross-cultural validity) (eg, by using multiple-group factor analysis or by evaluating differential item functioning).
In 15 studies,‡‡‡ a translation of the QBPDS was described. The QBPDS was translated into Palestinian Arabic,44 Moroccan Arabic,45 Chinese,31 Dutch,33 French,17 Greek,5 Hungarian,37 Korean,35 Brazilian Portuguese,27 European Portuguese,7 Persian,25 Polish,24 Tswana,9 and Turkish.14,21,41 The Korean,35 Persian,25 and Turkish21 translation studies were of excellent quality; the Brazilian Portuguese27 and Greek5 translation studies were of good methodological quality; and the Moroccan Arabic,45 Chinese,31 Hungarian,37 and Tswana9 translation studies were of fair methodological quality. We found no evidence for cross-cultural validity.
Criterion validity.
As stated in the Method section, no gold standard for health status questionnaires is available.
Responsiveness
Eight studies8,10,15–17,23,30,33,42 evaluated the ability of the QBPDS to detect change over time in the construct of disability. One study on the European Portuguese language version30 was of good quality and showed an area under the receiver operating characteristic (ROC) curve (AUC) of 0.74. In the European Portuguese language version,30 the AUC was interpreted as the probability of correctly discriminating between “clinically stable” (score≤4) and “clinically improved” (score≥5) patient outcomes, based on the change in scores on the Patient Global Improvement Change Scale (PGIC-PT54; ordinal scale from 1 to 7). One study on the Dutch language version10 and 2 studies on the English language version8,15 were of fair quality and showed positive results, with AUCs of .85,10 .74,8 and .87.15 In the Dutch language version,10 the AUC was interpreted as the probability of correctly discriminating between “clinically stable” and “clinically improved” patient outcomes, using a change score (score≥6) and an unchanged score (score=3–5) of the following ordinal scale: 1=“worse than ever,” 2=“much worsened,” 3=“slightly worsened,” 4=“unchanged,” 5=“slightly improved,” 6=“much improved,” and 7=“completely recovered.” In the English language version of the QBPDS in the study by Davidson and Keating,8 the AUC was interpreted as the probability of correctly discriminating between “unchanged” and “improved” patient outcomes, using a change score (score≤3) and an unchanged score (score=4–6) of the following ordinal scale: 1=“completely gone,” 2=“much better,” 3=“better,” 4=“a little better,” 5=“about the same,” 6=“a little worse,” and 7=“much worse.” In the English language version of the QBPDS in the study by Fritz and Irrgang,15 the AUC was interpreted as the probability of correctly discriminating between “clinically stable” and “clinically improved” patient outcomes based on the 15-point rating scale of Jaeschke et al,55 using a change score (score≥3) and an unchanged score (score=−3 to 3) of the following ordinal scale: 1=“completely gone,” 2=“much better,” 3=“better,” 4=“a little better,” 5=“about the same,” 6=“a little worse,” and 7=“much worse.”
In 3 of these studies, patients were treated by physical therapists for LBP,8 acute LBP,15 and chronic LBP.30 In one study,10 patients were treated by a multidisciplinary team for chronic LBP.
We found moderate evidence for positive responsiveness for the European Portuguese language version and limited evidence for positive responsiveness for the Dutch and English language versions of the QBPDS.
Interpretability: MIC
In one study on the European Portuguese language version30 and 2 studies on the Dutch language version,10,29 the MIC of the QBPDS was estimated. These 3 studies used the ROC method to determine the MIC. In the European Portuguese study by Vieira et al30 and the Dutch study by Demoulin et al,10 the MIC was determined by identifying the point closest to the upper left corner on the ROC curve. In the Dutch study by van der Roer et al,29 the MIC was determined as the optimal cutoff point, that is, the point that yields the lowest overall misclassification.
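The two ROC-based rules named above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any specific reviewed study: change scores are taken as baseline minus follow-up (so positive values mean improvement), candidate cutoffs are midpoints between adjacent observed change scores, and “lowest overall misclassification” is read as minimizing (1 − sensitivity) + (1 − specificity). All function names and the toy data are hypothetical.

```python
def roc_points(change_scores, improved):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.

    A patient is classified as improved when change score >= cutoff.
    """
    pos = [c for c, flag in zip(change_scores, improved) if flag]
    neg = [c for c, flag in zip(change_scores, improved) if not flag]
    values = sorted(set(change_scores))
    if not pos or not neg or len(values) < 2:
        raise ValueError("need both groups and at least 2 distinct scores")
    points = []
    # Assumption: candidate cutoffs are midpoints between observed values.
    for low, high in zip(values, values[1:]):
        cutoff = (low + high) / 2
        sensitivity = sum(c >= cutoff for c in pos) / len(pos)
        specificity = sum(c < cutoff for c in neg) / len(neg)
        points.append((cutoff, sensitivity, specificity))
    return points


def mic_closest_to_corner(points):
    """Cutoff whose ROC point lies closest to the upper left corner."""
    return min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)[0]


def mic_lowest_misclassification(points):
    """Cutoff minimizing (1 - sensitivity) + (1 - specificity)."""
    return min(points, key=lambda p: (1 - p[1]) + (1 - p[2]))[0]


# Toy data (hypothetical): three stable and three improved patients.
pts = roc_points([2, 4, 5, 8, 10, 12], [False, False, False, True, True, True])
print(mic_closest_to_corner(pts), mic_lowest_misclassification(pts))
```

The two rules agree when the groups separate cleanly, as in the toy data, but they can select different cutoffs when the change scores of stable and improved patients overlap.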
In the European Portuguese language version of the QBPDS,30 the AUC was interpreted as the probability of correctly discriminating between “clinically stable” (score≤4) and “clinically improved” (score≥5) patient outcomes, based on the change in the PGIC-PT54 score (ordinal scale from 1 to 7). In the Dutch study by Demoulin et al,10 the AUC was interpreted as the probability of correctly discriminating between “clinically stable” and “clinically improved” patient outcomes, using a change score (score≥6) and an unchanged score (score=3–5) of the following ordinal scale: 1=“worse than ever,” 2=“much worsened,” 3=“slightly worsened,” 4=“unchanged,” 5=“slightly improved,” 6=“much improved,” and 7=“completely recovered.” In the Dutch study by van der Roer et al,29 the AUC was interpreted as the probability of correctly discriminating between “stable” and “improved” patient outcomes, using a change score (score≤2) and an unchanged score (score=3–5) of the following ordinal scale: 1=“completely recovered,” 2=“much improved,” 3=“slightly improved,” 4=“no change,” 5=“slightly worsened,” and 6=“much worse.”
In the study on the European Portuguese version,30 the MIC was defined as 6.5 points (AUC=0.74) for patients with chronic LBP after 6 weeks. In one Dutch study,10 the MIC was defined as 5 points (AUC=0.85) or an 18.1% change from baseline (AUC=0.86) after 10 weeks in patients with chronic LBP who received multidisciplinary rehabilitation. In the other Dutch study,29 the MIC was defined as 17.5 points (AUC=0.74) for patients with acute or subacute LBP and 8.5 points for patients with chronic LBP after 12 weeks.29 Patients of the 2 last-mentioned studies received physical therapy. An expert panel recommended an MIC of 20 points or a change of 30% from baseline.22
Discussion
There is limited-to-moderate evidence for good reliability, validity, and responsiveness of the QBPDS for different language versions. However, there is no complete evidence for all measurement properties in any language version of the QBPDS. Because of the wide ranges in SDC and MIC values, it is difficult to determine whether the QBPDS can distinguish true changes from the systematic and random error of a score in individual patients.
Concerning the degree to which the QBPDS measures the construct of disability,7,18 the construct of the QBPDS also seems to be correlated to bodily pain.18 Because the items are phrased in terms of “difficulty,” the QBPDS possibly measures both disability and pain.
Limitations
A limitation of this study is that we were not able to differentiate among study settings, follow-up durations, interventions, and subacute, acute, and chronic LBP. There were not enough studies to enable distinctions among these groups. Therefore, we are not sure that the same results apply to, for example, patients with acute and chronic LBP. One study of poor quality (because of a small sample size) measuring patients with acute LBP showed an ICC of .55.15 This finding may suggest that the reliability of the QBPDS is not as good in patients with acute LBP, but more evidence is needed in good-quality studies.
For this systematic review, we were interested in all language versions; however, we used only English terms in our search string to identify relevant articles, which could have limited the inclusion of non-English studies.
The QBPDS was translated into 14 different languages.§§§ The translation of the original version was of at least good quality in 5 studies.5,21,25,27,35 However, cross-cultural validity has not been assessed. It is therefore unknown whether the translated versions of the QBPDS assess disability in the same manner as the original version. Cross-cultural validity can be assessed by determining if the factor structure of the translated version equals the original factor structure (in a multiple-group factor analysis), or by assessing if there is differential item functioning between the 2 versions.47 Differential item functioning means that patients from different groups (here, different language versions) with the same true score on the construct do not have the same probability of giving a particular response to an item of the measurement instrument.39
We recommend these statistical analyses for each language version to show if the scores of the translated QBPDS versions can be interpreted in the same way as the original version of the QBPDS developed by Kopec and colleagues.16,17
The decision to include 20 items in the final instrument was based on item analysis and practical considerations, which resulted in a 6-factor structure.17 Two good-quality studies, one on the original English-French version16,17 and one on the European Portuguese version,7 suggested that the 20-item scale could be considered approximately unidimensional, explaining 52% to 53% of the variance. However, another good-quality study using the Hungarian version37 showed a 4-factor structure (everyday activities, ambulation, sitting/carrying, and bed/rest) of the 20 QBPDS items, and the fair-quality studies on the original English-French version17 and the Greek language version5 showed a 6-factor structure (movement, handling of large/heavy objects, bending/stooping, ambulation, sit/stand, and bed/rest) of the 20 QBPDS items. An explanation for these different results may be low cross-cultural validity. As the dimensional structure of the QBPDS, therefore, is not entirely clear, the results on internal consistency should be interpreted with caution because unidimensionality is a prerequisite for a clear interpretation of the internal consistency statistics.56
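As an illustration of the internal consistency statistic discussed here, the following Python sketch computes Cronbach alpha from an item-score matrix. The data are hypothetical and use only 4 items for brevity (the QBPDS itself has 20 items, each scored from 0 to 5); the function name is chosen for this example only.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach alpha for a matrix with one row per patient and
    one column per item (sample variances, n - 1 denominator)."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses of 5 patients to 4 items scored 0 to 5.
responses = [
    [0, 1, 0, 1],
    [2, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach alpha = {alpha:.2f}")
```

A high alpha in such a calculation does not by itself show that the items form a single dimension, which is why the factor structure needs to be established before alpha is interpreted.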
Future Research
For almost all measurement properties, additional studies are needed. Foremost, studies are needed to determine cross-cultural validity so that scores related to different language versions of the QBPDS can be compared with each other.
We recommend adequate factor analyses for each language version in future studies of the QBPDS. Also, IRT analyses are recommended to investigate the internal structure of the QBPDS.
Studies also are needed that assess reliability (to distinguish “true” differences between patients) and measurement error (to distinguish true changes from systematic and random error).46,57 Regarding interpretability, only 3 studies10,29,30 of at least fair quality determined the MIC of the QBPDS, and their results varied widely. Because of the wide range in SDC and MIC values, it is difficult to conclude whether the SDC is larger or smaller than the MIC. It is recommended, therefore, that future research should focus on determining the MIC and SDC in all language versions. Until there is more evidence regarding the MIC, we recommend using the conservative guidelines as recommended by the expert panel (an MIC of 20 points or a change of 30% from baseline).22
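A minimal Python sketch of how the expert panel guideline could be applied to an individual patient is given below. It assumes that meeting either threshold (20 points or 30% of the baseline score) counts as an important improvement and that lower QBPDS scores indicate less disability; the function name and the example scores are hypothetical.

```python
def meets_expert_panel_mic(baseline: float,
                           follow_up: float,
                           abs_threshold: float = 20.0,
                           rel_threshold: float = 0.30) -> bool:
    """Return True when the improvement reaches 20 points or 30% of
    the baseline score (either criterion is treated as sufficient,
    which is an assumption of this sketch)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    improvement = baseline - follow_up  # positive = less disability
    return (improvement >= abs_threshold
            or improvement / baseline >= rel_threshold)


# Hypothetical patients: (baseline score, follow-up score).
print(meets_expert_panel_mic(60, 38))  # 22-point improvement
print(meets_expert_panel_mic(40, 27))  # 13 points, 32.5% of baseline
print(meets_expert_panel_mic(70, 55))  # 15 points, about 21% of baseline
```

The relative criterion matters most for patients with low baseline scores, for whom a 20-point improvement may not be attainable.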
Furthermore, more high-quality studies on responsiveness are needed. Most studies assessing responsiveness of the QBPDS did not formulate or only vaguely formulated hypotheses regarding expected correlations in advance.8,16,33,42 Without specific hypotheses, the risk of bias is high because retrospectively it is tempting to come up with alternative explanations for low correlations instead of concluding that the questionnaire is not responsive.48 We, therefore, recommend performing additional high-quality studies, testing specific hypotheses regarding the correlation of changes in the QBPDS with changes in other disability questionnaires, pain measures, and measures of psychosocial functioning.
The original version of the QBPDS was in English and French. For these language versions, research of at least good methodological quality is needed for every measurement property mentioned by the COSMIN checklist.36 Internal consistency, reliability, measurement error, structural validity, and hypotheses testing have to be investigated once again by a study of at least good quality. In addition, content validity, criterion validity, responsiveness, and interpretability have to be investigated once by a study of excellent methodological quality or at least twice by studies of good methodological quality.
It is advisable to perform similar reviews for the RMDQ and the ODI to enhance the comparability among the different questionnaires. Finally, future research should focus on comparing the QBPDS, RMDQ, and ODI with newly developed, IRT-based instruments, such as the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Functioning instruments (http://www.nihpromis.org). The PROMIS offers major advantages, such as the possibility of computer adaptive testing and comparability of scores across patient populations. Research has shown that the PROMIS has better measurement properties than traditional questionnaires.58 Although the PROMIS is a generic questionnaire, it may be as responsive as disease-specific questionnaires when used as a computerized adaptive test, and its responsiveness should be investigated in future studies.
For research and clinical practice, we advise using the QBPDS with caution to measure disability in patients with nonspecific LBP. Strong evidence is lacking on all measurement properties for each language version of the QBPDS.
Appendix.
Quebec Back Pain Disability Scale (Kopec et al16,17)a
a QBPDS © Jacek A. Kopec, 1995. All rights reserved. Used with permission from © Mapi Research Trust, Lyon, France: https://eprovide.mapi-trust.org.
Footnotes
Dr Speksnijder, Mr Koppenaal, Professor Knottnerus, Dr Spigt, and Dr Terwee provided concept/idea/research design. All authors provided writing and data analysis. Dr Speksnijder, Mr Koppenaal, Dr Spigt, and Dr Terwee provided data collection. Dr Speksnijder and Mr Koppenaal provided project management. Professor Knottnerus provided facilities/equipment and institutional liaisons. Mr Koppenaal, Professor Knottnerus, and Dr Terwee provided consultation (including review of manuscript before submission).
The authors thank Alice Tillema (Medical Library; Radboudumc Nijmegen, the Netherlands) for optimizing the search filter.
‡ References 5, 7–10, 14–18, 21, 23–27, 29–31, 33, 35, 37, 41–45.
§ References 5, 8–10, 14–18, 21, 23–27, 29–31, 33, 35, 41–45.
‖ References 5, 8–10, 14, 15, 18, 21, 23, 24, 26, 27, 29, 30, 35, 41–45.
** References 5, 7, 9, 16, 21, 24, 25, 27, 31, 33, 35, 37, 41, 43–45.
†† References 5, 7–9, 15–18, 21, 24, 25, 27, 31, 33, 35, 37, 44, 45.
§§ References 5, 7–9, 18, 21, 24, 25, 27, 31, 33, 37, 44, 45.
## References 5, 7, 9, 14, 16, 18, 21, 24–27, 31, 33, 35, 37, 41–45.
*** References 5, 7, 9, 14, 18, 21, 24–27, 31, 33, 35, 41–44.
‡‡‡ References 5, 7, 9, 17, 21, 24, 25, 27, 31, 33, 35, 37, 41, 44, 45.
§§§ References 5, 7, 9, 17, 21, 24, 25, 27, 31, 33, 35, 37, 41, 44, 45.
- Received October 29, 2014.
- Accepted May 3, 2016.
- © 2016 American Physical Therapy Association