Abstract
Background: Sound measurement properties of outcome tools are essential when evaluating the outcomes of an intervention, both in clinical practice and in research.
Purpose: The purpose of this study was to review the evidence on reliability, measurement error, and responsiveness of measures of gait function in children with neuromuscular diagnoses.
Data Sources: The MEDLINE, CINAHL, EMBASE, and PsycINFO databases were searched up to June 15, 2012.
Study Selection: Studies evaluating reliability, measurement error, or responsiveness of measures of gait function in 1- to 18-year-old children and youth with neuromuscular diagnoses were included.
Data Extraction: The quality of the studies was independently rated by 2 raters using a modified COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist. Studies rated as being of fair quality or better were considered for best evidence synthesis.
Data Synthesis: Regarding methodological quality, 32 of 35 reliability studies, all 13 measurement error studies, and 5 of 10 responsiveness studies were of fair or good quality. Best evidence synthesis revealed moderate to strong evidence for the reliability of several measures in children and youth with cerebral palsy (CP), but evidence was limited or unknown for other diagnoses. The Functional Mobility Scale (FMS) and the Gross Motor Function Measure (GMFM) dimension E showed limited positive evidence for responsiveness in children with CP, whereas evidence was unknown or conflicting for other diagnoses. No study reported the minimal important change; thus, evidence on measurement error remained undetermined.
Limitations: Because studies on validity were not included in the review, a comprehensive appraisal of the best available gait-related outcome measure per diagnosis is not possible.
Conclusions: There is moderate to strong evidence for the reliability of several measures of gait function in children and youth with CP, whereas positive evidence for responsiveness exists only for the FMS and the GMFM dimension E.
Achieving, restoring, or sustaining the ability to walk is one of the main goals when rehabilitating children with congenital or acquired neuromuscular diagnoses. Whereas 1 in 2 children with cerebral palsy (CP) can walk independently, 1 in 6 use walking aids and are thus at high risk of losing walking function during adolescence.1
To evaluate interventions and monitor progress, valid, reliable, and responsive assessment tools are essential.2 Information on validity (the degree to which an instrument measures the construct it purports to measure), reliability (the degree to which the measurement is free from measurement error), and responsiveness (the ability of an instrument to detect change over time in the construct to be measured) is a prerequisite for deciding which measures to use in clinical practice and in research.
The International Classification of Functioning, Disability and Health (ICF) of the World Health Organization provides a theoretical framework and classification system for describing and measuring health and health-related states.3 In 2007, the ICF was extended with a version for children and youth (ICF-CY).4 Like the ICF, the ICF-CY classifies outcomes into 4 domains: “body function and structure,” “activity” (the execution of a task or action by an individual), “participation” (the person's involvement in a life situation), and “environmental and personal factors.” When monitoring progress, it is important to distinguish between what patients are capable of doing in a standardized, controlled environment (capacity) and what they actually do in their daily environment (performance).5
Interventions are often carried out with the goal of affecting clinically important activity and participation domains. Bjornson et al,6 for example, took this approach in their comprehensive evaluation of the effect of botulinum toxin not only on spasticity but also on functional activities in children with CP. Measures classified in these domains therefore need to be evaluated.
Systematic reviews on measurement properties of assessment tools can facilitate the clinician's or researcher's search for an appropriate measurement instrument.7 Knowledge of the reliability, measurement error, and responsiveness of evaluative measurement tools, in particular, is important for evaluating and clinically interpreting change scores. Regarding gait and gait-related outcomes, previous systematic reviews have focused on activity and participation measures in children with CP,8–12 dynamic balance in children with acquired brain injury,13 functional mobility tools in children with hereditary spastic paraplegia and other childhood neurological problems,14 or computerized gait analysis techniques in children with CP or spina bifida.15 Young and Wright identified physical function scales appropriate for pediatric orthopedics,16 and a recent review evaluated the Six-Minute Walk Test (6MWT) in children with chronic conditions.17 Children in particular not only use gait to get from place to place but also integrate it into all kinds of play. Consequently, different constructs have to be considered when assessing gait function in children. To our knowledge, no review has summarized the evidence on the different constructs that are an integral part of gait function, nor do we know of any review that included studies on the whole spectrum of neuromuscular diagnoses in children. Therefore, the purposes of this literature review were (1) to give an overview of available capacity and performance measures for evaluating gait function in children and youth with neuromuscular diagnoses and (2) to evaluate the current level of evidence for the reliability, measurement error, and responsiveness of those measures.
Method
Data Sources and Searches
We performed an electronic search on June 15, 2012, in MEDLINE (via PubMed 1966–2012), CINAHL (via EBSCO 1981–2012), EMBASE (via embase.com 1974–2012), and PsycINFO (via EBSCO 1806–2012). Key search terms and MeSH terms were searched separately in 3 main filters, covering the construct (gait function), the target population (child and neuromuscular diagnosis), and the measurement properties; these filters were then combined with an exclusion filter. In PubMed, a validated search filter for finding studies on measurement properties was used.18 The full search strategy is described in the eAppendix. We also reviewed the references of the included articles to identify additional eligible articles. Additionally, a second search was performed in PubMed, combining the names of the identified outcome instruments with the terms for the target population.
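The way the 3 main filters and the exclusion filter combine can be illustrated with a small query builder. This is a minimal sketch, assuming PubMed-style field tags (`[tiab]`, `[MeSH]`, `[pt]`); the function names and every search term are hypothetical placeholders, not the strategy reported in the eAppendix, and the validated measurement-properties filter is not reproduced.

```python
# Hypothetical sketch: AND the three main filters, then remove the exclusion
# filter with NOT. All terms are illustrative placeholders, not the eAppendix strategy.


def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(construct, population, properties, exclusion):
    """Combine construct, population, and measurement-property filters; exclude with NOT."""
    core = " AND ".join(or_block(block) for block in (construct, population, properties))
    return f"({core}) NOT {or_block(exclusion)}"


# Placeholder terms (assumed for illustration only).
construct = ['"gait"[tiab]', '"walking"[tiab]', '"Gait"[MeSH]']
population = ['"child"[tiab]', '"cerebral palsy"[tiab]']
properties = ['"reliability"[tiab]', '"Reproducibility of Results"[MeSH]']
exclusion = ['"case reports"[pt]']

print(build_query(construct, population, properties, exclusion))
```

Keeping each filter as a separate list mirrors the description above: each block can be developed and checked on its own before the blocks are combined.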
Study Selection
Screening of titles and abstracts, as well as review of potentially eligible full-text articles, was performed independently by 2 reviewers (C.A. and H.vH.). Disagreements were discussed until consensus was reached. To be included, studies had to meet the following a priori criteria: (1) the study evaluated capacity or performance measures related to gait function, thus being classified as activity or participation measures according to the ICF model; (2) the construct assessed was walking ability, functional mobility, speed, endurance, or dynamic balance, measured during walking or running (constructs measured in other ways, eg, cardiorespiratory fitness during bicycle ergometry, were excluded); (3) the study population consisted of 1- to 18-year-old children and youth with neuromuscular diagnoses or developmental disabilities; (4) the study aimed to evaluate the reliability, measurement error, or responsiveness of the measure in question; and (5) the article was published in English or German as an original article in a peer-reviewed journal. Semistructured interviews, multiple-item scales with less than 50% of the items being gait specific, and individualized tools not exclusively focusing on gait were excluded, as were articles reporting translations or transcultural adaptations of measures into languages other than English, German, or Dutch. Falls and static balance were not considered gait specific. Anaerobic and aerobic measures of pulmonary function, even when tested during walking, were considered body functions in the ICF model; thus, articles dealing only with these constructs also were excluded.
Data Extraction and Quality Assessment
We used a standardized protocol. Because no suitable software was available, we developed a database in Microsoft Access 2010 (Microsoft Corp, Redmond, Washington) to manage all data, including article selection, methodological quality scoring, and data extraction.
Evaluation of the Methodological Quality of the Included Studies
Two reviewers (C.A. and H.vH.) independently evaluated the methodological quality of the included studies using the 4-point rating scale of the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist. This instrument was developed in an international, multidisciplinary Delphi study to evaluate the methodological quality of studies on measurement properties of health status instruments.19–21 Within the COSMIN checklist, we used the 3 boxes evaluating the methodological quality of the studies on reliability, measurement error, and responsiveness. Whereas reliability is defined as the proportion of the total variance in the measurements that is due to “true” differences among patients, measurement error stands for the systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured.22 Responsiveness is defined as the ability to detect change over time in the construct to be measured.22 Each item within a box was rated as “excellent,” “good,” “fair,” or “poor.” An overall score for the methodological quality of the study was determined by taking the lowest rating of any of the items in a box. Cases of disagreement were discussed until consensus was reached or, where not possible, resolved by a third, independent reviewer (C.B.). In every COSMIN box, there is an item concerning the sample size requirements. For a “fair” rating of this item, a study has to have at least 30 participants. In the neuropediatric field, this criterion would lead to a “poor” rating for many of the psychometric studies that otherwise would be scored at least as “fair.” As this criterion would result in a loss of these articles for best evidence synthesis, we decided to use a modified COSMIN checklist and omit the sample size item from the quality assessment. 
Instead, in line with previous systematic reviews and after consultation with the developers of the COSMIN checklist, we accounted for sample size at the best evidence synthesis stage.17,23
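The “lowest rating counts” rule and the omitted sample size item can be expressed in a few lines. This is a minimal sketch of the modified scoring for a single COSMIN box; the item names and ratings are hypothetical and paraphrase, rather than reproduce, the checklist items.

```python
# Minimal sketch of the modified COSMIN box score: the overall rating is the
# lowest rating of any retained item, and the sample size item is dropped.
# Item names are hypothetical paraphrases, not the checklist wording.

RATING_ORDER = ["poor", "fair", "good", "excellent"]  # worst to best


def box_score(item_ratings, omit=("sample_size",)):
    """Return the overall box rating: the lowest rating among retained items."""
    retained = {item: r for item, r in item_ratings.items() if item not in omit}
    if not retained:
        raise ValueError("no items left to score")
    return min(retained.values(), key=RATING_ORDER.index)


# Hypothetical reliability box for a small neuropediatric study (n = 20).
ratings = {
    "sample_size": "poor",  # fewer than 30 participants
    "independent_measurements": "good",
    "stable_patients": "fair",
    "statistical_method": "good",
}
print(box_score(ratings))           # → fair (modified checklist)
print(box_score(ratings, omit=()))  # → poor (sample size item kept)
```

The example shows the consequence described above: with the sample size item kept, a small but otherwise adequate study drops to “poor” and would be lost for the best evidence synthesis.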
Data Extraction
General characteristics of the instruments and data on interpretability and generalizability of the study results were extracted using the 2 corresponding COSMIN checklist boxes. Part of the CanChild Outcome Measures Rating Form was completed to describe the clinical utility of the evaluated outcome measures in terms of test format, time for test administration, required assessor training, and costs (eTable).24 If essential information was missing in the original publication, the authors were contacted to provide these data.
Best Evidence Synthesis
Based on quality criteria proposed by Terwee et al,25 one reviewer (C.A.) rated the results of the measurement properties for each study as positive, indeterminate, or negative (Tab. 1). If studies evaluating the same outcome measure were sufficiently homogeneous concerning the study population, design, and measurement procedure, an overall rating, adjusted for methodological quality, was performed. The best evidence synthesis included only results from studies rated as being of “excellent,” “good,” or “fair” methodological quality on the COSMIN. The level of overall evidence was rated on the basis of the strategy from the Cochrane Back Review Group as “strong,” “moderate,” “limited,” “conflicting,” or “unknown” (Tab. 2).26
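The per-study rating step can be sketched as two small functions. This is a hedged illustration, assuming the reliability threshold commonly attributed to Terwee et al (an ICC or weighted κ of at least 0.70 for a positive rating) and a simplified agreement rule in which measurement error is rated positive when the smallest detectable change (SDC) is smaller than the minimal important change (MIC); Table 1 may contain further conditions. The sketch also shows why, without a reported MIC, measurement error can only be rated indeterminate.

```python
from typing import Optional


def rate_reliability(icc: Optional[float], adequate_design: bool = True) -> str:
    """Rate an ICC or weighted kappa; the 0.70 threshold is an assumption here."""
    if icc is None or not adequate_design:
        return "indeterminate"
    return "positive" if icc >= 0.70 else "negative"


def rate_measurement_error(sdc: Optional[float], mic: Optional[float]) -> str:
    """Rate agreement by comparing SDC with MIC (simplified, assumed rule)."""
    if sdc is None or mic is None:
        # No minimal important change reported: agreement cannot be judged.
        return "indeterminate"
    return "positive" if sdc < mic else "negative"


# Hypothetical values for illustration only.
print(rate_reliability(0.86))                      # → positive
print(rate_reliability(0.58))                      # → negative
print(rate_measurement_error(sdc=12.0, mic=None))  # → indeterminate
```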
Table 1. Quality Criteria for Measurement Properties
Table 2. Levels of Evidence for the Overall Quality of the Measurement Properties (Based on the Cochrane Back Review Group 2003)26
To account for sample size, the level of evidence was rated as “strong” when the total sample size of the combined studies was ≥100, “moderate” when it was between 50 and 99, “limited” when it was between 25 and 49, and “unknown” when it was fewer than 25 participants.23
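The sample size adjustment amounts to pooling the participants of all studies of fair or better quality and mapping the total to a level. This is a minimal sketch with hypothetical study data; it models only the sample size thresholds stated above, not the consistency criteria of Table 2.

```python
QUALIFYING = {"excellent", "good", "fair"}  # poor-quality studies are excluded


def pooled_sample_size(studies):
    """Sum participants over studies rated fair or better.

    `studies` is a list of (sample_size, cosmin_rating) tuples.
    """
    return sum(n for n, rating in studies if rating in QUALIFYING)


def level_from_sample_size(total_n):
    """Map the pooled sample size to a level of evidence."""
    if total_n >= 100:
        return "strong"
    if total_n >= 50:
        return "moderate"
    if total_n >= 25:
        return "limited"
    return "unknown"


# Hypothetical: three studies of the same measure in the same population.
studies = [(32, "good"), (41, "fair"), (18, "poor")]
total = pooled_sample_size(studies)
print(total, level_from_sample_size(total))  # → 73 moderate
```

Because the “poor” study is dropped before pooling, its 18 participants do not count toward the total.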
Role of the Funding Source
The authors gratefully acknowledge funding by the Mäxi Foundation, Zurich, Switzerland.
Results
Description of the Included Studies
The systematic search resulted in 2,467 references. After screening of titles and abstracts, 70 potentially relevant full-text articles remained. Finally, 42 studies with 27 different outcome measures (22 capacity measures and 5 performance measures) met the inclusion criteria (Figure). The characteristics of the studies are summarized in Tables 3 and 4. All available capacity and performance measurement tools, including their clinical utility, are shown in the eTable. Some studies evaluated measurement properties of different outcome measures and, therefore, are listed multiple times in Tables 3 and 4.27–35 Reliability was evaluated in 35 studies, measurement error in 13 studies, and responsiveness in 10 studies. Children with CP were the most frequently studied population (n=26 studies), followed by children with Duchenne muscular dystrophy (DMD) (n=3), Down syndrome (DS) (n=2), and acquired brain injury (n=2). Children with stroke, spina bifida, developmental coordination disorder, and spinal muscular atrophy were each represented in 1 study, whereas 5 studies evaluated measures of gait function in mixed patient groups. The Gross Motor Function Measure (GMFM) dimension E (n=7), gait speed (n=6), cadence (n=5), Timed “Up & Go” Test (TUG) (n=4), and 6MWT (n=4) were the most frequently evaluated measures of gait function.
Figure. Flowchart of the literature search and the selection of the studies.
Table 3. Characteristics of the Included Studies (Reliability and Measurement Error)
Table 4. Characteristics of the Included Studies (Responsiveness)
Measurement Properties
All results concerning the methodological quality of the studies are based on the modified COSMIN checklist without the sample size requirements. The methodological quality of 16 of the 35 studies evaluating reliability was rated as “good,” 16 as “fair,” and 3 as “poor” (Tab. 3). Reasons for a “poor” rating were no or sparse description of the study population and study procedures36 and flaws in the statistical analysis, judged against the criteria of the COSMIN checklist.37,38 Test-retest reliability (the consistency of a measure from one time to another) was evaluated 27 times, interrater reliability (the degree of agreement among different raters) 15 times, and intrarater reliability (the degree of agreement among multiple repetitions of a test performed by a single rater) 8 times. When the quality criteria for reliability coefficients in Table 1 were applied, most studies reported a positive result. Only 3 studies showed negative reliability results: 1 study evaluated gait speed of children with developmental coordination disorder on the GAITRite gait analysis system (CIR Systems Inc, Sparta, New Jersey),30 1 study investigated the gross motor domain of the Paediatric Stroke Activity Limitation Measure in children with stroke,39 and 1 study evaluated the balance subtest of the Bruininks-Oseretsky test, the full-turn test, and the TUG in children with DS.29 However, this latter study was rated as “poor” and was not taken into account in the best evidence synthesis.
Measurement Error
In the 13 studies concerning the measurement error, the quality was rated as “good” in 9 studies and as “fair” in 4 studies (Tab. 3). Information on measurement error was available for the 6MWT,27,28,41 the Shuttle Run Test,42,44 the 10×5-Meter Sprint Test,43 the 10-Meter Fast Walk Test,28 the Fast 1-Minute Walk Test,48 the GAITRite,31,46 the Community Balance and Mobility Scale,47 the Functional Walking Test,49 gait speed,50 the maximal speed during the Treadmill Walking Test,27 and the 28-item and 47-item Mobility Questionnaires (MobQues28 and MobQues47).32
Responsiveness
Of the 10 studies on responsiveness, 1 study was rated as “good,” 4 studies were considered “fair,” and 5 studies were rated as “poor” (Tab. 4). Reasons for a “poor” rating—based on criteria of the COSMIN checklist—were the use of inappropriate statistical methods in all 5 studies, the lack of a comparator instrument,33,34,65,66 and no clear description of the assessment protocol.33,34 The studies with a “poor” rating concerned the 6MWT,65 cadence and gait speed measured with the GAITRite34 and with 3-dimensional gait analysis,33 the dimension E of the GMFM,33,67 and the Functional Mobility Scale (FMS).66
Best Evidence Synthesis
Results of the best evidence synthesis are summarized in Table 5.
Table 5. Best Evidence Synthesis
Reliability
Given the large diversity of outcome measures and studied patient populations, results could rarely be combined. There was strong positive evidence for interrater reliability of the FMS in children with CP.63,64 Moderate positive evidence was shown for test-retest reliability of the 6MWT28,41 and the TUG57,69 in children with CP. Similarly, moderate positive evidence was found for intrarater reliability of the Functional Walking Test49 and the 28-item and 47-item Mobility Questionnaires32 as well as for interrater reliability of the Mobility Questionnaire in children with CP.32 Limited positive evidence for reliability in children with CP was available for the following instruments: the 10×5-Meter Sprint Test,43 the 10-Meter Fast Walk Test,28 the ABILOCO-Kids,45 cadence measured with the GAITRite,31,46 gait speed measured with a stopwatch,50 and the quality analog scale.55 Also, limited positive evidence for reliability was available for the Community Balance and Mobility Scale in children with acquired brain injury47 and the walking scale of the Functional Assessment Questionnaire in children with mixed neuromuscular diagnoses.62
Measurement Error
As no information on the minimal important change was reported in the included studies, the level of evidence concerning the measurement error of instruments remained unclear.
Responsiveness
Moderate negative evidence was available for the responsiveness of the dimension E of the GMFM in children with DS. However, when the child's caregiver was additionally asked, beyond the standard scoring procedure, about activities the child had demonstrated at home but failed to perform during the assessment (reported score), there was moderate positive evidence for responsiveness instead.51 Furthermore, limited positive evidence was available for responsiveness of the FMS63 and the dimension E of the GMFM53 in children with CP.
Discussion
The purpose of this systematic literature review was to evaluate the evidence on reliability, measurement error, and responsiveness of functional measures of gait in children and youth with neuromuscular diagnoses. We identified 42 eligible articles evaluating the measurement properties of 27 different measures of gait function. These measures covered various constructs, such as walking ability, functional mobility, gait speed, dynamic balance, and activity limitations, all of which are an integral part of gait function. Some authors also mentioned constructs such as aerobic fitness, power, and agility (eTable). However, although the authors interpreted their results as reflecting these constructs, all of the tests in question are essentially measures of mobility combining speed and endurance, as patients have to walk as fast as possible, as long as possible, or both.
Most of the studies were rated as “good” or “fair” on the 4-point rating scale of the modified COSMIN checklist without sample size, whereas 8 studies were judged as being of poor quality and had to be excluded from the best evidence synthesis. Quality problems were most common among responsiveness studies, 5 of 10 of which were classified as “poor.” According to the COSMIN manual, “the responsiveness issue is about whether the direction and magnitude of a correlation is similar to what could be expected based on the construct(s) that are being measured.”70 This statement implies that hypotheses on the relationship between different measurement instruments have to be stated and tested. This requirement was not fulfilled in any of the 5 studies with a “poor” rating. Because studies of poor quality were excluded from the best evidence synthesis, the North Star Ambulatory Assessment, an outcome measure specifically developed for children with DMD, was no longer represented in it.36
Measurement properties still need to be determined for most instruments and most diagnoses. Not surprisingly, most evidence was available for children with CP, as these children also represent the largest group in the neuropediatric field. Nevertheless, even for these children, no measurement tool had evidence reported for all 3 measurement properties evaluated in this review. The instrument with the most information on measurement properties was the FMS, with strong positive evidence for reliability and limited positive evidence for responsiveness in children with CP. The FMS is also one of the few performance measures among the predominantly capacity-based tools. Interestingly, far more studies evaluated reliability than measurement error. For evaluative measures, information on agreement, which is quantified by the measurement error, is much more valuable than reliability. Whereas reliability parameters depend highly on the variation in the population sample, agreement is more a characteristic of the measurement instrument itself. As such, it indicates whether a change in score represents a real change. Because the measurement error is expressed in the unit of measurement, it also facilitates the clinical interpretation of a change score.71
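The argument that reliability depends on the sample whereas agreement does not can be checked with a simple variance-components model. This is an illustrative sketch, assuming that reliability is the ratio of true-score variance to total variance (as defined in the Method section), that the standard error of measurement (SEM) is the square root of the error variance, and that the smallest detectable change (SDC) for an individual is 1.96 × √2 × SEM; all variances are hypothetical.

```python
import math


def reliability(var_true, var_error):
    """Share of the total variance due to true differences between patients."""
    return var_true / (var_true + var_error)


def sem(var_error):
    """Standard error of measurement, in the instrument's own units."""
    return math.sqrt(var_error)


def sdc_individual(sem_value):
    """Smallest detectable change for an individual: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem_value


VAR_ERROR = 25.0  # same instrument, hence the same (hypothetical) error variance

for label, var_true in [("heterogeneous sample", 400.0), ("homogeneous sample", 25.0)]:
    r = reliability(var_true, VAR_ERROR)
    s = sem(VAR_ERROR)
    print(f"{label}: reliability = {r:.2f}, SEM = {s:.1f}, SDC = {sdc_individual(s):.1f}")
```

With the same error variance, reliability falls from 0.94 to 0.50 when the sample becomes more homogeneous, whereas the SEM and SDC, both expressed in the instrument's units, stay the same.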
In general, results from different studies could rarely be combined. The GMFM dimension E, for example, was the most frequently represented instrument in this review, with 7 different publications. Nevertheless, its reliability remains undetermined because the patient samples were too small and heterogeneous to allow a synthesis of the different studies. The limited possibility of combining study results is a problem also faced by other authors of systematic reviews on measurement properties.17,23 The main reasons are heterogeneous patient populations, too much variation in test procedures, or only single publications per outcome measure. In this review, some studies with mixed patient groups were included. In cases where the predominant diagnosis was CP and other diagnoses accounted for only a small portion of the study population, we combined the results for the best evidence synthesis. This procedure concerned studies on the reliability of the TUG57,69 and of cadence measured with the GAITRite.31,46
Although different approaches exist to assess the quality of studies, the COSMIN checklist has increasingly been used in systematic reviews on measurement properties of health measurement instruments in the last few years, including some reviews in the pediatric field.12,14,17 This allows the quality ratings of studies and measurement properties covered by more than one review to be compared. Eight studies included in our review also were rated in the systematic review of Adair et al.14 When the scoring (with the inclusion of the sample size item) was compared, 4 studies33,53,54,64 achieved an identical score, 3 studies52,63,66 differed by 1 level, and the quality rating of 1 study62 remained unclear in the review of Adair et al. Reasons for the scoring differences are hard to identify, as Adair et al provided an overall score rather than a score per item. The ratings of Balemans et al12 are difficult to compare because they used an older version of the COSMIN checklist, which requires only a “yes” (adequate) or “no” (not adequate) rating, in contrast to the 4-point rating scale used in this review. Finally, the review of Bartels et al17 on measurement properties of the 6MWT yielded 5 studies for a possible comparison: 2 were scored identically in both reviews,41,65 whereas 3 differed by 1 level.27,28,40 The differences concerned at least 2 items per study as well as the overall rating of each study. No systematic discrepancy between the ratings could be detected in any of these studies. The interrater agreement and reliability of the COSMIN checklist have already been investigated.72 Nevertheless, conclusions have to be drawn with caution, as reliability has been determined only at the item level and without the 4-point rating option, which was developed only recently21 and was used in this review.
To increase interrater agreement, the developers of the checklist recommend gaining some experience with the checklist before conducting a systematic review and agreeing in advance on how to score items that require subjective judgment. Following these recommendations, we pretested 3 articles, discussed the problems that occurred, and defined a joint procedure for items requiring subjective judgment. Furthermore, we strictly adhered to the taxonomy and terminology of the COSMIN checklist. For measurement tools that also were evaluated in other reviews, there is consensus that positive evidence exists for the reliability10,14 and responsiveness14 of the FMS in children with CP, as well as for the reliability of the Functional Assessment Questionnaire in children with mixed neuromuscular diagnoses.10,14 Concerning the 6MWT, we found positive evidence for reliability only in children with CP, whereas Bartels et al17 also reported a positive quality rating for reliability in children with DMD and spina bifida. There also were discrepancies regarding the various shuttle run tests: we concluded that evidence for all evaluated psychometric properties is unknown, whereas Balemans et al12 found excellent positive evidence for reliability and measurement error in those tests.
Several reasons exist for the differences in the reported evidence levels. Not all authors accounted for the methodological quality of the included studies when summarizing the evidence,10,14 or requirements regarding the quality criteria for the measurement properties were not stated.14 Balemans et al rated the minimal important change of a measure based on their clinical experience when it was not stated in a study,12 whereas we rated the evidence as unknown in those cases. Furthermore, sample size requirements either had not been established10,12,14 or differed from those in our review.17
Study Limitations and Methodological Considerations
The COSMIN checklist was originally developed to evaluate the methodological quality of studies on measurement properties of patient-reported questionnaires. Nevertheless, these standards also can be applied to other outcomes, such as performance-based tests or rating scales. By excluding the sample size item from the quality assessment and accounting for it at the best evidence synthesis stage, we deviated from the standard COSMIN procedure. Given that sample sizes are usually small in neuropediatric studies, the available evidence would have been reduced by more than half without this approach. Two studies included in the best evidence synthesis were performed with mixed patient groups31,57 and were combined with studies specifically investigating children with CP. As the number of patients with a diagnosis other than CP was rather small and all children had a chronic, congenital condition, we assume that the results for the mixed patient groups closely reflected those of the children diagnosed with CP. If this assumption does not hold, however, the level of evidence for the outcome tools concerned (TUG and cadence measured with the GAITRite) could be biased. We decided to concentrate on reliability, measurement error, and responsiveness, as these are the most critical measurement properties for an evaluative measurement tool. As we did not include articles evaluating the validity of functional measures of gait, a comprehensive appraisal of the best available measure of gait function per diagnosis was not possible. By excluding articles published in languages other than English or German, we may have missed stronger evidence on the measurement properties or information on further measurement tools.
Although we stringently followed the COSMIN guidelines, we did not always agree with the defined standards. Concerning responsiveness, for example, the COSMIN criteria recommend formulating and testing hypotheses on the direction and magnitude of correlations between the change scores of 2 different measures. Hence, the use of effect sizes or a standardized response mean is considered inappropriate. Although this approach may be reasonable for health-related patient-reported outcomes, we question its clinical sense for performance-based measures, for which a gold standard hardly exists.
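To make the disputed criterion concrete, the following sketch computes the 2 distribution-based indices mentioned above, the effect size (mean change divided by the standard deviation of baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change scores). It then tests a COSMIN-style hypothesis on the correlation between change scores of the measure and a comparator. The data and the r ≥ 0.50 threshold are hypothetical assumptions chosen for illustration.

```python
import math
import statistics


def change_scores(baseline, follow_up):
    """Follow-up minus baseline for each child."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    return statistics.mean(change_scores(baseline, follow_up)) / statistics.stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return statistics.mean(changes) / statistics.stdev(changes)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hypothesis_confirmed(changes_measure, changes_comparator, min_r=0.50):
    """COSMIN-style check: is the change-score correlation at least as strong as hypothesized?"""
    return pearson(changes_measure, changes_comparator) >= min_r


# Hypothetical scores of 5 children before and after an intervention.
baseline = [10, 12, 14, 16, 18]
follow_up = [13, 14, 18, 19, 22]
comparator_changes = [2, 1, 3, 2, 4]  # change scores on a hypothetical comparator

changes = change_scores(baseline, follow_up)
print(f"ES  = {effect_size(baseline, follow_up):.2f}")
print(f"SRM = {standardized_response_mean(baseline, follow_up):.2f}")
print(f"r   = {pearson(changes, comparator_changes):.2f}, "
      f"hypothesis confirmed: {hypothesis_confirmed(changes, comparator_changes)}")
```

The effect size and standardized response mean need only the measure itself, whereas the hypothesis test requires a comparator whose expected relationship to the measure can be specified in advance. That requirement is the crux of the disagreement for performance-based tests.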
Implications for Clinical Practice and Research
The lack of evidence on the measurement properties of many measures assessed in this systematic review does not imply that these tools are not appropriate to assess gait function in children with neuromuscular diagnoses. Rather, it demonstrates the shortage of high-quality studies evaluating the measurement properties of many outcome tools that are already widely used in pediatric neurorehabilitation. In a first step, we recommend that rehabilitation specialists and researchers seek dialogue and find consensus on the most relevant measures of gait function and focus the testing of the measurement properties on these outcome tools. Until then, clinicians have to decide on an individual basis which tools are most relevant and feasible for their specific situation. For an integral picture of the child's walking abilities, capacity measures as well as performance measures should be represented. We believe that this review and the description of the tools' characteristics in the eTable are of help in this selection process.
Conclusions
Many different measures are used to assess gait function in children with neuromuscular conditions. Most of the tools give an indication of the child's capacity level, with only a few considering the performance level. There is moderate to strong evidence on reliability for several measures in children with CP (6MWT, TUG, FMS, MobQues28 and MobQues47, and Functional Walking Test). Positive evidence for responsiveness exists for the FMS as well as the GMFM dimension E in children with CP. Evidence on measurement error is completely lacking.
This systematic review highlights the urgent need for high-quality studies evaluating the measurement properties of evaluative outcome tools assessing gait function in children with neuromuscular diagnoses. To facilitate the combination of different study results for best evidence synthesis, clinicians and researchers should reach consensus on the most appropriate and relevant measures of gait function. Studies on responsiveness and measurement error, including information on the minimal important change, are especially desirable, as they allow a clinically relevant interpretation of a patient's change score. This latter issue is essential, as the success of future trials investigating the effectiveness of interventions aimed at improving gait will strongly depend on the use of appropriate outcome measures.
Footnotes
All authors provided concept/idea/research design. Ms Ammann-Reiffer provided writing. Ms Ammann-Reiffer, Dr Bastiaenen, and Dr van Hedel provided data collection. Ms Ammann-Reiffer, Dr Bastiaenen, and Dr de Bie provided data analysis. Ms Ammann-Reiffer, Dr de Bie, and Dr van Hedel provided project management. Dr Bastiaenen, Dr de Bie, and Dr van Hedel provided consultation (including review of manuscript before submission). The authors thank Dr Martina Gosteli from the main library of the University of Zurich for her comments on the literature search strategy and Dr Caroline B. Terwee for her valuable help and comments on the appropriate use of the COSMIN guidelines.
An oral presentation of the data from this study was given at the World Congress for NeuroRehabilitation; April 8–12, 2014; Istanbul, Turkey.
- Received July 17, 2013.
- Accepted March 14, 2014.
- © 2014 American Physical Therapy Association