Abstract
Background Children with cerebral palsy (CP) typically show muscle weakness of the lower extremities, which can be measured with the use of handheld dynamometry (HHD).
Objective The purposes of this study were: (1) to determine test-retest reliability and measurement error of isometric lower-extremity strength measurements in children with CP with the use of HHD and (2) to assess implications for measurement design.
Design A test-retest design was used.
Methods Fourteen children with hemiplegic (n=6) or diplegic (n=8) spastic CP (Gross Motor Function Classification System levels I–III), ages 7 to 13 years, were assessed for isometric strength on 2 separate days (occasions) with the use of HHD, with 3 trials per muscle group. The intraclass correlation coefficient, standard error of measurement, and smallest detectable difference (SDD) were calculated for different measurement designs.
Results Intraclass correlation coefficient values of single measurements for all muscle groups ranged from .70 to .90, and the SDD was large (>30%). Regarding measurement error, the largest source of variability was found for occasion. A 2-occasion mean decreased the SDD by 9% to 14%. For trials, averaging 3 trials instead of 2 yielded no meaningful further improvement in SDD. A measurement design of 2 trials–2 occasions was superior to the often-used approach of 3 trials–1 occasion.
Limitations The small sample size was the major study limitation.
Conclusions Handheld dynamometry is reliable and can be used to detect changes in isometric muscle strength in children with CP when using the mean of at least 2 trials. To further improve reliability, taking the average of 2 occasions on separate days is recommended, depending on group size and muscle group.
Children with cerebral palsy (CP) typically show muscle weakness of the lower extremities,1–4 which may lead to limitations in walking ability and other gross motor activities.1,2,5 The expected effect of muscle weakness on gross motor ability in CP has led to the concept that increasing lower-extremity muscle strength through specific strength training may improve motor activities in this patient group.1,2,5–7
Lower-extremity muscle strength can be measured with the use of handheld dynamometry (HHD). The instrument contains a force transducer, which enables manual assessment of isometric strength by holding the dynamometer rigidly and perpendicular to a person's body segment. Although HHD often has been applied in clinical and research settings, current literature contains little convincing evidence regarding its reliability in children with CP.
A recent study of the use of HHD for lower-extremity strength measurements in children with CP showed that the method provides only moderate inter-assessor reliability,8 resulting in the recommendation that the same assessor should carry out repeated measurements on a particular individual. Three other studies have estimated the intra-assessor reliability of lower-extremity strength measurements with the use of HHD in children with CP. Whereas 2 of these studies showed intraclass correlation coefficient (ICC) values >.8 for all muscle groups except for the knee flexors,9,10 the third study11 showed that the ICC exceeded .8 for only half of the assessed muscle groups. Although efforts were made to standardize the testing procedures (eg, applying stabilization, test position), measurement variability still prevents the accurate measurement of changes in individuals or small groups. Nonetheless, HHD is the most practical instrument for testing isometric strength in clinical practice, meaning that there is a need for recommendations aimed at improving reliability.
Studies of HHD have included a variety of statistical approaches to the assessment of reliability (ie, the extent to which scores for patients who have not changed are the same for repeated measurement12,13), in which the ICC was the most frequently used statistical parameter. The ICC reflects the extent to which a measurement device can differentiate among individuals,14 yet it is highly dependent on the heterogeneity of the population.15 Another statistic is the standard error of measurement (SEM) (ie, the systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured12,13). The SEM represents how far apart the outcomes of repeated measures are, expressed in the units of measurement,16 thereby providing important information on how well changes can be observed. The SEM allows the calculation of the smallest detectable difference (SDD), by use of 1.96 × sqrt(2) × SEM, an important statistic for clinicians evaluating changes in individuals. Because the SEM and ICC provide differing information about a test, it is important to report both when evaluating reliability.17
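The factor 1.96 × sqrt(2) in the SDD can be motivated with a short derivation. The sketch below assumes that the errors of two repeated measurements, x_1 and x_2, are independent and normally distributed, each with standard deviation SEM; these symbols are introduced here for illustration only.

```latex
\begin{align}
\operatorname{Var}(x_2 - x_1) &= \mathrm{SEM}^2 + \mathrm{SEM}^2 = 2\,\mathrm{SEM}^2 \\
\mathrm{SD}(x_2 - x_1) &= \sqrt{2}\,\mathrm{SEM} \\
\mathrm{SDD} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM}
\end{align}
```

Under these assumptions, a difference between two measurements larger than about 2.77 times the SEM would be expected by chance in fewer than 5% of cases when no true change has occurred.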
One approach to reduce measurement error (in addition to standardizing the measurement protocol and providing improved assessor training) is to take the average value of a number of trials, either within 1 measurement occasion or on separate measurement occasions. In the case of strength measurements, a common approach is to take the mean of 2 or 3 trials9,10 or to use the maximal value across trials.9,11 However, recommendations on the optimal number of trials do not yet exist, and given the large measurement errors associated with HHD strength measurements,11 such recommendations are needed. This situation raises important questions, including what number of trials is needed at each measurement occasion and whether averaging over separate occasions helps to decrease measurement error. Therefore, the purposes of our study were: (1) to investigate the sources of measurement variability of HHD strength measurements in diverse muscle groups of ambulant children with CP and (2) to use the results to determine the reliability and measurement error of these measurements, thereby assessing the implications for different measurement designs (ie, number of trials and occasions). These findings will allow us to make useful recommendations for clinical practice.
Method
Participants
Children with spastic CP were recruited from a school for children with physical disabilities in the Netherlands. Inclusion criteria for the study were: (1) age between 7 and 13 years, (2) sufficient cognitive ability to comprehend and adhere to simple instructions, as determined from the patient's medical record and confirmed by the physician, and (3) a Gross Motor Function Classification System (GMFCS) classification of level I, II, or III. Exclusion criteria were: (1) treatment with multilevel botulinum toxin injection <3 months before inclusion or (2) orthopedic surgery or selective dorsal rhizotomy <6 months before inclusion.
Procedure
Written informed consent was obtained from all participating children (12 years of age and above) and their parents. The test protocol consisted of lower-extremity isometric strength measurements. These measurements were taken at the same time of day, on 2 different test occasions separated by a 2- to 5-day period. It was assumed that strength does not change during this short period in children with CP. On the second occasion, the assessor did not have access to the strength measurements that were obtained earlier. All tests were performed by 1 assessor (physical therapist) who received training consisting of instructions on how to use the device and how to apply the standardized test position and stabilization procedures. In addition, practice trials were performed on separate days on volunteers without disabilities and children with CP, while the assessor received feedback about assessment performance.
Instrument
A Microfet handheld dynamometer (Biometrics, Almere, the Netherlands) was used for the assessment of isometric strength. This device has a force measurement range of 3.6 to 660 N, with a sensitivity of 0.4 N. Intra-assessor reliability of this instrument has been high (ICC >.80) when measuring people who were healthy.18 Evidence for the validity of a similar HHD to detect change in the lower force range was provided in patients with post-polio syndrome.19
Measurements
Handheld dynamometry was used to quantify isometric strength of the knee extensors, knee flexors, hip flexors, hip abductors, and ankle plantar flexors of the most involved leg. The muscle groups were tested in the order listed. Three trials were performed on each muscle group. To prevent muscle fatigue, a 30-second recovery period followed each trial, and 2 minutes of rest separated the evaluations of the muscle groups. One or 2 practice trials were used to familiarize the participants with the testing procedures before the actual strength measurements of each muscle group. The participants' positions, joint angles, placement of applied resistance, and locations of stabilization are shown in Table 1. A strap was used to stabilize the participant. The “make test” was used to measure isometric strength because of its superior reliability in children with CP compared with the alternative (the “break test”).8 In the “make test,” participants are asked to gradually apply maximal force against the dynamometer, which is held rigidly perpendicular to the body segment. Before the test, a standard instruction of “push as hard as you can” was given, and during the test the children were encouraged to apply maximal effort. Maximal strength was exerted for 3 to 5 seconds, at which point the examiner instructed the child to relax. Maximal isometric muscle strength was recorded for all measured lower-extremity muscle groups during each trial. Muscle strength was expressed in newtons per kilogram of body weight (N·kg−1).
Table 1. Muscle Test Positions
Data Analysis
Statistical analysis was carried out with the use of SPSS version 15.0 (SPSS Inc, Chicago, Illinois) for Windows. Reliability was assessed by use of the generalizability theory.14,16,20 This theory is based on the recognition that in any measurement situation, there are multiple sources of measurement variability.14 The strength of the generalizability theory, and the reason it was applied in the current study, is the potential to improve the reliability of a measurement by investigating the influence of those sources on the measurement in question.14 In the first stage, sources of variability are identified and estimated in a generalizability study (G-study).14 The effect of various conditions within the measurement design on reliability then can be investigated in a subsequent decision study (D-study).14
An analysis of variance was performed to determine the sources of measurement variability, by use of the restricted maximum likelihood method with 2 factors: trials (3 levels) and occasions (2 levels). These analyses yielded variance components attributable to the variability between subjects (var_s), trials (var_t), and occasions (var_o); the interactions of subjects and trials (var_st), subjects and occasions (var_so), and trials and occasions (var_to); and the residual error variance (var_sto,e).
First, reliability was assessed for 1 single measurement, which refers to generalization over trials and occasions (G-study). Based on the variance components that were estimated with this G-study, the ICC and 95% confidence interval, SEM, and SDD were calculated as14,17:
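In the usual absolute-agreement form of generalizability theory for a fully crossed subjects × trials × occasions design, these quantities take the form sketched below. This form is an assumption: it is chosen because the SEM is described above as covering both systematic and random error, and only the SDD expression is stated explicitly in the text. The symbol for the absolute error variance and the equation numbering are introduced here for illustration.

```latex
\begin{align}
\sigma^2_{\mathrm{abs}} &= \mathrm{var}_{t} + \mathrm{var}_{o} + \mathrm{var}_{st}
  + \mathrm{var}_{so} + \mathrm{var}_{to} + \mathrm{var}_{sto,e} \notag \\
\mathrm{ICC} &= \frac{\mathrm{var}_{s}}{\mathrm{var}_{s} + \sigma^2_{\mathrm{abs}}} \tag{1} \\
\mathrm{SEM} &= \sqrt{\sigma^2_{\mathrm{abs}}} \tag{2} \\
\mathrm{SDD} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \tag{3}
\end{align}
```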
The SEM and SDD were reported in the actual units of measurement and as a percentage of the group mean strength.
Second, to assess the implications for reliability when using different measurement designs (ie, averaging over k trials or k occasions), a D-study was performed in which the variance components of 1 single measurement that involve the averaged factor were divided by k.20 From these D-studies, we identified the optimal design (ie, combination of numbers of trials and occasions) for the different muscle groups to reduce the SEM (and consequently the SDD).
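The D-study step can be sketched in a few lines of Python. This is a minimal illustration, not the analysis performed in SPSS: it assumes the standard absolute-error formulation for a crossed subjects × trials × occasions design, the function name `d_study` is hypothetical, and the variance components in `components` are invented values, not those estimated in this study.

```python
from math import sqrt


def d_study(var, n_trials=1, n_occasions=1):
    """Return (ICC, SEM, SDD) for the mean over n_trials and n_occasions.

    `var` maps component names to single-measurement variance estimates:
    s, t, o, st, so, to, sto_e (subjects, trials, occasions, interactions,
    and residual error).
    """
    nt, no = n_trials, n_occasions
    # Each component is divided by the number of levels it is averaged over.
    error = (
        var["t"] / nt
        + var["o"] / no
        + var["st"] / nt
        + var["so"] / no
        + (var["to"] + var["sto_e"]) / (nt * no)
    )
    icc = var["s"] / (var["s"] + error)
    sem = sqrt(error)
    sdd = 1.96 * sqrt(2) * sem
    return icc, sem, sdd


# Hypothetical variance components (illustrative only, in (N/kg)^2).
components = {
    "s": 0.60, "t": 0.00, "o": 0.01,
    "st": 0.02, "so": 0.08, "to": 0.00, "sto_e": 0.06,
}

for nt, no in [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]:
    icc, sem, sdd = d_study(components, nt, no)
    print(f"{nt} trial(s), {no} occasion(s): "
          f"ICC={icc:.2f} SEM={sem:.3f} SDD={sdd:.3f}")
```

With components in which the subject-by-occasion interaction dominates, as in this invented example, adding an occasion lowers the SEM more than adding a trial does, which is the pattern a D-study is designed to reveal.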
Results
Fourteen children (9 boys, 5 girls) were included, 8 of whom were diagnosed with spastic diplegia and 6 with spastic hemiplegia (4 left-sided, 2 right-sided). The children had a GMFCS classification of level I (n=8), II (n=3), or III (n=3). Mean (SD) body height and weight of the 14 participants were 150.0 (18.6) cm and 40.6 (15.5) kg, respectively. Mean (SD) age was 10 years 2 months (2 years 4 months), with a range of 7 to 13 years.
All participants successfully completed 3 trials of isometric strength measurements on both occasions. Mean maximal isometric muscle strength values (N·kg−1) for the 5 lower-extremity muscle groups across the 3 trials and the 2 occasions are presented in Table 2. Mean isometric strength values ranged from 2.5 to 5.0 N·kg−1 on both occasions. The strongest muscle group was the knee extensors, followed by the ankle plantar flexors and the hip flexors. The knee flexors and hip abductors showed the least strength.
Table 2. Peak Isometric Strength Values (N·kg−1) Obtained With the Use of Handheld Dynamometry
Table 3 shows the variance components estimated in the G-study. Variation as the result of occasion was greater than variation as the result of trial for knee extension only, whereas the opposite was true for hip flexion. The interaction between occasion and subjects (indicating that variation as the result of occasion differs among subjects) showed the highest values. For the knee extensors and ankle plantar flexors, the interaction between subject and trial also caused variation in the measurement.
Table 3. G-Study Values of the Variance Components for the Muscles of the Lower Extremity
Table 4 shows the effect of measurement design on reliability and measurement error. The first row of the table represents a measurement design of 1 trial–1 occasion (G-study). The reliability and measurement error parameters for this design were calculated from the variance components presented in Table 3 (without dividing by factor k). Because there is no averaging over trials or occasions in this measurement design, the parameters reflect the reliability of 1 single measurement and show the lowest reliability. The D-study results shown in Table 4 indicate that the SEM is only slightly lower (0.3%–0.6% of the group mean strength) when averaging over 3 trials (third row) than when averaging over 2 trials (second row). Furthermore, averaging over 2 occasions (fourth row) reduces the SEM more than including additional trials in the average does. Taking the example of the knee flexors, reducing the SDD by 10% of the group mean strength requires an additional occasion. The third row of Table 4 corresponds to the measurement design of 3 trials–1 occasion, which is most commonly applied in clinical and research settings. With the use of this design, the ICCs were >.80 for all muscle groups except the hip flexors (ICC=.77), and SEM values were substantially higher for the ankle plantar flexors and the hip abductors (17.0% and 16.6% of the mean, respectively) than for the knee flexors (9.9%), hip flexors (12.0%), and knee extensors (11.3%).
Table 4. Intra-Assessor Reliability of Lower-Extremity Isometric Muscle Strength in Children With Cerebral Palsy, Using Handheld Dynamometry for a Single Measurement (G-Study, Row 1) and for Different D-Study Designs (Rows 2–6)
Discussion
This study evaluated the reliability of the use of HHD for lower-extremity muscle strength measurements in ambulant children with spastic CP. Our results show that reliability can be improved by averaging over 2 trials, with a third trial contributing little to a further increase in reliability. We also show that the largest improvement in reliability can be achieved by averaging over 2 occasions on separate days.
Both the ICC and SEM were used to express reliability. The ICC indicated that HHD is capable of differentiating among individuals. However, because of the dependency of the ICC on sample heterogeneity,15 this statistic is not always conclusive when the purpose is the determination of changes in muscle strength, as previously noted by Taylor et al.10 The SEM better serves this purpose because it provides information in the units of measurement, aiding interpretation, and because it is independent of sample heterogeneity. In addition, the SEM facilitates the interpretation of whether changes (eg, caused by an intervention) exceed measurement error.17 In the case of the most commonly applied measurement design, 3 trials–1 occasion (third row of Table 4), the lowest SEM was seen for the knee extensors and knee flexors (11.3% and 9.9%, respectively) and the highest for the hip abductors and ankle plantar flexors (16.6% and 17.0%, respectively). These values for the knee extensors and the knee flexors are lower than previously reported values of measurement error in children with CP, whereas the value for the hip abductors is comparable to earlier reports.9 These differences might be due to differing disease severity or to differences in test positions and standardization.
Averaging trials to improve reliability is common in longitudinal studies and in therapeutic settings. This practice is supported by the results of the current study, which demonstrates that averaging over 2 trials is the most efficient approach. When the SDD was expressed as a percentage of the group mean strength, values for the 2 trials–1 occasion design ranged from 29.1% for the knee flexors to 48.4% for the ankle plantar flexors. Averaging over 3 trials gave only slightly improved results (an SDD decrease of 1%–2% of the group mean strength). On the basis of these findings, we recommend 2 trials rather than 3, taking into account the condition of the child and the limited available time.
A greater improvement in measurement error can be achieved by taking the average of 2 trials on 2 different occasions, 2 to 5 days apart (ie, SDD decreases of 8.5%–13.8% of the group mean value). For example, the SDD of the ankle plantar flexors could be reduced from 48.4% to 34.8% by taking the average over 2 occasions. It is important to note that an additional measurement occasion is strongly recommended for both the ankle plantar flexors and the hip abductor muscles, because the SDD values of 1 measurement occasion are too large to detect individual changes, even when the average over 3 trials is calculated (>46%). Another important consideration is whether performing several strength measurements in several muscle groups on different days in children with CP is feasible, given the amount of time and effort required. Therefore, each particular situation calls for a trade-off between the required accuracy of the strength measurement and the demands placed on the patient.
Although averaging over 2 trials and 2 occasions improves measurement error, the detectable strength changes remain relatively large (>20.6% for knee flexors and >34.8% for ankle plantar flexors), and only changes that exceed these percentages can be defined as genuine change for a given individual. Because previous strength training studies in children with CP have reported lower extremity muscle strength increases of 11% to 74%,2,7,21 it can be concluded that HHD often will be insufficiently sensitive for the detection of individual strength gains.
This limitation may not be insurmountable, however, because evaluating strength gains in groups of children with CP rather than in individuals will decrease the SEM by a factor √n, where n is the number of children in the group.16,20 This approach will allow the positive effects of muscular strength training programs to be assessed more accurately and will allow strength changes as low as 11% to be detected. In an example from the present study, an SDD for hip abductors of 47.5% (2 trials–1 occasion) would require a group of ≥19 participants to detect a change of 11% from the group mean in muscular strength using HHD, because an SDD of 0.29 N·kg−1 (11%) requires the SEM to be 0.10 N·kg−1 in this example (see formula 3). To detect the same change in knee flexor strength, a group of ≥7 participants would be needed. Thus, in a sufficiently large group of children with CP, the SEM will be small enough to allow the detection of physiologically expected changes, and HHD becomes a reliable method for the detection of gains in isometric muscle strength.
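The group-size reasoning above can be sketched as a short calculation. The sketch assumes only the √n rule stated in the text, so the group-level SDD equals the individual SDD divided by √n; the function name `min_group_size` is hypothetical, and the result is rounded up to the next whole participant.

```python
from math import ceil


def min_group_size(sdd_individual, target_change):
    """Smallest n for which sdd_individual / sqrt(n) <= target_change.

    Both arguments must be in the same units (here: % of group mean strength).
    """
    ratio = sdd_individual / target_change
    # Solving sdd_individual / sqrt(n) <= target_change gives n >= ratio ** 2.
    return ceil(ratio ** 2)


# SDD values for the 2 trials-1 occasion design, as reported in the text.
print("hip abductors:", min_group_size(47.5, 11))
print("knee flexors:", min_group_size(29.1, 11))
```

The same function can be applied to the SDD of any other design or muscle group, which makes the trade-off between extra occasions and larger groups easy to explore.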
The interpretation of our results is subject to certain limitations. The group size in the current study was insufficient to allow an accurate judgment regarding the heteroscedasticity of the data,22 which may have implications for the analysis of reliability. In heteroscedastic data, variability depends on the magnitude of the variable mean, requiring that the data be logarithmically transformed before any analysis of the reliability. Because visual inspection of the data did not reveal heteroscedasticity in any of our outcome measures, we do not expect that heteroscedasticity will be present in a larger group. However, the absence of heteroscedasticity should be confirmed in future research with a larger sample of children. The small sample size, in general, is a limitation of the study.12 Nevertheless, our results provide a rationale for measurement design, which might be further investigated in a larger study sample. Our results can only be generalized to a population that resembles our study sample (eg, GMFCS levels I–III). An important subject for future research will be the further determination of the reliability of strength measurements in more defined populations, something that will be immediately useful in clinical settings.
In conclusion, our results suggest that HHD strength measurements are reliable in differentiating individual children with CP. Furthermore, HHD measurements are sufficiently sensitive to detect changes in isometric muscle strength at group level, when the mean of at least 2 trials is taken. By additionally taking the average of 2 occasions on separate days, the technique also is sufficiently sensitive to detect large changes in individuals (>20% for knee flexors and >35% for ankle plantar flexors). When attempting to detect smaller changes in individuals, a third measurement occasion should be considered. These findings can serve as guidelines for clinical practice, when considering measurement designs, at least when the same measurement procedures are planned. Nonetheless, the evaluation of each individual patient and each muscle group calls for a trade-off between the required reliability of the strength measurement and the demands placed on the patient.
Footnotes
All authors provided concept/idea/research design. Ms Willemse, Dr Brehm, Dr Scholtes, and Dr Dallmeijer provided writing and data analysis. Dr Scholtes, Ms Jansen, and Ms Woudenberg-Vos provided data collection. Dr Brehm, Dr Scholtes, and Dr Dallmeijer provided project management. Dr Brehm and Dr Scholtes provided study participants. Dr Scholtes provided facilities/equipment. Dr Brehm and Dr Dallmeijer provided consultation (including review of manuscript before submission).
The study protocol was approved by the Medical Ethics Committee of VU University Medical Center, Amsterdam, the Netherlands.
- Received February 29, 2012.
- Accepted March 20, 2013.
- © 2013 American Physical Therapy Association