Abstract
Background. Goniometric measurements of hemiplegic arm joints must be reliable to draw proper clinical and scientific conclusions. Previous reliability studies were cross-sectional and based on small samples. Knowledge about the contributions of sources of variation to these measurement results is lacking.
Objective. The aims of this study were to determine the interobserver reliability of measurements of passive range of motion (PROM) over time, explore sources of variation associated with these measurement results, and generate smallest detectable differences for clinical decision making.
Design. This investigation was a measurement-focused study with a longitudinal design, nested within a 2-arm randomized controlled trial.
Methods. Two trained physical therapists assessed 7 arm movements at baseline and after 4, 8, and 20 weeks in 48 people with subacute stroke using a standardized protocol. One physical therapist performed the passive movement, and the other read the hydrogoniometer. The therapists then switched roles. The relative contributions of several sources of variation to error variance were explored with analysis of variance.
Results. Interobserver reliability coefficients ranged from .89 to .97. The PROM measurements were influenced by error variance ranging from 31% to 50%. The participant × time interaction made the largest contribution to error variance, ranging from 59% to 81%. Smallest detectable differences were 6 to 22 degrees and were largest for shoulder movements.
Limitations. Verification of shoulder pain and hypertonia as sources of error variance led to a substantial number of unstable variance components, necessitating a simpler analysis.
Conclusions. The assessment of PROM with a standardized protocol, a hydrogoniometer, and 2 trained physical therapists yielded high interobserver reliability indexes for all arm movements. Error variance made a large contribution to the variation in measurement results. The resulting smallest detectable differences can be used to interpret future hemiplegic arm PROM measurements with more confidence.
Of the 15 million people who have a stroke each year worldwide, between 77% and 81% of the survivors have a motor deficit in the extremities.1 The affected arm remains without function in almost 66% of survivors,2,3 rendering it inactive and immobilized. In recent years, several interventions believed to improve motor recovery or limit the development of secondary impairments in the paretic or paralyzed arm after stroke have been evaluated.4,5
To assess the arm function of patients with stroke during rehabilitation and in clinical research, physical therapists regularly assess passive range of motion (PROM) of joints by means of goniometry. In particular, the degree of passive shoulder external rotation and abduction and wrist extension are commonly used as outcome measures to evaluate the effects of interventions.6–13 Reliable measurement of PROM is therefore an important prerequisite for the interpretation of study results.
The reliability of arm range-of-motion measurements is good in people who are healthy14,15 and in patients with orthopedic conditions,16,17 but these findings cannot be generalized to patients with stroke because stroke-specific impairments may influence reliability. Over time, many patients develop contractures10,18 and hypertonia,19,20 especially in shoulder internal rotators and wrist flexors. Many patients also develop shoulder pain, a condition strongly associated with restricted range of motion.21,22 The aforementioned factors may hinder a therapist's attempts to move the hemiplegic arm, hence increasing the chance of making measurement errors. Such errors also may be increased if PROM measurements are obtained by only 1 therapist because it is difficult to handle a paralyzed arm and the goniometer and read the measurement simultaneously. Goniometric measurements of arm joints reflect both the true range of motion of a joint and measurement errors caused by different sources of variation. Identifying and quantifying these sources are important for finding strategies to reduce their influence on outcomes.23 In addition, to ensure accurate clinical interpretation of joint PROM measurements and changes in these measurements over time during poststroke rehabilitation or research, PROM measurements should be studied in the context of these sources of variation.
In previous studies of arm PROM reliability in patients with stroke, sample sizes have not exceeded 18 people.24,25 To our knowledge, research into factors that may influence hemiplegic arm PROM measurements is also lacking. During a randomized controlled trial (L. de Jong, P. Dijkstra, J. Gerritsen, et al, unpublished data, 2012), 2 physical therapists (hereafter referred to as “observers”) assessed arm joint PROM in 48 people on 4 occasions over 20 weeks. This design presented us with the opportunity to explore interobserver reliability, analyze the contributions of sources of variation to the measurement results, and calculate smallest detectable differences (SDDs). We chose to use 2 observers because we hypothesized that doing so would result in fewer measurement errors than using 1 observer only and because a similar measurement procedure previously yielded high reliability indexes.25
Method
As part of a randomized clinical trial investigating an arm intervention for people with subacute stroke and poor arm recovery, we used an existing measurement protocol that was specifically designed for measuring the PROM of 7 arm movements. All participants gave written informed consent before participation.
Participants
Participants were recruited from 3 Dutch rehabilitation centers between August 2008 and September 2010. All admitted participants were initially screened by a physician to check the following inclusion criteria: first-ever stroke or recurrent stroke (except for subarachnoid hemorrhages) between 2 and 8 weeks after the initial stroke, age of 18 years or older, paralysis or severe paresis of the involved upper limb (Brunnstrom stage of recovery of <4,26 as judged by the physician), and no planned date of discharge within 4 weeks. Participants meeting these criteria were referred to a research physical therapist, who excluded those with any contraindications for electrical stimulation, preexisting impairments of the affected arm (eg, frozen shoulder), severe cognitive deficits or language comprehension difficulties or both (<3/4 correct verbal responses or <3 correct visual analog scale scores on the AbilityQ27), and moderate to good arm motor control (scores of >18/66 on the Fugl-Meyer Assessment arm section28). After eligibility was confirmed, half of the participants were randomized to an experimental group, and half were randomized to a sham intervention group (L. de Jong, P. Dijkstra, J. Gerritsen, et al, unpublished data, 2012).
Observers
The 2 observers (both senior physical therapists) had 14 and 27 years of experience, respectively, across a wide range of diagnoses, including stroke. Before the trial, the observers were trained in obtaining the measurements using a detailed measurement protocol (the protocol, in Dutch, is available from the first author). They pretested the protocol on 3 participants with stroke. The observers had no pretrial experience with a hydrogoniometer and were not involved in the design of the study or the treatment of the participants.
PROM Measurement Procedure
All PROM measurements were obtained with a masked fluid-filled hydrogoniometer (MIE Medical Research Ltd, Leeds, United Kingdom). The measurement procedure was similar to the one described in detail in an earlier publication25 but was expanded to include wrist extension assessments. Each participant was independently assessed by the 2 observers at baseline and after 4, 8, and 20 weeks. Each time, 1 observer carried out the passive movement, and the other observer read the goniometer. The observers then switched roles. They were unaware of each other's results because they used separate score sheets and were instructed not to discuss or mention the values found. The measurement sequence was as follows: shoulder external rotation, shoulder flexion, and elbow extension with the participant in the supine position and then shoulder abduction, forearm supination, and wrist extension with and without finger flexion while the participant sat on an adjustable plinth with the back supported. The observers carried out all measurements in the same fixed order.
Data Analysis
The variance components and their 2-way interactions were calculated for the measurement conditions of participants (n=48), time (4 assessments over time), and observers (n=2) by analysis of variance (type III sum of squares). Initially, the allocated intervention was also included in the calculation of variance components. However, for shoulder PROM, the variance component for intervention could not be estimated, indicating a redundancy. We therefore decided not to include intervention in the calculations of variance components. In case of missing data (eg, because of participant dropout or vacation taken by 1 of the observers), only data from participants who were assessed by both observers were used in the analysis.
Error variance was calculated as the sum of all variances minus participant variance. The relative contributions of the sources of variation to this error variance were expressed as percentages. The agreement between the PROM ratings of the observers was calculated (see Streiner and Norman23[p159] for formulas) by means of interobserver reliability coefficients and accompanying 95% confidence intervals (CIs). Because the reliability coefficients alone did not indicate the magnitude of disagreement between the observers, the standard errors of measurement (SEMs) [
Role of the Funding Source
This study was funded by a grant from Fonds Nuts Ohra (main study, project SNO-T-0702-72) and Stichting Beatrixoord Noord-Nederland. Both funding sources had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Results
Figure 1 shows the flow of participants through each stage of the trial. The characteristics of the 48 participants are shown in Table 1. In general, they had restrictions in PROM for all 7 arm movements, especially shoulder movements. They had a median score of 5.5 on the arm section of the Fugl-Meyer Assessment.
Figure 1. Flow of participants through each stage of the trial from initial screening by physician to follow-up measurement. *If a participant was excluded for more than 1 reason, then all reasons were reported separately. †Five participants were assessed by 1 observer only. ‡One participant missed the 4-week assessment because of poor weather conditions. §Four participants were assessed by 1 observer only, and 1 participant was not assessed at 8 weeks because of temporary admission to a hospital. ∥One participant was assessed by 1 observer only.
Table 1. Baseline Characteristics of the 48 Participants
Figure 2 shows the separate variance components for the results obtained from shoulder external rotation as an example. The contribution of error variance (Tab. 2) to total variance ranged from 31% (wrist extension with flexed fingers) to 50% (supination). The interaction of participant and time made the largest contribution to error variance, ranging from 59% (forearm supination) to 81% (elbow extension). Time made a smaller contribution to error variance, especially for shoulder movements (17%–24%) and forearm supination (19%). Time did not contribute to the variance in the elbow joint. The interaction between participants and observers contributed only marginally to error variance (0%–4%); the same was true for the main effect of observers (0%–2%). Residual (unexplained) variance contributed between 7% and 17% to error variance, and this contribution was generally lowest for shoulder movements. Table 3 shows the overall interobserver reliability coefficients (and 95% CIs) and SEMs and SDDs in both single sessions (“observers”) and overall for the 7 arm movements.
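As a minimal sketch of the arithmetic behind these percentages, the Python snippet below applies the definition given under Data Analysis (error variance is the sum of all variance components minus participant variance) and expresses each remaining component as a share of error variance. The function name, dictionary keys, and numeric values are hypothetical and chosen only for illustration; they are not the study's estimates, which are reported in Table 2. Setting negative estimates to 0 follows the handling described under Limitations.

```python
def error_variance_breakdown(components, participant_key="participant"):
    """Split variance components into participant and error variance.

    `components` maps a source of variation to its estimated variance.
    Negative estimates are set to 0 before any sums are taken.
    """
    clipped = {name: max(float(value), 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    # Error variance: everything except the participant main effect.
    error = total - clipped[participant_key]
    if error <= 0:
        raise ValueError("error variance must be positive to compute shares")
    shares = {
        name: 100.0 * value / error
        for name, value in clipped.items()
        if name != participant_key
    }
    return {
        "total": total,
        "error": error,
        "error_pct_of_total": 100.0 * error / total,
        "shares_pct_of_error": shares,
    }


# Hypothetical variance components (degrees squared), NOT study data.
example = {
    "participant": 60.0,
    "time": 6.0,
    "observer": 0.5,
    "participant x time": 26.0,
    "participant x observer": 1.0,
    "time x observer": 0.5,
    "residual": 6.0,
}

result = error_variance_breakdown(example)
print(f"error variance as % of total: {result['error_pct_of_total']:.1f}")
for source, pct in result["shares_pct_of_error"].items():
    print(f"{source}: {pct:.1f}% of error variance")
```

With inputs of this shape, the shares of the error components sum to 100%, which is how the percentages for time, observers, the interactions, and residual variance in this paragraph should be read.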
Figure 2. Variance components of shoulder external rotation. Total variance (left circle) comprised participant variance (main effect) and error variance. Several sources contributed to error variance. These sources (right circle) comprised main effects (time and observer), interaction effects (participant × time, participant × observer, and time × observer), and residual variance, all expressed as percentages of error variance.
Table 2. Estimated Variance Components and Their Contributions (in Percentages) to the Error Variance of Repeated Measurements of 7 Arm Movements (n=48)
Table 3. Interobserver Reliability Coefficients (and 95% Confidence Intervals), Standard Errors of Measurement (SEMs), and Smallest Detectable Differences (SDDs)
Discussion
When different observers independently assess a joint range that does not change over time, interobserver reliability generally will be good provided that standardized protocols17 are used and the observers are trained.29 In addition to common sources of measurement variation, the development of contractures, hypertonia, and shoulder pain may complicate and negatively influence the reliability of PROM measurements in patients after stroke. We found that PROM assessment with a standardized protocol, a hydrogoniometer, and 2 trained observers yielded high interobserver reliability indexes (.89–.97) for 7 arm movements. We also found that error variance made a large contribution (31%–50%) to the variation in measurement results, with the participant × time interaction being the largest source of variance. The SDDs ranged from 6 to 22 degrees and were largest for shoulder movements.
Interobserver Reliability
The interobserver reliability of the 2 observers was high for all 7 arm movements. These results are in concordance with previous findings.25,30 The reliability coefficient for shoulder abduction (.97) was higher than previously reported values (intraclass correlation coefficients=.84–.87),25 and the reliability coefficient was lowest for forearm supination (.89). Supination intraclass correlation coefficients were lower than previously reported values (.94–.98),25 but the accompanying 95% CIs were wider (.84–.98). Because all of our measurements were obtained with the same measurement protocol,25 the values that we obtained may have resulted from the use of a larger sample. Differences in sample size may also explain the narrower 95% CIs (.89–.95) for elbow extension measurements in the present study than in a recent study (.68–.97)24 of 13 patients with stroke and elbow flexor spasticity. Because larger samples generally yield more precise estimates of reliability coefficients (indicated by narrower CIs and smaller SEMs), the results of the present study can be interpreted with more confidence than the results of previous studies.
To our knowledge, the reliability of wrist movements has not been reported in patients with stroke. We found that the assessment of wrist extension revealed slightly higher reliability coefficients and slightly lower SEMs when the fingers were flexed instead of extended. The long finger flexors typically show increased resistance to passive stretch (hypertonia), possibly partly because of the rapid development of wrist flexor contractures.10,31 This condition occurs especially in patients with limited arm function and clearly applied to our participants. Therefore, wrist flexor hypertonia or contracture may have had a slight negative influence on the reliability of the assessments of wrist extension with extended fingers. This hypothesis is supported by the fact that residual variance (to which wrist flexor hypertonia or contracture may also have been a contributing factor) accounted for 16% of the error variance of the PROM measurements; when the fingers were flexed, the value was 13%. In conclusion, the resulting high reliability coefficients suggested that our standardized measurement protocol may be of use for other observers under comparable circumstances.
Variance Components
While assessing 7 arm movements on 4 occasions during a 20-week time period, we found that the participants in our sample were the largest source of variance. This finding indicates that the participants could be distinguished on the basis of their arm PROM; they had a large variety of arm joint ranges. Error variance explained between 31% and 50% of total variance in the PROM values. Overall, time and the participant × time interaction were responsible for more than 78% of the variation in measurement results, with the participant × time interaction contributing the most. This interaction effect indicates that the effects of time on PROM of the arm were different in different participants, in accordance with clinical observations. In some participants, PROM increased over time probably as a result of natural neurological recovery or rehabilitation, whereas in other participants, PROM may have decreased over time as a result of contracture formation. The main effects of time and observers did not contribute to the variation in the results for elbow extension PROM. For the latter, the participant × time interaction (81%) and random variance (17%) made large contributions to error variance. Clinically, this finding indicates that over time, elbow extension developed quite differently in the participants.
Observers contributed only marginally to the variation in measurement results, with a maximum of 4% (forearm supination). This finding indicates that the differences between the values obtained by the 2 observers were small, resulting in high interobserver reliability coefficients. The fact that 1 observer performed the passive movement and the other positioned and read the goniometer may have led to this finding. On the basis of these results, we argue that arm PROM assessments with a hydrogoniometer in patients after stroke should be performed by 2 observers. Clinically and economically, however, assessments by 2 observers may not always be practical or feasible.25 Therefore, clinical and economic arguments must be weighed against scientific arguments (reliability) in each situation. Further research is needed to analyze the influence of the number of observers on measurement results. Residual variance in the PROM measurements in our sample may be explained partly by random variations in PROM over time within a participant but may also have been caused by random variations in the force applied by the observers or the alignment of the hydrogoniometer between measurements.
SDDs
Overall, the SDDs ranged from 6 degrees to 22 degrees and were largest for shoulder movements. Taking shoulder external rotation as an example, these data mean that a change of 17 degrees or more over a period of 20 weeks (overall SDD) represents a change in PROM with 95% certainty. Physical therapists and clinicians can use the overall SDD to evaluate their patients' changes in arm PROM between admission and discharge. Similarly, researchers can use them to interpret changes in participants in clinical trials. The SDDs obtained in single sessions by our 2 observers also may serve another purpose. Taking elbow extension as an example, our results show that a difference of more than 3 degrees between 2 observers in 1 session indicates a significant difference in their measurements with 95% certainty.
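For readers who want to relate an SDD to its underlying SEM, the relation below is the one commonly used in the measurement literature. It is shown as an assumption, because the exact formula applied in this study is not reproduced in the text above; the values actually obtained are those in Table 3.

```latex
% Commonly used definition (an assumption here, not confirmed by the text above):
% the SDD is the 95% limit for the difference between 2 measurements,
% each of which carries its own measurement error (hence the factor \sqrt{2}).
\[
  \mathrm{SDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM}
\]
% Illustration only: under this relation, an overall SDD of 17 degrees
% (shoulder external rotation) would correspond to
\[
  \mathrm{SEM} \approx \frac{17^{\circ}}{2.77} \approx 6^{\circ}
\]
```

Under this assumption, an observed change is read as real change at the 95% level only when it is at least as large as the SDD; smaller changes cannot be distinguished from measurement error.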
In stroke research, the Modified Tardieu Scale32 is increasingly being used to differentiate muscle contracture from spasticity. Because this scale relies partly on PROM measurements, the SDD can be used as a threshold value that must be exceeded to ascertain with 95% confidence that the angles between R1 (“catch”) and R2 (“end range”) are significantly different and that spasticity is indeed present. Similarly, the overall SDD for elbow extension (7°) can be used to indicate significant changes in elbow PROM over longer periods of time. Comparing our SDDs with those reported in the literature24,25 is hindered partly by the influence of sample sizes on SEMs (larger samples produce smaller SEMs) and therefore SDDs (smaller SEMs produce smaller SDDs). Because of our larger sample, our data can be used to interpret differences or changes in PROM with more confidence.
Limitations
An important limitation of the present study is that half of our participants were allocated to a combination intervention consisting of static muscle stretch and electrical stimulation. Although the results of this intervention were not significantly different from those of a sham intervention and the variance component for intervention could not be estimated, we cannot rule out the possibility that the development of the outcomes over time was confounded by the intervention and therefore that the intervention contributed to residual variance. Initially, we also tried to verify whether shoulder pain and hypertonia of shoulder internal rotators, elbow flexors, and wrist flexors were sources of error variance. However, adding these variables to the statistical analysis led to a substantial number of unstable variance components. Therefore, we chose to analyze a simpler model. The best-fitting model was subsequently applied to all other arm movements by setting all negative variances to 0. Future research is needed to verify which factors are actually responsible for random variance, for example, by comparing patients with and without contractures, hypertonia, and pain. Another limitation is that, despite pretrial training, we cannot say for certain whether the competence of our 2 observers had any influence on the study results.
We selected people with stroke and poor recovery of arm motor control. A median score of 5.5 on the Fugl-Meyer Assessment arm section at about 6 weeks after stroke means that a patient typically shows only hyperreflexia or (partial) mass synergy patterns, which are usually dominated by shoulder internal rotation and elbow and finger flexion, at best. Although our results can be generalized only to similar groups of patients, such patients represent about 36% to 52% of those with subacute stroke between 2 weeks and 3 months after stroke.19 Finally, our results may also be indicative of intraobserver reliability because it is generally recognized that intraobserver reliability is bound to be higher than interobserver reliability.23
Footnotes
- Mr de Jong and Dr Postema provided concept/idea/research design and fund procurement. All authors provided writing and data analysis. Mr de Jong provided data collection and project management. Dr Dijkstra and Dr Postema provided institutional liaisons. The authors thank all of the study participants. Special thanks go to observers Ank Mollema and Marian Stegink.
- This study was approved by the Medical Ethics Committee of the University Medical Center Groningen (project METc 2008.107).
- The main randomized controlled trial is registered at the Dutch Trial Register (Unique Identifier: NTR1748) (available at: http://www.trialregister.nl/trialreg/index.asp).
- Received September 1, 2011.
- Accepted April 27, 2012.
- © 2012 American Physical Therapy Association