Abstract
Background A paucity of information exists on the psychometric properties of several balance outcome measures. With the exception of the Modified Functional Reach Test, none of these balance outcome measures were developed specifically for the population with spinal cord injury (SCI). A new balance assessment tool for people with SCI, the Activity-based Balance Level Evaluation (ABLE scale), was developed and tested.
Objective The purposes of this study were: (1) to develop a scale capturing the wide spectrum of functional ability following SCI and (2) to assess the initial psychometric properties of the scale using a Rasch analysis.
Design A methodological research design was used to test the initial psychometric properties of the ABLE scale.
Methods The Delphi technique was used to establish the original 28-item ABLE scale. People with SCI at each of 4 centers (n=104) were evaluated using the ABLE scale. A Rasch analysis was conducted to test for targeting, item difficulty, item bias, and unidimensionality. An analysis of variance was completed to test for discriminant validity.
Results The Rasch analysis revealed a scale with minimal floor and ceiling effects and a wide range of item difficulty capturing the large scope of functional capacity after SCI. Multiple redundancies of item difficulty were observed.
Limitations All raters were experienced physical therapists, which may have skewed the results. The sample size of 104 participants precluded a principal component analysis.
Conclusion Development of an all-inclusive clinical instrument assessing balance in the SCI population was accomplished using the Delphi technique. Modifications of the ABLE scale based on the Rasch analysis yielded a 28-item scale with minimal floor or ceiling effects. Larger studies using the revised scale and factor analyses are necessary to establish unidimensionality and reduction of the total item number.
A spinal cord injury (SCI) is a sudden, catastrophic, life-changing event. An estimated 12,000 new cases of SCI occur each year in the United States,1 and more than 1.2 million individuals are living with an SCI in the United States.2 An SCI results in some degree of sensation or motor loss below the level of the lesion, producing a balance impairment that affects the injured individual's ability to participate in functional activities and activities of daily living.3
Balance is difficult to assess, yet it is essential to the evaluation process. Few quantitative measures exist to adequately assess balance in SCI. Studies utilizing forceplates and electromyography (EMG) capture changes in center of pressure and muscle activation patterns.4–7 Although these measures provide precise and quantitative data, the time, equipment costs, and expertise needed for reliable use and interpretation of such data preclude their widespread utilization in the physical therapy clinic.8 Clinicians often turn to clinical outcome measures such as the Modified Functional Reach Test (MFRT)9 and the Berg Balance Scale (BBS)10 as indexes of balance and postural control post-SCI.11
The MFRT9 was adapted from the standing Functional Reach Test12 in an effort to differentiate levels of injury severity in nonambulatory individuals following SCI. Reliability of this outcome measure has been established in both the motor complete9,13 and incomplete14 SCI populations. The MFRT is easy and quick to administer, requires minimal equipment and training to perform accurately, and can be used in both ambulatory and nonambulatory people. The test assesses sitting balance only in the anterior-posterior plane and thus does not provide a complete assessment of functional sitting abilities. Further research is needed to establish the interrater reliability, validity, and minimal detectable change in the SCI population.
The BBS is a 14-item scale originally designed to assess fall risk in the elderly population.10 Although the psychometric properties of the BBS have been established for a wide range of neurologic populations, only 2 studies have examined its reliability and validity in the SCI population.11,15 Although these studies correlated the BBS with several walking indexes, both studies demonstrated a ceiling effect with the BBS. Another drawback to the BBS in this population is that there is only one sitting balance item. Therefore, for people with SCI who are unable to stand and walk, a floor effect will be observed.
The Functional Independence Measure (FIM) is an 18-item test used to assess the amount of assistance a person requires with transfers, walking, and several activities of daily living.16,17 The FIM is widely used in inpatient rehabilitation settings to measure burden of care and improvement in functional mobility in the SCI population.16 Although balance is a component of functional mobility tasks, the FIM does not specifically test balance. A person can improve in the functional mobility items on the FIM by compensating for paralyzed body parts or using adaptive equipment. Knowing only that a person requires a certain amount of assistance to transfer or to walk 45.7 m (150 ft) does not provide the clinician with useful information for evaluating balance deficits. Therefore, the FIM is not a sensitive measure for assessing balance in the SCI population.
In summary, forceplates and EMG recordings for the measurement of balance are not available for use in the typical physical therapy clinic. There currently are no outcome measures specifically developed and validated to assess balance abilities in the SCI population throughout the full spectrum of functional recovery. Clinical outcome measures that are currently utilized are limited in scope and present significant ceiling and floor effects. Therefore, there is a need for the development of a new balance outcome measure specific to the SCI population. The purposes of this study were: (1) to develop an all-inclusive clinical instrument, the Activity-based Balance Level Evaluation (ABLE scale), to assess balance across the full spectrum of recovery in the SCI population and (2) to determine the initial psychometric properties of the ABLE scale using a Rasch analysis.
Method
Scale Development
The initial ABLE scale was written by the primary authors (E.M.A., K.J.H., and G.P.Z.) based upon an extensive review of the literature in conjunction with clinical experience in administering the BBS and the MFRT to clients with SCI. The initial ABLE scale consisted of 30 items, which tested balance in the domains of sitting, standing, and walking.
This initial ABLE scale was further developed and refined through the use of a Delphi technique that seeks consensus among a group of experts using a series of questionnaires.18 There were 2 rounds of the Delphi technique plus a round of advanced critique by a panel of SCI researchers and educators. Experts in all 3 rounds were physical therapists who had at least 5 years of physical therapist practice, at least 2 years of evaluating and treating people with SCI, and at least 2 years of administering the BBS. Twenty-four experts participated in rounds 1 and 2 and were recruited anonymously from the 14 Model SCI Systems and from the 7 centers of the NeuroRecovery Network (NRN) and the NeuroPT listserve, an electronic mailing list operated by the Neurology Section of the American Physical Therapy Association.
In round 1 of the Delphi study, the experts were presented with the initial ABLE scale online via Seton Hall University's ASSET survey program. All experts recruited for the study were given instructions on how to access the survey via the ASSET platform and were given 2 weeks to complete the survey. The experts were presented with each item of the ABLE scale and were asked several questions regarding the item, including the importance of including the item, clarity of the wording, appropriateness of the scoring, and feasibility of administering the item in a physical therapy clinic. Experts also were provided with the opportunity to offer suggestions on improving each item and the scale as a whole. Through this process, content validity, which ensures that the test is free from the influences of factors that are irrelevant to the purpose of the measurement,19 and item reliability (internal consistency), which reflects the extent to which items measure various aspects of the same characteristic and nothing else,19 were established.
The results from the first round were reviewed by the research team. Although there is no universally agreed-upon percentage of agreement for consensus, the literature suggests that 70% to 80% is considered a reasonable guideline, and it is highly recommended that this level be set prior to the data analysis.20,21
Using an 80% agreement requirement for an item to be modified or deleted, the ABLE scale was revised. Nineteen of the 30 items reached an 80% consensus, 8 items were modified, and 3 items were deleted. The revised scale, noting the items modified or deleted, was posted online via ASSET. Experts were contacted again either through the supervisors at the Model SCI Systems centers and the NRN centers or through the NeuroPT listserve. The second survey presented each item of the scale, and the experts were asked to answer the questions following any item that had been modified.
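The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the item names, the vote counts, and the `agreement` and `consensus_report` helpers are hypothetical and are not taken from the study's survey data. Only the 0.80 cutoff comes from the text.

```python
CONSENSUS_THRESHOLD = 0.80  # cutoff stated in the text


def agreement(votes):
    """Proportion of experts who endorsed the item (votes are True/False)."""
    if not votes:
        raise ValueError("no votes recorded for this item")
    return sum(votes) / len(votes)


def consensus_report(ratings, threshold=CONSENSUS_THRESHOLD):
    """Split items into those that reached consensus and those needing review."""
    retained, flagged = [], []
    for item, votes in ratings.items():
        if agreement(votes) >= threshold:
            retained.append(item)
        else:
            flagged.append(item)
    return retained, flagged


# Hypothetical panel of 24 experts rating two made-up items.
ratings = {
    "item_A": [True] * 21 + [False] * 3,  # 21/24 agreement, above the cutoff
    "item_B": [True] * 17 + [False] * 7,  # 17/24 agreement, below the cutoff
}
retained, flagged = consensus_report(ratings)
print("retained:", retained)
print("flagged for modification or deletion:", flagged)
```

Fixing the cutoff as a named constant before any votes are examined mirrors the recommendation that the consensus level be set prior to data analysis.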
Once the ABLE scale had gone through a 2-round Delphi review process with the clinical expert panel, a final review was conducted by an additional panel of 7 SCI researchers and educators to ensure that the scale would be appropriate for use in a clinical research setting. The latter panel of experts was asked to offer feedback on the clinical expert version of the scale developed via the Delphi process by responding to the relative importance and feasibility questions posed in the Delphi review process via ASSET. This final round resulted in what we considered a final Delphi review.
As a result of the 3 rounds of the Delphi technique, 3 items were removed from the scale, 1 item was added, and minor editorial changes were made. This process resulted in an ABLE scale with 28 items across the 3 functional domains of sitting, standing, and walking (Tab. 1).
Functional Activity Associated With Each Item of the Activity-based Balance Level Evaluation (ABLE Scale)
Participants
One hundred fifty-seven people were screened for inclusion, and a total of 104 individuals with SCI were included in this study.22 This was a sample of convenience, and participants were recruited from the inpatient and outpatient settings of Magee Rehabilitation Hospital, Philadelphia, Pennsylvania; Shepherd Center, Atlanta, Georgia; Kessler Research Center, West Orange, New Jersey; and Frazier Rehabilitation Institute, Louisville, Kentucky. Inclusion criteria specified that participants be at least 16 years of age and have a traumatic or nonprogressive, complete or incomplete SCI. Exclusion criteria, which disqualified 53 potential participants, included: inability to follow 2-step commands, need for a spinal stabilization device, spinal precautions that limit the ability to bend or rotate the thoracic or lumbar spine, and inability to tolerate upright supported sitting for at least 1 minute. We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research.
Procedure
To ensure standardization of the scoring and administration of the ABLE scale across the data collection centers, the primary investigator (E.M.A.) provided an in-person instructional session and responded via telephone call or e-mail to any concerns the therapists had regarding the administration and scoring of the ABLE scale.
All participants were asked for their consent to participate by the primary investigator or one of the designated physical therapists at the 4 data collection sites. Participants were tested on the ABLE scale in a single session in a quiet, designated area in each of the data collection sites. The ABLE scale was administered to each participant based upon the instructions for each item (eAppendix). The equipment used for testing was standardized across all centers according to the directions noted at the beginning of the scale. Participants were not allowed to use their personal wheelchair for items 7 and 8 and were asked to sit in a standard manual wheelchair provided by the clinic. Participants were positioned with their hips, knees, and ankles at 90 degrees in a wheelchair with a sling back to approximately scapular height and a solid seat. This positioning was done to prevent the influence of a participant's customized seating system on his or her balance. Participants who could complete only the sitting balance subscale were finished within 15 minutes, whereas those who could complete all 3 subscales required up to 45 minutes. The data for each individual were recorded in a standardized Excel spreadsheet (Microsoft Corporation, Redmond, Washington) and sent to the primary investigator. Data collection took place between May 2009 and November 2009. All participants were blinded as to the other participants in the study.
Data Analysis
After consensus was reached via the Delphi process, a Rasch analysis of scale scores was completed to further assess and develop the scale. Rasch analysis is a statistical model that can estimate the person “ability” and item “difficulty” of a measurement tool by comparing the responses of individuals with those of the entire sample.23 This model provides a method to analyze and improve a rating scale.24 Rasch analysis uses 2 values: the logit, which is the natural logarithm of the odds of a person being successful on a particular item, and fit statistics.23 Infit and outfit statistics determine how well raw data meet the requirements of the Rasch model.23 In the Rasch model, we would expect people with higher abilities to achieve higher scores on difficult items. People with lower abilities would be expected to score lower on difficult items. A Rasch analysis is used to test specific properties of a rating scale, including unidimensionality, item bias, targeting, and item difficulty. Unidimensionality, as measured by fit statistics, is the concept that all items on the scale are measuring the same construct, in this case, balance. Differential item functioning (DIF) tests for item bias by examining the estimates, or ability levels, for different groups of individuals.23 In this study, we tested for item bias across sex, age, and American Spinal Injury Association (ASIA) Impairment Scale (AIS) classification.25 Targeting reveals the range of difficulty of the items that correspond to the range in ability noted in the study population. It ensures that there are items that are appropriate to test every level of person ability.
Testing the item difficulty may reveal redundant items or items that appear to have the same level of difficulty.23 Using a Rasch analysis to test item difficulty allows for the items to be placed in a hierarchy.23 Furthermore, to determine whether any changes needed to be made to any items, each item's rating scale categories, or scoring levels, were examined using threshold ordering.
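For orientation, the logit relationship described above is usually written as follows for the simplest (dichotomous, pass/fail) form of the Rasch model. The symbols are notation introduced here for illustration and do not come from the study: $B_n$ is the ability of person $n$, $D_i$ is the difficulty of item $i$, and $P_{ni}$ is the probability that person $n$ succeeds on item $i$. Because the ABLE items have several ordered scoring categories, the analysis would rest on a polytomous extension of this form, which is not specified in the text.

```latex
\ln\!\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i
\qquad\Longleftrightarrow\qquad
P_{ni} = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}}
```

When ability equals difficulty, the log-odds are 0 and the probability of success is 0.5. Each additional logit of ability multiplies the odds of success by $e$ (about 2.72).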
Each item on the ABLE scale has distinct definitions for each rating scale category, so that a score of 1 on one item is not equal to a score of 1 on a different item.26 In order to correctly place the items on the scale according to level of difficulty, the rating scale categories need to be aligned. Pivot anchoring is a process of aligning these differently worded rating scale categories to assist in defining the difficulty of each item. Pivot anchoring consists of first assigning a point in each item's rating scale in which the categories represent passing or failing an item. For the ABLE scale, passing was defined as the ability to complete the specified task according to the item's instruction, without physical assistance or supervision. For example, passing item 1 was defined as “able to sit with posterior pelvic tilt for 2 minutes, independently,” or a score of 3, whereas passing item 6 was determined to be a score of 4. Using these definitions, pivot points were defined for each item's rating scale and are boldfaced in the ABLE scale (eAppendix). These passing points then are anchored to a common value for all items on the scale, and the item difficulties are recalibrated across the scale.26
A one-way analysis of variance (ANOVA) was performed to test the hypothesis that the person ability levels, or estimates, for 3 functional groups of wheelchair users, standers, and walkers were equal. Multiple comparisons were completed using the Bonferroni procedure. Descriptive statistics were used to analyze the demographic data, including age, sex, time since injury, severity of SCI, and functional level.
All demographic data and the ANOVA results were analyzed with the Statistical Package for the Social Sciences (SPSS), version 14.0 (SPSS Inc, Chicago, Illinois). The Rasch analysis was completed using WINSTEPS software, version 3.68.2 (Winsteps, Chicago, Illinois).
Results
Demographics
One hundred four participants were tested once on the ABLE scale. Table 2 summarizes the demographic characteristics. Participants were stratified into 3 distinct categories based upon functional ability. Individuals who were unable to stand or walk (n=42) were classified as “wheelchair users,” those who could stand for at least 10 seconds with minimal to no physical assistance (n=30) were classified as “standers,” and those who could ambulate at least 6.1 m (20 ft) without an assistive device or physical assistance (n=32) were classified as “walkers.”
Demographic Characteristics of the Participants With Spinal Cord Injury (SCI) (n=104)
Threshold Ordering
An examination of the category threshold measures, in logits, was made to identify any disordered thresholds (ie, response categories that were utilized in a manner inconsistent with the trait being measured). On the ABLE scale, response category 3 should have a higher logit value than response category 2, indicating that category 3 is more difficult than category 2. Table 3 displays the 5 items with disordered thresholds. In items 4, 10, 13, and 14, category 2 had a higher measure than category 3. In item 11, category 1 had a higher measure than category 2.
Items With Disordered Thresholds
Consequently, we reviewed these disordered thresholds to see what changes could be made. The response categories of 2 and 3 for item 4 were reversed, as this made sense clinically. This change resulted in an improved fit of item 4, as the outfit value improved from 0.50 to 0.85. Review of the other items with disordered thresholds determined that reversing the response categories did not make sense clinically. Therefore, we rewrote these response categories, and we are retesting these items in a follow-up study. The revised version of the ABLE scale is presented in the eAppendix. (See a video demonstrating selected items from the ABLE scale.)
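The check for disordered thresholds can be illustrated with a short Python sketch. It assumes that each response category of an item has a single measure in logits and that a higher category should carry a higher measure, as described above. The item names, the numeric values, and the `disordered_pairs` helper are hypothetical and are not the study's Table 3 data.

```python
def disordered_pairs(thresholds):
    """Return (lower_category, higher_category) pairs whose measures are out of order.

    `thresholds` maps a response category number to its measure in logits.
    """
    cats = sorted(thresholds)
    return [
        (lo, hi)
        for lo, hi in zip(cats, cats[1:])
        if thresholds[lo] >= thresholds[hi]
    ]


# Hypothetical category measures (logits) for two made-up items.
items = {
    "ordered_item": {1: -2.1, 2: -0.4, 3: 0.9, 4: 2.3},
    "disordered_item": {1: -1.8, 2: 0.7, 3: 0.2, 4: 2.0},
}
for name, measures in items.items():
    print(name, disordered_pairs(measures))
```

In the second made-up item, category 2 has a higher measure than category 3, which is the same pattern reported for items 4, 10, 13, and 14.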
Unidimensionality
Although recent studies suggest that unidimensionality should be determined through a combination of Rasch fit statistics and principal component analysis (PCA) residuals, our sample size was too small to conduct a PCA.27,28 The fit statistics reported here are rudimentary analyses of the unidimensionality of this scale. Table 4 shows the infit and outfit mean square values for all of the items of the ABLE scale. Two items, 7 (transfers) and 8 (seated wheelchair perturbations), were determined to have infit mean square values of >1.4, suggesting that these items may be measuring a construct other than functional balance. Items with an outfit mean square value of <0.6 are considered to be less efficient in measuring the construct. Although these items are not a threat to the validity of the scale, they may produce deceptively high reliability estimates. Seventeen items had outfit values of <0.6: 2 (seated forward reach), 4 (pick up object in sitting), 6 (posterior external perturbations in sitting), 9 (sit to stand), 11 (stand to sit), 13 (standing with feet together), 15 (standing forward reach), 18 (turn 180°), 19 (alternate step test), 21b (left single-leg stance), 22 (walking over level surface), 23 (walking with head turns), 24 (walking with change in direction), 25 (stepping over object while walking), 26 (walking with object in 2 hands), 27 (walking up/down stairs), and 28 (walking up/down incline). Items with an outfit value of >1.4 are a greater threat to validity and represent outliers. Four items had an outfit value of >1.4: 3a and 3b (seated lateral reach to the right and left), 7 (transfers), and 8 (seated wheelchair perturbations). Therefore, these items should be tested further using a factor analysis with a larger sample size.
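To make the infit and outfit mean squares more concrete, the following Python sketch computes both for a single pass/fail item under the dichotomous Rasch model and then applies the 0.6 and 1.4 cutoffs quoted above. It is a simplified illustration under stated assumptions: the ABLE items are polytomous and were analyzed in WINSTEPS, whereas the data, the function names (`rasch_prob`, `item_fit`, `flag`), and the dichotomous simplification here are all hypothetical.

```python
import math


def rasch_prob(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sum_sq_resid = 0.0  # sum of squared raw residuals
    sum_var = 0.0       # sum of model variances (statistical information)
    sum_z2 = 0.0        # sum of squared standardized residuals
    for x, b in zip(responses, abilities):
        p = rasch_prob(b, difficulty)
        var = p * (1.0 - p)
        sq = (x - p) ** 2
        sum_sq_resid += sq
        sum_var += var
        sum_z2 += sq / var
    infit = sum_sq_resid / sum_var      # information-weighted mean square
    outfit = sum_z2 / len(responses)    # unweighted, sensitive to outliers
    return infit, outfit


def flag(mean_square, low=0.6, high=1.4):
    """Classify a mean square using the cutoffs quoted in the text."""
    if mean_square > high:
        return "possible misfit"
    if mean_square < low:
        return "overly predictable"
    return "within range"


# Hypothetical data: six people responding to one item of difficulty 0 logits.
abilities = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
responses = [0, 0, 1, 0, 1, 1]
infit, outfit = item_fit(responses, abilities, difficulty=0.0)
print(f"infit={infit:.2f} ({flag(infit)}), outfit={outfit:.2f} ({flag(outfit)})")
```

Both statistics have an expected value of 1 when the data fit the model. Outfit weights every response equally, so an unexpected response from a person far from the item's difficulty moves it more than it moves infit.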
Mean Square Values for Each Item of the Activity-based Balance Level Evaluation (ABLE Scale)
Targeting and Item Difficulty
Rasch analysis places item difficulty and person ability along the linear continuum of a logit scale. The Figure is a person-item map that displays the item difficulty and person ability of the ABLE scale for 104 participants with SCI after pivot anchoring was applied. To the left of the dashed line are the person ability measures, and to the right of the dashed line are the item measures placed longitudinally by degree of difficulty in “passing” the item (see “Method” section). Each item is represented by its corresponding number on the ABLE scale (Tab. 1). The participants with the lowest balance ability are located at the bottom of the scale, whereas those with the highest ability are located at the top of the scale. Similarly, the easiest items are located at the bottom of the scale and the most difficult items are positioned at the top of the scale.
Person-item map for the 28 items of the Activity-based Balance Level Evaluation (ABLE scale) as tested on 104 individuals with spinal cord injury. Each “.” is one participant, each “#” is 2 participants.
Targeting compares the range of item difficulties with the range of person abilities. An extremely large range of abilities was identified in this sample, which reflects the wide range in abilities observed following SCI. A slight ceiling effect still existed, as there were no items to measure the one participant with abilities greater than 6 logits (Figure). There also was a slight floor effect, as there were no items to measure the one participant with an ability of less than −7 logits.
Analysis of item difficulty revealed that the most difficult item is 21a (right single-leg stance). The easiest item is 5 (scooting forward in a chair), which is located on the −7 logit. Several item redundancies were noted at the −1, 0, 2, and 3 logits (eg, 7 different items had a similar level of difficulty located on logit 0).
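The targeting and redundancy checks read off the person-item map can be approximated numerically. The Python sketch below is illustrative only: the person and item measures are invented, the helper names are hypothetical, and grouping items by the nearest whole logit is an assumption made here to mimic how items line up on the map, not the study's procedure.

```python
from collections import defaultdict


def targeting_summary(person_measures, item_measures):
    """Count persons lying beyond the hardest or easiest item (in logits)."""
    hardest = max(item_measures.values())
    easiest = min(item_measures.values())
    return {
        "ceiling": sum(1 for p in person_measures if p > hardest),
        "floor": sum(1 for p in person_measures if p < easiest),
    }


def redundant_items(item_measures):
    """Group items whose difficulties round to the same whole logit."""
    groups = defaultdict(list)
    for item, measure in item_measures.items():
        groups[round(measure)].append(item)
    return {logit: sorted(names) for logit, names in groups.items() if len(names) > 1}


# Hypothetical measures in logits (not the study data).
item_measures = {"i1": -7.0, "i2": -1.2, "i3": -0.9, "i4": 0.1, "i5": 0.3, "i6": 5.8}
person_measures = [-7.6, -3.0, 0.2, 2.5, 6.4]

print(targeting_summary(person_measures, item_measures))
print(redundant_items(item_measures))
```

A nonzero ceiling or floor count means that some people are more or less able than any item can measure. Several items sharing one logit suggests that some of them could be dropped without narrowing the range the scale covers.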
DIF
We used DIF to determine whether any items were biased according to sex, age, or AIS classification. The DIF effect sizes for sex and age were negligible and did not reach statistical significance for any item. When DIF was examined by AIS classification, 2 items (7 and 8) showed significant bias for the AIS C group (P<.05).
Discriminant Validity
A one-way ANOVA was conducted on the person estimates to assess whether the ABLE scale differentiates among the 3 distinct functional groups. The average person estimates were found to be different across groups (F(2,101)=258.37, P<.0001). Bonferroni post hoc comparisons performed at the .05 level of significance showed that the mean person ability for the “walker” group (X̅=3.64, SD=1.66, n=32) was significantly higher than for the “stander” group (X̅=−0.13, SD=1.04, n=30) and for the “wheelchair-user” group (X̅=−4.08, SD=1.54, n=42). The mean person ability for the stander group also was found to be significantly higher compared with the wheelchair-user group at the .05 level of significance.
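As a rough consistency check, the F statistic of a one-way ANOVA can be rebuilt from group means, standard deviations, and sample sizes alone. The Python sketch below does this with the rounded summary values reported above. The `anova_from_summary` function is a hypothetical helper written for this illustration, not part of the SPSS analysis, and because the inputs are rounded to 2 decimals the result should be close to, but not identical with, the reported value.

```python
def anova_from_summary(groups):
    """One-way ANOVA F statistic from (mean, sd, n) triples."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-groups sum of squares, recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (mean, SD, n) as reported in the text, in logits.
groups = [
    (3.64, 1.66, 32),   # walkers
    (-0.13, 1.04, 30),  # standers
    (-4.08, 1.54, 42),  # wheelchair users
]
f_stat, df1, df2 = anova_from_summary(groups)
print(f"F({df1},{df2}) = {f_stat:.1f}")
```

The degrees of freedom follow directly from 3 groups and 104 participants, giving 2 and 101.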
Discussion
The purpose of this study was 2-fold. First, our goal was to develop an all-inclusive clinical instrument to assess balance in the SCI population, which was accomplished via the Delphi technique. The second purpose was to determine what modifications needed to be made to the scale based upon the initial properties of unidimensionality, targeting, item difficulty, and item bias identified with the Rasch analysis. We also tested the scale's ability to discriminate among the 3 groups of participants stratified across functional ability.
Analysis of the fit statistics suggests that 2 items (7 and 8) measure a construct other than balance. These 2 items, along with items 3a and 3b, also had a high outfit value, which implies that they are outliers. Our small sample size precluded performing a PCA, and a follow-up study on a larger sample is warranted to determine whether these items should be removed from the scale.
Analysis of the item map (Figure) shows that the ABLE scale has an appropriate targeting range, with minimal floor and ceiling effects. In an attempt to further minimize the ceiling effect, an additional walking item (walking during perturbations) was added following this analysis. An appropriate targeting range is important, as there currently is no outcome measure that can capture the full spectrum of recovery in the SCI population. Datta et al29 found floor and ceiling effects with the BBS and suggested the development of a new balance scale for this population. The large spread of item difficulty will allow a clinician to use a single outcome measure with a patient throughout his or her entire recovery. For example, a patient in the acute phase of recovery who may just be regaining sitting balance can be assessed using the sitting balance subscale. As he or she progresses, not only can progression of sitting balance be tested, but standing or walking items, scored specifically for people recovering from SCI, also can be incorporated into testing.
The analysis of item difficulty revealed 4 logits in which there were multiple redundancies. Some of these redundancies may have been caused by disordered thresholds of 5 items, as well as by a decreased ability to discriminate among scoring criteria in several of the items. Upon completion of the analysis, scoring criteria were revised for the items with disordered thresholds, as well as for the items with outfit values of <0.6, to improve separation, clarity, and accuracy in scoring. Given the large number of items on the scale, we were not surprised to see some overlap in item difficulty levels. To address this overlap, further testing is needed on a larger sample to conduct a factor analysis, which would allow for reduction in number of items.
Analysis of item bias through DIF revealed only 2 items (7 and 8) with significant bias in individuals with AIS C classification. These 2 items may be unfairly difficult for this group of individuals. It is unclear whether the problem lies with the items themselves, as individuals with AIS C classification often have a complicated pattern of recovery and may be inconsistent in performance of functional tasks. However, as these items also had high infit and outfit statistics, they may be removed from a future version of the scale, after a PCA is completed.
One major strength of the ABLE scale is its ability to discriminate among individuals, not based on injury severity (eg, AIS classification), but by functional mobility levels. Several studies of the MFRT showed that it is able to discriminate among injury severities, but does not differentiate or correlate with functional mobility.9,13,30 The use of the ABLE scale will provide the clinician with a more detailed assessment of a client's balance abilities.
There were several limitations to this study. First, all of the participants were tested by raters who were experienced in administering balance assessments to the SCI population. It is unclear how these individuals might have been rated by physical therapists with less experience in balance assessment or the rehabilitation of people with SCI. The use of less experienced raters may have resulted in increased difficulty in distinguishing among the different rating scale categories for each item. As the purpose of this study was to determine what changes need to be made to the ABLE scale, experienced raters were specifically chosen so that reliable assessments of the participants could be made and would not influence the outcome of the study.
A second limitation was the sample size of 104 participants. Although this sample size has been shown to be appropriate for conducting a Rasch analysis of an outcome measure with 20 items, it precluded performing a PCA.22,26,27 Therefore, this study was only the first step in assessing the validity of this new instrument. In the future, a PCA will be completed on a larger sample of participants to further develop the unidimensionality of the scale and to ensure that all of the items on the ABLE scale measure balance, and not another related construct.
This study identified several weaknesses of the initial ABLE scale, and several of the redundant and poorly fitting items were rewritten to improve their clarity. This modified version of the ABLE scale is currently being tested on a larger sample in a multicenter format in order to conduct a PCA and reduce the total number of items of the scale. Once the PCA is completed, further research should be conducted to examine other psychometric properties. Intrarater and interrater reliability should be established for the ABLE scale in the SCI population using both experienced and novice clinicians. Concurrent validity of the ABLE scale with other currently utilized outcome measures, including the BBS and the MFRT, should be assessed. Finally, fall incidence and performance on the ABLE scale should be correlated to determine whether the ABLE scale has the sensitivity or specificity needed to predict fallers in the SCI population.
Conclusion
Currently, there is no clinical outcome measure that has been designed specifically to assess balance in the SCI population. This study was the first step in developing a scale that can assess balance across the full spectrum of recovery in this population. Although the Rasch analysis showed that the ABLE scale has an appropriate targeting range and discriminative ability, further study is needed to ensure that it is a unidimensional and valid scale.
Footnotes
Dr Ardolino, Dr Hutchinson, Dr Pinto Zipp, and Dr Harkema provided concept/idea/research design. Dr Ardolino, Dr Hutchinson, Dr Pinto Zipp, and Dr Clark provided writing and data analysis. Dr Ardolino and Dr Harkema provided data collection. Dr Ardolino provided project management. Dr Harkema provided study participants. Dr Clark and Dr Harkema provided institutional liaisons. Dr Hutchinson and Dr Clark provided consultation (including review of manuscript before submission).
The authors thank the staff and patients at Magee Rehabilitation, the Shepherd Center, Kessler Rehabilitation, and Frazier Rehabilitation for their time and participation in this study. They also thank the Balance Committee of the NeuroRecovery Network for their assistance in initiating the development of the ABLE scale.
Approval for the study was granted by the institutional review boards of Magee Rehabilitation Hospital, Shepherd Center, Kessler Research Center, Frazier Rehabilitation Institute, and Seton Hall University.
- Received August 12, 2011.
- Accepted May 3, 2012.
- © 2012 American Physical Therapy Association