Abstract
Background The Balance Evaluation Systems Test (BESTest) and Mini-BESTest are clinical examinations of balance impairment, but the tests are lengthy and the Mini-BESTest is theoretically inconsistent with the BESTest.
Objective The purpose of this study was to generate an alternative version of the BESTest that is valid, reliable, time efficient, and founded upon the same theoretical underpinnings as the original test.
Design This was a cross-sectional study.
Methods Three raters evaluated 20 people with and without a neurological diagnosis. Test items with the highest item-section correlations defined the new Brief-BESTest. The validity of the BESTest, the Mini-BESTest, and the new Brief-BESTest to identify people with or without a neurological diagnosis was compared. Interrater reliability of the test versions was evaluated by intraclass correlation coefficients. Validity was further investigated by determining the ability of each version of the examination to identify the fall status of a second cohort of 26 people with and without multiple sclerosis.
Results Items of hip abductor strength, functional reach, one-leg stance, lateral push-and-release, standing on foam with eyes closed, and the Timed “Up & Go” Test defined the Brief-BESTest. Intraclass correlation coefficients for all examination versions were greater than .98. The accuracy of identifying people from the first cohort with or without a neurological diagnosis was 78% for the BESTest versus 72% for the Mini-BESTest or Brief-BESTest. The sensitivity to fallers from the second cohort was 100% for the Brief-BESTest, 71% for the Mini-BESTest, and 86% for the BESTest, and all versions exhibited specificity of 95% to 100% to identify nonfallers.
Limitations Further testing is needed to improve the generalizability of findings.
Conclusions Although preliminary, the Brief-BESTest demonstrated reliability comparable to that of the Mini-BESTest and potentially superior sensitivity while requiring half the items of the Mini-BESTest and representing all theoretically based sections of the original BESTest.
Injuries resulting from falls contribute to decreased health status and increased mortality, particularly for individuals of advanced age or with chronic disease.1 In addition, falls are linked to a reduction in overall functioning and to early admission to long-term care facilities.2–5 Balance impairments, which often lead to injurious falls, can be quantified in a clinical setting in order to direct therapeutic rehabilitation aimed at mitigating an individual's specific impairments and minimizing the risk of falls.
To aid in balance assessment and therapeutic prescription, several reliable clinical tools have been developed.3,6–8 Although widely used, most of these assessments provide a measure of stability based on a single context of balance impairment.7–9 Research, however, has demonstrated that postural impairments may be evident across several contexts of behavior.7,10 A recently developed balance assessment tool, the Balance Evaluation Systems Test (BESTest), examines balance performance in 6 specific contexts (or systems, as termed by the test's developers) of postural control: mechanical constraints, limits of stability, anticipatory postural adjustments, postural responses to an induced loss of balance, sensory orientation, and gait.9 In some cases, balance control may be compromised by a single balance system or subset of systems. The BESTest allows for the identification of specific balance systems responsible for poor balance performance and, therefore, can help direct clinical interventions.9
The BESTest has been found to be reliable across raters evaluating a cohort of individuals with and without various neurological diagnoses, and its validity was initially confirmed on the basis that BESTest scores correlate with reported scores of balance confidence.9,11 BESTest scores also have been validated to differentiate people with and without fibromyalgia, chronic obstructive pulmonary disease, and multiple sclerosis (MS).12–14 In addition, the BESTest exhibited high test-retest and interrater reliability when used to evaluate participants with Parkinson disease, and BESTest scores were sensitive to these participants' prospective or retrospective fall reports.15–17 This initial literature shows promise for the BESTest, but its clinical feasibility may be limited by the time required to complete all 36 items.
To address potential limitations of the BESTest's redundancy and lengthy test duration, Franchignoni et al18 identified a subset of the original BESTest items (ie, the Mini-BESTest), which consists of 16 items that can be administered in approximately 15 to 20 minutes. The Mini-BESTest has been reported to be just as reliable and capable of identifying fall status as the BESTest for individuals with Parkinson disease.16 In addition, the Mini-BESTest exhibited superior psychometric properties compared with the Berg Balance Scale for identifying motor impairments in people with Parkinson disease.19
Although the reduced time to administer the Mini-BESTest renders the examination more efficient than the original BESTest, anecdotal reports suggest this assessment remains too lengthy, given increasing constraints on patient contact time in the clinic. In addition, although the Rasch analysis used to define the test offers a powerful technique to generate an examination consisting of nonredundant items that measure a correlated construct and represent a range of difficulty to prevent ceiling or floor effects, the result was contrary to the theoretical basis of the BESTest. Specifically, the items defining the Mini-BESTest represent a singular construct (termed “dynamic balance” by the authors) identified by the Rasch analysis, but excluded items related to mechanical constraints and to limits of stability.
The construction of the Mini-BESTest thus implies that postural control represents a single construct and that a clinical assessment need only evaluate this construct, whereas the original BESTest sought a global assessment of multiple constructs that influence balance impairment. Multiple constructs influence postural control and may be important for a broader clinical assessment of balance impairment intended for use across diverse clinical populations. Specifically, the Mini-BESTest's lack of items assessing mechanical constraints or limits of stability could limit its sensitivity when applied to people with musculoskeletal impairment or impaired limits of stability. Furthermore, without an assessment of these contexts of postural control, clinicians would lack the information needed to direct interventions on the basis of these impairments. Indeed, the existing literature on the BESTest suggests that mechanical constraints or limits of stability differentiate groups with and without clinical health conditions or groups with and without a fall history.9,12–14,16 Alternative methods, such as classical test theory, therefore, may offer another approach to shortening the BESTest based on its original theoretical underpinnings, thereby advancing the clinical goal of generating a time-efficient balance assessment across several influential constructs of postural control for use across multiple clinical populations.
This preliminary study, therefore, evaluated the internal consistency of items in each section of the BESTest and used item-total correlations to identify each section's most representative item. Each section's most representative item then was included in a new Brief-BESTest examination. We evaluated each test version for interrater reliability and its validity to identify individuals with and without a neurological diagnosis. We further tested each examination version's validity by evaluating its ability to identify the reported fall history of people with or without MS. This second cohort was chosen because lower-limb strength and postural performance at the limits of stability have been found to represent significant predictors of future falls in people with MS and because the BESTest subsections on mechanical constraints and limits of stability are significantly affected in people with MS.14,20 Thus, evaluating the different examination versions' ability to identify fall status in people with and without MS would provide insight into whether retaining an evaluation of mechanical constraints and limits of stability is important to the utility of the Brief-BESTest versus the Mini-BESTest. We hypothesized that the Brief-BESTest would exhibit psychometric properties comparable or superior to those of the Mini-BESTest, but with fewer items and with representation from every section of the original BESTest.
Method
Participants
Twenty participants were included in the first cohort (Tab. 1). Recruitment occurred with the intent to include participants with a wide range of balance abilities. Participants were included if they were able to stand independently and ambulate 6.1 m (20 ft) with or without an assistive device, and were willing to complete the BESTest (45–60 minutes). No other criteria were applied. Five of the 20 participants in the first cohort reported at least 1 fall in the previous 2 months, for a total of 7 falls (range=0–2). As part of a larger study,14,21 thus representing a secondary analysis within this study, the second cohort included 13 people with MS (8 women and 5 men; mean age=50 years, range=31–64) and 13 people without MS (8 women and 5 men; mean age=50 years, range=31–66). People with MS were recruited by advertisement in the local chapter of the National MS Society and were included if they: (1) had neurologist-diagnosed MS, (2) had an Expanded Disability Status Scale score of less than 6, and (3) had no uncorrected hearing or visual impairments. People without MS were recruited by advertising within the local community and were included if they: (1) had no self-reported neurological, musculoskeletal, or psychiatric disorders; (2) had no uncorrected hearing or visual impairments; and (3) were matched to the individuals with MS according to sex, similar height and weight, and within 2 years of age. The disease severity of the participants with MS ranged from 0 to 4.5 on the Expanded Disability Status Scale. Seven of the participants with MS in cohort 2 reported at least one fall in the previous 3 months, for a total of 18 reported falls (range=0–6 falls; 5 participants reported multiple falls). The participants without MS reported no falls. All individuals gave written informed consent to participate in the study.
Participant Characteristics of the First Cohort
Instrument
The BESTest consists of 36 items grouped into 6 specific postural control systems: biomechanical constraints, stability limits and verticality, anticipatory postural adjustments, postural responses to external perturbations, sensory orientation during stance, and stability in gait. Each item is scored based on a 4-level ordinal scale from 0 to 3. A score of 0 indicates failure or inability to complete the task, and a score of 3 indicates successful completion of the task according to all scoring criteria. As such, total scores range from 0 to 108, with subsection totals ranging from 0 to 15–21 (depending on the number of items in the respective subsection).
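The scoring arithmetic described above can be sketched in a few lines. This is a minimal illustration, not part of the published instrument: the function name `percent_of_maximum` and the constant names are hypothetical, and the only values taken from the text are the 0 to 3 item scale and the 36-item total.

```python
MAX_ITEM_SCORE = 3      # each BESTest item is scored on a 0-3 ordinal scale
N_BESTEST_ITEMS = 36    # number of items in the full BESTest

# Maximum possible total for the full examination: 36 items x 3 points.
FULL_TEST_MAXIMUM = N_BESTEST_ITEMS * MAX_ITEM_SCORE


def percent_of_maximum(item_scores, max_item_score=MAX_ITEM_SCORE):
    """Express a set of item scores as a percentage of the possible maximum.

    Hypothetical helper: works for a whole examination or for one section,
    because the maximum is derived from the number of items supplied.
    """
    if len(item_scores) == 0:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not 0 <= score <= max_item_score:
            raise ValueError(f"item score {score} is outside 0-{max_item_score}")
    return 100.0 * sum(item_scores) / (max_item_score * len(item_scores))


# A section with 5 items has a maximum of 15; one with 7 items has 21.
print(FULL_TEST_MAXIMUM)                      # full-test maximum
print(percent_of_maximum([3, 2, 3, 1, 3]))    # a hypothetical 5-item section
```

Expressing totals as a percentage of the possible maximum allows examination versions with different numbers of items to be compared on a common scale.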
The Mini-BESTest is a subset of 14 tasks (16 items due to bilateral assessment) from sections of the BESTest related to anticipatory postural adjustments, reactive postural responses, sensory orientation, and stability in gait. The Mini-BESTest's items are scored on a 3-level ordinal scale from 0 to 2.
Training
The raters were a student in a Doctor of Physical Therapy program and 2 doctorate researchers with expertise in the postural control of individuals with balance disorders. Raters prepared to administer the BESTest by reviewing the written version of the test and viewing the accompanying DVD provided by the test developer. In addition, one of the BESTest's original developers provided a 2-hour training session to the raters. The raters practiced administering and scoring the BESTest on student and community volunteers. These practice sessions allowed all raters to become familiar and comfortable with the implementation and scoring of the BESTest prior to the studies.
Data Collection
Following the training sessions, the full 36-item BESTest was administered to all participants, regardless of cohort, although all 3 raters concurrently rated only the first cohort. The space was organized to facilitate transitions from one item to the next in order to minimize fatigue and mobility requirements. Five rest periods were offered at regular intervals, and participants were instructed to request additional rest if needed.
For the first cohort, 1 of the 3 raters administered the test while that rater or another rater served as a spotter for the participant during task performance in order to minimize the risk of falling. All raters independently scored the test for each participant. One rater evaluated the participants' BESTest performance for the second cohort.
Data Analysis
The data from the first cohort were analyzed for internal consistency, validity, and interrater reliability using PASW version 18 (SPSS Inc, Chicago, Illinois). First, Cronbach alpha and item-total correlations were generated for each BESTest section and rater. The item with the highest average item-total correlation for each section was selected for inclusion in a new Brief-BESTest. If the item assessed a lateralized behavior with a companion item that assesses behavior on the other side (ie, items that are performed on the left and right sides), both items were included in the Brief-BESTest.
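The item-selection step can be illustrated with a short sketch. The analysis itself was run in PASW; the Python code below is only an approximation of the same classical test theory quantities. It assumes a participants-by-items score matrix for one section and assumes the "corrected" item-total correlation (each item correlated with the sum of the remaining items), which the text does not specify. All function names are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach alpha for a participants-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


def item_total_correlations(scores, corrected=True):
    """Pearson correlation of each item with the section total.

    With corrected=True (an assumption, not stated in the text), the item
    itself is removed from the total before correlating.
    """
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    correlations = []
    for j in range(scores.shape[1]):
        reference = totals - scores[:, j] if corrected else totals
        correlations.append(np.corrcoef(scores[:, j], reference)[0, 1])
    return np.array(correlations)


def most_representative_item(scores):
    """Index of the item with the highest item-total correlation."""
    return int(np.argmax(item_total_correlations(scores)))


# Hypothetical section scores (rows = participants, columns = items, 0-3).
section = [[0, 0, 3], [1, 1, 0], [2, 2, 1], [3, 3, 2]]
print(most_representative_item(section))
```

In the study, the correlations were averaged across the 3 raters before the highest-ranking item was chosen; the sketch shows the computation for a single rater's scores.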
After establishing the Brief-BESTest, total scores were calculated for each version of the BESTest. For the Mini-BESTest, total scores were generated by transforming scores from the BESTest's 4-point ordinal scale to the Mini-BESTest's 3-point scale. Item-total correlations and Cronbach alpha also were reported for the Brief-BESTest and Mini-BESTest to confirm internal consistency and each item's contribution to the respective examination's total score. Interrater reliability of each test version was analyzed using 2-way, mixed-model intraclass correlation coefficients (ICCs) testing for absolute agreement. Validity was initially assessed from the data of the first cohort by single-variable logistic regression models to determine the sensitivity, specificity, overall accuracy, and positive and negative likelihood ratios of each rater's total BESTest, Mini-BESTest, and Brief-BESTest scores to identify participants with or without a neurological diagnosis. Similar logistic regression models were used to determine the sensitivity, specificity, overall accuracy, and positive and negative likelihood ratios of the second cohort's scores for each examination version to identify participants with or without a reported fall history (ie, whether they reported experiencing at least 1 fall in the previous 3 months).
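For readers who wish to reproduce the reliability analysis outside of PASW, the following sketch computes a 2-way ICC for absolute agreement from a participants-by-raters matrix of total scores. It assumes the single-measure form of the coefficient; the text does not state whether single or average measures were reported, and the function name is hypothetical.

```python
import numpy as np


def icc_absolute_agreement(ratings):
    """Single-measure, 2-way ICC for absolute agreement.

    ratings: participants-by-raters matrix of total scores.
    Assumption: single-measure form; mean squares come from a 2-way ANOVA
    without replication.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                      # n participants, k raters
    grand_mean = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()   # participants
    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()   # raters
    ss_error = ((x - grand_mean) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical totals: the second rater scores every participant 5 points
# higher, which lowers absolute agreement even though the ranking is identical.
print(icc_absolute_agreement([[10, 15], [20, 25], [30, 35], [40, 45]]))
```

Because the coefficient tests absolute agreement rather than consistency, a systematic offset between raters reduces the ICC, as the hypothetical example shows.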
Results
Internal Consistency and Item-Total Correlations
The average Cronbach alpha coefficients for each section of the BESTest were .839, .621, .874, .863, .813, and .920 for mechanical constraints, limits of stability and verticality, anticipatory postural adjustments, postural responses, sensory orientation, and gait, respectively. The items with the highest item-total correlation coefficients to their respective section totals were hip abduction, forward reaching, single-leg stance, lateral compensatory stepping, standing with eyes closed on foam, and the Timed “Up & Go” Test (Fig. 1). These items, therefore, defined the Brief-BESTest. It should be noted that single-leg stance was selected as the representative item for the section on anticipatory postural adjustments based on an item-total correlation of .805, which was just slightly higher than the value of .800 elicited by the rise-to-toes item.
Item-total correlation coefficients for the items of the Balance Evaluation Systems Test (BESTest) to their respective section totals. The black bars represent the 8 items that were included in the Brief-BESTest on the basis of having the highest item-total correlations.
Cronbach alpha and item-total correlations for the Mini-BESTest and Brief-BESTest are identified in Table 2. Cronbach alpha was higher for the Mini-BESTest than for the Brief-BESTest in both the first and second cohorts, but both versions exhibited values above .85. On average, the item-total correlations were .732 for the Mini-BESTest and .737 for the Brief-BESTest in the first cohort and .617 for both the Mini-BESTest and Brief-BESTest in the second cohort.
Internal Consistency of the Mini-BESTest and Brief-BESTest Based on Cronbach Alpha and Item-Total Correlations
Interrater Reliability and Validity
All 3 versions of the examination (score distributions identified in Fig. 2) exhibited very strong levels of interrater reliability: ICC (95% confidence interval [CI])=.985 (.959–.994) for the BESTest, .995 (.988–.998) for the Mini-BESTest, and .994 (.986–.997) for the Brief-BESTest. Total BESTest, Mini-BESTest, and Brief-BESTest scores significantly differentiated people from the first cohort with and without diagnosed neurological disorders or injuries (Tab. 3). BESTest scores were more sensitive than the mini or brief versions to identify people with neurological disorders, whereas levels of specificity were similar among all versions of the examination. The relative sensitivity and specificity of the Mini-BESTest versus the Brief-BESTest depended on the rater, but the Brief-BESTest's average sensitivity and specificity were 3% higher and 4% lower, respectively, than the Mini-BESTest's sensitivity and specificity.
Frequency histograms of scores (percentage of possible maximum) from the first and second cohorts on the original Balance Evaluation Systems Test (BESTest) (top), Mini-BESTest (middle), and the proposed Brief-BESTest (bottom). With the combined cohorts, scores represent 24 individuals with a neurological diagnosis (white bars) and 22 without a neurological diagnosis (black bars).
Ability of the BESTest, Mini-BESTest, and Brief-BESTest to Differentiate People With and Without a Neurological Diagnosis for Each Rater
For the second cohort of 26 individuals with and without MS, the Brief-BESTest was 100% accurate in identifying whether the participants reported no falls or at least 1 fall in the previous 3 months. The Mini-BESTest and original BESTest also provided high levels of specificity for people without a fall history, but exhibited lower sensitivities for people with a fall history than the Brief-BESTest (Tab. 4).
Ability of the BESTest, Mini-BESTest, and Brief-BESTest to Differentiate People With and Without a Self-Reported Recent Fall History From the Second Cohort
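The reported percentages can be related to classification counts with a brief sketch. The second cohort included 7 participants with a fall history and, by subtraction, 19 without. The true-positive counts below are inferred from the reported sensitivities, and the false-positive counts for the Mini-BESTest and BESTest are assumptions chosen only to fall within the reported 95% to 100% specificity range; they are not taken from the study's data. The function name is hypothetical.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and likelihood ratios from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Likelihood ratios are undefined when the denominator is zero.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# 7 fallers and 19 nonfallers; counts are inferred or assumed (see lead-in).
brief = diagnostic_accuracy(tp=7, fn=0, tn=19, fp=0)   # 100% accurate
mini = diagnostic_accuracy(tp=5, fn=2, tn=18, fp=1)    # fp=1 is an assumption
full = diagnostic_accuracy(tp=6, fn=1, tn=18, fp=1)    # fp=1 is an assumption

for name, result in (("Brief", brief), ("Mini", mini), ("BESTest", full)):
    print(name, round(100 * result["sensitivity"]), round(100 * result["specificity"]))
```

With only 7 fallers, each misclassified faller changes sensitivity by about 14 percentage points, which illustrates why the differences among versions should be interpreted cautiously.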
Discussion
The results support our hypothesis that the Brief-BESTest, defined from items with the highest item-total correlations, exhibits psychometric properties comparable or superior to those of the Mini-BESTest. Although the original BESTest better identified people with a neurological diagnosis compared with either abbreviated version of the examination, the Brief-BESTest and Mini-BESTest exhibited very similar levels of overall accuracy. The Brief-BESTest offered the highest sensitivity and overall accuracy to identify people with and without MS who reported at least 1 fall in the previous 3 months. In addition, all 3 versions of the examination exhibited very high levels of interrater reliability. Thus, the Brief-BESTest offers an even more abbreviated alternative to the Mini-BESTest with similar or superior psychometric properties.
The primary objective of this study was to shorten the BESTest in a theoretically consistent manner, with secondary objectives to provide preliminary comparisons of each version's psychometric properties. The preliminary validity analysis to identify people with and without diagnosed neurological disorders is not intended to suggest that any version of the BESTest would be used to diagnose the existence of neurological disorders. The analysis was instead chosen to demonstrate each examination version's ability to differentiate people with and without a neurological diagnosis regardless of pathology, thereby supporting their use as a generalized balance assessment across the sampled clinical diagnoses.
The 8 selected items of the Brief-BESTest also exhibit face validity for representing each context of balance impairment beyond providing the highest item-total correlations for their respective contexts of impairment. In previously published research, each item has been shown to discriminate groups with and without neurological diagnoses, to associate with or predict falls or fractures, or to associate with ecological tasks or limited participation in activities of daily living.20,22–40 In addition, the Brief-BESTest includes many assessments reported to be most frequently executed by physical therapists (eg, one-leg stance, functional reach, and Timed “Up & Go” Test).41 Thus, the Brief-BESTest items appear to provide valid representative assessments for each context of balance impairment assessed by the original BESTest.
Both the Brief-BESTest and Mini-BESTest elicited a Cronbach alpha of greater than .85, regardless of cohort, although this measure of internal consistency was higher for the Mini-BESTest. This finding is not surprising, as the Mini-BESTest was derived based on a Rasch analysis designed to identify items representing a single construct, and our analysis confirms its internal consistency. In contrast, the Brief-BESTest was defined from the items most strongly representative of each of the BESTest's section scores that represent different contexts of postural control. The Brief-BESTest's items appear similarly associated to the total score as those of the Mini-BESTest are to its total score. Thus, although including items from all 6 contexts of postural control examined by the original BESTest, the selected examination items appear appropriate for assessing balance impairment.
The analysis of fall history in people with and without MS suggests either that the Brief-BESTest's retention of all 6 contexts of postural control is important or that the Mini-BESTest includes additional items that diminish its sensitivity to falls. Although further testing is necessary to identify which is true, measures of lower-limb strength and the ability to maintain balance at the limits of stability are associated with falling in people with MS, and scores on the BESTest subsections for mechanical constraints and limits of stability differ significantly between people with and without MS.14,20 Thus, although the improved sensitivity of the Brief-BESTest may have been due to the removal of insensitive items rather than the retention of items related to mechanical constraints or limits of stability, including relevant contexts of balance impairment in people with MS likely contributed to its combined sensitivity and specificity to fall history.
The Mini-BESTest and BESTest have previously been reported to elicit 86% and 84% accuracy, respectively, in identifying fallers and nonfallers with Parkinson disease.16 Thus, our results suggest all versions of the BESTest could provide similar or higher levels of accuracy to identify the fall status of people with MS. A larger study is needed to confirm this finding, and further testing across multiple patient populations remains necessary to determine each test version's relative capability to serve as a falls screening tool.
When evaluating the internal consistency of the BESTest, items with particularly low correlations to their section totals were generally those with little variability across participants and that exhibited ceiling effects (eg, base of support, standing on a firm surface, feet-in-place responses). In addition, although the assessments of verticality and stability limits are included in the same section, those on verticality did not correlate well with their section scores, nor did they correlate well in an exploratory analysis with the sensory orientation scores (not shown). The interrater reliability of the verticality items also was reportedly low in previous research,9 suggesting these items do not provide very meaningful contributions to the examination.
Unexpectedly, the dual-task Timed “Up & Go” Test provided the lowest item-total correlations for the gait section, even though the single-task Timed “Up & Go” Test provided the highest correlations. One potential reason is that the dual-task item was challenging, with dual-task costs on either walking speed or counting being evident for most participants. Instructional standardization for attentional focus also may be an important factor,42 as the participants could have differentially prioritized either task. However, given that the score for this item diminishes with impaired performance of either or both tasks and that our itemwise interrater reliability for the item was adequate (Kendall W=.76), differential task prioritization was not a likely cause of the low item-total correlations. Alternatively, the addition of a dual task may represent another system of impairment that represents cognitive-motor interaction. It would be of interest to evaluate dual-task analogs to multiple tasks (reaching, one-leg stance, stance on firm and foam surfaces, and the Timed “Up & Go” Test) in order to determine whether dual-task impairment represents a unique context of impairment.
Conclusions
Because the economics of clinical evaluation allow only a limited amount of patient-clinician contact time, it is imperative to develop an efficient examination. At the same time, the presentation of balance impairment is multifactorial, and the consequences of impaired balance are deserving of adequate assessment. This study confirms the validity and reliability of both the BESTest and Mini-BESTest for raters in research and raters in training as physical therapists, but also provides initial support for the Brief-BESTest. The Brief-BESTest elicited levels of interrater reliability as high as those of the existing versions of the examination, its ability to differentiate individuals with and without diagnosed neurologic disorders or injuries was similar to that of the Mini-BESTest, and its ability to differentiate people with and without MS based on fall history was superior to that of either existing version. These psychometric properties are balanced by the most clinically feasible combination of only 8 scored items (compared with the 16 or 36 items of the other versions) while remaining theoretically intact with representative items from all 6 systems of postural control assessed by the original BESTest. The findings, however, are preliminary, representing a modest number of participants in both cohorts. Further use and examination of the Brief-BESTest are needed to confirm its validity and generalizability, as well as to identify clinically useful cutoff scores.
Appendix.
Scoring Form for the Brief Balance Evaluation Systems Test (Brief-BESTest).a
a The scoring form for the Brief-BESTest examination may not be used or reproduced without written permission of the authors.
Footnotes
- All authors provided concept/idea/research design, writing, and data collection and analysis. Dr Jacobs provided project management. Dr Padgett and Dr Kasser provided study participants. Dr Jacobs and Dr Kasser provided facilities/equipment. The authors acknowledge Dr Fay Horak for providing training.
- These data were presented as a poster and abstract at the Joint World Congress of the International Society for Posture & Gait Research (ISPGR) and Gait & Mental Function; June 24–28, 2012; Trondheim, Norway.
- The study was funded by the University of Vermont's Department of Rehabilitation and Movement Science.
- Received February 12, 2012.
- Accepted May 30, 2012.
- © 2012 American Physical Therapy Association