Abstract
Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a clinical balance test comprising 14 items assumed to reflect the unidimensional construct “dynamic balance.”
Objective The study objective was to examine the dimensionality of the test and the properties of each item and their interrelationships in elderly people with mild to moderate Parkinson disease (PD).
Design This was a cross-sectional study in a laboratory setting.
Methods A total of 112 participants (mean age=73 years) with idiopathic PD (Hoehn and Yahr stages 1–3) were assessed by physical therapists. Local independence among items was examined with Rasch modeling. Unidimensionality was tested by running a principal component analysis on the residuals. An exploratory factor analysis was used to examine the structure of the test, and a confirmatory factor analysis was used to evaluate the fit of the derived model.
Results The first residual component of the principal component analysis had an eigenvalue greater than 2, which argued against the assumption of unidimensionality. After the omission of item 7 because of convergence problems, the exploratory factor analysis suggested that a 3-factor solution best fit the data. A confirmatory factor analysis demonstrated acceptable fit of the final model, although item 14 loaded poorly on its factor.
Limitations The sample size was on the lower end of what is generally recommended.
Conclusions This study could not confirm that the Mini-BESTest is unidimensional. Gait items were dispersed over all factors, indicating that they may reflect different constructs. Nonetheless, as there arguably is no clinical balance test superior to the Mini-BESTest today, we recommend using the total score for assessing gross balance in this population and individual items to identify specific weaknesses. Moreover, dual-tasking ability should be assessed separately because it is an important aspect of balance control in people with PD yet is reflected in only one item of the test.
Loss of balance control is a common and disabling complication in Parkinson disease (PD).1 Several instruments for the assessment of balance are available, although none specifically target people with PD. Moreover, few instruments take into account the complexity of the multiple physiological systems implicated in balance control. One exception is the Balance Evaluation Systems Test (BESTest).2 Building on a systems model of motor control,3 the BESTest comprises 36 items representing 6 different subdomains of balance control: biomechanical constraints, stability limits, anticipatory postural adjustments, postural responses, sensory orientation, and dynamic gait. This multilevel approach makes it possible to determine which particular aspects of balance control are compromised and, thereby, to direct treatment accordingly.
However, the time-intensive nature of the BESTest limits its usability. Therefore, Franchignoni et al4 developed the Mini-BESTest, a short version of the BESTest, on the basis of a sample of people with a variety of neurological disorders. Through psychometric methods, the original 36 items were narrowed down to 14, and intra-item scoring was reduced from 4 levels to 3 levels.4 During the process, all items originally representing biomechanical constraints and stability limits were omitted; the remaining items, instead of representing different subdomains of balance control, were now collectively thought to reflect the unidimensional construct “dynamic balance.” Somewhat at odds with the concept of a unidimensional construct, the items of the test are arranged in 4 different subscales, and scores may be calculated separately for each subscale.5 Consequently, studies in which the scores for each subscale are analyzed and reported separately are emerging.6–11
Since its introduction in 2010, the Mini-BESTest has been increasingly used for evaluating balance function in various populations. Importantly, it includes items targeting balance problems that are typical in people with PD; examples include the ability to regain balance after perturbation and walking under different conditions. Additionally, the test was recently recommended—together with the Berg Balance Scale—as a method of choice for evaluating standing balance in adults.12 The psychometric properties of the Mini-BESTest were recently evaluated in a sample similar to that used to develop the test; the evaluation revealed satisfactory internal validity, reliability, and local independence among items.13 That study also confirmed that the instrument was unidimensional in people with neurological disorders generally. However, the validity of any instrument is highly dependent on the population to which it is applied, and the structural validity (ie, the extent to which scores on an instrument reflect the dimensionality of the construct to be measured14) of the Mini-BESTest has not been investigated in people with PD.
Previous studies on other aspects of validity credited the Mini-BESTest with the ability to discriminate between people with mild PD and those with severe PD15 and between people who experienced falls (fallers) and those who did not (nonfallers).16,17 Scores on the Mini-BESTest also were found to correlate strongly with scores on other well-established balance tests for people with PD.15,18–20 Although these studies demonstrated adequate criterion validity (ie, the extent to which scores on an instrument correlate with a gold standard) and known-group validity (ie, the ability of an instrument to identify clinically relevant subgroups of a population), an in-depth analysis of the construct “dynamic balance” has not been performed in people with PD. Therefore, to gain a better understanding of what the Mini-BESTest represents, we performed the present study with the aim of examining the dimensionality of the construct “dynamic balance” as well as the properties of each item and their interrelationships in a sample of elderly people with mild to moderate PD.
Method
Participants
For this study, we used baseline data from a randomized controlled trial aiming to determine the efficacy of a balance training intervention for outcomes related to PD (the BETA-PD study; ClinicalTrials.gov registration number: NCT01417598). Participants were recruited via advertisements in local newspapers, from Karolinska University Hospital, and through the Swedish Parkinson Association.
Inclusion criteria were an age of 60 years or older and a clinical diagnosis of idiopathic PD, according to the Queen Square Brain Bank Criteria,21 with Hoehn and Yahr22 stages 1 to 3. Exclusion criteria were atypical PD, as defined by Hughes et al,23 or other neuromuscular disorders or medical conditions that significantly affect gait or balance performance. A total of 112 participants were enrolled and gave written informed consent.
Procedure
All tests were carried out by trained physical therapists in a movement analysis laboratory at a university campus. Disease severity was assessed with the Unified Parkinson Disease Rating Scale either before or after the completion of a set of gait and balance tests, including the Mini-BESTest. The whole procedure took approximately 2 to 3 hours, including breaks. All participants maintained the dosages and administration times for their medications, as prescribed by their neurologists. The full procedure used in the study has been described in more detail elsewhere.24
Instrumentation
The Mini-BESTest consists of 14 items derived from 4 of the 6 subscales of the BESTest: anticipatory postural adjustments (standing up from a seated position; rising to toes; standing on one leg), reactive postural control (forward; backward; sideways), sensory orientation (standing with feet together on a firm surface with eyes open; standing with feet together on a foam surface with eyes closed; standing on an inclined surface with eyes closed), and dynamic gait (changing gait speed; walking with head turns; walking with pivot turns; stepping over an obstacle; Timed “Up & Go” Test [TUG] with a dual task). Items are scored from 0 (unable or requiring help to perform) to 2 (normal function) on an ordinal scale with a maximal total score of 28. Items 3 and 6 are assessed separately on the left and right sides; the score from the side with the poorer performance is included in the total score.5 The Swedish translation of the test was used.20
Data Analyses
Three psychometric methods were used to evaluate the structural validity of the Mini-BESTest. Rasch analyses (partial-credit model) were conducted with R 3.1 statistical software (R Core Team, Vienna, Austria, 2014) and the Extended Rasch Modeling Package (eRm, http://erm.r-forge.r-project.org; 2015). Person-item maps were used to evaluate the distribution of latent dimensions and the sequencing of category difficulty thresholds. The infit and outfit measures of fit have an expected value of 1, and values can range from 0 to infinity. Values of greater than 1 indicate underfit to the Rasch model, that is, more randomness than expected. Values of less than 1 indicate overfit to the Rasch model, which could be interpreted as indicating redundancy.25 According to Wright and Linacre,26 0.5 to 1.7 could be considered a reasonable range for infit and outfit measures in clinical observations.
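To make the infit and outfit definitions concrete, the following minimal sketch computes both statistics for one item under the partial-credit model. It assumes that person locations (θ) and item thresholds (δ) are already available from a fitted model; the estimation step itself (eRm estimates item parameters by conditional maximum likelihood) is not reproduced, and the function names and data are hypothetical. Outfit is the unweighted mean of squared standardized residuals, whereas infit divides the summed squared residuals by the summed model variances, which makes it less sensitive to unexpected responses from persons located far from the item.

```python
import math
from typing import List, Sequence, Tuple


def pcm_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) for one item under the partial-credit model.

    The unnormalized log-probability of category k is the cumulative sum of
    (theta - delta_j) for j = 1..k; category 0 has log-weight 0.
    """
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    top = max(logits)  # shift by the maximum to avoid overflow in exp()
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(responses: Sequence[int], thetas: Sequence[float],
             thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item across persons."""
    squared_residuals: List[float] = []
    variances: List[float] = []
    for x, theta in zip(responses, thetas):
        probs = pcm_probs(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals.append((x - expected) ** 2)
        variances.append(variance)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    # Infit: summed squared residuals divided by summed model variances.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical data: 5 persons, one 3-category item (scores 0-2) with 2 thresholds.
thetas = [-1.5, -0.5, 0.0, 0.8, 2.0]
responses = [0, 1, 1, 2, 2]
infit, outfit = item_fit(responses, thetas, thresholds=[-1.0, 1.0])
print(f"infit={infit:.2f}, outfit={outfit:.2f}")
```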
One way to evaluate whether the items in a group reflect the same underlying construct (ie, unidimensionality) is to save the residuals from a Rasch analysis and use them as items in a principal component analysis (PCA). Simulations have indicated that Rasch analysis is more powerful than factor analysis when the correlations between factors are strong and when the distribution of items between factors is uneven.27 Low eigenvalues on the extracted components indicate no systematic associations between the residuals and, hence, unidimensionality. As a rule of thumb, an eigenvalue of less than 2 for the first component may be taken as an acceptable cutoff for unidimensionality.
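The residual-PCA check can be sketched in plain Python. The sketch assumes that a persons × items matrix of standardized Rasch residuals, z = (x − E)/√W, has already been computed from the model's expected scores E and variances W; the helper names and the toy data are hypothetical. Because a correlation matrix is symmetric and positive semidefinite, power iteration converges to its largest eigenvalue, which is the eigenvalue of the first residual component.

```python
import math
from typing import List, Sequence


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlations between columns (one column per item, one value per person)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    # A constant residual column would give a zero norm; such items must be dropped first.
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def first_eigenvalue(matrix: List[List[float]], max_iter: int = 1000,
                     tol: float = 1e-12) -> float:
    """Largest eigenvalue of a symmetric positive semidefinite matrix via power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k  # assumed not orthogonal to the leading eigenvector
    eigenvalue = 0.0
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
        # Rayleigh quotient v'Mv for the current unit vector.
        new_value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(k) for j in range(k))
        if abs(new_value - eigenvalue) < tol:
            return new_value
        eigenvalue = new_value
    return eigenvalue


def residual_pca_first_eigenvalue(residual_columns: Sequence[Sequence[float]]) -> float:
    """Eigenvalue of the first residual component (rule-of-thumb cutoff: 2)."""
    return first_eigenvalue(correlation_matrix(residual_columns))


# Hypothetical residuals: items 1 and 2 share a residual pattern, item 3 does not.
shared = [1.0, -1.0, 1.0, -1.0]
other = [1.0, 1.0, -1.0, -1.0]
print(residual_pca_first_eigenvalue([shared, shared, other]))  # about 2.0
```

In the toy example, two items with a common residual pattern already produce a first eigenvalue of about 2, roughly the strength of two items, which is why a value above 2 is read as a possible second dimension.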
Another way to explore the dimensionality of an instrument is to perform an exploratory factor analysis (EFA). An EFA organizes items into factors (or components) according to their interrelationships. Varimax rotation, which maximizes the variance in squared correlations between items and factors, was selected to facilitate the interpretation of the results. Improvements in model fit with different factor solutions were estimated with χ2 tests. The factors of the best-fitting model were then examined in separate Rasch analyses (partial-credit model) to determine local independence among items and to assess whether each factor was unidimensional by running a PCA on the residuals. As an additional descriptive step, a confirmatory factor analysis (CFA) was subsequently estimated from the raw data to determine the fit indexes of the final model. Items were defined as ordinal, and a robust weighted least squares estimator was used. Two incremental fit indexes—the Tucker-Lewis Index and the Comparative Fit Index—were used to assess the fit of specified CFA models. A value of greater than .95 on these indexes indicates a close fit. The root-mean-square error of approximation was also used, with a value of less than .05 indicating a close fit.28 The EFA and the CFA were conducted with Mplus 7.11 statistical software (Muthén & Muthén, Los Angeles, California). Finally, the Cronbach α was calculated to estimate the internal consistency of the final model.
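The three fit indexes named above have standard closed forms in terms of the model χ2 and degrees of freedom and those of a baseline (independence) model. The sketch below uses those textbook formulas and takes the χ2 values as reported by the software (with a robust weighted least squares estimator they are already adjusted); the function name `fit_indexes` is hypothetical. Programs differ in whether N or N − 1 appears in the RMSEA denominator, so that choice is a parameter.

```python
import math
from typing import Dict


def fit_indexes(chi2_model: float, df_model: int, chi2_base: float, df_base: int,
                n: int, use_n_minus_1: bool = True) -> Dict[str, float]:
    """RMSEA, CFI and TLI from model and baseline (independence) chi-square values."""
    denom_n = n - 1 if use_n_minus_1 else n
    # RMSEA: misfit per degree of freedom per observation, truncated at zero.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * denom_n))
    # CFI: 1 minus the model's noncentrality relative to the baseline's.
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model)
    cfi = 1.0 if excess_base == 0 else 1.0 - excess_model / excess_base
    # TLI: compares chi2/df ratios; undefined if the baseline ratio equals 1.
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2_model / df_model) / (ratio_base - 1.0)
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli}


# Model values as reported for the final CFA (chi2 = 76, df = 62, N = 112);
# the baseline chi2 = 500 on 78 df is a HYPOTHETICAL value, not reported in the study.
result = fit_indexes(chi2_model=76, df_model=62, chi2_base=500, df_base=78, n=112)
print(" ".join(f"{name}={value:.3f}" for name, value in result.items()))
# RMSEA=0.045 CFI=0.967 TLI=0.958
```

Only the RMSEA line depends solely on reported values: with χ2 = 76, df = 62, and N = 112 it comes out at about 0.045 under either the N − 1 or the N convention. The CFI and TLI printed here reflect the assumed baseline and are not the study's values.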
Role of the Funding Source
The study was funded by the Swedish Research Council, the StratNeuro Karolinska Institutet, and the Swedish Neuro Foundation.
Results
Descriptive Data
Complete sets of data were obtained from all 112 participants, and no adverse events were reported. Sample characteristics are shown in Table 1.
Table 1. Sample Characteristics
Initial Rasch Analysis
To evaluate the distribution of latent dimensions and the sequencing of category difficulty thresholds, we conducted a Rasch analysis. A person-item map (Fig. 1) indicated that: (1) participants were approximately normally distributed on the latent dimension; (2) only 2 values were used on items 1, 7, and 10 (no participant received the lowest value); (3) item difficulty for item 7 was very low; and (4) all category thresholds were sequenced as anticipated. This ordering of thresholds indicates that on all items, each score is modal at some point along the latent dimension.29 Inspection of the outfit and infit mean square errors of the items revealed low outfit values for items 7 (0.217) and 1 (0.579), indicating that their content may overlap with the content of other items.30
Figure 1. Person-item map from Rasch analysis, with items in ascending order of mean difficulty (black circles). Gray and white circles indicate thresholds between the first and second categories and between the second and third categories, respectively. For example, for item 8, the lowest value was the most probable for participants with a value of less than or equal to 1.5 on the latent dimension, whereas the highest value was the most probable for participants with a value of greater than 1.5. Dist=distribution, F=item, Para=parameter (ie, item), Pers=person.
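The statement that ordered thresholds make each score modal at some point along the latent dimension can be checked numerically. Under the partial-credit model, the log-weight of category k is the cumulative sum of (θ − δ_j) for j ≤ k, so the sketch below scans a grid of θ values and records which categories are ever the most probable. The grid limits and step size are arbitrary choices, and the function names and threshold values are hypothetical, not the estimates shown in the person-item map.

```python
from typing import Sequence, Set


def modal_categories(thresholds: Sequence[float], lo: float = -6.0,
                     hi: float = 6.0, step: float = 0.01) -> Set[int]:
    """Categories that are the most probable at some theta on a grid (partial-credit model)."""
    modal: Set[int] = set()
    steps = int(round((hi - lo) / step))
    for i in range(steps + 1):
        theta = lo + i * step
        # Cumulative log-weights; the category with the largest one is the most probable.
        log_weights = [0.0]
        for delta in thresholds:
            log_weights.append(log_weights[-1] + (theta - delta))
        modal.add(max(range(len(log_weights)), key=log_weights.__getitem__))
    return modal


def all_categories_modal(thresholds: Sequence[float]) -> bool:
    """True if every score 0..m is the most probable somewhere on the grid."""
    return modal_categories(thresholds) == set(range(len(thresholds) + 1))


print(all_categories_modal([-1.0, 1.5]))  # True: ordered thresholds
print(all_categories_modal([1.0, -1.0]))  # False: the middle category is never modal
```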
In the next step, the residuals generated in the Rasch analysis were used in a PCA to determine whether the instrument was unidimensional. The first component had an eigenvalue greater than 2 (2.04), indicating that the scale may not be unidimensional. In an attempt to reduce dimensionality, the items with the poorest fit (7 and 1) were sequentially excluded; PCAs run on the remaining residuals returned first-component eigenvalues of less than 2 in both cases (Tab. 2). However, in keeping with the aim of the present study, no items were deleted at this point; all were carried forward to the subsequent analyses.
Table 2. Outfit and Infit Values With and Without Items 7 and 1
EFA
An EFA with varimax rotation was conducted with the raw data. The first analysis ran into convergence problems because item 7 was nearly constant, so this item had to be deleted. The second analysis, which did not include item 7, indicated a significant improvement in model fit up to 3 factors. Specifically, model fit improved with 2 factors compared with 1 factor (χ2(12)=37.36, P<.001) and with 3 factors compared with 2 factors (χ2(11)=21.22, P=.031), but not with 4 factors compared with 3 factors (χ2(10)=14.32, P=.159). However, with 3 factors, the estimated residual variance became negative for items 2 and 9, indicating an excessive number of factors. Factor loadings, model fit statistics, and explained variance for varimax-rotated solutions with 1, 2, or 3 factors are shown in Table 3.
Table 3. Varimax-Rotated Loadings for Solutions With 1, 2, and 3 Factors
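The nested-model comparisons for 1 to 4 factors can be approximated from the reported χ2 differences. The sketch assumes the plain χ2 difference test used with maximum-likelihood EFA, where an unrestricted m-factor model on p items has [(p − m)² − (p + m)]/2 degrees of freedom, so each added factor costs p − m degrees of freedom. With categorical items and a robust weighted least squares estimator, an adjusted difference test would strictly be required, so this is an illustration rather than a re-analysis; the function names are hypothetical. The χ2 tail probability uses the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1), starting from Q(x; 1) = erfc(√(x/2)) or Q(x; 2) = e^(−x/2).

```python
import math
from typing import Tuple


def efa_df(p: int, m: int) -> int:
    """Degrees of freedom of an unrestricted m-factor model for p observed variables."""
    return ((p - m) ** 2 - (p + m)) // 2


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df k."""
    if k % 2 == 0:
        q, order = math.exp(-x / 2), 2
    else:
        q, order = math.erfc(math.sqrt(x / 2)), 1
    while order < k:
        # Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (order / 2) * math.exp(-x / 2) / math.gamma(order / 2 + 1)
        order += 2
    return q


def chi2_difference_test(chi2_diff: float, p: int, m: int) -> Tuple[int, float]:
    """Compare m factors with m + 1 factors; returns (df difference, P value)."""
    df_diff = efa_df(p, m) - efa_df(p, m + 1)
    return df_diff, chi2_sf(chi2_diff, df_diff)


# Chi-square differences as reported above (two decimals), p = 13 items (item 7 removed).
for m, diff in [(1, 37.36), (2, 21.22), (3, 14.32)]:
    df_diff, p_value = chi2_difference_test(diff, p=13, m=m)
    print(f"{m} vs {m + 1} factors: df={df_diff}, P={p_value:.4f}")
```

With 13 items, the three comparisons have 12, 11, and 10 degrees of freedom, and the plain tail probabilities come out at roughly .0002, .031, and .159.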
Subsequent Rasch Analysis
In 3 separate Rasch analyses for the 3 proposed factors, some items had low outfit or infit values; the lowest value was found for item 9 (outfit=0.374; infit=0.541). In addition, the latent dimension for factor 3 was highly skewed. A closer inspection of the factor loadings (Tab. 3) indicated that it might be better to relocate item 13 from factor 1 to factor 3. After this switch, 3 new separate Rasch analyses revealed more normally distributed latent dimensions and well-functioning category difficulty thresholds for all items.
To estimate whether the 3 factors were unidimensional, we used the residuals from the 3 separate Rasch analyses as items in 3 separate PCAs. In none of the cases did the first component have an eigenvalue of greater than 2 (1.56, 1.76, and 1.44 for factors 1, 2, and 3, respectively), indicating a high degree of unidimensionality. Outfit and infit statistics were quite low for some items, especially item 9, indicating the possibility of reducing the instrument to fewer items without any considerable loss of information (Tab. 4).
Table 4. Outfit and Infit Statistics for the Final 3-Factor Solution
CFA
A subsequent hierarchical CFA was performed with dynamic balance as a higher-order factor, in accordance with the theoretical concept of the Mini-BESTest. The results demonstrated good fit of the final model, although item 14 loaded relatively poorly on factor 3 (Fig. 2). Removing item 14 did not improve model fit but rather weakened it. The correlations between factors were r=.365 (P<.001) for factors 1 and 2, r=.359 (P<.001) for factors 1 and 3, and r=.205 (P=.001) for factors 2 and 3. The Cronbach α values were .584, .567, and .585 for factors 1, 2, and 3, respectively, indicating poor consistency, and .738 for the whole scale, indicating acceptable consistency.
Figure 2. Confirmatory factor analysis of a model in which item 13 was relocated from factor 1 to factor 3. The χ2 value was 76 (df=62), the root-mean-square error of approximation was 0.045, the Comparative Fit Index was 0.954, and the Tucker-Lewis Index was 0.942. *P≤.001. †P<.05.
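The Cronbach α values reported for each factor and for the whole scale follow the usual formula α = k/(k − 1) × (1 − Σ s²_i / s²_total), where k is the number of items, s²_i the variance of item i, and s²_total the variance of the summed score. A minimal sketch with hypothetical 0 to 2 scores follows; the function name is hypothetical, and it uses sample variances (n − 1 denominator) for both items and total, which gives the same α as population variances because the common factor cancels in the ratio. It does not reproduce the study's values, which would require the raw item scores.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for a persons x items matrix of item scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of summed scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 0-2 scores for 5 persons on a 3-item factor.
scores = [[2, 2, 1], [1, 1, 1], [2, 1, 2], [0, 1, 0], [1, 0, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.703
```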
Discussion
Although the Mini-BESTest is widely used to assess balance in people with PD, the structural validity of the instrument has not been investigated in this population. Therefore, in the present study, we explored the dimensionality of the construct “dynamic balance” as well as the properties of each item and their interrelationships in a sample of elderly participants with mild to moderate PD. An initial person-item map revealed a ceiling effect for items 1 (standing up from a seated position) and 7 (standing on a firm surface with eyes open), implying that these tasks may not be sufficiently challenging for this group of people. Similarly, outfit mean square errors indicated that items 1 and 7 were redundant. In addition, a PCA on the residuals retrieved from the Rasch analysis argued against the assumption of unidimensionality, although the eigenvalue of 2.04 was close to the threshold of 2.
In contrast to these findings, Franchignoni et al13 recently demonstrated that all 14 items were properly sequenced in a sample of people with a wide range of neurological disorders, with infit and outfit values falling within the range of 0.8 to 1.3. They also found that the eigenvalue for unexplained variance conformed to the definition of unidimensionality. This finding suggests that when a large heterogeneous sample is used, all items add unique information to the construct, whereas when a homogeneous sample is used (as in the present study), some items exhibit redundancy and may instead compromise the validity of the scores.
After the Rasch analysis, we proceeded to further explore the underlying structure of the instrument by means of an EFA. Item 7 was omitted from the model because of convergence issues; however, this change likely had only a negligible effect on the factor loadings of other items or on model composition. A 1- or 2-factor solution demonstrated acceptable factor loadings of items in general, although a 3-factor solution best fit the data. This solution also seemed to be the most rational because the factor composition broadly corresponded to the subscales of the BESTest2 from which the items were originally derived: anticipatory postural adjustments (factor 1), sensory orientation (factor 2), and reactive postural control (factor 3). The most notable differences were that the number of factors was now 3 instead of 4 and that items of the fourth subscale (dynamic gait) were dispersed over all 3 factors. Specifically, in the final model, walking with head turns (item 11) was grouped together with tasks representing anticipatory postural adjustments, change in gait speed (item 10) and walking with pivot turns (item 12) were grouped with items representing sensory orientation, and stepping over an obstacle (item 13) and the TUG with a dual task (item 14) were grouped with items representing reactive postural control.
One plausible explanation for these results is that the gait items of the Mini-BESTest are highly diverse. Locomotion on its own is a complex behavior, depending on the interaction of multiple areas of the brain.31 In PD, the integrity of these structures is compromised, resulting in the typical features of larger step variability and asymmetry, slowness, and poor balance (for a review, see Peterson and Horak32). Adding a second task, as in the gait items of the Mini-BESTest, requires greater engagement of structures in the prefrontal cortex, a requirement that is particularly challenging for people with PD.33 Although the neural networks involved in different types of complex tasks are still incompletely understood, it is reasonable to assume that walking with head turns, for instance (a task that mainly manipulates the vestibular and visual systems), is different in nature from stepping over an obstacle, making a pivot turn, shifting pace while walking, or walking while performing a cognitive task (dual-tasking).
Because the aim of the present study was to explore the structural validity of the Mini-BESTest rather than to construct a new instrument, we excluded no items other than item 7; instead, we evaluated the model retrieved from the EFA with a CFA. In accordance with the theoretical concept of the Mini-BESTest, dynamic balance was assumed to be a higher-order factor in the model. Overall, the final model demonstrated good fit statistics, although item 14 loaded poorly (0.27) on factor 3, suggesting that performing a motor task and a cognitive task concurrently (ie, dual-tasking) may represent a different construct. Moreover, the person-item map suggested that item 14 was the most difficult task, with only 6 participants attaining the highest score. This observation is in line with previous research showing that performing dual tasks is particularly challenging for people with PD because of aggravated cognitive-motor interference resulting from diminished motor automaticity (for a review, see Kelly et al34). These data are important because the relative significance of a single item measuring a rather unique aspect of the underlying dimension (in this case, the ability to perform dual tasks) will be outweighed by the other items in the test.
The question of whether performing dual tasks belongs in the Mini-BESTest depends on how broadly “dynamic balance” is defined. Essentially, this question is more conceptual than statistical; however, if performing dual tasks is considered to be part of the construct “dynamic balance,” then a single item may not cover it adequately and may instead compromise the consistency of the test. Because impairment of the ability to perform dual tasks has such a profound effect on balance control in people with PD,34 we recommend that, when balance function in this population is evaluated, dual-tasking ability be assessed with a more comprehensive set of tasks in addition to the Mini-BESTest. To our knowledge, such an instrument has not yet been developed.
In addition, a few other items were found to fit rather poorly with the construct. Most notably, standing on a firm surface with eyes open, standing up from a seated position, and standing on an inclined surface with eyes closed were too simple for the participants in the present study, in that virtually all of them attained the highest score. This finding introduces validity issues because redundant items not only will add little or no additional information to the test but also will have a disproportionately large impact on the total score. Because we studied a very narrow and homogeneous sample, compared with the heterogeneous sample used to develop the test, the finding of suboptimal fit for some items was not surprising.
Another concept that is highly relevant from a clinical perspective is dimensionality. Because balance control is multifaceted, a measurement tool for balance control arguably should be multifaceted, too. In other words, a balance test that allows specific balance functions to be assessed separately may help clinicians identify weak areas in individual patients and, thus, direct treatment accordingly. In the most recent version of the Mini-BESTest, items are organized and labeled according to the subscales of the BESTest from which they were derived.5 It is becoming increasingly common to report and interpret the scores from these subscales separately.6–9 To our knowledge, however, such a categorization has not been validated for either the BESTest or the Mini-BESTest. For the Mini-BESTest, conducting separate analyses of subgroups of items would be inconsistent with the concept of a unidimensional construct. Even though our results suggest the presence of 3 underlying dimensions of dynamic balance, they also show that some items, being either redundant or diverse, fit rather poorly into their subscales, and the configuration of these subscales cannot convincingly be related to a theoretical framework.
A brief version of the BESTest35 comprising 1 item from each subscale was recently presented. However, it has been argued that the psychometric procedures used to derive the test may not have been ideal.36 To our knowledge, the structural validity of the subcomponents of the BESTest has not been investigated specifically in people with PD, and although the theoretical model of balance control underpinning the test seems rational from a clinical perspective, additional research is needed to validate the model against evolving knowledge about the neurophysiology of PD and the interacting systems controlling balance.
Limitations
One limitation of the present study was the relatively small sample size of 112 participants. For Rasch modeling, the general recommendation is to include at least 10 observations per item37; following this recommendation in the present study would have required an additional 28 participants. For factor analyses, many different lower limits on sample size have been suggested, ranging from 100 observations38 to 500 observations.39 The use of a different sample from the same population may have produced different results.
The fact that the sample in the present study was highly homogeneous may, to some extent, have compensated for the limited number of participants. On the other hand, the fact that our sample was homogeneous limits the generalizability of our results. In a strict sense, the results apply only to people with mild to moderate PD, and additional studies are needed to determine the validity of the test in people with more severe PD or other populations with balance problems.
Taken together, the results of the present study do not support the notion that the Mini-BESTest is unidimensional in people with mild to moderate PD. Instead, we found 3 underlying dimensions, partially resembling the following subscales of the original BESTest: anticipatory postural adjustments, sensory orientation, and reactive postural control.2 Although our findings call into question the structural validity of the test, it arguably still holds merit as a clinical tool for several reasons.
First, unlike other commonly used multiitem balance tests (eg, the Berg Balance Scale and the Tinetti Balance Assessment Tool), the Mini-BESTest comprises items derived from a multidimensional model of balance control and targeting aspects of balance that are highly relevant for people with PD (such as postural responses and gait items). Second, the Mini-BESTest has been shown to effectively distinguish fallers from nonfallers16,17 and people with mild disease severity from those with severe PD.15 Again, tasks targeting postural responses showed the highest discriminative ability, closely followed by tasks targeting anticipatory postural adjustments. Third, unlike the Berg Balance Scale, the Mini-BESTest does not seem to have ceiling effects, even in people with mild disease severity.15 Fourth, the Mini-BESTest has high inter- and intrarater reliability, meaning that different assessors score the test similarly and that the error of measurement on repeated occasions is small.9 Finally, the test is not time intensive and does not require any special equipment, making it easy to administer and feasible to use in everyday practice.
Therefore, until a better test emerges, the Mini-BESTest may indeed be the best instrument for assessing balance in people with PD. However, clinicians are advised to think carefully before using the alleged subscales of the test, especially when interpreting the summed score of gait items, because these items are highly diverse and likely reflect different constructs. Our recommendation is to use the total score to assess gross balance and individual items to identify specific weaknesses. Particular attention should be paid to cognitive-motor dual-tasking ability because it is an important aspect of balance control in people with PD but is measured by only one item of the test. In closing, we anticipate the development of a multidimensional test specifically for people with PD: a test that builds on current knowledge about balance control and the neurophysiology of PD and comprises a balanced set of items that are neither too simple nor too difficult.
Footnotes
Dr Benka Wallén, Mr Löfgren, and Dr Franzén provided concept/idea/research design and data collection. Dr Benka Wallén, Dr Sorjonen, and Dr Franzén provided writing. Dr Benka Wallén and Dr Sorjonen provided data analysis. Dr Franzén provided project management, fund procurement, participants, facilities/equipment, and institutional liaisons. Dr Sorjonen and Mr Löfgren provided consultation (including review of manuscript before submission).
The authors acknowledge Dr David Conradsson and PhD student Håkan Nero for support with data collection.
The study was approved by the Regional Board of Ethics in Stockholm, Sweden (2009/819-32, 2010/1472-32, and 2012/1829-32).
- Received June 9, 2015.
- Accepted May 14, 2016.
- © 2016 American Physical Therapy Association