Abstract
Background Hand and wrist injuries are among the most common injuries in adults. The Patient-Rated Wrist Evaluation (PRWE) questionnaire was developed as a patient-reported outcome measure of pain and disability to evaluate outcomes after hand and wrist injuries.
Objective The aims of this study were (1) to evaluate the structural validity of the existing Dutch version of the PRWE (PRWE-NL) in patients with hand or wrist injuries and (2) to investigate the appropriateness of reporting subscale scores.
Design This was a retrospective analysis of cross-sectional data of 368 adult patients.
Methods Patients aged 18 to 65 years and treated either surgically or conservatively for an isolated hand or wrist injury were recruited. Patients were excluded if they were unable to speak or read Dutch. Confirmatory factor analyses were used to investigate structural validity, and Cronbach alpha (α) and omega (ω) coefficients were used to investigate internal consistency.
Results A series of confirmatory factor analyses revealed that all models (ie, a single-factor model, correlated 2- and 3-factor models, and 2 bifactor models) showed adequate model fit. However, inspection of the factor loadings, the explained common variance (ECV), and the different coefficient omega values revealed that the PRWE-NL should be considered a measure of a unidimensional trait. In addition, the PRWE-NL subscales showed unacceptably low reliability independent of the global PRWE-NL factor.
Limitations Although the sample size was adequate, the response rate was 37.1%. Participants were mainly patients with fractures of the wrist or hand, predominantly treated nonsurgically.
Conclusion This study suggests that the PRWE-NL measures a unidimensional trait. A single score should be used for the PRWE-NL, without subscale scores.
A certain level of pain and disability may be present after an injury, including hand and wrist injuries. Traditional outcome measures for evaluating the result of treatment following hand and wrist injuries include grip strength, range of motion, and radiological parameters. These traditional methods, however, do not take into account aspects that are important to the patient, such as pain, activity limitations, and participation restrictions.
The leading conceptual model of disability is the World Health Organization's International Classification of Functioning, Disability and Health (ICF).1 In the ICF, problems with functioning are categorized in 3 interconnected areas: “impairments,” “activity limitations,” and “participation restrictions.” Disability refers to difficulties encountered in any or all 3 areas of functioning. One of the most common methods for measuring various aspects of functioning is the use of patient-reported outcome questionnaires. A patient-reported outcome is an outcome, such as pain, perceived activity limitations, or quality of life, that is reported directly by the patient and not interpreted by an observer.2 Patient-reported outcomes may be used in health care policies, reimbursement decisions, and clinical decision making. Patient-reported outcome data also can be used to determine patient progress, to evaluate the effectiveness of a treatment, and to establish treatment preferences.
Numerous patient-reported outcome measures are used to assess activity limitations and participation restrictions in patients with hand and wrist injuries.3 The most frequently used measures include the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire4 and its shortened version, the QuickDASH5; the Michigan Hand Outcomes Questionnaire (MHQ)6; and the Patient-Rated Wrist Evaluation (PRWE).7 In contrast to the DASH, QuickDASH, and MHQ, the PRWE was specifically developed to assess wrist injuries. In 1998, MacDermid et al7 developed the PRWE questionnaire as an outcome measure to assess pain and functioning in patients with injuries affecting the wrist joint area.8 Later, 2 optional questions concerning esthetics were added to assess hand conditions, without changing the scoring system.9 The PRWE is a 15-item questionnaire (excluding the 2 optional questions), divided into 2 basic elements of outcome: pain (5 items) and function (10 items). The pain items were selected to represent the whole spectrum of severity in frequency and intensity. These pain items represent problems in the “body function and structure” (ie, “impairments”) domain of the ICF. The function items were selected to represent a range of physical activities that required different ranges of motion or muscle strength capabilities. These items represent the ICF domains “activity limitations” and “participation restrictions.”
The PRWE has been translated and adapted into several languages.10–21 Brink et al16 evaluated the Dutch version of the PRWE (PRWE-NL) in a small sample. Among the 58 included patients, only 19 had an injury, whereas the rest had a chronic condition. Both the internal consistency and the test-retest reliability were found to be high. Construct validity was assessed by correlating the total score of the PRWE-NL to the total score of the Dutch version of the DASH questionnaire.22 A strong correlation (r=.84, P<.001) was found between the 2 questionnaires.
To date, no confirmatory factor analysis has been conducted to examine the underlying factors of the PRWE questionnaire. Factor analysis is a statistical technique used to identify a set of latent constructs underlying a battery of measured variables.23 For the PRWE questionnaire, the number of underlying dimensions is inconsistent across studies: both 2-factor and 3-factor structures have been described.11,17 In addition, a total score and 2 subscale scores can be computed.8
In this study, we investigated the structural validity of the PRWE-NL in a population of patients with hand and wrist injuries. In particular, we conducted confirmatory factor analyses and examined the appropriateness of reporting subscale scores of the PRWE-NL. In addition, internal consistency was investigated.
Method
Research Design and Setting
Patients with an isolated hand or wrist injury sustained in 2012 or 2013 were recruited between November 2013 and February 2014 for this cross-sectional study. All consecutive patients who were treated either surgically or conservatively for these injuries at the University Medical Center Groningen (a level 1 trauma center) in the Netherlands were invited to participate. Patients who were 18 to 65 years of age at the time of injury were included. Patients were excluded if they were unable to speak or read Dutch. All eligible patients were invited to complete a paper version of the PRWE-NL at home. If applicable, a reminder was sent 2 weeks after the initial invitation. All patients gave written informed consent.
Evaluating Structural Validity and Internal Consistency
Structural validity is the degree to which the scores of an instrument are an adequate reflection of the dimensionality (ie, the expected number of subscales) of the construct to be measured.24 In this study, the structural validity of the PRWE-NL was assessed by confirmatory factor analyses (CFAs). We planned to explore a single-factor model of the PRWE-NL (model 1), a correlated 2-factor model (pain and function subscale [model 2]), and a correlated 3-factor model (pain, specific activities, and usual activities subscale [model 3]). A correlated factors model indicates that 2 or more factors underlie the measured variables and that these factors covary or correlate.25 Generally speaking, when factors are moderately or highly correlated (ie, share a considerable amount of common variance), there is a possibility of a general factor that underlies the data.
A bifactor model is a model that includes a general factor underlying all items and 2 or more group factors (more specific aspects of that construct), each associated with a limited number of items.26 That is, a bifactor model specifies that the covariance among item responses can be accounted for by a single general factor (reflecting the common variance among all items) and group factors (reflecting additional common variance among clusters of items). In a bifactor model, the general factor and the group factors are assumed to be uncorrelated (ie, the covariances of all factors are constrained to 0). Bifactor models can therefore be used to address the unique contribution of each group factor independent of the general factor,27 that is, over and above the general factor. Such a model can be used to explore the unique contribution of each subscale in predicting some outcome after controlling for the general factor, and to judge whether it is appropriate to use both subscale scores and a total score. We explored 2 bifactor models (ie, models 4 and 5). These models are comparable to models 2 and 3 but include a general factor plus 2 or 3 group factors, respectively, nested within the general factor. The latent variable models of all investigated PRWE-NL models are represented in eFigures 1–5.
Internal consistency is defined as the degree of interrelatedness among the items in a questionnaire scale or subscale.28 It is the degree to which all items measure the same construct, assuming the scale or subscale to be unidimensional. The internal consistency of the PRWE-NL was therefore determined after the dimensionality had been verified through factor analysis.
Two approaches were used to estimate internal consistency. First, Cronbach alpha (α) was calculated to determine the internal consistency of each scale or subscale in the correlated factor models. Cronbach α estimates the ratio of true score variance to total variance.28 Despite its popularity, Cronbach α tends to overestimate the reliability of the general factor when scores come from a multidimensional data structure.29 In addition, Cronbach α is sensitive to the number of items included in a scale. Second, model-based internal consistency reliability coefficients (ie, omega total [ωT] and omega hierarchical [ωH]) were used to estimate internal consistency in the models with a bifactor structure.29,30 Coefficient ωT is interpreted as an estimate of the reliability of a latent factor combining the general and group factor variance, and coefficient ωH estimates how much variance in raw scores can be attributed to the single general factor.30 Coefficient ωH can be extended to estimate subscale reliability, controlling for the part of the reliability due to the general factor in a bifactor model.31,32 Reise32 used the term “omega subscale” (ωS) to make clear that it is the reliability estimate for a residualized subscale, controlling for the effects of the general factor. These coefficients indicate whether scores for a group factor can be interpreted with confidence or whether only the total score (general factor) should be reported. A Cronbach α, ωT, ωH, or ωS coefficient of .70 to .95 was considered good internal consistency.
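To make the bifactor structure concrete, the Python sketch below builds the model-implied item correlation matrix of a standardized bifactor model in which all factors are uncorrelated. The helper name `bifactor_implied_corr` and the loadings (0.8 on the general factor, 0.3 on each group factor, with 2 hypothetical "pain" items and 2 hypothetical "function" items) are illustrative assumptions, not PRWE-NL estimates.

```python
import numpy as np

def bifactor_implied_corr(general, group, groups):
    """Model-implied correlations of a standardized bifactor model (hypothetical helper).

    general: each item's loading on the general factor
    group:   each item's loading on its single group factor
    groups:  group-factor label of each item
    All factors have unit variance and are mutually uncorrelated.
    """
    general = np.asarray(general, dtype=float)
    group = np.asarray(group, dtype=float)
    labels = sorted(set(groups))
    # Loading matrix: column 0 is the general factor, then one column per group factor.
    lam = np.zeros((len(general), 1 + len(labels)))
    lam[:, 0] = general
    for i, lab in enumerate(groups):
        lam[i, 1 + labels.index(lab)] = group[i]
    # With an identity factor covariance matrix, the common covariance is lam @ lam.T.
    implied = lam @ lam.T
    # Standardized items: unique variances bring each diagonal element up to 1.
    np.fill_diagonal(implied, 1.0)
    return implied

# Hypothetical loadings, for illustration only.
R = bifactor_implied_corr([0.8] * 4, [0.3] * 4, ["pain", "pain", "function", "function"])
print(np.round(R, 2))
```

With these loadings, items in the same group correlate 0.8² + 0.3² = 0.73, whereas items in different groups correlate only through the general factor (0.8² = 0.64), so the group factors add little shared variance on top of the general factor.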
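As a sketch of how these coefficients are computed, the Python code below implements Cronbach α from raw item scores and ωT, ωH, and ωS from standardized bifactor loadings. It assumes the commonly used formulas in which each item loads on the general factor and on exactly one group factor, and the model-implied variance of the sum score serves as the denominator. The function names and the loadings in the usage note are hypothetical; this is not the lavaan-based procedure used in this study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach alpha for an (n_respondents x k_items) array with no missing values."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return k / (k - 1) * (1 - item_var / total_var)

def bifactor_omegas(general, group, groups):
    """Return (omega_T, omega_H, {group: omega_S}) from standardized bifactor loadings.

    general: loading of each item on the general factor
    group:   loading of each item on its own (single) group factor
    groups:  group-factor label of each item
    """
    g = np.asarray(general, dtype=float)
    s = np.asarray(group, dtype=float)
    labels = np.asarray(groups)
    uniq = 1.0 - g**2 - s**2  # unique (error) variance of each item
    group_sq = {str(lab): s[labels == lab].sum() ** 2 for lab in np.unique(labels)}
    # Model-implied variance of the total score = general + group + unique parts.
    total_var = g.sum() ** 2 + sum(group_sq.values()) + uniq.sum()
    omega_t = (g.sum() ** 2 + sum(group_sq.values())) / total_var
    omega_h = g.sum() ** 2 / total_var
    omega_s = {}
    for lab in np.unique(labels):
        m = labels == lab
        sub_var = g[m].sum() ** 2 + s[m].sum() ** 2 + uniq[m].sum()
        # Share of subscale-score variance due to the group factor alone.
        omega_s[str(lab)] = s[m].sum() ** 2 / sub_var
    return omega_t, omega_h, omega_s
```

For example, with 4 hypothetical items loading 0.8 on the general factor and 0.3 on one of 2 group factors (2 items each), `bifactor_omegas([0.8]*4, [0.3]*4, ["pain", "pain", "function", "function"])` gives ωT ≈ 0.91, ωH ≈ 0.85, and ωS ≈ 0.10 for each group: the composite is reliable, but little of that reliability is specific to either group factor.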
Following suggestions by Reise,32 2 “factor strength” indexes were calculated to evaluate whether our data were “unidimensional enough.” The first was coefficient ωH, which estimates how much variance in raw scores can be attributed to the single general factor.30 A high ωH value indicates that a composite score primarily reflects a single common source (ie, one common factor underlies item responses). The second was the explained common variance (ECV),33 the ratio of the general factor eigenvalue to the sum of all eigenvalues (general factor and group factor eigenvalues combined). The ECV is therefore better interpreted as an indicator of unidimensionality than as the amount of total test variance accounted for by the general factor.
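Under the usual convention for bifactor models, each factor's "eigenvalue" is the sum of its squared standardized loadings, so the ECV can be computed directly from the loading matrix. The Python sketch below assumes each item has one general and one group loading; the function name and the loadings are hypothetical.

```python
import numpy as np

def explained_common_variance(general, group):
    """ECV = general-factor eigenvalue / (general + group-factor eigenvalues).

    Each factor's 'eigenvalue' is taken as the sum of its squared standardized
    loadings; `group` holds each item's loading on its own group factor, so the
    sum over all group factors is the sum of all squared group loadings.
    """
    g = np.asarray(general, dtype=float)
    s = np.asarray(group, dtype=float)
    general_ev = np.sum(g**2)
    group_ev = np.sum(s**2)
    return general_ev / (general_ev + group_ev)

# Hypothetical loadings: 4 items, 0.8 on the general factor, 0.3 on a group factor.
print(round(explained_common_variance([0.8] * 4, [0.3] * 4), 2))  # prints 0.88
```

With equal loadings the number of items cancels out: ECV = 0.64 / (0.64 + 0.09) ≈ 0.88.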
Statistical Analyses
Factor loadings and model fit were analyzed with CFA for categorical items. Confirmatory factor analysis was performed using the R package lavaan,34 a package for structural equation modeling implemented in the R system for statistical computing.35 The weighted least squares mean- and variance-adjusted (WLSMV) estimator, a robust weighted least squares approach, was used to estimate the model parameters and to compute robust standard errors and mean- and variance-adjusted test statistics. A completely standardized solution was used to report the factor loadings and covariances. Standardized factor loadings of at least 0.50 were considered appropriate.
Several goodness-of-fit indexes are reported because each provides different information about model fit.36,37 The chi-square statistic was computed as the test of global fit for each model. It is an absolute fit index that assesses the discrepancy between the sample and the fitted covariance matrices; a statistically significant chi-square statistic indicates lack of fit. This widely used statistic has numerous limitations; in particular, because it is a formal statistical test, it is sensitive to sample size. Therefore, other goodness-of-fit indexes were also examined: 2 other absolute fit indexes (the root mean square error of approximation [RMSEA] and the standardized root mean square residual [SRMR]) and 2 incremental fit indexes (the comparative fit index [CFI] and the Tucker-Lewis Index [TLI]). Incremental fit indexes do not use the chi-square statistic in its raw form; rather, they compare the model's chi-square value with that of a more restricted baseline model. Incremental fit indexes, also known as comparative or relative fit indexes, are less affected by sample size. A CFI and TLI close to 0.95 or higher, an RMSEA close to or less than 0.06, and an SRMR close to or less than 0.08 were considered to indicate adequate model fit.36
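The absolute and incremental indexes above can be computed from chi-square values. The Python sketch below uses the standard maximum-likelihood-style point-estimate formulas for RMSEA, CFI, and TLI, plus the cutoffs listed above; the function names are hypothetical. The robust (scaled) versions that lavaan reports under WLSMV are computed differently, so these functions illustrate the logic rather than reproduce the values in Table 2. SRMR is not computed because it requires the residual correlation matrix.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """CFI: 1 - max(chi2 - df, 0) / max(chi2 - df, chi2_base - df_base, 0)."""
    num = max(chi2 - df, 0.0)
    den = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2, df, chi2_base, df_base):
    """TLI (non-normed fit index); unlike CFI it is not bounded to [0, 1]."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)

def adequate_fit(cfi_val, tli_val, rmsea_val, srmr_val):
    """Cutoffs used in this study, treated here as strict thresholds (the text
    says 'close to'): CFI and TLI >= 0.95, RMSEA <= 0.06, SRMR <= 0.08."""
    return (cfi_val >= 0.95 and tli_val >= 0.95
            and rmsea_val <= 0.06 and srmr_val <= 0.08)

# Model 4 as reported: chi2 = 67.76 with df = 75 (chi2 < df); n does not matter here.
print(rmsea(67.76, 75, 368))  # prints 0.0
```

Whenever χ2 is smaller than its degrees of freedom, as for model 4 (χ2=67.76, df=75), the RMSEA point estimate is exactly 0 and CFI is 1 for any baseline model, which is consistent with the reported RMSEA=0.000 and CFI=1.00.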
A sensitivity analysis including only patients with wrist injuries (ie, those with fractures of the distal radius, carpal bones, or metacarpal bones) was conducted to analyze the same models in a more specific group of patients.
Although sample size is an important aspect of factor analysis, views and guidelines on the required size vary.38 Comrey and Lee39 recommended a sample size of at least 300 cases; therefore, we intended to include at least 300 patients with isolated hand and wrist injuries. Although the medical ethics committee of our hospital granted dispensation for this study, it was carried out in compliance with the principles outlined in the Declaration of Helsinki–Ethical Principles for Medical Research Involving Human Subjects.
PRWE
In this study, we used the Dutch version of the PRWE without the 2 optional questions. The English version of the PRWE was developed in Canada to complement traditional outcome measures.7,40 Its development was based on a survey of experts, a thorough literature review, and patient interviews. Pain and function were identified as the 2 essential components of the clinical evaluation of patients with hand and wrist injuries. The development process included item generation and selection, item refinement through patient and expert interviews, pilot testing in patients, and evaluation of validity and reliability.40 Five questions were selected for the pain subscale (4 representing pain intensity and 1 representing pain frequency). Ten questions were included in the function subscale: 6 assess the spectrum of wrist motion and strength during specific daily activities, performed either with one hand or bimanually, and 4 address difficulties experienced when performing usual activities (self-care, household duties, work role, and recreation).
Each item is scored on an 11-point Likert scale ranging from 0 (“no pain or no disability”) to 10 (“worst pain ever or unable to do”). A total score and subscale scores have been described for the PRWE.8,40 The pain subscale score is the sum of the scores for the 5 pain items (maximum score of 50), with higher scores indicating more pain. The function subscale score is the sum of the scores for the 10 function items (specific activities and usual activities) divided by 2 (maximum score of 50). The pain and function subscale scores can be summed to obtain a total PRWE score (range=0–100), which weights the pain and function subscales equally.
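The scoring rules above translate directly into code. The Python sketch below (hypothetical function name `score_prwe`) assumes complete responses: it raises an error for a missing item rather than applying an imputation rule, because none is described here. Given this study's conclusion, the total is the score to report; the subscale scores are returned only to show how the total is built.

```python
def score_prwe(pain, function):
    """Score the 15-item PRWE (optional esthetic items excluded); hypothetical helper.

    pain:     5 item scores, each 0-10
    function: 10 item scores (6 specific + 4 usual activities), each 0-10
    Returns (pain_score, function_score, total_score).
    """
    if len(pain) != 5 or len(function) != 10:
        raise ValueError("expected 5 pain items and 10 function items")
    for x in list(pain) + list(function):
        if x is None:
            # No imputation rule is applied in this sketch.
            raise ValueError("missing item response")
        if not 0 <= x <= 10:
            raise ValueError("item scores must lie between 0 and 10")
    pain_score = sum(pain)              # 0-50, higher = more pain
    function_score = sum(function) / 2  # 0-50, halved so both subscales weigh equally
    return pain_score, function_score, pain_score + function_score

print(score_prwe([2, 3, 4, 1, 0], [1] * 10))  # prints (10, 5.0, 15.0)
```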
Results
A total of 992 patients were identified as having an isolated hand or wrist injury, of whom 368 (37.1%; 186 men and 182 women) with a mean age of 43.4 years (SD=14.2) participated in this study. The types of injuries are presented in Table 1. Of these injuries, 90% were fractures, most often distal radius fractures (130/334). The majority (82%) of the hand and wrist injuries were treated conservatively. The mean time from injury to completion of the PRWE-NL questionnaire was 13.4 months (SD=7.0, range=1–25). The PRWE-NL questionnaire was completed in full by 342 (92.9%) of the respondents. Eight patients (2.2%) had a missing response on the item “work”; each of the remaining items was missing in fewer than 2.0% of the patients. We did not impute any missing values.
Table 1. Frequencies of the Wrist and Hand Injuries
The fit statistics of the 5 CFA models are presented in Table 2, and the standardized factor loadings are presented in Table 3. Although the chi-square statistic, the classical goodness-of-fit index, was statistically significant for model 1 (χ2=151.79, df=90, P<.001), this model showed adequate absolute fit indexes (RMSEA and SRMR) and incremental fit indexes (CFI and TLI). As shown in Table 3, all factor loadings estimated in model 1 were at least 0.85.
Table 2. Fit Statistics for the 5 Confirmatory Factor Analysis Models
Table 3. Results of Different Confirmatory Factor Analysis Models on the PRWE-NL Items
Model 2 yielded a chi-square value of 104.15 (df=89, P=.13). In addition, the absolute and incremental fit indexes suggested that the model fitted the data adequately. In model 2, all items loaded highly on their respective factor of the 2 correlated latent factors (ie, pain and function), with factor loadings ranging from 0.85 to 0.96. The covariance between the factors was positive and statistically significant (pain versus function=0.91, P<.001).
Model 3 yielded a chi-square value of 92.81 (df=87, P=.32) and adequate levels of model fit indexes. Factor loadings of all items were large, ranging from 0.86 to 0.97. The covariance between the correlated factors was positive and statistically significant (pain versus specific activities=0.90, pain versus usual activities=0.90, and specific activities versus usual activities=0.93; all P values <.001).
The bifactor model 4 showed adequate model fit: χ2=67.76, df=75, P=.71, RMSEA=0.000, SRMR=0.025, CFI=1.00, and TLI=1.00. In model 4, all items loaded highly (at least 0.78) on the general factor but much lower (maximum=0.48) on the group factors. For example, the correlated 2- and 3-factor models (models 2 and 3) suggested that item 10 (“Carry a 10-lb [4.5-kg] object in my affected hand”) was a strong indicator of the function and specific activities subscales, respectively (factor loadings of 0.95 and 0.97). In contrast, the bifactor models (models 4 and 5) showed that item 10 was only a very weak indicator of its group factor (factor loadings of 0.07 and 0.03, respectively).
The bifactor model 5 also provided adequate levels of model fit indexes: χ2=72.09, df=75, P=.57, RMSEA=0.000 (90% confidence interval=0.000, 0.028), SRMR=0.029, CFI=1.00, and TLI=1.00. Again, all items in this bifactor model loaded high (at least 0.80) on the general factor but much lower (maximum=0.42) on the group factors.
The factor strength indexes are presented in Table 3. The ECV was 0.89 in both model 4 and model 5. In model 4, coefficient ωH was high for the general factor (0.93), but coefficient ωS was low for the group factors (pain and function: 0.24 and 0.04, respectively). A similar pattern was seen in model 5: coefficient ωH was 0.91 for the general factor, but coefficient ωS was low for the group factors (pain, specific activities, and usual activities: 0.19, 0.13, and 0.19, respectively). These findings indicate that a large proportion of the variance in the total score can be attributed to the general factor, whereas only a small proportion of the variance in the subscale scores can be attributed to the individual group factors.
With regard to internal consistency, the traditional Cronbach α values of the single-factor and correlated 2- and 3-factor models (models 1, 2, and 3) were high, ranging from 0.91 to 0.97. In the bifactor models, coefficient ωT values ranged from 0.92 to 0.98. For the general PRWE-NL scores, coefficient ωH was estimated to be 0.93 (model 4) and 0.91 (model 5). In contrast, the coefficient ωS values of the group factor scores were considerably lower (0.24 or lower for each group factor in both bifactor models). These findings indicate that it is not reasonable to report subscale scores.
The results of the sensitivity analysis are presented in Tables 4 and 5. All models fitted the data adequately. The factor loadings in all models were large and statistically significant, except that in the bifactor models the loadings on the group factors were mostly small and not statistically significant, even though the loadings on the general factor were large. In addition, the ECV and the different omega values revealed that, also in this subgroup of patients, the PRWE-NL should be considered a measure of a unidimensional trait and that a single score should be used for the PRWE-NL.
Table 4. Fit Statistics for the 5 Confirmatory Factor Analysis Models in a Subgroup of Patients With Distal Radius, Carpal, or Metacarpal Fractures (n=235)
Table 5. Results of Different Confirmatory Factor Analysis Models on the PRWE-NL Items in a Subgroup of Patients With Distal Radius, Carpal, or Metacarpal Fractures (n=235)
Discussion
To investigate the structure of the PRWE-NL, we used various CFA models to better understand how the components of the PRWE-NL relate to each other and to explore the appropriateness of using total scale scores and subscale scores. In this study, we found that the PRWE-NL reflects a unidimensional trait. In addition, a single score should be used for the PRWE-NL, without subscale scores. In practice, the interpretation of PRWE-NL subscale scores as reliable indicators of a unique construct is extremely limited (ie, very little reliable variance exists beyond the general PRWE-NL factor). Clinical interpretations of the PRWE-NL questionnaire should be restricted to the total PRWE-NL score. The sensitivity analysis, including only patients with distal radius, carpal, or metacarpal fractures, supported these findings.
The original PRWE was described as a measure of pain and disability and structured into 2 subscales: a pain subscale and a function subscale.7 In addition to reporting these 2 subscale scores, most users also prefer to report a total PRWE score. To date, only 2 factor analyses have been conducted to examine the underlying factors of the PRWE questionnaire.11,17 In a principal component analysis, Wah et al11 retained 2 factors in the Chinese version of the PRWE, with several item cross-loadings. In the Japanese version,17 2 exploratory factor analyses were conducted to examine the unidimensionality of the PRWE pain subscale and function subscale separately. The pain subscale demonstrated unidimensionality, whereas the function subscale exhibited 2 factors, separated into a special function scale and a usual function scale. However, all of these subscales were substantially correlated, with correlations ranging from .59 to .76. In addition, the ratio of the first to the second eigenvalue (one of the methods frequently used to assess unidimensionality) of the function subscale was 4.84, which is evidence for a common factor. Recently, Packham and MacDermid41 suggested a 3-factor structure (pain, specific activities, and usual activities) of the PRWE. Their Rasch analyses revealed that the total PRWE scale did not fit the Rasch model; after reanalysis, the 3 distinct subscales did show acceptable fit. In their study, the PRWE was administered only when there were no contraindications to hand and wrist movement and at the discretion of the therapists. In addition, fewer than 25% of the included patients had a fracture. These differences make comparison with our study difficult.
All models in our study provided adequate fit to the data. The bifactor models appeared to show the best fit, with lower values of the absolute fit indexes (RMSEA and SRMR) and higher values of the incremental fit indexes (CFI and TLI). However, several important findings from these models provide evidence that the PRWE-NL has a one-factor structure.
First, the general factor loadings in the bifactor models were similar to the loadings in the unidimensional (single-factor) model, indicating unidimensionality. Marked differences between the general factor loadings and the single-factor loadings would have indicated that forcing inherently multidimensional data into a unidimensional structure distorts the parameter estimates of the unidimensional model.42 In addition, in the bifactor models, the factor loadings were high on the general factor but low on the group factors, implying that the unique contribution of each group factor over and above the general factor is limited.27 Second, the ECV showed that the general factor in both bifactor models accounted for nearly 90% of the common variance, reflecting a high degree of unidimensionality. Third, although the coefficient ωT values estimated in both bifactor models showed outstanding reliability for the general factor and the group factors, the coefficient ωH values of the latent general factor differed substantially from the coefficient ωS values of the group factors. Coefficient ωS estimates the reliability of a subscale, controlling for the effects of the general factor, and thus indicates whether scores of a group factor can be interpreted with confidence or whether only the general factor score should be used. In both bifactor models, coefficient ωS was low for all group factors (range=0.04–0.24), whereas coefficient ωH was high (≥0.91) for the general factors. Thus, whereas the general factors accounted for more than 90% of the variance in the summed standardized scores, the group factors accounted for only 4% to 24% of the variance in the subscale scores. These coefficient omega values indicate that, if both total and subscale scores were formed, the interpretation of the subscales as reliable indicators of unique constructs would be extremely limited.
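The first-to-second eigenvalue ratio mentioned above can be computed from an item correlation matrix, as in the Python sketch below. The function name and the 5-item matrix with all inter-item correlations of 0.64 are hypothetical (the matrix corresponds to a single factor with loadings of 0.8), and no particular cutoff for "unidimensional" is implied.

```python
import numpy as np

def first_to_second_eigenvalue_ratio(corr):
    """Ratio of the largest to the second-largest eigenvalue of a correlation matrix."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eig = np.sort(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))[::-1]
    return eig[0] / eig[1]

# Hypothetical 5-item correlation matrix in which every pair correlates 0.64.
r = np.full((5, 5), 0.64)
np.fill_diagonal(r, 1.0)
print(round(first_to_second_eigenvalue_ratio(r), 2))  # prints 9.89
```

For an equicorrelated matrix the eigenvalues are 1 + 4(0.64) = 3.56 and 1 − 0.64 = 0.36 (4 times), so the ratio is 3.56/0.36 ≈ 9.89; the reported 4.84 for the Japanese function subscale reflects weaker, less uniform correlations.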
To summarize, inspection of the factor loadings, the ECV, and coefficient ωH reveals that the PRWE-NL should be considered a measure of a unidimensional trait. Conceptually, the PRWE items on pain, specific activities, and usual activities can be seen as indicators of a unidimensional general trait, “disability,” consistent with the ICF definition of disability as dysfunctioning at one or more levels of human functioning (impairments, activity limitations, and participation restrictions).1 This trait represents the level of dysfunctioning after a traumatic hand or wrist injury.
In the correlated 2-factor and 3-factor models, the standardized loadings of all PRWE-NL items were high and statistically significant, and the covariances among the factors within each model were very high and statistically significant. Creating subscales from one general trait introduces multicollinearity, which compromises our ability to judge the unique contribution of each subscale in predicting an important outcome and indicates poor discriminant validity. When factors overlap this much, it is advisable to combine them into a more parsimonious model. Another problem with creating highly correlated subscales emerges from a bifactor perspective: subscale scores may appear reliable, but that reliability reflects the general trait, not the particular group factors.42 In our study, coefficient ωH values for the general PRWE-NL scores were 0.93 and 0.91, whereas the coefficient ωS values of the group factor scores were considerably lower (0.24 or lower in both bifactor models). These unique reliabilities of the subscale scores are considerably lower than the Cronbach α estimates obtained in the 2-factor and 3-factor models; Cronbach α can be misleading when the data have a bifactor structure.31,32 The low estimates of unique internal consistency of the PRWE-NL subscale scores mean that subscale reliability is limited and that it is not reasonable to report subscale scores for the PRWE-NL. Because subscales are often so unreliable compared with total scores, Sinharay and Puhan43 reasoned that the total score is actually a better predictor of an individual's true score on a subscale than the subscale score itself.
This study has some limitations. First, the response rate was 37.1%. Although such a response rate is not uncommon,44 it could have introduced selection bias. Moreover, the sample consisted mainly of patients with fractures of the wrist or hand, predominantly treated nonsurgically, which may limit the generalizability of the results. However, this study included an adequate sample size, and there were few missing values. In conclusion, these CFA results suggest that the PRWE-NL measures a unidimensional trait and that a single score should be used for the PRWE-NL, without subscale scores. Future studies should assess validity in more detail (eg, content validity), and other measurement properties of the PRWE-NL, such as test-retest reliability, measurement error, and responsiveness, should be evaluated.
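The point that a subscale's apparent reliability can mostly reflect the general trait can be illustrated with a small simulation. The Python sketch below generates synthetic responses for one hypothetical 5-item subscale under a bifactor model (general loading 0.8, group loading 0.3, continuous items rather than the ordinal PRWE-NL items) and compares the subscale's Cronbach α with its population ωS. None of these values are PRWE-NL data.

```python
import numpy as np

# Hypothetical bifactor parameters for ONE subscale (not PRWE-NL estimates).
n, k = 5000, 5                   # respondents, items in the subscale
lam_g, lam_s = 0.8, 0.3          # standardized general and group loadings
theta = 1 - lam_g**2 - lam_s**2  # unique variance per item

rng = np.random.default_rng(2016)
general = rng.standard_normal(n)  # general factor scores
group = rng.standard_normal(n)    # group factor, uncorrelated with the general factor
items = (lam_g * general[:, None] + lam_s * group[:, None]
         + np.sqrt(theta) * rng.standard_normal((n, k)))

def cronbach_alpha(x):
    """Cronbach alpha for an (n x k) array of item scores."""
    k_items = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k_items / (k_items - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(items)

# Population (model-based) variance decomposition of the subscale sum score.
sub_var = (k * lam_g) ** 2 + (k * lam_s) ** 2 + k * theta
omega_s = (k * lam_s) ** 2 / sub_var      # reliable variance unique to the group factor
omega_h_sub = (k * lam_g) ** 2 / sub_var  # reliable variance due to the general factor

print(f"alpha={alpha:.2f}  omega_S={omega_s:.2f}  omega_H(subscale)={omega_h_sub:.2f}")
```

In this setup Cronbach α for the subscale comes out at about 0.93, yet ωS is only about 0.11, while about 0.82 of the subscale-score variance is reliable variance due to the general factor: nearly all of the subscale's apparently high reliability comes from the general trait.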
Footnotes
Dr El Moumni and Dr Mokkink provided concept/idea/research design. Dr El Moumni, Dr Van Eck, Ms Reininga, and Dr Mokkink provided writing. Dr El Moumni and Dr Van Eck provided data collection. Dr El Moumni, Dr Van Eck, and Dr Mokkink provided data analysis. Dr El Moumni provided project management. Dr Wendt provided participants, facilities/equipment, and institutional liaisons. Dr Wendt and Dr Mokkink provided consultation (including review of manuscript before submission).
- Received January 21, 2015.
- Accepted November 5, 2015.
- © 2016 American Physical Therapy Association