Abstract
Background Pelvic-floor dysfunction (PFD) affects a substantial proportion of individuals, mostly women. In response to the demand for measuring PFD outcomes in outpatient rehabilitation, the Urinary Incontinence Questionnaire (UIQ) was developed by Focus On Therapeutic Outcomes, Inc (FOTO) in collaboration with an experienced physical therapist who specializes in treating patients with PFD.
Objective The purpose of this study was to evaluate psychometric properties and practicability of the 21-item UIQ in patients seeking outpatient physical therapy services due to PFD.
Design This was a retrospective analysis of cross-sectional data from 1,628 patients (mean age=53 years, SD=16, range=18–91) being treated for their PFD in 91 outpatient physical therapy clinics in 24 states (United States).
Methods Using a 2-parameter logistic item response theory (IRT) procedure and the graded response model, the UIQ was assessed for unidimensionality and local independence, differential item functioning (DIF), discriminating ability, item hierarchical structure, and test precision.
Results Four items were dropped to improve unidimensionality and discriminating ability. The remaining UIQ items met IRT assumptions of unidimensionality and local independence. One item was adjusted for DIF by age group. Item difficulties were suitable for patients with PFD, with no ceiling or floor effect. Item difficulty parameters ranged from −2.20 to 0.39 logits. Endorsed items representing the highest difficulty levels were related to control of urine flow, impact of leaking urine on life, and confidence to control the urine leakage problem. Item discrimination parameters ranged from 0.48 to 1.18. Items with higher discriminating abilities were those related to impact on life of leaking urine, confidence to control the urine leakage problem, and the number of protective garments for urine leakage.
Limitations Because this study was a secondary analysis of prospectively collected data, missing data might have influenced our results.
Conclusions Preliminary analyses supported sound psychometric properties of the UIQ items and their initial use for patients with PFD in outpatient physical therapy services.
Pelvic-floor dysfunction (PFD) affects a substantial proportion of individuals, mostly women.1–3 It is estimated that up to one third of adults experience one or more PFD conditions during their lifetime.2,3 To improve functional outcomes and reduce PFD symptoms, many patients seek outpatient pelvic-floor physical therapy.4 In a previous longitudinal cohort of 2,452 patients with PFD receiving outpatient physical therapy services,5 most patients (92%) were female, and for most of them the PFD had been present for more than 90 days (74%). A majority (55%) had urinary leakage, and combinations of urinary, bowel, and pelvic-floor pain disorders were common (37%).
To assist in clinical care planning and outcomes assessment in patients with PFD, there is an increasing demand for patient-reported outcomes (PROs) to be applied in this patient population during routine clinical practice and research.6,7 There are several reasons that stimulate this demand. First, individuals with PFD are commonly managed in outpatient physical therapy services.8–10 Second, to assess the PFD outcomes, many health indicators by nature rely on subjective patient reports. For example, PFD symptoms commonly include urinary urgency, urinary frequency, bowel constipation, pelvic pain, and sexual dysfunction. Functional outcomes of PFD frequently involve whether patients have reduced urgency and frequency, less restriction doing daily activities, or more ability to participate in social events. These assessments strongly rely on patients' perspectives, instead of laboratory tests or physical examination. Third, because PRO measures provide information related to patients' perception of their health status without interpretation from clinicians or a third party, several institutes such as the National Institutes of Health,11 Food and Drug Administration,12 and World Health Organization6 are encouraging the medical research community to use PROs to support intervention effectiveness13–15 and monitor patient management.16
In 1998, the first International Consultation on Incontinence (ICI) was held,6 and the ICI Scientific Committee recognized the need to develop a universally applicable questionnaire for wide application across international populations in clinical practice and research to assess urinary incontinence. Since then, many questionnaires measuring urinary incontinence have been developed, such as the ICIQ-UI Short Form,17 Incontinence Impact Questionnaire (IIQ),18 Pelvic Floor Distress Inventory (PFDI),19,20 Pelvic Floor Impact Questionnaire (PFIQ),19,20 and Urogenital Distress Inventory (UDI).21 In response to the demand for measuring PFD outcomes in outpatient rehabilitation, the FOTO Pelvic Floor Dysfunction Assessment was designed by Focus On Therapeutic Outcomes, Inc (FOTO) in collaboration with an experienced physical therapist who specializes in treating patients with PFD. The aims were to design questions that would be sensitive to change in the issues of greatest concern to patients with PFD seeking outpatient rehabilitation therapy and to develop an item response theory (IRT)-based item bank suitable for computerized adaptive testing (CAT) application for this patient population. One part of the development involved an assessment of face validity by collecting feedback on the initial item bank (item description and rating categories) from a small group of physical therapist clinical experts. In 2008, FOTO added various patient history–related questions, along with 4 additional validated surveys for patients with PFD, to facilitate research at the Rehabilitation Institute of Chicago and collect pilot data for the initial item bank. The additional validated surveys included the PFDI, PFIQ, Pelvic Floor Prolapse/Urinary Incontinence Sexual Function Questionnaire (PISQ), and Pain Disability Index (PDI).
The psychometric properties of the initial FOTO PFD item bank have not been studied. The purpose of the current study was to evaluate psychometric properties and practicability of the self-report Urinary Incontinence Questionnaire (UIQ), as part of the FOTO Pelvic Floor Dysfunction Assessment, in patients with PFD seeking outpatient physical therapy services.
Method
Data Collection
The platform used for outcomes data collection has been described.5 Briefly, patients with PFD were managed in outpatient rehabilitation clinics participating with FOTO, an international medical rehabilitation outcomes database management company.22,23 Prior to initial evaluation and therapy (intake), patients entered demographic data and completed self-report surveys using Patient Inquiry, a computer program developed by FOTO (Knoxville, Tennessee).22,23 Demographic variables of interest were age, sex, symptom acuity, surgical history, number of comorbid conditions, exercise history, and payer source. Age was collected as a continuous variable and categorized as 18 to 44, 45 to 64, and 65 years and older. The participants' sex was categorized as female and male. Symptom acuity, which we operationally defined as the number of calendar days from the date of onset of the condition being treated to the date of initial therapy evaluation, was categorized as acute (<22 days), subacute (22–90 days), and chronic (>90 days). Surgical history was categorized as none, 1, 2, 3, or 4 or more surgeries related to the condition being treated. Number of comorbid conditions was assessed using a list of 29 conditions common to patients entering an outpatient rehabilitation clinic (eg, arthritis, asthma, diabetes, heart attack, AIDS, sleep disturbance, cancer).24,25 Exercise history prior to receiving therapy was categorized as exercising 3 times a week or more, exercising 1 to 2 times a week, or exercising seldom or never. Last, more than 15 payer sources (eg, preferred provider organization, Medicare) were listed for patients to select from.
When clinic staff recorded patient data into the software and the staff selected “Pelvic Floor” as the broad heading for the reason for treatment, PFD-related questions were administered to the patients. Because data were collected in routine, busy outpatient clinics, we used a branching system to administer questions to collect data efficiently and reduce administrative burden (ie, reduced the number of items administered). When PFD surveys were administered, patients were instructed to select disorders that might apply to them (ie, urinary, bowel, and pelvic pain). For any selected disorder, subsequent subtypes pertinent to a specific disorder were given. For example, if patients selected “urinary,” they were instructed to select a more detailed subtype (ie, leakage, frequency, or retention). At any time, patients could choose one, more than one, or no subtype. Patients could skip any question and proceed to the next question without explanation. Based on the subtype reported by the patient, only items relevant to that subtype were given, which led to 7 possible branching routines that produced groups of patients with different numbers of items asked. Patients received the full 21 UIQ items only if they selected all 3 subtypes (leakage, frequency, and retention).
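The branching logic can be illustrated with a short sketch. It assumes that the item sets for the 3 urinary subtypes do not overlap, so the number of items administered is the sum of the per-subtype counts reported for the 21-item UIQ (17 leakage, 2 frequency, 2 retention); the function and variable names are hypothetical and are not part of the Patient Inquiry software.

```python
from itertools import combinations

# Item counts per urinary subtype, as reported for the 21-item UIQ.
ITEMS_PER_SUBTYPE = {"leakage": 17, "frequency": 2, "retention": 2}


def branching_routes(items_per_subtype):
    """Return every non-empty subtype combination with its item count.

    Assumes the subtype item sets are disjoint, so counts simply add up.
    """
    subtypes = list(items_per_subtype)
    routes = {}
    for size in range(1, len(subtypes) + 1):
        for combo in combinations(subtypes, size):
            routes[combo] = sum(items_per_subtype[s] for s in combo)
    return routes


routes = branching_routes(ITEMS_PER_SUBTYPE)
for combo, n_items in routes.items():
    print(" + ".join(combo), "->", n_items, "items")
```

Three subtypes yield 2^3 − 1 = 7 non-empty combinations, which matches the 7 branching routines described above; selecting no subtype yields no UIQ items and is therefore not counted as a routine.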
UIQ
The UIQ was designed to evaluate urinary function in patients with PFD seeking outpatient physical therapy services. The UIQ consists of 21 items: 17 related to urinary leakage problems, 2 related to frequency problems, and 2 related to retention problems. Each item has its own Likert rating scale structure and operational definition (Appendix).
Data were selected from the database if patients: (1) were 18 years of age or older, (2) were managed for their PFD problems, (3) received outpatient physical therapy services, and (4) responded to FOTO Patient Inquiry computer-based UIQ items at admission to therapy between May 2007 and January 2011.
Analytical Procedure
We assessed the UIQ for unidimensionality and local independence, differential item functioning (DIF), discriminating ability, item hierarchical structure, and test precision using the 2-parameter logistic item response theory (IRT) approach.
Data management.
Prior to data analysis, item responses from all items, except item 17, were recoded, with higher (more positive) responses representing higher functioning. As an example, the original rating categories of item 1 were reversed (ie, the rating categories of 1 to 6 were replaced with those of 6 to 1) so patients with higher scores were those patients who never have urine leakage when they are awake. Based on our preliminary analysis using a 1-parameter IRT model, the category thresholds increased in order (ie, there were no disordered thresholds). For item 17, we collapsed the 2 lowest (1 and 2) and the 2 highest (10 and 11) response categories because of low frequency counts (11% and 5% for categories 1 and 2 and 7% and 2% for categories 10 and 11, respectively) for those category choices and challenges in analyzing responses with 2-digit width.
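The recoding steps can be sketched as follows. The helper names are hypothetical, and the renumbering of the collapsed item 17 categories to 1 through 9 is an assumption made for illustration; the text states only which categories were merged.

```python
def reverse_code(response, n_categories):
    """Reverse a 1..n rating so that higher values mean higher functioning."""
    return n_categories + 1 - response


def collapse_extremes(response, n_categories=11):
    """Merge the 2 lowest and the 2 highest categories of an 11-point item.

    Categories 1 and 2 become 1, categories 10 and 11 become 9, and the
    categories in between shift down by one (renumbering is an assumption).
    """
    if response <= 2:
        return 1
    if response >= n_categories - 1:
        return n_categories - 2
    return response - 1


# Item 1 has 6 rating categories that were reversed (1..6 -> 6..1).
item1_recoded = [reverse_code(r, 6) for r in [1, 2, 6]]
# Item 17 has 11 rating categories and was collapsed, not reversed.
item17_recoded = [collapse_extremes(r) for r in range(1, 12)]
print(item1_recoded)
print(item17_recoded)
```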
Unidimensionality and local independence.
To assess IRT assumptions of unidimensionality and local independence, we conducted exploratory factor analyses (EFAs) of latent trait variables, followed by confirmatory factor analyses (CFAs) utilizing Mplus (Muthén & Muthén, Los Angeles, California)26 on all items.
Unidimensionality of a scale means its items represent only one construct.27 To test for unidimensionality, we analyzed (1) the factor loadings and (2) variances explained by each factor. As suggested by Nunnally,28 we eliminated items with factor loadings below 0.40.
Local independence means that, after taking into account patient ability, patient responses to the items are statistically independent.27 To test for local independence, we analyzed (1) the residual correlation matrix, (2) the magnitude of the standardized coefficients, and (3) the percentage of absolute residual correlations >0.10. Model fit was evaluated using the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root-mean-square error of approximation (RMSEA). The TLI and CFI range from 0 (poor fit) to 1 (good fit). Values of CFI and TLI greater than 0.90 are indicative of good model fit; RMSEA values less than 0.08 suggest adequate fit.29 To our knowledge, there is no empirically substantiated standard for the cutoff of residual correlation. We eliminated one item in each pair of items with a residual correlation of 0.20 or more.30 Items that had a larger number of residual correlations (>0.10) with other items were inspected and removed if necessary to improve the model fit.
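The residual correlation screening rules can be expressed as a small sketch. It assumes the residual correlations have already been exported from the factor analysis software as a mapping from item pairs to values; the function name and the example values are hypothetical and are not study data.

```python
def flag_residuals(residuals, warn=0.10, drop=0.20):
    """Summarise absolute residual correlations for every item pair.

    `residuals` maps an (item_i, item_j) pair to its residual correlation.
    Returns the pairs above the inspection cutoff, the pairs at or above
    the removal cutoff, and the percentage of pairs above the inspection cutoff.
    """
    watch = [pair for pair, r in residuals.items() if abs(r) > warn]
    remove = [pair for pair, r in residuals.items() if abs(r) >= drop]
    pct_watch = 100.0 * len(watch) / len(residuals)
    return watch, remove, pct_watch


# Hypothetical residual correlations for 4 items (6 pairs), not study data.
example = {
    (1, 2): 0.22, (1, 3): 0.04, (1, 4): -0.12,
    (2, 3): 0.08, (2, 4): 0.01, (3, 4): -0.03,
}
watch, remove, pct_watch = flag_residuals(example)
print(watch, remove, round(pct_watch, 1))
```

Pairs in `remove` correspond to the rule of eliminating one item of the pair; pairs in `watch` would only be inspected, with removal left to judgment.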
Because the minimum covariance coverage was not fulfilled for all items using the original data set due to missing values, for the purposes of assessing unidimensionality and local independence of the 21 UIQ items, we generated a set of data where imputed values supplanted missing responses, as described by Hart et al.31 To generate the imputed values, the original data set, which contained actual responses and missing values, was used to generate a simulated set of values using Masters' partial credit model (PCM)32 and WINSTEPS software (Winsteps, Chicago, Illinois).33 Once a complete set of imputed responses was generated, each missing response in half of the original data set (ie, 50% of the patient records) was randomly selected and replaced with the imputed value for that patient. The simulated data set was used only to assess unidimensionality and local independence utilizing Mplus.26 The original data set was used for the remaining analyses.
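One reading of this replacement step is sketched below: half of the patient records are drawn at random, and only the missing responses within those records are filled from the model-based imputed data set. This is an interpretation for illustration, not the procedure as implemented; the function name, the use of `None` for missing responses, and the example values are assumptions.

```python
import random


def fill_subset_with_imputed(original, imputed, fraction=0.5, seed=0):
    """Replace missing responses (None) in a random subset of patient records.

    `original` and `imputed` are lists of per-patient response lists with the
    same shape; `imputed` contains no missing values. Observed responses are
    never overwritten.
    """
    rng = random.Random(seed)
    n_selected = round(fraction * len(original))
    selected = set(rng.sample(range(len(original)), n_selected))
    merged = []
    for idx, (obs_row, imp_row) in enumerate(zip(original, imputed)):
        if idx in selected:
            merged.append([imp if obs is None else obs
                           for obs, imp in zip(obs_row, imp_row)])
        else:
            merged.append(list(obs_row))
    return merged, selected


# Hypothetical responses for 4 patients and 3 items (None = not answered).
original = [[1, None, 3], [None, 2, 2], [4, 4, None], [2, None, None]]
imputed = [[1, 2, 3], [3, 2, 2], [4, 4, 5], [2, 1, 1]]
merged, selected = fill_subset_with_imputed(original, imputed)
print(sorted(selected))
print(merged)
```

Changing `fraction` to 0.25 or 1.0 gives the kind of alternative data sets that could be used to check how sensitive the factor analytic results are to the proportion of records supplemented.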
DIF.
All patients at a given level of ability should have an equal probability of scoring positively on each item regardless of their group membership (eg, sex).34 Items are flagged “significant DIF” when this requirement does not hold. Measuring DIF was 1 of 10 recommendations for advancing patient-centered outcomes measurement35 because if items in a health assessment instrument are biased, detection rates can be overestimated or underestimated.35
For the purposes of DIF detection, we followed a method developed by Crane et al36 and described in detail by Hart et al37 and Nilsagård and Forsberg.38 Specifically, we calibrated item responses to Samejima's 2-parameter graded response model (GRM)39 using Parscale (Scientific Software International Inc, Lincolnwood, Illinois)40 and difwithpar software (University of Washington, Seattle, Washington).41 The difwithpar software examines 3 ordinal logistic regression (OLR) models for each item and each demographic category selected for analysis: sex (female and male), age group (18–44, 45–64, and ≥65 years), symptom acuity (acute, subacute, and chronic), and number of PFD comorbid conditions (1=patient reported only one urinary problem, 2=urinary and one other symptom, 3=urinary, bowel, and pelvic pain symptoms). As described by Crane et al,36 items were examined for the presence of (1) uniform DIF by examining the relative difference between beta coefficients in the regression models (ie, a 10% difference) and (2) nonuniform DIF by comparing the −2 log likelihoods of 2 of the regression models. Uniform DIF exists when the probability of answering the item correctly or endorsing the same rating category is greater for one group than the other uniformly over all levels of ability. Nonuniform DIF exists when there is interaction between ability level and group membership (sex, age group, symptom acuity), with certain combinations having a higher probability of answering the item correctly or endorsing the same rating category.
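The two decision rules can be sketched as follows, assuming the ordinal logistic regression models have already been fitted elsewhere and that only their coefficients and −2 log likelihoods are available. The choice of which coefficients are compared (the ability coefficient with and without the group term) and the significance level `alpha` are assumptions for illustration; the text specifies only the 10% relative difference and the comparison of −2 log likelihoods.

```python
import math


def uniform_dif(beta_without_group, beta_with_group, threshold=0.10):
    """Flag uniform DIF from the relative change in a regression coefficient."""
    change = abs(beta_with_group - beta_without_group) / abs(beta_without_group)
    return change, change > threshold


def nonuniform_dif(neg2ll_reduced, neg2ll_full, df=1, alpha=0.05):
    """Likelihood ratio test comparing 2 nested models via their -2 log likelihoods.

    Uses closed-form chi-square tail probabilities, available for df = 1 or 2
    (eg, one interaction term for 2 groups, 2 terms for 3 groups).
    """
    lr = max(neg2ll_reduced - neg2ll_full, 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(lr / 2.0))
    elif df == 2:
        p_value = math.exp(-lr / 2.0)
    else:
        raise ValueError("this sketch supports df = 1 or 2 only")
    return lr, p_value, p_value < alpha


# Hypothetical values, not study results.
change, has_uniform = uniform_dif(1.00, 1.12)
lr, p_value, has_nonuniform = nonuniform_dif(2050.0, 2030.0, df=2)
print(round(change, 2), has_uniform, lr, p_value, has_nonuniform)
```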
Discriminating ability.
We continued to use Samejima's 2-parameter GRM39 to estimate item parameters. The GRM was selected because it is a model for polytomous ordinal data,39 and it allows items to have different slopes (ie, discrimination parameters). The slopes allowed us to assess how well each item is able to discriminate between patients with different abilities (ie, high and low urinary function), as well as to estimate item information functions for each item. The slopes were expressed in logits, with higher positive values indicating a better discriminating ability. Items with a low slope of <0.40 were excluded from the item pool because of low discriminating ability.
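For reference, the graded response model can be written in its usual form as shown below. The notation is an assumption introduced here: $\theta$ is patient ability (urinary function), $a_i$ is the slope of item $i$, $b_{ik}$ is the threshold for responding in category $k$ or higher, and $m_i$ is the number of rating categories. Some software also multiplies the exponent by a scaling constant, which is omitted here.

```latex
% Probability of responding in category k or higher on item i
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\left[-a_i\,(\theta - b_{ik})\right]},
\qquad k = 2, \dots, m_i,
\qquad P^{*}_{i1}(\theta) = 1, \quad P^{*}_{i,m_i+1}(\theta) = 0

% Probability of responding exactly in category k
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)
```

A larger slope $a_i$ makes the boundary curves steeper, so the item separates patients just below and just above each threshold more sharply.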
Item hierarchical structure.
Item difficulty hierarchical order was inspected via estimated item difficulty parameters. Item difficulty parameters were expressed in logits with higher positive values indicating a more challenging task that usually is accomplished or endorsed by patients with higher functioning.
Test precision.
We assessed the test precision using the test information function (TIF) and standard error (SE). The TIF27,42 indicates the level of information or score precision provided by the scale over the range of the construct's continuum and is the sum of the item information functions (IIFs) at each patient ability level along the construct's continuum being measured (ie, urinary function). The amount of information provided by a scale at each ability level is inversely related to the error with which functional status is estimated at that level of ability.42 We plotted the TIF generated using data from the UIQ items. The shape of the TIF provides a visual comparison of the level of test precision for UIQ items. To quantify measurement precision at each ability level, we plotted the averaged SEs of functional status estimates from the UIQ items and superimposed them on the TIF.
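The relationship between item information, the TIF, and the SE can be sketched with the standard graded response model information formula, where the SE at a given ability level is the reciprocal of the square root of the test information. The item parameters below are hypothetical and are not the calibrated UIQ values; the function names are also assumptions.

```python
import math


def boundary_probs(theta, a, thresholds):
    """Cumulative P(response >= category k); thresholds must be increasing."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at ability theta."""
    p_star = boundary_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(p_star) - 1):
        p_cat = p_star[k] - p_star[k + 1]
        # The derivative of each boundary curve is a * P* * (1 - P*).
        d_cat = a * (p_star[k] * (1.0 - p_star[k])
                     - p_star[k + 1] * (1.0 - p_star[k + 1]))
        if p_cat > 0.0:
            info += d_cat ** 2 / p_cat
    return info


def scale_information(theta, items):
    """Test information: the sum of item information values at theta."""
    return sum(item_information(theta, a, bs) for a, bs in items)


def standard_error(theta, items):
    """Standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(scale_information(theta, items))


# Hypothetical (slope, thresholds) pairs, not the calibrated UIQ parameters.
items = [(1.18, [-1.0, 0.0, 1.0]), (0.48, [-2.0, -0.5])]
for theta in (-4.0, 0.0, 4.0):
    print(theta, round(scale_information(theta, items), 3),
          round(standard_error(theta, items), 3))
```

With a single threshold the formula reduces to the familiar 2-parameter logistic information a²P(1 − P), and the SE grows toward the extremes of the ability range where information is low.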
Results
Data from 1,628 patients with PFD symptoms receiving outpatient rehabilitation in 91 clinics in 24 states were analyzed (Tab. 1). Patients were primarily female (93%), with 75% of patients being under 65 years of age (mean age=53 years, SD=16, range=18–91) and having chronic PFD. Of the 1,628 patients who reported a urinary problem, 58% had solely urinary problems, 15% had both urinary problems and pelvic pain, 14% had both urinary and bowel problems, and 13% had urinary and bowel problems as well as pelvic pain. Most patients had urinary problems affecting leakage (82%), with fewer reporting problems with urinary frequency (60%) or retention (27%).
Table 1. Patient Characteristics at Rehabilitation Intake (N=1,628)
Unidimensionality and Local Independence
The EFA indicated that the 21 UIQ items tended to represent one dominant factor, with the first 3 factors explaining 42%, 6%, and 5% of the total variance. Preliminary analysis showed no item pair had a residual correlation of 0.20 or more. The results suggested possible local dependence between 21 item pairs (10%) with absolute correlation residuals higher than desired (>0.10). After inspecting the patterns, we decided to remove items 2 (How much urine usually leaks for no obvious reason when you are awake?) and 11 (How much urine usually leaks when you are physically active or coughing or sneezing?) because of redundancy, but kept other items based on clinical reasons to cover different types of urinary incontinence. In addition, item 18 (What is the frequency of your daytime urination?) had a low loading (0.4) on the first factor. We felt item 18 was more descriptive than functional and thus removed it.
The remaining 18-item set was reanalyzed. All remaining items met the evaluation criteria. The first 3 eigenvalues were 7.81, 1.20, and 1.02, with the first 3 factors explaining 43%, 7%, and 6% of data variance. Fit statistics for 1-, 2-, and 3-factor models were CFI values of 0.88, 0.94, and 0.96, respectively, TLI values of 0.97, 0.98, and 0.99, respectively, and RMSEA values of 0.07, 0.05, and 0.04, respectively, supporting unidimensionality.
DIF
After removing items 2, 11, and 18, the results of DIF analysis using the 18 UIQ items with real data values were suggestive of no DIF by sex, age group, acuity, and number of PFD comorbid conditions, except the presence of nonuniform DIF by sex for item 12 (What type of protection do you use for your urine leakage?) (P<.0001) and uniform and nonuniform DIF by age group for item 15 (To what extent do you feel your sex life has been affected by urine leakage?) (P<.0001 and change in estimate >0.1). Detailed inspection of item 12 showed female patients tended to use underpants liners or mini-pads, whereas male patients did not. Temporarily removing the response category 2 from item 12 by treating it as a missing value eliminated the DIF effect. Item 15 was split into 3 new items by age group: age group 1 (18–44 years), age group 2 (45–64 years), and age group 3 (≥65 years) to account for the DIF effect. However, due to low frequency counts on response categories of age group 3 after splitting, the convergence was not achieved. Because we were unable to obtain stable parameter estimations on item 15 for age group 3, this item was removed from the parameter estimation analysis (described below).
Discriminating Ability
Item 19 (How often do you urinate at night?) had a slope of 0.28 (<0.40) and was excluded from the item pool because of its low discriminating ability. Table 2 lists the item characteristics of the remaining UIQ items sorted by the item difficulty parameter. Item discrimination parameters ranged from 0.48 to 1.18. When comparing the item discrimination parameters, item 14 had the highest item discrimination value, followed by items 16, 13, 7, 1, and 17, implying these items were able to discriminate between patients of different ability within a narrow effective range around their item difficulty parameter estimates.
Table 2. Item Characteristics of the Urinary Incontinence Questionnaire (UIQ) Items
Item Hierarchical Structure
Item hierarchical structure of the final UIQ items is presented in Table 2. The numbers of patients who responded to specific items are listed in the “Frequency Count” column. Items are ranked based on the item difficulty parameter, with more difficult items on the top. Item difficulty parameters ranged from −2.20 to 0.39 (logits). Items representing more difficult tasks to be endorsed by patients with a high level of functioning were related to control of urine flow (item 21), impact of leaking urine on life (item 14), and confidence in ability to control the urine leakage problem (items 16 and 17). Items representing easier tasks endorsed by patients with a low level of functioning were related to the amount of urine leakage under different situations (items 6, 4, and 8).
The patient ability distribution was bell-shaped, with no ceiling or floor effects. The mean of the patient ability estimations was 0.00 (SD=0.83). Patient ability parameters ranged from −3.61 to 2.87 (logits). Compared with the patient ability distribution, the UIQ items were slightly easier relative to this sample's overall ability level. Figure 1 illustrates the item-person map of the UIQ items.
Figure 1. Item-person map of the Urinary Incontinence Questionnaire (UIQ). The item-person map was derived by analyzing the UIQ items using Samejima's 2-parameter graded response model and Parscale. The map illustrates the relationship of the person score distribution (right) with the hierarchical order of UIQ items (left). Both person ability and item difficulty are expressed on a common metric, which is expressed along the central axis in logits, with higher positive values indicating a more difficult item or a person with a higher level of functioning.
Test Precision
Figure 2 illustrates a bell-shaped TIF curve with one peak located at the middle ability level. The SE values were small in the middle range of patient ability measures but increased as ability measures (logits) became extreme. The average SE value for all patients was 1.84, but the average SE value for 90% of the patients with ability measures between −1.4 and 1.4 was 0.71.
Figure 2. Test information function (TIF) and standard error (SE), illustrating a bell-shaped TIF curve with one peak located at the middle ability level. The SE values were small in the middle range of patient ability measures but increased as ability measures (logits) became extreme. Overall, the TIF curve shifted slightly toward the left (lower ability measures), which implied more difficult items were needed to increase test information and thus reduce the measurement error at the high-functioning level.
For individual item information (IIF) curves, item 14 had the highest peak, followed by items 13, 17, 7, 16, and 1. These items could be potential items for single-item screening purposes. However, the TIF curve shifted slightly toward the left (lower ability measures), which implied more difficult items were needed to increase test information and thus reduce the measurement error at the high-functioning level.
Discussion
The purpose of this study was to evaluate psychometric properties and practicability of the UIQ in patients seeking outpatient physical therapy services due to PFD. Overall, the results showed that the final UIQ scale produced reliable and precise measures of urinary function for patients at different levels of urinary function. The results indicated that the final revised UIQ items met IRT assumptions of unidimensionality and local independence and were free from DIF for the variables assessed. Measures of urinary function were free from floor and ceiling effects and covered the functional continuum well with good measurement precision. Item difficulties were suitable for patients with PFD with different levels of urinary function. More challenging and discriminating items are recommended to expand the existing item bank. The data fit the GRM measurement model well. Findings from this study will be used to develop an initial pelvic-floor, body part–specific CAT application to be used in the outpatient physical therapy services.
To our knowledge, this is the first study designed to develop an IRT-based item bank suitable for CAT application for patients with PFD seeking outpatient rehabilitation therapy. Our results suggest the UIQ scale represents an adequate first step in the development of multiple CATs for this population, particularly because we analyzed data from a relatively large sample (N=1,628). Two previous studies used IRT methods to examine the psychometric properties of urinary incontinence questionnaires: Handa and Massof18 (N=27 women with stress urinary incontinence) and Bower et al43 (N=156 children with bladder dysfunction). Compared with these 2 studies,18,43 our larger and more diverse sample should produce more stable and precise estimates of item parameters for patients with PFD in general. Comparing our results with the findings of these 2 studies was difficult because the questionnaires used were related to the quality of life in children (eg, body image, family and home, self-esteem)43 or the impact of urinary incontinence on social life (eg, hobbies, ability to do household chores, going on vacation),18 whereas the UIQ emphasizes urinary urgency and frequency, as well as severity of the urinary symptoms.
We were unable to run Mplus26 to assess unidimensionality and local independence using our original data set because the minimum covariance coverage was not fulfilled for all items (insufficient frequency counts for all items). As a result, we generated a data set in which each missing response in half (50%) of the original data set was randomly selected and replaced with an imputed value. Such replacement may lead to better results than using the original data set with real values. We explored such an effect by generating 2 additional data sets where 25% and 100% of the original data set were randomly selected and replaced with imputed values and by conducting the same analytical procedures. Comparing the CFI, TLI, and RMSEA values of these 3 data sets (with 25%, 50%, and 100% records supplemented with imputed values), all 3 analyses demonstrated that one factor was sufficient for adequate model fit. For 25%, 50%, and 100% imputed data sets, respectively, there were local dependence relationships among 35 (15%), 21 (10%), and 3 (1%) item pairs (out of 210 item pairs), with absolute correlation residuals higher than desired (>0.10). Because the data set with 25% imputed data revealed too many large correlation residuals to examine the pattern, and the data set with 100% imputed data showed unrealistically good results, the data set with 50% imputed values was used.
Different criteria exist for deciding whether to remove items when using IRT methods. To test unidimensionality and local independence, we chose a selection cutoff of a correlation residual of 0.20,30 although a cutoff of 0.25 has been used.44 We used a more restrictive criterion because we expected better results using the imputed data set than using the original data set with just real values. To assess the discriminating ability, we decided to remove items with a low slope of <0.40, although a much higher criterion of 0.70 has been used.44 On average, the majority of UIQ items had relatively low discrimination parameters. Lower estimates of discrimination parameters may suggest: (1) a need to modify the wording of the question or the rating scale structure or (2) challenges in quantifying urinary function accurately because the leakage, frequency, and retention problems may partially depend on the details of daily events (eg, beverages a person consumes in a day, a sudden cough, heavy lifting). Keeping items with low slope values in the item pool should not affect the measurement, although these items would have a smaller chance of being selected in the CAT application.44
In the process of developing the questionnaire, we administered the same questionnaire to both male and female participants. In the future, as we continue to collect more data, we intend to develop sex-specific surveys because urinary and bowel structures and sex functions are very different between sexes. To examine the sex factor, we used a method developed by Crane et al36 for DIF detection by sex. Results supported clinically relevant findings in sex differences in using type of protection (item 12) and age differences in sex life (item 15). In a follow-up analysis, we inspected data from item 12 that appeared to be geared toward female participants. We did observe that female patients tended to select “underpants liners or mini-pads” (14% of female patients who responded to item 12) and that relatively few male patients (3% of male patients who responded to item 12) selected that response based on the frequency count. However, both female and male patients responded to item 12 under the predicted hypothesis that patients who have more severe urinary incontinence symptoms would rely on more protection. Although removing the response category 2 from item 12 by treating it as a missing value resulted in no DIF by sex, the current male sample size was small (only 48 male patients responded to item 12). Therefore, we should be cautious in generalizing our results to the male population, and we will continue monitoring item 12 in the future.
To account for the DIF effect, we split item 15 into 3 new items by age group. There seems to be a general tendency for the impact of urine leakage on sex life to decrease with age, with the youngest group feeling that sex life has been affected by urine leakage the most. However, there was no perceptible impact on urinary function estimates when adjusting for DIF; the correlation between the unadjusted and fully adjusted ability estimates was 0.999, similar to the finding by Crane et al,45 suggesting no practical DIF.
We used the GRM measurement model to perform the initial examination of the psychometric properties of the UIQ items because it is a model for polytomous ordinal data39 and it is a 2-parameter model containing both item difficulty and discrimination parameters. In a follow-up analysis, we analyzed the same data set using Masters' one-parameter partial credit model (PCM)32 and WINSTEPS software.33 We found that most results were similar. The item hierarchical structure remained, except item 12 became an easier item compared with the estimate using the GRM. Similarly, the distribution of ability estimations was normally distributed, with no obvious ceiling or floor effect. Findings suggested that the UIQ data fit the PCM well, with no items showing misfit (all infit or outfit values were <1.4 and >0.6). The results of the TIF analysis also showed a bell-shaped TIF with one peak located at the middle ability level and indicated that the UIQ was reliable and precise for measuring most patients at different levels of urinary function. Lastly, with the person-separation index (G) equal to 0.95, these UIQ items separated person ability into 1.6 (ie, [4 × 0.95 + 1]/3) statistically distinct strata, indicating the need to add more challenging or easier items to distinguish patients into different levels of urinary function. As a result, the PCM measurement model seemed to be a better choice, although the item discrimination parameters were varied among UIQ items (0.48–1.18).
This study had several limitations. First, because it was a secondary analysis of data collected prospectively through a proprietary database management company (FOTO), we were not in control of the data collection procedure: there was no specific timetable for patients to be assessed, and no training was given to therapists prior to the data collection. Additionally, generalizability of the results may be limited because participating clinics may differ from clinics that do not collect data using FOTO.
Because data were collected in routine, busy outpatient rehabilitation clinics, PFD items were selected by the computer-based administrative branching algorithm to reduce respondent burden. With this data collection approach, missing data due to unanswered items make statistical analyses challenging. In this data set, 1,628 patients took the UIQ at rehabilitation admission. The number of patients who responded to a specific item ranged from 294 to 1,028, providing a sufficient sample size even for items with low response rates. Additionally, because the UIQ was administered in 91 outpatient physical therapy clinics in 24 states, we believe the impact of potential patient selection bias was reduced by sampling from a wide variety of clinics in many locations.
To run certain analyses, we used imputed data to replace missing values. We acknowledge that data sets with imputed values produce artificially more favorable results. Although we did not test the impact of using imputed responses versus complete original responses on the factor analytic results, preliminary results reported by Hart et al31 showed that the patient ability estimates were similar and highly correlated across data sets using original responses with missing values, original responses with imputed values for missing responses, and entirely imputed values. Similarly, because of the challenges in analyzing responses with 2-digit width (ie, item 17, with response categories of 1–11), we collapsed the 2 lowest and the 2 highest response categories. Although the real impact is unknown, we monitored the potential influence on the calibration of item 17 by comparing the results derived from the 2-parameter GRM using Parscale with the results derived from the PCM using WINSTEPS. The results were similar, with item 21 the most challenging item and item 17 remaining one of the 3 most difficult items.
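The category collapsing for item 17 can be illustrated with a short sketch. It rests on an assumption about how the collapsing was done: categories 1 and 2 are merged, categories 10 and 11 are merged, and the result is renumbered 1–9 so that every response has single-digit width. The exact recoding used in the study is not stated, and the function name is hypothetical.

```python
from typing import Optional


def collapse_extremes(
    response: Optional[int], low: int = 1, high: int = 11
) -> Optional[int]:
    """Merge the 2 lowest and the 2 highest categories, then renumber from 1.

    Assumed recoding (not confirmed by the source): 1,2 -> 1 and 10,11 -> 9
    for an item scored 1-11. Missing responses (None) stay missing.
    """
    if response is None:
        return None
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    # Pull the extreme categories in by one step: 1 -> 2 and 11 -> 10.
    merged = min(max(response, low + 1), high - 1)
    # Shift so the lowest remaining category is 1.
    return merged - low


recoded = [collapse_extremes(r) for r in range(1, 12)]
```

Under this assumed recoding, the 11 original categories map to 9 ordered categories, which preserves the ordering of the middle categories while removing the 2-digit responses.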
Last, we did not use medical terminology to classify patients. For instance, urinary incontinence is divided into stress urinary incontinence, urge urinary incontinence, and overflow urinary incontinence. Because data were collected from patient self-report surveys, we used general descriptions with the intention of avoiding self-judgments by patients. Future studies should endeavor to reduce the potential for misclassifying patients by collecting more complete medical information. Classifying patients correctly should assist researchers in developing PFD CATs that can discriminate patients by stress, urge, overflow, or mixed urinary incontinence, if appropriate.
Conclusion
The preliminary analyses supported sound psychometric properties of the UIQ items and their use in patients with PFD seeking treatment in outpatient physical therapy services. Findings from this study will be used to develop an initial pelvic-floor, body part–specific CAT application to be used in outpatient physical therapy services.
Appendix.
Urinary Incontinence Questionnaire (UIQ)a
a The Urinary Incontinence Questionnaire may not be used or reproduced without written permission from the authors.
Footnotes
Dr Wang and Dr Hart provided concept/idea/research design. Dr Wang, Dr Hart, and Dr Yen provided writing. Mr Mioduski provided data collection, project management, and study participants. Dr Wang provided data analysis. Dr Hart, Dr Deutscher, and Dr Yen provided consultation (including review of manuscript before submission).
The institutional review boards of Focus On Therapeutic Outcomes, Inc and the University of Wisconsin–Milwaukee approved the study procedures.
This research, in part, was presented at the Combined Sections Meeting of the American Physical Therapy Association; February 8–12, 2012; Chicago, Illinois.
- Received March 27, 2012.
- Accepted April 4, 2013.
- © 2013 American Physical Therapy Association