I enjoyed reading the article by Stanton et al,1 and I believe it is timely and well written. I have been a strong advocate for the development of clinical prediction rules (CPRs) for treatment selection in physical therapist practice, as I believe that, ultimately, they can improve the precision of our clinical decision making. Perhaps it is a good time for us to revisit our approach to developing and validating CPRs, and this study raises some important issues for us to consider as we move forward. Three particular issues that I would like to address in this commentary are: (1) the experimental design for developing CPRs, (2) when it is necessary to have a CPR, and (3) the proliferation of single-group CPR derivation studies in our literature without follow-up validation.
The main purpose of a CPR is to inform the clinician under what circumstances, based on items from the patient's clinical history and examination, a particular treatment may be an effective option. In essence, we are trying to match the right patients with the right treatments. From an experimental design point of view, Stanton et al are correct in stating that we are really trying to identify modifiers of treatment effect, or what some may term “treatment moderators,” when we develop CPRs. Treatment moderators specify for whom or under what conditions the treatment is likely to work.2 There is an excellent article by Kraemer et al2 that describes important theoretical and experimental design issues for identifying treatment moderators. According to Kraemer et al, when an investigator examines the association of baseline variables with treatment response using a single group, it really is not possible to distinguish whether any associated variables are moderators of treatment response, nonspecific predictors of treatment outcome, or even correlates of change in response due to artifact (eg, statistical regression to the mean). In order to really identify a treatment moderator, a control or comparison group would be needed so that a moderator variable × treatment group interaction could be examined, thus confirming whether the variable is really a moderator of treatment outcome.2 This approach is directly in line with the recommendations provided by Stanton et al.
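The distinction drawn by Kraemer et al can be illustrated with a small Python sketch. It assumes a hypothetical linear model of expected improvement; the class `OutcomeModel`, its coefficients, and both helper functions are invented for illustration and are not taken from Stanton et al or Kraemer et al. Under those assumptions, a nonspecific predictor and a true moderator produce the same association in a single treated group, and only the variable × treatment contrast in a 2-group design tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeModel:
    """Hypothetical expected improvement score; every coefficient is made up."""
    baseline: float    # expected improvement under control care, variable absent
    treatment: float   # added improvement from the target treatment
    prognostic: float  # added improvement when the variable is present, in either arm
    moderator: float   # extra improvement only when the variable is present AND treated

    def expected(self, has_variable: bool, treated: bool) -> float:
        x, t = int(has_variable), int(treated)
        return (self.baseline + self.treatment * t
                + self.prognostic * x + self.moderator * x * t)


def single_arm_association(model: OutcomeModel) -> float:
    """What a single-group study sees: variable present vs absent, everyone treated."""
    return model.expected(True, True) - model.expected(False, True)


def interaction_contrast(model: OutcomeModel) -> float:
    """What a 2-group study can estimate: difference of differences across arms."""
    treated_diff = model.expected(True, True) - model.expected(False, True)
    control_diff = model.expected(True, False) - model.expected(False, False)
    return treated_diff - control_diff


# Assumed scenarios: same size of effect, attributed to different mechanisms.
pure_predictor = OutcomeModel(baseline=10.0, treatment=5.0, prognostic=4.0, moderator=0.0)
true_moderator = OutcomeModel(baseline=10.0, treatment=5.0, prognostic=0.0, moderator=4.0)

for name, model in [("pure predictor", pure_predictor), ("true moderator", true_moderator)]:
    print(name,
          "| single-arm association:", single_arm_association(model),
          "| interaction contrast:", interaction_contrast(model))
```

In this sketch the single-arm association is identical in both scenarios, whereas the interaction contrast is zero for the pure predictor and nonzero for the true moderator. Real data add sampling error and regression to the mean, which this deterministic example deliberately leaves out.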
It would appear that we have 2 options to consider in developing CPRs. One option is to use a 2-stage process in which a single-arm trial is done first to identify an algorithm of responsiveness (which has been considered the CPR development stage) and is then followed by a second study, a randomized clinical trial (RCT) comparing the target treatment with a control or comparison group, with secondary analyses to examine whether the CPR developed in the first study really does moderate treatment outcome. The second option would be to forgo the single-arm study and derive the CPR directly from an RCT designed specifically to test whether candidate variables moderate the treatment effect, by powering the study to identify candidate variable × treatment interactions. Stanton and colleagues prefer this second approach. My opinion is that both approaches have advantages and disadvantages, and researchers may select one over the other after weighing them in the context of factors such as the level of knowledge on likely candidate variables or the resources available to conduct larger randomized trials.
If the investigator has a fairly good idea of what the likely candidate variables would be to moderate treatment outcome and has the resources (adequate source of participants, adequate amount of funding) to conduct an RCT that could be powered for subanalyses to examine the candidate moderator variable × treatment group interactions, then it would seem best to develop the CPR starting with an RCT. Two clear advantages would be: (1) only one study would be needed to develop and provide some preliminary validity for the CPR, and (2) it would be possible to directly determine whether the candidate variables were indeed moderators of treatment outcome, and we could be fairly confident that the CPR developed from these moderator variables would be likely to hold up under subsequent validation studies. One potential disadvantage of this approach is that if there were a fairly large number of candidate variables to assess and it was somewhat uncertain whether they would pass as moderators of treatment outcome, it would be necessary to explore each variable, which may require a very large sample size for adequate power to detect the significant interactions. If the resources in terms of adequate sample source and funding to conduct such a study were not available, then the ability to identify true moderators would be limited, thus limiting the ability to successfully develop the CPR.
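To give a rough sense of why interaction tests are so demanding, the following Python sketch compares the total sample size needed to detect a main treatment effect with that needed to detect a moderator × treatment interaction of the same magnitude. It rests on several assumptions that are not in the article: a continuous outcome with a common standard deviation, a binary candidate variable that splits the sample evenly, equal cell sizes, a 2-sided test, and a normal approximation. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the critical value and the power quantile (normal approximation)."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def total_n_main_effect(delta: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N for a 2-arm comparison of means with equal allocation.

    Variance of the difference in means is 4 * sigma**2 / N.
    """
    z = _z_sum(alpha, power)
    return math.ceil(4 * (z * sigma / delta) ** 2)


def total_n_interaction(delta: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N for a 2 x 2 difference of differences with 4 equal cells.

    Variance of the interaction contrast is 16 * sigma**2 / N.
    """
    z = _z_sum(alpha, power)
    return math.ceil(16 * (z * sigma / delta) ** 2)


# Assumed effect: half a standard deviation, for both the main effect and the interaction.
n_main = total_n_main_effect(delta=0.5, sigma=1.0)
n_int = total_n_interaction(delta=0.5, sigma=1.0)
print("total N, main effect:", n_main)
print("total N, interaction:", n_int)
print("ratio:", round(n_int / n_main, 2))
```

Under these assumptions the interaction requires roughly 4 times the total sample of the main effect, and the requirement grows further if the interaction is smaller than the main effect or if several candidate variables are each tested.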
It might be better to select the 2-step approach in developing the CPR when 2 conditions hold: there is not enough knowledge to limit the candidate variables to a few that we are reasonably certain would be likely moderators of treatment outcome, and limited resources would preclude conducting an RCT large enough to include all the candidate variables in subanalyses examining their potential as treatment moderators. By first examining which variables from the larger group of candidate variables are likely to be predictors of treatment outcome, it may be possible to narrow the field of candidate variables so that hypotheses for testing treatment moderators in the subsequent RCT may be more precise and the required resources to conduct the RCT may be more manageable. Stanton et al have explained that a serious potential risk of this approach is that predictors of treatment outcome are not always moderators of treatment outcome and it is possible that the variables identified in the first step may not be validated in the second step as treatment moderators. I agree with this point, but I believe there are steps that can be taken to minimize this risk.
The success of the 2-step process for developing the CPR is largely dependent on how well the initial single-arm observational study is conducted. Stanton et al have addressed many of the factors that need to be considered in designing the initial observational study in order to increase the probability of success. The sample size in the single-arm study should be consistent with the number of variables that will be explored so that there are enough observations per candidate variable to ensure a reliable result. Sound theoretical rationale (biomechanical, biological, psychological plausibility) for including candidate variables in the analysis should be provided so that the risk of identifying spurious variables is minimized or eliminated. The length of the follow-up periods to determine whether the outcome of treatment is successful should be long enough to be relevant to the patient population under examination (ie, follow-up periods for people with acute injury may not be the same as follow-up periods for people with chronic conditions such as arthritis). I share a similar frustration with Stanton et al that many of the current single-group CPR derivation studies have not met these requirements. We do not know at this point whether the variables identified in these studies are true treatment moderators, because the subsequent RCTs have not been reported. The increasing development of larger patient databases in health care systems, where the data collection procedures for these databases are standardized, may improve the quality of preliminary single-arm observational studies to develop hypotheses to test candidate treatment moderators in a subsequent RCT. One may still argue that when looking at large numbers of candidate variables, there is always a chance that spurious variables will be identified as significant predictors. 
However, if faced with the “risk” of having a spurious variable identified versus an underpowered study that is likely to not identify an important predictor variable, I would take the former because I am confident that the validation process will ultimately identify the spurious variable.
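The concern about spurious predictors can be put in rough numbers with a short Python sketch. It assumes that each candidate variable is tested independently at the same significance level and that none is truly associated with outcome; both are simplifications, and the function names are hypothetical.

```python
def prob_any_spurious(n_candidates: int, alpha: float = 0.05) -> float:
    """Chance that at least one truly unrelated variable tests as 'significant'.

    Assumes independent tests, each with false-positive rate alpha.
    """
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_candidates


def expected_spurious(n_candidates: int, alpha: float = 0.05) -> float:
    """Expected number of false-positive variables under the same assumptions."""
    return n_candidates * alpha


for k in (5, 10, 20):
    print(f"{k:>2} candidates: "
          f"P(at least one spurious) = {prob_any_spurious(k):.2f}, "
          f"expected spurious = {expected_spurious(k):.2f}")
```

With 20 unrelated candidates tested at the 0.05 level, the chance of at least one spurious finding is about 64% under these assumptions, which is why a subsequent validation study is needed to weed such variables out.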
Regardless of the approach an investigator selects to pursue development of a CPR, ultimately an RCT with subanalyses to test for the rule status × treatment interaction must be performed to establish that a CPR has indeed been developed. Although the results of a single-arm derivation study may begin to inform clinicians of factors that might improve their decision making for a given treatment approach, it should be recognized that the CPR is really not a CPR until the RCT confirming treatment effect moderation has been completed.
There is one small point of disagreement I have with Stanton et al in their discussion of validation of CPRs. They use a previous study3 from their group as an example of a “broad” validation study for the CPR for spinal manipulation developed by Flynn et al4 and “narrowly” validated by Childs et al.5 They conclude that the validation of the CPR was not reproduced in this broader validation study. They acknowledge that a possible explanation was that the intervention used in the broader validation study was modified from that used in the original CPR. The original CPR was specifically for high-thrust manipulation. In the broader validation study, 97% of the participants received a low-thrust manipulation,3 or what some people may consider a mobilization intervention. In my opinion, this “modification” of intervention is a serious design flaw of that study, and I do not accept the notion that this study was indeed a broad validation study of this CPR. If we put it in the context of treatment dosage, it would not be surprising to find that a CPR designed for a high-dose treatment does not work when it is applied to a low-dose or underdosed treatment because, in effect, they are probably no longer the same treatment. If we are going to critically review the study designs used to validate CPRs, then perhaps we should be as critical of our own work as we are of the work of other investigators.
The second issue I would like to address has to do with whether we really need a CPR in certain circumstances. Fritz suggested, “Prediction rules have the greatest potential to favorably impact physical therapy care when they are developed for clinical conditions that are by nature heterogeneous, with several viable yet discrete treatment approaches, and some degree of risk associated with an incorrect choice. In these circumstances, clinical decision making is complex, uncertain, and most likely to benefit from tools such as CPRs.”6(p160) Stanton et al suggest that when the proportion of patients who are appropriate for application of the CPR is low in the population, then we should question whether the CPR really has much impact or relevance. This logic would preclude almost any approach to studying target disorders of low prevalence. I do not necessarily disagree with this point in some instances, but we also should consider that at times there may be patient groups that are of disproportionately low prevalence in our clinic populations but also are challenging where treatment decision making is concerned. A CPR that could guide effective treatment decision making for these patients could still be very relevant, despite the fact that the prevalence in the population is low.
Another instance where I believe we should really question whether a CPR is needed is when we start with a pretest or pretreatment probability of success that is already pretty high. For example, there are 2 studies on the development of a CPR for treatment selection that reported pretreatment probabilities of success on the order of 61%7 and 75%.8 Acknowledging that in both studies application of the CPR improved posttreatment probability of success (89%7 and 95%8), I would argue that the pretreatment probabilities of success alone would induce me to try these treatment approaches without even considering use of the CPR derived in these studies. If you give me a treatment approach where the probability of success is likely to be greater than 60% and the probability of doing harm is low, I would not need a CPR to guide my decision to try the treatment approach.
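The arithmetic behind this argument can be sketched in Python. The 2 pairs of probabilities are the ones quoted above; the likelihood ratios are simply back-calculated from those pairs by converting probabilities to odds, so they are implied values and not figures reported by the cited studies. The function names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a probability (strictly between 0 and 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def probability(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def implied_likelihood_ratio(pretest: float, posttest: float) -> float:
    """Likelihood ratio that would move pretest odds to posttest odds."""
    return odds(posttest) / odds(pretest)


def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Posttest probability = pretest odds x likelihood ratio, converted back."""
    return probability(odds(pretest) * likelihood_ratio)


# (pretreatment, posttreatment) probabilities of success quoted in the text.
quoted = {"reference 7": (0.61, 0.89), "reference 8": (0.75, 0.95)}

for label, (pre, post) in quoted.items():
    lr = implied_likelihood_ratio(pre, post)
    gain = (post - pre) * 100
    print(f"{label}: implied LR = {lr:.1f}, "
          f"absolute gain = {gain:.0f} percentage points")
```

Seen this way, a rule can carry a respectable implied likelihood ratio and still add relatively little to a decision when the starting probability of success is already high and the risk of harm is low.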
The final issue I would like to address is the point by Stanton et al that physical therapist researchers appear to be inundating our literature with new single-group CPR derivation studies without validating existing ones. I agree that this is a concern. Without advancing to the validation step, we are really no better off than if all of our evidence-based decisions were made based on case reports or case series. I think I have made it clear that I agree with Stanton et al that using randomized trial designs at the derivation stage would be the preferred way to go. On the other hand, I can see that CPRs that are derived from single-group studies could eventually become validated, provided that they ultimately confirm a real rule status × treatment interaction in sound randomized trials. I urge my colleagues who have already developed CPRs to go the extra step and complete the validation process.
I also have some suggestions for our journals that will review and publish future studies concerning CPR development and validation to consider: (1) limit the publication of CPR derivation studies to those that examine the moderator variable × treatment interaction in randomized trials using a control or comparison group, or (2) for those studies using a single-group design to derive a CPR, delay publication until at least a narrow validation study also has been performed, or (3) if it is believed that even the single-arm observational study has important information to disseminate, then publish it, not as the derivation of a CPR, but rather as a report on predictors of outcome that need further exploration as treatment moderators that could be used to develop a CPR. I see the present situation of the proliferation of CPR derivation studies as similar to the proliferation of reliability studies in the 1980s and early 1990s. Currently, PTJ is unlikely to publish a study concerning the reliability of a measurement unless there also is information concerning measurement validity and clinical utility. I believe my second suggestion where CPRs are concerned would parallel this philosophy.
I believe in the pursuit of developing and validating CPRs to enhance our treatment decision making in physical therapy, and I commend all of my colleagues who have dedicated themselves to this pursuit. I want to thank Stanton et al for their thought-provoking and timely report on the state of the science in this area, and I thank the Editorial Board for giving me the privilege to participate in the dialogue on this exciting and important topic.
© 2010 American Physical Therapy Association