Abstract
Background History taking is an important component of patient/client management. Assessment of student history-taking competency can be achieved via a standardized tool. The ECHOWS tool has been shown to be valid, with modest intrarater reliability, in a previous study; however, that study lacked sufficient power to establish the tool's stability definitively.
Objective The purposes of this study were: (1) to assess the reliability of the ECHOWS tool for student assessment of patient interviewing skills and (2) to determine whether the tool discerns between novice and experienced skill levels.
Design A reliability and construct validity assessment was conducted.
Methods Three faculty members from the United States and Australia scored videotaped histories from standardized patients taken by students and experienced clinicians from each of these countries. The tapes were scored twice, 3 to 6 weeks apart. Reliability was assessed using intraclass correlation coefficients (ICCs). Repeated-measures analysis of variance models assessed the ability of the tool to discern between novice and experienced skill levels.
Results The ECHOWS tool showed excellent intrarater reliability (ICC [3,1]=.74–.89) and moderate interrater reliability (ICC [2,1]=.55) as a whole. The summary of performance (S) section showed poor interrater reliability (ICC [2,1]=.27). There was no statistical difference in performance on the tool between novice and experienced clinicians.
Limitations A possible ceiling effect may occur when standardized patients are not coached to provide complex and obtuse responses to interviewer questions. Variation in familiarity with the ECHOWS tool and in use of the online training may have influenced scoring of the S section.
Conclusion The ECHOWS tool demonstrates excellent intrarater reliability and moderate interrater reliability. Sufficient training with the tool prior to student assessment is recommended. The S section must evolve in order to provide a more discerning measure of interviewing skills.
As noted in the Guide to Physical Therapist Practice 3.0,1 of the American Physical Therapy Association (APTA), patient interviews or patient histories comprise a major component of the initial examination process during nearly every patient/client encounter. Additionally, APTA lists history taking and communication competencies as part of the Minimum Required Skills of Physical Therapist Graduates at Entry Level.2 In a recent editorial in the Journal of Physical Therapy Education (JOPTE),3 Jan Gwyer and Laurita Hack presented the “JOPTE Editorial Board Recommendations for an Educational Research Agenda.” It included the following items, among others:
Develop valid outcomes tools for all aspects of physical therapy education, including readiness to practice (ie, professionalism, clinical outcomes).
Develop tools to assess with good discrimination to properly measure student trajectory with respect to patient outcomes.
Develop measurement tools to measure student readiness for clinical education.
This list speaks to the need for creation of student assessment tools, including those for appraisal of competency in patient interviewing skills. In Australia, the Australian Physiotherapy Council includes collecting client information via effective communication skills sensitive to individual needs and diversity as a core competency for physical therapist practice standards.4 Refshauge and Gass stated that taking an effective patient history “is arguably the most important part of the examination process because it is from this that we decide the nature of the patient's problem and the possible interventions that might consequently be used.”5(p117) Davis6 argues for a “helping interview” based on a “healing attitude” of the practitioner. The healing attitude emanates from a practitioner who assists patients in resuming some control over their situation, listens and attends, communicates well, both verbally and nonverbally, and avoids judgment of the patients or their situation.6 Patient interviewing skills are initially learned in the didactic component of professional physical therapist education programs and then honed during clinical education experiences.
Components of a physical therapist's patient interview are similar to those of other health care disciplines, including investigation of the patient's chief complaint, medical and surgical history, and medications. For physicians, the result may be diagnosis of a pathological process, whereas for the physical therapist, the outcome may be diagnosis of a movement disorder, a patient referral to a physician, or identification of biomechanical factors associated with loss of function. There are categories of information that physical therapists tend to emphasize more than physicians, such as environmental factors (eg, whether a patient's living environment has stairs or floor coverings) and detailed assessment of posture, movement, and daily function, that assist in goal setting and intervention strategies.1 In summary, the findings of the physical therapist's interview provide important information concerning the unique circumstances and context of the patient's condition, inform the therapist's understanding of that condition, and ultimately assist in the development of a diagnosis, a prognosis, precautions and contraindications to treatment, and a plan of care.7,8
Communication and establishment of rapport between practitioner and patient are at the center of effective interviewing skills. Patient communication has been noted to be of particular importance in promoting effective patient outcomes and efficient patient encounter time management.9–11 Effective interviewing skills include attentive listening, attention to cultural congruency, and ability to convey information at appropriate medical literacy levels.12 Educators routinely use checklists, grading rubrics, and other formative and summative tools to assess students on a variety of competencies and skills.13 Although there are multiple assessment tools in the medical literature for scoring the history-taking skills of medical students and physicians, until recently there has been no validated tool in the literature created for the purpose of assessing physical therapist student competence in patient interviewing.14
The ECHOWS tool and the “Guide to ECHOWS” were developed at the University of Wisconsin–Madison in 2010 (see Appendixes 1 and 2). The ECHOWS tool consists of 2 sections: the ECHOW section and the summary of performance (S) section. The ECHOW section comprises the following elements: E (establishing rapport), C (chief complaint), H (health history), O (obtain psychosocial perspective), and W (wrap-up). The S section is more of a skills assessment, whereas the ECHOW section is more a listing of necessary components. For the ECHOW section, the specific aspects of each element are scored 0 (not observed) or 1 (observed), with a maximum score of 22 points. As the S section is more complex and likely requires the skill and perspective of a physical therapist rather than a student, the 10 items related to overall performance are scored on an ordinal scale as 0 (needing improvement), 1 (satisfactory), or 2 (superior), with a maximum score of 20 points. The ECHOWS tool scored as a whole has a potential total of 42 points. Pilot work established content validity and preliminary intrarater reliability but did not establish interrater reliability or construct validity.14 The purposes of this study were: (1) to confirm intrarater reliability and establish interrater reliability of the ECHOWS tool, (2) to determine the limits of agreement for the ECHOWS tool, and (3) to determine whether there are differences in history-taking scores between novice and experienced practitioners using standardized patient (a person carefully recruited and trained to take on the characteristics of a real patient) encounters.
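To make the scoring arithmetic concrete, the following is a minimal sketch in Python of how the 2 sections total as described above. The function and example values are hypothetical illustrations for this description only; the actual instrument is the paper form in Appendix 1.

```python
# Minimal sketch of ECHOWS totals as described above (hypothetical
# data structures; not part of the published tool).
from typing import Sequence

def score_echows(echow_items: Sequence[int], s_items: Sequence[int]) -> dict:
    """Total the ECHOW section (22 dichotomous items, 0 or 1), the S section
    (10 ordinal items, 0-2), and the ECHOWS tool as a whole (max 42)."""
    if len(echow_items) != 22 or any(v not in (0, 1) for v in echow_items):
        raise ValueError("ECHOW section: 22 items, each scored 0 or 1")
    if len(s_items) != 10 or any(v not in (0, 1, 2) for v in s_items):
        raise ValueError("S section: 10 items, each scored 0, 1, or 2")
    echow = sum(echow_items)   # maximum 22
    s = sum(s_items)           # maximum 20
    return {"ECHOW": echow, "S": s, "ECHOWS": echow + s}

# Example: 18 of 22 ECHOW behaviors observed; S items mostly "satisfactory."
print(score_echows([1] * 18 + [0] * 4, [1] * 8 + [2] * 2))
# -> {'ECHOW': 18, 'S': 12, 'ECHOWS': 30}
```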
Method
Participants
Participants included entry-level students (novice practitioners), experienced clinicians, and faculty who served as the evaluators. Students were recruited from the University of Wisconsin–Madison (UW–Madison) Doctorate in Physical Therapy (DPT) Program and the Griffith University, Gold Coast Campus (GU), Masters in Physiotherapy (MPT) Program. Both are postgraduate, entry-level programs, with the UW–Madison program lasting 3 years and the GU program lasting 5 semesters over 2 years. Ten Australian and 10 US clinicians were individually recruited. The clinicians had a minimum of 3 years of full-time-equivalent clinical orthopedic physical therapy experience over the prior 5 years. The students and clinicians were videotaped interviewing standardized patients, all utilizing the same scenario.
Three assessors in each country, whose role was to review and score the aforementioned tapes, were recruited by the US and Australian researchers, respectively. In the United States, a posting of the project and a request for notification of interest in becoming 1 of the 3 assessors was placed on a listserv hosted by the Section on Education of APTA. Curricula vitae were requested, and the assessors were chosen based on familiarity with history-taking pedagogy, faculty status in professional physical therapy curricula, and geographic diversity. In Australia, faculty members from 3 different universities, who were known to the researchers as having experience in clinical education and in teaching orthopedic and musculoskeletal skills, were invited to participate in the study.
Procedure
ECHOWS tool training.
Students from both programs were trained in history taking via their respective curricula, which included familiarization with the ECHOWS tool. At UW–Madison, students complete a standardized patient interview during their first year of study as part of a unit on history taking and documentation. All of the patient interviews are taped. At GU, history taking is taught in progressive iterations for each area of practice. In the first semester, when this project was undertaken, students had received generic content in history taking and content specific to seeing inpatients having orthopedic surgery. This study utilized 10 taped patient interviews from UW–Madison and 10 taped patient interviews from GU. Students from the UW–Madison DPT class of 2015 were asked to sign a consent form granting utilization of their taped standardized patient interview from their physical therapy course titled “Foundations of PT Examination, Evaluation, and Diagnosis.” Ten student tapes were randomly selected from the 34 students who granted consent. At GU, students are not routinely videotaped using standardized patients, but all students were offered the opportunity to participate in the present study, and the first 10 volunteers were accepted and signed consent forms prior to being videotaped.
The experienced clinician cohort was given information on the ECHOWS tool and the Guide to ECHOWS as part of their preparation for conducting the interviews of the standardized patients. They were instructed to introduce themselves to the standardized patients as physical therapist students so that the videotape assessors could not distinguish them from students. The assessors were trained in the use of the ECHOWS tool via an online module. The module contained information on the organization and scoring of the tool and provided a videotaped patient interview for practice scoring purposes. The viewer scored the interview using the ECHOWS tool and then could compare his or her scoring with that of the website instructor, whose scoring of the practice tape included an explanation of the score awarded for each item of the tool. Copies of the ECHOWS tool and the Guide to ECHOWS were available as PDF downloads on the training site.
Assessment of recordings.
Each assessor evaluated 20 student tapes and 20 clinician tapes. Assessors were free to review the tapes as many times as needed to score them and were asked to complete this task within 3 weeks. A stamped, preaddressed envelope was provided to each of the assessors to use to return the 40 ECHOWS score sheets to the research team in their country. Three weeks after the receipt of the 40 ECHOWS score sheets, the assessors were sent a second CD loaded randomly with 10 student interviews and 10 clinician interviews and a second stamped, preaddressed envelope. They were asked to complete the reviews within a 2-week time frame.
Reimbursement.
The experienced clinicians were given a gift card for $10 (US) or Australian equivalent, respectively, for their time involved in preparation and in performing the interview. The assessors were paid $25 per hour for their work in assessing the tapes. It was expected that it would take approximately 30 minutes to review and score each tape, with total time worked per assessor estimated at approximately 30 hours. A grant from the Department of Orthopedics and Rehabilitation at the School of Medicine and Public Health at UW–Madison covered these costs.
Data Analysis
Reliability was assessed by calculating intraclass correlation coefficients (ICCs) as defined by Shrout and Fleiss.15 Intrarater reliability was determined by calculation of an individual ICC (3,1) for each assessor, and interrater reliability was calculated using ICC (2,1). Ninety-five percent confidence intervals (CIs) also were calculated for the ICC types. Reliability was assessed separately for the overall ECHOWS score, the total ECHOW score, and the total S score. Reliability was categorized according to Goldstein et al,16 with ICCs <.40 classified as poor, .40 to .60 as moderate, .61 to .80 as good, and >.80 as excellent. Ninety-five percent limits of agreement17 also were calculated for intrarater and interrater assessments for the ECHOWS overall and for the ECHOW and S components separately.
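As an illustration of the statistics named above, the sketch below implements the Shrout and Fleiss15 ICC (2,1) and ICC (3,1) formulas and the Bland-Altman 95% limits of agreement17 directly from their standard definitions. The function names and toy data are invented for demonstration and do not reproduce the study's analysis.

```python
# Sketch of Shrout-Fleiss ICCs and Bland-Altman limits of agreement
# (illustrative only; the data below are invented, not study data).
import numpy as np

def icc_2_1_and_3_1(x: np.ndarray) -> tuple[float, float]:
    """x: n_targets x k_raters matrix of scores. Returns (ICC(2,1), ICC(3,1))."""
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # targets (tapes)
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31

def limits_of_agreement(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Bland-Altman 95% limits of agreement for 2 scorings of the same tapes."""
    d = a - b
    half_width = 1.96 * d.std(ddof=1)
    return d.mean() - half_width, d.mean() + half_width

# Toy example: 5 tapes scored by 3 raters (invented numbers).
scores = np.array([[30, 28, 31],
                   [22, 21, 24],
                   [35, 34, 36],
                   [27, 25, 27],
                   [18, 20, 19]], dtype=float)
print(icc_2_1_and_3_1(scores))
print(limits_of_agreement(scores[:, 0], scores[:, 1]))
```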
Additional analyses were conducted to assess whether there was any bias on the part of assessors in relation to students and clinicians from their home country or the foreign country. For example, Australian assessor scoring on Australian participants' tapes was compared with US assessor scoring on US participants' tapes, and Australian assessor scoring on US participants' tapes was compared with US assessor scoring on Australian participants' tapes. We also assessed whether there were significant differences between the Australian and US assessors on their scoring of the tapes as a whole.
The student cohort was compared with the experienced clinician cohort to determine whether there was a significant difference in performance on the ECHOWS tool. Comparisons were assessed using Student t tests and repeated-measures analysis of variance models.
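A hedged sketch of such a cohort comparison appears below. The totals are randomly generated stand-ins, and the actual analysis also modeled repeated scorings across raters with repeated-measures ANOVA, which is not reproduced here.

```python
# Illustrative two-sample comparison of novice vs experienced ECHOWS totals.
# All values are invented; a nonsignificant P would mirror the finding
# reported in this study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
student_totals = rng.normal(loc=26, scale=4, size=20)    # hypothetical students
clinician_totals = rng.normal(loc=27, scale=4, size=20)  # hypothetical clinicians

t, p = stats.ttest_ind(student_totals, clinician_totals)
print(f"t={t:.2f}, P={p:.3f}")
```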
Role of the Funding Source
Research funding was provided by a grant from the Department of Orthopedics and Rehabilitation at the University of Wisconsin–Madison School of Medicine and Public Health.
Results
The average intrarater reliability values for the ECHOWS overall and the ECHOW sections (E, C, H, O, and W sections combined) were excellent, with ICC (3,1) values for all individual raters ranging from .74 to .89 for the ECHOWS overall and from .83 to .95 for the ECHOW sections. The overall intrarater reliability for the S section was good, with ICC (3,1) values ranging from .64 to .85 (Tab. 1). The 95% CI values overlapped for all assessors and subscales, indicating no significant differences in reliability between subscales or raters. The intrarater limits of agreement were: ECHOWS overall, ±7 points; ECHOW sections, ±3 points; and S section, ±5 points. The interrater reliability values with ICC (2,1) for the ECHOWS overall and the ECHOW sections were moderate (ICC=.55) and excellent (ICC=.82), respectively. The S section, however, had poor interrater reliability (ICC=.27). For interrater agreement, the limits of agreement were: total, ±8 points; ECHOW sections, ±4 points; and S section, ±7 points. Further analysis of the intrarater data showed a significant difference between first and second scorings only for reviewer 3 in the ECHOW sections (P=.018), with an average difference of 0.85 points (SD=1.46). Considered across all reviewers as a group, there was a significant difference in the ECHOW sections between the 2 readings (−0.39; 95% CI=−0.64 to −0.13; P=.003) but not for the ECHOWS overall or the total S score. However, this difference was less than 1 point and is not likely clinically relevant.
Table 1. Intrarater Reliability
Additional analyses were conducted to assess whether there was any bias on the part of assessors in relation to students or clinicians from their home or foreign country. No such biases were found (Tab. 2).
Table 2. Interrater Reliability
With regard to construct validity, there was no statistical difference in performance on the tool between novice and experienced clinicians (P=.59). The results of the construct validity analysis are presented in Table 3.
Table 3. Construct Validity Analysis
Discussion
This study builds upon previous pilot work14 and provides further validation of the ECHOWS tool for evaluating history taking by entry-level physical therapist students. Limitations of the pilot study included the small numbers of reviewed videotapes and tape reviewers, making assessment of interrater reliability difficult. In addition to increasing the numbers of tapes and reviewers in the current study, the student and reviewer pools also were expanded to include participants from both the United States and Australia. These changes provided a more powerful design for assessment of intrarater and interrater reliability and construct validity and speak to the validity of use of the tool in Australia in addition to its use in the United States.
Compared with findings in the pilot study, intrarater reliability values were higher in the current study. In the pilot study, intrarater reliability was moderate for the ECHOW sections and good for the S section and the ECHOWS tool overall. In the current study, ratings were excellent for the ECHOWS overall and the ECHOW sections and good for the S section. Interrater reliability for the ECHOWS overall improved from poor in the pilot study to moderate in the current study, and the ECHOW sections demonstrated excellent interrater reliability. The S section interrater reliability remained low. There was no indication of cultural bias influencing the repeatability of the instrument, as no differences were detected between the assessors from the 2 nations or when assessors rated interviewers from their own country or the other country.
The following discussion will focus on the properties of the ECHOW sections and the S section before concluding how the 2 sections together might be used to provide both summative and formative feedback for students. It is perhaps not surprising that the items in the ECHOW sections had greater reliability than those in the S section, as the ECHOW items require only a dichotomous decision as to whether a particular behavior was observed. The S section, on the other hand, requires a more subjective value judgment (ie, “needs improvement,” “satisfactory,” or “superior”) of more global items such as logical sequencing or communication strategies.
The intrarater and interrater limits of agreement for the ECHOW sections were 3 and 4 points, respectively, out of a possible 22 points, with 95% of the values in the ECHOW sections (mean±2 standard deviations) between 7 and 20 points. These results suggest that, for any given individual, it would be possible to detect 3 or 4 significant changes in the score. For people with scores below 10 or 11 points, it would be difficult to detect a real deterioration due to a floor effect, and for those with scores over 16 or 17 points, it would be difficult to detect improvement due to a ceiling effect. The intrarater and interrater limits of agreement for the S section were 5 and 7 points, respectively, out of a possible 20 points, with 95% of the values between 3 and 19 points. These results suggest that, for a given individual, it would be possible to detect 1 to 3 significant changes in the score. For people with scores below 8 or 10 points, it would be difficult to detect a real deterioration due to a floor effect, and for those with scores over 12 or 14 points, it would be difficult to detect improvement due to a ceiling effect.
For the ECHOWS overall, the intrarater and interrater limits of agreement were 7 and 8 points, respectively, out of a possible 42 points, indicating the magnitude of difference between repeated measures necessary to be confident of a real difference. With 95% of the scores between 13 and 36 points, the results suggest that, for any given individual, it would be possible to detect 2 or 3 significant changes in the score. For people with scores below 20 or 21 points, it would be difficult to detect a real deterioration due to a floor effect, and for those with scores over 28 or 29 points, it would be difficult to detect improvement due to a ceiling effect.
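The arithmetic behind these detectable-change statements can be checked with a simple ratio, under the rough assumption that the number of distinguishable performance levels is the observed 95% score range divided by the limits of agreement. The sketch below illustrates that assumption; it is an editorial worked check, not a method used in the study.

```python
# Rough check of the detectable-change statements above:
# distinguishable levels ~ (observed 95% score range) / (limits of agreement).
def detectable_levels(low: float, high: float, loa: float) -> float:
    return (high - low) / loa

print(detectable_levels(7, 20, 3))    # ECHOW, intrarater LoA ±3   -> ~4.3
print(detectable_levels(7, 20, 4))    # ECHOW, interrater LoA ±4   -> ~3.3
print(detectable_levels(13, 36, 7))   # overall, intrarater LoA ±7 -> ~3.3
print(detectable_levels(13, 36, 8))   # overall, interrater LoA ±8 -> ~2.9
```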
For an evaluation or assessment tool to be useful, it is necessary for it to be able to discriminate between different levels of performance. The ECHOW sections fulfill this criterion, with the ability to discriminate 3 significant differences in a student's level of performance if the same educator is repeating the measure or 2 significant differences if a different educator is doing the evaluation. The S section, on the other hand, is less sensitive to change but is still able to distinguish at least between 2 extremes of performance. We suggest that a reasonable way of using the ECHOWS tool is to use the ECHOW sections for both summative and formative feedback and to use the S section primarily for formative evaluation. In other words, both sections can be used as a framework to provide feedback to assist student learning, but only the ECHOW sections are appropriate to be included in student marks.
Because the S section is more subjectively scored than the ECHOW sections and is not as sensitive to change, revisiting the training module may be warranted to limit variability; the interrater reliability results support this need. Assessors were given training online, but the training may have been insufficient to ensure reliability in scoring this section of the tool, or there may have been variation in use of the online module. Additional practice scoring taped interviews prior to use in grading students or in formative assessment, combined with emphasizing to assessors the relative complexity of the S section and the importance of understanding the criteria for scoring, may enhance an assessor's reliability with this section of the tool. Additionally, refinement of and additions to the training may be needed to promote consistency in scoring the S section.

No significant differences were found between the novice and experienced practitioners. Although unexpected, this finding has a number of possible explanations. The ECHOWS tool was developed to evaluate students, and the characteristics of a history taken by a student may not correspond to those of an experienced practitioner. For example, Jensen et al18 noted that although master clinicians were able to deviate from the patient examination framework when it was deemed necessary to gather more detailed patient-focused information, novice clinicians organized their examination scheme based on following standard examination routines. Students implement a more scripted flow to ensure all areas are discussed, whereas experienced practitioners are flexible in their approach, leading to greater or lesser prioritization of some elements of the patient history. As for the S section of the tool, designed to capture interviewing skill level,14 one would expect more experienced clinicians to perform at higher levels than students; however, as noted above, the relatively small point total for the S section makes detecting a difference difficult.
Limitations and Future Studies
Further research should assess entry-level students during clinical internships to establish the instrument's validity and applicability in that setting. The live clinical setting carries more uncertainty regarding patient responses, clinical interruptions, and time constraints, all items that can be controlled for in an artificial setting. Academic programs could encourage clinical instructors to incorporate the ECHOWS tool and the Guide to ECHOWS into the clinical education experience to begin determining the tool's applicability in this setting. Such use would provide a resource for clinical instructors to reinforce and further develop previously learned interviewing skills and could be implemented at the beginning of the clinical education experience to identify areas in need of improvement. The students could then use the familiar tool for continued self-assessment.
The results of this study suggest the need for the S section to evolve in order to provide a more discerning measure of interviewing skill level. Utilization of the S section to provide feedback to assist student learning provides value in an academic setting, but the difficulty in discerning different levels of skill is a limitation of the tool as it is currently formatted. Besides providing a mechanism to monitor student progress, rectifying this issue may allow the tool to be of greater value in self-assessment and peer assessment and for enhancement of these skills in physical therapist practitioners at various stages of their careers. Such tool refinement also would provide researchers with another instrument to measure interviewing skill levels.
Regarding the inability of ECHOWS, in its current format, to discern novice from advanced practitioners, future research could utilize patient scenarios with greater complexity and range of responses, along with concomitant training in more complex and obtuse responses for the standardized patient, allowing for enhanced clinical decision-making opportunities for the student in a videotaped situation. Additionally, this study did not assess whether student or clinician race or ethnicity influenced scoring on the ECHOWS tool other than the comparison of scores between US and Australian participants. This is an area that could be assessed in future studies. Lastly, the relatively small sample, particularly of assessors, limited the information that could be gained from the current study. Despite these limitations, this study provides further support for the reliability of a tool consistent with the recently proposed Educational Research Agenda.
In conclusion, intrarater and interrater reliability have been established for the ECHOWS tool in assessing physical therapist student history-taking skills. This is the first study of this type of educational assessment tool that included physical therapist students from 2 different countries. As there were no differences in ratings completed by US and Australian assessors, nationality-based bias in rating does not appear to be a factor in scoring of students and clinicians with the ECHOWS tool. The ECHOWS tool may be valid in assessing Australian physical therapist students' patient interviewing skills in addition to assessing skills in US DPT students. Assessor training with the ECHOWS tool appears to be important in maintaining reliability in student assessment.
Appendix 1.
ECHOWS Toola
a Property of Jill S. Boissonnault and William G. Boissonnault. Developed at the University of Wisconsin–Madison. Version dated May 11, 2011. OTC=over the counter.
Appendix 2.
Guide to ECHOWSa
a Property of Jill S. Boissonnault and William G. Boissonnault. Developed at the University of Wisconsin–Madison. Version dated May 10, 2011. VAS=visual analog scale, PT=physical therapy, CAD=coronary artery disease, CA=cancer, HTN=hypertension, ESL=English as a second language.
Footnotes
All authors provided concept/idea/research design and contributed to writing. Dr J. Boissonnault, Dr W. Boissonnault, Dr Evans, and Dr Tuttle provided data collection. Dr Tuttle, Mr Hetzel, Dr J. Boissonnault, and Dr W. Boissonnault provided data analysis. Dr J. Boissonnault provided project management and fund procurement. Dr J. Boissonnault and Dr Tuttle provided participants. Dr J. Boissonnault, Dr W. Boissonnault, Dr Evans and Dr Tuttle provided facilities/equipment. All authors provided institutional liaisons and consultation (including review of manuscript before submission).
The authors acknowledge the assistance of Nicholas Conte, DPT, LAT, for his help in development of the ECHOWS training module, and Sarah Stream, MS, DPT, for her assistance in data input. Additionally, they recognize and thank the University of Wisconsin–Madison and Griffith University students, and the US and Australian faculty and clinicians who assisted the project as participants and assessors.
The study received UW–Madison Social Science Institutional Review Board approval and Griffith University Institutional Review Board approval.
Research funding was provided by a grant from the Department of Orthopedics and Rehabilitation of the University of Wisconsin–Madison School of Medicine and Public Health.
- Received March 23, 2015.
- Accepted August 18, 2015.
- © 2016 American Physical Therapy Association