Abstract
Background Play is vital for development. Infants and children learn through play. Traditional standardized developmental tests measure whether a child performs individual skills within controlled environments. Play-based assessments can measure skill performance during natural, child-driven play.
Purpose The purpose of this study was to systematically review reliability, validity, and responsiveness of all play-based assessments that quantify motor and cognitive skills in children from birth to 36 months of age.
Data Sources Studies were identified from a literature search using PubMed, ERIC, CINAHL, and PsycINFO databases and the reference lists of included papers.
Study Selection Included studies investigated reliability, validity, or responsiveness of play-based assessments that measured motor and cognitive skills for children from birth to 36 months of age.
Data Extraction Two reviewers independently screened 40 studies for eligibility and inclusion. The reviewers independently extracted reliability, validity, and responsiveness data. They examined measurement properties and methodological quality of the included studies.
Data Synthesis Four current play-based assessment tools were identified in 8 included studies. Each play-based assessment tool measured motor and cognitive skills in a different way during play. Interrater reliability correlations ranged from .86 to .98 for motor development and from .23 to .90 for cognitive development. Test-retest reliability correlations ranged from .88 to .95 for motor development and from .45 to .91 for cognitive development. Structural validity correlations ranged from .62 to .90 for motor development and from .42 to .93 for cognitive development. One study assessed responsiveness to change in motor development.
Limitations Most studies had small and poorly described samples. Lack of transparency in data management and statistical analysis was common.
Conclusions Play-based assessments have potential to be reliable and valid tools to assess cognitive and motor skills, but higher-quality research is needed. Psychometric properties should be considered for each play-based assessment before it is used in clinical and research practice.
Play provides infants and young children with opportunities to practice skills and supports all domains of development: motor, cognitive, social-emotional, communication, and adaptive.1–4 Play has been variably defined in the literature given different disciplines and reasons for assessing development through play.5 In this systematic review, play is defined as a pleasurable, active, self-motivated developmental phenomenon1,6 by which infants and young children learn about the world through interactions with objects and people.5,7
Play fosters both motor and cognitive development.1,2,7 Play is common to all infants, and it is a primary arena within which domain-specific and global aspects of development occur.1,8,9 Early play helps to prepare infants and young children to learn in school.10 Children learn through the repetition of behaviors during play within typical environments and routines.11
Play is the basis for many developmental interventions used with children with disabilities.12 Play, however, is often not a part of traditional standardized developmental tests used by pediatric physical therapists and other early intervention providers to determine the need for intervention or the efficacy of intervention.13 Traditional standardized developmental assessments typically involve a child performing a specific task within a controlled environment that is outside the context of everyday routines.13 Some assessments require the examiner to elicit behaviors by altering the context or moving the child.14,15 Behaviors assessed in this way are not authentic child-directed behaviors, and the child may not perform optimally.16 Furthermore, traditional standardized developmental assessments are designed to determine whether a child can perform a specific skill, not whether the child performs the skills in his or her normal routine.15
Play-based assessments are standardized measures designed to quantify changes in one or more of the 5 developmental domains during self-motivated, child-driven play.14,17,18 Some literature suggests that play-based assessments may be an effective and efficient means of assessing a child's developmental level,19 evaluating change over time, and evaluating the efficacy of intervention.18,20 Play-based assessments are often adjuncts to other assessment procedures,21,22 although some authors argue that they also can serve as a basis for discriminative decisions and planning.15,17,18,23 In this review, play-based assessment is differentiated from an assessment of play, which interprets the type of play in which a child is engaged relative to a hierarchical developmental theory of play.18 Assessments of play are not discussed in this review.
Play-based assessments focus on child-directed activities. During play-based assessment, the child directs the interaction and experience, increasing the likelihood of observing behaviors that the child typically performs.24 This assessment results in a rich description of a child's domain-specific strengths and weaknesses.14 Using the arena of play provides the practitioner with not only the ability to assess current skills but also the added benefit of previewing emerging skills in a functional context.2,25 Play-based assessments add authenticity and contextual benefits to the assessment of motor and cognitive development because they measure objective behaviors during child-driven activities within a normal environment. This approach allows examination of cross-domain relationships by integrating findings.24
Play-based assessments can be contrasted with traditional standardized assessments. First, play-based assessment takes place within a naturalistic environment and context, whereas traditional standardized developmental tests require specific responses to an examiner-provided stimulus.14 Second, play-based assessments typically quantify whether and how often a child performs specific types of skills during a naturalistic observation rather than just assessing whether the child can perform the skill.14,21,26 Third, these assessments are child-driven14 rather than examiner-driven, giving the practitioner insight into the child's ability to explore and learn.6 Fourth, play-based assessments can document limitations commonly seen in children with developmental delays, such as decreased attention to toys, use of fewer toys and less variety of active play skills, and greater passivity during play.6
Although the theoretical value of play-based assessments is clear, the reliability and validity of play-based assessments need to be considered before they are used in clinical practice or research. The first aim of this systematic review is to determine the interrater and test-retest reliability of play-based assessments of motor and cognitive skills for infants and children aged 0 to 36 months. The second aim is to identify the content and structural validity of play-based assessments of motor and cognitive skills for infants and children aged 0 to 36 months, as well as the responsiveness of these measures.
This article focuses on the assessment of infants and toddlers, from birth to 36 months of age, who, based on their age, could be eligible in the United States for early intervention services under Part C of the Individuals With Disabilities Education Improvement Act (IDEIA).27 Play-based assessments allow for assessment in a variety of cultures and countries and at a variety of ages. As in many countries, the goal of providing intervention to young children in the United States is to support early development and improve readiness to learn in children with or at risk for developmental delays. Intervention programs with similar goals around the world may find play-based assessments an option for assessing the needs and progress of children if these tools are reliable and valid.
The results of this study provide information on the reliability, validity, and responsiveness of play-based assessments. This information may help to determine if play-based assessments can be used for research and clinical purposes. In addition, this information will help clinicians to determine which play-based assessments are best to supplement traditional standardized developmental tests that are currently used to evaluate the need for and efficacy of early developmental intervention services.
Method
Search criteria were developed to identify studies that met inclusion and exclusion criteria specified prior to the study. Studies were required to evaluate one or more of the following measurement properties of a play-based assessment of motor and cognitive skills: interrater reliability, test-retest reliability, structural validity, content validity, and responsiveness to change over time. Study samples had to include participants whose ages fell fully or partially within the range of 0 to 36 months. Participants could have a diagnosed disability or delay or could be developing typically. Studies that did not include play-based assessment of motor and cognitive skills, did not include children from birth to 36 months of age, were not available in English, or were a review of previous research or theory without new data were excluded.
Data Sources and Searches
A literature search was performed using MEDLINE via the PubMed interface (late 1940s–May 2013), ERIC (1966–May 2013), CINAHL (1937–May 2013), and PsycINFO (1894–May 2013). Search terms were developed with the help of a research librarian using MeSH headings, key words, and phrases. Terms were purposefully broad to capture all publications that met the inclusion criteria for this systematic review. The full search strategy is described in the Appendix.
Study Selection
Consistent with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement,28 results from the literature search were reviewed for duplicates prior to screening for inclusion. The title and abstract of all identified publications were screened using the inclusion and exclusion criteria. Any publication that did not clearly meet the exclusion criteria was moved to full-text screening. During screening, 2 reviewers independently reviewed the full-text publication to determine eligibility for the systematic review. Any disagreement about inclusion between the 2 reviewers was resolved through discussion. The bibliographies of included papers also were reviewed by both authors to determine if additional studies warranted inclusion.
Data Extraction and Quality Assessment
The interrater and test-retest reliability, content and structural validity, and responsiveness data for each included study were extracted independently by each reviewer using data collection forms developed for this systematic review. Any discrepancy in the extracted reliability or validity data was discussed between reviewers, and a consensus was reached. No statistical analysis or meta-analysis was conducted given the limited number of included studies. A priori, a correlation ranging from .00 to .50 was considered weak, .50 to .75 was considered moderate, and .75 to 1.00 was considered to be strong.29 The strength of the correlation presented as a measure of reliability or validity was used to categorize the degree of reliability and validity documented for each play-based assessment and for the group of play-based assessments. Therefore, the terms “weak,” “moderate,” and “strong” are used to describe the results for reliability and validity of each included paper.
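As an illustration only (the cutoffs were applied by hand in this review, and boundary values are assigned here to the higher category), the a priori rule can be written as a short classification function:

```python
def classify_correlation(r: float) -> str:
    """Label a reliability or validity correlation using the a priori cutoffs:
    .00-.50 weak, .50-.75 moderate, .75-1.00 strong."""
    r = abs(r)
    if r >= 0.75:
        return "strong"
    if r >= 0.50:
        return "moderate"
    return "weak"

# Example: the extremes of the interrater range for cognitive assessments.
print(classify_correlation(0.23))  # weak
print(classify_correlation(0.90))  # strong
```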
The COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) was used as a measure of methodological quality of the measurement properties.30,31 The 5 measurement properties assessed for this study were defined by the COSMIN.32 Interrater reliability is a measure of whether different raters can score the same testing occasion and obtain the same score.32 Test-retest reliability is the extent to which the scores for patients who have not changed are the same for repeated measurements over time.32 Structural validity is the degree to which scores of a health-related instrument adequately reflect the same construct as a validated assessment.32 Content validity is a judgment about whether the content of a test adequately reflects the construct to be measured.32 Responsiveness is the ability of the measurement tool to measure change over time in the focal construct.32
The COSMIN can be used to measure methodological quality with a 3-step process. First, the measurement properties assessed in the paper are identified. Second, reviewers score each measurement property. Each measurement property on the COSMIN has a rating box containing 5 to 18 individual items specific to that measurement property. Each item within the rating box is scored based on specific scoring criteria, with 1 to 4 possible answers representing excellent, good, fair, and poor quality for that item. An item is scored as excellent when adequate evidence is provided for that item. When information is not provided but can reasonably be assumed, the item is rated as good. Fair indicates that methodological quality for the item is doubtful, whereas poor is scored when there is evidence that the methodological quality pertaining to a specific item is inadequate. For example, clear evidence of patient stability between test and retest in a study receives a score of excellent on that item in the rating box for the measurement property test-retest reliability. If it is unclear whether patients were stable during the time between test and retest, however, that item is marked as fair. The third step, scoring each rating box of the COSMIN, involves determining the overall rating of methodological quality of each measurement property. This overall rating is determined by the lowest score for all items in the rating box for that measurement property.33 For example, if all responses in the test-retest rating box are excellent except for one judged to be fair, the quality of the test-retest reliability measurement of that paper is considered to be fair.
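The "lowest score counts" rule in this third step can be expressed as a simple aggregation; the sketch below is a hypothetical illustration (not part of the COSMIN materials) that returns the worst item rating in a box as the overall rating:

```python
# COSMIN item ratings, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]

def overall_box_rating(item_ratings):
    """Overall methodological quality for one measurement-property rating box:
    the lowest (worst) rating among its items ("lowest score counts")."""
    return min(item_ratings, key=RATING_ORDER.index)

# Example from the text: all test-retest items rated excellent except one
# rated fair yields an overall rating of fair.
print(overall_box_rating(["excellent", "excellent", "fair", "excellent"]))  # fair
```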
Each author of this systematic review independently rated the methodological quality of each included study using the COSMIN. Any discrepancies in scoring between the raters that resulted in different measurement properties being scored, a change in the overall rating of methodological quality of any measurement property, or a discrepancy of 2 or more ordinal levels for any single item within a measurement property rating box were discussed, and a consensus was reached. The overall methodological quality for each measurement property included in a paper was recorded and is reported in this systematic review.
Results
The titles and abstracts of 2,133 studies were screened for possible inclusion. Forty studies could not be excluded during screening and were reviewed in full text for eligibility. Eight of these studies matched the inclusion criteria and were included in the systematic review, whereas 32 studies were excluded (Figure).
PRISMA diagram.
Studies including 4 separate play-based assessments currently available for commercial use were included in this systematic review: Play in Early Childhood Evaluation System (PIECES); Transdisciplinary Play-Based Assessment, 2nd edition (TPBA-2); Assessment, Evaluation, and Programming System, 2nd edition (AEPS); and the Individual Growth and Development Indicators (IGDI). Related assessments or precursors to these play-based assessments also were identified in the literature. The Play-Based Assessment (PBA)16 was the initial form of the PIECES.23 The PBA also was the cognitive portion of the Transdisciplinary Play-Based Assessment (TPBA),14 which was never tested for reliability or validity with a young population. The TPBA is the previous version of the TPBA-2.19 Psychometric properties of the Evaluation and Programming System for Infants and Young Children (EPS-I)34 were presented by Bailey and Bricker35 and Bricker et al.36 The EPS-I is the predecessor of the AEPS.37 Part of the AEPS was used for the experimental Assessment, Evaluation, and Programming System for Eligibility (AEPS:E) as reviewed herein.20 Two other play-based assessments, a general outcome measure of growth in movement for infants and toddlers21 and the Early Problem Solving Indicator (EPSI),26 met the inclusion criteria. These last 2 assessments are predecessors to the movement and cognitive sections of the IGDI38: the Early Movement Indicator (EMI-IGDI) and the Early Problem Solving Indicator (EPSI-IGDI), respectively (Tab. 1).
Description of Play-Based Assessments Included in This Systematic Reviewa
Interrater Reliability
Interrater reliability was measured in 5 studies.20,21,26,35,36 Pearson correlation coefficients for interrater reliability of motor assessments ranged from .86 to .98 (Tab. 2).20,21,35,36 Interrater reliability of cognitive assessments ranged from .23 to .90 (Tab. 3).20,26,35,36 One study of cognition reported interrater reliability coefficients for individual skills but not for the aggregate of cognitive behaviors displayed.26
Play-Based Assessments and Studies With Motor Psychometric Propertiesa
Play-Based Assessments and Studies With Cognitive Psychometric Propertiesa
The methodological quality in each of these studies was rated fair except for the study by Bailey and Bricker,35 which was poor (Tab. 4). Main reasons for fair ratings included use of a Pearson correlation coefficient rather than an intraclass correlation coefficient (ICC)20,21,26,36 and missing items from the sample.26,36 The study by Bailey and Bricker35 was rated poor based on the small sample size and major flaws in the study design, including that items not observed by one or both observers were omitted from analysis. It is possible that one observer missed several test items that actually occurred. This omission would have reduced the variability of the sample and artificially inflated the correlation coefficient.
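To illustrate why an ICC is preferred over a Pearson correlation coefficient when rating interrater reliability, the sketch below uses made-up scores (not data from any included study): a constant offset between 2 raters leaves the Pearson coefficient at 1.00 but lowers an absolute-agreement ICC(2,1), which counts the systematic difference as disagreement.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-rater ICC.
    scores: array of shape (n subjects, k raters)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    ms_subjects = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_raters = n * np.sum((scores.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_error = (np.sum((scores - grand) ** 2)
                - (n - 1) * ms_subjects - (k - 1) * ms_raters)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n)

# Made-up scores: rater B scores every child exactly 5 points higher than rater A.
rater_a = np.array([40.0, 45.0, 50.0, 55.0, 60.0, 65.0])
rater_b = rater_a + 5
print(np.corrcoef(rater_a, rater_b)[0, 1])            # Pearson r = 1.00
print(icc_2_1(np.column_stack([rater_a, rater_b])))   # ICC(2,1) = 0.875
```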
Methodological Quality of Measurement Properties Using the COSMIN for All Included Studiesa
Test-Retest Reliability
Test-retest reliability was assessed in 5 studies.21,23,26,35,36 Pearson correlation coefficients for test-retest reliability ranged from .88 to .95 for motor assessments (Tab. 2)21,35,36 and from .45 to .91 for cognitive assessments (Tab. 3).23,26,35,36
Three of the studies23,26,36 had fair methodological quality ratings, whereas 2 studies21,35 were rated poor (Tab. 4). Reasons for low ratings included use of a Pearson correlation coefficient rather than an ICC,21,26,36 small sample size,23 and important flaws in study design, including use of different observers for the first and second test observations, different situations between observations, and omission of items not scored by either observer.35
Structural Validity
Structural validity was assessed in 7 studies by comparing the scores on a play-based assessment with scores on traditional standardized developmental tests.16,19–21,26,35,36 Three of the studies had both motor and cognitive components (Tabs. 2 and 3).19,20,36 Two studies assessed solely motor skills during play (Tab. 2),21,35 and 2 studies were solely cognitive assessments (Tab. 3).16,26 The Pearson correlations between the play-based assessments of motor skills and validated traditional standardized developmental tests of motor skills ranged from .62 to .90. Correlations between the play-based assessments of cognitive skills and validated traditional standardized developmental tests ranged from .42 to .93. Several different validated traditional standardized developmental tests of motor and cognitive development were used as comparisons (Tabs. 2 and 3).
Methodological quality for structural validity was fair to poor (Tab. 4). Four studies earned fair ratings. All of these studies had missing items, and the methods used for handling missing items were unclear.16,21,26,36 Two of these studies also had methodological flaws.16,36 In one of these studies,16 the age equivalents on the play-based assessment were converted to standard scores for comparison with the standard scores on the Bayley Scales of Infant Development, second edition (BSID-2).39 This comparison has not been validated as statistically sound. The other study36 compared the play-based assessment with 2 traditional standardized developmental tests: the Revised Gesell and Amatruda Developmental and Neurologic Examination (Gesell scale) and the Bayley Scales of Infant Development (BSID).40,41 The disability levels and ages of each of the samples were unclear. The sample compared with the BSID also was small. The other 3 studies that assessed structural validity were rated poor due to small sample sizes.19,20,35
Content Validity
Content validity of motor skills was assessed using the EMI-IGDI.21 The frequency of movements increased significantly across the 3 age cohorts (3–12 months, 13–24 months, and 25–36 months) over the 45-week study. Methodological quality was good in this study based on a moderate sample size (Tab. 4).21
Responsiveness
One study21 measured responsiveness of a play-based assessment indirectly. This study assessed only motor skills in a sample of mostly children with typical development (3–36 months of age) on a play-based assessment (ie, the EMI-IGDI) and the Peabody Developmental Motor Scale–2 (PDMS-2). The PDMS-2 locomotor and stationary subtests were responsive to change, with a statistically significant increase in raw score between 2 time points that were 45 weeks apart. A similar comparison was not made for the data from the EMI-IGDI. The movement rate on the EMI-IGDI, however, was correlated with the PDMS-2 locomotion scale at each of the 2 time points. The Pearson correlation coefficient was .77 at time 1 and .90 at a time point 45 weeks later. Methodological quality was poor due to the small sample size (Tab. 4).
Discussion
The results of this systematic review indicate that Pearson r values for interrater and test-retest reliability of play-based assessments ranged from .23 to .98 and that Pearson r values for structural validity of play-based assessments ranged from .42 to .93. As a group, both reliability and validity of play-based assessments are inconsistent. The methodological quality of measurement properties among the studies contained in this systematic review is generally poor to fair,33 with only one study having a good quality rating. With only 1 or 2 studies of reliability or validity on each play-based assessment tool and the poor to fair methodological quality of the studies, it was difficult to draw conclusions about any individual assessment or the group of play-based assessments as a whole. Therefore, reliability and validity for each play-based assessment need to be considered carefully before research or clinical application.
Interrater reliability of play-based assessments for both motor and cognitive skills was generally strong. One study had a weak interrater reliability correlation, but the majority of the studies had interrater reliability correlations of r≥.86. These interrater reliability findings indicate that the definitions of terms and scoring used in the assessments are clear to raters. Of the tests reviewed for both motor and cognitive skills, interrater reliability findings are highest for the AEPS:E20 and its predecessor, the EPS-I.35,36 The assessment with the best interrater reliability for motor skills is the EMI-IGDI.21 Traditional standardized developmental tests of motor and cognitive development, such as the PDMS-242 and the BSID-2,39 have similarly strong interrater reliability.43
Test-retest reliability scores varied by construct measured. Test-retest reliability was strong for all 3 studies of motor development.21,35,36 Two of the studies used the same test (EPS-I) with different age groups.35,36 The EPS-I measured motor tasks using a criterion-referenced, curriculum-based assessment with a modified developmental checklist. The fact that environments, objects, and checklist questions were controlled may have improved reliability. The other play-based assessment of motor skills, the EMI-IGDI, measured skills in a longitudinal fashion with 3 to 8 measurements per child, separated by at least 3 weeks.21 The researchers used a split-half reliability method in which each child's odd-numbered trials and even-numbered trials were averaged separately and the 2 averages were then correlated. Although this is an acceptable way of measuring test-retest reliability, the averaging reduced variability, which otherwise may have resulted in lower test-retest reliability. The age of the children did not affect the reliability of either the EPS-I or the EMI-IGDI. The Bayley Scales of Infant and Toddler Development, 3rd edition (BSID-3), and the PDMS-2 have similarly strong test-retest reliability for motor skills.44,45
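A minimal sketch of this odd/even averaging approach, using hypothetical movement rates rather than data from the EMI-IGDI study, is shown below; because each half-score is an average across several sessions, session-to-session variability is smoothed, which is the reduction in variability noted above.

```python
import numpy as np

def odd_even_reliability(trials):
    """Split-half style reliability across repeated observations:
    correlate each child's mean of odd-numbered trials with the
    mean of even-numbered trials.
    trials: array of shape (n children, m repeated observations)."""
    trials = np.asarray(trials, dtype=float)
    odd_means = trials[:, 0::2].mean(axis=1)   # 1st, 3rd, 5th ... observation
    even_means = trials[:, 1::2].mean(axis=1)  # 2nd, 4th, 6th ... observation
    return np.corrcoef(odd_means, even_means)[0, 1]

# Hypothetical movement rates for 4 children over 6 observation sessions.
rates = np.array([
    [12, 14, 13, 15, 14, 16],
    [20, 22, 21, 24, 23, 25],
    [ 8,  9, 10,  9, 11, 10],
    [30, 28, 32, 31, 33, 32],
])
print(odd_even_reliability(rates))
```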
Test-retest reliability was lower for cognition than for motor skills. This finding was evident with 2 tests, the PIECES and the EPS-I for children with and without disabilities.23,35,36 The reliability of the PIECES was assessed using a strict test-retest method (1–3 weeks between assessments) of the children's most advanced cognitive skill level used during play.23 Test-retest reliability correlations of the PIECES were similar to those of the initial study with the EPS-I.35 The test-retest reliability of the EPS-I, however, varied substantially between the 2 studies that evaluated this measure.35,36 Two different methods were used to assess test-retest reliability, which might account for this discrepancy. Bailey and Bricker35 used different observers and different situations during test-retest measurements, which expanded the opportunity for error. In the study during which test-retest reliability was stronger, several items were not tested due to issues of privacy during videotaping (self-care, dressing), and several items from the gross motor scale were omitted due to constraints of videotaping.36 This approach creates a smaller pool of scored behaviors, which may not represent the reliability of the test as a whole. In light of these contrasting findings, we suggest that the complete EPS-I test-retest reliability cannot be identified. The EPSI-IGDI26 had strong test-retest reliability using the split-half reliability method. Given the moderate or unclear results of the other play-based assessments of cognition, the EPSI-IGDI has the best test-retest reliability, although it must be noted that single-session test-retest studies of the EPSI-IGDI (no split-half reliability) may show lower reliability due to changes in play behaviors that commonly occur from one session to the next.
Structural validity of play-based assessments that measure motor skills ranged from moderate to strong compared with traditional standardized developmental tests. The lowest structural validity was found for the AEPS:E20 compared with the Gesell scale's gross motor portion.40 All other play-based assessments had strong structural validity (correlations greater than .76). The EPS-I, an earlier version of the AEPS:E, had a strong correlation with the Gesell scale's gross motor test.35 The EPS-I items are arranged in a hierarchical developmental progression, similar to the neuromaturational construct that the Gesell scale tests. The AEPS:E20 uses different standardization procedures, activities, and materials appropriate for toddlers, which are not as strongly aligned with the hierarchical model of development. Although the lack of hierarchical examination in the AEPS:E is more consistent with current theoretical approaches, it reduces the relationship between the AEPS:E and the Gesell scale. Comparison with a different traditional standardized developmental test might have increased the structural validity. The TPBA-2 motor section had a strong correlation compared with the BSID-3,19 and the EMI-IGDI likewise had a strong correlation compared with the PDMS-2.21 The BSID-3 and PDMS-2 assess developmental constructs similar to play-based measures, whereas the Gesell does not. There is no published study regarding the structural validity of the AEPS:E motor portion compared with a more current standardized test of motor development than the Gesell scale. Therefore, we suggest that the TPBA-2 and EMI-IGDI are the best play-based assessments of motor skills to assess a construct similar to traditional standardized tests of motor development.
Structural validity of play-based assessments of cognitive skills was generally lower than that of play-based assessments of motor skills. Structural validity of the AEPS:E was moderate compared with the Battelle Developmental Inventory.20 It is interesting to note that the correlation between the AEPS:E and the Battelle Developmental Inventory was lower in children older than 24 months. We hypothesize that this finding may have been due to the fact that the older children did not display the full range of their cognitive skills during a play session with a limited set of toys and space in contrast to a traditional standardized developmental assessment, which tests the child on specific skills of all difficulty levels. Structural validity was weak to moderate for the EPSI-IGDI26 and the PBA16 compared with the BSID-2. Child-driven free play tends to decrease validity because the child may or may not show his or her full repertoire of cognitive skills during a given test session. Validity was strong when more structure was part of the play-based assessment such as in the EPS-I.36
Content validity of the EMI-IGDI, measuring change with increasing age, was significant (P<.01) when using a hierarchical linear modeling level 2 design across 3 age cohorts.21 The results indicate that age affects movements during this play-based assessment, which supports the validity of the EMI-IGDI for measuring movement. Having only one study, however, limits generalization of the findings to other play-based assessments.
The methodological quality of measurement properties as defined by COSMIN32 in each of the studies in this systematic review was poor to fair, with the exception of content validity in one study that was rated as good. One measurement property for 3 studies was downgraded due to small sample size.19,23,35 Other methodological issues for these studies included no evidence of patient stability on test-retest,23 methodological flaws in study design,19 and unclear handling of missing data.35 Use of a Pearson correlation coefficient instead of an ICC for reliability reduced the methodological quality rating for all included studies. Only one study, however, had a reliability score that would have been upgraded had the ICC been used.20 Although no one methodological issue affected all the studies, each included study had some methodological problems. Future studies that adhere to rigorous methods will provide more detailed information about using play-based assessments for research and clinical measurement.
Play-based assessment tools are designed to assess a child's ability to use motor and cognitive skills during self-motivated play within contextually relevant environments. Although play-based assessment allows the child to select the activities, this systematic review demonstrates that the overall reliability of these measures is similar to more traditional standardized developmental tests. The slightly lower test-retest reliability is likely the result of the varied responses children display when given toys in the naturalistic environment during play but not specifically prompted to react to the toy in a certain manner, such as during a traditional standardized developmental test. The validity results of this review of play-based assessments suggest that as a group, play-based assessments measure a construct that is similar, but not identical, to traditional standardized developmental tests. Similar to reliability findings, we suggest that slightly lower validity findings of play-based assessments may be acceptable because of the naturalistic context of activities during the assessment. Studies of individual play-based assessments, however, indicate varied structural validity correlations. As a result of varied validity and poor to fair methodological quality, individual tests need additional research to document reliability and validity in high-quality studies. At present, results using play-based assessments should be interpreted with caution.
Limitations
This systematic review has several limitations. The broad nature of the search terms resulted in a very large number of records that required title and abstract screening. Although only a single author reviewed each title and abstract, the criteria to eliminate a study at this stage were designed to retain any study that might meet the inclusion criteria. Two reviewers completed all other eligibility determination and data extraction. Nevertheless, these screening procedures could have resulted in missing other potentially relevant studies of the reliability or validity of play-based assessment tools. Also, the COSMIN was originally developed to assess the quality of studies of health-related patient-reported outcome measures, which are designed to capture complex, subjective health changes over time.29 Although play-based assessments do not fit the type of studies usually assessed using the COSMIN, the majority of the measurement quality assessment tables could be completed without difficulty.
In conclusion, a standardized assessment of skills used during play is critical to determining the need for and efficacy of developmental intervention, yet therapists are challenged to find tools to meet this objective.46 Although the challenge continues, the results of this systematic review demonstrate that play-based assessments have the potential to be reliable and valid tools. Researchers must continue to assess reliability and validity of specific play-based assessment tools and reassess psychometrics as adaptations are made to the tools. Before play-based assessments can be used as evaluative measures, responsiveness to change must be evaluated. Changes in skills in response to therapeutic intervention, as measured on play-based assessments, would provide not only evidence that therapy can teach a child a new skill but also evidence that the child can spontaneously use the new skills in daily activity. Determining adequate responsiveness of play-based assessments would give early developmental intervention therapists an opportunity to use play not just as a process of intervention but also as a reliable and valid method of assessing development. A primary therapeutic goal in all cultures is to enhance a child's use of functional skills for participation in age-appropriate activities such as play. Play-based assessments improve the ability of clinicians and researchers to measure the impact of therapeutic interventions during these age-appropriate activities for children.
Appendix.
Database Searches
Footnotes
Both authors provided concept/idea/research design, writing, data collection and analysis, and project management. Mr O'Grady provided consultation (including review of the manuscript before submission).
The authors acknowledge the contributions of Virginia Commonwealth University Health Sciences librarian Jennifer McDaniel for her assistance with defining search terms.
The data were presented as a poster at the Virginia Commonwealth University Graduate Student Symposium; April 22, 2014; Richmond, Virginia. The data also have been submitted as an abstract for the American Physical Therapy Association's Combined Sections Meeting; February 4–7, 2015; Indianapolis, Indiana.
- Received March 6, 2014.
- Accepted August 14, 2014.
- © 2015 American Physical Therapy Association