Assessments Related to the Physical, Affective and Cognitive Domains of Physical Literacy Amongst Children Aged 7–11.9 Years: A Systematic Review

Background Over the past decade, there has been increased interest amongst researchers, practitioners and policymakers in physical literacy for children and young people and the assessment of the concept within physical education (PE). This systematic review aimed to identify tools to assess physical literacy and its physical, cognitive and affective domains within children aged 7–11.9 years, and to examine the measurement properties, feasibility and elements of physical literacy assessed within each tool. Methods Six databases (EBSCOhost platform, MEDLINE, PsycINFO, Scopus, Education Research Complete, SPORTDiscus) were searched up to 10th September 2020. Studies were included if they sampled children aged between 7 and 11.9 years, employed field-based assessments of physical literacy and/or related affective, physical or cognitive domains, reported measurement properties (quantitative) or theoretical development (qualitative), and were published in English in peer-reviewed journals. The methodological quality and measurement properties of studies and assessment tools were appraised using the COnsensus-based Standards for the selection of health Measurement INstruments risk of bias checklist. The feasibility of each assessment was considered using a utility matrix and elements of physical literacy were recorded using a descriptive checklist. Results The search strategy resulted in a total of 11,467 initial results. After full text screening, 11 studies (3 assessments) related to explicit physical literacy assessments. Forty-four studies (32 assessments) were relevant to the affective domain, 31 studies (15 assessments) were relevant to the physical domain and 2 studies (2 assessments) were included within the cognitive domain. Methodological quality and reporting of measurement properties within the included studies were mixed. 
The Canadian Assessment of Physical Literacy-2 and the Passport For Life had evidence of acceptable measurement properties from studies of very good methodological quality and assessed a wide range of physical literacy elements. Feasibility results indicated that many tools would be suitable for a primary PE setting, though some require a level of expertise to administer and score that would require training. Conclusions This review has identified a number of existing assessments that could be useful in a physical literacy assessment approach within PE and provides further information to empower researchers and practitioners to make informed decisions when selecting the most appropriate assessment for their needs, purpose and context. The review indicates that researchers and tool developers should aim to improve the methodological quality and reporting of measurement properties of assessments to better inform the field. Trial registration PROSPERO: CRD42017062217 Supplementary Information The online version contains supplementary material available at 10.1186/s40798-021-00324-8.


Background
The concept of physical literacy has attracted significant attention from researchers, policymakers and practitioners within education, sport and public health sectors and features prominently within current national and international sport and physical activity policies and strategic plans [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. While physical literacy is a term that has been around since the late 19th century [17], current interest stems from the work of Whitehead [18][19][20], who first introduced the concept as a way forward to address low levels of physical activity around the world and as a reaction to a perceived focus on high performance and elitism within physical education (PE), to the detriment of the health and well-being of less-abled students. Whitehead most recently described physical literacy as "the motivation, confidence, physical competence, knowledge and understanding to value and take responsibility for engagement in physical activities for life" ([21], p8), though her original conceptualisation of physical literacy [18,19], grounded in the philosophical traditions of phenomenology, existentialism and monism, has evolved into an increasingly fluid concept subject to varying levels of abstraction and alignment in deployment by researchers and practitioners [3]. Indeed, physical literacy is a contested term [1,22], with various contextually sensitive definitions and interpretations of the concept proposed internationally [1-3, 6-8, 17, 23-26]. Nevertheless, taken together, these diverse definitions seem to reflect a holistic view of physical literacy that emphasises affective, physical and cognitive attributes and predispositions necessary to participate in physical activity across the life course [3,4,25]. 
Furthermore, most researchers and practitioners advocating for physical literacy agree that such an approach is inclusive and encourages more diverse forms of engagement in physical activity, and so would be more likely to lead to life-long safe, committed engagement in physical activity, and better health, well-being and quality of life for all [6,7,17,27,28].
The majority of existing physical literacy research has focussed on children and youth populations within school settings [1]. Across the majority of Western countries, school attendance within the 7-11-year-old age range is compulsory, thus making primary schools an optimal setting for physical activity promotion. While physical literacy is recognised as a lifelong concept, the heightened attention on childhood reflects the fact that this is seen as a critical stage for the development of important physical literacy attributes necessary for lifelong physical activity, health and well-being [29]. Schools are considered to be nurturing environments where children have opportunities to be active, learn about physical activity and develop positive physical activity behaviours [30][31][32]. As a result, physical literacy has been identified as a guiding framework and overarching goal of quality PE and a major focus of PE curriculum internationally [33][34][35][36]. In England, the National Curriculum for PE aims to ensure that all pupils develop competence to excel in a broad range of physical activities, are physically active for sustained periods of time, engage in competitive sports/activities and lead healthy, active lives [37]. These ambitions align with the concept of physical literacy. As such, a cross-government action plan positioned physical literacy as a core element of early learning and stated that physical literacy should be a fundamental part of every child's school experience [38].
Throughout compulsory education, assessment, both formative and summative, is a critical aspect of pedagogical practice and accountability systems [39][40][41]. For the purposes of this review and in accordance with Edwards et al. [42], we define assessment as it is widely understood and used within educational contexts: as an umbrella term for measurement, charting, monitoring, tracking, evaluating, characterising, observing, indicating, and so on. Appropriate assessment of childhood physical literacy in PE on both an individual and population level could improve standards and expectations, and raise the profile of both PE and physical literacy [43,44]. Primary teachers report that assessment in PE provides a structure and focus to planning, teaching and learning, which positively impacts on both the teacher and child [45]. Thus, the classroom teacher, utilising the close relationship formed between teacher and pupil, should be empowered to implement an assessment of physical literacy, fulfilling roles such as charting progress, providing feedback, and highlighting key areas for how a child may develop their physical literacy over time [46][47][48][49][50]. Teachers themselves have, however, cited barriers to implementing assessment in PE such as the lack of priority given to PE within the curriculum; limited time, space and expertise [51,52]; difficulty in assessment differentiation and limited availability of comparator samples [45]; and varied beliefs, understandings and engagement regarding assessment [39,40], alongside limited knowledge of physical literacy [53]. Thus, considering the feasibility of a physical literacy assessment tool is of vital importance when determining appropriate use within educational contexts [54].
Effective assessment of physical literacy in PE will enable funders, policymakers, researchers and educators to understand what teaching, learning and curriculum strategies are most effective in helping support physical literacy [27,44]. Despite this assertion, divergent approaches to understanding the concept of physical literacy have led to tensions in the research literature surrounding whether physical literacy can and should be assessed, with implications for how assessment has been operationalised in practice [5,17,18,42,47,55,56]. Edwards et al. [42] suggested that idealist approaches to the concept of physical literacy, and therefore assessment, view physical literacy as holistic with inseparable dimensions and as a complex and dynamic process unique to each individual. Assessment can therefore only be captured through subjective, qualitative, interpretivist methods and is centred on an assessment-for-learning approach to monitor progress relative to the individual student's physical literacy journey [17,42,48]. At the other end of the debate are pragmatic approaches that view physical literacy as a concept that can and should be assessed for the purposes of evidence-based practice and accountability, with positivist, reductionist measurement methods typically utilised [42]. Barnett et al. [54] suggested that these approaches do not need to be mutually exclusive: while acknowledging the holistic nature of physical literacy, they suggested that existing measures of physical literacy elements should not be dismissed if they do not capture the entirety of the concept; rather PE teachers should be encouraged to recognise this limitation and evaluate the completeness of their assessment approaches. Similarly, Essiet et al. [57] proposed that a comprehensive quantitative assessment of physical literacy for teachers can be possible through an aggregate measure of all the elements and domains identified within the corresponding definition. 
Thus, identifying assessments of physical literacy and/or its affective (motivation and confidence), physical, and cognitive (knowledge and understanding) domains, inclusive of idealist and pragmatic approaches to the concept, can inform physical literacy assessment efforts within primary (elementary) PE. Barnett and colleagues [54] produced a decision-making guide for researchers and teachers for the assessment of physical literacy within the context of school PE and within the parameters of the Australian definition of physical literacy [16]. This guidance outlined key considerations to inform what assessment approach to choose, including factors such as the physical literacy elements of importance (what is being measured and what is being missed), the purpose of conducting the assessment, the assessment context and the target age range. Barnett et al. [54] recognised that there was not an "ideal" approach to measurement and therefore the guidance was aimed at empowering teachers and researchers to make informed decisions on how to assess physical literacy based on their intentions, needs and resources. It was beyond the remit of the study to review all potential assessments that could align with physical literacy domains and consider whether existing assessments/measures were reliable, valid, and trustworthy. Edwards et al. [42] conducted a systematic review of the literature and identified 52 assessments of physical literacy and related constructs, evaluating these in relation to age group, environment, and philosophy. While several qualitative and quantitative tools were identified for the assessment of affective, cognitive and physical domains as well as the related construct of physical activity for use with children under 12 years old, few assessments captured the entire range of domains [42]. Within their review, Edwards and colleagues used the global search term "physical literacy" to identify assessments. 
There is scope to expand this review through the use of wider search terms related to the elements within affective (e.g. motivation and confidence), cognitive (e.g. knowledge and understanding) and physical (e.g. motor skills) domains of physical literacy, which could identify other relevant assessment options for consideration in assessment discourses. Furthermore, since this review was published, a number of explicit assessments of physical literacy have been developed, such as the Passport for Life [58] and Physical Literacy Assessment for Youth [59], that warrant further consideration. It was outside of the scope of the Edwards et al. [42] review to consider the measurement properties (i.e. validity, reliability, trustworthiness) and feasibility of each assessment. We believe that providing researchers and teachers with information in a single point of reference on the theoretical development, measurement properties and feasibility of assessments of physical literacy and its elements within PE contexts will further empower them to make informed decisions on selecting an appropriate assessment. Such information could assist with the development of a bank of assessment resources and guide potential physical literacy assessment development in the field.
The aim of this study, therefore, is to systematically review the scientific literature for tools to assess physical literacy and its physical, cognitive and affective domains within children aged 7-11.9 years. We selected this age group as it represents the lower and upper ages for children within Key Stage 2 of the National Curriculum in England [37] with the aim of informing PE assessments within this block of education (i.e. school years 3 to 6). This paper will explore and critically discuss each assessment tool to appraise its (a) measurement properties, (b) physical literacy elements assessed and (c) feasibility for use within a primary school setting.

Methods
This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [60]. The protocol for this review was registered with PROSPERO (reference: CRD42017062217).

Inclusion Criteria
The full PICOS statement can be found in Additional file 1. Studies were included if they:
1. Sampled typically developing children with a reported mean age or age range between 7 and 11.9 years (including overweight and obese children and children from deprived areas).
2. Reported on a field-based assessment tool (i.e. not measured through laboratory methods) within PE or related contexts (such as physical activity, sport, active play, exercise or recreation) with an outcome relating to physical literacy (see the PICOS statement in Additional file 1 for the list of outcomes). Other contexts were considered in order to capture assessments that could be suitable for use in school settings and PE.
3. Used a cross-sectional, longitudinal or experimental study design.
4. Reported a measurement method (qualitative or quantitative) relevant to physical literacy and/or an element of physical literacy.
5. Reported information on measurement properties (quantitative assessments) or theoretical development (qualitative assessments).
6. Were published in English and in a peer-reviewed journal.

Exclusion Criteria
Studies identified through the literature search were excluded if they:
1. Included special populations (i.e. children with developmental coordination disorder or diagnosed with a learning difficulty).
2. Used a lab-based assessment.
3. Were book chapters, case studies, student dissertations, conference abstracts, review articles, meta-analyses, editorials, protocol papers or systematic reviews.
4. Did not have a full-text article available.

Information Sources
Relevant studies were identified by means of electronic searches on EBSCOhost and through scanning reference lists of included articles. The EBSCOhost platform supplied access to MEDLINE, PsycINFO, Scopus, Education Research Complete and SPORTDiscus databases. Each of the databases was searched independently. Publication date restrictions were not applied in any search with the final search conducted on 10th September 2020.

Search Strategy and Study Selection
Search strategies used in the databases included combinations of key search terms which were divided into four sections: tool (Assessment OR Measurement OR Test OR Tool OR Instrument OR Battery OR Method OR Psychometric OR Observation OR Indicator OR Evaluate OR Valid OR Reliable) AND context ("Physical Activity" OR "Physical Literacy" OR Play OR Sport OR "Physical Education" OR Exercise OR Recreation) AND population (Child OR Youth OR Adolescent OR Paediatric OR Schoolchild OR Boy OR Girl OR Preschool OR Juvenile OR Teenager) AND physical literacy elements (Motivation OR Enjoyment OR Confidence OR Self* OR "Perceived Competence" OR Affective OR Social OR Emotion* OR Attitude* OR Belief* OR Physical* OR Fitness OR Motor OR Movement* OR Skills* OR Technique* OR Mastery OR Ability* OR Coordination OR Performance OR "Perceptual Motor" OR Knowledge OR Understanding OR Value OR Cognition* OR Health OR Wellbeing*). Boolean searches were carried out using "AND" to combine concepts (tool, context, population, element) and narrow the search to only capture articles in which all relevant concepts appear (see Additional file 2 for an example search strand). Following the initial search, all records were exported to Covidence (Covidence systematic review software, Veritas Health Innovation) for screening (Covidence data/reports are available from the contact author upon reasonable request). Duplicates were removed using EndNote and the two lead authors (CS and HG) screened all titles and abstracts. Only articles published or accepted for publication in peer-reviewed journals were considered. A third author (LF) checked decisions on what to include based on the inclusion/exclusion criteria (i.e. age range, typically developing population, field-based assessment, study design, physical literacy element, measurement properties and peer-reviewed status) and any disagreements were resolved by discussion and collaboration with all authors. 
Full-text articles were further evaluated separately for relevance by the two lead authors (CS and HG) and labelled "yes", "no", or "maybe". The two reviewers conferred and, following discussion on any inconsistencies, agreement was reached on all articles. A third reviewer (LF) checked all of the studies that met the inclusion criteria and 10% of studies that were excluded to ensure accuracy in the study selection process. All decisions were made in closed meetings with no recorded minutes and are attributable to the authors. Where a manual was available for an assessment that met the inclusion criteria, these were accessed if the manual was freely available online or, alternatively, through contacting the study authors where possible.
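The Boolean structure of the search strategy described above (terms joined with OR within each concept, concepts joined with AND) can be illustrated with a minimal sketch. The function names are hypothetical, the term lists are abbreviated from the full lists given in the text, and the exact syntax accepted by each database may differ from the string produced here.

```python
# Minimal sketch of the four-concept Boolean search string.
# Function names are hypothetical; term lists are abbreviated for illustration.

def or_group(terms):
    """Join terms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Combine each concept's OR-group with AND so all concepts must appear."""
    return " AND ".join(or_group(terms) for terms in concepts.values())


concepts = {
    "tool": ["Assessment", "Measurement", "Test"],
    "context": ["Physical Activity", "Physical Literacy", "Play"],
    "population": ["Child", "Youth", "Adolescent"],
    "element": ["Motivation", "Enjoyment", "Confidence"],
}

query = build_query(concepts)
print(query)
```

Keeping each concept as a separate list makes it straightforward to widen one concept (for example, adding further physical literacy elements) without altering the others.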

Data Collection Processes
Due to the large number of studies included after full text screening, the studies were divided into explicit physical literacy assessments and related physical, affective, and cognitive domains in accordance with definitions and conceptualisations of physical literacy [1, 2,6,16,20,26]. This categorisation of assessments of elements into domains was undertaken in order to position assessments into familiar categories known to potential assessment users (e.g. coaches, researchers and teachers in physical literacy and physical education) and for ease of interpretation. The lead authors (CS physical and physical literacy; HG affective and cognitive) independently extracted individual study data relating to study information (authors, publication date, country and study design), sample description, purpose of study, the physical literacy element being assessed (as described by the study authors themselves), measurement technique (i.e. interviews, questionnaires, practical trial), outcome variables, measurement properties/theoretical development and utility information (reliability, validity, responsiveness and feasibility). Data extraction was checked for accuracy for the first three studies across each domain by a third reviewer (LF) and any inconsistencies were resolved following discussion with the lead authors.

Quality Appraisal
The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was used to evaluate the methodological rigour of assessments [61,62]. The COSMIN checklist has been developed by a team of international multidisciplinary researchers and is of a modular design, which enabled flexibility to suit the needs of the current systematic review. Using the COSMIN risk of bias checklist [61], each measurement property (content validity, construct validity, internal consistency, cross-cultural validity, test-retest reliability, intra-rater reliability, inter-rater reliability, criterion validity) was appraised for methodological quality and subsequently given a rating of "very good", "adequate", "doubtful", or "inadequate" or, if not reported, "NR". This 4-point rating scale and the "worst score counts" method were used throughout. Where the reporting of measurement properties received a rating of "very good", the validity and reliability of the tool can be appraised using established thresholds [63] (see Additional file 3). The lead authors (CS: physical and physical literacy assessments; HG: affective and cognitive assessments) independently appraised measurement properties; a third reviewer checked 10% of measurement quality ratings and threshold scoring for accuracy and any uncertainties were discussed and agreed upon in face-to-face meetings with all three reviewers (CS, HG, LF). The COSMIN guidelines were updated during the review process and new guidance regarding the importance of each measurement property was detailed [62]. According to the updated guidelines, if neither the original study, an associated paper nor the tool manual adequately describes the measurement development process and/or aspects of content validity, then the tool should not be appraised by researchers further in relation to wider measurement properties. 
We elected to follow the previous guidelines and made a conscious decision to appraise all the available measurement properties within all the eligible studies in order to be inclusive and present a detailed overview of what assessments are available. As qualitative assessments were also eligible for inclusion, the National Institute for Health and Care Excellence (NICE) Quality Appraisal Checklist for qualitative studies [64] was identified as a tool to appraise the methodological rigour of these assessments.
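The "worst score counts" principle referred to above can be sketched as follows: the overall methodological quality rating for a measurement property in a study is taken as the lowest rating given to any checklist item for that property. This is an illustrative sketch only; the function name and the example item ratings are hypothetical and are not taken from the included studies.

```python
# Illustrative sketch of the "worst score counts" rule on the 4-point scale.
# RATING_ORDER and worst_score are hypothetical names used for illustration.

RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]  # worst to best


def worst_score(item_ratings):
    """Return the lowest rating among the checklist items for one measurement
    property, or "NR" when the property was not reported (no items rated)."""
    if not item_ratings:
        return "NR"
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
overall = worst_score(["very good", "adequate", "very good"])
print(overall)
```

Under this rule a single poorly rated item determines the overall rating, which is why a study can score well on most design features yet still receive a low methodological quality rating.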
The feasibility of each assessment tool, including factors such as cost-efficiency (time, space, equipment, training and qualifications required) and acceptability (participant understanding, completed assessments), was appraised using a utility matrix developed from previous research [65,66] (see Table 1). Each dimension of feasibility was independently scored on a 1* (low feasibility) to 4* (high feasibility) scale using information reported within included studies and manuals. An overall feasibility utility matrix score was also calculated by summing the scores from each of the seven feasibility items to allow comparisons between assessments (maximum feasibility score = 28).
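The overall feasibility score described above is a simple sum of seven items, each scored from 1 to 4, giving a maximum of 28. A minimal sketch is shown below. The item labels are an assumption based on the factors listed in the text (the definitive items are those in Table 1), and the function name is hypothetical.

```python
# Minimal sketch of the overall feasibility utility matrix score.
# Item labels are assumed from the factors named in the text; see Table 1
# of the paper for the definitive items.

FEASIBILITY_ITEMS = (
    "time",
    "space",
    "equipment",
    "training",
    "qualifications",
    "participant_understanding",
    "completed_assessments",
)


def feasibility_total(scores):
    """Sum seven item scores (each 1 to 4) into an overall score out of 28."""
    missing = set(FEASIBILITY_ITEMS) - set(scores)
    if missing:
        raise ValueError(f"missing feasibility items: {sorted(missing)}")
    for item in FEASIBILITY_ITEMS:
        if not 1 <= scores[item] <= 4:
            raise ValueError(f"{item} must be scored from 1 to 4")
    return sum(scores[item] for item in FEASIBILITY_ITEMS)


# Hypothetical scores for one assessment tool.
example_scores = {item: 3 for item in FEASIBILITY_ITEMS}
example_scores["training"] = 2
print(feasibility_total(example_scores))
```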
A physical literacy element checklist was developed to highlight which aspects of physical literacy each assessment captured, as explicitly stated within the included studies and manuals. The checklist was developed by the research team through discussion in a closed meeting following an overview of international physical literacy literature [2] and utilised elements captured within various conceptualisations of physical literacy [1,20,26][67][68][69]. The definitions adopted internationally were collated and cross-referenced, identifying distinctive characteristics of physical literacy referred to in research and policy. This process resulted in a checklist that included 10 affective, 20 physical and 11 cognitive physical literacy elements (Table 2).
Each of the included studies was independently scored for feasibility and checked for physical literacy elements by the two lead authors (CS and HG). As above, tools were divided into domains and scored separately by the lead authors (CS: physical and physical literacy; HG: cognitive and affective). Each lead author (CS and HG) checked 10% of studies from the other lead author to ensure consistent methodological rigour of the feasibility and physical literacy element scoring. Any discrepancies were discussed and resolved in face-to-face meetings with the third reviewer (LF).

Assessment Characteristics
You can do it."), or to incite them to perform the skill/task, place a tick in the "Prompt" column.

Mimic: If the child waited for one of their peers to perform the skill first, place a tick in the "Mimic" column.
Describe: If the child asked the assessor to describe the skill/task, place a tick in the "Describe" column.
Demo: If the child asked the assessor to demonstrate the skill/task, place a tick in the "Demo" column.
Affective domain
AGSYS; USA; Cumming et al. [81]; N = 1675; NR (9-12, NR); Use the 2x2 achievement goal framework to assess goal approach orientations; 12 items related to mastery/ego x approach/avoidance goal framework
Physical Activity Self-efficacy, enjoyment, social support [87]; N = 230; 51% (9-10; 9.5 ± 0.7); Assess psychosocial variables as part of a 3-year randomised controlled trial aiming to prevent obesity through an after-school programme; 16 items: PA task self-efficacy (1 item), PA barriers self-efficacy (4 items), PA enjoyment (2 items

interests and intentions) and living skills (21 items relating to feelings, thinking and interacting skills). The fitness and movement assessments are scored by teachers using detailed rubrics that examine the technique and outcomes of the movements, with children placed into one of four categories: emerging, developing, acquired, accomplished.
CAPL-2 was developed for monitoring and surveillance of physical literacy [67]. The CAPL-2 protocol integrates the measurement of physical competence (PACER test, Plank [70] and the Canadian Agility and Movement Skills Assessment: CAMSA [71]), which is worth 30 points, motivation and confidence (30 points), daily physical activity behaviour as assessed by self-report and daily pedometer step count (30 points) and knowledge and understanding (10 points). The knowledge and understanding component includes four questionnaire items and a missing word paragraph activity. Scores from domains are summed to create a CAPL-2 total score out of 100, which is used to classify the children into one of four interpretative categories (beginning, progressing, achieving or excelling) based on age and sex-specific cut points.
Within the physical domain, assessments were typically administered within the gym hall or an onsite sports facility within the school setting (n = 15); only one tool (PARAGON) utilised an outdoor garden setting. 
Additionally, each physical tool utilised a form of product scoring (i.e. ALPHA, AST, BOT-2 SF, EUROFIT, FITNESSGRAM, MABC-2, MOBAK-3, MUGI, OP, SEBT, YBT), which focuses on the outcomes of the movements (e.g. distance jumped) or process scoring (i.e. GSPA, SS, TGMD-3), which focuses on the technical quality of the movement (e.g. arms extending upwards and outwards during jump). Assessments within the affective and cognitive domain were typically administered via a pen and paper or online questionnaire, with picture/photo support for some. All questionnaires used Likert scale rating systems or structured alternate response formats to score responses. One affective domain assessment, the RCS, consisted of the observation of a child's completion of a physical activity obstacle course, where observers were asked to score the child's self-regulation and response to challenge using a 7-point bipolar adjective scale [118]. The two assessments solely included within the cognitive domain were reported in intervention studies [163,164].
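The CAPL-2 total score described in this section is the sum of four domain scores with maxima of 30, 30, 30 and 10 points. The sketch below illustrates only that aggregation, under the assumption that each domain score has already been computed. The dictionary keys and function name are hypothetical, and the age- and sex-specific cut points used to assign interpretative categories are not reproduced because they are not given in the text.

```python
# Illustrative sketch of the CAPL-2 total score aggregation (out of 100).
# Keys and function name are hypothetical; category cut points are omitted
# because they are age- and sex-specific and are not stated in the text.

DOMAIN_MAX = {
    "physical_competence": 30,
    "motivation_confidence": 30,
    "daily_behaviour": 30,
    "knowledge_understanding": 10,
}


def capl2_total(domain_scores):
    """Validate each domain score against its maximum and return the sum."""
    total = 0.0
    for domain, maximum in DOMAIN_MAX.items():
        score = domain_scores[domain]
        if not 0 <= score <= maximum:
            raise ValueError(f"{domain} must be between 0 and {maximum}")
        total += score
    return total


# Hypothetical domain scores for one child.
child = {
    "physical_competence": 24,
    "motivation_confidence": 21.5,
    "daily_behaviour": 18,
    "knowledge_understanding": 7,
}
print(capl2_total(child))
```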

Physical Literacy Elements
Each tool within the review assessed an element of physical literacy (see Tables 4, 5

Table 7 shows the risk of bias scores (i.e. the methodological quality of the included studies for each measurement property). The data extracted from the studies in relation to validity and reliability can be found in Additional file 4 and Additional file 5, respectively. In general, evidence was limited with few studies reporting across the range of COSMIN measurement properties. Studies reporting the measurement properties of explicit physical literacy assessments tended to have higher methodological quality scores, with all three tools receiving ratings of "adequate" or "very good" for the measurement properties reported. Overall, CAPL-2 was assessed in the most robust methodological studies. CAPL-2 and PFL received quality scores of "very good" for content validity due to studies reporting methods which provided opportunities for experts and child participants to give feedback on the assessment, such as Delphi consultations and pilot testing. Construct validity was also well reported within research studies for the physical literacy tools, with all three assessments receiving a score of "very good" due to undertaking a confirmatory factor analysis within an adequate sample size and reporting an "acceptable fit" to the data provided. Although all explicit assessments included reliability information, only PLAYfun and PFL reported on internal consistency, while only CAPL-2 had good evidence for test-retest reliability. For PLAYfun, the specific physical subscales scores ranged from poor-to-good for internal consistency (α = 0.47-0.82), though only one of the subscales was below a good level (< 0.7). For PFL, ICC values ranged from 0.61 to 0.87 across subscales, indicating moderate to good internal consistency. CAPL-2 provided intra-rater reliability results for the plank hold (ICC = 0.83), skill score (ICC = 0.52) and completion time (ICC = 0.99). 
Inter-rater reliability was good for PLAYfun (ICC = 0.87), and moderate for CAPL-2 in the plank hold (ICC = 0.62) and skill score (ICC = 0.69) but excellent for completion time (ICC = 0.99), though the methodological quality of studies in this regard was only adequate. PLAYfun was the only tool to report information for criterion validity (methodological rigour scored as "very good"), with a moderate to large correlation between PLAYfun and the CAMSA (r = 0.47-0.60). CAPL-2 received a score of "very good" for cross-cultural validity, with Dania et al. [76] and Li et al. [77] reporting confirmatory factor analysis procedures that confirm the four-factor structure as a good fit within Greek and Chinese populations, respectively.
Within the affective domain, 87% of included studies provided detail surrounding content validity. This typically included reviews of the literature and contributions from an expert panel. A large number of the affective assessments were originally developed for adolescent or adult populations and were adapted for use with children. As a result, these studies received an "inadequate" rating for content validity. Only 36% of studies involved children in assessment development: ATCPE and CAPA used children to generate items while other studies involved children in pilot assessment or cognitive interviewing (AGSYS, ATOP, FHC-Q, MAAP, PACES, PAHFE, PASES, RCS, Self-efficacy Scale, TAGM). The majority of affective related studies reported construct validity (66%), which was commonly determined through confirmatory factor analysis, although the use of other methods and lower sample size downgraded the methodological quality of some of these studies for other tools (CPAS, PASE, PASES, TAGM). The studies of very good methodological quality generally reported that the factor analysis supported the proposed model structure (AGSYS, BREQ, CY-PSPP, DPAPI, NAS, PABM, PACES, PAHFE, PAS, PA self-efficacy enjoyment and social support scale, PLOC in PE, SPPC). 
Cross-cultural validity was reported for CAPA [90] and PASES [112], as both studies provided satisfactory evidence that no important differences were found between language versions in multiple group factor analysis. Only 31% of studies included within the affective domain reported information relating to reliability (AGSYS, ATCPE, CATPA, CY-PSPP, FHC-Q, LEAP, PAHFE, PASES, PA self-efficacy enjoyment social support, RCS). The majority of studies reported internal consistency (91%). With the exception of the DPAPI, all of the tools that did report internal consistency were considered of very good methodological quality as they presented Cronbach's alpha coefficient for each subscale. The Cronbach's alpha coefficients reported were generally > 0.7 and therefore deemed acceptable. Only one affective tool (LEAP) was assessed for test-retest reliability within a very good quality study. Median kappa agreement scores varied significantly from 0.22 to 0.74 by construct, ranging from fair to substantial agreement [102]. The RCS scored "inadequate" for construct validity, and "doubtful" for inter-rater reliability methodological quality. Within the physical domain, 13 tools (86%) reported information relating to content validity; however, no assessments received a score of "very good" for methodological quality. Despite the majority of tools utilising "widely recognised or well-justified methods" [61] (i.e. literature reviews, consulting experts, Delphi polls etc.), there was a lack of clarity regarding the implementation of these methods and how/if any findings were analysed. This included information concerning researcher involvement, data collection process, recording of consultations/meetings and who led the analysis of collected information. Nine tools had studies that reported construct validity, with studies of the MABC-2, MOBAK-3, SS and TGMD-3 displaying "very good" methodological rigour and reporting a good fit between each conceptual model and the provided data.
In addition, AST, MABC-2 and the TGMD-3 reported "very good" criterion validity protocols. Specifically, moderate correlations were reported between AST and the KTK (r = 0.47 to 0.50) and between TGMD-2 and MABC-2 (r = 0.30). Internal consistency was reported for 6 assessment tools (BOT-SF, FITNESSGRAM, MABC-2, MUGI, TGMD-3 and YBT), with only the MABC-2 and TGMD-3 receiving scores of "very good" methodological quality due to studies reporting the relevant statistics for each unidimensional scale. MABC-2 showed good reliability across three subscales (α = 0.78), alongside the standard scores on each subtest independently (manual dexterity: α = 0.77; ball skills: α = 0.52; balance: α = 0.77). Similarly, the TGMD-3 reported excellent internal consistency: locomotor skills α = 0.92; ball skills α = 0.89; and object control α = 0.92. Finally, the TGMD-3 had very good evidence for cross-cultural validity, with two studies using confirmatory factor analysis to indicate a good factor structure within Spanish and Brazilian populations [151,154].
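The internal consistency statistics reported above are Cronbach's alpha coefficients, with values > 0.7 deemed acceptable. As an illustration only, the standard formula for a single subscale can be sketched as below. The function name `cronbach_alpha`, the data layout and the example responses are assumptions made for this sketch and are not taken from any included study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale (illustrative sketch).

    responses: list of rows, one per child; each row holds that child's
    scores on the k items of the subscale (hypothetical layout).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across children
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each child's summed subscale score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses from five children to a three-item subscale (1-5 scale)
example = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}; above the 0.7 threshold: {alpha > 0.7}")
```

Computing alpha separately for each subscale, rather than once across all items, mirrors the COSMIN expectation noted above that the statistic is reported for each unidimensional scale.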

Measurement Properties
Both tools within the cognitive domain, BONES PAS [163] and PHKA [164], were developed as part of a wider intervention. In relation to the content validity of tool development, BONES PAS researchers reported the use of focus groups and literature reviews, while PE specialists were also consulted by the research team to identify common weight-bearing activities that children engage in on a regular basis. The authors noted that the need to quantify knowledge and understanding of weight-bearing physical activity was balanced against the cognitive limitations of children (i.e. short attention span, inability to accurately estimate time). No other details on validity were reported. Both tools (BONES PAS, PHKA) included in the cognitive domain reported test-retest reliability. However, methodological flaws resulted in "inadequate" scoring. BONES PAS was administered by trained research assistants once to each child on the same day, but only 1-2 h apart. PHKA readministered the questionnaire after a 2-week interval, however, ICC or weighted kappa was not reported. Neither tool within the cognitive domain reported details relating to other measurement properties and therefore these could not be appraised.

Feasibility
Table 8 provides the utility matrix ratings of each assessment (maximum score possible = 28). All of the explicit physical literacy assessments could be completed using the space and resources available in a typical primary school environment. CAPL-2 (feasibility score = 16), PLAYfun (14) and PFL (20) all provide a catalogue of resources online, which can be accessed and used by a class teacher (or any other engaged stakeholder) to prepare for, administer and score all portions of the assessment. PFL, designed for PE teachers, scored highly in qualification requirements, training and participant understanding. PLAYfun is, however, designed to be used by trained professionals (e.g. coach, physiotherapist, athletic therapist, exercise professional or recreation professional) and therefore was deemed less feasible for use by PE teachers in terms of qualifications required, though specific training for the aforementioned professionals is not required. Stearns et al. [79] reported that graduate assistants undertook 3 h of training for PLAYfun, suggesting good feasibility. PLAYfun also records child comprehension; as a result, it scored highly in relation to participant understanding. CAPL-2 scored best for training requirements and time out of the explicit physical literacy assessments. CAPL-2 is reported to be completed in approximately 30-40 min per individual (not including the pedometer assessment of daily PA behaviour across a week), with the knowledge questionnaire taking up to 20 min depending on the child. Teachers are encouraged to conduct the assessment components over separate days if this is more feasible for larger class sizes. Teachers reported that conducting PFL took between 2.5 and 6 classes [58], while four assessors completed PLAYfun assessments with 20 or fewer children in 3 h, evaluating each child individually in an isolated portion of the gymnasium (remaining students played supervised games and other assessments) [79].
Within the affective domain, the highest scoring feasibility tools were PACES (19), PAHFE (18), LEAP (16) and CAPA (16). Within the cognitive domain, BONES PAS scored 11, and PHKA 10, with neither assessment reporting information on time required to complete or training required to administer the questionnaire. Feasibility relating to space and equipment scored highly across the affective and cognitive domain as many of these assessments are pen and paper questionnaires that could be completed in a small space with equipment typically available in a primary school. Studies included within these domains often failed to report further details in relation to feasibility. Only 31% of cognitive and affective assessments had information in relation to the time needed to complete an assessment (ASK-KIDS and only 8% of assessments had information on the training required to administer these assessments (CAPA, CPAS, Physical Activity Self-efficacy enjoyment social support, RCS). BONES PAS was slightly higher scoring within the cognitive domain, primarily as the assessment scored highly for participant understanding, as children were involved in the development of the scale and statements. Manios et al. [164] reported little detail in relation to feasibility, simply stating the PHKA portion of their data collection "was completed in the presence of a member of the research team".
Within the physical domain, feasibility scores ranged from 9 (BOTMP-SF) to 17 (YBT, SEBT), with SS (15) also scoring highly. The feasibility findings highlight that assessments could typically be completed within the length of a school PE lesson (approximately 50 min). Specifically, 4 assessments (AST, GSPA, SEBT, YBT) reported taking less than 15 min to complete, with a further 3 tools (BOT-SF, MOBAK-3 and SS) requiring between 15 and 30 min. Additionally, the equipment needed to conduct assessments was scored positively for the majority of tools, as most required equipment would likely be present in a typical primary school setting, e.g. balls, cones, and skipping ropes. Some tools (40%) did require additional or specialised equipment (OP, GSPA, BOT-SF) such as sport-specific equipment (e.g. a junior-sized golf club [GSPA]), or equipment to measure specific elements such as manual dexterity (e.g. pegs and a pegboard [BOT-2 SF]). Furthermore, the majority of assessments (80%) required a PE/sports specialist or researcher to administer them, with only two tools (PARAGON and MUGI) being appraised as "Able to be administered by qualified teacher".

Discussion
The aim of this systematic review was to identify and appraise tools to assess physical literacy and related affective, physical and cognitive elements within children aged 7-11.9 years old for use in a primary school PE setting. From 88 studies, a total of 52 unique quantitative assessments were identified and subsequently examined for validity, reliability, feasibility and physical literacy elements being measured. In contrast to Edwards et al. [42], our search did not find any qualitative assessments of physical literacy within this age group. Only three explicit physical literacy assessments were represented in studies that met the inclusion criteria (CAPL, PFL, PLAYfun), though there were a number of assessments within affective (32 assessments) and physical (15 assessments) domains that could be used within a pragmatic physical literacy assessment approach. Far fewer assessments were found within the cognitive domain (two assessments). Our check for assessment of 41 different elements of physical literacy (10 affective, 20 physical and 11 cognitive), contained in various conceptualisations of the concept [1,20,26,67-69], highlighted elements that were consistently measured across tools and those not yet measured through existing assessments. Our analysis revealed that while some tools have established validity and reliability, and are feasible, the quality of reporting in studies concerning many measurement properties is mixed, indicating that more robust methodological work is required to support tool development. Nevertheless, taken together, the results suggest that there are a number of measurement options available to researchers and PE teachers to assess physical literacy and/or its affective, physical and cognitive domains that are feasible for administration within upper primary PE (7-11.9 years old in the UK).

Study Quality
To be included in this review, studies of quantitative assessments of physical literacy and related domains had to report data for at least one measurement property from the properties assessed using the COSMIN risk of bias checklist. Overall, the methodological quality of studies reporting this information was inconsistent. Studies tended to examine and report on one or two measurement properties (typically an aspect of reliability and/or validity), but rarely addressed all relevant measurement properties within the risk of bias checklist. Reliability was most frequently assessed across all domains, echoing the findings of recent reviews investigating motor skill assessments [167-169]. The majority of studies within the affective domain reported information related to internal consistency (i.e. the interrelatedness of items on a scale) and in the required level of detail (87% of studies receiving a score of "very good"). Similarly, within the cognitive and physical domains, 83% and 80% of assessments provided information relating to tool reliability, respectively. Physical domain assessments were more likely to report inter- and (to a lesser extent) intra-rater reliability due to the assessments being administered and scored by researchers or teachers, whereas cognitive and affective domain assessments typically employed questionnaire methods, and therefore, these reliability dimensions are not relevant. Though test-retest reliability was rarely reported, the wider reporting of other aspects of reliability (i.e. internal consistency, intra- and inter-rater reliability) may suggest that, to date, researchers in physical activity, exercise, sport and health fields have prioritised assessing and reporting the reliability of an assessment tool above other measurement properties.
Recent guidance from COSMIN outlines that tool development and content validity are the most important measurement properties to be considered for assessments [61,62]. We found that 43 tools reported information relating to content validity, however, only 5 tools (TGMD-3, FitnessGram, Self-Efficacy Scale, CAPL-2 and PFL) received a study quality score of "very good"; notably, two of these assessments (CAPL-2 and PFL) were developed specifically as physical literacy tools. This is particularly concerning as if researchers do not provide sufficient evidence that assessments are valid for use within the targeted population, then arguably the assessments are not appropriate for use [61,62]. COSMIN guidance states that in order to achieve a "very good" score for tool development/content validity, the relevance, comprehensiveness and comprehensibility of assessments should be considered in detail, i.e. "ensuring that included assessment items are relevant and understood by the target population" [61]. This can be achieved by tool developers including participants in the tool development process and encouraging the sharing of experiences and opinions regarding assessment. For tools that received an "inadequate" or "doubtful" score for tool development/content validity, the associated studies failed to provide adequate detail on concept elicitation, i.e. the methods used to identify relevant items and/or how these items were piloted and refined. It is unclear whether this information was not considered by study authors within the tool development process or whether it was just not reported. Our findings around the poor methodological quality of studies reflect those found within recent reviews of motor competence assessments [167,168]. 
Taken together, the mixed standards of reporting of information relating to measurement properties indicate that researchers should be encouraged to utilise the COSMIN checklist to improve the methodological quality of assessment development and the reporting of the measurement properties of assessments.

Explicit Physical Literacy Assessments
There have been significant efforts towards physical literacy in Canada for over a decade [12,44]. Each of the three explicit physical literacy assessments identified was developed by Canadian organisations who have embraced the concept. These include the Healthy Active Living Research Group's (HALO) Canadian Assessment of Physical Literacy (CAPL-2: see www.capl-eclp.ca/) [71,72], Canadian Sport for Life's Physical Literacy Assessment for Youth (PLAY, specifically PLAYfun, see https://play.physicalliteracy.ca/) [59], and Physical and Health Education Canada's Passport for Life (PFL, see https://passportforlife.ca/) [58]. These assessments are suitable for ages 8-12 years, 7+ years and 8-18 years, respectively, and supported by a wide range of online resources and training materials, including information and feedback guides for children, parents and teachers. Their stated purposes differ somewhat with CAPL-2 being developed for monitoring and surveillance of physical literacy in children, PFL for formative assessment in PE, and PLAYfun for programme evaluation and research in sport, health and recreation.
We found that CAPL-2 (affective, n = 4; physical, n = 11; cognitive, n = 3) and PFL (affective, n = 8; physical, n = 9; cognitive, n = 4) assessed more physical literacy elements noted within our checklists than the PLAYfun (affective, n = 1; physical, n = 5; cognitive, n = 1) assessment. These tools are anchored within somewhat different evolutions of physical literacy definitions, which may explain the different elements assessed. In 2015, many organisations across sport, health and education sectors in Canada joined together to generate the Canadian Physical Literacy Consensus Statement [7], which endorsed the IPLA/Whitehead definition of physical literacy [7,21]. As such, CAPL-2 assesses the elements stated within the IPLA definition using a points-based modular system with assessments of motivation and confidence (30 points), physical competence (30 points), knowledge and understanding (10 points), as well as physical activity behaviour (30 points), which can be aggregated to determine a physical literacy score out of 100. The remaining Canadian assessments (PFL, PLAYfun) more closely align with the previous definition put forward by Canadian Sport for Life and PHE Canada in accordance with Whitehead's earlier work [18]: "Individuals who are physically literate move with competence and confidence in a wide variety of physical activities in multiple environments that benefit the healthy development of the whole person". PFL has four distinct assessment domains that are intended to be viewed in isolation: movement skills, fitness, living skills (described as feeling and thinking skills), and active participation (diversity, interests and intentions). PLAYfun focuses on assessing movement competence in 18 tasks. The child's confidence and comprehension of each movement task can also be simultaneously assessed but are not accounted for in the scoring, indicating a hierarchy of focus on physical competence.
PLAY [59] includes a number of other assessment resources including PLAYparent, PLAYcoach, and PLAYself, with the latter being a self-report questionnaire for children that assesses affective elements, but, at the time of this review, no studies were found that reported measurement properties for the wider PLAY tools.
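The points-based modular scoring described above for CAPL-2 amounts to simple arithmetic: four domain scores, capped at 30, 30, 10 and 30 points, are summed to a total out of 100. A minimal sketch of that aggregation follows. The dictionary keys, the function name `capl2_total` and the example scores are hypothetical, and the sketch does not reproduce the official CAPL-2 scoring procedures (e.g. how each domain score is derived or how missing data are handled).

```python
# Maximum points per CAPL-2 domain, as stated in the text (keys are hypothetical)
CAPL2_DOMAIN_MAX = {
    "motivation_and_confidence": 30,
    "physical_competence": 30,
    "knowledge_and_understanding": 10,
    "physical_activity_behaviour": 30,
}


def capl2_total(domain_scores):
    """Sum the four domain scores into a total out of 100 (illustrative only)."""
    missing = set(CAPL2_DOMAIN_MAX) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    total = 0
    for domain, maximum in CAPL2_DOMAIN_MAX.items():
        score = domain_scores[domain]
        # Reject scores outside the range allowed for this domain
        if not 0 <= score <= maximum:
            raise ValueError(f"{domain} must be between 0 and {maximum}")
        total += score
    return total


# Invented scores for one child
child = {
    "motivation_and_confidence": 20,
    "physical_competence": 25,
    "knowledge_and_understanding": 7,
    "physical_activity_behaviour": 18,
}
print(capl2_total(child))  # 20 + 25 + 7 + 18 = 70
```

Because knowledge and understanding contributes at most 10 of the 100 points, the weighting itself illustrates how a modular total can encode a hierarchy between domains.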
Despite using variations of Whitehead's conceptualisations of physical literacy, these Canadian explicit physical literacy assessments appear to have distinct assessment hierarchies (i.e. prioritising one domain over another), strong yet different classifications (referring to what is being assessed and what is not, and within fixed chronological age ranges) and diverse scoring criteria [170]. The prioritising of one domain over another within an explicit physical literacy assessment is problematic as it is inconsistent with holistic perspectives that view all domains as equal [48]. Furthermore, while both CAPL-2 and PFL assess across affective, physical and cognitive elements of physical literacy, these are modular assessments, and thus, domains are assessed in isolation, reflective of more pragmatic approaches to physical literacy assessment [42]. Each tool uses self-reported questionnaires to capture affective, cognitive or behavioural domains of physical literacy, thus allowing the participant to portray their own capabilities. Yet assessments within the physical domain are primarily framed as teacher-led and assessed through process and product criteria interpreted against age- and sex-specific norms (CAPL-2), or detailed rubrics (PFL) and rating systems (PLAYfun) based on the quality of movement [170]. The latter provide a more individualised focus for the assessment and reduce comparisons with others, which some may consider more reflective of agreed conceptualisations of physical literacy [48]. PFL and PLAYfun tools show promise in capturing important aspects of physical literacy, but more validity, reliability and feasibility evidence is required. CAPL-2 demonstrated the strongest methodological quality of the three explicit physical literacy assessments, with good validity and reliability reported across several studies.
Furthermore, CAPL-2 is the only one of the three tools that has provided evidence of cross-cultural validity, supporting its potential use in other countries and cultures [76,77]. Accordingly, we suggest that CAPL-2 is currently the most robust explicit physical literacy assessment tool available to PE teachers and researchers to assess children aged 8 to 12. Of course, each explicit physical literacy assessment can be aimed at different purposes, so practitioners are encouraged to reflect on the most appropriate tool that fits their needs [54].

Assessments of the Affective Domain
The affective domain of physical literacy includes elements such as confidence, motivation, emotional regulation and resilience [1,20,26,67-69]. In total, we found 32 assessments within this domain (35 including CAPL-2, PFL and PLAYfun), with enjoyment being the most frequently assessed affective element (13 assessments), followed by motivation (11 assessments), confidence (10 assessments) and perceived competence (8 assessments). Enjoyment is not explicitly included in definitions of physical literacy [2], though Edwards et al. [1] did identify "engage, enthuse, enjoy" as a core category of physical literacy and "engagement and enjoyment" is listed as an element within the psychological domain of the Australian Physical Literacy Framework [16]. Previous research has linked enjoyment to intrinsic motivation and more autonomously regulated behaviour in relation to PE and PA [11,171,172], as well as meaningful experiences in PE [173]. The importance of enjoyment indicates that researchers and PE teachers may wish to consider the construct within a physical literacy assessment approach within PE. Further research and consensus are needed, however, on whether enjoyment should be a more prominent (i.e. core) element of physical literacy due to its relevance in fostering meaningful movement experiences, perhaps likened to the ongoing considerations concerning the inclusion of social and behavioural elements in relation to physical literacy [6,17,28].
Considering the explicit physical literacy assessment tools, PLAYfun records two affective elements (confidence and willingness to try new things), yet these do not contribute to the PLAYfun scoring (NB. PLAYself [59] does assess wider affective items, but no studies reporting measurement properties were located at the time of this review). CAPL-2 includes questionnaire items stated to assess confidence, intrinsic motivation, enjoyment, and perceived physical competence, though the confidence items more closely relate to perceived competence (e.g. "When it comes to playing active games, I think I'm pretty good") and adequacy (e.g. "Some kids are good at active games, Other kids find active games hard to play"), than to confidence or self-efficacy per se, which corresponds with capability beliefs about whether the movement or physical activity behaviour can be achieved [174,175]. The PFL questionnaire items assessed eight elements of the affective domain and PFL was therefore the most comprehensive; the only element it did not assess was the willingness to try new activities. As a result, and in consideration of the reported measurement quality, properties and feasibility, this could be an appropriate questionnaire-based method to assess the affective domain of physical literacy in this age group (7-11.9 years), though this questionnaire is lengthy (21 items) and would take longer for children to complete.
We identified 32 other tools that assessed affective related elements of physical literacy and could therefore be useful in a physical literacy measurement approach. Several of these tools reported good evidence for construct validity and internal consistency (AGSYS, BREQ, CY-PSPP, NAS, PASES, PAHFE, PAS, PASSEESS, PLOC in PE, SPPC), indicating that they were theoretically sound in their measured outcomes. Eight of these additional tools measured at least three affective elements in our checklist (ATCPE, BREQ, CPAS, HOP'N, MOSS, PABM, PASE, PASES). For example, the PABM (motivation, confidence and enjoyment, persistence), ATCPE (emotional regulation, enjoyment, self-esteem, perceived physical competence) and PASE (confidence, autonomy, self-esteem and perceived physical competence) each include items to assess four affective elements. There were 13 tools that only assessed one element: ATOP (emotional regulation), DPAPI (motivation), EnjoyPE (enjoyment), FAPM (emotional regulation), LEAP (enjoyment), MAAP (enjoyment), PAHFE (confidence), PLOC in PE (motivation), PMSC (motivation), RCS (emotional regulation), Self-efficacy scale (confidence), TAGM (motivation) and TEOSQ (motivation). While many affective measures were found, these individual elements are frequently assessed as multi-dimensional constructs and as such include a large number of questions/items per attribute. Thus, regardless of their feasibility, methodological quality and measurement properties, these tools only provide a narrow picture of the affective domain of physical literacy and would therefore need to be combined with other affective assessments if a more comprehensive assessment was sought by PE teachers or researchers.
The majority of the affective (and cognitive) assessments included within this review were questionnaire based. The systematic review by Edwards et al. [42] on physical literacy measurement identified a number of qualitative assessments including interviews, reflective diaries, and participant observation used amongst children under 12. These findings suggest that alternative methods are available, though these studies were not identified in the current review using our search terms and inclusion criteria. Although these qualitative assessment methods can be individualised, ipsative, holistic and thus aligning with idealist perspectives of physical literacy [48], these methods are perhaps not appropriate to effectively assess the affective/cognitive domains of physical literacy in children when used in isolation due to the (in)stability of children's thoughts and feelings [42]. Thus, regular observations of children would be important to chart progress in relation to an individual's attitudes, beliefs, emotions and understanding in relation to movement and physical activity. Yet the feasibility of time-poor primary school PE teachers undertaking these qualitative assessments with a class of approximately 30 children is unclear. Thus, more research is needed to develop rigorous qualitative methods that align with the stated definition adopted for physical literacy and its corresponding elements and are feasible for use in school contexts by primary school teachers.

Assessments of the Physical Domain
Physical competence is a fundamental component of physical literacy and as such is represented in every contemporary definition of the concept available [2,42]. Within the physical domain, there is some overlap between physical competence and common terminology used within well-established research fields, i.e. motor competence, motor control, motor proficiency, and health- and skill-related fitness [13-15]. This was further supported by the findings of this review as a high proportion of existing tools assessed fundamental movement skills (AST, BOT-2 SF, MABC-2, MOBAK-3, MUGI, OP, TGMD-3) and fitness components (ALPHA, EUROFIT, FITNESSGRAM). Similar to recent reviews on motor competence assessments [167,168], we found that the TGMD-3 [149-155,162] and MABC-2 [136-139] had the best methodological quality studies for measurement properties of the movement skill-specific assessments, while FITNESSGRAM [132-134,160] had the best methodological quality studies for the broader health- and skill-related fitness test batteries. All tools within the physical domain provided assessments for land-based movement skills, though we did not examine whether assessments were suitable for assessing the use of such skills within different terrains (e.g. rocky terrain, forest, sand). None of the tools assessed water-based activities, despite swimming being the only compulsory physical activity within the UK, Australian and American primary PE curriculums [37,176]. Similarly, through our search terms and inclusion criteria, we did not identify any assessments of cycling, which is an important foundational movement for physical activity across the lifespan [177], nor did we identify tools designed to explicitly assess the elements of aesthetic/expressive movement, sequencing, progression and application of movement specific to the environment. This could be a limitation of our search strand (e.g. we did not include dance as a search term, but did include "coordination" and "performance") or a consequence of the lack of assessments of these elements in this age group and/or associated studies not reporting information on measurement properties to meet the inclusion criteria. Given that the capability to move within different environments, regardless of weather, season, or terrain, will likely influence a child's safety and opportunities to be physically active, the appropriateness of land-based assessments to assess competence in moving across different terrains warrants further study. Similarly, the identification and appraisal or development of assessments of dance and foundational movement skills for lifelong physical activity, such as cycling and swimming, should be a focus for future research.
Of the self-titled physical literacy assessments, CAPL-2 explicitly assessed 11 elements within the physical domain and was the most comprehensive assessment in this regard; PFL assessed 9 elements, while PLAYfun assessed 5 elements. PLAYfun only assessed skill-related aspects of physical competence and did not include any measures of strength or endurance, which have been found to be important markers of health and functional living across the life course [178-180]. The assessments within the physical domain utilised either a form of product scoring (i.e. ALPHA, AST, BOT-2 SF, EUROFIT, FITNESSGRAM, MABC-2, MOBAK-3, MUGI, OP, SEBT, YBT), which focuses on the outcomes of the movements (e.g. distance jumped, time to completion), or process scoring (i.e. GSPA, SS, TGMD-3), which focuses on the technical quality of the movement (e.g. arms extending upwards and outwards during a jump). Some researchers have argued that the use of product-based scoring does not consider the quality of the movement and therefore potentially provides an opportunity for children to draw comparisons between peers, which they consider problematic as physical literacy is a concept concerned with the unique individual [42,48]. On the other hand, researchers advocating for nonlinear perspectives on movement competence argue that assessing the technical quality of movement is less important than the functional effectiveness of the movement, which can be achieved through a range of different movement solutions [181]. Moreover, product scoring requires less training and expertise than observing the quality of movement [182,183], and may therefore have a place in primary school assessment provided it is administered in an appropriate, non-competitive manner.

Assessments of the Cognitive Domain
For individuals to value and take responsibility for maintaining an active lifestyle, knowledge and understanding of the benefits of involvement in physical activity and of the nature of different activities and their particular challenges are important [20,184,185]. The cognitive domain checklist therefore included 11 elements related to the knowledge and understanding of factors related to physical activity [1,20,26,67-69]. We found two assessments that solely related to elements within the cognitive domain of physical literacy (BONES PAS, PHKA), though the methodological quality of these studies [163,164] was inadequate and therefore we do not recommend these tools for use at this time. Some cognitive aspects are also captured in the explicit physical literacy assessments (CAPL-2, PFL and PLAYfun). BONES PAS, PHKA, CAPL-2 and PFL included an assessment for knowledge and understanding of the benefits of PA, an element which is associated with improved PA behaviours [185] and a defining element within Whitehead's interpretation of the cognitive domain [21]. BONES PAS, CAPL-2 and PFL also assessed the importance of PA, while BONES PAS and CAPL-2 both assessed the effects of PA on the body. Considering these five tools together in relation to the cognitive domain, there remains a lack of assessments relating to the sub-elements of sedentary behaviour, safety considerations, reflection, creativity and imagination in application of movement, and knowledge and understanding of tactics, rules and strategy. The original CAPL assessment [67] did include items related to safety, activity preferences, and screen time guidelines, but they were removed from CAPL-2 following a Delphi survey with experts and because of their weak factor loadings onto higher order constructs [73]. Movement creativity is a perceptual ability that requires emotional regulation and critical thinking, with a high degree of knowledge and understanding required to achieve a task goal [186,187].
Assessing movement creativity could be an important outcome for PE teachers within a physical literacy assessment approach, as children who can create and modify movement actions within different physical activity environments can also identify opportunities to engage in physical activity [188]. Furthermore, knowledge of tactics, rules and strategy is likely to be an important outcome for the primary educational curriculum, wherein children are introduced to competitive games and sports and asked to apply basic principles of attacking and defending [189]. Thus, working with PE educators to establish assessments in this regard would be useful to chart developmental progress in cognitive domains of physical literacy.
The cognitive domain is the least frequently assessed domain of physical literacy in children aged 7–11.9 years old, and the least represented domain in the explicit physical literacy assessments. This is problematic for holistic considerations of physical literacy. Identifying stage-appropriate knowledge and understanding in relation to physical activity, and the subsequent assessment of this competency, and its relationship to physical activity behaviour, is an area for ongoing development. The development of the Physical Literacy Knowledge Questionnaire for children aged 8–12 years old in CAPL-2 by Longmuir et al. [73] followed robust methodological work. This included content analysis of the educational curriculum, contributions from expert advisors and the piloting of open-ended questions with children, to generate the closed-ended format. Again, it may be beneficial for physical literacy researchers to examine educational curriculums and explore other fields such as physical activity or health literacy, to identify what is stage-appropriate knowledge in this age group, and how this is assessed. Health literacy, defined as the ability of an individual to find, understand, appraise, remember and apply information to promote and maintain good health and wellbeing [190–192], includes similar core outcomes to physical literacy. Therefore, the potential links between health and physical literacy warrant further study [193]. Taken together, the cognitive domain is understudied and perhaps not widely understood. Therefore, more research is needed to identify and clarify the key cognitive elements that are important to the concept of physical literacy and enrich assessments of this domain.

Feasibility
Teachers have noted significant barriers to implementing assessment in PE [34, 35, 40, 46–48, 194]. Therefore, considering the feasibility of each physical literacy assessment tool in relation to a primary school context was an important aspect of this review. The results of this review suggest that many of the included assessments could be suitable for a primary school setting. The explicit physical literacy assessments (CAPL-2, PLAYfun, PFL) scored relatively high for feasibility, though PLAYfun required more qualified staff to administer the tool, suggesting that this tool may not be feasible for a generalist teacher. These explicit tools generally scored higher as a result of more comprehensive reporting of feasibility information within studies. This is likely because they have been designed with practitioners in mind, reflecting a growing demand for assessments within applied rather than research or clinical settings [66]. Both CAPL-2 and PFL assess affective, physical and cognitive elements of physical literacy but the assessment process can be lengthy in terms of time, with the assessment of large groups of children necessitating assessment activity to run across several classes. This indicates the feasibility challenges of using separate domain-level assessments of physical literacy to paint an overall "holistic" picture of a child's physical literacy.
Klingberg et al. [66] conducted a systematic review of the feasibility of motor skill assessments for preschool children and their findings revealed weak reporting of feasibility-related information. Similarly, we found that the quality of reporting of some aspects of feasibility information was lacking for many assessments. For example, a large number of affective and cognitive domain assessments did not report information on the training and qualifications required to administer and score the assessment, nor the time it would take for children to complete the assessment (see Table 8). Furthermore, across domains, only around a third of tools reported information on participant understanding of the assessments, which is particularly important if an assessment is to be used as assessment for learning, as feedback is a crucial part of the assessment process [195]. Affective and cognitive assessments were mostly questionnaires and therefore scored excellent for space and equipment required. Some of the physical assessments scored poorly for space requirements due to needing over 20 m of space for some aerobic or locomotor tasks (e.g. 20-m shuttle run in EUROFIT), which would not be possible indoors in a primary school within a UK context. Studies associated with assessment tools within the physical domain better reported the training and qualifications required to administer assessments, though most tools were rated as "fair" as they generally needed to be conducted by a PE/sports specialist, or a researcher with additional qualifications. Typically, physical domain assessments using product-based scoring, which focuses on quantifying the outcome of the movement (e.g. EUROFIT, MOBAK), scored slightly higher for feasibility in terms of expertise required than assessments that assessed the technical quality of the movement (e.g. TGMD-3).
Although not included within the matrix, the equipment costs of many of the assessments should not be a barrier to assessment and could easily be met within primary school budgets. Many of the assessments are freely available, while the cost of the resources for physical assessments, which require sports equipment, is typically under $1000 (e.g. full equipment kits cost $976 for MABC2, $300 for TGMD-3 and $260 for YBT).
Feasibility findings suggest that there is insufficient attention given to reporting the expertise, confidence and competence of individuals required to administer assessments, particularly in assessments within the affective and cognitive domains. Therefore, an effective assessment would need to consider who would be conducting it to determine any potential training needed; ultimately, this would be an influential factor in the overall cost of the assessment. Edwards et al. [42, 53] and Goss et al. [194] highlighted the need to support teachers with continuous professional development in order to ensure that pedagogical processes regarding assessment, teaching and learning were appropriate. Thus, assessments aimed towards educators should ensure that appropriate training and resources, designed at a level to be understood by generalist primary school teachers, are offered. This could include written guidance for how to administer questionnaires, model videos of how to score physical competence assessments [52, 194], and the creation of communities of practice to support the ongoing development of physical literacy assessment. While it may require additional resources to effectively prepare classroom teachers to administer assessments, enabling the teacher to conduct and interpret the results of a physical literacy assessment is particularly important as a classroom teacher will relate to and understand their pupils on a deeper level than that of a researcher [46].

Future Considerations in Physical Literacy Assessment
Goss et al. [194] recently examined stakeholder perceptions of physical literacy assessment in a qualitative study involving children, teachers, academics and practitioners. In the study, children themselves highlighted that assessment should be a fun and enjoyable experience. Participants across stakeholder groups indicated that being active, working with peers, providing optimal challenges, and positive teacher feedback would contribute to a fun assessment. Scholars have also argued that assessment in PE should be an enjoyable and motivating learning experience [195, 196], particularly given, as noted above, the importance of enjoyment for autonomous motivation and meaningful experiences in PE [171–173]. Therefore, whatever measure/assessment is used, researchers and practitioners should monitor children's acceptability, satisfaction, and enjoyment of the assessment process. This is important as poor experiences of assessment could generate negative memories of PE, which could have implications for lifelong enjoyment and motivation for physical activity [197, 198]. This review has identified a range of assessments of learning within physical literacy and related domains, yet it is unclear how these assessments help to support children's learning per se. Learning is a critical concept within physical literacy [1, 15, 20, 21, 26] and many teachers and educators would argue that assessment should be a learning experience [194–196]. Future research should therefore explore the learning potential of physical literacy assessments, for example in developing children's knowledge and understanding of movement and physical activity concepts. Moreover, researchers could evidence how an assessment helps children to chart and reflect on their own physical literacy journey, setting goals and optimal, realistic challenges [48].
Relatedly, more evidence is needed concerning if and how results from physical literacy assessments are returned to learners, as well as if and how learners utilise this feedback. In order for an assessment to inspire learning and have educational impact, participants should feel empowered [195, 199]. To achieve this, physical literacy assessment results could be discussed by teachers/researchers with each individual child and their parents, with constructive and encouraging feedback offered in terms of areas where the child is progressing well on their physical literacy journey and areas for development [39, 194, 195, 200, 201]. Therefore, assessment developers and manuals should include guidance on how to facilitate a meaningful discussion concerning progress with individual learners and key stakeholders. Future researchers could examine the subsequent implementation and effectiveness of these feedback guidelines by the assessment users.
Our findings suggest that there is scope for more research developing and examining rigorous qualitative methods of physical literacy assessment for use in primary school contexts. Such methods might include interviews, verbal discussions, pupil diaries, portfolios, photographs, video, text, drawing tasks and storytelling [42, 48, 202]. Given teacher time constraints [51, 52], future studies could also explore the development of self-assessment and reflective strategies and the use of technology [194]. Self-assessment aligns with the person-centred philosophy of physical literacy [48] and has been found to promote self-regulated learning and self-efficacy [203]. Self-assessment could also provide an opportunity for children to evaluate and reflect on their progress and help to develop their self-awareness of meaningful experiences [202], in turn empowering children to take ownership of their relationship with physical activity [48, 202]. Few of the assessments identified within our review utilised technology. Nevertheless, the importance and use of technology in PE assessment were highlighted within a recent position statement from the International Association for Physical Education in Higher Education (AIESEP) [204]. Technology has been used successfully within an assessment for learning process that enhanced knowledge and understanding [205] and has been shown to provide an engaging learning experience for students of all abilities [206]. Furthermore, technology can be used to support students to document their learning experiences and physical literacy journey through pictures and videos, which can be uploaded to mobile and web-based platforms and shared for discussion with wider stakeholders, including teachers and parents [52]. Thus, further research examining how technology can be used to support physical literacy assessment in PE is warranted.

Strengths and Limitations
The strengths of this systematic review include: (i) The use of wider search terms encompassing physical literacy elements identified 52 physical literacy or related affective, physical and cognitive assessments that can be used to inform assessment approaches in PE. (ii) An assessment of the methodological quality of included studies through the COSMIN risk of bias checklist enabled a robust, transparent and systematic appraisal of the validity and reliability standards of the identified quantitative assessments. (iii) The reporting of the feasibility of assessments provided pragmatic information that can be used by teachers, coaches and researchers to decide whether a tool is appropriate for use in PE and educational contexts.
The limitations of this systematic review include: (i) Only papers published in the English language were considered. Thus, the identified assessment tools were primarily derived from the US, the UK, Australia, Canada and Western Europe and relevant assessments developed within non-English language countries may have been missed. (ii) To be included in the review, articles had to be published in a peer-reviewed journal and written in the English language. Therefore, tools developed by practitioners and used currently within schools may not have been captured. (iii) Although we used "assessment" related search terms in our search strand, we did not capture any qualitative assessments of physical literacy. Had we used more specific qualitative methods as search terms (e.g. interviews, focus groups) then we might have captured more assessments better aligned with an idealist perception of assessment of physical literacy. (iv) The developed search strand did not include sport-specific search terms such as "swimming", "dance" and "gymnastics". Inclusion of these terms may have better captured water-based assessments and tools assessing elements such as rhythm, coordination and expressive/aesthetic movement. (v) The physical literacy elements checklist reflects commonly identified elements and was developed by the research team through discussion in a closed meeting after an overview of international physical literacy literature was conducted [1, 20, 26, 67–69]. Some elements identified within international definitions and various conceptualisations of the concept were not included in our checklist and therefore not checked for, but this should not diminish their respective importance. In addition, assessments of elements were categorised within physical, affective and cognitive domains in accordance with different definitions and conceptualisations of physical literacy in order to position assessments into familiar categories for assessment users [1, 2, 6, 16, 20, 26].
Arguably, many physical literacy elements and therefore assessments could span across different domains. For example, confidence is commonly classified within the affective domain within physical literacy conceptualisations, but confidence could also be classified within the cognitive domain as it is influenced by social-cognitive means [207]. Consequently, our checklist should not be taken as the definitive list of key elements within the concept. Researchers should check and appraise the tools for the elements in accordance with their stated definition of physical literacy. (vi) Each assessment tool was appraised for physical literacy elements in accordance with the explicit information provided within the associated studies and manuals. It is therefore possible that some tools may assess wider elements than those appraised within our results and this should be explored in future research.

Conclusions
There is demand amongst primary school children and wider stakeholders in England for assessments to chart progress in physical literacy [194]. This systematic review has identified three explicit physical literacy assessments and a number of assessments within affective and physical domains that could be used within a pragmatic physical literacy assessment approach. The review provides information that can help researchers and PE teachers understand what elements of physical literacy are being assessed and what elements are being missed. Our findings highlight that the methodological quality and reporting of measurement properties in the assessment literature require improvement. Furthermore, while many assessments are considered feasible within a school context, further empirical research is needed to consider the feasibility of the scoring and administration of assessment tools by teachers as opposed to researchers. Nevertheless, this review provides information that can be used by researchers and PE teachers to inform the selection or development of tools for the assessment of physical literacy within the 7-11.9-year-old age range.

Funding
All of the work included within this paper has been funded by Liverpool John Moores University. The funding body was not involved in the design of the study or collection, analysis and interpretation of data.

Availability of Data and Materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations
Ethics Approval and Consent to Participate
Not applicable.

Consent for Publication
Not applicable.