Performance Analysis Practices in Team Sports
As in any high-performance environment, successful performance in elite-level sport requires skilled functionality, such as working out ways to offload the ball in Rugby League [19], or ways of serving to various regions of a tennis court to exploit opponent positioning [20]. These sports-specific functional components are often referred to as ‘technical skills’ [21] and are typically captured by performance analysts to help coaches understand various aspects of game play as it unfolds. For example, capturing and analysing information related to how a player obtains and then disposes of the ball in Australian football (AF) can assist coaches with the design of training activities intended to promote the development of offensive behaviour [13, 22]. Further, application of similar notational analyses at a team level could lead to information that resolves collective behaviour (manifest in styles or common patterns of play), which can be modelled relative to outcomes like match success. This is noted in the work of Lago-Peñas et al. [23], who identified five factors (i.e. groups of performance indicators) that explained various styles of play across an elite soccer competition, information which they argued could be strategically used by coaches to counter an opposition. As we now go on to discuss, an integral component of the analysis used by Lago-Peñas et al. [23] was data reduction and clustering, whereby large multidimensional datasets were reduced to factors and clustered based on their similarity, allowing practitioners to make decisions with reference to a select few (important) variables [8, 24, 25].
Data Reduction and Clustering
While sports technology has unquestionably assisted performance analysts [3], it has resulted in a large quantity of data to be filtered, analysed, and reported in actionable ways [15, 26]. This has likely led to uncertainty with regard to variable selection—defined as which variables (or groups of variables) are important in supporting practitioners in making decisions guided by sports performance data [15, 27]. In light of this, performance analysts have sought to apply various data reduction techniques—common to other quantitative disciplines [27,28,29]—to hone in on (combinations of) performance indicators most important for explaining an outcome of interest [15, 16, 30]. In its broadest sense, data reduction is a process by which large—often multidimensional—datasets can be reduced into smaller, more manageable sets, while ensuring the integrity of the data is not compromised [26]. In high-performance sport where the quantity of data is expanding given the automation of various sports technologies, such reduction techniques can be vitally important.
While there are a variety of data reduction techniques, two of the more common in team sports, like Rugby League, are principal component analysis and multidimensional scaling [25, 31, 32]. Both principal component analysis and multidimensional scaling produce a series of factors which represent groups of similar variables [33,34,35]. These techniques differ, though, with respect to the processes involved in the creation of these factors. For example, principal component analysis resolves linear, uncorrelated combinations of variables; the eigenvalue associated with each component, a scaling factor reflecting the variance that component explains, informs how many principal components (factors) are retained [26, 33, 36]. Conversely, multidimensional scaling relies on nonparametric regression to determine a dissimilarity ranking matrix to produce a series of dimensions, iteratively searching for a least squares fit based on the rank order of the dissimilarities [25, 34, 37]. The dimensions derived from the rank order of dissimilarities in multidimensional scaling, or the factors obtained via principal component analysis, can then be used to explain various aspects of performance, such as what performance indicators are important for winning a match of Rugby League [8, 31, 38].
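The principal component step described above can be sketched as follows. This is a minimal illustration rather than the procedure used in any of the cited studies: it assumes NumPy is available, the function name `principal_components` is hypothetical, and the match data are invented purely for demonstration. Indicators are standardised first, so the eigenvalues come from the correlation matrix.

```python
import numpy as np

def principal_components(data):
    """Return eigenvalues (descending), component vectors, and component scores."""
    X = np.asarray(data, dtype=float)
    # Standardise each indicator so differences in units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance of standardised data equals the correlation matrix.
    R = np.cov(Z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Project each match onto the components to obtain uncorrelated scores.
    scores = Z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Hypothetical data (invented values): one row per match; columns are
# metres gained, line breaks, missed tackles, and possession (%).
matches = [
    [1450, 5, 28, 52],
    [1620, 7, 22, 55],
    [1300, 3, 35, 47],
    [1710, 8, 19, 58],
    [1380, 6, 31, 49],
    [1550, 4, 25, 53],
]

eigenvalues, components, scores = principal_components(matches)
# Share of total variance explained by each component.
explained = eigenvalues / eigenvalues.sum()
```

One common rule of thumb is to retain components whose eigenvalue exceeds 1, although the appropriate retention criterion depends on the question being asked and should be checked against the source literature.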
But how (or why) might we choose to use one technique over another? The key characteristics of each of these analyses are important to consider prior to selecting and utilising one over the other. To exemplify, because principal component analysis assumes a linear relationship between the data and the latent variables represented as factors, it may struggle to appropriately represent the distance measures between factors when applied to a nonlinear dataset. Multidimensional scaling, on the other hand, does not assume linearity and strives only to optimise the fit between the dissimilarity of objects and the rank order of dissimilarities. Thus, understanding dataset properties is an important initial step for sports performance analysts in determining which technique is most appropriate for reducing its multidimensionality.
The use of these data reduction techniques has grown within Rugby League research. Notably, Woods et al. [25] highlighted the utility of multidimensional scaling for explaining the evolution of game play within the Australian National Rugby League over an 11-year period. These authors reduced a multidimensional dataset (a dataset containing multiple different variables), visualising the ranked dissimilarities to show how the game evolved in a ‘follow-the-leader’ type manner (whereby the competition leaders evoke a successful style of play which other teams try to emulate in order to similarly succeed), postulating how coaches could use such insights to develop innovative styles or principles of play ‘beyond their time’. Comparatively, Parmar et al. [8] highlighted the utility of principal component analysis for the analysis of team performance in the European Super League. These authors identified that ‘making quick ground’, ‘quick play’, and ‘amount of possession’ were the most important factors for explaining match outcome [8]. Similarly, Wedding et al. [38] explored the use of principal component analysis for team performance analysis in the National Rugby League, identifying nine factors (six attacking, two defensive, and one contested) which could explain team playing styles relative to season and end of season rank, thereby uncovering important characteristics for consideration in the design and implementation of game planning. Research in other sports such as soccer [23, 24], basketball [37, 39], and AF [40] has further exemplified the use of principal component analysis and multidimensional scaling in identifying the performance characteristics most explanatory of team performance variance and playing style over varying time periods. Each of these studies demonstrates the value of data reduction in producing smaller, actionable subsets of data that maintain their underlying integrity.
A further example of the utility of such a technique for servicing operational practices in Rugby League will be presented in the first Case Example, which is discussed in the second part of this review.
Clustering is another data reduction technique that is growing in popularity in sports performance analytics [41, 42]. A specific clustering technique discussed here is two-step clustering, a technique which reveals ‘natural’ clusters (or groupings) within a dataset using log-likelihood distance measures [5, 41, 43]. The utility of clustering for explaining phenomena in sport, like match outcome, has been exemplified by Gomez et al. [44], who grouped the performance of wheelchair basketball teams based on different match types (defined through score lines of ‘unbalanced’ or ‘balanced’). In being able to successfully cluster teams according to score lines, these authors demonstrated the use of this technique for reducing and visualising data into meaningful groups, which they argued was information important in supporting coaches to design game and practice strategies [44]. Further, Zhang et al. [5] utilised two-step clustering to identify five different player profiles of professional basketballers using anthropometric, technical, and physical variables, thereby supporting recruitment and talent selection. As an important aside, this study demonstrated the use of two-step clustering for handling data of variable properties (i.e. categorical and continuous), which is particularly critical for high-performance sport given the diverse sources of data often available to performance analysts [41, 45]. The use of two-step clustering for examining positional performance in Rugby League has been exemplified by Wedding et al. [32], who identified six positional groups (as compared to four a priori), enabling the establishment of player performance profiles for performance assessment, player development, and recruitment.
Whilst only a snapshot of the available work, these studies do highlight the benefit of various data reduction and clustering techniques for sports performance analysts in high-performance environments. Nonetheless, to further guide developing performance analysts in adopting these data reduction techniques, the second part of this narrative review weaves in a case example demonstrating their use in practice. Before this, however, we next explore the use of decision support analysis (specifically decision trees) for sports performance analysts—showing how such a technique can support coaches and other practitioners in understanding the (nonlinear) interaction between variables, and how these interactions relate with various outcomes of practical interest.
Decision Support Analysis
Indeed, data reduction and clustering analyses are some of many increasingly adopted methods for understanding what ‘successful’ performances look like in high-performance sport [5, 8, 24]. However, to support coaches in modifying targeted features of a game style to increase the probability of attaining a successful outcome, decision support analyses can be useful. Broadly, decision support analysis can support a practitioner by sifting through large quantities of data to identify underlying interactions and their conditional control statements, with this information being used to ascertain the probabilities of certain outcomes occurring [17, 46, 47]. The probabilities of these outcomes occurring can be visually represented in various forms, like decision trees, which can be easily interpreted and presented to coaching staff [31, 48]—guiding, challenging, or informing decision making [31, 49].
Decision trees are an increasingly used decision support analysis in sports performance analytics [15, 50, 51]. As the name implies, decision trees are models of decisions grown from a root or parent node, which iteratively grow branches that visualise the interaction between key variables and their conditional statements, explaining the probability of a certain outcome [51]. There are two primary types of decision trees: classification and regression [52,53,54]. Whilst there are some similarities between them (namely that neither requires data normalisation), there are some key differences related to how the data are differentiated, grown or split during the analysis [52, 54]. Specifically, these differences relate to the underlying growth algorithm of the tree [52,53,54], meaning that while decision trees can be a useful tool for analysts given their capability to visualise complex, nonlinear interactions between variables, it is important to understand which type is appropriate based upon the question asked and data used to grow the model [51, 52]. For example, if wanting to explain a binary variable of interest (e.g. win or loss, or home or away), a CART (classification and regression tree) method may be appropriate. Fernandes and colleagues [48] exemplified the use of CART as a method for explaining the likelihood of a passing or rushing play occurring at any point during a National Football League game. On the other hand, if seeking to explain a non-binary outcome, a CHAID (chi-squared automatic interaction detection) algorithm may be appropriate given that it utilises multi-way splits, which could be used to identify multiple styles or phases of play [31]. Not only does the number of splits that may occur from any given node differ depending on which model is chosen, but so too does the way in which the model decides how to make these splits and when it decides to stop splitting [51,52,53].
Thus, understanding which tree to use is an important initial step for sports performance analysts—being implicated by the question seeking to be answered and the data used to answer it.
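To make the idea of a conditional split concrete, the sketch below finds a single binary split on one continuous performance indicator by minimising weighted Gini impurity, which is one splitting criterion commonly associated with CART. It is an illustrative assumption rather than the algorithm of any cited study: the function names and the line-break data are hypothetical, only the root split is shown, and the chi-squared, multi-way splitting used by CHAID is not covered.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, parent impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = (low + high) / 2, impurity
    return best_threshold, best_impurity

# Hypothetical data (invented values): line breaks per match and match outcome.
line_breaks = [2, 3, 4, 5, 6, 7, 8, 9]
outcomes = ["loss", "loss", "loss", "win", "win", "loss", "win", "win"]

threshold, impurity = best_split(line_breaks, outcomes)
```

A full tree repeats this search within each resulting branch until a stopping rule is met, which is where the choice of algorithm and its stopping criteria begins to matter.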
In team sports, decision trees have shown capability to explain complex interactions of performance indicators that contribute to match outcome in Australian football [9, 55], Rugby League [31, 45, 56], basketball [57, 58], and soccer [49]. Further, decision trees have been used to identify performance gaps between competition levels, with such information being critical to support talent development in sports like Rugby League [56, 59, 60]. Beyond team performance, decision support analysis has been used to explain player and playing position behaviours within team sports [5, 42, 61], with Morgan et al. [62] highlighting that attackers held a distinct advantage in one-on-one situations in hockey when moving at velocities ≥ 0.5 m·s−1. However, in instances where the initial speed differential between attackers and defenders was small (< 0.5 m·s−1), the attackers’ probability of winning the encounter could improve if defenders held a lateral speed > 1.4 m·s−1 [62]. This level of detail clearly supports practitioners and athletes in the design of practice tasks and establishment of various strategies intended to exploit opponents and gain a competitive advantage when coupled with their experiential knowledge. Thus, decision support analyses, like decision trees, are useful in high-performance sport, particularly regarding the identification of team performance indicators and their conditional control statements that lead to increased chances of attaining match success [9, 49, 57].
Successful application of these techniques could offer practitioners another way of analysing and visualising various interactions of key variables during a match—further supporting decisions around training and game-planning strategies. The case example detailed in the second part of this review exemplifies the practical utility of decision support analysis for the resolution of important team playing styles relative to playing at home or away within Rugby League. Prior to this, though, we next explore the use of logistic regression for sports performance analysts—highlighting how this technique could be implemented as another method to support coaches in understanding interactions that could exist within the various training and match data.
Logistic Regression
So far, this review has examined the efficacy of data reduction, clustering, and decision support analysis for the exploration of important technical and tactical characteristics in high-performance sport. Logistic regression is a technique used to model the probability of a dichotomous event (e.g. win or loss) occurring whilst accounting for one or more independent variables that influence the event [8, 58, 63]. There are many benefits of implementing this analytical technique, one being that it provides the size and direction of the relationship for each of the independent variables modelled [63]. Further, logistic regression can handle both continuous (e.g. height, speed, time) and categorical (e.g. win or loss and home or away) independent variables, enabling the integration of larger, diverse datasets, which are common in elite-level sport [63]. However, like many of the other methods described in this review, it does require nuanced interpretation. Additionally, logistic regression models are best suited to large datasets, as this reduces the likelihood of modelling error through overfitting [63].
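As a minimal sketch of how such a model yields both a probability and the size and direction of each relationship, the example below fits a logistic regression by plain gradient descent using only the Python standard library. Everything in it is an assumption made for illustration: the function names, the learning rate and iteration count, and the invented data (a standardised possession differential and a home or away indicator). In practice an established statistical package would normally be used instead.

```python
import math

def sigmoid(z):
    """Map a linear predictor onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Probability of the event (e.g. a win); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def fit_logistic(rows, outcomes, learning_rate=0.1, n_iter=5000):
    """Fit coefficients by gradient descent on the mean log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(n_iter):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            error = predict(weights, x) - y
            gradient[0] += error
            for j, xi in enumerate(x):
                gradient[j + 1] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

# Hypothetical data (invented values): [possession differential, home (1) or away (0)].
rows = [[-1.5, 0], [-1.0, 1], [-0.5, 0], [-0.2, 1], [0.1, 0],
        [0.4, 1], [0.8, 0], [1.2, 1], [-0.8, 1], [0.6, 0]]
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]  # 1 = win, 0 = loss

weights = fit_logistic(rows, outcomes)
# Exponentiating a coefficient gives an odds ratio: values above 1 indicate
# that the odds of winning rise as the predictor increases.
possession_odds_ratio = math.exp(weights[1])
home_win_probability = predict(weights, [1.0, 1])
```

The sign of each coefficient gives the direction of the relationship and its odds ratio gives the size, which is the interpretive step that calls for the nuance noted above, particularly with a sample as small as this one.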
Demonstrating its utility in high-performance sport, Gollan et al. [64] modelled the interactions between different playing styles and match contexts (match location, opposition quality, and combined effects of both) in the English Premier League. The authors identified that irrespective of match location (home or away), teams were more likely to demonstrate an established offence and set pieces when they encountered weaker opposition [64]. Conversely, weaker opposition were less likely to play this same style when competing against their stronger counterparts—emphasising the importance of understanding the tendencies of opposing teams, such that effective game plans can be designed to counter them [64]. Similarly, Parmar et al. [8] highlighted the ability of logistic regression to model the probability of team success within Rugby League using performance indicators clustered via principal component analysis. Their results noted a 91% probability of winning if a team was able to outperform their opponent in a series of grouped performance indicators. Practically, presenting such information to coaches could support the development of match strategies that attempt to exploit the styles of play most likely leading to a win. Interestingly, logistic regression has also been used to guide training planning and periodisation by modelling the difficulty of teams’ playing schedule across the course of a competitive season in rugby union [65, 66], while Woods et al. [67] demonstrated its utility for talent identification in junior Australian football—modelling the relationship between performance in various skill tests and team association. 
Thus, collectively, such work demonstrates the diverse use of logistic regression in the sports performance analysis literature, ranging from modelling styles of play and supporting the planning and periodisation of practice to assisting with talent identification. Although drawn from different sports, each of these themes is important in professional Rugby League, and each is a topic that a developing sports performance analyst can assist with. In reference to this, the next section of this paper exemplifies each of these techniques as applied to key questions in Rugby League. It is hoped that these examples can offer aspiring and developing performance analysts working in Rugby League (or other sports) guidance when seeking to resolve similar questions and analyses.