
Table 2 Descriptions of AI techniques or methods

From: Current Approaches to the Use of Artificial Intelligence for Injury Risk Assessment and Performance Prediction in Team Sports: a Systematic Review

AI technique or method

Description

Least absolute shrinkage and selection operator

In statistics and machine learning, the least absolute shrinkage and selection operator (LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. It was originally introduced in the geophysics literature in 1986 and was independently rediscovered and popularized by Robert Tibshirani in 1996, who coined the term and provided further insights into its performance. LASSO was originally formulated for least squares models. This simple case reveals a great deal about the behavior of the estimator, including its relationship to ridge regression and best subset selection, as well as the connection between LASSO coefficient estimates and so-called soft thresholding. It also reveals that (as in standard linear regression) the coefficient estimates need not be unique if covariates are collinear. Although originally defined for least squares, LASSO regularization extends straightforwardly to a wide variety of statistical models, including generalized linear models, generalized estimating equations, proportional hazards models, and M-estimators.
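As an illustrative sketch (not drawn from the reviewed studies), the least squares case can be fit with scikit-learn's Lasso; the synthetic data and the alpha value below are assumptions chosen to make the variable selection visible:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only 3 of 10 features carry signal (an assumption
# made for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

# alpha controls the strength of the L1 penalty; larger alpha shrinks
# more coefficients exactly to zero (variable selection).
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # most uninformative coefficients are driven to 0
```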

Artificial neural network

An artificial neural network (ANN) is a computational model based on the structure and functions of biological neural networks. Information that flows through the network affects the structure of the ANN, because a neural network changes (or learns, in a sense) based on inputs and outputs. ANNs are considered nonlinear statistical data-modeling tools with which complex relationships between inputs and outputs are modeled or patterns are found.
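A minimal sketch of such a nonlinear model, assuming scikit-learn's MLPClassifier and an illustrative two-class data set:

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# A small feed-forward network learning a nonlinear decision boundary
# that a linear model could not capture.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X, y)
print(f"training accuracy: {net.score(X, y):.2f}")
```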

Bayesian logistic

In statistics, the logistic model (or logit model) is a widely used statistical model that, in its basic form, uses a logistic function to model a binary dependent variable; many more complex extensions exist. In regression analysis, logistic regression (or logit regression) means estimating the parameters of a logistic model; it is a form of binomial regression. Logistic regression is used for predicting the outcome of a categorical dependent variable (a variable that can take on a limited number of categories) based on one or more predictor variables. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory variables, using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and one or more (usually continuous) independent variables by converting the dependent variable to probability scores. Logistic regression can be binomial or multinomial, and the model parameters can be estimated by different approaches, including the Bayesian method.
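A minimal sketch of the Bayesian approach, here using a hand-rolled random-walk Metropolis sampler over the logistic model's coefficients; the prior, step size, and data are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_beta = np.array([1.5, -2.0])
y = rng.random(100) < 1 / (1 + np.exp(-X @ true_beta))  # simulated labels

def log_posterior(beta):
    logits = X @ beta
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))
    log_prior = -0.5 * beta @ beta  # standard normal prior (assumption)
    return log_lik + log_prior

# Random-walk Metropolis: propose a nearby beta, accept with
# probability min(1, posterior ratio).
beta, samples = np.zeros(2), []
for _ in range(5000):
    proposal = beta + rng.normal(scale=0.2, size=2)
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(beta):
        beta = proposal
    samples.append(beta)
print(np.mean(samples[1000:], axis=0))  # posterior mean, near true_beta
```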

Bayesian networks

Bayesian networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations. Bayesian networks aim to model conditional dependence, and therefore causation, by representing conditional dependencies as edges in a directed graph. Through these relationships, one can efficiently conduct inference over the random variables in the graph by using factors.
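A minimal sketch of inference by enumeration in a two-node network (Rain -> WetGrass); all probabilities are illustrative assumptions:

```python
# Conditional probability tables for the two-node graph Rain -> WetGrass.
p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}

# P(Rain | WetGrass) via Bayes' rule: form the two joint factors that are
# consistent with the evidence, then normalize.
joint_rain_wet = p_rain * p_wet_given_rain[True]
joint_norain_wet = (1 - p_rain) * p_wet_given_rain[False]
p_rain_given_wet = joint_rain_wet / (joint_rain_wet + joint_norain_wet)
print(f"P(Rain | WetGrass) = {p_rain_given_wet:.3f}")  # ~= 0.692
```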

Decision tree classifier

A decision tree classifier (DTC) is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal, but they are also a popular tool in machine learning. Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models in which the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
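A minimal sketch, assuming scikit-learn's DecisionTreeClassifier on the iris data; max_depth is an illustrative pre-pruning choice that keeps the printed rules short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# A shallow classification tree: branches are feature tests,
# leaves are class labels.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))  # the learned if/else rules, one line per node
```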

Fuzzy clustering

Fuzzy clustering is an alternative to conventional (hard) clustering algorithms, which partition data into groups of similar subjects. Fuzzy clustering contrasts with hard clustering in its flexibility when grouping large data sets: instead of assigning each subject to exactly one group, it allows partial membership in several, yielding partitions that are closer to the underlying structure of the data and offering more candidate solutions for decision-making. Computationally, fuzzy clustering has its roots in fuzzy logic and expresses the likelihood, or degree, with which one data point belongs to more than one group.
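A minimal fuzzy c-means sketch in plain NumPy (the function name and parameter choices are illustrative, not a library API):

```python
import numpy as np

# Fuzzy c-means: m > 1 is the "fuzzifier" controlling how soft the
# memberships are; each point gets a degree of membership per cluster.
def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    u = np.random.default_rng(seed).random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        u = 1.0 / d ** (2 / (m - 1))
        u /= u.sum(axis=1, keepdims=True)      # renormalize the degrees
    return centers, u

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, u = fuzzy_c_means(X)
print(centers)  # one centroid near (0, 0), one near (5, 5)
print(u[0])     # degrees of membership of the first point in each cluster
```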

K-means clustering

K-means clustering is a type of unsupervised learning, used when there is unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of the K groups based on the features provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are the centroids of the K clusters, which can be used to label new data, and labels for the training data (each data point is assigned to a single cluster). Rather than defining groups before looking at the data, clustering allows finding and analyzing the groups that have formed organically. Each centroid of a cluster is a collection of feature values which define the resulting groups. Examining the centroid feature weights can be used to qualitatively interpret what kind of group each cluster represents.
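A minimal sketch, assuming scikit-learn's KMeans on synthetic two-group data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two organic groups in 2-D; K = 2 is chosen to match them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)        # the K centroids described above
print(km.predict([[4.8, 5.1]]))  # centroids can be used to label new data
```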

K-nearest neighbor

K-nearest neighbor (k-NN) is a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. For both classification and regression, it can be useful to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, although no explicit training step is required. A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm is not to be confused with k-means, another popular machine learning technique.
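A minimal sketch, assuming scikit-learn's KNeighborsClassifier; weights="distance" corresponds to the 1/d weighting described above:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Lazy learning: fit() only stores the data; all work happens at predict time.
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
print(knn.predict(X[:3]))  # each point is classified by its 5 nearest neighbors
```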

Markov process

A Markov process is a random process indexed by time with the property that, given the present, the future is independent of the past. Markov processes, named for Andrei Markov, are among the most important of all random processes. In a sense, they are the stochastic analogs of differential equations and recurrence relations, which are certainly among the most important deterministic processes.
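A minimal sketch of a two-state Markov chain simulation in NumPy; the transition probabilities are illustrative assumptions:

```python
import numpy as np

# Transition matrix: rows sum to 1. The next state depends only on the
# current state, never on earlier history (the Markov property).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
rng = np.random.default_rng(0)
state, path = 0, [0]
for _ in range(10):
    state = rng.choice(2, p=P[state])  # sample the next state
    path.append(state)
print(path)
```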

Support vector machine

A support vector machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. In a two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on one side. In linear SVM, the hyperplane is learned by transforming the problem using some linear algebra; this is where the kernel plays its role.
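A minimal sketch, assuming scikit-learn's SVC with a linear kernel on two separable blobs:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# In 2-D, the learned hyperplane is a line; only the support vectors
# determine where it lies.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear").fit(X, y)
print(svm.coef_, svm.intercept_)   # parameters of the separating line
print(len(svm.support_vectors_))   # how many points define the margin
```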

Support vector machine + decision tree classifier

The idea of combining SVM and DTC is to provide a hybrid approach that embeds SVM within a decision tree algorithm as a pre-pruning method, resulting in a more accurate and efficient hybrid classifier.
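One possible realization of such a hybrid, sketched under assumptions (a shallow pre-pruned tree whose impure leaves delegate to per-leaf SVMs); this is an illustration, not the exact method used in the reviewed studies:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Grow a shallow (pre-pruned) tree, then fit an SVM inside each leaf
# that still mixes classes.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

leaf_models = {}
leaves = tree.apply(X)                   # leaf index for every training sample
for leaf in np.unique(leaves):
    mask = leaves == leaf
    if len(np.unique(y[mask])) > 1:      # impure leaf: delegate to an SVM
        leaf_models[leaf] = SVC().fit(X[mask], y[mask])

def hybrid_predict(X_new):
    leaf_ids = tree.apply(X_new)
    preds = tree.predict(X_new)          # pure leaves keep the tree's label
    for leaf, svm in leaf_models.items():
        mask = leaf_ids == leaf
        if mask.any():
            preds[mask] = svm.predict(X_new[mask])
    return preds

print((hybrid_predict(X) == y).mean())   # training accuracy of the hybrid
```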