Empirical Prediction of Turnovers in NFL Football – PMC

    2.3. Modeling and analysis

    Turnovers are rare: they occur on only about 1.6% of all plays from the line of scrimmage. As a result, the class labels $y_i$ in a training set $T = \{(x_i, y_i),\ i = 1, \dots, n\}$ randomly sampled from the real population are highly skewed. Learning the parameters of a useful statistical estimator of the turnover probability $\hat{p}(x_i) = P(y_i = 1 \mid x_i)$ therefore calls for specific learning techniques that avoid the trivial prediction of “no turnover” at each decision [12].

    To address this, the approach taken in this study was to rebalance the class distribution in the training set, overrepresenting the minority class so that the learning algorithm sees enough turnover examples. During model validation, samples that reflect the true class distribution in the population were used to assess the predictive power of the model when applied out of sample.
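
    The rebalancing idea can be illustrated with random oversampling of the minority class. This is a minimal sketch of one common scheme, not necessarily the one used in the study; the function name `oversample_minority` and the `target_fraction` parameter are assumptions made for illustration.

```python
import random


def oversample_minority(rows, labels, target_fraction=0.5, seed=0):
    """Return a resampled (rows, labels) pair in which positives (label 1)
    make up roughly target_fraction of the examples.

    All negatives are kept once; positives are drawn with replacement.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Solve n_pos / (n_pos + len(neg)) = target_fraction for n_pos.
    n_pos = round(target_fraction * len(neg) / (1.0 - target_fraction))
    chosen = neg + [rng.choice(pos) for _ in range(n_pos)]
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]
```

    Only the training set would be rebalanced this way; a validation set should keep the natural class distribution so that the measured error rates carry over to real play sequences.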

    The modeling strategy included bootstrap resampling [13], cross-validation analysis, and receiver operating characteristic (ROC) curve analysis [14]. ROC analysis allowed for error estimation, model comparison, and selection from the large number of hypotheses generated by the gradient boosting machines (GBMs) during training. ROC curves are commonly used to trade off false positive rate (FPR) against true positive rate (TPR) when evaluating a classifier. In this study, false discovery rate (FDR) was substituted for FPR. FDR is the fraction of all positive decisions (i.e., predicted turnovers) made by a model that are incorrect [15]. FDR is more informative than FPR in diagnostic or predictive applications where confidence in a positive prediction matters most, especially when the class distribution is skewed [16]. FDR is related to the positive predictive value (PPV) by PPV = 1 - FDR, so high PPV (low FDR) is desirable. TPR denotes the sensitivity of the model, or the probability that actual turnover events will be detected within a test distribution.
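
    The difference between FDR and FPR under class skew is easy to see from raw confusion counts. The sketch below uses the standard definitions; the helper name `rates` and the counts in the example are made up for illustration and are not results from the study.

```python
def rates(tp, fp, fn, tn):
    """Compute FDR, PPV, TPR and FPR from confusion-matrix counts.

    A rate whose denominator is zero is returned as NaN.
    """
    nan = float("nan")
    fdr = fp / (tp + fp) if tp + fp else nan
    return {
        "FDR": fdr,                                  # wrong share of positive calls
        "PPV": 1.0 - fdr,                            # PPV = 1 - FDR
        "TPR": tp / (tp + fn) if tp + fn else nan,   # sensitivity
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }


# Hypothetical test set: 1000 plays, 16 turnovers; 40 plays flagged, 8 correctly.
r = rates(tp=8, fp=32, fn=8, tn=952)
print(r)
```

    In this made-up case the FPR is about 3%, which looks harmless, while the FDR is 80%: four out of five predicted turnovers are wrong. That gap is why a skewed problem is better judged in (FDR, TPR) space.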

    In the ROC space (FDR, TPR), an optimal decision threshold $DT_{opt}$ for a given distribution is determined experimentally. Our goal is to minimize FDR for tactical reasons. A second pass through the training data with this fixed threshold is then used to train the model and evaluate its performance. The gradient-boosted model generates a probability $\hat{p}$; the turnover prediction algorithm is then [17]

    $$\hat{y}(x) = \begin{cases} 1, & \hat{p}(x) \ge DT_{opt} \\ 0, & \text{otherwise} \end{cases}$$

    where $\hat{y}(x) = 1$ means that a turnover will be observed given the input $x$.
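
    Threshold selection in (FDR, TPR) space can be sketched as a scan over candidate thresholds. The selection criterion is an assumption here: the lowest FDR wins and ties go to the higher TPR, which is one reading of "minimize FDR". The names `predict` and `choose_threshold`, and the use of `>=` at the boundary, are likewise illustrative.

```python
def predict(probs, threshold):
    """Decision rule: predict a turnover (1) when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def choose_threshold(probs, y_true, candidates):
    """Return (threshold, fdr, tpr) with the lowest FDR, ties broken by higher TPR.

    Thresholds that yield no positive predictions are skipped because
    FDR is undefined for them.
    """
    best = None
    for dt in candidates:
        y_pred = predict(probs, dt)
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        if tp + fp == 0 or tp + fn == 0:
            continue
        fdr = fp / (tp + fp)
        tpr = tp / (tp + fn)
        key = (fdr, -tpr)  # smaller FDR first, then larger TPR
        if best is None or key < best[0]:
            best = (key, dt, fdr, tpr)
    if best is None:
        raise ValueError("no candidate threshold produced a positive prediction")
    return best[1], best[2], best[3]


# Toy scores and labels, invented for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
print(choose_threshold(probs, y_true, candidates=[0.25, 0.5, 0.75]))
```

    Once chosen, the threshold is held fixed and applied through `predict` to new plays, which mirrors the fixed-threshold second pass described above.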

    Model learning for the aggregate sample used the bootstrap [13] to repeatedly draw samples from the entire training set. The data were partitioned according to the game type segment under consideration, and a stratified sample was constructed for training. Validation data were randomly sampled from the entire sample, according to the natural distribution of turnovers. A two-step procedure was followed for each of $B = 100$ bootstrap replicates. The first step estimated the decision threshold (DT) for optimal FDR and TPR through ROC analysis, training GBMs comprising nominally 1500 trees. In the second step, the threshold was held constant at $DT = DT_{opt}$ and the entire sample was re-modeled.

    The learning procedure used for team samples was conceptually similar, with slight differences in the numerical mechanics. Stratified sampling (with respect to the class labels $y$) of individual teams produced untenably small sample counts, which required an alternative sampling strategy. The decision was made to use 10-fold cross-validation nested within a 10-trial bagging procedure. Prediction rules were developed by averaging the performance results at the end. The modeling therefore included all available instances and benefited from the variance-reduction properties of bagging for model performance estimation [18].
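
    The nesting of folds inside trials can be sketched as an index generator. The text does not say how each trial resamples the data, so this sketch assumes a fresh random shuffle per trial (rather than a bootstrap draw), which guarantees that every instance is tested exactly once per trial. The name `repeated_kfold` is hypothetical.

```python
import random


def repeated_kfold(n, n_trials=10, n_folds=10, seed=0):
    """Yield (trial, fold, train_idx, test_idx) for n_trials * n_folds splits.

    Each trial reshuffles the n instance indices and cuts them into
    n_folds disjoint test folds; the remaining indices form the training set.
    """
    rng = random.Random(seed)
    for trial in range(n_trials):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]   # every n_folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield trial, fold, train_idx, test_idx
```

    With the default settings this produces 100 train/test pairs, whose per-split FDR and TPR values would then be averaged.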

    The performance statistics $FDR(DT_{opt})$ and $TPR(DT_{opt})$ were accumulated and then averaged over the replicates $b$ (or trials/folds $k$) to estimate the generalization performance of the tree ensemble. The sampling distributions of FDR and TPR observed in out-of-sample testing, summarized by their sample means and standard errors, were recorded for each sample and segment under investigation.
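
    Summarizing the per-replicate statistics takes only a few lines. This sketch assumes "standard error" means the standard error of the mean across replicates; if the spread of the replicate values themselves is wanted, the sample standard deviation is the quantity to report instead. The FDR values below are invented for illustration.

```python
import math
import statistics


def mean_and_se(values):
    """Return the sample mean and the standard error of that mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical FDR values from five replicates (not results from the study).
fdr_values = [0.10, 0.14, 0.12, 0.16, 0.13]
fdr_mean, fdr_se = mean_and_se(fdr_values)
print(round(fdr_mean, 3), round(fdr_se, 3))
```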

    In this research, we define a “good” false discovery rate as FDR < 0.15. In other words, a positive prediction made by the model ($\hat{y}(x) = 1$) must be correct more than 85% of the time to meet this model utility criterion. When a turnover is predicted on the impending play from the line of scrimmage, a high degree of confidence can therefore be associated with that prediction.

    A pseudocode summary of the model training and evaluation procedure appears in the Appendix, as Algorithm A1.
