NFL field goal attempts

Geometry and Stan

James Bland
2025-09-20

It is currently AFL finals (playoffs) season, and I felt the urge to revisit the set shots analysis I did a while back. Unfortunately, while the data for this exist, and I imagine have vastly improved since my last look, one has to pay for them. So here I am on a Saturday afternoon, wanting to do some structural modelling of sports data as one does, and I've got my hands on some NFL field goal attempts data. This is going to look a lot like my attempt at this a few years ago, but now with newer data and a hierarchical model.

By the way, if you’re interested in this kind of stuff, check out Andrew Gelman’s post on golf putting, which is what got me into this in the first place.

Modeling field goal attempts

The data

I got the data from NFLSavant, which keeps play-by-play data from 2013-2024. I used the 2024 dataset (the most recent). I filtered the Description variable for FIELD GOAL, and classified most of the attempts into the following categories: GOOD, WIDE LEFT, WIDE RIGHT, SHORT, and BLOCKED (kicks that hit an upright are counted as wide, and kicks that hit the crossbar as short; see the cleaning code at the end of this post).

There were very few SHORT and BLOCKED instances. Because of this, I decided there wasn't enough information in the data to reasonably learn anything about blocking or about making the distance, so I got rid of the SHORT and BLOCKED observations. What follows is therefore just a model of getting the kicking angle right. I acknowledge that not modeling these outcomes probably makes the model less interesting and useful, but these are the data I have.

The geometry problem

Because of the very small number of misses due to not making the distance, what I am left with is modeling getting the angle right, or right enough. Fortunately, there is a fairly straightforward relationship between the distance of the field goal attempt and the angle a kicker needs to kick through: this angle gets smaller for longer attempts. Here is what the problem looks like for a 30-yard attempt:

The distance between the uprights is \(\frac{18.5}{3}\approx 6.17\) yards, and so for an attempt \(x\) yards from the goal, the angle \(\theta\) is:

\[ \begin{aligned} \tan\left(\frac{\theta}{2}\right)&=\frac{18.5}{3\times 2\times x}\\ \frac{\theta}{2}&=\tan^{-1}\left(\frac{18.5}{3\times 2\times x}\right) \end{aligned} \]
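To make this concrete, here is the same angle computed in R (a small sketch; kick_angle is just my name for it):

kick_angle <- function(x) {
  goalpost_width <- 18.5/3          # 18.5 feet between the uprights, in yards
  2*atan(goalpost_width/(2*x))      # full angle theta, in radians
}

# roughly 11.7 degrees at 30 yards, shrinking to 7.1 degrees at 50
kick_angle(c(30, 50))*180/pi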

So the angle of tolerance that a kicker faces looks like this:

The statistical problem

Now we need to turn this geometry model into a statistical model. Suppose that the kick angle is distributed according to a normal distribution with standard deviation \(\sigma\), and centered on a straight kick. Then the probability of a successful field goal attempt will be:

\[ \Pr(\mathrm{good}\mid \mathrm{distance}=x)=\Phi\left(\frac{\theta(x)}{2\sigma}\right)-\Phi\left(-\frac{\theta(x)}{2\sigma}\right)=2\Phi\left(\frac{\theta(x)}{2\sigma}\right)-1 \]

where \(\Phi(\cdot)\) is the standard normal cdf. This is enough to write down a likelihood, but I wanted something a bit more flexible, so I went with a Student's \(t\)-distribution instead.1 Therefore, letting \(F_t(\cdot\mid \nu,\sigma)\) be the cdf of the \(t\) distribution centered on zero with \(\nu\) degrees of freedom and scale parameter \(\sigma\), I will use:

\[ \Pr(\mathrm{good}\mid \mathrm{distance}=x)=2F_t\left(\frac{\theta(x)}{2}\mid \nu,\sigma\right)-1 \]

So this will define my likelihood.
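In R, this success probability is only a couple of lines. Here is a sketch (pr_good is my name for it; pt is the cdf of the standard \(t\), so dividing the angle by \(\sigma\) handles the scale), together with a Monte Carlo sanity check:

pr_good <- function(x, sigma, nu) {
  half_angle <- atan((18.5/3)/(2*x))   # theta(x)/2 in radians
  2*pt(half_angle/sigma, df = nu) - 1
}

# sanity check at 40 yards with sigma = 0.05, nu = 20:
# simulate t-distributed kick angles, count those inside the uprights
half_angle <- atan((18.5/3)/(2*40))
mean(abs(0.05*rt(1e6, df = 20)) < half_angle)
pr_good(40, sigma = 0.05, nu = 20)   # should be very close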

As I will be using Bayesian estimation for this, I also need to choose priors for \(\sigma\) and \(\nu\). For the degrees of freedom parameter \(\nu\) I went with:

\[ \log\nu\sim N(\log(15),1^2) \]

I chose this because the Student-\(t\) distribution starts to look a lot like a normal distribution once \(\nu>30\) or so, so pinning the prior median to half this value seemed to make sense.
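As a quick check on what this prior implies, its central 95% interval for \(\nu\) spans roughly 2 to 106, so both fat-tailed and essentially-normal kick distributions remain in play:

# central 95% prior interval for nu, with the median in the middle
exp(log(15) + c(-1.96, 0, 1.96))
# roughly 2.1, 15, and 106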

For \(\sigma\), I explored a few prior choices (holding the prior for \(\nu\) constant) and their implications for the model’s predictions using the following Stan program, which I also use for estimation:


data {
  int<lower=0> N;                  // number of attempts
  vector[N] dist;                  // distance of each attempt (yards)
  int good[N];                     // 1 = attempt was good, 0 = miss
  
  vector[2] prior_sigma;           // lognormal prior for sigma: (location, scale) on the log scale
  vector[2] prior_nu;              // lognormal prior for nu
  
  int<lower=0,upper=1> UseData;    // 0 = ignore the likelihood (prior predictive)
}

transformed data {
  real goalpost_width = 18.5/3.0; // in yards
  
  // half the angle subtended by the uprights, theta(x)/2
  vector[N] angle = atan(goalpost_width/(2.0*dist));
}

parameters {
  real<lower=0> sigma;
  real<lower=0> nu;
}

model {
  
  if (UseData==1) {
  
    vector[N] pr_angle;
  
    for (ii in 1:N) {
      pr_angle[ii] = 2.0*student_t_cdf(angle[ii], nu, 0.0, sigma)-1.0;
    }
  
    target += bernoulli_lpmf(good | pr_angle);
  }
  
  target += lognormal_lpdf(sigma | prior_sigma[1],prior_sigma[2]);
  target += lognormal_lpdf(nu | prior_nu[1],prior_nu[2]);
}

generated quantities {
  
  // predicted probability of success at 1-100 yards
  vector[100] pr;
  for (ii in 1:100) {
    pr[ii] = 2.0*student_t_cdf(atan(goalpost_width/(2.0*ii)), nu, 0.0, sigma)-1.0;
  }
  
}

Here, I can turn off the likelihood contribution by setting UseData=0. Then the predictions pr in the generated quantities block are prior predictive draws of the success-probability curve. Eventually, I settled on:

\[ \log\sigma\sim N(-2,1^2) \]

which generates predictions like this:

prior_predictive<-read.csv("individual_prior_predictive.csv")

(
  ggplot(prior_predictive,aes(x=distance))
  +geom_line(aes(y=X50.,linetype="median"),linewidth=1)
  +geom_line(aes(y=mean,linetype="mean"),linewidth=1)
  +geom_ribbon(aes(ymin = X25.,ymax=X75.),alpha=0.5)
  +geom_ribbon(aes(ymin = X2.5.,ymax=X97.5.),alpha=0.5)
  +theme_bw()
  +xlab("Distance to goal (yards)")
  +ylab("Probability of success")
)

Here the shaded regions show 50% and 95% Bayesian prior credible regions. This looked about right to me (without looking too much at the data), so I settled on these priors.
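For a sense of scale: the prior median of \(\sigma\) is \(e^{-2}\approx 0.135\) radians, which converts to degrees like this.

# prior median of sigma, radians to degrees: about 7.8 degrees
exp(-2)*180/pi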

Estimates

Pooled estimation

First, I estimate a pooled model with a single \(\nu\) and \(\sigma\) for all of the data. Here are the estimates:

Fit<-"individual.csv" |>
  read.csv() |>
  mutate(
    distance = par |> parse_number()
  )

Fit |> 
  filter(is.na(distance)) |>
  select(-distance,-X) |>
  kbl(digits=3) |>
  kable_classic(full_width=FALSE) |>
  add_header_above(c("","","","","percentiles"=5,"",""))
par       mean      se_mean  sd       2.5%      25%       50%       75%       97.5%     n_eff     Rhat
sigma     0.048     0.000    0.002    0.044     0.047     0.048     0.050     0.052     1022.299  1.003
nu        35.523    0.946    34.647   7.539     15.194    24.698    43.019    124.870   1342.192  1.002
lp__      -389.363  0.027    0.940    -391.875  -389.731  -389.096  -388.692  -388.425  1253.353  1.002

But they are quite hard to interpret directly (\(\sigma\) is in radians: a posterior mean of 0.048 is about 2.8 degrees of kick dispersion), so let's translate them into predictions:

d<-"cleaned_data.csv" |>
  read.csv()
d_averages<-d |>
  group_by(Distance) |>
  summarize(
    good = mean(Outcome=="GOOD"),
    n=n()
  )
(
  ggplot(Fit |> filter(!is.na(distance)),aes(x=distance))
  +geom_point(data=d_averages,aes(x=Distance,y=good,size=n),alpha=0.5)
  +geom_line(aes(y=mean),color="red")
  +geom_ribbon(aes(ymin = X25.,ymax=X75.),alpha=0.2,fill="red")
  +geom_ribbon(aes(ymin = X2.5.,ymax=X97.5.),alpha=0.2,fill="red")
  +geom_smooth(data=d,aes(x=Distance,y=1*(Outcome=="GOOD")),
               method = "glm", 
               method.args = list(family = "binomial"))
  +theme_bw()
  +xlab("Distance to goal (yards)")
  +ylab("Probability of success")
  +ylim(c(0,1))
)

Here the red stuff is from the model, showing the mean (line) and 50% and 95% credible regions. The data are shown as black dots, and I put the logit line of best fit (blue) through there as well. While I don't really believe that the probability of success for an attempt from (say) 75 yards is this high, there isn't any data out there to inform us about it. If there were, I suspect there would be a lot more misses due to not making the distance, and I would have been able to include them explicitly in the model.

Hierarchical estimation

Of course, treating all observations as equal is not necessarily a good idea, especially in this case, where the data are generated by different kickers. I therefore also implemented a hierarchical model, in which every kicker's \(\sigma\) and \(\nu\) are draws from a (transformed) multivariate normal distribution. In the generated quantities block, I calculate each kicker's predicted probability of making a 40-yard field goal (as there were a lot of data around that distance).
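In math, the population layer for kicker \(k\) looks like this (matching the non-centered parameterization in the program below, where the correlation matrix \(\Omega\) gets an LKJ prior via its Cholesky factor):

\[ \begin{pmatrix}\log\sigma_k\\ \log\nu_k\end{pmatrix} \sim N\left(\mathrm{MU},\ \operatorname{diag}(\mathrm{TAU})\,\Omega\,\operatorname{diag}(\mathrm{TAU})\right) \]

Here is the Stan program I wrote to estimate this model: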


data {
  int<lower=0> N;                  // number of attempts
  int nkickers;                    // number of kickers
  int id[N];                       // kicker id for each attempt
  
  vector[N] dist;                  // distance of each attempt (yards)
  int good[N];                     // 1 = attempt was good, 0 = miss
  
  vector[2] prior_MU[2];           // normal priors for MU: (mean, sd) for each component
  vector[2] prior_TAU;             // half-Cauchy scales for TAU
  real prior_Omega;                // LKJ shape parameter
}

transformed data {
  real goalpost_width = 18.5/3.0; // in yards
  
  // half the angle subtended by the uprights, theta(x)/2
  vector[N] angle = atan(goalpost_width/(2.0*dist));
}

parameters {
  vector[2] MU;                    // population means of (log sigma, log nu)
  vector<lower=0.0>[2] TAU;        // population standard deviations
  cholesky_factor_corr[2] L_Omega; // Cholesky factor of the correlation matrix
  
  matrix[2,nkickers] z;            // non-centered kicker-level deviations
}

transformed parameters {
  
  vector[nkickers] sigma;
  vector[nkickers] nu;
  
  {
    matrix[2,nkickers] theta = rep_matrix(MU,nkickers)
                    +diag_pre_multiply(TAU,L_Omega)*z;
    sigma = exp(theta[1,]');
    nu = exp(theta[2,]');
  }
  
}

model {
  
  vector[N] pr_angle;
  
  for (ii in 1:N) {
    pr_angle[ii] = 2.0*student_t_cdf(angle[ii], nu[id[ii]], 0.0, sigma[id[ii]])-1.0;
  }
  
  target += bernoulli_lpmf(good | pr_angle);
  
  for (pp in 1:2) {
    target += normal_lpdf(MU[pp] | prior_MU[pp][1],prior_MU[pp][2]);
    target += cauchy_lpdf(TAU[pp] | 0.0, prior_TAU[pp]);
  }
  target += lkj_corr_cholesky_lpdf(L_Omega | prior_Omega);
  target += std_normal_lpdf(to_vector(z));
}

generated quantities {
  
  // predicted probability of making a 40-yard attempt, for each kicker
  vector[nkickers] pr40;
  for (ii in 1:nkickers) {
    pr40[ii] = 2.0*student_t_cdf(atan(goalpost_width/(2.0*40.0)), nu[ii], 0.0, sigma[ii])-1.0;
  }
  
}

Here are the shrinkage estimates for the probability of making a 40-yard field goal for all 51 players in the dataset:

pr40<-"hierarchical.csv" |>
  read.csv() |>
  filter(grepl("pr40",par)) |>
  mutate(
    id = par |> str_replace("pr40","") |> parse_number()
  ) |>
  full_join(
    "idlookup.csv" |> read.csv(),
    by="id"
  ) |>
  mutate(
    cumul = rank(mean)/n()-0.5/n()
  )

(
  ggplot(pr40,aes(x=mean))
  +stat_ecdf()
  +geom_errorbar(aes(y=cumul,xmin=X25.,xmax=X75.),alpha=0.5)
  +theme_bw()
  +xlab("Probability of a successful 40-yard attempt")
  +ylab("Cumulative density of posterior mean")
)

And here is a "top 10" list of field goal kickers based on the hierarchical model, ranked by their predicted probability of making a 40-yard field goal. Here raw.mean lists each player's fraction of successful attempts (from any distance). Note that this ranking is not simply a ranking of the raw means: the hierarchical model shrinks kickers with fewer attempts more heavily towards the population average.

pr40 |> 
  arrange(-mean) |>
  select(OffenseTeam,Kicker,raw.mean,mean,sd,n) |>
  rename(attempts = n) |>
  head(n=10) |>
  kbl(digits=3) |>
  kable_classic(full_width=FALSE) |> 
  add_header_above(c("","","","estimates"=2,""))
OffenseTeam  Kicker          raw.mean  mean   sd     attempts
PIT          9-C.BOSWELL     0.953     0.929  0.034  43
CHI          8-C.SANTOS      1.000     0.927  0.041  21
SEA          5-J.MYERS       0.963     0.924  0.038  27
WAS          3-A.SEIBERT     1.000     0.922  0.042  27
DEN          3-W.LUTZ        0.939     0.914  0.038  33
LAC          11-C.DICKER     0.932     0.913  0.035  44
TEN          6-N.FOLK        0.955     0.911  0.041  22
DAL          17-B.AUBREY     0.909     0.910  0.035  44
TB           4-C.MCLAUGHLIN  0.941     0.910  0.039  34
DET          39-J.BATES      0.931     0.904  0.041  29

(mean and sd summarize the posterior for the 40-yard success probability)

In case you’re interested, here is the bottom ten list as well:

pr40 |> 
  arrange(mean) |>
  select(OffenseTeam,Kicker,raw.mean,mean,sd,n) |>
  rename(attempts = n) |>
  head(n=10) |>
  kbl(digits=3) |>
  kable_classic(full_width=FALSE) |> 
  add_header_above(c("","","","estimates"=2,""))
OffenseTeam  Kicker          raw.mean  mean   sd     attempts
NYJ          9-G.ZUERLEIN    0.643     0.794  0.070  14
CLE          7-D.HOPKINS     0.692     0.798  0.060  26
GB           44-B.NARVESON   0.706     0.813  0.061  17
CIN          3-C.YORK        0.818     0.816  0.068  11
WAS          3-C.YORK        0.000     0.816  0.068  2
SF           4-J.MOODY       0.750     0.821  0.054  32
CIN          2-E.MCPHERSON   0.727     0.831  0.056  22
ATL          6-Y.KOO         0.781     0.833  0.050  32
BAL          9-J.TUCKER      0.750     0.837  0.051  32
NE           13-J.SLYE       0.839     0.841  0.050  31

(mean and sd summarize the posterior for the 40-yard success probability)

Showing my working

library(tidyverse)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

dFG<-"https://nflsavant.com/pbp_data.php?year=2024" |>
  read.csv() |>
  filter(grepl("FIELD GOAL",Description)) |>
  mutate(
    Kicker = str_split_i(Description," ",2),
    Outcome = ifelse(grepl("IS GOOD",Description),"GOOD",ifelse(grepl("IS BLOCKED",Description),"BLOCKED",str_split_i(Description,", ",2))),
    Distance = str_split_i(Description," ",3) |> parse_number()
  ) |>
  select(
    GameId,GameDate,Description, OffenseTeam,Kicker,Outcome,YardLine,Distance
  ) |>
  filter(!is.na(Distance)) |>
  mutate(
    Outcome = ifelse(Outcome=="HIT LEFT UPRIGHT","WIDE LEFT",Outcome),
    Outcome = ifelse(Outcome=="HIT RIGHT UPRIGHT","WIDE RIGHT",Outcome),
    Outcome = ifelse(Outcome=="HIT CROSSBAR","SHORT",Outcome)
  ) |>
  filter(Outcome=="GOOD" | Outcome=="SHORT" | Outcome=="WIDE LEFT" | Outcome=="WIDE RIGHT" | Outcome=="BLOCKED")


d<-dFG |>
  filter(Outcome!="SHORT" & Outcome!="BLOCKED") |>
  mutate(
    id = Kicker |> as.factor() |> as.integer()
  )

d |> 
  write.csv("_posts/2025-09-20-nfl-field-goal-attempts/cleaned_data.csv")

idlookup<-d |> 
  group_by(OffenseTeam,Kicker,id) |>
  summarize(
    n = n(),
    `raw mean` = mean(Outcome=="GOOD"),
    `av dist` = mean(Distance)
  )

idlookup |> 
  write.csv("_posts/2025-09-20-nfl-field-goal-attempts/idlookup.csv")

# prior predictive check

individual<-"_posts/2025-09-20-nfl-field-goal-attempts/individual.stan" |>
  stan_model()

dStan<-list(
  N = dim(d)[1],
  dist = d$Distance,
  good = d$Outcome=="GOOD",
  prior_sigma = c(-2,1),
  prior_nu = c(log(15),1),
  UseData=0
)

Fit<-individual |>
  sampling(
    data=dStan,
    seed=42
  )

prior_predictive<-summary(Fit)$summary |>
  data.frame() |>
  rownames_to_column(var = "par") |>
  mutate(
    distance = par |> parse_number()
  ) |>
  filter(!is.na(distance))

prior_predictive |>
  write.csv("_posts/2025-09-20-nfl-field-goal-attempts/individual_prior_predictive.csv")

(
  ggplot(prior_predictive,aes(x=distance))
  +geom_line(aes(y=X50.,linetype="median"),linewidth=1)
  +geom_line(aes(y=mean,linetype="mean"),linewidth=1)
  +geom_ribbon(aes(ymin = X25.,ymax=X75.),alpha=0.5)
  +geom_ribbon(aes(ymin = X2.5.,ymax=X97.5.),alpha=0.5)
  +theme_bw()
  +xlab("Distance to goal")
  +ylab("Probability of success")
)

#-------------------------------------------------------------------------------
# Pooled model
#-------------------------------------------------------------------------------

dStan<-list(
  N = dim(d)[1],
  dist = d$Distance,
  good = d$Outcome=="GOOD",
  prior_sigma = c(-2,1),
  prior_nu = c(log(15),1),
  UseData=1
)

Fit<-individual |>
  sampling(
    data=dStan,
    seed=42
  )

summary(Fit)$summary |> 
  data.frame() |>
  rownames_to_column(var="par") |>
  write.csv("_posts/2025-09-20-nfl-field-goal-attempts/individual.csv")


fit<-summary(Fit)$summary |>
  data.frame() |>
  rownames_to_column(var = "par") |>
  mutate(
    distance = par |> parse_number()
  ) |>
  filter(!is.na(distance))

d_averages<-d |>
  group_by(Distance) |>
  summarize(
    good = mean(Outcome=="GOOD"),
    n=n()
  )


(
  ggplot(fit,aes(x=distance))
  +geom_point(data=d_averages,aes(x=Distance,y=good,size=n),alpha=0.5)
  +geom_line(aes(y=mean),linewidth=1,color="red")
  +geom_ribbon(aes(ymin = X25.,ymax=X75.),alpha=0.2,fill="red")
  +geom_ribbon(aes(ymin = X2.5.,ymax=X97.5.),alpha=0.2,fill="red")
  +geom_smooth(data=d,aes(x=Distance,y=1*(Outcome=="GOOD")),
               method = "glm", 
               method.args = list(family = "binomial"))
  +theme_bw()
  +xlab("Distance to goal (yards)")
  +ylab("Probability of success")
  +ylim(c(0,1))
)


file<-"_posts/2025-09-20-nfl-field-goal-attempts/hierarchical.csv"

if (!file.exists(file)) {

  hierarchical<-"_posts/2025-09-20-nfl-field-goal-attempts/hierarchical.stan" |>
    stan_model()
  
  
  dStan<-list(
    N = dim(d)[1],
    nkickers = d$id |> unique() |> length(),
    id = d$id,
    dist = d$Distance,
    good = d$Outcome=="GOOD",
    
    prior_MU = list(
      c(-2,1),
      c(log(15),1)
    ),
    prior_TAU = c(1,1),
    prior_Omega = 4
  )
  
  Fit<-hierarchical |>
    sampling(
      data=dStan,
      control = list(adapt_delta = 0.99),
      seed=42,
      par = "z",include=FALSE
      )
  
  summary(Fit)$summary |>
    data.frame() |>
    rownames_to_column(var = "par") |>
    write.csv(file)
  

  
}

  1. This turned out to be a really important form of flexibility when I was modeling set shots in Australian football, where the geometry is more complicated and the data are richer, but it didn't really produce anything interesting here.↩︎