why am I only finding out about this now?
Over the past week, I have been going down a rather unusual rabbit hole. Well, for me at least. It all started when I had to read the job market paper of an applicant to a TT position we have in the economics department at UToledo. The paper used the Survey of Professional Forecasters, and in particular the individual forecasts available here. I then decided to have a go at modeling these data for myself. After posting this (very drafty) working paper on BlueSky, Tony Corke (@matterofstats.bsky.social) asked ChatGPT to read it and adapt my model for Australian Football margin tipping. In a subsequent email exchange, it became clear to me that footy tipping could be, and is, much more nerdy than just picking winners or predicting winning margins. Specifically, there is the Monash probabilistic footy tipping competition, where entrants have to submit a probability, rather than just a point prediction, about which team will win.
Forecasters are rewarded for accuracy as follows. Suppose you predict that the home team will win with probability \(p\); your score for that game is then:
\[ \begin{aligned} \mathrm{score}(p)&=\begin{cases} 1+\log_2(p) &\text{if the home team wins}\\ 1+\log_2(1-p) & \text{if the home team loses}\\ 1+0.5\log_2(p(1-p)) & \text{if there is a draw} \end{cases} \end{aligned} \]
Therefore, you are rewarded for assigning large probabilities to events that happen, and penalized for assigning large probabilities to events that don't. The payoff function is substantially asymmetric: if you correctly forecast the winner with \(p=1\) you get a payoff of one, but if you get this wrong your payoff goes to negative infinity.
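In code, the scoring rule looks like this (a minimal Python sketch; the outcome labels are my own, not part of the competition's API):

```python
from math import log2

def score(p: float, outcome: str) -> float:
    """Monash-style log score for a reported home-win probability p."""
    if outcome == "home win":
        return 1 + log2(p)
    elif outcome == "home loss":
        return 1 + log2(1 - p)
    elif outcome == "draw":
        return 1 + 0.5 * log2(p * (1 - p))
    raise ValueError(f"unknown outcome: {outcome}")

# Reporting p = 0.5 yields a score of exactly 0 no matter what happens,
# while a confident correct tip approaches the maximum of 1:
print(score(0.5, "home win"))   # 0.0
print(score(0.9, "home win"))   # ~0.848
print(score(0.9, "home loss"))  # ~-2.32
```

Note the asymmetry: the confident tip gains at most about 0.85 over the safe \(p=0.5\), but loses more than 2.3 when it misfires.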
A couple of things occurred to me about this payoff function and the competition in general:
- It is not obvious that you should truthfully report your belief that a team will win. When you are fairly certain (i.e. your belief is close to zero or one), being wrong is severely penalized. Therefore, I will have to put a lot of work into deciding which probabilities to enter, because they shouldn't simply be my beliefs that a team will win.
- Especially near the end of the competition, reported probabilities should be a function of tournament rank. If I am doing very well (I suspect that this is not going to be the case), I might want to report probabilities that preserve my rank in the tournament. For example, reporting \(p=0.5\) guarantees a payoff of \(1+\log_2(0.5)=0\), which could be enough to stay ahead of those below me. On the other hand, in the more likely scenario that I am doing badly, choosing a high-variance option will increase the chance of me rising in the rankings.
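Since the log score is a proper scoring rule, truthful reporting maximizes your expected score; the strategic considerations above are really about variance. A small sketch of that tradeoff (ignoring draws, with a hypothetical belief of 0.7):

```python
from math import log2

def expected_score(q: float, p: float) -> float:
    """Expected score of reporting q when your true home-win belief is p
    (draws ignored for simplicity)."""
    return p * (1 + log2(q)) + (1 - p) * (1 + log2(1 - q))

def score_variance(q: float, p: float) -> float:
    """Variance of the score of reporting q, under belief p."""
    win, loss = 1 + log2(q), 1 + log2(1 - q)
    mean = p * win + (1 - p) * loss
    return p * (win - mean) ** 2 + (1 - p) * (loss - mean) ** 2

p = 0.7  # hypothetical belief
for q in (0.5, 0.7, 0.9):
    print(q, expected_score(q, p), score_variance(q, p))
```

Reporting \(q=0.5\) gives a guaranteed score of zero (zero variance), while pushing \(q\) toward the extremes lowers the expected score but raises the variance; that is the gamble a trailing entrant might want.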
My approach is going to be to let my computer do as much of the decision-making as I can get it to do. In particular, I want to develop a model that (1) predicts win probabilities, and then (2) maps these into what I should report to the competition. Since the season has not started yet, I decided to make a start on part (1) using data from the previous season. This is really a prior calibration exercise, which is exactly what a Bayesian needs in order to make decisions.
First, I tried to implement my latent AR(1) process model. This would have latent team strength modeled as a time series process, with each round being a time period. Unfortunately, there is not enough information contained in the win/loss data to really learn about this process effectively. Next, I tried a team random effects model, where each team just has one parameter for the whole season describing their strength. This one worked really well, and it is what I ended up using for the prior calibration. Finally, I tried random effects with random round trends. It turns out there’s not enough information in the win/loss data to estimate these reasonably, either. So I am left with the team random effects model. Here, I assume that each team \(i\) has one parameter, say \(R_i\), representing their strength, and then I model team \(i\)’s win over team \(j\) as:
\[ \begin{aligned} \mathrm{win}_{i,j}&\sim\mathrm{BernoulliLogit}(R_i-R_j) \end{aligned} \]
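Concretely, the strength gap \(R_i - R_j\) maps to a win probability through the inverse logit. A Python sketch with made-up strengths:

```python
from math import exp

def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1 / (1 + exp(-x))

def p_home_win(R_home: float, R_away: float) -> float:
    """Implied home-win probability under the random-effects model."""
    return inv_logit(R_home - R_away)

print(p_home_win(1.0, -0.5))  # strong home team vs weak away team: > 0.8
print(p_home_win(0.3, 0.3))   # evenly matched teams: 0.5
```

Note that equal strengths always imply a 50/50 game, since this model has no separate home-ground advantage term.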
Here is the Stan program that does this. The reason almost all variables are labeled `prior_*` is that I'm going to use this as my prior calibration; the actual updating will happen as the 2026 season progresses.
```stan
data {
  int<lower=0> prior_N;                          // number of observations in prior season (used to calibrate prior)
  array[prior_N] int prior_homeID, prior_awayID; // id of home and away teams
  array[prior_N] int prior_homewin;              // = 1 iff home team wins
  array[prior_N] int prior_roundID;
}
transformed data {
  int nTeams = max(prior_homeID);
  int prior_nRounds = max(prior_roundID);
  vector[prior_N] prior_round = to_vector(prior_roundID);
}
parameters {
  //real<lower=-1, upper=1> prior_y_AR;
  //real<lower=0> prior_y_sigma;
  row_vector[nTeams] prior_teamRE;
  real<lower=0> prior_teamRE_sigma;
  //matrix[prior_nRounds, nTeams] prior_y; // latent team strength
}
model {
  /*----------------------------------------------------------------------------
    PRIOR CALIBRATION FROM PREVIOUS SEASON
  ----------------------------------------------------------------------------*/
  prior_teamRE_sigma ~ normal(0, 5);
  prior_teamRE ~ normal(0.0, prior_teamRE_sigma);
  prior_homewin ~ bernoulli_logit(prior_teamRE[prior_homeID] - prior_teamRE[prior_awayID]);
}
generated quantities {
  row_vector[prior_N] prior_prWin = inv_logit(prior_teamRE[prior_homeID] - prior_teamRE[prior_awayID]);
}
```

I estimated this model using the win/loss data from the 2025 men's AFL season. You can download these really easily using the fitzRoy package in R. One thing I get out of this model is a ladder (ranking) of teams, which we can compare to the actual ladder. But more interestingly for me, we also get a cardinal measure of how well each team did (i.e. the latent random effects). Here is the ranking (based on posterior mean):
| Rank | Team | Posterior mean | 2.5% | 97.5% |
|---|---|---|---|---|
| 1 | BL | 1.161 | 0.202 | 2.268 |
| 2 | GEE | 1.109 | 0.097 | 2.238 |
| 3 | ADEL | 1.033 | 0.032 | 2.125 |
| 4 | COLL | 0.950 | -0.044 | 2.025 |
| 5 | HAW | 0.832 | -0.132 | 1.837 |
| 6 | GWS | 0.776 | -0.224 | 1.844 |
| 7 | FRE | 0.731 | -0.256 | 1.759 |
| 8 | GCFC | 0.692 | -0.300 | 1.701 |
| 9 | WB | 0.489 | -0.498 | 1.587 |
| 10 | SYD | 0.124 | -0.844 | 1.116 |
| 11 | PORT | -0.310 | -1.321 | 0.658 |
| 12 | STK | -0.403 | -1.437 | 0.588 |
| 13 | CARL | -0.552 | -1.621 | 0.438 |
| 14 | MEL | -0.887 | -1.970 | 0.123 |
| 15 | ESS | -1.014 | -2.109 | 0.000 |
| 16 | NMFC | -1.086 | -2.176 | -0.077 |
| 17 | RICH | -1.400 | -2.650 | -0.309 |
| 18 | WCE | -2.383 | -3.978 | -1.094 |
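The summaries in this table come from the posterior draws of `prior_teamRE`: sort teams by posterior mean and attach the 2.5% and 97.5% quantiles. A sketch of that step, with simulated draws and made-up team names standing in for the real MCMC output:

```python
import numpy as np

# Hypothetical posterior draws of prior_teamRE: one column per team.
rng = np.random.default_rng(1)
draws = rng.normal(loc=[1.2, 0.1, -2.4], scale=0.5, size=(4000, 3))
teams = ["BL", "SYD", "WCE"]

means = draws.mean(axis=0)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
for rank, idx in enumerate(np.argsort(-means), start=1):
    print(rank, teams[idx], round(means[idx], 3),
          round(lo[idx], 3), round(hi[idx], 3))
```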
But what about the win probabilities that I will actually be using for my tipping? For perspective, my model assigned a posterior mean probability of 0.488 to Geelong winning the grand final, with a 95% credible interval of \((0.23, 0.74)\), so there's lots of posterior uncertainty there to be integrated out!
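Integrating out matters because the inverse logit is nonlinear: the posterior mean of the win probability is not the win probability at the posterior mean strengths. A sketch with a hypothetical set of draws of the strength difference:

```python
import numpy as np

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
# Hypothetical posterior draws of the strength difference R_home - R_away
diff_draws = rng.normal(loc=1.5, scale=1.0, size=10_000)

plug_in = inv_logit(diff_draws.mean())     # probability at the posterior mean
integrated = inv_logit(diff_draws).mean()  # posterior mean of the probability
print(plug_in, integrated)
```

With any posterior uncertainty around a lopsided matchup, the properly integrated probability is pulled back toward 0.5 relative to the plug-in number, which is exactly the shrinkage you want before handing a probability to a log-scoring competition.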