20 Application: Ranked choices and the Thurstonian model
Up to this point, I have shown you models for choosing one thing out of a choice set. This is often how we present things to our participants. In these situations, revealed preference tells us that the chosen item must be the item yielding the greatest utility of all items in the choice set. From a probabilistic choice perspective, we know that the utility of this item plus an error was the greatest. However in other experiments we might ask our participants to rank the items in a choice set. In principle, this gives a a lot more information, because not only do we know the payoff-maximizing item (i.e. the item ranked first), but we also know how all of the other items compare, too.89
While it is in principle possible to write down a (say) logit choice model where each option is one way to rank the items in the choice set, this will become computationally infeasible very quickly. Consider for example the choice data shown in Table 20.1, which is the ranking of nine different distributions of earnings made by the first participant in the first game of Alempaki et al. (2022). There are \(9!=362,880\) possible ways to rank these payoffs! In practice we will need a more tractable model for how people make decisions in environments like these.
ranking<-"Data/ACKLP2022_ranking.dta" |>
  read_dta() |>
  mutate(uid = paste("id",id) |> as.factor() |> as.numeric()) |>
  arrange(uid,game)
ranking |> 
  filter(uid==1 & game==1) |>
  dplyr::select(ranking,own,other) |>
  arrange(ranking) |>
  kbl(caption = "Ranking of payoffs in game 1 made by participant 1,") |>
  kable_classic(full_width=FALSE)| ranking | own | other | 
|---|---|---|
| 1 | 8 | 2 | 
| 2 | 8 | 0 | 
| 3 | 6 | 6 | 
| 4 | 6 | 8 | 
| 5 | 4 | 4 | 
| 6 | 2 | 2 | 
| 7 | 0 | 2 | 
| 8 | 0 | 8 | 
| 9 | 0 | 10 | 
20.1 The Thurstonian model
One way to tractably handle ordered choice data is with a Thurstonian model. This model assumes that each item \(k\) in the choice set \(j\) has a latent variable \(z_{j,k}\). The ranking we observe in the data is the ranking of the realization of the vector \(z_j\). We then assume a distribution for this latent vector, and estimate the parameters in this distribution. For example, we might assume a normal distribution:
\[ z_{j,k}\sim \text{ind }N(\mu_{j,k},\sigma^2) \]
From a structural perspective, we can interpret \(z_{j,k}\) as the utility of item \(k\) in choice set \(j\). We can let the mean of this latent variable \(\mu_{j,k}\) be a function of the data and our model’s parameters, which means that we can use this to estimate a utility function! That is, suppose we have a utility function \(U(x;\theta)\) with arguments \(x\) and parameters \(\theta\), then we can set \(\mu_{j,k}=U(x_{j,k};\theta)\).
To see how this model relates to models of choosing a single thing from a choice set, consider the special case where the choice set has just two items. In this case, the participant will rank item \(k=1\) over \(k=2\) if and only if \(z_{j,1}>z_{j,2}\), which occurs with probability:
\[ \begin{aligned} \Pr(z_{j,1}>z_{j,2})&=\Pr\left(U(x_{j,1};\theta)-U(x_{j,2};\theta)+\sigma(\xi_1-\xi_2)>0\right)&\quad \xi_1,\xi_2\sim iid N(0,1)\\ &=\Pr\left(\frac{U(x_{j,1};\theta)-U(x_{j,2};\theta)}{\sigma}>\xi_2-\xi_1\right)\\ &=\Phi\left(\frac{U(x_{j,1};\theta)-U(x_{j,2};\theta)}{\sqrt 2\sigma}\right) \end{aligned} \]
where \(\Phi(\cdot)\) is the standard normal cdf, so it reduces to a model of probit choice!
As I have described the model up to this point, it is not identifiable if each \(\mu_{j,k}\) is a free parameter. This is because adding a constant to the mean vector \(\mu_j\) will generate the same predictions. A simple solution to this is to normalize the mean utility of one of the items in the choice set to zero. However in the example I present below, we do not need to do this, because our utility function does not have a constant in it. In fact, this should be the case for all of our utility functions, becuase we can always represent the same preferences by adding a constant to all of our utilities.
The trick to estimating this model in Stan is to sort the data by the ranking, then take advantage of the ordered vector data type. That is, I pass the data about each choice set to Stan so that it is ordered from worst to best. That way, I know that the latent utilities must be ordered like this:
\[ z_{j,K}>z_{j,K-1}>z_{j,K-2}>\ldots>z_{j,1} \]
I then use data augmentation to draw these \(z_j\)s. This part is really easy once you organize the data in the right way (outcomes ranked from worst to best). You will see in the Stan files below that all you really need to do to augment the data here is to just declare the distribution (i.e. z ~ normal(U,sigma);). The ordered vector takes care of the constraints.
20.2 Computational issues
All of this data augmentation means that there will be a lot of numbers Stan needs to keep track of: every \(z_{j,k}\) needs to be augmented. For example, suppose you have \(N\) participants each do \(J\) ranking tasks, each with \(K\) things to rank in each task. This means that your model needs to augment \(NJK\) pieces of data on top of anything else that you are doing. This is a lot more than you would normally do with a hierarchical model that just needs to augment the individual-level parameters. In this case if you have \(p\) individual-level parameters per participant, you would just be augmenting \(Np\) things.90 All of this is to say, beware of how much memory this augmentation is chewing up, and consider not saving these augmented data from your estimation, because there will be thousands of realizations for each of these variables. Think about whether you would actually need them.
20.3 Example dataset and model
Alempaki et al. (2022) elicit their participants’ preferences over the outcomes in eight games by asking them to rank these outcomes. An example of this ranking for the first participant in the first game is shown in Table 20.1. Here “own” refers to the participant’s own monetary payoff, and “other” refers to their opponent’s monetary payoff. In order to incentivize truth-telling, if this ranking task was selected for payment, two of the outcomes were randomly selected. The outcome that the participant ranked higher was then implemented. For example based on the choices in Table 20.1, if outcomes \((8,0)\) and \((2,2)\) were randomly selected, \((8,0)\) would be implemented because this participant ranked this outcome higher.
As Alempaki et al. (2022) were concerned about other-regarding preferences in their games, we will estimate a simple 1-parameter utility model for other-regarding preferences:
\[ U(x;\alpha)=x^\mathrm{own}+\alpha x^\mathrm{other} \] where \(x^\mathrm{own}\) is the decisioon-maker’s own earnings, and \(x^\mathrm{other}\) is the other participant’s earnings. Here I am trying to keep things a simple as possible (while still hopefully being interesting) becuase I was having trouble estimating the full Fehr and Schmidt (1999) model of inequality-aversion. Maybe when I understand more the pathologies of this model I will extend this chapter to include this.91
20.4 A representative agent model
While at this point you know how I feel about the value of reporting representative agent models, this one was really useful in helping me to understand the Thurstoninal model, so I present it here as a Thurstonian only model without the hierarchical stuff that should really go with it. But don’t worry, we’ll put all of that back in before the end of the chapter.
This model has two parameters, \(\alpha\) and \(\sigma\), so we willset thesir priors to:
\[ \begin{aligned} \alpha&\sim N(0,1)\\ \sigma&\sim \mathrm{Cauchy}^+(0,1) \end{aligned} \]
These are fairly diffuse. Probably too diffuse if I actually calibrated them properly (especially for \(\alpha\)), but they will pin down the model enough.
Without further ado, here is the Stan program I came up with to implement the model.
data {
  
  int nparticipants;
  
  int<lower=0> N9; // total number of decisions with 9 unique payoffs
  int<lower=0> N8; // total number of decisions with 8 unique payoffs
  int<lower=0> N7; // total number of decisions with 7 unique payoffs
  
  int id9[N9];
  int id8[N8];
  int id7[N7];
  
  // payoffs from the game, ranked in ordered choice, worst to best
    // for the games with 9 uninque payoffs
    vector[9] own9[N9];
    vector[9] other9[N9];
    // for the games with 8 uninque payoffs
    vector[8] own8[N8];
    vector[8] other8[N8];
    // for the games with 9 uninque payoffs
    vector[7] own7[N7];
    vector[7] other7[N7];
  
    vector[2] prior_alpha;
    real prior_sigma;
}
parameters {
  
  real alpha; // coefficient on other's payoff
  real<lower=0> sigma; // sd of latent variable
  
  // augmented data
  ordered[9] z9[N9];
  ordered[8] z8[N8];
  ordered[7] z7[N7];
  
  
}
transformed parameters {
 
  
}
model {
  
  
  
  for (ii in 1:N9) {
    
    // utility 
    vector[9] U = own9[ii]+alpha*other9[ii];
    
    z9[ii] ~ normal(U,sigma);
    
  }
  
  for (ii in 1:N8) {
    // utility 
    vector[8] U = own8[ii]+alpha*other8[ii];
    
    
    z8[ii] ~ normal(U,sigma);
  }
  
  for (ii in 1:N7) {
    // utility 
    vector[7] U = own7[ii]+alpha*other7[ii];
    
    z7[ii] ~ normal(U,sigma);
  }
  
  
  
  alpha ~ normal(prior_alpha[1],prior_alpha[2]);
  sigma ~ cauchy(0.0,prior_sigma);
  
}As you can see, it is really an exercise in data-augmentation, and a bit of thought put in to how to pass the data to Stan. With regard to the latter, one of the quirks of the dataset is that there ranking lists are of different length. This is because some of the Alempaki et al. (2022) games have duplicate payoffs. So while there were nine outcomes of each game, in some games there were only seven or eight unique payoff profiles. I decided to handle these just as separate datasets, basically just splitting the full dataset into one that contained lists of length nine, one with lists of length eight, and one with lists of length seven. For the data-augmentation part, you can see this in the model block, for instance for the length-9 lists:
Here is Stan’s summary of the posterior simulation
summary("Code/ACKLP2022/estimates_FSranking_homogeneous.rds" |>
          readRDS())$summary |>
  data.frame() |>
  rownames_to_column(var = "parameter") |>
  kbl(digits=2) |>
  kable_classic(full_width=FALSE)| parameter | mean | se_mean | sd | X2.5. | X25. | X50. | X75. | X97.5. | n_eff | Rhat | 
|---|---|---|---|---|---|---|---|---|---|---|
| alpha | 0.15 | 0.00 | 0.01 | 0.14 | 0.15 | 0.15 | 0.16 | 0.17 | 4433.08 | 1 | 
| sigma | 1.52 | 0.00 | 0.02 | 1.48 | 1.50 | 1.52 | 1.53 | 1.56 | 2430.91 | 1 | 
| lp__ | -7161.59 | 2.42 | 75.89 | -7311.48 | -7213.40 | -7161.84 | -7110.53 | -7010.85 | 986.81 | 1 | 
20.5 A hierarchical model
While going to a hierarchical version of this model in principle should be fairly straightforward, this proved to be a long hot walk in the sun. In particular, for many of the specifications I tried, I kept getting BFMI warnings from Stan. In the end, the model I got to run successfully without warnings did the following relative to what I usually do with hierarchical models:
- I set the correlation matrix equal to the idenitiy matrix. That is, I assumed no correlation between \(\alpha_i\) and \(\sigma_i\)
- I used a half-normal prior for the standard deviation parameters rather than a half-Cauchy. This puts less mass in the tail of the distribution
- I used much more informative priors than I usually would, especially for the standard deviation parameters.
Specifically, the hierarchical model specification is:
\[ \begin{aligned} \alpha_i&\sim N(\mu_\alpha,\tau_\alpha^2)\\ \log\sigma_i&\sim N(\mu_\sigma,\tau^2_\sigma)\\ \text{hyper-priors: } \mu_\alpha &\sim N(0,0.25)\\ \mu_\sigma&\sim N(0,1)\\ \tau_\mu &\sim N^+(0,0.01)\\ \tau_\sigma &\sim N^+(0,0.01) \end{aligned} \]
Here is the Stan program that implements it:
data {
  
  int nparticipants;
  
  int<lower=0> N9; // total number of decisions with 9 unique payoffs
  int<lower=0> N8; // total number of decisions with 8 unique payoffs
  int<lower=0> N7; // total number of decisions with 7 unique payoffs
  
  int id9[N9];
  int id8[N8];
  int id7[N7];
  
  // payoffs from the game, ranked in ordered choice, worst to best
    // for the games with 9 uninque payoffs
    vector[9] own9[N9];
    vector[9] other9[N9];
    // for the games with 8 uninque payoffs
    vector[8] own8[N8];
    vector[8] other8[N8];
    // for the games with 9 uninque payoffs
    vector[7] own7[N7];
    vector[7] other7[N7];
  
    vector[2] prior_MU[2];
    vector[2] prior_TAU;
    real prior_OMEGA;
}
parameters {
  
  vector[2] MU;
  vector<lower=0>[2] TAU;
  //cholesky_factor_corr[2] L_OMEGA;
  
  matrix[2,nparticipants] z_pars;
  
  ordered[9] z9[N9];
  ordered[8] z8[N8];
  ordered[7] z7[N7];
  
  
}
transformed parameters {
  
  vector[nparticipants] alpha;
  vector<lower=0>[nparticipants] sigma;
  
  {
    
    matrix[2,nparticipants] pars = rep_matrix(MU,nparticipants) 
      + //diag_pre_multiply(TAU, L_OMEGA) * z_pars;
      diag_matrix(TAU)*z_pars;
      
    alpha = pars[1,]';
    sigma = exp(pars[2,]');
    
  }
  
}
model {
  
  
  
  for (ii in 1:N9) {
    
    // utility 
    vector[9] U = own9[ii]+alpha[id9[ii]]*other9[ii];
    
    z9[ii] ~ normal(U,sigma[id9[ii]]);
    
  }
  
  for (ii in 1:N8) {
    // utility 
    vector[8] U = own8[ii]+alpha[id8[ii]]*other8[ii];
    
    
    z8[ii] ~ normal(U,sigma[id8[ii]]);
  }
  
  for (ii in 1:N7) {
    // utility 
    vector[7] U = own7[ii]+alpha[id7[ii]]*other7[ii];
    
    z7[ii] ~ normal(U,sigma[id7[ii]]);
  }
  
  
  // hierarchical structure
  
  to_vector(z_pars) ~ std_normal();
  
  // priors
  
  for (pp in 1:2) {
    
    MU[pp] ~ normal(prior_MU[pp][1],prior_MU[pp][2]);
    TAU ~ normal(0.0,prior_TAU[pp]);
  }
  
  //L_OMEGA ~ lkj_corr_cholesky(prior_OMEGA);
}And here are the population-level estimates from it:
Fit<-summary("Code/ACKLP2022/estimates_linearranking_hierarchical.rds" |>
  readRDS())$summary |>
  data.frame() |>
  rownames_to_column(var = "par")
population<-Fit |>
  filter(grepl("MU",par) | grepl("TAU",par)) |>
  mutate(
    par = c("$\\mu_\\alpha$","$\\mu_\\sigma$","$\\tau_\\alpha$","$\\tau_\\sigma$")
  )
population |>
  kbl(digits = 2) |>
  kable_classic(full_width=FALSE)| par | mean | se_mean | sd | X2.5. | X25. | X50. | X75. | X97.5. | n_eff | Rhat | 
|---|---|---|---|---|---|---|---|---|---|---|
| \(\mu_\alpha\) | 0.15 | 0 | 0.01 | 0.12 | 0.14 | 0.15 | 0.15 | 0.17 | 2208.84 | 1 | 
| \(\mu_\sigma\) | 0.00 | 0 | 0.03 | -0.05 | -0.01 | 0.01 | 0.02 | 0.05 | 1612.34 | 1 | 
| \(\tau_\alpha\) | 0.09 | 0 | 0.00 | 0.08 | 0.08 | 0.09 | 0.09 | 0.09 | 4569.19 | 1 | 
| \(\tau_\sigma\) | 0.16 | 0 | 0.01 | 0.15 | 0.16 | 0.16 | 0.17 | 0.17 | 2321.31 | 1 | 
and here are the shrinkage estimates for \(\alpha\)
FitAlpha<-Fit |>
  filter(grepl("alpha",par)) |>
  mutate(
    cumul = rank(mean)/n()
  )
(
  ggplot(FitAlpha,aes(x=mean))
  +stat_ecdf()
  +theme_bw()
  +geom_errorbar(aes(y=cumul,xmin = X2.5.,xmax = X97.5.),alpha = 0.2)
  +xlab(expression(alpha[i]))
  +ylab("Empirical cumulative density of posterior mean")
)
and for \(\sigma\):
FitAlpha<-Fit |>
  filter(grepl("sigma",par)) |>
  mutate(
    cumul = rank(mean)/n()
  )
(
  ggplot(FitAlpha,aes(x=mean))
  +stat_ecdf()
  +theme_bw()
  +geom_errorbar(aes(y=cumul,xmin = X2.5.,xmax = X97.5.),alpha = 0.2)
  +xlab(expression(sigma[i]))
  +ylab("Empirical cumulative density of posterior mean")
)
For perspective, if \(\sigma_i=1\) then a selfish person ranking two options with a dollar difference between them will rank them with the highest payoff first with probability:
\[ \Phi\left(\frac{1}{\sqrt 2}\right)\approx 0.76 \]
20.6 R code used to run this
library(tidyverse)
library(haven)
library(rstan)
  rstan_options(auto_write = TRUE)
  options(mc.cores = parallel::detectCores())
  
ranking<-"Data/ACKLP2022_ranking.dta" |>
  read_dta() |>
  mutate(uid = paste("id",id) |> as.factor() |> as.numeric()) |>
  arrange(id,game,-ranking)
d<-ranking |>
  group_by(id,game) |>
  mutate(
    nuniquepayoffs = n(),
    idg = paste0(id,"-",game),
    k = 1:n()
  ) 
own9<-list()
other9<-list()
own8<-list()
other8<-list()
own7<-list()
other7<-list()
for (gg in (d$idg |> unique())) {
  
  
  dg<-d |> filter(idg==gg)
  
  if (dim(dg)[1]==9) {
    
    own9[[paste(gg)]]<-dg$own |> unlist() |> as.vector()
    other9[[paste(gg)]]<-dg$other |> unlist() |> as.vector()
    
  } else if (dim(dg)[1]==8) {
    own8[[paste(gg)]]<-dg$own |> unlist() |> as.vector()
    other8[[paste(gg)]]<-dg$other |> unlist() |> as.vector()
  } else {
    own7[[paste(gg)]]<-dg$own |> unlist() |> as.vector()
    other7[[paste(gg)]]<-dg$other |> unlist() |> as.vector()
  }
}
#-------------------------------------------------------------------------------
# Pooled model
#-------------------------------------------------------------------------------
model<-"Code/ACKLP2022/linearranking_homogeneous.stan" |>
  stan_model()
dStan<-list(
  
  nparticipants = d$uid |> unique() |> length(),
  
  N9 = length(own9),
  N8 = length(own8),
  N7 = length(own7),
  
  id9 = (d |> filter(nuniquepayoffs==9 & k==1))$uid,
  id8 = (d |> filter(nuniquepayoffs==8 & k==1))$uid,
  id7 = (d |> filter(nuniquepayoffs==7 & k==1))$uid,
  
  own9 = own9,
  other9=other9,
  own8 = own8,
  other8=other8,
  own7 = own7,
  other7=other7,
  
  prior_alpha = c(0,1),
  prior_sigma = 1
)
Fit <- model |> 
  sampling(
    data=dStan, 
    #chains=4,iter=10000,
    seed=43,
    pars = c("z9","z8","z7"),include=FALSE
  )
Fit |>
  saveRDS("Code/ACKLP2022/estimates_FSranking_homogeneous.rds")
###############################################################################
# Hierarchical model
###############################################################################
model<-"Code/ACKLP2022/linearranking_hierarchical.stan" |>
  stan_model()
dStan<-list(
  
  nparticipants = d$uid |> unique() |> length(),
  
  N9 = length(own9),
  N8 = length(own8),
  N7 = length(own7),
  
  id9 = (d |> filter(nuniquepayoffs==9 & k==1))$uid,
  id8 = (d |> filter(nuniquepayoffs==8 & k==1))$uid,
  id7 = (d |> filter(nuniquepayoffs==7 & k==1))$uid,
  
  own9 = own9,
  other9=other9,
  own8 = own8,
  other8=other8,
  own7 = own7,
  other7=other7,
  
  prior_MU = list(c(0,0.25),
                #  c(1,0.25),
                  c(0,1)
                  ),
  prior_TAU = c(0.01,0.01),
  prior_OMEGA = 4
)
Fit <- model |> 
  sampling(
    data=dStan, 
    chains=8,iter=2000,
    seed=42,
    pars = c("z_pars","z9","z8","z7"),include=FALSE
  )
Fit |> 
  saveRDS("Code/ACKLP2022/estimates_linearranking_hierarchical.rds")References
- This of course comes with the caveat that a ranking task may be more confusing to participants, and so decisions in a ranking task may be more noisy. I do not want to weigh in on the methodology here, instead we will just proceed with the understanding that we have ranked data. ↩︎ 
- I struggle to think of a tractable model/experiment combination where \(p>JK\).↩︎ 
- Specifically, Stan was giving me some BFMI warnings.↩︎