Should I pay for more information?

Answering this question in Stan

James Bland
2026-02-15

Yesterday I had a really interesting conversation with Grand Theft Eigenvalue on BlueSky about preferences over bias and efficiency of estimators for making high-stakes decisions with a limited sampling budget. We quickly agreed that Bayesian estimators were probably the way to go in this situation because (in my opinion, at least):

  1. Bayesian estimators give you a full posterior distribution for you to integrate over your loss/value function
  2. Bayesian estimators don’t make a distinction between small and large samples, and with a limited sampling budget, we are likely in the small sample realm, and
  3. Bayesian estimators make it easy to make decisions when there are more than two things to choose between. That is, again, just integrate over the loss/value function for each option

This conversation also led to a discussion about how a Frequentist would approach the same problem. At this point Rachel Leah Childers and some others chimed in with some references to e-values. Specifically, Rachel pointed us to “stopping robust ‘anytime-valid’ inference” and a simpler case study of it here. It looks like I have some reading to do!

In all of this, I couldn’t help formulating what I thought was a very simple model for Grand Theft Eigenvalue’s original question: how should I make decisions with a limited sampling budget? So what follows is a statement of what I think is a minimally interesting problem, and how to solve it in Stan.

I also think this problem in general has a lot to do with the design of economic experiments, which I have thought about a lot from a structural econometrics standpoint (although I still think there is a long way to go here for it to be fully Bayesian). Furthermore, Alex Alekseev has a cool paper on designing experiments with the budget in mind.

A model for paying for information

I originally posted this problem with the question “how would a frequentist approach this problem?” I think it is a minimally interesting problem to capture Grand Theft Eigenvalue’s original question about estimators and high-stakes decisions with a limited budget to acquire more information.

A coin flips heads with unknown probability \(\theta\). You are shown \(N_1\) independent coin flips. You can then do one of three things:

  1. Take a bet on the next flip. If it is heads you win \(W\); if it is tails you lose \(L\).
  2. Do nothing.
  3. Pay \(m\) to see another \(N_2\) independent flips, then choose between (1) and (2) (with no option of choosing (3) again).

While I’m still working on understanding the frequentist answer to my question, I do know how I would solve this as a Bayesian armed with Stan (and some understanding of conjugate priors), so that is what I will show you here.

Approaching the problem as a Bayesian

As a Bayesian, in order to help someone make a decision in this situation, I need two things:

  1. I need a prior for \(\theta\). That is, I need your belief about the coin’s probability of flipping heads before observing any data.
  2. I need to know a value or loss function. For example, how good does someone actually feel about winning \(W\)? How badly do they feel about losing \(L\)? and so on.

Item (1) will help with the belief-updating problem. (2) will help with deciding between the options once I have solved the belief-updating problem.

I will assume a Beta prior for (1). This is reasonable because it respects the support of \(\theta\) (i.e. it has to be a number between 0 and 1), and has the added bonus of being a conjugate prior for the binomial likelihood. I will lean heavily on this in my working. To summarize:

\[ \theta \sim \mathrm{Beta}(a, b) \]

with \(a>0\) and \(b>0\) given. For my computations I will just assume that \(a=b=1\), so the prior is uniform over the unit interval, but I will leave \(a\) and \(b\) as data inputs in the Stan program so that you can play around with this if you really want to.

For the value function, I will assume expected utility preferences with the constant absolute risk aversion (CARA) utility function:

\[ u(x)=\frac{1-\exp(-rx)}{r} \]

Here I went with CARA over the more popular CRRA because I need to be able to deal with negative payoffs (CRRA won’t do that). \(r\) measures risk aversion, with \(r>0\) representing risk-averse preferences, and \(r<0\) representing risk-loving preferences.
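To make this concrete, here is a minimal Python sketch of the CARA utility function and its inverse (the certainty-equivalent map), taking the limit \(u(x)=x\) at \(r=0\):

```python
import math

def u(x, r):
    # CARA utility; linear in the risk-neutral limit r = 0
    return x if r == 0 else (1.0 - math.exp(-r * x)) / r

def u_inv(v, r):
    # inverse CARA utility: converts a utility level back into dollars
    return v if r == 0 else -math.log(1.0 - r * v) / r

# round trip: the certainty equivalent of a sure $5 is $5
assert abs(u_inv(u(5.0, 0.06), 0.06) - 5.0) < 1e-9
```

Because \(u\) is defined (and increasing) for any real \(x\), it handles the negative payoff \(-L\) without any trouble.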

From here it is a matter of evaluating the expected utility under each of the three possible choices, and picking the best option.1

To begin with, let’s evaluate the first two options, which don’t involve paying for more data. Suppose that we first observe \(H_1\) heads (and \(N_1-H_1\) tails). Our posterior distribution for \(\theta\) is now:

\[ \theta \mid H_1\sim \mathrm{Beta}(a+H_1, b+N_1-H_1) \]

Noting that an expected utility maximizer in this problem only cares about the posterior mean of this, we can write the posterior prediction of the next coin flip as:

\[ p_1=E[\theta\mid H_1] = \frac{a+H_1}{a+b+N_1} \]

We can therefore evaluate the expected utilities of the first and second options as:

\[ \begin{aligned} \text{option 1: }\ & p_1u(W)+(1-p_1)u(-L)\\ \text{option 2: }\ & u(0) \end{aligned} \]

That is, if we did not have the third option, my advice to you would be to choose option 1 if and only if \(p_1u(W)+(1-p_1)u(-L)>u(0)\).
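As a quick numerical check of this decision rule, here is a sketch with hypothetical values (a uniform prior, one head in two flips, and \(W=10\), \(L=8\), which are the values used later in the post):

```python
import math

def u(x, r):
    # CARA utility; linear in the risk-neutral limit r = 0
    return x if r == 0 else (1.0 - math.exp(-r * x)) / r

# uniform prior Beta(1, 1); first sample: 1 head in N1 = 2 flips
a, b, N1, H1 = 1.0, 1.0, 2, 1
W, L, r = 10.0, 8.0, 0.0   # risk-neutral for this check

p1 = (a + H1) / (a + b + N1)                  # posterior predictive P(heads)
eu_bet = p1 * u(W, r) + (1 - p1) * u(-L, r)   # option 1
eu_stop = u(0.0, r)                           # option 2

print(p1)                # 0.5
print(eu_bet)            # 0.5*10 + 0.5*(-8) = 1.0
print(eu_bet > eu_stop)  # True: take the bet if there were no third option
```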

Now we come to evaluating option 3. This is a little more tricky, because we need to think about the possible samples of \(N_2\) flips we might observe given our (now intermediate) posterior belief \(\theta\mid H_1\). Intuitively, sometimes we will observe a lot of heads in the second sample, and we will update our belief that the next flip is likely to be heads. In these cases we will probably choose to take the bet. On the other hand, if we observe a lot of tails, then we will conclude that the probability of the next flip being heads is sufficiently low that we choose not to take the bet. Let’s first put ourselves in the shoes of someone who has observed both the first sample and the second sample, so they know how many heads were in the second sample. Let this number be \(H_2\). Then their problem becomes almost the same as in the first stage (without the option to buy more data), and so they will choose to make the bet if and only if:

\[ \begin{aligned} p_2&u(W-m)+(1-p_2)u(-L-m)>u(-m)\\ \text{where: } p_2&=\frac{a+H_1+H_2}{a+b+N_1+N_2} \end{aligned} \]

Which is all very well and good, except that when we are deciding whether to pay for the second sample of \(N_2\) coin flips, we don’t know what \(H_2\) is. What we need to do is integrate out the uncertainty associated with \(H_2\). To do this in Stan, I simulate the posterior of \(\theta\) after observing the first sample, then draw \(H_2\) from the binomial distribution conditional on \(\theta\). That is:

\[ \begin{aligned} \text{prior: }\ \theta&\sim \mathrm{Beta}(a, b)\\ \text{likelihood: }\ H_1 &\sim \mathrm{Binomial}(N_1,\theta)\\ \text{simulated: }\ H_2&\sim \mathrm{Binomial}(N_2,\theta) \end{aligned} \]

That is, unconditional on \(\theta\) (i.e. integrating it out), \(H_2\) is a draw from the posterior predictive distribution \(H_2\mid H_1\).

I then evaluate the expected utility of choosing the third option as:

\[ \begin{aligned} \text{option 3: }\ E_{H_2\mid H_1}\left[\max\left\{u(-m), p_2(H_2)u(W-m)+(1-p_2(H_2))u(-L-m)\right\}\right] \end{aligned} \]

where the expectation is evaluated with respect to \(H_2\mid H_1\), and in practice is computed in the generated quantities block of Stan.
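Before showing the Stan program, here is a plain-Python Monte Carlo sketch of this expectation, using the same hypothetical parameter values as above (the Stan version below does the equivalent with posterior draws of \(\theta\)):

```python
import math
import random

def u(x, r):
    # CARA utility; linear in the risk-neutral limit r = 0
    return x if r == 0 else (1.0 - math.exp(-r * x)) / r

random.seed(1)
a, b, N1, H1 = 1.0, 1.0, 2, 1
N2, W, L, m, r = 10, 10.0, 8.0, 1.0, 0.0

total = 0.0
draws = 20000
for _ in range(draws):
    theta = random.betavariate(a + H1, b + N1 - H1)       # theta | H1
    H2 = sum(random.random() < theta for _ in range(N2))  # simulate second sample
    p2 = (a + H1 + H2) / (a + b + N1 + N2)                # prediction after both samples
    eu_bet = p2 * u(W - m, r) + (1 - p2) * u(-L - m, r)
    total += max(u(-m, r), eu_bet)                        # best stage-2 choice
eu_option3 = total / draws
print(eu_option3)  # for these values, close to the option-1 expected utility
```

The inner `max` is the stage-2 decision, and averaging it over simulated \(H_2\) draws is exactly the outer expectation.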

Here is the Stan program I wrote to evaluate this expectation. As you can see, I also calculate certainty equivalents of each option, as this converts utility numbers (which are hard to interpret and impossible to compare across individuals) back into dollars (which are easy to interpret and can be compared across individuals).


functions {
  
  // CARA utility function
  real ufun(real x, real r) {
    
    return r==0 ? x : (1.0-exp(-r*x))/r;
  }
  
  // inverse CARA utility function 
  real iufun(real u, real r) {
    
    /* some working
    
    u = (1-exp(-rx))/r
    ru = 1-exp(-rx)
    -rx = log(1-ru)
    x = -log(1-ru)/r
    */
    
    return r==0 ? u : -log(1.0-r*u)/r;
  }
  
}

data {
  // first and second sample sizes
  int<lower=0> N1, N2;
  
  // number of heads observed in first sample
  int<lower=0,upper=N1> nHeads1;
  
  // prior for theta (Beta)
  vector[2] prior_theta;
  
  // benefit/cost values
  real<lower=0> W, L, m;
  
  // CARA coefficient
  real r; 
  
}


parameters {
  
  // probability of flipping heads
  real<lower=0, upper=1> theta;
  
}

model {
  
  // prior
  theta ~ beta(prior_theta[1],prior_theta[2]);
  
  // likelihood of first sample
  nHeads1 ~ binomial(N1, theta);
  
}

generated quantities {
  
  // posterior after observing first sample
  vector[2] posterior1 = [prior_theta[1]+nHeads1, 
                          prior_theta[2]+N1-nHeads1]';
  
  // prediction after observing first sample
  real prHeads1 = posterior1[1]/sum(posterior1);
  
  // second sample
  int nHeads2 = binomial_rng(N2,theta);
  
  // posterior for beta after observing the second sample
  vector[2] posterior2 = [prior_theta[1]+nHeads1+nHeads2, 
          prior_theta[2]+N1+N2-nHeads1-nHeads2]';
  
  
  // prediction for next coin flip conditional on seeing both samples
  real prHeads2 = posterior2[1]/sum(posterior2);
  
  vector[3] EUchoices;
  
  // take the bet now
  EUchoices[1] = ufun(W,r)*prHeads1+ufun(-L,r)*(1.0-prHeads1);
  // stop now
  EUchoices[2] = ufun(0.0,r);
  // pay for more data
  EUchoices[3] = fmax(ufun(-m,r),ufun(W-m, r)*prHeads2+ufun(-L-m,r)*(1.0-prHeads2));
  
  vector[3] CEchoices;
  for (cc in 1:3) {
    CEchoices[cc] = iufun(EUchoices[cc],r);
  }

}

Some results

I fixed \(W=10\), \(L=8\), and \(m=1\), and assumed that the first sample was just one head and one tail. Then, I explored the recommendations made by the model for \(r\in\{0, 0.02, 0.04, \ldots, 0.10\}\) and \(N_2\in\{1, 2, \ldots, 30\}\). Here is what the certainty equivalents look like:

So with this setup, if you are risk-neutral (\(r=0\)) you would be willing to pay $1 to see about ten or more coin flips. For the intermediate levels of risk aversion (say, \(r=0.02\) and \(r=0.04\)) the extra information is even more valuable. But for the most risk-averse types (say, \(r\geq 0.08\)) the information is not valuable because it wouldn’t change their decision to not take the bet after observing the extra \(N_2\) coin flips.

One problem I ran into was that there is some simulation uncertainty in the certainty equivalent of the “buy more data” option. This is why you can see in my R code (below) that I upped the number of iterations to 10,000 from RStan’s default 2,000. In this problem, we are more interested in the simulation accuracy of the mean of the certainty equivalent than in its posterior standard deviation. That’s why I plot the ribbons for \(\pm\) one se_mean. In particular, if my decision-maker told me that their \(r\) was about 0.06, I would run the simulation for much longer to really pin down whether or not it is worth paying for the data, as the certainty equivalents for not taking the bet and for paying for the data are very close to each other.
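To see why more iterations help here, recall that se_mean is roughly the posterior standard deviation divided by the square root of the effective sample size, so it shrinks as you add iterations even though the posterior standard deviation does not. A small sketch with synthetic (hypothetical) draws:

```python
import math
import random

def se_mean(draws):
    # Monte Carlo standard error of the mean for (nearly) independent draws
    n = len(draws)
    mean = sum(draws) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (n - 1))
    return sd / math.sqrt(n)

random.seed(0)
few = [random.gauss(0.0, 1.0) for _ in range(2000)]
many = [random.gauss(0.0, 1.0) for _ in range(10000)]
print(se_mean(few) > se_mean(many))  # True: more iterations, tighter mean
```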

Appendix – R code

library(tidyverse)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)


## Binomial model --------------------------------------------------------------

model<-"_posts/2026-02-14-a-dynamic-programming-problem/binomial.stan" |>
  stan_model()


ESTIMATES<-tibble()

for (rr in seq(0, 0.1, by = 0.02)) {
  for (nn in seq(1,30, by=1)) {
    
    print(paste(rr,nn))

      d<-list(
        N1 = 2,
        N2 = nn,
        nHeads1 = 1,
        prior_theta = c(1,1),
        
        W = 10, 
        L = 8,
        m = 1,
        
        r = rr
        
      )
      
      Fit<-model |>
        sampling(
          data=d,
          seed=42,
          cores = 1, refresh = 0,
          iter = 10000,
          pars = "CEchoices", include = TRUE
        )
      
      ESTIMATES<-ESTIMATES |>
        rbind(
          summary(Fit)$summary |>
            data.frame() |>
            rownames_to_column() |>
            mutate(
              r = rr,
              N2 = nn
            )
        )
}
}

ESTIMATES |>
  write.csv("_posts/2026-02-14-a-dynamic-programming-problem/binomial.csv")

  1. Or as Jim Savage puts it: ABIYLFOYP (always be integrating your loss function over your posterior).↩︎