October 2023
What are you going to do with it?
Each one of these is a different research question, and therefore might be best answered with a different experiment design
This paper: How can I design an experiment to estimate a structural model with a specific research question in mind?
\[ \max_{d\in \mathbb D} V(d) \]
Suggestions for making the estimates (in some sense) “as precise as possible”, based on the Fisher information matrix, which measures our estimates’ expected precision
Focuses on precision of estimates \(\hat\theta\), not predictions \(g(\hat\theta)\)
Specify a utility function \(v(\tilde\theta\mid\theta)\) that measures our utility of using point estimate \(\tilde\theta\) when the true parameters are actually \(\theta\).
Then, integrate out the uncertainty:
\[ \begin{aligned} V(d)&=\int_\Theta\int_\Theta v(\tilde\theta\mid\theta)p(\tilde\theta\mid\theta,d)\mathrm d\tilde\theta p(\theta)\mathrm d\theta=E\left[E\left[v(\tilde\theta\mid\theta)\big| \theta,d\right]\big| d\right] \end{aligned} \]
Squared prediction error of parameters:
\[ v(\tilde\theta\mid\theta)=-(\tilde\theta-\theta)^\top(\tilde\theta-\theta) \]
Squared prediction error of a function of parameters:
\[ v(\tilde\theta\mid\theta)=-(g(\tilde\theta)-g(\theta))^\top(g(\tilde\theta)-g(\theta)) \]
Expected log predictive density (for binary choice data)
\[ v(\tilde\theta\mid\theta)=\frac{1}{T}\sum_{t=1}^T\left[\Lambda(\Delta_{t})\log\Lambda(\tilde\Delta_t)+(1-\Lambda(\Delta_t))\log(1-\Lambda(\tilde\Delta_t))\right] \]
Whatever you like!
\[ \begin{aligned} V(d)&=E\left[E\left[v(\tilde\theta\mid\theta)\big| \theta,d\right]\big| d\right] \end{aligned} \]
The outer expectation is over true parameters \(\theta\). We can use Monte Carlo integration to approximate this (use draws from the prior \(p(\theta)\)).
The inner expectation is over the sampling distribution of \(\tilde\theta\mid d,\theta\). I use an approximation for this.
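The Monte Carlo step for the outer expectation can be sketched as follows. This is a toy illustration: `prior_draw` and `inner_value` are hypothetical stand-ins for the prior sampler and the approximated inner expectation, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_design_value(design, prior_draw, inner_value, n_draws=5000):
    """Monte Carlo approximation of V(d): average the (approximate) inner
    expectation E[v(theta_hat | theta) | theta, d] over prior draws of theta."""
    return float(np.mean([inner_value(prior_draw(rng), design)
                          for _ in range(n_draws)]))

# Toy example: theta ~ N(0, 1) and inner_value(theta, d) = -theta**2 / d,
# so V(d) should come out close to -1/d.
v = approx_design_value(2.0, lambda g: g.normal(), lambda th, d: -th**2 / d)
```

In the actual procedure, `inner_value` would be the curvature-based approximation developed below rather than a closed-form toy.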
\[ \begin{aligned} v(\tilde\theta\mid\theta)&\approx v(\theta\mid\theta)+(\tilde\theta-\theta)^\top\frac{\partial v(\theta\mid\theta)}{\partial \tilde\theta}+\frac{1}{2}(\tilde\theta-\theta)^\top\frac{\partial^2 v(\theta\mid\theta)}{\partial \tilde\theta\partial \tilde\theta^\top}(\tilde\theta-\theta) \end{aligned} \]
Taking expectations term by term, the first-order term drops out: \(v(\tilde\theta\mid\theta)\) is maximized at \(\tilde\theta=\theta\), so \(\partial v(\theta\mid\theta)/\partial\tilde\theta=0\).
\[ \begin{aligned} E\left[v(\tilde\theta\mid\theta)\mid d,\theta\right] &\approx v(\theta\mid\theta)+\frac{1}{2}E\left[(\tilde\theta-\theta)^\top\frac{\partial^2 v(\theta\mid\theta)}{\partial \tilde\theta\partial \tilde\theta^\top}(\tilde\theta-\theta)\mid d,\theta\right] \end{aligned} \]
Theorem: Let \(X\) be an \(n\times 1\) random vector with mean \(\mu\) and covariance \(\Sigma\), and let \(A\) be a symmetric \(n\times n\) matrix. Then, the expectation of the quadratic form \(X^\top A X\) is \(\mu^\top A\mu + \mathrm{tr}(A\Sigma)\)
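This identity is easy to sanity-check by simulation. The numbers below are an arbitrary small example of my own choosing, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulation check of E[X'AX] = mu'A mu + tr(A Sigma) for a 2x2 example.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # symmetric

X = rng.multivariate_normal(mu, Sigma, size=200_000)
simulated = np.einsum("ni,ij,nj->n", X, A, X).mean()  # mean of X'AX over draws
exact = mu @ A @ mu + np.trace(A @ Sigma)             # = 7 + 9 = 16
```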
\[ \begin{aligned} E[v(\tilde\theta\mid\theta)\mid d,\theta]&\approx v(\theta\mid\theta)+\frac12E(\tilde\theta-\theta)^\top \frac{\partial^2v(\theta\mid\theta)}{\partial \tilde\theta\partial\tilde\theta^\top}E(\tilde\theta-\theta)\\ &\quad+\frac{1}{2}\mathrm{tr}\left(\frac{\partial^2v(\theta\mid\theta)}{\partial \tilde\theta\partial\tilde\theta^\top}\mathcal I^{-1}(\theta)\right) \end{aligned} \]
Assume the bias term \(E(\tilde\theta-\theta)\) is negligible.
\[ \begin{aligned} E(v(\tilde\theta\mid\theta)\mid d,\theta)&\approx v(\theta\mid \theta)+\frac{1}{2}\underbrace{\frac{\partial^2v(\theta\mid\theta)}{\partial \tilde\theta^2}}_{<0}V(\tilde\theta) \end{aligned} \]
Minimize the (asymptotic) variance of the estimator
For more than one parameter, we have a “weighted variance”
\[ \begin{aligned} E[v(\tilde\theta\mid\theta)\mid d,\theta]&\approx v(\theta\mid\theta)+\frac{1}{2}\mathrm{tr}\left(\frac{\partial^2v(\theta\mid\theta)}{\partial \tilde\theta\partial\tilde\theta^\top}\mathcal I^{-1}(\theta)\right) \end{aligned} \]
If \(v(\tilde\theta\mid\theta)=-(\tilde\theta-\theta)^\top(\tilde\theta-\theta)\), then we will be computing the \(\mathcal A\)-optimal design (minimize the trace of the inverse information matrix, \(\mathrm{tr}(\mathcal I^{-1}(\theta))\))
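As a toy illustration of the \(\mathcal A\)-criterion \(\mathrm{tr}(\mathcal I^{-1})\), here is a sketch assuming a linear regression model with unit error variance (so \(\mathcal I = X^\top X\)); this is my own example, not the RDU model used later. Smaller criterion values mean a more precise design:

```python
import numpy as np

def a_criterion(X):
    """A-optimality criterion tr(I^{-1}) for a linear model with design matrix X,
    where the Fisher information is I = X'X (unit error variance)."""
    return float(np.trace(np.linalg.inv(X.T @ X)))

# Two candidate 4-observation designs for a model with intercept and slope.
X_clustered = np.array([[1.0, 0.4], [1.0, 0.5], [1.0, 0.6], [1.0, 0.5]])
X_spread = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Spreading the design points yields a much smaller tr(I^{-1}),
# i.e. a more A-efficient design.
```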
Rank-dependent utility (RDU) (Quiggin 1982) with CRRA utility, Prelec (1998) probability weighting, and logit choice
\[ \begin{aligned} U_i(L)&=\sum_{k=1}^K \pi_{i,k}^L(x^L_k)^{r_i}\\ \pi_{i,k}^L&=\omega_i\left(\sum_{j=1}^kp_{j}^L\right)-\omega_i\left(\sum_{j=1}^{k-1}p_{j}^L\right)\\ \omega_i(p)&=\exp\left(-(-\log\rho_i)^{1-\psi_i}(-\log p)^{\psi_i}\right)\\ \Pr\left(y_{i,t}=L\mid L_t,R_t,\theta\right)&=\frac{\exp(\lambda_i U_i(L_t))}{\exp(\lambda_i U_i(L_t))+\exp(\lambda_i U_i(R_t))} \end{aligned} \]
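A minimal numpy sketch of these choice probabilities. The function names (`prelec_weight`, `rdu_utility`, `choice_prob_L`) are my own, and outcomes are assumed to be indexed in the rank order implied by the cumulative sums above:

```python
import numpy as np

def prelec_weight(p, rho, psi):
    """Two-parameter Prelec (1998) weighting function omega(p), with omega(0) = 0."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    w = np.zeros_like(p)
    pos = p > 0
    w[pos] = np.exp(-((-np.log(rho)) ** (1.0 - psi)) * (-np.log(p[pos])) ** psi)
    return w

def rdu_utility(x, p, r, rho, psi):
    """RDU value: decision weights pi_k are differences of weighted cumulative probs."""
    w = prelec_weight(np.cumsum(p), rho, psi)
    pi = np.diff(w, prepend=0.0)
    return float(np.sum(pi * np.asarray(x, dtype=float) ** r))

def choice_prob_L(L, R, r, rho, psi, lam):
    """Logit probability of choosing lottery L = (outcomes, probabilities) over R."""
    dU = rdu_utility(*L, r, rho, psi) - rdu_utility(*R, r, rho, psi)
    return 1.0 / (1.0 + np.exp(-lam * dU))
```

With \(\psi=1\) the weighting function reduces to \(\omega(p)=p\), so the RDU value collapses to expected utility, which is the restriction tested below.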
Estimate a certainty equivalent as precisely as possible:
\[ v^\text{CE}(\tilde\theta\mid\theta) = -(C(\tilde\theta)-C(\theta))^2 \]
Test the expected utility parameter restriction (\(\psi=1\))
\[ v^\psi(\tilde\theta\mid\theta)=-(\tilde\psi-\psi)^2 \]
Predict decisions in another experiment (Harrison and Ng 2016)
\[ v^\text{ELPD}(\tilde\theta\mid\theta)=\frac{1}{T_b}\sum_{t=1}^{T_b}\left[\Lambda(\Delta_t)\log\Lambda(\tilde\Delta_t)+(1-\Lambda(\Delta_t))\log(1-\Lambda(\tilde\Delta_t))\right] \]
Bayesian hierarchical model using data from Harrison and Swarthout (2023) (undergrad participants only)
\[ \begin{pmatrix} \log r_i& \log\psi_i & \Phi^{-1}(\rho_i) & \log \lambda_i \end{pmatrix}^\top\overset{\text{iid}}{\sim} N\left(\mu , \mathrm{diag}(\tau)\,\Omega\,\mathrm{diag}(\tau)\right) \]
 | \(\log(r)\) | \(\log(\psi)\) | \(\Phi^{-1}(\rho)\) | \(\log(\lambda)\) |
---|---|---|---|---|
\(\mu\) | -0.68 (0.051) | -0.251 (0.033) | -0.408 (0.118) | 2.512 (0.035) |
\(\tau\) | 0.697 (0.05) | 0.433 (0.029) | 1.232 (0.107) | 0.439 (0.032) |
corr: \(r\) | | 0.004 (0.107) | 0.112 (0.133) | -0.212 (0.091) |
corr: \(\psi\) | | | -0.21 (0.085) | -0.015 (0.09) |
corr: \(\rho\) | | | | 0.158 (0.109) |
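Drawing individual-level parameters from this estimated population distribution (the prior used when optimizing designs) can be sketched as follows, using the posterior means of \(\mu\) and \(\tau\) reported above. \(\Omega\) is set to the identity purely for illustration; the estimated correlations are modest but not zero:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Population means and sds of the transformed parameters, from the table above:
# (log r, log psi, Phi^{-1}(rho), log lambda)
mu = np.array([-0.68, -0.251, -0.408, 2.512])
tau = np.array([0.697, 0.433, 1.232, 0.439])
Omega = np.eye(4)  # assumed identity here for illustration

cov = np.diag(tau) @ Omega @ np.diag(tau)
z = rng.multivariate_normal(mu, cov, size=1000)

# Map back to the model's natural parameter space.
r = np.exp(z[:, 0])
psi = np.exp(z[:, 1])
rho = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z[:, 2]])  # Phi(z)
lam = np.exp(z[:, 3])
```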
experiment | CE | lambda | psi | r | rho |
---|---|---|---|---|---|
CE | 0.047 | 4.832 | 0.201 | 0.167 | 0.190 |
ELPD | 0.054 | 4.449 | 0.151 | 0.140 | 0.185 |
HN2016 | 0.062 | 6.861 | 0.182 | 0.238 | 0.209 |
psi | 0.066 | 4.596 | 0.141 | 0.148 | 0.200 |
Larger numbers are better
experiment | -log10(-ELPD) | log10(sd) |
---|---|---|
CE | -16.99 | 14.993427 |
ELPD | -9.74 | 7.642208 |
HN2016 | -12.84 | 10.834724 |
psi | -22.48 | 20.476469 |
experiment | LR | Wald |
---|---|---|
CE | 0.264 | 0.065 |
ELPD | 0.374 | 0.120 |
HN2016 | 0.394 | 0.121 |
psi | 0.410 | 0.136 |
On my laptop (bought in 2017)
Estimating the hierarchical model: A couple of days
Optimizing the designs: Overnight, using an exchange algorithm. Read the paper for more details on this!
Monte Carlo simulation: Overnight
Maybe obvious, but important
Importance of formalizing the research question and the structural model(s) before running the experiment
Experiments designed to answer a different research question can still be used to estimate structural models, but they will not answer your research question as well.
Example: Expected Utility Theory \(\implies\) common ratio property
These tests have different alternative hypotheses:
For large \(x\):
\[ \begin{aligned} \log(1+\exp(x))&\approx x\\ \log(1+\exp(x))&=\begin{cases} \log(1+\exp(x))&\text{ according to math}\\ \mathrm{Inf} &\text{ according to my computer} \end{cases} \end{aligned} \]
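One standard fix (a sketch; any log-sum-exp helper does the same job) is to compute the softplus via `np.logaddexp`, which never forms `exp(x)` directly:

```python
import numpy as np

def log1pexp(x):
    """Numerically stable log(1 + exp(x)).

    np.logaddexp(a, b) evaluates log(exp(a) + exp(b)) without overflow,
    so log(1 + exp(x)) = logaddexp(0, x); for large x this returns ~x
    instead of overflowing to inf."""
    return np.logaddexp(0.0, x)
```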
A randomly-generated experiment will most likely produce very biased estimators!
An experiment optimized for one goal will probably be terrible for another goal
Start with initial experiment design \(d\), and partition it into \(T\) elements so it can be expressed as \(d=\{d_t\}_{t=1}^T\)
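A greedy coordinate-exchange loop over these \(T\) elements can be sketched as below. This is an illustrative sketch of the general idea, not the exact algorithm in the paper (`value` stands in for the objective \(V(d)\), and `candidates` for the menu of possible values of each \(d_t\)):

```python
def exchange(design, candidates, value, max_sweeps=20):
    """Greedy coordinate exchange: sweep over the T design elements, trying
    every candidate replacement for each element and keeping any swap that
    increases value(d); stop when a full sweep makes no improvement."""
    design = list(design)
    best = value(design)
    for _ in range(max_sweeps):
        improved = False
        for t in range(len(design)):
            for c in candidates:
                trial = list(design)
                trial[t] = c
                v = value(trial)
                if v > best:
                    design, best, improved = trial, v, True
        if not improved:
            break
    return design, best

# Toy objective: elements should sum to 5.
d, best = exchange([0, 0, 0], [0, 1, 2, 3], lambda d: -(sum(d) - 5) ** 2)
```

Like any greedy search, this only guarantees a local optimum, so in practice one would restart it from several initial designs.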
Harrison, Glenn W, and Jia Min Ng. 2016. “Evaluating the Expected Welfare Gain from Insurance.” Journal of Risk and Insurance 83 (1): 91–120.
Harrison, Glenn W, and Todd Swarthout. 2023. “Cumulative Prospect Theory in the Laboratory: A Reconsideration.” In Research in Experimental Economics: Models of Risk Preferences: Descriptive and Normative Challenges, edited by G. W. Harrison and D. Ross. Bingley, UK: Emerald.
Loomes, Graham, and Robert Sugden. 1998. “Testing Different Stochastic Specifications of Risky Choice.” Economica 65 (260): 581–98.
Prelec, Drazen. 1998. “The Probability Weighting Function.” Econometrica, 497–527.
Quiggin, John. 1982. “A Theory of Anticipated Utility.” Journal of Economic Behavior & Organization 3 (4): 323–43.