# Chapter 7 MCMC Under the Hood

We present you with a choice: either read this chapter or skip it.
You do you.
Chapter 7 is truly optional at this point in your Bayesian studies.
In Chapter 6, we discussed the need for MCMC posterior simulation in advanced Bayesian analyses and how to achieve these ends using the **rstan** package.
You now have everything you need to move on and analyze more sophisticated Bayesian models in Unit 3 and beyond.
After all, you can drive from point A to point B without a working understanding of the engine.
However, whether now or later, it will benefit your deeper understanding of Bayesian methodology to come back and learn more about how MCMC methods actually *work*.
We’ll take a peek under the hood in Chapter 7.

MCMC simulation is a rich field, spanning various algorithms that share a common goal: approximate the posterior model.
For example, the **rstan** package utilized throughout this book employs a **Hamiltonian Monte Carlo** algorithm.
The alternative **rjags** package for Bayesian modeling employs a **Gibbs sampling** algorithm.
Though the details differ, these algorithms are variations on the fundamental **Metropolis-Hastings** algorithm.
Given its foundational nature and the fact that studying all its variations would require a book itself, we’ll focus our attention on the Metropolis-Hastings algorithm in Chapter 7.
On top of developing intuition for the simulation process, you will learn how to *implement* the Metropolis-Hastings algorithm, *from scratch*, in R.
Though this implementation will require computer programming skills that are otherwise outside the scope of this book (e.g., writing *functions* and *for loops*), we’ll keep the focus on the concepts so that all can follow along.

```
library(tidyverse)
```

- Build a strong **conceptual understanding** of how Markov chain algorithms work.
- Explore the foundational **Metropolis-Hastings** algorithm.
- **Implement** the Metropolis-Hastings algorithm in the Normal-Normal and Beta-Binomial settings.

## 7.1 The big idea

Consider the following Normal-Normal model where \(Y\) is the numerical outcome from an experiment in which outcomes vary Normally around some unknown mean \(\mu\) with a standard deviation of 0.75:

\[\begin{equation} \begin{split} Y|\mu & \sim \text{N}(\mu, 0.75^2) \\ \mu & \sim \text{N}(0, 1^2) \\ \end{split} \tag{7.1} \end{equation}\]

The corresponding likelihood function \(L(\mu|y)\) and prior pdf \(f(\mu)\) for \(y \in (-\infty, \infty)\) and \(\mu \in (-\infty, \infty)\) are:

\[\begin{equation} L(\mu|y) = \frac{1}{\sqrt{2\pi \cdot 0.75^2}} \exp\bigg[{-\frac{(y-\mu)^2}{2 \cdot 0.75^2}}\bigg] \;\;\;\;\; \text{ and } \;\;\;\;\; f(\mu) = \frac{1}{\sqrt{2\pi}} \exp\bigg[{-\frac{\mu^2}{2}}\bigg] \; . \tag{7.2} \end{equation}\]

Suppose we observe an outcome \(Y = 6.25\). Then by our work in Chapter 5 and summarized by (5.14), the updated posterior model of \(\mu\) is Normal with mean 4 and standard deviation 0.6:

\[\mu | (Y = 6.25) \sim \text{N}(4, 0.6^2) \; .\]
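Though not needed for what follows, this conjugate update can be sanity-checked numerically. A minimal sketch using only the model constants above (the variable names are ours, not part of the chapter's code):

```r
# Sketch: Normal-Normal posterior update for the model above
# (N(0, 1^2) prior, data sd 0.75, observed y = 6.25)
sigma <- 0.75   # known data standard deviation
tau   <- 1      # prior standard deviation
y     <- 6.25   # observed outcome

post_var  <- 1 / (1 / tau^2 + 1 / sigma^2)   # posterior variance
post_mean <- post_var * (y / sigma^2)        # posterior mean (prior mean is 0)
c(post_mean, sqrt(post_var))                 # 4.0 and 0.6, as claimed
```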

As discussed in Chapter 6, if we *weren’t* able to specify this posterior model of \(\mu\) (just pretend), we could *approximate* it using MCMC simulation.
To get a sense for *how* this works, consider the results of a potential \(N = 5000\) iteration MCMC simulation (Figure 7.1).
Ultimately, you can think of the illustrated Markov chain \(\left\lbrace \mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(N)} \right\rbrace\) as a **tour** around the range of posterior plausible values of \(\mu\) and yourself as the **tour manager**.
The trace plot (left) illustrates the sequence of tour stops, \(\mu^{(i)}\).
As tour manager, it’s your job to ensure that the density of tour stops in each \(\mu\) neighborhood is proportional to its posterior plausibility.
That is, the chain should spend more time touring values of \(\mu\) between 2 and 6, where the Normal posterior pdf is greatest, and less time visiting \(\mu\) values less than 2 or greater than 6, where the posterior drops off.
This feature is crucial to producing a collection of tour stops that accurately approximate the posterior, as does the tour here (right).

In designing an appropriate tour, the **Metropolis-Hastings algorithm** iterates through a two-step process.
Assuming the Markov chain is at location \(\mu^{(i)} = \mu\) at iteration or “tour stop” \(i\), the next tour stop \(\mu^{(i + 1)}\) is selected as follows:

**Step 1**: Propose a random location, \(\mu'\), for the next tour stop.

**Step 2**: Decide whether to go to the proposed location (\(\mu^{(i+1)} = \mu'\)) or to stay at the current location for another iteration (\(\mu^{(i+1)} = \mu\)).

This might seem easy!
For example, if there were no constraints on your tour plan, you could simply draw proposed tour stops \(\mu'\) from posterior pdf \(f(\mu' | (y = 6.25))\) (Step 1) and then go there (Step 2).
This special case of the Metropolis-Hastings algorithm has a special name – **Monte Carlo**.

**Monte Carlo algorithm**

To construct a Monte Carlo sample for posterior pdf \(f(\mu|y)\), \(\left\lbrace \mu^{(1)}, \mu^{(2)}, ..., \mu^{(N)}\right\rbrace\), select each tour stop \(\mu^{(i)} = \mu\) as follows:

**Step 1: Propose a location.**

Draw a location \(\mu\) from the posterior model with pdf \(f(\mu | y)\).

**Step 2: Go there.**

The Monte Carlo algorithm is so convenient!
We can implement this algorithm by simply using `rnorm()` to sample directly from the N\((4,0.6^2)\) posterior.
The results are a nice independent sample from the posterior which, in turn, produces an accurate posterior approximation:

```
set.seed(84375)
mc_tour <- data.frame(mu = rnorm(5000, mean = 4, sd = 0.6))
ggplot(mc_tour, aes(x = mu)) + 
  geom_histogram(aes(y = ..density..), color = "white", bins = 15) + 
  stat_function(fun = dnorm, args = list(4, 0.6), color = "red")
```

**BUT** there’s a glitch.
Remember that we only need MCMC to approximate a Bayesian posterior when that posterior is too complicated to specify.
And if a posterior is too complicated to specify, it’s typically too complicated to directly sample or draw from as we did in our Monte Carlo tour above.
And if a posterior is too complicated to sample, we can’t implement the Monte Carlo algorithm (specifically, Step 1).
This is where the more general Metropolis-Hastings MCMC algorithm comes in.

Metropolis-Hastings relies on the fact that, even if we don’t know the posterior model, we *do* know that the posterior pdf is proportional to the product of the *known* prior pdf and likelihood function (7.2):

\[f(\mu | (y=6.25)) \propto f(\mu)L(\mu|(y=6.25)). \]

This *unnormalized* pdf is drawn in Figure 7.2.

Further, in **Step 1** of the Metropolis-Hastings algorithm, we must propose Markov chain tour stops by sampling from some model *other than* the unknown N\((4, 0.6^2)\) posterior.
To this end, we’ll utilize a **Uniform proposal model** with half-width \(w\).
Specifically, let \(\mu^{(i)} = \mu\) denote the current tour location.
*Conditioned* on this current location, we propose the next location by taking a random draw \(\mu'\) from the Uniform model which is centered at the current location \(\mu\) and ranges from \(\mu - w\) to \(\mu + w\):

\[\mu' | \mu \; \sim \; \text{Unif}(\mu - w, \mu + w)\]

with pdf

\[q(\mu'|\mu) = \frac{1}{2w} \;\; \text{ for } \;\; \mu' \in (\mu - w, \mu + w)\;.\]

Figure 7.3 illustrates that, using this method, proposals \(\mu'\) are equally likely to be any value between \(\mu - w\) and \(\mu + w\).

Figure 7.4 illustrates this idea in a specific scenario.
Suppose we’re utilizing a Uniform half-width of \(w = 1\) and that the Markov chain tour is at location \(\mu = 3\).
*Conditioned* on this current location, we’ll then propose the next location by taking a random draw from a \(\text{Unif}(3-w, 3+w) = \text{Unif}(2,4)\) model.
Thus the chosen half-width \(w = 1\) plays an important role here, defining the *neighborhood* of potential proposals.
Specifically, the proposed next stop is equally likely to be anywhere within the restricted neighborhood stretching from 2 to 4, around the chain’s current location of 3.

The whole idea behind Step 1 might seem goofy.
How can proposals drawn from a *Uniform* model produce a decent approximation of the *Normal* posterior model?!?
Well, they’re only *proposals*.
As with any other proposal in life, they can thankfully be *rejected* or *accepted*.
Mainly, if a proposed location \(\mu'\) is “bad,” we can reject it.
When we do, the chain sticks around at its current location \(\mu\) for at least another iteration.

**Step 2** of the Metropolis-Hastings algorithm provides a formal process for deciding whether to accept or reject a proposal.
Let’s first check in with our intuition about how this process *should* work.
Revisiting Figure 7.4, suppose that our random Unif\((2, 4)\) draw proposes that the chain move from its current location of 3 to 3.8.
Does this proposal seem *desirable* to you?
Well, sure! Notice that the (unnormalized) posterior plausibility of 3.8 is greater than that of 3.
Thus we *want* our Markov chain tour to spend more time exploring values of \(\mu\) around 3.8 than around 3.
Accepting the proposal gives us the chance to do so.
In contrast, if our random Unif\((2, 4)\) draw proposed that the chain move from 3 to 2.1, a location with very *low* posterior plausibility, we might be more hesitant.
Consider three possible rules for *automating* Step 2 in the following quiz.^{48}

Suppose we start our Metropolis-Hastings Markov chain tour at location \(\mu^{(1)} = 3\) and utilize a Uniform proposal model in Step 1 of the algorithm. Consider three possible rules to follow in Step 2, deciding whether or not to accept a proposal:

- Rule 1: *Never* accept the proposed location.
- Rule 2: *Always* accept the proposed location.
- Rule 3: Only accept the proposed location *if* its (unnormalized) posterior plausibility is greater than that of the current location.

Each rule above was used to generate one of the Markov chain tours below, with the visited location on the y-axis and the stop number on the x-axis. Match each rule to the correct tour.

The quiz above presented you with three *very poor* options for determining whether to accept or reject proposed tour stops in Step 2 of the Metropolis-Hastings algorithm.
**Rule 1** presents one extreme: *never* accept a proposal.
This is a terrible idea.
It results in the Markov chain remaining at the same location at every iteration (Tour 2), which would certainly produce a silly posterior approximation.
**Rule 2** presents the opposite extreme: *always* accept a proposal.
This results in a Markov chain which is not at all discerning in where it travels (Tour 3), completely ignoring the information we have from the (unnormalized) posterior model regarding the plausibility of a proposal.
For example, Tour 3 spends ample time exploring *implausible* posterior values \(\mu\) above 6.

**Rule 3** might seem like a reasonable balance between the two extremes: it neither always rejects nor always accepts proposals.
However, it’s still problematic.
Since this rule only accepts a proposed stop if its posterior plausibility is greater than that at the current location, it ends up producing a Markov chain similar to that of Tour 1 above.
Though this chain floats toward values near \(\mu = 4\) where the (unnormalized) posterior pdf is greatest, it then gets stuck there *forever*.

Putting all of this together, we’re closer to understanding how to make the Metropolis-Hastings algorithm work.
Upon proposing a next tour stop (Step 1), the process for rejecting or accepting this proposal (Step 2) must embrace the idea that the chain should spend more time exploring areas of high posterior plausibility *but* shouldn’t get stuck there forever:

**Step 1**: Propose a location, \(\mu'\), for the next tour stop by taking a draw from a proposal model.

**Step 2**: Decide whether to go to the proposed location (\(\mu^{(i+1)} = \mu'\)) or to stay at the current location for another iteration (\(\mu^{(i+1)} = \mu\)) as follows:

- If the (unnormalized) posterior plausibility of the proposed location \(\mu'\) is *greater* than that of the current location \(\mu\), i.e., \(f(\mu')L(\mu'|y) > f(\mu)L(\mu|y)\), *definitely* go there.
- Otherwise, *maybe* go there.

There are more details to fill in here (e.g., what does it mean to “maybe” accept a proposal?!). We’ll do that next.

## 7.2 The Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm for constructing a Markov chain tour \(\left\lbrace \mu^{(1)}, \mu^{(2)}, ..., \mu^{(N)}\right\rbrace\) is formalized here. We’ll break down the details below, exploring how to implement this algorithm in the context of our Normal-Normal Bayesian model.

**Metropolis-Hastings algorithm**

Conditioned on data \(y\), let parameter \(\mu\) have posterior pdf \(f(\mu | y) \propto f(\mu) L(\mu |y)\). A Metropolis-Hastings Markov chain for \(f(\mu|y)\), \(\left\lbrace \mu^{(1)}, \mu^{(2)}, ..., \mu^{(N)}\right\rbrace\), evolves as follows. Let \(\mu^{(i)} = \mu\) be the chain’s location at iteration \(i \in \{1,2,...,N-1\}\) and identify the next location \(\mu^{(i+1)}\) through a two-step process:

**Step 1: Propose a new location.**

Conditioned on the current location \(\mu\), draw a location \(\mu'\) from a proposal model with pdf \(q(\mu'|\mu)\).

**Step 2: Decide whether or not to go there.**

Calculate the **acceptance probability**, i.e., the probability of accepting the proposal:

\[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\mu')L(\mu'|y)}{f(\mu)L(\mu|y)} \frac{q(\mu|\mu')}{q(\mu'|\mu)} \right\rbrace \tag{7.3} \end{equation}\]

Then flip a weighted coin. If it’s Heads, with probability \(\alpha\), go to the proposed location. If it’s Tails, with probability \(1 - \alpha\), stay:

\[\mu^{(i+1)} = \begin{cases} \mu' & \text{ with probability } \alpha \\ \mu & \text{ with probability } 1- \alpha \\ \end{cases}\]

Though the notation and details are new, the algorithm above matches the concepts we developed in the previous section.
First, recall that for our Normal-Normal simulation, we utilized a \(\mu'|\mu \; \sim \text{Unif}(\mu - w, \mu + w)\) proposal model in Step 1.
In fact, this is a bit lazy.
Though our Normal-Normal posterior model is defined for \(\mu \in (-\infty, \infty)\), the Uniform proposal model lives on a truncated neighborhood around the current chain location.
However, utilizing a Uniform proposal model *simplifies* the Metropolis-Hastings algorithm by the fact that it’s **symmetric**:

\[q(\mu' | \mu) = q(\mu | \mu') \;. \]

Specifically, when \(\mu\) and \(\mu'\) are within \(w\) units of each other,

\[q(\mu'|\mu) = \frac{1}{2w} = q(\mu|\mu') \; .\]

*Conceptually*, this symmetry means that the chance of proposing a chain move from \(\mu\) to \(\mu'\) is the same as proposing a move from \(\mu'\) to \(\mu\).
For example, the Uniform model is equally likely to propose a move from \(\mu = 3\) to \(\mu' = 3.8\) as a move from \(\mu = 3.8\) to \(\mu' = 3\).
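This symmetry is easy to verify numerically. A quick sketch using `dunif()` to evaluate the Uniform proposal pdf at the illustrative values above (not part of the chapter's code):

```r
# Sketch: check that the Uniform proposal pdf is symmetric, q(mu'|mu) = q(mu|mu')
w <- 1
q <- function(to, from) dunif(to, min = from - w, max = from + w)
q(3.8, 3)   # density of proposing 3.8 from 3: 1/(2w) = 0.5
q(3, 3.8)   # density of proposing 3 from 3.8: also 0.5
```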
We refer to this special case of the Metropolis-Hastings algorithm as simply the **Metropolis algorithm**.

**Metropolis algorithm**

The Metropolis algorithm is a special case of the Metropolis-Hastings in which the proposal model is *symmetric*. That is, the chance of proposing a move to \(\mu'\) from \(\mu\) is equal to that of proposing a move to \(\mu\) from \(\mu'\): \(q(\mu'|\mu) = q(\mu|\mu')\). Thus the acceptance probability (7.3) simplifies to

\[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\mu')L(\mu'|y)}{f(\mu)L(\mu|y)} \right\rbrace \; . \tag{7.4} \end{equation}\]

Inspecting (7.4) reveals the nuances of Step 2, determining whether to accept or reject the proposed location drawn in Step 1. First, notice that we can rewrite the acceptance probability \(\alpha\) as follows:

\[\alpha = \min\left\lbrace 1, \; \frac{f(\mu')L(\mu'|y) / f(y)}{f(\mu)L(\mu|y) / f(y)} \right\rbrace = \min\left\lbrace 1, \; \frac{f(\mu'|y)}{f(\mu|y)} \right\rbrace \; .\]

That is, though we can’t calculate the posterior pdfs of \(\mu'\) and \(\mu\), their *ratio* is equivalent to that of the *unnormalized* posterior pdfs (which we *can* calculate).
Thus the probability of accepting a move from a current location \(\mu\) to a proposed location \(\mu'\) comes down to a comparison of their posterior plausibility: \(f(\mu'|y)\) versus \(f(\mu|y)\). There are two possible scenarios here:

**Scenario 1**: \(f(\mu'|y) \ge f(\mu|y)\)

When the posterior plausibility of \(\mu'\) is *at least* as great as that of \(\mu\), \(\alpha = 1\). Thus we’ll *definitely* move there.

**Scenario 2**: \(f(\mu'|y) < f(\mu|y)\)

If the posterior plausibility of \(\mu'\) is *less* than that of \(\mu\), then

\[\alpha = \frac{f(\mu'|y)}{f(\mu|y)} < 1 \; .\]

Thus we *might* move there. Further, \(\alpha\) approaches 1 as \(f(\mu'|y)\) nears \(f(\mu|y)\). That is, the probability of accepting the proposal increases with the plausibility of \(\mu'\) relative to \(\mu\).

Scenario 1 is straightforward. We’ll always jump at the chance to move our tour to a more plausible posterior region. To wrap our minds around Scenario 2, a little R simulation is helpful. For example, suppose our Markov tour is currently at location “3”:

```
current <- 3
```

Further, suppose we’re utilizing a Uniform proposal model with half-width \(w = 1\).
Then to determine the next tour stop, we first propose a location by taking a random draw from the Unif(`current` - 1, `current` + 1) model centered at the `current` value (Step 1):

```
set.seed(8)
proposal <- runif(1, min = current - 1, max = current + 1)
proposal
[1] 2.933
```

Revisiting Figure 7.2, we observe that the posterior plausibility of the proposed tour location (2.933) is *slightly* less than that of the current location (3).
We can calculate the exact yet unnormalized posterior plausibility of these two \(\mu\) locations, \(f(\mu)L(\mu|(y=6.25))\), using `dnorm()` to evaluate the N\((0,1^2)\) prior pdf \(f(\mu)\) *and* the N\((\mu, 0.75^2)\) likelihood function \(L(\mu|(y=6.25))\):

```
proposal_plaus <- dnorm(proposal, 0, 1) * dnorm(6.25, proposal, 0.75)
proposal_plaus
[1] 1.625e-07
current_plaus <- dnorm(current, 0, 1) * dnorm(6.25, current, 0.75)
current_plaus
[1] 1.972e-07
```

It follows that, though not certain, the probability \(\alpha\) of accepting and subsequently moving to the proposed location is relatively high:

```
alpha <- min(1, proposal_plaus / current_plaus)
alpha
[1] 0.824
```

To make the final determination, we flip a weighted coin which accepts the proposal with probability \(\alpha\) (0.824) and rejects it with probability \(1 - \alpha\) (0.176).
In a random flip of this coin using the `sample()` function, we *accept* the proposal, meaning that the `next_stop` on the tour is 2.933:

```
next_stop <- sample(c(proposal, current), size = 1, 
                    prob = c(alpha, 1-alpha))
next_stop
[1] 2.933
```

This is merely one of countless possible outcomes for a single iteration of the Metropolis-Hastings algorithm for our Normal posterior.
To streamline this process, we’ll write our own **R function**, `one_mh_iteration()`, which implements a single Metropolis-Hastings iteration starting from any given `current` tour stop and utilizing a Uniform proposal model with any given half-width `w`.
If you are new to writing functions, we encourage you to focus on the structure over the details of this code:

```
one_mh_iteration <- function(w, current){
  # STEP 1: Propose the next chain location
  proposal <- runif(1, min = current - w, max = current + w)
  
  # STEP 2: Decide whether or not to go there
  proposal_plaus <- dnorm(proposal, 0, 1) * dnorm(6.25, proposal, 0.75)
  current_plaus  <- dnorm(current, 0, 1) * dnorm(6.25, current, 0.75)
  alpha <- min(1, proposal_plaus / current_plaus)
  next_stop <- sample(c(proposal, current), 
                      size = 1, prob = c(alpha, 1-alpha))
  
  # Return the results
  return(data.frame(proposal, alpha, next_stop))
}
```

Let’s try it out!
Running `one_mh_iteration()` from our `current` tour stop of `3` under a seed of 8 reproduces the results from above:

```
set.seed(8)
one_mh_iteration(w = 1, current = 3)
  proposal alpha next_stop
1    2.933 0.824     2.933
```

If we use a seed of 83, the proposed next tour stop is 2.018 which has a low corresponding acceptance probability of 0.017:

```
set.seed(83)
one_mh_iteration(w = 1, current = 3)
  proposal   alpha next_stop
1    2.018 0.01709         3
```

This makes sense!
We see from Figure 7.2 that the posterior plausibility of 2.018 is much lower than that of our current location of 3.
Though we do want to explore such extreme values, we don’t want to do so often.
In fact, we see that upon the flip of our coin, the proposal was rejected and the tour again visits location 3 on its `next_stop`.
As a final example, we can confirm that when the posterior plausibility of the proposed next stop is greater than that of our current location, the acceptance probability is 1 and the proposal is automatically accepted:

```
set.seed(7)
one_mh_iteration(w = 1, current = 3)
  proposal alpha next_stop
1    3.978     1     3.978
```

## 7.3 Implementing the Metropolis-Hastings

We’ve spent much energy understanding how to implement *one iteration* of the Metropolis-Hastings algorithm.
That was the hard part!
To construct an entire Metropolis-Hastings tour of our N\((4, 0.6^2)\) posterior, we now just have to repeat this process over and over and over.
To this end, the `mh_tour()` function below constructs a Metropolis-Hastings tour of any given length `N` utilizing a Uniform proposal model with any given half-width `w`:

```
mh_tour <- function(N, w){
  # 1. Start the chain at location 3
  current <- 3
  
  # 2. Initialize the simulation
  mu <- rep(0, N)
  
  # 3. Simulate N Markov chain stops
  for(i in 1:N){
    # Simulate one iteration
    sim <- one_mh_iteration(w = w, current = current)
    
    # Record next location
    mu[i] <- sim$next_stop
    
    # Reset the current location
    current <- sim$next_stop
  }
  
  # 4. Return the chain locations
  return(data.frame(iteration = c(1:N), mu))
}
```

Again, this code is a big leap if you’re new to *for loops* and functions.
Let’s focus on the general structure indicated by the `#` comment blocks. In one call to this function:

- Start the tour at location 3 (a somewhat arbitrary choice).
- Initialize the tour simulation by setting up an empty vector in which to store the `N` tour stops (`mu`).
- Utilizing a *for loop*, at each tour stop \(i\) from 1 to `N`, run `one_mh_iteration()` and store the resulting `next_stop` in the `i`th element of the `mu` vector. Before closing the *for loop*, update the `current` tour stop to serve as a starting point for the next iteration.
- Return a data frame with the iteration numbers and corresponding tour stops `mu`.

To see this function in action, use `mh_tour()` to simulate a Markov chain tour of length `N = 5000` utilizing a Uniform proposal model with half-width \(w = 1\):

```
set.seed(84735)
mh_simulation_1 <- mh_tour(N = 5000, w = 1)
```

A trace plot and histogram of the tour results are shown below.
Notably, this tour produces a remarkably accurate approximation of the N\((4, 0.6^2)\) posterior.
You might need to step out of the weeds we’ve waded into in order to reflect upon the mathemagic here.
Through a rigorous and formal process, we utilized *dependent* draws from a *Uniform* model to approximate a *Normal* model.
And it worked!

```
ggplot(mh_simulation_1, aes(x = iteration, y = mu)) +
geom_line()
ggplot(mh_simulation_1, aes(x = mu)) +
geom_histogram(aes(y = ..density..), color = "white", bins = 20) +
stat_function(fun = dnorm, args = list(4,0.6), color = "red")
```

## 7.4 Tuning the Metropolis-Hastings algorithm

Let’s wade into one final weed patch.
In implementing the Metropolis-Hastings algorithm above, we utilized a Uniform proposal model \(\mu' | \mu \; \sim \; \text{Unif}(\mu - w, \mu + w)\) with half-width \(w = 1\).
Our selection of \(w = 1\) defined the *neighborhood* or range of potential proposals (Figure 7.4).
Naturally, the choice of \(w\) impacts the performance of our Markov chain tour.
What do you *anticipate* would happen if we utilized a proposal model with very small neighborhoods (say \(w = 0.01\)) or very big neighborhoods (say \(w = 100\))?
Take the quiz here.^{49}

Examine trace plots and histograms of three separate Metropolis-Hastings tours of the N\((4,0.6^2)\) posterior below. Each tour utilizes a Uniform proposal model, but with different half-widths: \(w = 0.01\), \(w = 1\), or \(w = 100\). Match each tour to the \(w\) with which it was generated.

The punchline here is that the Metropolis-Hastings algorithm can work – we’ve seen as much – but we have to **tune** it.
In our example, this means that we have to pick an appropriate half-width \(w\) for the Uniform proposal model.
The tours in the quiz above illustrate the **Goldilocks challenge** this presents: we don’t want \(w\) to be too small or too large, but *just right*.^{50}
Tour 2 illustrates what can go wrong when \(w\) is too small (here \(w = 0.01\)).
You can reproduce these results with the following code:

```
set.seed(84735)
mh_simulation_2 <- mh_tour(N = 5000, w = 0.01)
ggplot(mh_simulation_2, aes(x = iteration, y = mu)) + 
  geom_line() + 
  lims(y = c(1.6, 6.4))
```

In this case, the Uniform proposal model places a tight neighborhood around the tour’s current location. Proposals will therefore tend to be very close to the current location, with similar posterior plausibility and thus a very high probability of being accepted. This results in a Markov chain that is almost always moving, but takes such excruciatingly small steps that it would need an excruciatingly long time to explore the entire posterior plausible region of \(\mu\) values. Tour 1 illustrates the other extreme in which \(w\) is too large (here \(w = 100\)):

```
set.seed(7)
mh_simulation_3 <- mh_tour(N = 5000, w = 100)
ggplot(mh_simulation_3, aes(x = iteration, y = mu)) + 
  geom_line() + 
  lims(y = c(1.6, 6.4))
```

By utilizing a Uniform proposal model with such a large neighborhood around the tour’s current location, proposals can be far flung – far outside the region of posterior plausible \(\mu\).
In this case, proposals will often be rejected, resulting in a tour which gets stuck at the same location for multiple iterations in a row (as evidenced by the flat parts of the trace plot).
Tour 3 presents a happy medium.
This is our original tour (`mh_simulation_1`) which utilized \(w = 1\), a neighborhood size which is neither too small nor too big but just right.
The corresponding tour efficiently explores the posterior plausible region of \(\mu\), not getting stuck in one place or region for too long.
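One informal way to diagnose these extremes is to estimate a tour’s **acceptance rate**: the proportion of iterations at which the chain actually moved. A rate near 1 suggests \(w\) is too small, while a rate near 0 suggests \(w\) is too large. Below is a sketch with a hypothetical helper (`acceptance_rate()` is our own illustrative function, not part of the chapter’s code), assuming a data frame shaped like the one returned by `mh_tour()`:

```r
# Sketch (hypothetical helper): estimate a tour's acceptance rate as the
# proportion of transitions at which the chain moved to a new location
acceptance_rate <- function(tour) mean(diff(tour$mu) != 0)

# A tiny toy tour that moved on 2 of its 4 transitions:
toy_tour <- data.frame(iteration = 1:5, mu = c(3, 3, 3.4, 3.4, 2.9))
acceptance_rate(toy_tour)   # 0.5
```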

## 7.5 A Beta-Binomial example

Before moving on, let’s implement the Metropolis-Hastings algorithm for a Beta-Binomial model in which we observe \(Y=1\) success in 2 trials:^{51}

\[\begin{split} Y|\pi & \sim \text{Bin}(2, \pi) \\ \pi & \sim \text{Beta}(2, 3) \\ \end{split} \;\; \Rightarrow \;\; \pi | (Y = 1) \sim \text{Beta}(3, 4) \; .\]
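As a quick check of the stated posterior, the Beta-Binomial conjugate update simply adds the observed successes and failures to the prior parameters (a sketch with variable names of our own choosing):

```r
# Sketch: Beta(2, 3) prior updated by y = 1 success in n = 2 trials
a <- 2; b <- 3   # prior parameters
y <- 1; n <- 2   # observed data
c(a + y, b + n - y)   # posterior parameters: Beta(3, 4)
```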

Again, *pretend* we were only able to define the posterior pdf up to some missing normalizing constant,

\[f(\pi | (y = 1)) \propto f(\pi) L(\pi | (y = 1)),\]

where \(f(\pi)\) is the Beta(2,3) prior pdf and \(L(\pi | (y=1))\) is the \(\text{Bin}(2,\pi)\) likelihood function. Our goal then is to construct a Metropolis-Hastings tour of the posterior,

\[\left\lbrace \pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(N)}\right\rbrace,\]

utilizing a two-step iterative process: (1) at the current tour stop \(\pi\), take a random draw \(\pi'\) from a proposal model; and then (2) decide whether or not to move there.
In step 1, the proposed tour stops would ideally be restricted to be between 0 and 1, just like \(\pi\) itself.
Thus we’ll tune and utilize a \(\text{Beta}(a,b)\) proposal model instead of, say, a Normal proposal model (which can technically span the entire real line).
In doing so, we’ll utilize a proposal strategy that does not depend on the current tour stop: every proposal is drawn from the same \(\text{Beta}(a,b)\) model.
This special case of the Metropolis-Hastings is referred to as the **independence sampling algorithm**.

**Independence sampling algorithm**

The independence sampling algorithm is a special case of the Metropolis-Hastings in which the same proposal model is utilized at each iteration, *independent* of the chain’s current location. That is, from a current location \(\pi\), a proposal \(\pi'\) is drawn from a proposal model with pdf \(q(\pi')\) (as opposed to \(q(\pi'|\pi)\)). Thus the acceptance probability (7.3) simplifies to

\[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\pi')L(\pi'|y)}{f(\pi)L(\pi|y)}\frac{q(\pi)}{q(\pi')} \right\rbrace \; . \tag{7.5} \end{equation}\]

We can rewrite \(\alpha\) for the independence sampler to emphasize its dependence on the relative posterior plausibility of proposal \(\pi'\) versus current location \(\pi\):

\[\alpha = \min\left\lbrace 1, \; \frac{f(\pi')L(\pi'|y)/f(y)}{f(\pi)L(\pi|y)/f(y)}\frac{q(\pi)}{q(\pi')} \right\rbrace = \min\left\lbrace 1, \; \frac{f(\pi'|y)}{f(\pi|y)}\frac{q(\pi)}{q(\pi')} \right\rbrace \; .\]

Since \(q(\pi)\) and \(q(\pi')\) aren’t typically equal, they do not cancel out of the formula. Rather, the inclusion of their ratio serves as a corrective measure: the probability \(\alpha\) of accepting a proposal \(\pi'\) decreases as \(q(\pi')\) increases. Conceptually, we place a penalty on common proposal values, ensuring that our tour doesn’t float toward these values simply because we keep proposing them.
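To get a feel for this correction, consider a hypothetical \(\text{Beta}(5, 1)\) proposal model, which proposes values of \(\pi\) near 1 far more often than values near 0.4. A sketch of the resulting \(q(\pi)/q(\pi')\) ratio:

```r
# Sketch: proposal-pdf correction under a hypothetical Beta(5, 1) proposal
# model, which proposes values near 1 far more often than values near 0.4
q <- function(x) dbeta(x, 5, 1)
q(0.4) / q(0.9)   # well below 1: the commonly proposed 0.9 is penalized
```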

Below we write a `one_iteration()` function which implements a single iteration of this independence sampling algorithm, starting from any `current` value \(\pi\) and utilizing a \(\text{Beta}(a,b)\) proposal model for any given `a` and `b`.
In the calculation of acceptance probability \(\alpha\) (`alpha`), notice that we utilize `dbeta()` to evaluate the prior and proposal pdfs as well as `dbinom()` to evaluate the likelihood function given \(Y = 1\):

```
one_iteration <- function(a, b, current){
  # STEP 1: Propose the next chain location
  proposal <- rbeta(1, a, b)
  
  # STEP 2: Decide whether or not to go there
  proposal_plaus <- dbeta(proposal, 2, 3) * dbinom(1, 2, proposal) / 
    dbeta(proposal, a, b)
  current_plaus <- dbeta(current, 2, 3) * dbinom(1, 2, current) / 
    dbeta(current, a, b)
  alpha <- min(1, proposal_plaus / current_plaus)
  next_stop <- sample(c(proposal, current), 
                      size = 1, prob = c(alpha, 1-alpha))
  return(data.frame(proposal, alpha, next_stop))
}
```
```

Subsequently, we write a `betabin_tour()` function which constructs an `N`-length Markov chain tour, utilizing `one_iteration()` to determine each stop:

```
betabin_tour <- function(N, a, b){
  # 1. Start the chain at location 0.5
  current <- 0.5
  
  # 2. Initialize the simulation
  pi <- rep(0, N)
  
  # 3. Simulate N Markov chain stops
  for(i in 1:N){
    # Simulate one iteration
    sim <- one_iteration(a = a, b = b, current = current)
    
    # Record next location
    pi[i] <- sim$next_stop
    
    # Reset the current location
    current <- sim$next_stop
  }
  
  # 4. Return the chain locations
  return(data.frame(iteration = c(1:N), pi))
}
```
```

Though we encourage you to try different tunings of the \(\text{Beta}(a,b)\) proposal model, we’ll keep it simple here, running a 5,000 step tour of the Beta-Binomial posterior by drawing each proposal from the \(\text{Beta}(1,1)\) (or \(\text{Unif}(0,1)\)) model:

```
set.seed(84735)
betabin_sim <- betabin_tour(N = 5000, a = 1, b = 1)

# Plot the results
ggplot(betabin_sim, aes(x = iteration, y = pi)) + 
  geom_line()
ggplot(betabin_sim, aes(x = pi)) + 
  geom_histogram(aes(y = ..density..), color = "white") + 
  stat_function(fun = dbeta, args = list(3, 4), color = "blue")
```

You’re likely not surprised by now that this worked. The tour appears to be stable, random, and provides an excellent approximation of the Beta(3,4) posterior model (which in practice we wouldn’t have access to for comparison). That is, the Metropolis-Hastings algorithm allowed us to utilize draws from the Beta(1,1) proposal model to approximate our Beta(3,4) posterior. Cool.

## 7.6 Proof

This entire chapter and study of the Metropolis-Hastings algorithm has been extra.
A *proof* of how this algorithm actually works is *extra* extra.
Though we’ll skip a formal proof, we’ll examine the underlying spirit.
Throughout our proof, consider constructing a Metropolis-Hastings tour of some *generic* posterior pdf \(f(\mu|y)\) utilizing a proposal pdf \(q(\mu'|\mu)\).
For simplicity, let’s also assume \(\mu\) is a discrete variable.
In general, the Metropolis-Hastings tour moves between different pairs of values, \(\mu\) and \(\mu'\).
Specifically, the chain can move from \(\mu\) to \(\mu'\) with probability

\[P(\mu \to \mu') = P\left(\mu^{(i+1)}=\mu' \; | \; \mu^{(i)}=\mu \right)\]

or from \(\mu'\) to \(\mu\) with probability

\[P(\mu' \to \mu) = P\left(\mu^{(i+1)}=\mu \; | \; \mu^{(i)}=\mu' \right) \; .\]

In order for the Metropolis-Hastings algorithm to *work* (i.e. produce a good posterior approximation), it must preserve the relative posterior plausibility of \(\mu'\) and \(\mu\) in its tour.
That is, the relative probabilities of moving between these two values must satisfy

\[\frac{P(\mu \to \mu')}{P(\mu' \to \mu)} = \frac{f(\mu'|y)}{f(\mu|y)} \; .\]

This is indeed the case for any Metropolis-Hastings algorithm. And we can prove it. First, we need to calculate the movement probabilities \(P(\mu \to \mu')\) and \(P(\mu' \to \mu)\). Consider the former. For the tour to move from \(\mu\) to \(\mu'\), two things must happen: we need to propose \(\mu'\) and then accept the proposal. The associated probabilities of these two events are described by \(q(\mu'|\mu)\) and \(\alpha\) (7.3), respectively. It follows that the chain moves from \(\mu\) to \(\mu'\) with probability

\[P(\mu \to \mu') = q(\mu'|\mu) \cdot \min\left\lbrace 1, \; \frac{f(\mu'|y)}{f(\mu|y)} \frac{q(\mu|\mu')}{q(\mu'|\mu)} \right\rbrace \; .\]

Similarly, the chain moves from \(\mu'\) to \(\mu\) with probability

\[P(\mu' \to \mu) = q(\mu|\mu') \cdot \min\left\lbrace 1, \; \frac{f(\mu|y)}{f(\mu'|y)}\frac{q(\mu'|\mu)}{q(\mu|\mu')} \right\rbrace \;.\]

To simplify the ratio of movement probabilities \(P(\mu \to \mu') / P(\mu' \to \mu)\), consider the two potential scenarios:

Scenario 1: \(f(\mu'|y) \ge f(\mu|y)\)

When \(\mu'\) is at least as plausible as \(\mu\), \(P(\mu \to \mu')\) simplifies to \(q(\mu'|\mu)\) and

\[\frac{P(\mu \to \mu')}{P(\mu' \to \mu)} = \frac{q(\mu'|\mu)}{q(\mu|\mu') \frac{f(\mu|y)}{f(\mu'|y)}\frac{q(\mu'|\mu)}{q(\mu|\mu')} } = \frac{f(\mu'|y)}{f(\mu|y)} \; .\]

Scenario 2: \(f(\mu'|y) < f(\mu|y)\)

When \(\mu'\) is less plausible than \(\mu\), \(P(\mu' \to \mu)\) simplifies to \(q(\mu|\mu')\) and

\[\frac{P(\mu \to \mu')}{P(\mu' \to \mu)} = \frac{q(\mu'|\mu) \frac{f(\mu'|y)}{f(\mu|y)}\frac{q(\mu|\mu')}{q(\mu'|\mu)}}{q(\mu|\mu')} = \frac{f(\mu'|y)}{f(\mu|y)} \; .\]

Thus no matter the scenario, the Metropolis-Hastings algorithm preserves the relative likelihood of any pair of values \(\mu\) and \(\mu'\)!
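Though we stated the proof in generic notation, the claim is also easy to check numerically. Below is a minimal Python sketch (a translation of our R independence sampler above; the function names `beta_pdf`, `target`, and `move_prob` are ours, not from the text) confirming that \(P(\mu \to \mu') / P(\mu' \to \mu) = f(\mu'|y)/f(\mu|y)\) for one arbitrary pair of chain values in the Beta-Binomial setting:

```python
from math import gamma, comb

def beta_pdf(x, a, b):
    # Beta(a, b) density evaluated at x
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

def target(p):
    # Unnormalized posterior: Beta(2, 3) prior times the Binomial likelihood
    # of observing Y = 1 success in n = 2 trials
    return beta_pdf(p, 2, 3) * comb(2, 1) * p**1 * (1 - p)**1

def move_prob(frm, to, a=1, b=1):
    # Probability the independence sampler moves frm -> to:
    # propose `to` from Beta(a, b), then accept with probability alpha
    q_to, q_frm = beta_pdf(to, a, b), beta_pdf(frm, a, b)
    alpha = min(1, (target(to) / q_to) / (target(frm) / q_frm))
    return q_to * alpha

pi, pi_prime = 0.3, 0.6
lhs = move_prob(pi, pi_prime) / move_prob(pi_prime, pi)
rhs = target(pi_prime) / target(pi)
print(abs(lhs - rhs) < 1e-12)  # True: the tour preserves relative plausibility
```

The same check passes for any pair of values in (0, 1) and any Beta(a, b) proposal tuning.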

## 7.7 Variations on the theme

We’ve merely considered one MCMC algorithm here, the Metropolis-Hastings.
Though this is a powerful and foundational algorithm, it has its limits.
In future chapters, we’ll muddy the waters, building upon the complexity of our Bayesian models.
The dimensions of these models will grow to include multiple parameters, as opposed to the single \(\mu\) parameter in our Normal-Normal model and \(\pi\) parameter in our Beta-Binomial model.
In turn, these multivariate models require multi-dimensional Markov chain tours.
If we were to simulate these models from scratch, tuning our Metropolis-Hastings algorithm would become more and more difficult.
Though Metropolis-Hastings isn’t a one-size-fits-all algorithm for *every* possible Bayesian modeling scenario, it *does* provide the powerful and necessary foundation for a more flexible set of MCMC tools: the adaptive Metropolis-Hastings algorithm, the Gibbs sampler, and Hamiltonian Monte Carlo.
Again, studying these alternative algorithms would require a book itself.
Instead, from here on out, we’ll rely on **rstan** with a fresh confidence in what’s going on under the hood.

## 7.8 Chapter summary

In Chapter 7, you built a strong **conceptual understanding** of the foundational **Metropolis-Hastings** MCMC algorithm.
You also implemented this algorithm to study the familiar Normal-Normal and Beta-Binomial models.
Whether in these relatively simple one-parameter model settings, or in more complicated model settings, the Metropolis-Hastings algorithm produces an approximate sample from the posterior by iterating between two steps:

1. **Propose a new chain location** by drawing from a proposal pdf which is, perhaps, dependent upon the current location.
2. **Determine whether to accept the proposal.** Simply put, whether or not we accept a proposal depends on how favorable its posterior plausibility is relative to the posterior plausibility of the current location.
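In code, these two steps reduce to just a few lines. Here is a minimal, generic Python sketch (not the book's R implementation; the function name `mh_tour` and the choice of a Normal proposal are ours). Since the Normal proposal is symmetric, the \(q\) ratio cancels and the acceptance probability reduces to the Metropolis form:

```python
import math
import random

def mh_tour(n, log_target, sigma, start=0.0, seed=84735):
    # Tour the distribution with unnormalized log pdf `log_target`,
    # using a symmetric Normal(current, sigma) proposal model.
    rng = random.Random(seed)
    chain, current = [], start
    for _ in range(n):
        # Step 1: propose the next chain location
        proposal = rng.gauss(current, sigma)
        # Step 2: accept with probability min(1, target(proposal) / target(current)),
        # computed on the log scale for numerical stability
        if math.log(rng.random()) < log_target(proposal) - log_target(current):
            current = proposal
        chain.append(current)
    return chain

# Example target: an unnormalized standard Normal pdf
chain = mh_tour(5000, lambda m: -0.5 * m**2, sigma=1.0)
```

Swapping in the log of any unnormalized posterior, such as our Normal-Normal or Gamma-Poisson products of prior and likelihood, reproduces the kinds of tours built in this chapter.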

## 7.9 Exercises

Chapter 7 continued to shift focus towards more computational methods. Change can be uncomfortable, but it also spurs growth. As you work through these exercises, know that if you are struggling a bit, that just means you are engaging with some new concepts and you are putting in the work to grow as a Bayesian. Our advice: verbalize what you do and don’t understand. Don’t rush yourself. Take a break and come back to exercises that you feel stuck on. Work with a buddy. Ask for help, and help others when you can.

### 7.9.1 Conceptual exercises

**Exercise 7.1 (Getting to know Monte Carlo)**

- What are the steps to the Monte Carlo Algorithm?
- Name one pro and one con for the Monte Carlo Algorithm.

**Exercise 7.2 (Getting to know Metropolis-Hastings)**

- What are the steps to the Metropolis-Hastings Algorithm?
- What’s the difference between the Monte Carlo and Metropolis-Hastings algorithms?
- What’s the difference between the Metropolis and Metropolis-Hastings algorithms?
- Name one pro and one con for the Metropolis-Hastings algorithm relative to the Metropolis.

**Exercise 7.3 (Metropolis vs. Metropolis-Hastings)** Select all of the correct endings to this sentence:

*The difference between the Metropolis and Metropolis-Hastings algorithms is that…*

- Metropolis accepts all proposals, whereas Metropolis-Hastings accepts only some proposals.
- Metropolis uses a symmetric proposal model, whereas a Metropolis-Hastings proposal model is not necessarily symmetric.
- The acceptance probability is simpler to calculate for the Metropolis than for the Metropolis-Hastings.
- The acceptance probability is different for the Metropolis and Metropolis-Hastings.

**Exercise 7.4 (Tuning the Metropolis-Hastings)** In this exercise you will consider how to tune your Metropolis-Hastings proposal model.

- Draw a trace plot for a Metropolis-Hastings tour where the proposal model is Normal with a very small variance.
- Why is it problematic if our proposal model defines the neighborhood too narrowly, such as when the proposal model is Normal with a very small variance?
- Draw a trace plot for a Metropolis-Hastings tour where the proposal model is Normal with a very large variance.
- Why is it problematic if our proposal model defines the neighborhood too widely?
- Draw a trace plot for a Metropolis-Hastings tour where the proposal model is Normal with a variance that is neither too small nor too large.
- Describe how you would go about finding an appropriate variance for a Normal proposal model.

**Exercise 7.5 (Independence sampling: True or False)** Identify whether each statement below about independence sampling is True or False. If False, explain.

- The proposal model depends on the current value of the chain.
- The proposal model is the same every time.
- It is a special case of the Metropolis-Hastings algorithm.
- It is a special case of Metropolis algorithm.

**Exercise 7.6 (Proposing a new location)** In each situation below, complete Step 1 of the Metropolis-Hastings algorithm. That is, starting from the given current chain value \(\lambda^{(i)} = \lambda\) and with `set.seed(84735)` each time, use the given proposal model to draw a \(\lambda'\) proposal value for the next chain value \(\lambda^{(i+1)}\). NOTE: The functions `rnorm()` and `runif()` will come in handy.

- \(\lambda = 4.6\), \(\lambda'|\lambda \sim N(\lambda, 2^2)\)
- \(\lambda = 2.1\), \(\lambda'|\lambda \sim N(\lambda, 7^2)\)
- \(\lambda = 8.9\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-2, \lambda+2)\)
- \(\lambda = 1.2\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-0.5, \lambda+0.5)\)
- \(\lambda = 7.7\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-3, \lambda+3)\)

**Exercise 7.7 (Calculate the acceptance probability: Part I)** Suppose that a Markov chain is currently at \(\lambda^{(i)}=2\) and that the proposal for \(\lambda^{(i+1)}\) is \(\lambda'=2.1\). For each pair of unnormalized posterior pdf \(f(\lambda) L(\lambda |y)\) and proposal pdf \(q(\lambda'|\lambda)\), calculate the acceptance probability \(\alpha\) used in Step 2 of the Metropolis-Hastings algorithm (7.3).

- \(f(\lambda) L(\lambda |y)= \lambda^{-2}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 1)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{\lambda}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 0.2)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{-1.2\lambda}\), \({\lambda'|\lambda} \sim \text{Unif}(\lambda - 0.5, \lambda + 0.5)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{-\lambda^4}\), \({\lambda'|\lambda} \sim \text{Exp}(\lambda)\) with pdf \(q(\lambda'|\lambda)\)
- For which of these scenarios is there a 100% acceptance probability? Explain why we’d certainly want to accept \(\lambda'\) in these scenarios.

**Exercise 7.8 (Calculate the acceptance probability: Part II)** Repeat the above exercise, this time assuming that a Markov chain is currently at \(\lambda^{(i)} = 1.8\) and that the proposal for \(\lambda^{(i+1)}\) is \(\lambda' = 1.6\).

- \(f(\lambda) L(\lambda |y)= \lambda^{-1}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 2)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{3\lambda}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 0.5)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{-1.9\lambda}\), \({\lambda'|\lambda} \sim \text{Unif}(\lambda-0.3, \lambda+0.3)\) with pdf \(q(\lambda'|\lambda)\)
- \(f(\lambda) L(\lambda |y)= e^{-\lambda^4}\), \({\lambda'|\lambda} \sim \text{Exp}(\lambda)\) with pdf \(q(\lambda'|\lambda)\)
- For which of these scenarios is there a 100% acceptance probability? Explain why we’d certainly want to accept \(\lambda'\) in these scenarios.

### 7.9.2 Practice: Gamma-Poisson simulation

In the next set of exercises, return to the Bayesian model setting of (7.1),

\[\begin{split} Y|\lambda & \sim \text{Pois}(\lambda) \\ \lambda & \sim \text{Gamma}(1, 1) \\ \end{split}\]

Assume we observe \(Y = 0\) and wish to construct a Metropolis-Hastings simulation of the corresponding posterior model for \(\lambda\).

**Exercise 7.9 (One iteration with a Normal proposal model)** The function `one_mh_iteration()` from the text utilizes a Normal proposal model, \(\lambda'|\lambda \sim N(\lambda,\sigma^2)\). Starting from a current value of \(\lambda = 1\) and using `set.seed(84735)` each time, run the code below and comment on the returned `proposal`, `alpha`, and `next_stop` values.

```
one_mh_iteration(sigma = 1, current = 1)
one_mh_iteration(sigma = 0.1, current = 1)
one_mh_iteration(sigma = 2, current = 1)
one_mh_iteration(sigma = 0.01, current = 1)
one_mh_iteration(sigma = 3, current = 1)
```

**Exercise 7.10 (An entire tour with a Normal proposal model)** Implement the Metropolis-Hastings function `mh_tour()` defined in Section 7.3 to construct tours of \(\lambda\) under each of the following scenarios. Construct trace plots and histograms for each tour.

- 50 iterations, \(\sigma = 3\)
- 50 iterations, \(\sigma = 0.01\)
- 1000 iterations, \(\sigma = 3\)
- 1000 iterations, \(\sigma = 0.01\)
- Contrast the trace plots in parts a and b. Explain why changing \(\sigma\) has this effect.
- Consider the results in parts c and d. Is the \(\sigma\) value as important when the number of iterations is much larger? Explain.

**Exercise 7.11 (Changing the proposal model)** For this exercise, modify `one_mh_iteration()` to create a new function, `one_mh_iteration_unif()`, which utilizes a Uniform proposal model with width \(w\):

\[\lambda'|\lambda \sim \text{Unif}\left(\lambda - \frac{w}{2}, \; \lambda + \frac{w}{2} \right)\]

Subsequently, starting from a current value of \(\lambda = 1\) and using `set.seed(84735)` each time, run this function under each setting below. Comment on the returned `proposal`, `alpha`, and `next_stop` values.

```
one_mh_iteration_unif(w = 1, current = 1)
one_mh_iteration_unif(w = 0.1, current = 1)
one_mh_iteration_unif(w = 2, current = 1)
one_mh_iteration_unif(w = 0, current = 1)
one_mh_iteration_unif(w = 8, current = 1)
```

**Exercise 7.12 (Metropolis-Hastings tour with Uniform proposals)** Upon completing the previous exercise, modify `mh_tour()` to create a new function, `mh_tour_unif()`, which constructs a chain of \(\lambda\) values using a Uniform proposal model with width \(w\). Subsequently, using `set.seed(84735)`, run this function under each setting below and construct a trace plot of the chain.

- 20 iterations, \(w = 10\)
- 20 iterations, \(w = 0.01\)
- 1000 iterations, \(w = 0.01\)
- 1000 iterations, \(w = 0.1\)
- 1000 iterations, \(w = 1\)
- 1000 iterations, \(w = 3\)
- Contrast the trace plots in a and b. Explain in simple terms why changing the width of the Uniform proposal model causes these differences.
- Based on the trace plots in c through f, which width do you think is the best for the Uniform proposal model? Explain your reasoning.

**Exercise 7.13 (Change the Gamma prior)** The `one_mh_iteration(sigma, current)` function is tailored to the Gamma-Poisson model (7.1) with a Gamma(1,1) prior for \(\lambda\). For this exercise, create a new function `new_mh_iteration(sigma, current, s, r)` which can utilize *any* Gamma(s, r) prior. Subsequently, starting from a current value of \(\lambda = 1\) and using `set.seed(84735)` each time, run this function under each setting below. Comment on the resulting `proposal`, `alpha`, and `next_stop` values.

```
new_mh_iteration(sigma = 1, current = 1, s = 2, r = 2)
new_mh_iteration(sigma = 0.1, current = 1, s = 3, r = 1)
new_mh_iteration(sigma = 2, current = 1, s = 1, r = 4)
new_mh_iteration(sigma = 3, current = 1, s = 0.5, r = 0.5)
new_mh_iteration(sigma = 4, current = 1, s = 20, r = 15)
```

**Exercise 7.14 (A new Gamma-Poisson)** Consider a new Gamma-Poisson model in which rate \(\lambda\) has a Gamma(1, 0.1) prior and you observe one Poisson data point, \(Y = 4\). In this exercise, you will simulate the posterior model of \(\lambda\) using an **independence sampler**.

- Which of the following would make the most reasonable proposal model to use in your independence sampler for \(\lambda\): Normal, Uniform, Beta, or Exponential? Hint: recall that \(\lambda > 0\).
- Using the proposal model that you identified in part a, simulate a tour of 1000 \(\lambda\) values. Tune your algorithm until you are satisfied and produce a trace plot of your final results.
- This is a situation where we *can* derive the exact posterior model of \(\lambda\). What is it?
- Plot a histogram of your tour and overlay the exact model. How good is your Markov chain approximation?

### 7.9.3 Practice: Simulating more Bayesian models

In this section you will apply your new simulation skills to answer your *own* research question using your own data!
In each case, identify a question for which data collection takes fewer than 5 minutes of effort.
(That part is supposed to be fun, not overwhelming.)

**Exercise 7.15 (Using your own data: Beta-Binomial Metropolis-Hastings)**

- Identify a question that could be answered with a Beta-Binomial model for \(\pi\), some probability or proportion of interest. *Eg: Consider scenarios for which data collection would be easy, such as the proportion of emails that you reply to within 24 hours, the proportion of phone calls that you answer, etc.*
- Tune and describe a Beta prior model for \(\pi\).
- Collect and record your data here.
- Below you will simulate the posterior model of \(\pi\) using a Metropolis-Hastings algorithm. What is a reasonable proposal model to use in this algorithm? Explain.
- Simulate a tour of 2000 \(\pi\) values. Tune your proposal model until you are satisfied and construct the resulting trace plot.
- Plot a histogram of your tour, and comment on the posterior model approximation within the context of your question.

**Exercise 7.16 (Using your own data: Normal-Normal MCMC)**

- Identify a question that could be answered with a Normal-Normal model for \(\mu\), some average of interest. *Eg: Consider scenarios for which data collection would be easy, such as the average high temperature this time of year, your average hourly screen time, etc.*
- Tune and describe a Normal prior model for \(\mu\).
- Collect and record your data here.
- Below you will simulate the posterior model of \(\mu\) using a Metropolis-Hastings algorithm. What is a reasonable proposal model to use in this algorithm? Explain.
- Simulate a tour of 2000 \(\mu\) values. Tune your proposal model until you are satisfied and construct the resulting trace plot.
- Plot a histogram of your tour, and comment on the posterior model approximation within the context of your question.

**Exercise 7.17 (Using your own data: Normal-Normal Independence Sampler)** Repeat the previous exercise, this time using an independence sampler.

Answers: Rule 1 produces Tour 2, Rule 2 produces Tour 3, and Rule 3 produces Tour 1.↩︎

Answers: Tour 1 uses \(w = 100\), Tour 2 uses \(w = 0.01\), Tour 3 uses \(w = 1\).↩︎

This technical term was inspired by the “Goldilocks and the three bears” fairy tale in which, for some reason, a child (Goldilocks) taste tests three different bears’ porridge while trespassing in the bears’ house. The first bowl of porridge is too hot, the second is too cold, and the third is just right.↩︎

Per the Beta-Binomial results of Chapter 3, the Beta posterior has parameters 3 (\(\alpha + y = 2 + 1\)) and 4 (\(\beta + n - y = 3 + 2 - 1\)).↩︎