Chapter 7 MCMC Under the Hood

Life is full of choices. We present you with one here: either read this chapter or skip it. Of course, you have this choice with any chapter. You do you. The current chapter is truly optional at this point in your Bayesian studies. In Chapter 6, we discussed the need for MCMC posterior simulation in advanced Bayesian analyses and how to achieve these ends using the rstan package. With this tool in hand, you have everything you need to move on and analyze more sophisticated Bayesian models in Unit 3 and beyond. Mainly, you can drive from point a to point b without a working understanding of the engine. However, whether it’s now or later, it will be beneficial to your deeper understanding of Bayesian methodology to come back and learn more about how MCMC methods actually work. We’ll take a peak under the hood in Chapter 7 through the exploration of the fundamental Metropolis-Hastings MCMC algorithm.

We’ll start our exploration by applying Metropolis-Hastings to the following Gamma-Poisson model where \(Y\) is the number of events in a one hour period, where events happen at rate \(\lambda\):

\[\begin{equation} \begin{split} Y|\lambda & \sim \text{Pois}(\lambda) \\ \lambda & \sim \text{Gamma}(1, 1) \\ \end{split} \tag{7.1} \end{equation}\]

These models have corresponding likelihood function \(L(\lambda|y)\) and prior pdf \(f(\lambda)\) for \(y \in \{0,1,2,...\}\) and \(\lambda > 0\):

\[\begin{equation} L(\lambda|y) = \frac{\lambda^ye^{-\lambda}}{y!} \;\;\;\;\; \text{ and } \;\;\;\;\; f(\lambda) = e^{-\lambda} \; . \tag{7.2} \end{equation}\]

Suppose we observe \(Y = 0\). Then by our work in Chapter 5, the updated posterior model of \(\lambda\) is Gamma with parameters 1 (\(s + Y = 1 + 0\)) and 2 (\(r + n = 1 + 1\)):

\[\lambda | (Y = 0) \sim \text{Gamma}(1, 2) \; .\]

As discussed in Chapter 6, if we weren’t able to specify this posterior model of \(\lambda\), we could approximate it using MCMC simulation (via rstan). The results of a potential \(N = 10000\) iteration MCMC simulation are shown in Figure 7.1. Ultimately, you can think of the illustrated Markov chain \(\left\lbrace \lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(N)} \right\rbrace\) as a tour around the range of posterior plausible values of \(\lambda\) and yourself as the tour manager. The trace plot (left) illustrates the sequence of tour stops, \(\lambda^{(i)}\). As tour manager, it’s your job to ensure that the density of tour stops in each \(\lambda\) neighborhood is proportional to its posterior plausibility. That is, the chain should more time touring values of \(\lambda\) between 0 and 1 where the Gamma(1,2) posterior pdf \(f(\lambda | (y=0))\) is greatest and less time visiting \(\lambda\) values greater than 2 where \(f(\lambda | (y=0))\) drops off. This feature is crucial to producing a collection of tour stops that accurately approximate the posterior, as does the tour here (right).

A trace plot (left) and histogram (right) of a 10,000 iteration MCMC simulation of the Gamma(1,2) posterior. The posterior pdf is superimposed in red (right).

FIGURE 7.1: A trace plot (left) and histogram (right) of a 10,000 iteration MCMC simulation of the Gamma(1,2) posterior. The posterior pdf is superimposed in red (right).

The remaining question is not whether we can construct a Markov chain tour that accurately approximates the posterior (we can!), but how. In Chapter 7 you’ll explore the MCMC Metropolis-Hastings algorithm . Though Metropolis-Hastings doesn’t provide a one-size-fits-all algorithm for every possible Bayesian modeling scenario, it provides the powerful and necessary foundation for a more flexible set of MCMC tools. You will learn how to implement the Metropolis-Hastings, from scratch, in R. Though this implementation will require computer programming skills that are otherwise outside the scope of this book (eg: writing functions and for loops), we’ll keep the focus on the concepts so that all can follow along. Most of the code will rely on base R functions but we will rely on the tidyverse for visualizations and wrangling:

library(tidyverse)
  • Build a strong conceptual understanding of how Markov chain algorithms work.
  • Explore the foundational Metropolis-Hastings algorithm.
  • Implement the Metropolis-Hastings algorithm in the Gamma-Poisson and Beta-Binomial settings.

7.1 The big idea

Recall that as a Markov chain tour manager, your goal is to construct a tour \(\left\lbrace \lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(N)} \right\rbrace\) of the posterior pdf \(f(\lambda | (y=0))\). You can meet this goal with the Metropolis-Hastings algorithm which, at the high level, iterates through a two-step process. Assuming the Markov chain is at location \(\lambda^{(i)} = \lambda\) at iteration or “tour stop” \(i\), the next tour stop \(\lambda^{(i + 1)}\) is selected as follows:

  • Step 1: Propose a random location, \(\lambda'\), for the next tour stop.
  • Step 2: Decide whether to go to the proposed location (\(\lambda^{(i+1)} = \lambda'\)) or to stay at the current location for another iteration (\(\lambda^{(i+1)} = \lambda\)).

This might seem easy! For example, if there were no constraints on your tour plan, you could simply draw proposed tour stops \(\lambda'\) from posterior pdf \(f(\lambda' | (y = 0))\) (Step 1) and then go there (Step 2). This special case of the Metropolis-Hastings algorithm has a special name – it’s the Monte Carlo algorithm.

Monte Carlo algorithm

To construct a Monte Carlo sample for posterior pdf \(f(\lambda|y)\), \(\left\lbrace \lambda^{(1)}, \lambda^{(2)}, ..., \lambda^{(N)}\right\rbrace\), select each tour stop \(\lambda^{(i)} = \lambda\) as follows:

  • Step 1: Propose a location.
    Draw a location \(\lambda\) from the posterior model with pdf \(f(\lambda | y)\).
  • Step 2: Go there.

The Monte Carlo algorithm is so convenient! In fact, without naming it, we utilized the Monte Carlo algorithm in Chapter 2 to simulate various posteriors. If implemented here, this algorithm would produce a nice independent sample from the Gamma(1,2) posterior which would, in turn, provide an accurate approximation of this posterior (assuming the sample size is big enough). BUT there’s a glitch. Remember that we only need to MCMC approximate a Bayesian posterior when that posterior is too complicated to specify. And if a posterior is too complicated to specify, it’s typically too complicated for us to directly sample or draw from. And if a posterior is too complicated to sample, we can’t implement the Monte Carlo algorithm (specifically, Step 1). This is where the more general Metropolis-Hastings MCMC algorithm comes in handy.

Let’s again pretend that we don’t know the posterior model to be Gamma(1,2) and that the only information we do know is that the posterior pdf is proportional to the product of the known prior pdf and likelihood function defined by (7.2),

\[f(\lambda | (y=0)) \propto f(\lambda)L(\lambda|(y=0)). \]

This unnormalized pdf is drawn in Figure 7.2.

The unnormalized Gamma(1,2) posterior pdf.

FIGURE 7.2: The unnormalized Gamma(1,2) posterior pdf.

Not knowing the complete normalized posterior, we must propose the next tour stop by sampling from some model other than the Gamma(1,2) in Step 1 of the Metropolis-Hastings algorithm. A common choice is to utilize a Normal proposal model. Figure 7.3 captures the idea in pictures. Suppose the Markov chain tour is at location “1” in the \(i\)th iteration: \(\lambda^{(i)} = 1\). Conditioned on this current location, we propose the next location by taking a random draw from a Normal model centered at this location with standard deviation of \(\sigma = 0.3\), \(N(1, \sigma^2)\). In general, the conditional dependence of a proposal \(\lambda'\) on the chain’s current location \(\lambda^{(i)} = \lambda\) is modeled by

\[\lambda' | \lambda \; \sim \; N(\lambda, \sigma^2) \; .\]

The chosen standard deviation of \(\sigma = 0.3\) plays an important role, defining the neighborhood of potential proposals. Revisiting Figure 7.3, notice that the proposed next stop will likely be within the neighborhood stretching from 0.1 to 1.9 (\(1 \pm 3\sigma\)) around the chain’s current location of 1. It’s even more likely to be between 0.7 and 1.3 where the Normal pdf is greatest (\(1 \pm \sigma\)).

A visual representation of Step 1 in the Metropolis-Hastings algorithm for the Gamma(1,2) posterior. A Normal proposal model centered at the current chain location of 1 with a standard deviation of \(\sigma = 0.3\) (black curve) is drawn against the unnormalized posterior pdf.

FIGURE 7.3: A visual representation of Step 1 in the Metropolis-Hastings algorithm for the Gamma(1,2) posterior. A Normal proposal model centered at the current chain location of 1 with a standard deviation of \(\sigma = 0.3\) (black curve) is drawn against the unnormalized posterior pdf.

The whole idea behind Step 1 of the Metropolis-Hastings algorithm might seem goofy. How can proposals drawn from a Normal model produce a decent approximation of the Gamma posterior model?!? Well, they’re only proposals. As with any other proposal in life, they can thankfully be rejected or accepted. Mainly, if a proposed location \(\lambda'\) is “bad,” we can reject it. When we do, the chain sticks around at its current location \(\lambda\) for at least another iteration.

Step 2 of the Metropolis-Hastings algorithm provides a formal process for deciding whether to accept or reject a proposal. Let’s first check in with our intuition about how this formal process should work. Revisiting Figure 7.3, suppose that our random \(N(1, 0.3^2)\) draw proposes that the chain move from its current location of 1 to 0.6. Does this proposal seem desirable to you? Well, sure! Notice that the (unnormalized) posterior plausibility of 0.6 is greater than that of 1. Thus we want our Markov chain tour to spend more time exploring values of \(\lambda\) around 0.6 than around 1. Accepting the proposal gives us the chance to do so. In contrast, if our random \(N(1, 0.3^2)\) draw proposed that the chain move from 1 to a 2, a location with very low posterior plausibility, we might be more hesitant. Consider three possible rules for automating Step 2 in the following quiz.51

Suppose we start our Metropolis-Hastings Markov chain tour at location \(\lambda^{(1)} = 1\) and utilize a Normal proposal model in Step 1 of the algorithm. Consider three possible rules to follow in Step 2, deciding whether or not to accept a proposal:

  • Rule 1: Never accept the proposed location.
  • Rule 2: Always accept the proposed location.
  • Rule 3: Only accept the proposed location if its (unnormalized) posterior plausibility is greater than that of the current location.

Each rule above was used to generate one of the Markov chain tours below. Match each rule to the correct tour.

The quiz above presented you with three very poor options for determining whether to accept or reject proposed tour stops in Step 2 of the Metropolis-Hastings algorithm. Rule 1 presents one extreme: never accept a proposal. This is a terrible idea. It results in the Markov chain remaining at the same location at every iteration (Tour 2), which would certainly produce a silly approximation of the posterior. Rule 2 presents the opposite extreme: always accept a proposal. This results in a Markov chain which is not at all discerning in where it travels (Tour 3), completely ignoring the information we have from the (unnormalized) posterior model regarding the plausibility of a proposal. For example, Tour 3 spends ample time exploring \(\lambda\) values above 2 and even negative \(\lambda\) values despite the fact that these are implausible and impossible posterior values, respectively.

Rule 3 might seem like a reasonable balance between the two extremes: it neither always rejects or always accepts proposals. However, it’s still problematic. Since this rule only accepts a proposed stop if its posterior plausibility is greater than that at the current location, it ends up producing a Markov chain similar to that of Tour 1 above. Though this chain floats toward values near \(\lambda = 0\) where the (unnormalized) posterior pdf is greatest, it then gets stuck there forever.

Putting all of this together, we’re closer to understanding how to make the Metropolis-Hastings algorithm work. Upon proposing a next tour stop (Step 1), the process for rejecting or accepting this proposal (Step 2) must embrace the idea that the chain should spend more time exploring areas of high posterior plausibility but shouldn’t get stuck there forever:

  • Step 1: Propose a location, \(\lambda'\), for the next tour stop by taking a draw from a proposal model.
  • Step 2: Decide whether to go to the proposed location (\(\lambda^{(i+1)} = \lambda'\)) or to stay at the current location for another iteration (\(\lambda^{(i+1)} = \lambda\)) as follows.
    • If the (unnormalized) posterior plausibility of the proposed location \(\lambda'\) is greater than that of the current location \(\lambda\), \(f(\lambda')L(\lambda'|y) > f(\lambda)L(\lambda|y)\), definitely go there.
    • Otherwise, maybe go there.

There are more details to fill in here (eg: what does it mean to “maybe” accept a proposal?!). We’ll do that next.

7.2 The Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm for constructing a Markov chain tour \(\left\lbrace \lambda^{(1)}, \lambda^{(2)}, ..., \lambda^{(N)}\right\rbrace\) is formalized here. We’ll break down the details below, exploring how to implement this algorithm in the context of our Gamma-Poisson Bayesian model.

Metropolis-Hastings algorithm

Conditioned on data \(y\), let parameter \(\lambda\) have posterior pdf \(f(\lambda | y) \propto f(\lambda) L(\lambda |y)\). A Metropolis-Hastings Markov chain for \(f(\lambda|y)\), \(\left\lbrace \lambda^{(1)}, \lambda^{(2)}, ..., \lambda^{(N)}\right\rbrace\), evolves as follows. Let \(\lambda^{(i)} = \lambda\) be the location of the chain at iteration \(i \in \{1,2,...,N-1\}\) and identify the next location \(\lambda^{(i+1)}\) through a two-step process:

  • Step 1: Propose a new location.
    Conditioned on the current location \(\lambda\), draw a location \(\lambda'\) from a proposal model with pdf \(q(\lambda'|\lambda)\).
  • Step 2: Decide whether or not to go there.
    • Calculate the acceptance probability, ie. the probability of accepting the proposal:
      \[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\lambda')L(\lambda'|y)}{f(\lambda)L(\lambda|y)} \frac{q(\lambda|\lambda')}{q(\lambda'|\lambda)} \right\rbrace \tag{7.3} \end{equation}\]

    • Flip a weighted coin. If it’s Heads, with probability \(\alpha\), go to the proposed location. If it’s Tails, with probability \(1 - \alpha\), stay:

      \[\lambda^{(i+1)} = \begin{cases} \lambda' & \text{ with probability } \alpha \\ \lambda & \text{ with probability } 1- \alpha \\ \end{cases}\]

Though the notation and details are new, the algorithm above matches the concepts we developed in the previous section. First, recall that for our Gamma-Poisson simulation, we utilized a \(\lambda'|\lambda \; \sim N(\lambda, 0.3^2)\) proposal model in Step 1. In fact, this is a bit lazy. Though our Gamma-Poisson posterior model is defined for positive \(\lambda\), the Normal proposal model lives on the entire real number line thus sometimes proposes that the chain explore negative \(\lambda\) (which we automatically reject). However, utilizing a Normal proposal model simplifies the Metropolis-Hastings algorithm by the fact that it’s symmetric:

\[q(\lambda' | \lambda) = q(\lambda | \lambda') \;. \]

Conceptually, this symmetry means that the chance of proposing a chain move from \(\lambda\) to \(\lambda'\) is the same as proposing a move from \(\lambda'\) to \(\lambda\). Figure 7.4 highlights this concept, here demonstrating that the Normal model is equally likely to propose a move from 1 to 0.6 as a move from 0.6 to 1.

Utilizing a symmetric Normal proposal model, the Metropolis-Hastings algorithm is equally likely to propose that the tour move from 1 to 0.6 as to propose that the tour move from 0.6 to 1.

FIGURE 7.4: Utilizing a symmetric Normal proposal model, the Metropolis-Hastings algorithm is equally likely to propose that the tour move from 1 to 0.6 as to propose that the tour move from 0.6 to 1.

A proposal model needn’t be symmetric. However, when it is, we refer to this special case of the Metropolis-Hastings algorithm as simply the Metropolis algorithm.

Metropolis algorithm

The Metropolis algorithm is a special case of the Metropolis-Hastings in which the proposal model is symmetric. That is, the chance of proposing a move to \(\lambda'\) from \(\lambda\) is equal to that of proposing a move to \(\lambda\) from \(\lambda'\): \(q(\lambda'|\lambda) = q(\lambda|\lambda')\). Thus the acceptance probability (7.3) simplifies to

\[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\lambda')L(\lambda'|y)}{f(\lambda)L(\lambda|y)} \right\rbrace \; . \tag{7.4} \end{equation}\]

Inspecting (7.4) reveals the nuances of Step 2, determining whether to accept or reject the proposed location drawn in Step 1. First, notice that we can rewrite the acceptance probability \(\alpha\) as follows:

\[\alpha = \min\left\lbrace 1, \; \frac{f(\lambda')L(\lambda'|y) / f(y)}{f(\lambda)L(\lambda|y) / f(y)} \right\rbrace = \min\left\lbrace 1, \; \frac{f(\lambda'|y)}{f(\lambda|y)} \right\rbrace \; .\]

That is, the ratio of unnormalized posterior pdfs (which we can calculate) is equivalent to the ratio of posterior pdfs (which we can’t calculate). Thus the probability of accepting a move from a current location \(\lambda\) to a proposed location \(\lambda'\) comes down to a comparison of their posterior plausibility: \(f(\lambda'|y)\) versus \(f(\lambda|y)\). There are two possible scenarios here:

  • Scenario 1: \(f(\lambda'|y) \ge f(\lambda|y)\)
    When the posterior plausibility of \(\lambda'\) is at least as great as that of \(\lambda\), \(\alpha = 1\). Thus we’ll definitely move there.

  • Scenario 2: \(f(\lambda'|y) < f(\lambda|y)\)
    If the posterior plausibility of \(\lambda'\) is less than that of \(\lambda\), then

    \[\alpha = \frac{f(\lambda'|y)}{f(\lambda|y)} < 1 \; .\]

    Thus we might move there. Further, \(\alpha\) approaches 1 as \(f(\lambda'|y)\) nears \(f(\lambda|y)\). That is, the probability of accepting the proposal increases with the plausibility of \(\lambda'\) relative to \(\lambda\).

Scenario 1 is straightforward. We’ll always jump at the chance to move our tour to a more plausible region of the posterior. To wrap our minds around Scenario 2, a little R simulation is helpful. For example, suppose our Markov tour is currently at location “1”:

current <- 1

To determine the next tour stop, we first propose a location by taking a random draw from the Normal model centered at the current value with standard deviation 0.3 (Step 1):

set.seed(4)
proposal <- rnorm(1, mean = current, sd = 0.3)
proposal
[1] 1.065

Revisiting Figure 7.2, we observe that the posterior plausibility of the proposed tour location (1.065) is slightly less than that of the current location (1). In fact, we can calculate the exact (unnormalized) posterior plausibility of these two \(\lambda\) locations, \(f(\lambda)L(\lambda|(y=0))\), utilizing dgamma() to evaluate the Gamma prior pdf \(f(\lambda)\) and dpois() to evaluate the Poisson likelihood function \(L(\lambda|(y=0))\):

proposal_plaus <- dgamma(proposal,1,1) * dpois(0,proposal)
proposal_plaus
[1] 0.1188
current_plaus  <- dgamma(current,1,1) * dpois(0,current)
current_plaus 
[1] 0.1353

It follows that, though not certain, the probability \(\alpha\) of accepting and subsequently moving to the proposed location is relatively high:

alpha <- min(1, proposal_plaus / current_plaus)
alpha
[1] 0.878

To make the final determination, we flip a weighted coin which accepts the proposal with probability \(\alpha\) (0.878) and rejects the proposal with probability \(1 - \alpha\) (0.122). In a random flip of this coin using the sample() function, we accept the proposal, meaning that the next_stop on the tour is 1.065:

next_stop <- sample(c(proposal, current), size = 1, 
  prob = c(alpha, 1-alpha))
next_stop
[1] 1.065

This is merely one of countless possible outcomes for a single iteration of the Metropolis-Hastings algorithm for our Gamma(1,2) posterior. To streamline this process, we’ll write our own R function, one_mh_iterion(), which implements a single Metropolis-Hastings iteration starting from any given current tour stop and utilizing a Normal proposal model with any given sigma parameter (\(\sigma\)). If you are new to writing functions, we encourage you to focus on the structure over the details of this code:

one_mh_iteration <- function(sigma, current){
 # STEP 1: Propose the next chain location
 proposal <- rnorm(1, mean = current, sd = sigma)
  
 # STEP 2: Decide whether or not to go there
 if(proposal < 0) {alpha <- 0}
 else {
  proposal_plaus <- dgamma(proposal, 1, 1) * dpois(0, proposal)
  current_plaus  <- dgamma(current, 1, 1) * dpois(0, current)
  alpha <- min(1, proposal_plaus / current_plaus)
 }
 next_stop <- sample(c(proposal, current), 
  size = 1, prob = c(alpha, 1-alpha))
  
 # Return the results
 return(data.frame(proposal, alpha, next_stop))
}

Let’s try it out! Running one_mh_iteration() from our current tour stop of 1 under a seed of 4 reproduces the results from above:

set.seed(4)
one_mh_iteration(sigma = 0.3, current = 1)
  proposal alpha next_stop
1    1.065 0.878     1.065

If we use a seed of 7, the proposed next tour stop is 1.686 which has a low corresponding acceptance probability of 0.254:

set.seed(7)
one_mh_iteration(sigma = 0.3, current = 1)
  proposal  alpha next_stop
1    1.686 0.2535         1

This makes sense! We see from Figure 7.2 that the posterior plausibility of 1.686 is much lower than that of our current location of 1. Though we do want to explore such extreme values, we don’t want to do so often. In fact, we see that upon the flip of our coin, the proposal was rejected and the tour again visits location 1 on its next_stop. As a final example, we can confirm that when the posterior plausibility of the proposed next stop is greater than that of our current location, the acceptance probability is 1 and the proposal is automatically accepted:

set.seed(1)
one_mh_iteration(sigma = 0.3, current = 1)
  proposal alpha next_stop
1   0.8121     1    0.8121

7.3 Implementing the Metropolis-Hastings

We’ve spent a lot of energy understanding how to implement one iteration of the Metropolis-Hastings algorithm. That was the hard part! To construct an entire Metropolis-Hastings tour of our Gamma(1,2) posterior, we now just have to repeat this process over and over and over. To this end, the mh_tour() function below constructs a Metropolis-Hastings tour of any given length N utilizing a Normal proposal model with any given standard deviation sigma:

mh_tour <- function(N, sigma){
  # 1. Start the chain at location 1
  current <- 1

  # 2. Initialize the simulation
  lambda <- rep(0, N)

  # 3. Simulate N Markov chain stops
  for(i in 1:N){    
    # Simulate one iteration
    sim <- one_mh_iteration(sigma = sigma, current = current)
    
    # Record next location
    lambda[i] <- sim$next_stop
    
    # Reset the current location
    current <- sim$next_stop
  }
  
  # 4. Return the chain locations
  return(data.frame(iteration = c(1:N), lambda))
}

Again, this code is a big leap if you’re new to for loops and functions. Let’s focus on the general structure indicated by the # comment blocks. In one call to this function:

  1. Start the tour at location 1 (a somewhat arbitrary choice).
  2. Initialize the tour simulation by setting up an empty vector in which to store the N tour stops (lambda).
  3. Utilizing a for loop, at each tour stop \(i\) from 1 to N, run one_mh_iteration() and store the resulting next_stop in the ith element of the lambda vector. Before closing the for loop, update the current tour stop to serve as a starting point for the next iteration.
  4. Return a data frame with the iteration numbers and corresponding tour stops lambda.

To see this function in action, use mh_tour() to simulate a Markov chain tour of length N = 5000 utilizing a Normal proposal model with standard deviation \(\sigma = 0.3\):

set.seed(84735)
mh_simulation_1 <- mh_tour(N = 5000, sigma = 0.3)

A trace plot and histogram of the tour results are shown below. Notably, this tour produces a remarkably accurate approximation of the Gamma(1,2) posterior. You might need to step out of the weeds we’ve waded into in order to reflect upon the mathemagic here. Through a rigorous and formal process, we utilized dependent draws from a Normal model to approximate a Gamma model. And it worked!

ggplot(mh_simulation_1, aes(x = iteration, y = lambda)) + 
  geom_line()

ggplot(mh_simulation_1, aes(x = lambda)) + 
  geom_histogram(color = "white")

7.4 Tuning the Metropolis-Hastings algorithm

Let’s wade into one final weed patch. In implementing the Metropolis-Hastings algorithm above, we utilized a Normal proposal model with standard deviation \(\sigma = 0.3\): \(\lambda' | \lambda \; \sim \; N(\lambda, \sigma^2)\). Our selection of \(\sigma = 0.3\) defined the neighborhood or range of potential proposals (Figure 7.3). Naturally, the choice of \(\sigma\) impacts the performance of our Markov chain tour. What do you anticipate would happen if we utilized a proposal model with very small neighborhoods (say \(\sigma = 0.01\)) or very big neighborhoods (say \(\sigma = 50\))? Take the quiz here.52

Examine trace plots and histograms of three separate Metropolis-Hastings tours of the Gamma(1,2) posterior below. Each tour utilizes a Normal proposal model, but with different standard deviations: \(\sigma = 0.01\), \(\sigma = 0.3\), or \(\sigma = 100\). Match each tour to the \(\sigma\) with which it was generated.

The punchline here is that the Metropolis-Hastings algorithm can work – we’ve seen as much – but we have to tune it. In our example, this means that we have to pick an appropriate standard deviation \(\sigma\) for the Normal proposal model. The tours in the quiz above illustrate the Goldilocks challenge this presents: we don’t want \(\sigma\) to be too small or too large, but just right.53 Tour 2 illustrates what can go wrong when \(\sigma\) is too small (here \(\sigma = 0.01\)). You can reproduce these results with the following code:

set.seed(84735)
mh_simulation_2 <- mh_tour(N = 5000, sigma = 0.01)
ggplot(mh_simulation_2, aes(x = iteration, y = lambda)) + 
  geom_line() + 
  lims(y = c(0, 4))

In this case, the Normal proposal model places a tight neighborhood around the tour’s current location. Proposals will therefore tend to be very close to the current location with similar posterior plausibility, thus a very high probability of being accepted. This results in a Markov chain that is almost always moving, but takes such excruciatingly small steps that it will take an excruciatingly long time to explore the entire posterior plausible region of \(\lambda\) values. Tour 1 illustrates the other extreme in which \(\sigma\) is too large (here \(\sigma = 100\)):

set.seed(84735)
mh_simulation_3 <- mh_tour(N = 5000, sigma = 100)
ggplot(mh_simulation_3, aes(x = iteration, y = lambda)) + 
  geom_line() + 
  lims(y = c(0, 4))

By utilizing a Normal proposal model with such a large neighborhood around the tour’s current location, proposals can be far flung – even outside the region of posterior plausible \(\lambda\). In this case, proposals will often be rejected, resulting in a tour which gets stuck at the same location for multiple iterations in a row (as evidenced by the flat parts of the trace plot). Tour 3 presents a happy medium. This is our original tour (mh_simulation_1) which utilized \(\sigma = 0.3\), a neighborhood size which is neither too small nor too big but just right. The corresponding tour efficiently explores the posterior plausible region of \(\lambda\), not getting stuck in one place or region for too long.

7.5 A Beta-Binomial example

Before moving on, let’s implement the Metropolis-Hastings algorithm for a Beta-Binomial model in which we observe \(Y=1\) success in 2 trials:54

\[\begin{split} Y|\pi & \sim \text{Bin}(2, \pi) \\ \pi & \sim \text{Beta}(2, 3) \\ \end{split} \;\; \Rightarrow \;\; \pi | (Y = 1) \sim \text{Beta}(3, 4) \; .\]

Again, pretend we were only able to define the posterior pdf up to some missing normalizing constant,

\[f(\pi | (y = 1)) \propto f(\pi) L(\pi | (y = 1)),\]

where \(f(\pi)\) is the Beta(2,3) prior pdf and \(L(\pi | (y=1))\) is the \(\text{Bin}(2,\pi)\) likelihood function. Our goal then is to construct a Metropolis-Hastings tour of the posterior,

\[\left\lbrace \pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(N)}\right\rbrace,\]

utilizing a two-step iterative process: (1) at the current tour stop \(\pi\), take a random draw \(\pi'\) from a proposal model; and then (2) decide whether or not to move there. In step 1, the proposed tour stops would ideally be restricted to be between 0 and 1, just like \(\pi\) itself. Thus we’ll tune and utilize a \(\text{Beta}(a,b)\) proposal model instead of a Normal proposal model (which can technically span the entire real line). In doing so we’ll utilize a \(\text{Beta}(a,b)\) proposal model, thus a proposal strategy, which does not depend on current tour stop. This special case of the Metropolis-Hastings is referred to as the independence sampling algorithm.

Independence sampling algorithm

The independence sampling algorithm is a special case of the Metropolis-Hastings in which the same proposal model is utilized at each iteration, independent of the chain’s current location. That is, from a current location \(\pi\), a proposal \(\pi'\) is drawn from a proposal model with pdf \(q(\pi')\) (as opposed to \(q(\pi'|\pi)\)). Thus the acceptance probability (7.3) simplifies to

\[\begin{equation} \alpha = \min\left\lbrace 1, \; \frac{f(\pi')L(\pi'|y)}{f(\pi)L(\pi|y)}\frac{q(\pi)}{q(\pi')} \right\rbrace \; . \tag{7.5} \end{equation}\]

We can rewrite \(\alpha\) for the independence sampler to emphasize its dependence upon the relative posterior plausibility of a proposal \(\pi'\) versus current location \(\pi\):

\[\alpha = \min\left\lbrace 1, \; \frac{f(\pi')L(\pi'|y)/f(y)}{f(\pi)L(\pi|y)/f(y)}\frac{q(\pi)}{q(\pi')} \right\rbrace = \min\left\lbrace 1, \; \frac{f(\pi'|y)}{f(\pi|y)}\frac{q(\pi)}{q(\pi')} \right\rbrace \; .\]

Since \(q(\pi)\) and \(q(\pi')\) aren’t typically equal, they do not cancel out of the formula. Rather, the inclusion of their ratio serves as a corrective measure: the probability \(\alpha\) of accepting a proposal \(\pi'\) decreases as \(q(\pi')\) increases. Conceptually, we place a penalty on common proposal values, ensuring that our tour doesn’t float toward these values simply because we keep proposing them.

Below we write a one_iteration() function which implements a single iteration of this independence sampling algorithm, starting from any current value \(\pi\) and utilizing a \(\text{Beta}(a,b)\) proposal model for any given a and b. In the calculation of acceptance probability \(\alpha\) (alpha), notice that we utilize dbeta() to evaluate the prior and proposal pdfs as well as dbinom() to evaluate the likelihood function given \(Y = 1\):

one_iteration <- function(a, b, current){
 # STEP 1: Propose the next chain location
 proposal <- rbeta(1, a, b)
  
 # STEP 2: Decide whether or not to go there
 proposal_plaus <- dbeta(proposal, 2, 3) * dbinom(1, 2, proposal) / 
  dbeta(proposal, a, b)
 current_plaus  <- dbeta(current, 2, 3) * dbinom(1, 2, current) / 
   dbeta(current, a, b)
 alpha <- min(1, proposal_plaus / current_plaus)
 next_stop <- sample(c(proposal, current), 
  size = 1, prob = c(alpha, 1-alpha))
  
 return(data.frame(proposal, alpha, next_stop))
}

Subsequently, we write a betabin_tour() function which constructs an N-length Markov chain tour, utilizing one_iteration() to determine each stop:

betabin_tour <- function(N, a, b){
  # 1. Start the chain at location 0.5
  current <- 0.5

  # 2. Initialize the simulation
  pi <- rep(0, N)
  
  # 3. Simulate N Markov chain stops
  for(i in 1:N){    
    # Simulate one iteration
    sim <- one_iteration(a = a, b = b, current = current)
    
    # Record next location
    pi[i] <- sim$next_stop
    
    # Reset the current location
    current <- sim$next_stop
  }
  
  # 4. Return the chain locations
  return(data.frame(iteration = c(1:N), pi))
}

Though we encourage you to try different tunings of the \(\text{Beta}(a,b)\) proposal model, we’ll keep it simple here, running a 5,000 step tour of the Beta-Binomial posterior by drawing each proposal from the \(\text{Beta}(1,1)\) (or \(\text{Unif}(0,1)\)) model:

set.seed(84735)
betabin_sim <- betabin_tour(N = 5000, a = 1, b = 1)

# Plot the results
ggplot(betabin_sim, aes(x = iteration, y = pi)) + 
  geom_line()
ggplot(betabin_sim, aes(x = pi)) + 
  geom_histogram(aes(y = ..density..), color = "white") + 
  stat_function(fun = dbeta, args = list(3, 4), color = "blue")

You’re likely not surprised by now that this worked. The tour appears to be stable, random, and provides an excellent approximation of the Beta(3,4) posterior model (which in practice we wouldn’t have access to for comparison). That is, the Metropolis-Hastings algorithm allowed us to utilize draws from the Beta(1,1) proposal model to approximate our Beta(3,4) posterior. Cool.

7.6 Proof

This entire chapter and study of the Metropolis-Hastings algorithm has been extra. A proof of how this algorithm actually works is extra extra. Though we’ll skip a formal proof, we’ll examine the underlying spirit. Please read on if you’re curious.

Throughout our proof, consider constructing a Metropolis-Hastings tour of some generic posterior pdf \(f(\lambda|y)\) utilizing a proposal pdf \(q(\lambda'|\lambda)\). For simplicity, let’s also assume \(\lambda\) is a discrete variable. In general, the Metropolis-Hastings tour moves between different pairs of values, \(\lambda\) and \(\lambda'\). Specifically, the chain can move from \(\lambda\) to \(\lambda'\) with probability

\[P(\lambda \to \lambda') = P\left(\lambda^{(i+1)}=\lambda' \; | \; \lambda^{(i)}=\lambda \right)\]

or from \(\lambda'\) to \(\lambda\) with probability

\[P(\lambda' \to \lambda) = P\left(\lambda^{(i+1)}=\lambda \; | \; \lambda^{(i)}=\lambda' \right) \; .\]

In order for the Metropolis-Hastings algorithm to work (ie. produce a good approximation of the posterior), it must preserve the relative posterior plausibility of \(\lambda'\) and \(\lambda\) in its tour. That is, the relative probabilities of moving between these two values must satisfy

\[\frac{P(\lambda \to \lambda')}{P(\lambda' \to \lambda)} = \frac{f(\lambda'|y)}{f(\lambda|y)} \; .\]

This is indeed the case for any Metropolis-Hastings algorithm. And we can prove it. First, we need to calculate the movement probabilities \(P(\lambda \to \lambda')\) and \(P(\lambda' \to \lambda)\). Consider the former. In order for the tour to move from \(\lambda\) to \(\lambda'\), two things must happen: we need to propose \(\lambda'\) and then accept the proposal. The associated probabilities of these two events are described by \(q(\lambda'|\lambda)\) and \(\alpha\) (7.3), respectively. It follows that the chain moves from \(\lambda\) to \(\lambda'\) with probability

\[P(\lambda \to \lambda') = q(\lambda'|\lambda) \cdot \min\left\lbrace 1, \; \frac{f(\lambda'|y)}{f(\lambda|y)} \frac{q(\lambda|\lambda')}{q(\lambda'|\lambda)} \right\rbrace \; .\]

Similarly, the chain moves from \(\lambda'\) to \(\lambda\) with probability

\[P(\lambda' \to \lambda) = q(\lambda|\lambda') \cdot \min\left\lbrace 1, \; \frac{f(\lambda|y)}{f(\lambda'|y)}\frac{q(\lambda'|\lambda)}{q(\lambda|\lambda')} \right\rbrace \;.\]

To simplify the ratio of movement probabilities \(P(\lambda \to \lambda') / P(\lambda' \to \lambda)\), consider the two potential scenarios:

  1. Scenario 1: \(f(\lambda'|y) \ge f(\lambda|y)\)
    When \(\lambda'\) is at least as plausible than \(\lambda\), \(P(\lambda \to \lambda')\) simplifies to \(q(\lambda'|\lambda)\) and
    \[\frac{P(\lambda \to \lambda')}{P(\lambda' \to \lambda)} = \frac{q(\lambda'|\lambda)}{q(\lambda|\lambda') \frac{f(\lambda|y)}{f(\lambda'|y)}\frac{q(\lambda'|\lambda)}{q(\lambda|\lambda')} } = \frac{f(\lambda'|y)}{f(\lambda|y)} \; .\]

  2. Scenario 2: \(f(\lambda'|y) < f(\lambda|y)\)
    When \(\lambda'\) is less plausible than \(\lambda\), \(P(\lambda' \to \lambda)\) simplifies to \(q(\lambda|\lambda')\) and

    \[\frac{P(\lambda \to \lambda')}{P(\lambda' \to \lambda)} = \frac{q(\lambda'|\lambda) \frac{f(\lambda'|y)}{f(\lambda|y)}\frac{q(\lambda|\lambda')}{q(\lambda'|\lambda)}}{q(\lambda|\lambda')} = \frac{f(\lambda'|y)}{f(\lambda|y)} \; .\]

Thus no matter the scenario, the Metropolis-Hastings algorithm preserves the relative likelihood of any pair of values \(\lambda\) and \(\lambda'\)!

7.7 Variations on the theme

Markov chain Monte Carlo simulation is a rich field. We’ve merely considered one MCMC algorithm here, the Metropolis-Hastings. Though this is a powerful and foundational algorithm, it has its limits. In future chapters, we’ll muddy the waters, building upon the sophistication (thus complexity) of our Bayesian models. The dimensions of these models will grow to include multiple parameters, as opposed to the single \(\lambda\) parameter in our Gamma-Poisson model and \(\pi\) parameter in our Beta-Binomial model. In turn, these multivariate models require multi-dimensional Markov chain tours. If we were to simulate these models from scratch, tuning our Metropolis-Hastings algorithm would become more and more difficult. Other algorithms, each of which build upon the Metropolis-Hastings, can offer powerful alternatives: adaptive Metropolis-Hastings algorithm, Gibbs sampler, Hamiltonian Monte Carlo. In fact, the Hamilton Monte Carlo algorithm provides the basis of simulations conducted in rstan.

Studying these alternative algorithms would require a book itself. Instead, from here on out, we’ll rely on RStan with a fresh confidence in what’s going on under the hood.

7.8 Chapter summary

In Chapter 7, you built a strong conceptual understanding of the foundational Metropolis-Hastings MCMC algorithm. You also implemented this algorithm to study the familiar Gamma-Poisson and Beta-Binomial models. Whether in these relatively simple one-parameter model settings, or in more complicated model settings, the Metropolis-Hastings algorithm produces an approximate sample from the posterior by iterating between two steps:

  1. Propose a new chain location by drawing from a proposal pdf which is, perhaps, dependent upon the current location.
  2. Determine whether to accept the proposal. Simply put, whether or not we accept a proposal depends on how favorable its posterior plausibility is relative to the posterior plausibility of the current location.

7.9 Exercises

In Chapter 7 we have continued to shift focus towards more computational methods. Change can be uncomfortable, but it also spurs on new growth. As you work through these exercises, know that if you are struggling a bit, that just means you are engaging with some new concepts and methods and you are putting in the work to grow as a Bayesian. Our advice: verbalize what you do and don’t understand. Don’t rush yourself. Take a break and come back to concepts/exercises that you feel stuck on. Work with a buddy. Ask for help, and help others when you can.

7.9.1 Conceptual exercises

Exercise 7.1 (Getting to know Monte Carlo)
  1. What are the steps to the Monte Carlo Algorithm?
  2. Name one pro and one con for the Monte Carlo Algorithm.
Exercise 7.2 (Getting to know Metropolis-Hastings)
  1. What are the steps to the Metropolis-Hastings Algorithm?
  2. What’s the difference between the Monte Carlo and Metropolis-Hastings algorithms?
  3. What’s the difference between the Metropolis and Metropolis-Hastings algorithms?
  4. Name one pro and one con for the Metropolis-Hastings algorithm relative to the Metropolis.
Exercise 7.3 (Metropolis vs. Metropolis-Hastings) Select all of the correct endings to this sentence: The difference between the Metropolis and Metropolis Hastings algorithms is that…
  1. Metropolis accepts all proposals, whereas Metropolis-Hastings accepts only some proposals.
  2. Metropolis uses a symmetric proposal model, whereas a Metropolis-Hastings proposal model is not necessarily symmetric.
  3. the acceptance probability is simpler to calculate for the Metropolis than for the Metropolis-Hastings.
  4. the acceptance probability is different for the Metropolis and Metropolis-Hastings.
Exercise 7.4 (Tuning the Metropolis-Hastings) Not all Metropolis-Hastings tours provide you with the same results. In this exercise you will consider how to tune your Metropolis-Hastings proposal model.
  1. Draw a trace plot for a Metropolis Hastings tour where the proposal model is Normal with a very small variance.
  2. Why is it problematic if our proposal model defines the neighborhood too narrowly, such as when the proposal model is Normal with a very small variance?
  3. Draw a trace plot for a Metropolis Hastings tour where the proposal model is Normal with a very large variance.
  4. Why is it problematic if our proposal model defines the neighborhood too widely?
  5. Draw a trace plot for a Metropolis Hastings tour where the proposal model is Normal with a variance is neither too small or too large.
  6. Describe how you would go about finding an appropriate variance for a Normal proposal model.
Exercise 7.5 (Independence sampling: True or False) Identify whether the below statements about independence sampling are True or False. If False, explain.
  1. The proposal model depends on the current value of the chain.
  2. The proposal model is the same every time.
  3. It is a special case of the Metropolis-Hastings algorithm.
  4. It is a special case of Metropolis algorithm.
Exercise 7.6 (Proposing a new location) In each situation below, complete Step 1 of the Metropolis-Hastings Algorithm. That is, starting from the given current chain value \(\lambda^{(i)} = \lambda\) and with set.seed(84735), use the given proposal model to draw a \(\lambda'\) proposal value for the next chain value \(\lambda^{(i+1)}\). NOTE: The functions rnorm() and runif() will come in handy.
  1. \(\lambda = 4.6\), \(\lambda'|\lambda \sim N(\lambda, 2^2)\)
  2. \(\lambda = 2.1\), \(\lambda'|\lambda \sim N(\lambda, 7^2)\)
  3. \(\lambda = 8.9\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-2, \lambda+2)\)
  4. \(\lambda = 1.2\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-0.5, \lambda+0.5)\)
  5. \(\lambda = 7.7\), \(\lambda'|\lambda \sim \text{Unif}(\lambda-3, \lambda+3)\)
Exercise 7.7 (Calculate the acceptance probability: Part I) Suppose that a Markov chain is currently at \(\lambda^{(i)}=2\) and that the proposal for \(\lambda^{(i+1)}\) is \(\lambda'=2.1\). For each pair of unnormalized posterior pdf \(f(\lambda) L(\lambda |y)\) and proposal pdf \({q(\lambda'|\lambda)}\), calculate the acceptance probability \(\alpha\) used in Step 2 of the Metropolis-Hastings algorithm (7.3).
  1. \(f(\lambda) L(\lambda |y)= \lambda^{-2}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 1)\) with pdf \(q(\lambda'|\lambda)\)
  2. \(f(\lambda) L(\lambda |y)= e^{\lambda}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 0.2)\) with pdf \(q(\lambda'|\lambda)\)
  3. \(f(\lambda) L(\lambda |y)= e^{-1.2\lambda}\), \({\lambda'|\lambda} \sim \text{Unif}(\lambda - 0.5, \lambda + 0.5)\) with pdf \(q(\lambda'|\lambda)\)
  4. \(f(\lambda) L(\lambda |y)= e^{-\lambda^4}\), \({\lambda'|\lambda} \sim \text{Exp}(\lambda)\) with pdf \(q(\lambda'|\lambda)\)
  5. For which of these scenarios is there a 100% acceptance probability? Explain why we’d certainly want to accept \(\lambda'\) in these scenarios.
Exercise 7.8 (Calculate the acceptance probability: Part II) Repeat the above exercise, this time assuming that a Markov chain is currently at \(\lambda^{(i)} = 1.8\) and that the proposal for \(\lambda^{(i+1)}\) is \(\lambda' = 1.6\).
  1. \(f(\lambda) L(\lambda |y)= \lambda^{-1}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 2)\) with pdf \(q(\lambda'|\lambda)\)
  2. \(f(\lambda) L(\lambda |y)= e^{3\lambda}\), \({\lambda'|\lambda} \sim \text{Normal}(\lambda, 0.5)\) with pdf \(q(\lambda'|\lambda)\)
  3. \(f(\lambda) L(\lambda |y)= e^{-1.9\lambda}\), \({\lambda'|\lambda} \sim \text{Unif}(\lambda-0.3, \lambda+0.3)\) with pdf \(q(\lambda'|\lambda)\)
  4. \(f(\lambda) L(\lambda |y)= e^{-\lambda^4}\), \({\lambda'|\lambda} \sim \text{Exp}(\lambda)\) with pdf \(q(\lambda'|\lambda)\)
  5. For which of these scenarios is there a 100% acceptance probability? Explain why we’d certainly want to accept \(\lambda'\) in these scenarios.

7.9.2 Practice: Gamma-Poisson simulation

In the next set of exercises, return to the Bayesian model setting of (7.1),

\[\begin{split} Y|\lambda & \sim \text{Pois}(\lambda) \\ \lambda & \sim \text{Gamma}(1, 1) \\ \end{split}\]

Assume we observe \(Y = 0\) and wish to construct a Metropolis-Hastings simulation of the corresponding posterior model for \(\lambda\).

Exercise 7.9 (One iteration with a Normal proposal model) The function one_mh_iteration() from the text utilizes a Normal proposal model, \(\lambda'|\lambda \sim N(\lambda,\sigma^2)\). Starting from a current value of \(\lambda = 1\) and using set.seed(84735), run the code below and comment on the returned proposal, alpha, and next_stop values.
  1. one_mh_iteration(sigma = 1, current = 1)
  2. one_mh_iteration(sigma = 0.1, current = 1)
  3. one_mh_iteration(sigma = 2, current = 1)
  4. one_mh_iteration(sigma = 0.01, current = 1)
  5. one_mh_iteration(sigma = 3, current = 1)
Exercise 7.10 (An entire tour with a Normal proposal model) Implement the Metropolis-Hastings function mh_tour() defined in Section 7.3 to construct tours of \(\lambda\) under each of the following scenarios. Construct trace plots and histograms for each tour.
  1. 50 iterations, \(\sigma = 3\)
  2. 50 iterations, \(\sigma = 0.01\)
  3. 1000 iterations, \(\sigma = 3\)
  4. 1000 iterations, \(\sigma = 0.01\)
  5. Contrast the trace plots in parts a and b. Explain why changing \(\sigma\) has this effect.
  6. Consider the results in parts c and d. Is the \(\sigma\) value as important when the number of iterations is much larger? Explain.

Exercise 7.11 (Changing the proposal model) For this exercise, modify one_mh_iteration() to create a new function, one_mh_iteration_unif(), which utilizes a Uniform proposal model with width \(w\):

\[\lambda'|\lambda \sim \text{Unif}\left(\lambda - \frac{w}{2}, \; \lambda + \frac{w}{2} \right)\]

Subsequently, starting from a current value of \(\lambda = 1\) and set.seed(84735), run this function under each setting below. Comment on the returned proposal, alpha, and next_stop values.
  1. one_mh_iteration_unif(w = 1, current = 1)
  2. one_mh_iteration_unif(w = 0.1, current = 1)
  3. one_mh_iteration_unif(w = 2, current = 1)
  4. one_mh_iteration_unif(w = 0, current = 1)
  5. one_mh_iteration_unif(w = 8, current = 1)

Exercise 7.12 (Metropolis-Hastings tour with Uniform proposals) Upon completing the previous exercise, modify mh_tour() to create a new function, mh_tour_unif(), which constructs a chain of \(\lambda\) values using a Uniform proposal model with width \(w\). Subsequently, using set.seed(84735), run this function under each setting below and construct a trace plot of the resulting chain.

  1. 20 iterations, \(w = 10\)
  2. 20 iterations, \(w = 0.01\)
  3. 1000 iterations, \(w = 0.01\)
  4. 1000 iterations, \(w = 0.1\)
  5. 1000 iterations, \(w = 1\)
  6. 1000 iterations, \(w = 3\)
  7. Contrast the trace plots in a and b. Explain in simple terms why changing the width of the Uniform proposal model causes these differences.
  8. Based on the trace plots in c through f, which width do you think is the best for the Uniform proposal model? Explain your reasoning.
Exercise 7.13 (Change the Gamma prior) The one_mh_iteration(sigma, current) function is tailored to the Gamma-Poisson model (7.1) with a Gamma(1,1) prior for \(\lambda\). For this exercise, create a new function new_mh_iteration(sigma, current, s, r) which can utilize any Gamma(s,r) prior. Subsequently, starting from a current value of \(\lambda = 1\) and using set.seed(84735), run this function under each setting below. Comment on the resulting proposal, alpha, and next_stop values.
  1. new_mh_iteration(sigma = 1, current = 1, s = 2, r = 2)
  2. new_mh_iteration(sigma = 0.1, current = 1, s = 3, r = 1)
  3. new_mh_iteration(sigma = 2, current = 1, s = 1, r = 4)
  4. new_mh_iteration(sigma = 3, current = 1, s = 0.5, r = 0.5)
  5. new_mh_iteration(sigma = 4, current = 1, s = 20, r = 15)
Exercise 7.14 (A new Gamma-Poisson) Consider a new Gamma-Poisson model in which rate \(\lambda\) has a Gamma(1,0.1) prior and you observe one Poisson data point, \(Y = 4\). In this exercise, you will simulate the posterior model of \(\lambda\) using an independence sampler.
  1. Which of the following would make the most reasonable proposal model to use in your independence sampler for \(\lambda\): Normal, Uniform, Beta, or Exponential? Hint: recall that \(\lambda > 0\).
  2. Using the proposal model that you identified in part a, simulate a tour of 1000 \(\lambda\) values. Tune your algorithm until you are satisfied and produce a trace plot of your final results.
  3. This is a situation where we can derive the exact posterior model of \(\lambda\). What is it?
  4. Plot a histogram of your tour and overlay the exact model. How good is your Markov chain approximation?

7.9.3 Practice: Simulating more Bayesian models

In this section you will apply your new simulation skills to answer your own research question using your own data! In each case, identify a question for which data collection takes fewer than 5 minutes of effort. (That part is supposed to be fun, not overwhelming.)

Exercise 7.15 (Using your own data: Beta-Binomial Metropolis-Hastings)
  1. Identify a question that could be answered with a Beta-Binomial model for \(\pi\), some probability or proportion of interest. Eg: Consider scenarios for which data collection would be easy, such as the proportion of emails that you reply to within 24 hours, the proportion of phone calls that you answer, etc.
  2. Tune and describe a Beta prior model for \(\pi\).
  3. Collect and record your data here.
  4. Below you will simulate the posterior model of \(\pi\) using a Metropolis-Hastings algorithm. What is a reasonable proposal model to use in this algorithm? Explain.
  5. Simulate a tour of 2000 \(\pi\) values. Tune your proposal model until you are satisfied and construct the resulting trace plot.
  6. Plot a histogram of your tour, and comment on the posterior model approximation within the context of your question.
Exercise 7.16 (Using your own data: Normal-Normal MCMC)
  1. Identify a question that could be answered with a Normal-Normal model for \(\mu\), some average of interest. Eg: Consider scenarios for which data collection would be easy, such as the average high temperature this time of year, your average hourly screen time, etc.
  2. Tune and describe a Normal prior model for \(\mu\).
  3. Collect and record your data here.
  4. Below you will simulate the posterior model of \(\mu\) using a Metropolis-Hastings algorithm. What is a reasonable proposal model to use in this algorithm? Explain.
  5. Simulate a tour of 2000 \(\mu\) values. Tune your proposal model until you are satisfied and construct the resulting trace plot.
  6. Plot a histogram of your tour, and comment on the posterior model approximation within the context of your question.
Exercise 7.17 (Using your own data: Normal-Normal Independence Sampler) Repeat the previous exercise, this time using an independence sampler.

  1. Answers: Rule 1 produces Tour 2, Rule 2 produces Tour 3, and Rule 3 produces Tour 1.↩︎

  2. Answers: Tour 1 uses \(\sigma = 50\), Tour 2 uses \(\sigma = 0.01\), Tour 3 uses \(\sigma = 0.3\).↩︎

  3. This technical term was inspired by the “Goldilocks and the three bears” fairy tale in which, for some reason, a little girl (Goldilocks) taste tests three different bears’ porridge while trespassing in the bears’ house. The first bowl of porridge is too hot, the second is too cold, and the third is just right.↩︎

  4. By Chapter 3, the Beta posterior has parameters 3 (\(\alpha + y = 2 + 1\)) and 4 (\(\beta + n - y = 3 + 2 - 1\)).↩︎