# Chapter 8 Posterior Inference and Prediction

Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you. While understanding that “modern” art doesn’t necessarily mean “new” art, a question still bubbles up: what are the chances that this modern artist is Gen X or even younger (ie. born in 1965 or later)? In this chapter, we’ll perform a Bayesian analysis with the goal of answering this question. To this end, let $$\pi$$ denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger. The Beta(4,6) prior model for $$\pi$$ (Figure 8.1) reflects our own very vague prior assumption that major modern art museums disproportionately display artists born before 1965. After all, “modern art” dates back to the 1880s and it can take a while to attain such high recognition in the art world.

To learn more about $$\pi$$, we’ll examine $$n =$$ 100 artists sampled from the MoMA collection. This moma_small dataset in the bayesrules package is a subset of data made available by MoMA itself .

# Load packages
library(bayesrules)
library(tidyverse)
library(janitor)
library(rstan)
library(bayesplot)

# Load data
data("moma_small")

Among the sampled artists, $$Y =$$ 14 are Gen X or younger:

moma_small %>%
mutate(genx = (birth >= 1965)) %>%
tabyl(genx)
genx  n percent
FALSE 86    0.86
TRUE 14    0.14

Recognizing that the dependence of $$Y$$ on $$\pi$$ follows a Binomial model, our analysis follows the Beta-Binomial framework. Thus our updated posterior model of $$\pi$$ in light of the observed art data follows from (3.10):

$\begin{split} Y | \pi & \sim \text{Bin}(100, \pi) \\ \pi & \sim \text{Beta}(4, 6) \\ \end{split} \;\;\;\; \Rightarrow \;\;\;\; \pi | (Y = 14) \sim \text{Beta}(18, 92)$

with corresponding posterior pdf

$$$f(\pi | (y = 14)) = \frac{\Gamma(18 + 92)}{\Gamma(18)\Gamma(92)}\pi^{18-1} (1-\pi)^{92-1} \;\; \text{ for } \pi \in [0,1] \; . \tag{8.1}$$$

The evolution in our understanding of $$\pi$$ is exhibited in Figure 8.1.

plot_beta_binomial(alpha = 4, beta = 6, y = 14, n = 100)

After celebrating our success in constructing the posterior, please recognize that there’s a lot of work ahead. We must be able to utilize this posterior to perform a rigorous posterior analysis of $$\pi$$. There are three common tasks in posterior analysis: estimation, hypothesis testing, and prediction. For example, what’s our estimate of $$\pi$$? Does our model support the claim that less than 20% of museum artists are Gen X or younger? If we sample 20 more museum artists, how many do we predict will be Gen X or younger?

• Establish the theoretical foundations for the three posterior analysis tasks: estimation, hypothesis testing, and prediction.
• Explore how Markov chain simulations can be used to approximate posterior features, hence be utilized in posterior analysis.

## 8.1 Posterior estimation

Reexamine the Beta(18, 92) posterior model for $$\pi$$, the proportion of modern art museum artists that are Gen X or younger (Figure 8.1). In a Bayesian analysis, we can think of this entire posterior model as an estimate of $$\pi$$. After all, this model of posterior plausible values provides a complete picture of the trends and uncertainty in $$\pi$$. Yet in specifying and communicating our posterior understanding, it’s also useful to compute simple posterior summaries of $$\pi$$. Check in with your gut on how we might approach this task.

Which of the following best describes your posterior estimate of $$\pi$$?

1. Roughly 16% of museum artists are Gen X or younger.
2. It’s most likely that roughly 16% of museum artists are Gen X or younger, but that figure could plausibly be anywhere between 9% and 26%.

If you responded with answer b, your Bayesian intuition is correct! To see why, consider Figure 8.2 which illustrates our Beta(18, 92) posterior for $$\pi$$ (left) alongside the Beta(4, 16) posterior model (right) for another analyst who started with the same Beta(4, 6) prior but only observed 10 artists, 0 of which were Gen X or younger. These models share very similar trends, thus the other analyst’s best guess of $$\pi$$ is similar to ours: roughly 16-17% of represented artists are Gen X or younger. However, reporting only this shared trend would make our posterior seem misleadingly similar to that of the other analyst. In fact, whereas we’re quite confident that the representation of young artists is between 10% and 24%, the other analyst is only willing to put that figure somewhere in the much wider range from 6% to 40%. This makes sense – they only collected 10 artworks whereas we collected 100.

The punchline here is that posterior estimates should reflect both the posterior trend and uncertainty in $$\pi$$. The posterior mean and mode of $$\pi$$ provide quick summaries of the trend alone. These features for our Beta(18, 92) posterior follow from the general Beta properties (3.2):

$\begin{split} E(\pi | (Y=14)) & = \frac{18}{18 + 92} \approx 0.164 \\ \text{Mode}(\pi | (Y=14)) & = \frac{18 - 1}{18 + 92 - 2} \approx 0.157 \; . \\ \end{split}$

Better yet, to capture both the posterior trend and uncertainty in $$\pi$$, we can construct a range of posterior plausible $$\pi$$ values. We call this range a posterior credible interval (CI) for $$\pi$$. For example, we noticed earlier that the proportion of museum artists that are Gen X or younger is most likely between 10% and 24%. This range captures the most plausible values of $$\pi$$ while eliminating more extreme and unlikely scenarios. In fact, 0.1 and 0.24 are the 0.025th and 0.975th posterior quantiles, thus mark the middle 95% of posterior plausible $$\pi$$ values. Mathematically speaking, the area under the posterior pdf (8.1) between 0.1 and 0.24 is 0.95:

$\int_{0.1}^{0.24} f(\pi|(y=14)) d\pi \; \approx \; 0.95 \; .$ We can confirm these calculations using qbeta():

# 0.025th & 0.975th quantiles of the Beta(18,92) posterior
qbeta(c(0.025, 0.975), 18, 92)
[1] 0.1009 0.2379

This range of posterior plausible $$\pi$$ values, (0.1, 0.24), forms a 95% credible interval for $$\pi$$ and is represented by the shaded region in Figure 8.2. The interpretation of this interval is intuitive: there’s a 95% posterior probability that between 10% and 24% of museum artists are Gen X or younger. Please stop for a moment. Does this interpretation feel natural and intuitive? Thus a bit anticlimactic? If so, we’re happy you feel that way – it means you’re thinking like a Bayesian. In Section 8.5 we’ll come back to just how special this result is.

Before moving on to the next step in our posterior analysis, we want to point out that this “middle 95%” approach isn’t the only option for calculating posterior credible intervals! The first tweak we could make is to the 95% credible level (Figure 8.3). For example, a 50% posterior credible interval would narrow the focus to values of $$\pi$$ that are closer to the trend. In the other direction, increasing the span of our interval to the middle 99% would only kick out the least plausible 1%, providing us with a fuller picture of what values of $$\pi$$ are plausible (though possibly unlikely). Though a 95% level is a common choice among practitioners, it is somewhat arbitrary and simply ingrained through decades of tradition. There’s no one “right” credible level. In fact, even throughout this book, we’ll sometimes use 50% or 80% or 95% levels, depending upon the context of the analysis. Each provide a different snapshot of our posterior understanding.

It’s also not necessary to report the middle 95% of posterior plausible values. In fact, if our posterior pdf were extremely skewed, the middle 95% might eliminate some of the most plausible values. If you squint at the 50% and 95% credible intervals in Figure 8.3, you’ll see some evidence of this in the ever-so-slightly lopsided nature of the shaded region in our ever-so-slightly non-symmetric posterior. We might instead report the 95% of posterior values with the highest posterior density. You can explore this idea in the exercises, though we won’t lose sleep over it here. Mainly, it’s rare to see this method used in practice since it is both more difficult to implement than the middle 95% approach and will only produce meaningfully different results in extreme cases (when the posterior is extremely skewed).

Posterior Credible Intervals

Let parameter $$\pi$$ have posterior pdf $$f(\pi | y)$$. A posterior credible interval (CI) provides a range of posterior plausible values of $$\pi$$, thus a summary of posterior trend and uncertainty. For example, a middle 95% CI is constructed by the 2.5th and 97.5th posterior quantiles

$(\pi_{0.025}, \pi_{0.975}) \; .$

Thus there’s a 95% posterior probability that $$\pi$$ is in this range:

$P(\pi \in (\pi_{0.025}, \pi_{0.975}) | (Y=y)) = \int_{\pi_{0.025}}^{\pi^{0.975}} f(\pi|y)d\pi = 0.95 \; .$

## 8.2 Posterior hypothesis testing

### 8.2.1 One-sided tests

Hypothesis testing is another common task in a posterior analysis. For example, suppose we read an article claiming that less than 20% of museum artists are Gen X or younger. Does our posterior model of $$\pi$$ support this claim? Your instinct here might be to note that the 95% credible interval for $$\pi$$, (0.1, 0.24), is mostly below 0.20. Thus the claim is at least partially plausible. This is a great start! To more rigorously evaluate just how plausible it is that $$\pi < 0.2$$, we can calculate the posterior probability of this scenario (represented by the shaded region in Figure 8.4).

Mathematically, this probability corresponds to the area under the posterior pdf below 0.20:

$P(\pi < 0.20 \; | \; (Y = 14)) = \int_{0}^{0.20}f(\pi | (y=14))d\pi \;.$

We perform this calculation using pbeta() below, revealing that the evidence in favor of our claim is quite strong. There’s a roughly 84.9% posterior chance that Gen Xers account for less than 20% of modern art museum artists.

# Posterior probability that pi < 0.20
post_prob <- pbeta(0.20, 18, 92)
post_prob
[1] 0.849

This analysis of our claim is refreshingly straightforward. We simply calculated the posterior probability of the scenario of interest. Though not always necessary, practitioners often formalize this procedure into a hypothesis testing framework. For example, we can frame our analysis as two competing hypotheses: the null hypothesis $$H_0$$ contends that at least 20% of museum artists are Gen X or younger (the status quo here) whereas our alternative hypothesis $$H_a$$ contends that this figure is below 20%. In mathematical notation:

$\begin{split} H_0: & \; \; \pi \ge 0.20 \\ H_a: & \; \; \pi < 0.20 \\ \end{split}$

Note that $$H_a$$ claims that $$\pi$$ lies on one side of 0.20 ($$\pi < 0.20$$) as opposed to $$\pi \ne 0.20$$. Thus, we call this a one-sided hypothesis test.

We’ve already shown that the posterior probability of the alternative hypothesis, $$P(H_a \; | \; (Y=14))$$, is roughly 0.849. Thus the posterior probability of the null hypothesis, $$P(H_0 \; | \; (Y=14))$$, is roughly 0.151. Putting this together, the posterior odds that $$\pi < 0.2$$ are roughly 5.62. That is, our posterior assessment is that $$\pi$$ is nearly 6 times more likely to be below 0.20 than to be above 0.20:

$\text{Posterior odds } = \frac{P(H_a \; | \; (Y=14))}{P(H_0 \; | \; (Y=14))} \approx 5.62$

# Posterior odds
post_odds <- post_prob / (1 - post_prob)
post_odds
[1] 5.622

Don’t forget that these posterior odds represent our updated understanding of $$\pi$$ upon observing the survey data ($$Y =$$ 14 of $$n =$$ 100 sampled artists were Gen X or younger). Prior to sampling these artists, we had a much higher assessment of Gen X representation at major art museums (Figure 8.5).

Specifically, the prior probability that $$\pi < 0.20$$, $$P(H_a)$$, was only 0.0856:

# Prior probability that pi < 0.20
prior_prob <- pbeta(0.20, 4, 6)
prior_prob
[1] 0.08564

Or, in mathematical notation,

$P(H_a) = \int_0^{0.20} f(\pi) d\pi \approx 0.0856$

where $$f(\pi)$$ is the Beta(4,6) prior pdf. Thus the prior probability of the null hypothesis, $$P(H_0)$$, is roughly 0.914. It follows that the prior odds of Gen X representation being below 0.20 were roughly only 1 in 10:

$\text{Prior odds } = \frac{P(H_a)}{P(H_0)} \approx 0.093 \; .$

# Prior odds
prior_odds <- prior_prob / (1 - prior_prob)
prior_odds
[1] 0.09366

The Bayes Factor (BF) provides a rigorous comparison of the posterior odds to the prior odds, hence insight into just how much our understanding about Gen X representation evolved upon observing our sample data:

$\text{Bayes Factor} = \frac{\text{Posterior odds }}{\text{Prior odds }} \; .$

In our example, the Bayes Factor is roughly 60, revealing that upon observing the artwork sample, the posterior odds of our hypothesis about Gen Xers are roughly 60 times higher than the prior odds! Or, our confidence in this hypothesis jumped quite a bit.

# Bayes factor
BF <- post_odds / prior_odds
BF
[1] 60.02

The definition box below summarizes the Bayes Factor and includes some guiding principles for its interpretation.

Bayes Factor

In a hypothesis test of two competing hypotheses, $$H_a$$ vs $$H_0$$, the Bayes Factor is an odds ratio for $$H_a$$:

$\text{Bayes Factor} = \frac{\text{Posterior odds}}{\text{Prior odds}} = \frac{P(H_a | Y) / P(H_0 | Y)}{P(H_a) / P(H_0)} \; .$

As a ratio, it’s meaningful to compare the Bayes Factor (BF) to 1. To this end, consider three possible scenarios:

1. BF = 1: The plausibility of $$H_a$$ didn’t change in light of the observed data.
2. BF > 1: The plausibility of $$H_a$$ increased in light of the observed data. Thus the greater the Bayes Factor, the more convincing the evidence for $$H_a$$.
3. BF < 1: The plausibility of $$H_a$$ decreased in light of the observed data.

Bringing it all together, the posterior probability (0.85) and Bayes Factor (60) establish fairly convincing evidence in favor of the claim that less than 20% of artists at major modern art museums are Gen X or younger. Did you wince in reading that sentence? The term “fairly convincing” might seem a little wishy-washy. In your past experiences with statistics, you might have learned specific cut-offs that distinguish between “statistically significant” and “not statistically significant” results, or allow you to “reject” or “fail to reject” a hypothesis. However, this practice provides a false comfort. Reality is not so black-and-white. For this reason, the broader statistics community (both frequentist and Bayesian) advocates against the practice of making rigid conclusions using universal rules and for a more nuanced practice which takes into account the context and potential implications of each individual hypothesis test. Thus, there is no magic, one-size-fits-all cut-off for what Bayes Factor or posterior probability evidence is big enough to filter claims into “true” or “false” categories. In fact, what we have is more powerful than a binary decision – we have a holistic measure of our level of uncertainty about the claim. This level of uncertainty can inform our next steps. In our art example, do we have ample evidence for our claim, or do we need to gather more data? We think we have ample evidence.

### 8.2.2 Two-sided tests

Especially when working in new settings, it’s not always the case that we wish to test a one-directional claim about some parameter $$\pi$$. For example, consider a new art researcher that simply wishes to test whether or not 30% of major museum artists are Gen X or younger. This hypothesis is two-sided:

$\begin{split} H_0: & \; \; \pi = 0.3 \\ H_a: & \; \; \pi \ne 0.3 \\ \end{split}$

When we try to hit this two-sided hypothesis test with the same hammer we used for the one-sided hypothesis test, we quickly run into a problem. Since $$\pi$$ is continuous, the prior and posterior probabilities of $$H_0$$ are both zero. Thus the posterior odds, prior odds, and consequentially the Bayes factor are all undefined. For example,

$\text{Posterior odds } = \frac{P(H_a \; | \; (Y=14))}{P(H_0 \; | \; (Y=14))} = \frac{1}{0} = \text{ nooooo!}$

No problem. We don’t need to rely on one recipe for success. It doesn’t exist. To that end, try the following self quiz.55

Recall that the 95% posterior credible interval (CI) for $$\pi$$ is (0.1, 0.24). With which of the following statements do you agree:

1. The CI provides ample evidence that $$\pi$$ differs from 0.3.
2. The CI doesn’t provide ample evidence that $$\pi$$ differs from 0.3.

If you answered “a,” then you intuited a reasonable approach to two-sided hypothesis testing. The hypothesized value of $$\pi$$ (here 0.3) is “substantially” outside the range of the posterior credible interval, thus we have ample evidence in favor of $$H_a$$. The fact that 0.3 is so far above the credible interval makes us pretty confident that the proportion of museum artists that are Gen X or younger is not 0.3. Yet what’s “substantial” or clear in one context might be different than what’s “substantial” in another. With that in mind, it is best practice to define “substantial” ahead of time (ie. before seeing any data). For example, we might consider any proportion outside the 0.05 window around 0.3 to be meaningfully different from 0.3 in the context of artist representation. This essentially adds a little buffer into our hypotheses, either $$\pi$$ is between 0.25 and 0.35 or it’s not:

$\begin{split} H_0: & \; \; \pi \in (0.25, 0.35) \\ H_a: & \; \; \pi \notin (0.25, 0.35) \\ \end{split}$

With this defined buffer in place, we can more rigorously claim belief in $$H_a$$ since the entire hypothesized range for $$\pi$$, (0.25, 0.35), lies above its 95% credible interval. Note also that since $$H_0$$ no longer includes a singular hypothesized value of $$\pi$$, its corresponding posterior and prior probabilities are no longer 0. Thus, just as we did in the one-sided hypothesis testing setting, we could (but won’t here) supplement our posterior credible interval analysis above with a Bayes Factor calculation.

## 8.3 Posterior prediction

A third common task in a posterior analysis is the posterior prediction of new outcomes. For example, suppose we get our hands on 20 more artworks. What number would you predict are done by artists that are Gen X or younger? Your knee-jerk reaction might be: ‘I got this one. It’s 3!’ This is a reasonable place to start. After all, our best posterior guess was that roughly 16% of major museum artists are Gen X or younger ($$\text{Mode}(\pi | (Y=14)) \approx$$ 0.16) and 16% of 20 is roughly 3. However, this calculation ignores two sources of potential variability in our prediction: sampling variability and posterior variability in $$\pi$$.

First, consider sampling variability. When we sample 20 artists at random, we wouldn’t expect exactly 3 (16%) of these to be Gen X or younger. Rather, the number will fluctuate depending upon which random sample we happen to get. Second, there’s posterior variability in $$\pi$$: 0.16 isn’t the only posterior plausible value of $$\pi$$. Rather, $$\pi$$ might be anywhere between roughly 0.1 and 0.24 (Figure 8.1). Thus, when making predictions about the new sample of 20 artworks, we need to consider what outcomes we’d expect to see under each possible $$\pi$$ while accounting for the fact that some $$\pi$$ are more plausible than others.

Let’s specify these concepts with some math. First, let $$Y' = y'$$ be the (yet unknown) number of the 20 new artworks that are done by Gen X or younger artists. Thus conditioned on $$\pi$$, the randomness in $$Y'$$ can be modeled by $$Y'|\pi \sim \text{Bin}(20,\pi)$$ with pdf

$f(y'|\pi) = P(Y' = y' | \pi) = \binom{20}{ y'} \pi^{y'}(1-\pi)^{20 - y'}$

where $$y'$$ can be any number of artists in $$\{0,1,...,20\}$$. Further, recall that the posterior model of $$\pi$$ given the original data ($$Y =$$ 14) is Beta(18, 92) with pdf $$f(\pi|(y=14))$$ from (8.1). Combining these two pieces, weighting $$f(y'|\pi)$$ by $$f(\pi|(y = 14))$$,

$f(y'|\pi) f(\pi|(y=14)) \; ,$

captures the chance of observing $$Y' = y'$$ for the given $$\pi$$ ($$f(y'|\pi)$$) while taking into account the posterior plausibility of $$\pi$$ ($$f(\pi|(y=14))$$). Figure 8.6 illustrates this idea, plotting the weighted behavior of $$Y'$$ for just three possible values of $$\pi$$: the 2.5th posterior quantile (0.1), the posterior mode (0.16), and the 97.5th posterior quantile (0.24). Naturally, we see that the greater $$\pi$$ is, the greater $$Y'$$ tends to be: when $$\pi = 0.1$$ the most likely value of $$Y'$$ is 2, whereas when $$\pi = 0.24$$ the most likely value of $$Y'$$ is 5. Also notice that since values of $$\pi$$ as low as 0.1 or as high as 0.24 are not very plausible, the values of $$Y'$$ that might be generated under these scenarios are given less weight (ie. the sticks are much shorter) than those that are generated under $$\pi =$$ 0.16.

Putting this all together, the posterior predictive model of $$Y'$$, the number of the 20 new artists that are Gen X, takes into account both the sampling variability in $$Y'$$ and posterior variability in $$\pi$$. Specifically, the posterior predictive pmf calculates the overall chance of observing $$Y' = y'$$ across all possible $$\pi$$ from 0 to 1 by averaging across the chance of observing $$Y' = y'$$ for any given $$\pi$$:

$f(y'|(y=14)) = \int_0^1 f(y'|\pi) f(\pi|(y=14)) d\pi \; .$

Some calculus, which we leave for the exercises, establishes a formula for this pmf:

$f(y'|(y=14)) = {20 \choose y'} \frac{\Gamma(110)}{\Gamma(18)\Gamma(92)} \frac{\Gamma(18+y')\Gamma(112-y')} {\Gamma(130)} \;\;\; \text{ for } \;\;\; y' \in \{0,1,\ldots,20\}\;.$

For example, plugging into this formula, there’s a 0.2217 posterior predictive probability that 3 of the 20 new artists are Gen X:

$f((y'=3)|(y=14)) = {20 \choose 3} \frac{\Gamma(110)}{\Gamma(18)\Gamma(92)} \frac{\Gamma(18+3)\Gamma(112-3)} {\Gamma(130)} = 0.2217 \;.$

If you’ve had some calculus experience, you might try to confirm this formula for $$f(y'|(y=14))$$ (it’s fun!). If not, no problem. The first key is to recognize that $$f(y'|(y=14))$$ depends only upon $$y'$$, our new data point. Yet we also see the influence of our Beta(18,92) posterior model for $$\pi$$ (through parameters 18 and 92), hence the original prior and data, on $$f(y'|(y=14))$$. The second key is to be able to interpret the meaning of this posterior predictive pmf. To this end, $$f(y'|(y=14))$$ is illustrated in Figure 8.7.

Though it looks similar to the model of $$Y'$$ when we assume that $$\pi$$ equals the posterior mode of 0.16 (Figure 8.6), the posterior predictive model puts more relative weight on the smaller and larger values of $$Y'$$ that we might expect when $$\pi$$ deviates from 0.16. Thinking like Bayesians, this entire posterior predictive model provides us with a prediction of how many of the next 20 artworks will be done by Gen X or younger artists. Though the most likely number is 3, it could plausibly be anywhere between, say, 0 and 8.

Posterior predictive model

Let $$Y'$$ denote a new outcome of variable $$Y$$. Further, let pdf $$f(y'|\pi)$$ denote the dependence of $$Y'$$ on $$\pi$$ and posterior pdf $$f(\pi|y)$$ denote the posterior plausibility of $$\pi$$ given the original data $$Y = y$$. Then the posterior predictive model for $$Y'$$ has pdf

$f(y'|y) = \int f(y'|\pi) f(\pi|y) d\pi \; .$

In words, the overall chance of observing $$Y' = y'$$ weights the chance of observing this outcome under any possible $$\pi$$ ($$f(y'|\pi)$$) by the posterior plausibility of $$\pi$$ ($$f(\pi|y)$$).

## 8.4 Posterior analysis with MCMC

It’s great to know that there’s some theory behind Bayesian posterior analysis. And when we’re working with models that are as straightforward as the Beta-Binomial, we can directly implement this theory – that is, we can calculate exact posterior credible intervals, probabilities, and predictive models. Yet we’ll soon be leaving this nice territory and entering scenarios in which we cannot specify posterior models, let alone calculate exact summaries of their features. Recall from Chapter 6, that in these scenarios we can approximate posteriors using MCMC methods. In this section, we’ll explore how the resulting Markov chain sample values can also be used to approximate specific posterior features, hence be used to conduct posterior analysis.

Below we run four parallel Markov chains for 10,000 iterations each. After tossing out the first 5,000 iterations of all four chains, we end up with four separate Markov chain samples of size 5,000, $$\left\lbrace \pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(5000)}\right\rbrace$$, or a combined Markov chain sample size of 20,000.

# STEP 1: DEFINE the model
art_model <- "
data {
int<lower=0, upper=100> Y;
}
parameters {
real<lower=0, upper=1> pi;
}
model {
Y ~ binomial(100, pi);
pi ~ beta(4, 6);
}
"

# STEP 2: SIMULATE the posterior
art_sim <- stan(
model_code = art_model, data = list(Y = 14),
chains = 4, iter = 5000*2, seed = 84735)

The trace plots (left) and density plots (middle) of the parallel chains indicate stability. Further, the density plot of the combined 20,000 chain values (right) provides an accurate approximation of the posterior pdf:

# Parallel trace plots & density plots
mcmc_trace(art_sim, pars = "pi", size = 0.5)
mcmc_dens_overlay(art_sim, pars = "pi")

# Combined density plot
mcmc_dens(art_sim, pars = "pi")

As icing on the cake, the effective sample size n_eff is satisfyingly high (suggesting that our dependent chains are behaving “enough” like an independent sample) and the Rhat value is effectively 1 (suggesting extreme stability across our parallel chains):

# Quick summary statistics of the Markov chain
summary(art_sim, pars = "pi")\$summary
mean   se_mean      sd   2.5%    25%    50%
pi 0.1642 0.0004046 0.03518 0.1011 0.1395 0.1624
75%  97.5% n_eff Rhat
pi 0.1871 0.2388  7560    1

We can now use this simulation, with confidence, to estimate any feature of the Beta(18, 92) posterior model (Figure 8.1) by the corresponding feature of the Markov chain approximation (Figure 8.8, right). To this end, the summary() function above provides some handy statistics for the combined 20,000 Markov chain values stored in art_sim. First, notice that the reported mean of our 20,000 Markov chain values is 0.1642. Further, the 2.5th and 97.5th quantiles of the Markov chain values are 0.1011 and 0.2388, respectively. We can visualize these quantiles, hence the approximate middle 95% credible interval for $$\pi$$, using the mcmc_areas() function in the bayesplot package:

# Shade in the middle 95% interval
mcmc_areas(
art_sim, pars = "pi",
prob = 0.95, point_est = "mean"
)

Importantly, these Markov chain features are quite similar to the theoretical posterior properties we calculated in Section 8.1: $$E(\pi | (Y=14)) \approx 0.164$$ and 95% credible interval (0.1, 0.24). This hopefully gives you some peace of mind that, when needed, Markov chain simulation can produce accurate estimates of posterior features.

Though a nice first stop, the summary() function doesn’t necessarily provide every summary statistic of interest. For example, it doesn’t report the mode of our Markov chain sample values. No problem! We can calculate summary statistics directly from the Markov chain values. The first step is to convert an array of the four parallel chains into a single data frame of the combined chains:

# Store the 4 chains in 1 data frame
art_chains_df <- as.data.frame(art_sim,
pars ="lp__", include = FALSE)

With the chains in data frame form, we can proceed as usual, using our dplyr tools to transform and summarize. For example, we can directly calculate the sample mean, mode, and quantiles of the combined Markov chain values:

# Calculate posterior summaries of pi
art_chains_df %>%
summarize(post_mean = mean(pi),
post_mode = sample_mode(pi),
lower_95 = quantile(pi, 0.025),
upper_95 = quantile(pi, 0.975))
post_mean post_mode lower_95 upper_95
1    0.1642    0.1598   0.1011   0.2388

We can also use these raw chain values to tackle the next task in our posterior analysis – testing the claim that less than 20% of major museum artists are Gen X (ie. $$\pi < 0.20$$). We calculated the “official” posterior probability of this scenario to be $$P(\pi < 0.20 | (Y=14)) \approx$$ 0.849. Our posterior simulation produces a very close approximation, with roughly 84.6% of the Markov chain sample values being below 0.20:

# Tabulate pi values that are below 0.20
art_chains_df %>%
mutate(exceeds = pi < 0.20) %>%
tabyl(exceeds)
exceeds     n percent
FALSE  3080   0.154
TRUE 16920   0.846

Finally, we can utilize our Markov chain values to approximate the posterior predictive model of $$Y'$$, the number of the next 20 sampled artists that will be Gen X or younger. Simulating this model also helps us build intuition for the underlying theory. Recall that the posterior predictive model reflects the relative chance of observing $$Y' = y'$$ across all possible values of $$\pi$$ while taking into account the posterior plausibility of $$\pi$$. Thus to approximate this model, we can use rbinom() to simulate one observation of $$Y'$$ from each of the 20,000 $$\pi$$ values in our Markov chain:

# Set the seed
set.seed(84735)

# Predict a value of Y' for each pi value in the chain
art_chains_df <- art_chains_df %>%
mutate(y_predict = rbinom(length(pi), size = 20, prob = pi))

# Check it out
art_chains_df %>%
head(3)
pi y_predict
1 0.1301         4
2 0.1755         4
3 0.2214         4

The resulting collection of 20,000 predictions closely approximates the true posterior predictive distribution (Figure 8.7):

# Plot the 20,000 predictions
ggplot(art_chains_df, aes(x = y_predict)) +
stat_count()

## 8.5 Benefits of being Bayesian

The point of this book is for you to explore the Bayesian approach to statistical analysis. However, it’s helpful to contrast what you’ve learned here with what you might have learned in past frequentist studies. As you might have experienced, the toughest part of a Bayesian analysis is building the posterior model. Once we have that piece in place, it’s fairly straightforward and intuitive to utilize this posterior for estimation, hypothesis testing, and prediction. The same is not as true for frequentist analyses. Rather, we often have to rely on formulas behind black box R functions to perform the analogous frequentist calculations.

But where Bayesian analysis really outshines the frequentist alternative is the ease with which its results can be interpreted. As we discussed way back in Chapter 1, a Bayesian analysis assesses the uncertainty regarding an unknown parameter $$\pi$$ in light of observed data $$Y$$. For example, consider the artist study in which $$\pi$$ represented the proportion of all museum artists that are Gen X or younger and we observed that $$Y = 14$$ of 100 sampled artists were in this generation. In light of this observed data, we determined that there was a 0.849 posterior probability that $$\pi$$ is below 0.20,

$P((\pi < 0.20) \; | \; (Y = 14)) = 0.849 \;.$

This calculation doesn’t make sense to a frequentist – $$\pi$$ either exceeds 0.20 or it doesn’t, thus this probability can only be 0 or 1. Thus a frequentist analysis, flipping the script, assesses the uncertainty of the observed data $$Y$$ in light of assumed values of $$\pi$$. On its face, this doesn’t make much sense. We observed the data, thus have no uncertainty about $$Y$$. It’s also quite unnatural. For example, the frequentist counterpart to the Bayesian posterior probability above is the p-value, which we can calculate using black box techniques:

$P((Y \le 14) | (\pi = 0.20)) = 0.08 \;.$

The opposite order of the conditioning in this probability, ($$Y$$ given $$\pi$$ instead of $$\pi$$ given $$Y$$) leads to a different interpretation than the Bayesian probability: if $$\pi$$ were only 0.20, there’s only an 8% chance we’d have observed a sample in which at most $$Y = 14$$ of 100 artists were Gen X. It’s not our writing here that’s awkward, it’s the p-value. No wonder the p-value gets so much grief. Though it doesn’t give us no information, it doesn’t give us the information that we most want – thus can be a mind bender to interpret. This is the authors’ strong opinion and it’s totally fine if you prefer the frequentist approach here.

## 8.6 Chapter summary

In Chapter 8, you learned how to turn a posterior model into answers. That is, you utilized posterior models, exact or approximate, to perform three posterior analysis tasks for an unknown parameter $$\pi$$:

1. posterior estimation
A posterior credible interval provides a range of posterior plausible values of $$\pi$$, thus a sense of both the posterior trend and uncertainty in $$\pi$$.
2. posterior hypothesis testing
Posterior probabilities provide insight into corresponding hypotheses regarding $$\pi$$.
3. posterior prediction
The posterior predictive model for a new data point $$Y$$ takes into account both the sampling variability in $$Y$$ and the posterior variability in $$\pi$$.

## 8.7 Exercises

### 8.7.1 Conceptual exercises

Exercise 8.1 (Posterior Analysis) What are the three common tasks in a posterior analysis?
Exercise 8.2 (Warming up Part I)
1. In estimating some parameter $$\lambda$$, what are drawbacks to only reporting the posterior trend?
2. The 95% credible interval for $$\lambda$$ is (1,3.4). How would you interpret this in one sentence?
Exercise 8.3 (Warming up Part II)
1. The 90% credible interval for $$\mu$$ is (-2.7,1.8). How would you interpret this in one sentence?
2. What is the benefit of reporting the 95% of posterior values with the highest posterior density as opposed to the middle 95% of posterior values for the credible interval?
3. What is the benefit of reporting the middle 95% of posterior values as opposed to the 95% of posterior values with the highest posterior density for the credible interval?
Exercise 8.4 (Credible interval or hypothesis testing?) In each situation below, indicate whether calculating a credible interval would suffice in addressing the interest of the “researcher,” or whether we should conduct a hypothesis test.
1. Your friend Trichelle claims that more than 40% of dogs at the dog park do not have a dog license.
2. Your professor is interested in learning about the proportion of students at a large university who have heard of Bayesian Statistics.
3. An environmental justice advocate wants to prove that more than 60% of voters in their state support a new regulation.
4. Sarah is studying Ptolemy’s Syntaxis Mathematica text and wants to investigate the number of times that Ptolemy uses a certain mode of argument per page of text. Based on Ptolemy’s other writings she thinks it will be about 3 times per page. Rather than reading all of 13 volumes of Syntaxis Mathematica, Sarah takes a random sample of 90 pages.
Exercise 8.5 (Bayes Factor) A key part of gaining understanding of a concept is being able to communicate that concept in just a few words. Here you will practice explaining ideas about Bayes Factors as simply as possible to your friend Enrique who has a lot of frequentist statistics experience, but knows very little about Bayes
1. Explain in words what posterior odds are.
2. Explain in words what prior odds are.
3. Now explain what the Bayes Factor is and why we might want to calculate it.
Exercise 8.6 (Posterior prediction: Concepts)
1. What two types of variability do posterior predictive models incorporate? Define each type such that your non-Bayesian statistics friends could understand.
2. Describe a real-life situation in which it would be helpful to carry out posterior prediction.
3. Is a posterior predictive model conditional on just the data, just the parameter, or on both the data and the parameter?

### 8.7.2 Practice exercises

Exercise 8.7 (Credible intervals: Part I) For each situation, find the appropriate credible interval using the “middle” approach.
1. a 95% credible interval for $$\pi$$ with $$\pi|y \sim \text{Beta}(4,5)$$
2. a 60% credible interval for $$\pi$$ with $$\pi|y \sim \text{Beta}(4,5)$$
3. a 95% credible interval for $$\lambda$$ with $$\lambda|y \sim \text{Gamma}(1,8)$$
Exercise 8.8 (Credible intervals: Part II) For each situation, find the appropriate credible interval using the “middle” approach.
1. a 99% credible interval of $$\lambda$$ with $$\lambda|y \sim \text{Gamma}(1,5)$$
2. a 95% credible interval of $$\mu$$ with $$\mu|y \sim \text{Normal}(10,4.1)$$
3. an 80% credible interval of $$\mu$$ with $$\mu|y \sim \text{Normal}(-3,2)$$
Exercise 8.9 (Credible intervals: Highest posterior density) For each situation, find the appropriate credible interval using the given approach.
1. Find the 99% credible interval of $$\lambda$$ with $$\lambda|y \sim \text{Gamma}(1,5)$$ using the highest posterior density approach. Represent this interval on a sketch of the Gamma(1,5) posterior pdf.
2. Repeat part a using the middle 99% approach. Represent this interval on a sketch of the Gamma(1,5) posterior pdf.
3. Compare the two intervals from parts a and b. Are they the same? If not, how do they differ and which approach is more appropriate in this situation?
4. Find the 80% credible interval of $$\mu$$ with $$\mu|y \sim \text{Normal}(-13,1.7)$$ using the highest posterior density approach.
5. Repeat part d using the middle 80% approach.
6. Compare the two intervals from parts d and e. Are they the same? If not, why not?
Exercise 8.10 (Hypothesis tests: Part I) For parameter $$\pi$$, suppose you have a Beta(1,0.8) prior model and a Beta(4,3) posterior. You wish to test the null hypothesis that $$\pi \le 0.4$$ versus the alternative that $$\pi > 0.4$$.
1. What is the posterior probability for the alternative hypothesis?
2. What are the posterior odds?
3. What are the prior odds?
4. What is the Bayes Factor?
5. Putting this together, explain your conclusion about these hypotheses to someone who is unfamiliar with Bayesian statistics.
Exercise 8.11 (Hypothesis tests: Part II) For parameter $$\mu$$, suppose you have a Normal(10,100) prior model and a Normal(4,12) posterior. You wish to test $$H_0: \; \mu \ge 5.2$$ versus $$H_a: \mu < 5.2$$.
1. What is the prior probability for the alternative hypothesis?
2. What are the prior odds?
3. What are the posterior odds?
4. What is the Bayes Factor?
5. Putting this together, explain your conclusion about these hypotheses to someone who is unfamiliar with Bayesian statistics.
Exercise 8.12 (Posterior predictive Beta-Binomial: With calculus) As we discussed in Section 8.3, it is sometimes possible to derive an exact posterior predictive model. Such is the case with the conjugate models we have studied thus far: the Beta-Binomial, Gamma-Poisson, and Normal-Normal. To begin, suppose we observe $$Y=y$$ successes in $$n$$ trials where $$Y | \pi \sim \text{Bin}(n,\pi)$$ and $$\pi$$ has a Beta($$\alpha, \beta$$) prior.
1. Identify the posterior pdf of $$\pi$$ given the observed data $$Y = y$$, $$f(\pi|y)$$. NOTE: This pdf will depend upon $$(y, n, \alpha, \beta, \pi)$$.
2. Suppose we conduct $$n'$$ new trials (where $$n'$$ might differ from our original number of trials $$n$$) and let $$Y' = y'$$ be the observed number of successes in these new trials. Identify the conditional pmf of $$Y'$$ given $$\pi$$, $$f(y'|\pi)$$. NOTE: This pmf will depend upon $$(y', n', \pi)$$.
3. Identify the posterior predictive pmf of $$Y'$$, $$f(y'|y)$$. NOTE: This pmf, found by multiplying the above two answers and integrating with respect to $$\pi$$, will depend upon $$(y, n, y', n', \alpha, \beta)$$.
4. As with the example in Section 8.3, suppose your posterior model of $$\pi$$ is based on a prior model with $$\alpha = 4$$ and $$\beta = 6$$ and an observed $$y = 14$$ successes in $$n = 100$$ original trials. We plan to conduct $$n' = 20$$ new trials. Specify the posterior predictive pmf of $$Y'$$, the number of successes we might observe in these 20 trials.
5. Continuing the previous part, suppose instead we plan to conduct $$n' = 4$$ new trials. Specify and sketch the posterior predictive pmf of $$Y'$$, the number of successes we might observe in these 4 trials.
Exercise 8.13 (Posterior predictive Gamma-Poisson: With calculus) Suppose we have observed count data, $$Y = y$$, where $$Y|\lambda \sim \text{Pois}(\lambda)$$ and $$\lambda$$ has a Gamma($$s, r$$) prior.
1. Identify the posterior pdf of $$\lambda$$ given the observed data $$Y = y$$, $$f(\lambda|y)$$.
2. Let $$Y' = y'$$ be the number of events that will occur in a new observation period (of the same length as the original). Identify the conditional pmf of $$Y'$$ given $$\lambda$$, $$f(y'|\lambda)$$.
3. Identify the posterior predictive pmf of $$Y'$$, $$f(y'|y)$$. NOTE: This will depend upon $$(y, y', s, r)$$.
4. Suppose your posterior model of $$\lambda$$ is based on a prior model with $$s = 50$$ and $$r = 50$$ and $$y = 7$$ events in your original observation period. Specify the posterior predictive pmf of $$Y'$$, the number of events we might observe in the next observation period.
5. Either by hand or in R, sketch and discuss the posterior predictive pmf from part d.
Exercise 8.14 (Posterior predictive Normal-Normal: With calculus) Let $$Y = y$$ be an observed data point from a Normal model with mean $$\mu$$ and standard deviation $$\sigma$$. Further, suppose $$\sigma$$ is known and that $$\mu$$ is unknown with a Normal($$\theta, \tau^2$$) prior.
1. Identify the posterior pdf of $$\mu$$ given the observed data, $$f(\mu|y)$$.
2. Let $$Y' = y'$$ be the value of a new data point. Identify the conditional pdf of $$Y'$$ given $$\mu$$, $$f(y'|\mu)$$.
3. Identify the posterior predictive pdf of $$Y'$$, $$f(y'|y)$$.
4. Suppose $$y = -10$$, $$\sigma = 3$$, $$\theta = 0$$, and $$\tau = 1$$. Specify the posterior predictive pdf of $$Y'$$.

### 8.7.3 Application & simulation exercises

Exercise 8.15 (Climate change: estimation) Let $$\pi$$ denote the proportion of U.S. adults that do not believe in climate change. To learn about $$\pi$$, we’ll use survey data on $$n$$ adults and count up the number of these that don’t believe in climate change, $$Y$$.
1. Explain which of the following Bayesian models will be appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.
2. Before collecting any data, specify and discuss your own prior model for $$\pi$$.
3. For the remainder of the exercise, we’ll utilize the authors’ Beta(1,2) prior for $$\pi$$. How does your prior understanding differ from that of the authors?
4. Using the pulse_of_the_nation data from the bayesrules package, report the sample proportion of surveyed adults with the opinion that climate_change is Not Real At All.
5. In light of the Beta(1,2) prior and data, calculate and interpret a (middle) 95% posterior credible interval for $$\pi$$. NOTE: You’ll first need to specify your posterior model of $$\pi$$.
Exercise 8.16 (Climate change: hypothesis testing) Let’s continue our analysis of $$\pi$$, the proportion of U.S. adults that don’t believe in climate change. Suppose you wish to test $$H_0:$$ $$\pi \le 0.1$$ versus $$H_a:$$ $$\pi > 0.1$$.
1. What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
2. Calculate and interpret the prior odds of $$H_a$$.
3. Calculate and interpret the posterior odds of $$H_a$$.
4. Calculate and interpret the Bayes Factor for your hypothesis test.
5. Putting this together, explain your conclusion about $$\pi$$ to someone who is unfamiliar with Bayesian statistics.
Exercise 8.17 (Penguins: estimation) Let $$\mu$$ denote the typical flipper length (in mm) among the Adelie penguin species. To learn about $$\mu$$, we’ll utilize flipper length measurements $$(Y_1,Y_2,\dots,Y_n)$$ on a random sample of Adelie penguins.
1. Explain which of the following Bayesian models will be appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.
2. Before looking at any data, your prior understanding is that the average flipper length for all Adelie penguins is about 200mm, but you aren’t very sure. It’s plausible that the average could be as low a 140mm or as high as 260mm. Specify an appropriate prior model for $$\mu$$ and explain your reasoning.
3. The penguins data in the palmerpenguins package contains data on the flipper lengths for a sample of three different penguin species. For the Adelie species, how many data points are there and what’s the sample mean flipper_length_mm?
4. In light of your prior and data, calculate and interpret a (middle) 95% posterior credible interval for $$\mu$$. NOTE: You’ll first need to specify your posterior model of $$\mu$$.
Exercise 8.18 (Penguins: hypothesis testing) Let’s continue our analysis of $$\mu$$, the typical flipper length (in mm) among the Adelie penguin species.
1. You hypothesize that the average Adelie flipper length is either less than 200 or greater than 220 mm. State this as a formal hypothesis test (using $$H_0$$, $$H_a$$, and $$\mu$$ notation). Note: This is a two-sided hypothesis test!
2. What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
3. Calculate and interpret the Bayes Factor for your hypothesis test.
4. Putting this together, explain your conclusion about $$\mu$$ to someone who is unfamiliar with Bayesian statistics.
Exercise 8.19 (Loons: estimation) The loon is a species of bird common to the Ontario region of Canada. Let $$\lambda$$ denote the typical number of loons observed by a birdwatcher across a 100 hour observation period. To learn about $$\lambda$$, we’ll utilize bird counts $$(Y_1,Y_2,\dots,Y_n)$$ collected in $$n$$ different outings.
1. Explain which of the following Bayesian models will be appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.

2. Before looking at any data, your prior understanding is that the typical rate of loon sightings is 2 per 100 hours with a standard deviation of 1 per 100 hours. Specify an appropriate prior model for $$\lambda$$ and explain your reasoning.

3. The bird_counts data in the bayesrules package contains bird count data for the Common Loon and other species. We’ll utilize only data on the Common Loon observed since 2000, and scale the count data to a 100 hour observation period:

data(bird_counts)
loons <- bird_counts %>%
filter(species == "Common Loon", year >= 2000) %>%
mutate(count_per_100 = round(count_per_hour*100))

In this sample, how many data points do we have and what’s the average loon count per 100 hours?

4. In light of your prior and data, calculate and interpret a (middle) 95% posterior credible interval for $$\lambda$$. NOTE: You’ll first need to specify your posterior model of $$\lambda$$.

Exercise 8.20 (Loons: hypothesis testing) Let’s continue our analysis of $$\lambda$$, the typical rate of loon sightings in a 100 hour observation period.
1. You hypothesize that birdwatchers should anticipate a rate of less than 1 loon per 100 observation hours. State this as a formal hypothesis test (using $$H_0$$, $$H_a$$, and $$\lambda$$ notation).
2. What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
3. Calculate and interpret the Bayes Factor for your hypothesis test.
4. Putting this together, explain your conclusion about $$\lambda$$ to someone who is unfamiliar with Bayesian statistics.
Exercise 8.21 (Posterior analysis with MCMC: Estimation) Consider the Bayesian model of $$\pi$$ below, constructed from the observation of $$Y = 23$$ successes in $$n = 50$$ trials:

$$$\begin{split} Y | \pi & \sim \text{Bin}(50, \pi) \\ \pi & \sim \text{Beta}(7, 3) \\ \end{split} \;\; \Rightarrow \;\; \pi|(Y = 23) \sim \text{Beta}(30,30) (\#ex:ch8-bb)$$$

1. Simulate the posterior model of $$\pi$$ with RStan using 4 chains and 10000 iterations per chain.
2. Produce trace plots and overlaid density plots of the values for each of the four chains.
3. Report the effective sample size (n_eff) and Rhat values for your simulation, explaining what these values mean in context.
4. Calculate the exact 70% and 95% posterior credible intervals for $$\pi$$ using the Beta(30,30) posterior model.
5. Utilize your MCMC simulation to approximate the 70% and 95% posterior credible intervals for $$\pi$$. NOTE: You can use the summary() shortcut function for the 95% interval.
Exercise 8.22 (Posterior analysis with MCMC: Prediction)
1. Continuing with Bayesian model ??, suppose you were to run 14 more trials and let $$Y'$$ be the number of successes in those trials. Use your MCMC simulation to approximate the posterior predictive model of $$Y'$$. Construct a histogram visualization of this model.
2. Discuss your posterior predictive model of $$Y'$$ (eg: how many successes do you anticipate? what’s a reasonable range of possible $$Y'$$ values?).
3. Approximate the probability that we observe at least 9 successes in the next 14 trials.
Exercise 8.23 (Posterior analysis with MCMC: Hypothesis testing)
1. Continuing with Bayesian model ??, we want to test the hypothesis that $$\pi > 0.48$$. Calculate the exact posterior probability of this hypothesis using the Beta(30,30) posterior model and interpret this posterior probability in words.
2. Utilize your MCMC simulation to approximate posterior probability that $$\pi > 0.48$$.

1. Answer: a↩︎