# Chapter 8 Posterior Inference and Prediction

Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you.
While understanding that “modern” art doesn’t necessarily mean “new” art, a question still bubbles up: *what are the chances that this modern artist is Gen X or even younger, i.e. born in 1965 or later*?
In this chapter, we’ll perform a Bayesian analysis with the goal of answering this question.
To this end, let \(\pi\) denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger.
The Beta(4,6) prior model for \(\pi\) (Figure 8.1) reflects our own very vague prior assumption that major modern art museums disproportionately display artists born before 1965.
After all, “modern art” dates back to the 1880s and it can take a while to attain such high recognition in the art world.

To learn more about \(\pi\), we’ll examine \(n =\) 100 artists sampled from the MoMA collection.
This `moma_small`

dataset in the **bayesrules** package is a subset of data made available by MoMA itself (**moma_github?**).

```
# Load packages
library(bayesrules)
library(tidyverse)
library(rstan)
library(bayesplot)
library(broom.mixed)
library(janitor)
# Load data
data("moma_small")
```

Among the sampled artists, \(Y =\) 14 are Gen X or younger:

```
%>%
moma_small group_by(genx) %>%
tally()
# A tibble: 2 x 2
genx n<lgl> <int>
1 FALSE 86
2 TRUE 14
```

Recognizing that the dependence of \(Y\) on \(\pi\) follows a Binomial model, our analysis follows the Beta-Binomial framework. Thus our updated posterior model of \(\pi\) in light of the observed art data follows from (3.10):

\[\begin{split} Y | \pi & \sim \text{Bin}(100, \pi) \\ \pi & \sim \text{Beta}(4, 6) \\ \end{split} \;\;\;\; \Rightarrow \;\;\;\; \pi | (Y = 14) \sim \text{Beta}(18, 92)\]

with corresponding posterior pdf

\[\begin{equation} f(\pi | (y = 14)) = \frac{\Gamma(18 + 92)}{\Gamma(18)\Gamma(92)}\pi^{18-1} (1-\pi)^{92-1} \;\; \text{ for } \pi \in [0,1] \; . \tag{8.1} \end{equation}\]

The evolution in our understanding of \(\pi\) is exhibited in Figure 8.1.

`plot_beta_binomial(alpha = 4, beta = 6, y = 14, n = 100)`

After celebrating our success in *constructing* the posterior, please recognize that there’s a lot of work ahead.
We must be able to utilize this posterior to perform a rigorous **posterior analysis** of \(\pi\).
There are three common tasks in posterior analysis: **estimation**, **hypothesis testing**, and **prediction**.
For example, what’s our *estimate* of \(\pi\)?
Does our model support the claim that less than 20% of museum artists are Gen X or younger?
If we sample 20 more museum artists, how many do we predict will be Gen X or younger?

- Establish the
*theoretical*foundations for the three posterior analysis tasks: estimation, hypothesis testing, and prediction. - Explore how Markov chain simulations can be used to approximate posterior features, hence be utilized in posterior analysis.

## 8.1 Posterior estimation

Reexamine the Beta(18, 92) posterior model for \(\pi\), the proportion of modern art museum artists that are Gen X or younger (Figure 8.1).
In a Bayesian analysis, we can think of this *entire* posterior model as an estimate of \(\pi\).
After all, this model of posterior plausible values provides a complete picture of the trends and uncertainty in \(\pi\).
Yet in specifying and communicating our posterior understanding, it’s also useful to compute simple posterior summaries of \(\pi\).
Check in with your gut on how we might approach this task.

What best describes your posterior estimate of \(\pi\)?

- Roughly 16% of museum artists are Gen X or younger.
- It’s most likely that roughly 16% of museum artists are Gen X or younger, but that figure could plausibly be anywhere between 9% and 26%.

If you responded with answer b, your Bayesian intuition is correct!
To see why, consider Figure 8.2 which illustrates our Beta(18, 92) posterior for \(\pi\) (left) alongside the Beta(4, 16) posterior model (right) for another analyst who started with the same Beta(4, 6) prior but only observed 10 artists, 0 of which were Gen X or younger.
These models share very similar *trends*, thus the other analyst’s *best guess* of \(\pi\) is similar to ours: roughly 16-17% of represented artists are Gen X or younger.
However, reporting only this shared trend would make our posterior seem misleadingly similar to that of the other analyst.
In fact, whereas we’re quite confident that the representation of young artists is between 10% and 24%, the other analyst is only willing to put that figure somewhere in the much wider range from 6% to 40%.
This makes sense – they only collected 10 artworks whereas we collected 100.

The punchline here is that posterior estimates should reflect both the **posterior trend and uncertainty** in \(\pi\).
The **posterior mean and mode** of \(\pi\) provide quick summaries of the trend alone.
These features for our Beta(18, 92) posterior follow from the general Beta properties (3.2):

\[\begin{split} E(\pi | (Y=14)) & = \frac{18}{18 + 92} \approx 0.164 \\ \text{Mode}(\pi | (Y=14)) & = \frac{18 - 1}{18 + 92 - 2} \approx 0.157 \; . \\ \end{split}\]

Better yet, to capture *both* the posterior trend and uncertainty in \(\pi\), we can construct a *range* of posterior plausible \(\pi\) values.
We call this range a **posterior credible interval (CI)** for \(\pi\).
For example, we noticed earlier that the proportion of museum artists that are Gen X or younger is most likely between 10% and 24%.
This range captures the most plausible values of \(\pi\) while eliminating more extreme and unlikely scenarios.
In fact, 0.1 and 0.24 are the 0.025th and 0.975th **posterior quantiles**, thus mark the middle 95% of posterior plausible \(\pi\) values.
Mathematically speaking, the area under the posterior pdf (8.1) between 0.1 and 0.24 is 0.95:

\[\int_{0.1}^{0.24} f(\pi|(y=14)) d\pi \; \approx \; 0.95 \; .\]
We can confirm these calculations using `qbeta()`

:

```
# 0.025th & 0.975th quantiles of the Beta(18,92) posterior
qbeta(c(0.025, 0.975), 18, 92)
1] 0.1009 0.2379 [
```

This range of posterior plausible \(\pi\) values, (0.1, 0.24), forms a **95% credible interval** for \(\pi\) and is represented by the shaded region in Figure 8.2.
The **interpretation** of this interval is intuitive: there’s a 95% posterior probability that between 10% and 24% of museum artists are Gen X or younger.
Please stop for a moment.
Does this interpretation feel natural and intuitive?
Thus a bit anticlimactic?
If so, we’re happy you feel that way – it means you’re thinking like a Bayesian.
In Section 8.5 we’ll come back to just how special this result is.

Before moving on to the next step in our posterior analysis, we want to point out that this “middle 95%” approach isn’t the *only* option for calculating posterior credible intervals!
The first tweak we could make is to the 95% **credible level** (Figure 8.3).
For example, a 50% posterior credible interval would draw our focus to a smaller range of some of the most plausible \(\pi\) values.
In the other direction, increasing the span of our interval to the middle 99% would only kick out the least plausible 1%, providing us with a fuller picture of what values of \(\pi\) are plausible (though possibly unlikely).
Though a 95% level is a common choice among practitioners, it is somewhat arbitrary and simply ingrained through decades of tradition.
**There’s no one “right” credible level.**
Throughout this book, we’ll sometimes use 50% or 80% or 95% levels, depending upon the context of the analysis.
Each provide a different snapshot of our posterior understanding.

It’s also not necessary to report the *middle* 95% of posterior plausible values.
In fact, if our posterior pdf were extremely skewed, the middle 95% might eliminate some of the most plausible values.
If you squint at the 50% and 95% credible intervals in Figure 8.3, you’ll see some evidence of this in the ever-so-slightly lopsided nature of the shaded region in our ever-so-slightly non-symmetric posterior.
We might instead report the 95% of posterior values with the **highest posterior density**.
You can explore this idea in the exercises, though we won’t lose sleep over it here.
Mainly, this method will only produce meaningfully different results than the middle 95% approach in extreme cases, when the posterior is extremely skewed.

**Posterior Credible Intervals**

Let parameter \(\pi\) have posterior pdf \(f(\pi | y)\). A posterior credible interval (CI) provides a range of posterior plausible values of \(\pi\), thus a summary of posterior trend and uncertainty. For example, a middle 95% CI is constructed by the 2.5th and 97.5th posterior quantiles

\[(\pi_{0.025}, \pi_{0.975}) \; .\]

Thus there’s a 95% posterior probability that \(\pi\) is in this range:

\[P(\pi \in (\pi_{0.025}, \pi_{0.975}) | (Y=y)) = \int_{\pi_{0.025}}^{\pi_{0.975}} f(\pi|y)d\pi = 0.95 \; .\]

## 8.2 Posterior hypothesis testing

### 8.2.1 One-sided tests

Hypothesis testing is another common task in a posterior analysis.
For example, suppose we read an article claiming that less than 20% of museum artists are Gen X or younger.
Does our posterior model of \(\pi\) support this claim?
Your instinct here might be to note that the 95% credible interval for \(\pi\), (0.1, 0.24), is *mostly* below 0.20.
Thus the claim is at least partially *plausible*.
This is a great start!
To more rigorously evaluate just *how* plausible it is that \(\pi < 0.2\), we can calculate the **posterior probability** of this scenario (represented by the shaded region in Figure **??**).

Mathematically, this probability corresponds to the area under the posterior pdf below 0.20:

\[P(\pi < 0.20 \; | \; (Y = 14)) = \int_{0}^{0.20}f(\pi | (y=14))d\pi \;.\]

We perform this calculation using `pbeta()`

below, revealing strong evidence in favor of our claim.
There’s a roughly 84.9% posterior chance that Gen Xers account for less than 20% of modern art museum artists.

```
# Posterior probability that pi < 0.20
<- pbeta(0.20, 18, 92)
post_prob
post_prob1] 0.849 [
```

This analysis of our claim is refreshingly straightforward.
We simply calculated the posterior probability of the scenario of interest.
Though not always necessary, practitioners often *formalize* this procedure into a **hypothesis testing** framework.
For example, we can frame our analysis as two competing hypotheses: the **null hypothesis** \(H_0\) contends that *at least* 20% of museum artists are Gen X or younger (the status quo here) whereas our **alternative hypothesis** \(H_a\) contends that this figure is *below* 20%.
In mathematical notation:

\[\begin{split} H_0: & \; \; \pi \ge 0.20 \\ H_a: & \; \; \pi < 0.20 \\ \end{split}\]

Note that \(H_a\) claims that \(\pi\) lies on *one* side of 0.20 (\(\pi < 0.20\)) as opposed to \(\pi \ne 0.20\).
Thus, we call this a **one-sided hypothesis test**.

We’ve already shown that the posterior probability of the alternative hypothesis, \(P(H_a \; | \; (Y=14))\), is roughly 0.849.
Thus the posterior probability of the null hypothesis, \(P(H_0 \; | \; (Y=14))\), is roughly 0.151.
Putting this together, the **posterior odds** that \(\pi < 0.2\) are roughly 5.62.
That is, our posterior assessment is that \(\pi\) is nearly 6 times more likely to be *below* 0.20 than to be *above* 0.20:

\[\text{Posterior odds } = \frac{P(H_a \; | \; (Y=14))}{P(H_0 \; | \; (Y=14))} \approx 5.62 \]

```
# Posterior odds
<- post_prob / (1 - post_prob)
post_odds
post_odds1] 5.622 [
```

Don’t forget that these posterior odds represent our *updated* understanding of \(\pi\) upon observing the survey data (\(Y =\) 14 of \(n =\) 100 sampled artists were Gen X or younger).
*Prior* to sampling these artists, we had a much higher assessment of Gen X representation at major art museums (Figure 8.4).

Specifically, the **prior probability** that \(\pi < 0.20\), \(P(H_a)\), was only 0.0856:

```
# Prior probability that pi < 0.20
<- pbeta(0.20, 4, 6)
prior_prob
prior_prob1] 0.08564 [
```

Or, in mathematical notation,

\[P(H_a) = \int_0^{0.20} f(\pi) d\pi \approx 0.0856\]

where \(f(\pi)\) is the Beta(4,6) prior pdf. Thus the prior probability of the null hypothesis, \(P(H_0)\), is roughly 0.914. It follows that the prior odds of Gen X representation being below 0.20 were roughly only 1 in 10:

\[\text{Prior odds } = \frac{P(H_a)}{P(H_0)} \approx 0.093 \; .\]

```
# Prior odds
<- prior_prob / (1 - prior_prob)
prior_odds
prior_odds1] 0.09366 [
```

The **Bayes Factor (BF)** compares the posterior odds to the prior odds, hence provides insight into just how much our understanding about Gen X representation *evolved* upon observing our sample data:

\[\text{Bayes Factor} = \frac{\text{Posterior odds }}{\text{Prior odds }} \; .\]

In our example, the Bayes Factor is roughly 60, revealing that upon observing the artwork sample, the posterior odds of our hypothesis about Gen Xers are roughly 60 times higher than the prior odds! Or, our confidence in this hypothesis jumped quite a bit.

```
# Bayes factor
<- post_odds / prior_odds
BF
BF1] 60.02 [
```

We summarize the Bayes Factor below, including some guiding principles for its interpretation.

**Bayes Factor**

In a hypothesis test of two competing hypotheses, \(H_a\) vs \(H_0\), the Bayes Factor is an odds ratio for \(H_a\):

\[\text{Bayes Factor} = \frac{\text{Posterior odds}}{\text{Prior odds}} = \frac{P(H_a | Y) / P(H_0 | Y)}{P(H_a) / P(H_0)} \; .\]

As a ratio, it’s meaningful to compare the Bayes Factor (BF) to 1. To this end, consider three possible scenarios:

- BF = 1: The plausibility of \(H_a\)
*didn’t change*in light of the observed data. - BF > 1: The plausibility of \(H_a\)
*increased*in light of the observed data. Thus the greater the Bayes Factor, the more convincing the evidence for \(H_a\). - BF < 1: The plausibility of \(H_a\)
*decreased*in light of the observed data.

Bringing it all together, the posterior probability (0.85) and Bayes Factor (60) establish *fairly convincing* evidence in favor of the claim that less than 20% of artists at major modern art museums are Gen X or younger.
Did you wince in reading that sentence?
The term “fairly convincing” might seem a little wishy-washy.
In the past, you might have learned specific cut-offs that distinguish between “statistically significant” and “not statistically significant” results, or allow you to “reject” or “fail to reject” a hypothesis.
However, this practice provides false comfort.
Reality is not so black-and-white.
For this reason, the broader statistics community (both frequentist and Bayesian) advocates *against* the practice of making rigid conclusions using universal rules and *for* a more nuanced practice which takes into account the context and potential implications of each individual hypothesis test.
Thus, there is no magic, one-size-fits-all cut-off for what Bayes Factor or posterior probability evidence is big enough to filter claims into “true” or “false” categories.
In fact, what we have is more powerful than a binary decision – we have a holistic measure of our level of uncertainty about the claim.
This level of uncertainty can inform our next steps.
In our art example, do we have ample evidence for our claim?
*We’re* convinced.

### 8.2.2 Two-sided tests

Especially when working in new settings, it’s not always the case that we wish to test a one-directional claim about some parameter \(\pi\).
For example, consider a new art researcher that simply wishes to test *whether or not* 30% of major museum artists are Gen X or younger.
This hypothesis is **two-sided**:

\[\begin{split} H_0: & \; \; \pi = 0.3 \\ H_a: & \; \; \pi \ne 0.3 \\ \end{split}\]

When we try to hit this two-sided hypothesis test with the same hammer we used for the one-sided hypothesis test, we quickly run into a problem.
Since \(\pi\) is *continuous*, the prior and posterior probabilities of \(H_0\) are both *zero*.
Thus the posterior odds, prior odds, and consequentially the Bayes factor are all undefined.
For example,

\[\text{Posterior odds } = \frac{P(H_a \; | \; (Y=14))}{P(H_0 \; | \; (Y=14))} = \frac{1}{0} = \text{ nooooo!}\]

No problem.
There’s not one recipe for success.
To that end, try the following self quiz.^{52}

Recall that the 95% posterior credible interval (CI) for \(\pi\) is (0.1, 0.24). Does this CI provide ample evidence that \(\pi\) differs from 0.3?

If you answered “yes,” then you intuited a reasonable approach to two-sided hypothesis testing.
The hypothesized value of \(\pi\) (here 0.3) is “substantially” outside the posterior credible interval, thus we have ample evidence in favor of \(H_a\).
The fact that 0.3 is so far above the credible interval makes us pretty confident that the proportion of museum artists that are Gen X or younger is *not* 0.3.
Yet what’s “substantial” or clear in one context might be different than what’s “substantial” in another.
With that in mind, it is best practice to define “substantial” *ahead of time*, *before* seeing any data.
For example, we might consider any proportion outside the 0.05 window around 0.3 to be meaningfully different from 0.3 in the context of artist representation.
This essentially adds a little buffer into our hypotheses, either \(\pi\) is between 0.25 and 0.35 or it’s not:

\[\begin{split} H_0: & \; \; \pi \in (0.25, 0.35) \\ H_a: & \; \; \pi \notin (0.25, 0.35) \\ \end{split}\]

With this defined buffer in place, we can more rigorously claim belief in \(H_a\) since the *entire* hypothesized range for \(\pi\), (0.25, 0.35), lies above its 95% credible interval.
Note also that since \(H_0\) no longer includes a singular hypothesized value of \(\pi\), its corresponding posterior and prior probabilities are no longer 0.
Thus, just as we did in the *one-sided* hypothesis testing setting, we could (but won’t here) supplement our posterior credible interval analysis above with a Bayes Factor calculation.

## 8.3 Posterior prediction

A third common task in a posterior analysis is the **posterior prediction** of new outcomes.
For example, suppose we get our hands on 20 more artworks.
What number would you *predict* are done by artists that are Gen X or younger?
Your knee-jerk reaction might be: ‘I *got* this one. It’s 3!’
This is a reasonable place to start.
After all, our best posterior guess was that roughly 16% of major museum artists are Gen X or younger (\(\text{Mode}(\pi | (Y=14)) \approx\) 0.16) and 16% of 20 is roughly 3.
However, this calculation ignores two sources of potential variability in our prediction: **sampling variability** and **posterior variability** in \(\pi\).

First, consider **sampling variability**.
When we randomly sample 20 artists, we don’t expect *exactly* 3 (16%) of these to be Gen X or younger.
Rather, the number will fluctuate depending upon which random sample we happen to get.
Second, there’s **posterior variability** in \(\pi\): 0.16 isn’t the only posterior plausible value.
Rather, \(\pi\) might be anywhere between roughly 0.1 and 0.24 (Figure 8.1).
Thus, when making predictions about the new sample of 20 artworks, we need to consider what outcomes we’d expect to see under *each* possible \(\pi\) while accounting for the fact that some \(\pi\) are more plausible than others.

Let’s specify these concepts with some math. First, let \(Y' = y'\) be the (yet unknown) number of the 20 new artworks that are done by Gen X or younger artists. Thus conditioned on \(\pi\), the randomness in \(Y'\) can be modeled by \(Y'|\pi \sim \text{Bin}(20,\pi)\) with pdf

\[f(y'|\pi) = P(Y' = y' | \pi) = \binom{20}{ y'} \pi^{y'}(1-\pi)^{20 - y'}\]

where \(y'\) can be any number of artists in \(\{0,1,...,20\}\).
Further, recall that the posterior model of \(\pi\) given the *original* data (\(Y =\) 14) is Beta(18, 92) with pdf \(f(\pi|(y=14))\) from (8.1).
Combining these two pieces, weighting \(f(y'|\pi)\) by \(f(\pi|(y = 14))\),

\[f(y'|\pi) f(\pi|(y=14)) \; ,\]

captures the chance of observing \(Y' = y'\) for the given \(\pi\) (\(f(y'|\pi)\)) while taking into account the posterior plausibility of \(\pi\) (\(f(\pi|(y=14))\)). Figure 8.5 illustrates this idea, plotting the weighted behavior of \(Y'\) for just three possible values of \(\pi\): the 2.5th posterior quantile (0.1), the posterior mode (0.16), and the 97.5th posterior quantile (0.24). Naturally, we see that the greater \(\pi\) is, the greater \(Y'\) tends to be: when \(\pi = 0.1\) the most likely value of \(Y'\) is 2, whereas when \(\pi = 0.24\) the most likely value of \(Y'\) is 5. Also notice that since values of \(\pi\) as low as 0.1 or as high as 0.24 are not very plausible, the values of \(Y'\) that might be generated under these scenarios are given less weight (i.e. the sticks are much shorter) than those that are generated under \(\pi =\) 0.16.

Putting this all together, the **posterior predictive model** of \(Y'\), the number of the 20 new artists that are Gen X, takes into account both the sampling variability in \(Y'\) and posterior variability in \(\pi\).
Specifically, the posterior predictive pmf calculates the overall chance of observing \(Y' = y'\) across *all* possible \(\pi\) from 0 to 1 by averaging across the chance of observing \(Y' = y'\) for any *given* \(\pi\):

\[f(y'|(y=14)) = \int_0^1 f(y'|\pi) f(\pi|(y=14)) d\pi \; .\]

Some calculus, which we leave for the exercises, establishes a formula for this pmf:

\[f(y'|(y=14)) = \left(\!\begin{array}{c} 20 \\ y' \end{array}\!\right) \frac{\Gamma(110)}{\Gamma(18)\Gamma(92)} \frac{\Gamma(18+y')\Gamma(112-y')} {\Gamma(130)} \;\;\; \text{ for } y' \in \{0,1,\ldots,20\}\;.\]

For example, plugging into this formula, there’s a 0.2217 posterior predictive probability that 3 of the 20 new artists are Gen X:

\[f((y'=3)|(y=14)) = \left(\!\begin{array}{c} 20 \\ 3 \end{array}\!\right) \frac{\Gamma(110)}{\Gamma(18)\Gamma(92)} \frac{\Gamma(18+3)\Gamma(112-3)} {\Gamma(130)} = 0.2217 \;.\]

If you’ve had some calculus experience, you might try to confirm this formula for \(f(y'|(y=14))\) (it’s fun!).
If not, no problem.
The first key is to recognize that \(f(y'|(y=14))\) depends only upon \(y'\), our new data point.
Yet we also see the influence of our Beta(18,92) posterior model for \(\pi\) (through parameters 18 and 92), hence the original prior and data, on \(f(y'|(y=14))\).
The second key is to be able to interpret the *meaning* of this posterior predictive pmf.
To this end, \(f(y'|(y=14))\) is illustrated in Figure 8.6.

Though it looks similar to the model of \(Y'\) when we assume that \(\pi\) equals the posterior mode of 0.16 (Figure 8.5), the posterior predictive model puts more relative weight on the smaller and larger values of \(Y'\) that we might expect when \(\pi\) deviates from 0.16. Thinking like Bayesians, this entire posterior predictive model provides us with a prediction of how many of the next 20 artworks will be done by Gen X or younger artists. Though the most likely number is 3, it could plausibly be anywhere between, say, 0 and 8.

**Posterior predictive model**

Let \(Y'\) denote a new outcome of variable \(Y\). Further, let pdf \(f(y'|\pi)\) denote the dependence of \(Y'\) on \(\pi\) and posterior pdf \(f(\pi|y)\) denote the posterior plausibility of \(\pi\) given the original data \(Y = y\). Then the posterior predictive model for \(Y'\) has pdf

\[f(y'|y) = \int f(y'|\pi) f(\pi|y) d\pi \; .\]

In words, the overall chance of observing \(Y' = y'\) weights the chance of observing this outcome under *any* possible \(\pi\) (\(f(y'|\pi)\)) by the posterior plausibility of \(\pi\) (\(f(\pi|y)\)).

## 8.4 Posterior analysis with MCMC

It’s great to know that there’s some theory behind Bayesian posterior analysis.
And when we’re working with models that are as straightforward as the Beta-Binomial, we can directly implement this theory – that is, we can calculate exact posterior credible intervals, probabilities, and predictive models.
Yet we’ll soon be leaving this nice territory and entering scenarios in which we cannot specify posterior models, let alone calculate exact summaries of their features.
Recall from Chapter 6, that in these scenarios we can *approximate* posteriors using MCMC methods.
In this section, we’ll explore how the resulting Markov chain sample values can also be used to approximate specific posterior *features*, hence be used to conduct posterior analysis.

Below we run four parallel Markov chains for 10,000 iterations each. After tossing out the first 5,000 iterations of all four chains, we end up with four separate Markov chain samples of size 5,000, \(\left\lbrace \pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(5000)}\right\rbrace\), or a combined Markov chain sample size of 20,000.

```
# STEP 1: DEFINE the model
<- "
art_model data {
int<lower=0, upper=100> Y;
}
parameters {
real<lower=0, upper=1> pi;
}
model {
Y ~ binomial(100, pi);
pi ~ beta(4, 6);
}
"
# STEP 2: SIMULATE the posterior
<- stan(
art_sim model_code = art_model, data = list(Y = 14),
chains = 4, iter = 5000*2, seed = 84735)
```

The trace plots (left) and density plots (middle) of the parallel chains indicate stability.
Further, the density plot of the combined 20,000 chain values (right) provides an accurate posterior *approximation*:

```
# Parallel trace plots & density plots
mcmc_trace(art_sim, pars = "pi", size = 0.5)
mcmc_dens_overlay(art_sim, pars = "pi")
# Combined density plot
mcmc_dens(art_sim, pars = "pi")
```

As icing on the cake, the effective sample size ratio is satisfyingly high (suggesting that our dependent chains are behaving “enough” like an independent sample) and the `Rhat`

value is effectively 1 (suggesting extreme stability across our parallel chains):

```
# Quick summary statistics of the Markov chain
neff_ratio(art_sim, pars = "pi")
1] 0.378
[rhat(art_sim, pars = "pi")
1] 1 [
```

We can now use this simulation, with confidence, to **estimate** any feature of the Beta(18, 92) posterior model (Figure 8.1) by the corresponding feature of the Markov chain approximation (Figure 8.7, right).
To this end, the `tidy()`

function in the **broom.mixed** package provides some handy statistics for the combined 20,000 Markov chain values stored in `art_sim`

:

```
tidy(art_sim, conf.int = TRUE, conf.level = 0.95)
# A tibble: 1 x 5
term estimate std.error conf.low conf.high<chr> <dbl> <dbl> <dbl> <dbl>
1 pi 0.162 0.0352 0.101 0.239
```

First, the `estimate`

reports that the **median** of our 20,000 Markov chain values is 0.162.
Further, the 2.5th and 97.5th **quantiles** of the Markov chain values are 0.101 and 0.239, respectively.
We can visualize these quantiles, hence the approximate middle 95% credible interval for \(\pi\), using the `mcmc_areas()`

function in the **bayesplot** package:

```
# Shade in the middle 95% interval
mcmc_areas(art_sim, pars = "pi",
prob = 0.95, point_est = "mean")
```

Importantly, these Markov chain features are quite similar to the theoretical posterior properties we calculated in Section 8.1, such as the 95% credible interval (0.1, 0.24). This hopefully gives you some peace of mind that, when needed, Markov chain simulation can produce accurate estimates of posterior features.

Though a nice first stop, the `tidy()`

function doesn’t necessarily provide every summary statistic of interest.
For example, it doesn’t report the *mode* of our Markov chain sample values.
No problem!
We can calculate summary statistics *directly* from the Markov chain values.
The first step is to convert an *array* of the four parallel chains into a single *data frame* of the combined chains:

```
# Store the 4 chains in 1 data frame
<- as.data.frame(art_sim,
art_chains_df pars = "lp__", include = FALSE)
```

With the chains in data frame form, we can proceed as usual, using our `dplyr`

tools to transform and summarize.
For example, we can directly calculate the sample mean, median, mode, and quantiles of the combined Markov chain values:

```
# Calculate posterior summaries of pi
%>%
art_chains_df summarize(post_mean = mean(pi),
post_median = median(pi),
post_mode = sample_mode(pi),
lower_95 = quantile(pi, 0.025),
upper_95 = quantile(pi, 0.975))
post_mean post_median post_mode lower_95 upper_951 0.1642 0.1624 0.1598 0.1011 0.2388
```

We can also use these raw chain values to tackle the next task in our posterior analysis – testing the claim that less than 20% of major museum artists are Gen X (i.e. \(\pi < 0.20\)).
We calculated the “official” posterior probability of this scenario to be \(P(\pi < 0.20 | (Y=14)) \approx\) 0.849.
Our posterior simulation produces a very close **approximation**, with roughly 84.6% of the Markov chain sample values being below 0.20:

```
# Tabulate pi values that are below 0.20
%>%
art_chains_df mutate(exceeds = pi < 0.20) %>%
tabyl(exceeds)
exceeds n percentFALSE 3080 0.154
TRUE 16920 0.846
```

Finally, we can utilize our Markov chain values to approximate the posterior predictive model of \(Y'\), the number of the next 20 sampled artists that will be Gen X or younger.
Simulating this model also helps us build intuition for the underlying theory.
Recall that the posterior predictive model reflects the relative chance of observing \(Y' = y'\) across all possible values of \(\pi\) while taking into account the posterior plausibility of \(\pi\).
Thus to *approximate* this model, we can use `rbinom()`

to simulate one observation of \(Y'\) from *each* of the 20,000 \(\pi\) values in our Markov chain:

```
# Set the seed
set.seed(84735)
# Predict a value of Y' for each pi value in the chain
<- art_chains_df %>%
art_chains_df mutate(y_predict = rbinom(length(pi), size = 20, prob = pi))
# Check it out
%>%
art_chains_df head(3)
pi y_predict1 0.1301 4
2 0.1755 4
3 0.2214 4
```

The resulting collection of 20,000 predictions closely approximates the true posterior predictive distribution (Figure 8.6):

```
# Plot the 20,000 predictions
ggplot(art_chains_df, aes(x = y_predict)) +
stat_count()
```

## 8.5 Benefits of being Bayesian

The point of this book is for you to explore the Bayesian approach to statistical analysis.
However, it’s helpful to contrast what you’ve learned here with what you might have learned in past *frequentist* studies.
As you might have experienced, the toughest part of a Bayesian analysis is building the posterior model.
Once we have that piece in place, it’s fairly straightforward and intuitive to utilize this posterior for estimation, hypothesis testing, and prediction.
The same is not as true for frequentist analyses.
Rather, we often have to rely on formulas behind black box R functions to perform the analogous frequentist calculations.

But where Bayesian analysis really outshines the frequentist alternative is the ease with which its results can be interpreted.
As we discussed way back in Chapter 1, a **Bayesian analysis** assesses the uncertainty regarding an *unknown* parameter \(\pi\) in light of observed data \(Y\).
For example, consider the artist study in which \(\pi\) represented the proportion of all museum artists that are Gen X or younger and we observed that \(Y = 14\) of 100 sampled artists were in this generation.
In light of this observed data, we determined that there was a 0.849 posterior probability that \(\pi\) is below 0.20,

\[P((\pi < 0.20) \; | \; (Y = 14)) = 0.849 \;.\]

This calculation doesn’t make sense to a frequentist – \(\pi\) either exceeds 0.20 or it doesn’t, thus this probability can only be 0 or 1.
Thus a **frequentist analysis**, flipping the script, assesses the uncertainty of the observed data \(Y\) in light of *assumed* values of \(\pi\).
On its face, this doesn’t make much sense.
We *observed* the data, thus have no uncertainty about \(Y\).
It’s also quite unnatural.
For example, the frequentist counterpart to the Bayesian posterior probability above is the **p-value**, which we can calculate using black box techniques:

\[P((Y \le 14) | (\pi = 0.20)) = 0.08 \;.\]

The opposite order of the conditioning in this probability, \(Y\) given \(\pi\) instead of \(\pi\) given \(Y\), leads to a different interpretation than the Bayesian probability: **if** \(\pi\) were only 0.20, there’s only an 8% chance we’d have observed a sample in which at most \(Y = 14\) of 100 artists were Gen X.
It’s not our writing here that’s awkward, it’s the p-value.
No wonder the p-value gets so much grief.
Though it doesn’t give us *no* information, it doesn’t give us the information that we most want – thus can be a mind bender to interpret.
This is the authors’ strong opinion and it’s totally fine if you prefer the frequentist approach here.

## 8.6 Chapter summary

In Chapter 8, you learned how to turn a posterior model into answers. That is, you utilized posterior models, exact or approximate, to perform three posterior analysis tasks for an unknown parameter \(\pi\):

**posterior estimation**

A posterior**credible interval**provides a range of posterior plausible values of \(\pi\), thus a sense of both the posterior trend and uncertainty in \(\pi\).**posterior hypothesis testing**

**Posterior probabilities**provide insight into corresponding hypotheses regarding \(\pi\).**posterior prediction**

The**posterior predictive model**for a new data point \(Y\) takes into account both the*sampling variability*in \(Y\) and the*posterior variability*in \(\pi\).

## 8.7 Exercises

### 8.7.1 Conceptual exercises

**Exercise 8.1 (Posterior Analysis)**What are the three common tasks in a posterior analysis?

**Exercise 8.2 (Warming up)**

- In estimating some parameter \(\lambda\), what are drawbacks to only reporting the posterior trend?
- The 95% credible interval for \(\lambda\) is (1,3.4). How would you interpret this in one sentence?

**Exercise 8.3 (Credible interval or hypothesis testing?)**In each situation below, indicate whether calculating a

*credible interval*would suffice in addressing the issue of interest, or whether we should conduct a

*hypothesis test*.

- Your friend Trichelle claims that more than 40% of dogs at the dog park do not have a dog license.
- Your professor is interested in learning about the proportion of students at a large university who have heard of Bayesian Statistics.
- An environmental justice advocate wants to prove that more than 60% of voters in their state support a new regulation.
- Sarah is studying Ptolemy’s
*Syntaxis Mathematica*text and wants to investigate the number of times that Ptolemy uses a certain mode of argument per page of text. Based on Ptolemy’s other writings she thinks it will be about 3 times per page. Rather than reading all of 13 volumes of*Syntaxis Mathematica*, Sarah takes a random sample of 90 pages.

**Exercise 8.4 (Bayes Factor)**Answer questions about Bayes Factors from your friend Enrique who has a lot of frequentist statistics experience, but is new to Bayes.

- What are
*posterior odds*? - What are
*prior odds*? - What’s a
*Bayes Factor*and why we might want to calculate it?

**Exercise 8.5 (Posterior prediction: Concepts)**

- What two types of variability do posterior predictive models incorporate? Define each type such that your non-Bayesian statistics friends could understand.
- Describe a real-life situation in which it would be helpful to carry out posterior prediction.
- Is a posterior predictive model conditional on just the data, just the parameter, or on both the data and the parameter?

### 8.7.2 Practice exercises

**Exercise 8.6 (Credible intervals: Part I)**For each situation, find the appropriate credible interval using the “middle” approach.

- a 95% credible interval for \(\pi\) with \(\pi|y \sim \text{Beta}(4,5)\)
- a 60% credible interval for \(\pi\) with \(\pi|y \sim \text{Beta}(4,5)\)
- a 95% credible interval for \(\lambda\) with \(\lambda|y \sim \text{Gamma}(1,8)\)

**Exercise 8.7 (Credible intervals: Part II)**For each situation, find the appropriate credible interval using the “middle” approach.

- a 99% credible interval of \(\lambda\) with \(\lambda|y \sim \text{Gamma}(1,5)\)
- a 95% credible interval of \(\mu\) with \(\mu|y \sim \text{Normal}(10,4.1)\)
- an 80% credible interval of \(\mu\) with \(\mu|y \sim \text{Normal}(-3,2)\)

**Exercise 8.8 (Credible intervals: Highest posterior density)**There’s more than one approach to constructing a 95% credible interval. The “middle 95%” approach reports the range of the middle 95% of the posterior density, from the 2.5th to the 97.5th percentile. The “highest posterior density” approach reports the 95% of posterior values with the highest posterior densities.

- Let \(\lambda|y \sim \text{Gamma}(1,5)\). Construct the 95% highest posterior density credible interval for \(\lambda\). Represent this interval on a sketch of the posterior pdf.
- Repeat part a using the middle 95% approach.
- Compare the two intervals from parts a and b. Are they the same? If not, how do they differ and which is more appropriate here?
- Let \(\mu|y \sim \text{Normal}(-13,1.7)\). Construct the 95% highest posterior density credible interval for \(\mu\).
- Repeat part d using the middle 95% approach.
- Compare the two intervals from parts d and e. Are they the same? If not, why not?

**Exercise 8.9 (Hypothesis tests: Part I)**For parameter \(\pi\), suppose you have a Beta(1,0.8) prior model and a Beta(4,3) posterior. You wish to test the null hypothesis that \(\pi \le 0.4\) versus the alternative that \(\pi > 0.4\).

- What is the posterior probability for the alternative hypothesis?
- What are the posterior odds?
- What are the prior odds?
- What is the Bayes Factor?
- Putting this together, explain your conclusion about these hypotheses to someone who is unfamiliar with Bayesian statistics.

**Exercise 8.10 (Hypothesis tests: Part II)**Repeat Exercise 8.9 for the following scenario. For parameter \(\mu\), suppose you have a Normal(10,100) prior model, a Normal(4,12) posterior, and you wish to test \(H_0: \mu \ge 5.2\) versus \(H_a: \mu < 5.2\).

**Exercise 8.11 (Posterior predictive Beta-Binomial: With calculus)**As discussed in Section 8.3, it is sometimes possible to derive an exact posterior predictive model. Such is the case with the conjugate models we have studied thus far. To begin, suppose we observe \(Y=y\) successes in \(n\) trials where \(Y | \pi \sim \text{Bin}(n,\pi)\) and \(\pi\) has a Beta(\(\alpha, \beta\)) prior.

- Identify the posterior pdf of \(\pi\) given the observed data \(Y = y\), \(f(\pi|y)\). NOTE: This pdf will depend upon \((y, n, \alpha, \beta, \pi)\).
- Suppose we conduct \(n'\)
*new*trials (where \(n'\) might differ from our original number of trials \(n\)) and let \(Y' = y'\) be the observed number of successes in these new trials. Identify the conditional pmf of \(Y'\) given \(\pi\), \(f(y'|\pi)\). NOTE: This pmf will depend upon \((y', n', \pi)\). - Identify the posterior predictive pmf of \(Y'\), \(f(y'|y)\). NOTE: This pmf, found by multiplying the above two answers and integrating with respect to \(\pi\), will depend upon \((y, n, y', n', \alpha, \beta)\).
- As with the example in Section 8.3, suppose your posterior model of \(\pi\) is based on a prior model with \(\alpha = 4\) and \(\beta = 6\) and an observed \(y = 14\) successes in \(n = 100\) original trials. We plan to conduct \(n' = 20\)
*new*trials. Specify the posterior predictive pmf of \(Y'\), the number of successes we might observe in these 20 trials. - Continuing the previous part, suppose instead we plan to conduct \(n' = 4\)
*new*trials. Specify*and sketch*the posterior predictive pmf of \(Y'\), the number of successes we might observe in these 4 trials.

**Exercise 8.12 (Posterior predictive Gamma-Poisson: With calculus)**Suppose we have observed count data, \(Y = y\), where \(Y|\lambda \sim \text{Pois}(\lambda)\) and \(\lambda\) has a Gamma(\(s, r\)) prior.

- Identify the posterior pdf of \(\lambda\) given the observed data \(Y = y\), \(f(\lambda|y)\).
- Let \(Y' = y'\) be the number of events that will occur in a
*new*observation period (of the same length as the original). Identify the conditional pmf of \(Y'\) given \(\lambda\), \(f(y'|\lambda)\). - Identify the posterior predictive pmf of \(Y'\), \(f(y'|y)\). NOTE: This will depend upon \((y, y', s, r)\).
- Suppose your posterior model of \(\lambda\) is based on a prior model with \(s = 50\) and \(r = 50\) and \(y = 7\) events in your original observation period. Specify the posterior predictive pmf of \(Y'\), the number of events we might observe in the next observation period.
- Sketch and discuss the posterior predictive pmf from part d.

**Exercise 8.13 (Posterior predictive Normal-Normal: With calculus)**Let \(Y = y\) be an observed data point from a Normal model with mean \(\mu\) and standard deviation \(\sigma\). Further, suppose \(\sigma\) is known and that \(\mu\) is unknown with a Normal(\(\theta, \tau^2\)) prior.

- Identify the posterior pdf of \(\mu\) given the observed data, \(f(\mu|y)\).
- Let \(Y' = y'\) be the value of a
*new*data point. Identify the conditional pdf of \(Y'\) given \(\mu\), \(f(y'|\mu)\). - Identify the posterior predictive pdf of \(Y'\), \(f(y'|y)\).
- Suppose \(y = -10\), \(\sigma = 3\), \(\theta = 0\), and \(\tau = 1\). Specify the posterior predictive pdf of \(Y'\).

### 8.7.3 Application & simulation exercises

**Exercise 8.14 (Climate change: estimation)**Let \(\pi\) denote the proportion of U.S. adults that do not believe in climate change. To learn about \(\pi\), we’ll use survey data on \(n\) adults and count up the number of these that don’t believe in climate change, \(Y\).

- Explain which Bayesian model is appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.
- Specify and discuss your own prior model for \(\pi\).
- For the remainder of the exercise, we’ll utilize the authors’ Beta(1,2) prior for \(\pi\). How does your prior understanding differ from that of the authors?
- Using the
`pulse_of_the_nation`

data from the**bayesrules**package, report the sample proportion of surveyed adults with the opinion that`climate_change`

is`Not Real At All`

. - In light of the Beta(1,2) prior and data, calculate and interpret a (middle) 95% posterior credible interval for \(\pi\). NOTE: You’ll first need to specify your posterior model of \(\pi\).

**Exercise 8.15 (Climate change: hypothesis testing)**Continuing the analysis from Exercise 8.14, suppose you wish to test \(H_0:\) \(\pi \le 0.1\) versus \(H_a:\) \(\pi > 0.1\).

- What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
- Calculate and interpret the prior odds of \(H_a\).
- Calculate and interpret the posterior odds of \(H_a\).
- Calculate and interpret the Bayes Factor for your hypothesis test.
- Putting this together, explain your conclusion about \(\pi\).

**Exercise 8.16 (Penguins: estimation)**Let \(\mu\) denote the typical flipper length (in mm) among the

*Adelie*penguin species. To learn about \(\mu\), we’ll utilize flipper measurements \((Y_1,Y_2,\dots,Y_n)\) on a sample of Adelie penguins.

- Explain which Bayesian model is appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.
- Your prior understanding is that the average flipper length for all Adelie penguins is about 200mm, but you aren’t very sure. It’s plausible that the average could be as low a 140mm or as high as 260mm. Specify an appropriate prior model for \(\mu\).
- The
`penguins_bayes`

data in the**bayesrules**package contains data on the flipper lengths for a sample of three different penguin species. For the*Adelie*`species`

, how many data points are there and what’s the sample mean`flipper_length_mm`

? - In light of your prior and data, calculate and interpret a (middle) 95% posterior credible interval for \(\mu\). NOTE: You’ll first need to specify your posterior model of \(\mu\).

**Exercise 8.17 (Penguins: hypothesis testing)**Let’s continue our analysis of \(\mu\), the typical flipper length (in mm) among the

*Adelie*penguin species.

- You hypothesize that the average Adelie flipper length is either less than 200 or greater than 220 mm. State this as a formal hypothesis test (using \(H_0\), \(H_a\), and \(\mu\) notation).
**Note:**This is a two-sided hypothesis test! - What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
- Calculate and interpret the Bayes Factor for your hypothesis test.
- Putting this together, explain your conclusion about \(\mu\).

**Exercise 8.18 (Loons: estimation)**The

*loon*is a species of bird common to the Ontario region of Canada. Let \(\lambda\) denote the typical number of loons observed by a birdwatcher across a 100 hour observation period. To learn about \(\lambda\), we’ll utilize bird counts \((Y_1,Y_2,\dots,Y_n)\) collected in \(n\) different outings.

- Explain which Bayesian model is appropriate for this analysis: Beta-Binomial, Gamma-Poisson, or Normal-Normal.
- Your prior understanding is that the typical rate of loon sightings is 2 per 100 hours with a standard deviation of 1 per 100 hours. Specify an appropriate prior model for \(\lambda\) and explain your reasoning.
- The
`loons`

data in the**bayesrules**package contains loon counts in different 100 hour observation periods. How many data points do we have and what’s the average loon count per 100 hours? - In light of your prior and data, calculate and interpret a (middle) 95% posterior credible interval for \(\lambda\). NOTE: You’ll first need to specify your posterior model of \(\lambda\).

**Exercise 8.19 (Loons: hypothesis testing)**Let’s continue our analysis of \(\lambda\), the typical rate of loon sightings in a 100 hour observation period.

- You hypothesize that birdwatchers should anticipate a rate of less than 1 loon per 100 observation hours. State this as a formal hypothesis test (using \(H_0\), \(H_a\), and \(\lambda\) notation).
- What decision might you make about these hypotheses utilizing the credible interval from the previous exercise?
- Calculate and interpret the Bayes Factor for your hypothesis test.
- Putting this together, explain your conclusion about \(\lambda\).

**Exercise 8.20 (Posterior analysis with MCMC: Estimation)**Consider the Bayesian model of \(\pi\) below, constructed from the observation of \(Y = 23\) successes in \(n = 50\) trials:

\[\begin{equation} \begin{split} Y | \pi & \sim \text{Bin}(50, \pi) \\ \pi & \sim \text{Beta}(7, 3) \\ \end{split} \;\; \Rightarrow \;\; \pi|(Y = 23) \sim \text{Beta}(30,30) (\#ex:ch8-bb) \end{equation}\]

- Simulate the posterior model of \(\pi\) with
**rstan**using 4 chains and 10000 iterations per chain. - Produce trace plots and overlaid density plots of the values for each of the four chains.
- Report the effective sample size ratio and R-hat values for your simulation, explaining what these values mean in context.
- Calculate the
*exact*70% and 95% posterior credible intervals for \(\pi\) using the Beta(30,30) posterior model. - Utilize your MCMC simulation to
*approximate*the 70% and 95% posterior credible intervals for \(\pi\). NOTE: You can use the`tidy()`

shortcut function for the 95% interval, but are also encouraged to calculate the CIs directly from your chain values.

**Exercise 8.21 (Posterior analysis with MCMC: Prediction)**

- Continuing with Bayesian model
**??**, suppose you were to run 14 more trials and let \(Y'\) be the number of successes in those trials. Use your MCMC simulation to approximate the posterior predictive model of \(Y'\). Construct a histogram visualization of this model. - Discuss your posterior predictive model of \(Y'\): how many successes do you anticipate? what’s a reasonable range of possible \(Y'\) values?
- Approximate the probability that we observe at least 9 successes in the next 14 trials.

**Exercise 8.22 (Posterior analysis with MCMC: Hypothesis testing)**

- Continuing with Bayesian model
**??**, we want to test the hypothesis that \(\pi > 0.48\). Calculate the*exact*posterior probability of this hypothesis using the Beta(30,30) posterior model and interpret this posterior probability in words. - Utilize your MCMC simulation to
*approximate*the posterior probability that \(\pi > 0.48\).

Answer: yes↩︎