Chapter 3 The Beta-Binomial Bayesian Model

Let’s get back on the presidential campaign trail with Michelle. In Section 2.2 we saw that Michelle won the Iowa caucus. In fact, she even went on to secure her political party’s nomination! Her next challenge is to win the presidential election. Suppose you’re Michelle’s campaign manager for the state of Minnesota. As such, you’ve conducted 30 different polls throughout the election season. Though Michelle’s support has hovered around 45%, she polled at around 35% on her dreariest days and around 55% on her best days on the campaign trail (Figure 3.1 (left)).

FIGURE 3.1: The results of 30 previous polls of Minnesotans’ support of Michelle for president (left) and a corresponding continuous prior model for \(\pi\), her current election support (right).

Elections are dynamic, thus Michelle’s support is always in flux. Yet past polls provide prior information about \(\pi\), the proportion of Minnesotans that currently support Michelle. In fact, we can reorganize this information into a formal prior probability model of \(\pi\). We worked a similar example in Section 2.3, in which context \(\pi\) was Kasparov’s probability of beating Deep Blue at chess. In that case, we greatly over-simplified reality to fit within the framework of introductory Bayesian models. Mainly, we assumed that \(\pi\) could only be 0.2, 0.5, or 0.8, the corresponding chances of which were defined by a discrete probability model. However, in the reality of Michelle’s election support and Kasparov’s chess skill, \(\pi\) can be any value between 0 and 1.

We can reflect this reality and conduct a more nuanced Bayesian analysis by constructing a continuous probability model of \(\pi\). To this end, the continuous probability density function26 (pdf) \(f(\pi)\) in Figure 3.1 (right) preserves the trends, variability, and overall information in the past polls. Though it looks quite different, the role of this continuous pdf is the same as for the discrete probability mass function (pmf) \(f(\pi)\) in Table 2.4: to specify all possible values of \(\pi\) and the relative plausibility of each. That is, \(f(\pi)\) provides answers to ‘what values can \(\pi\) take and which are more plausible than others?’ Here, \(f(\pi)\) reflects the fact that Michelle’s support \(\pi\) can be anywhere between 0 and 1, but is most likely around 0.45.

With the prior model in hand, you collect data that provides extra insight into \(\pi\): in a new poll of \(n = 50\) potential voters, \(Y = 30\) (60%) support Michelle. We can translate this polling data into insights about Michelle’s support via the likelihood function of \(\pi\). The likelihood function, scaled to integrate to 1 for the purposes of comparing it to the prior, is shown in Figure 3.2.27

FIGURE 3.2: The prior model of \(\pi\) along with the (scaled) likelihood function of \(\pi\) given the new poll results in which \(Y = 30\) of \(n = 50\) polled Minnesotans support Michelle.

Recall that the likelihood function specifies the relative likelihood of observing this poll result under different values of \(\pi\) between 0 and 1. Thus the greater its likelihood, the more compatible a value of \(\pi\) is with the observed polling data. The fact that the likelihood function in Figure 3.2 is greatest when \(\pi = 0.6\) suggests that the 60% support for Michelle among polled voters is most likely when her underlying support is also at 60%. This makes sense! The further that \(\pi\) is from 0.6, the less compatible it is with the observed poll. It’s extremely unlikely that we would’ve observed a 60% support rate in the new poll if, in fact, Michelle’s underlying support were as low as 30% or as high as 90%.

We also see in Figure 3.2 that your prior and likelihood don’t completely agree. Constructed from old polls, the prior is a bit more pessimistic about Michelle’s election support than the likelihood, which was constructed from the latest poll. Yet both of these insights are valuable to our analysis! Just as much as we shouldn’t ignore the new poll in favor of the old, we also shouldn’t throw out our bank of prior information in favor of the newest thing (also great life advice). Thinking like Bayesians, we can construct a posterior model of \(\pi\) which combines the information from the prior with that from the likelihood.

Which plot below reflects the correct posterior model of Michelle’s election support \(\pi\)?


Plot B is the only plot in which the posterior model of \(\pi\) strikes a balance between the relative pessimism of the prior and optimism of the likelihood. This, the correct posterior model, is plotted in Figure 3.3 and provides an updated look into Minnesotans’ support for Michelle. First, the posterior being centered at \(\pi = 0.5\) suggests that Michelle’s support is equally likely to be above or below the 50% threshold required to win Minnesota. Further, combining information from the prior and likelihood, the range of posterior plausible values has narrowed: we can be fairly certain that Michelle’s support is somewhere between 35% and 65%.

FIGURE 3.3: The prior model, likelihood function, and posterior model of \(\pi\), Michelle’s election support.

All this to say that the spirit of Bayesian thinking is constant. No matter if our parameter \(\pi\) is continuous or discrete, the posterior model of \(\pi\) combines insights from the prior and likelihood. Directly ahead, you can look forward to digging into the details and building the election model we’ve observed here. You’ll then generalize this work to the foundational Beta-Binomial Bayesian model. The power of the Beta-Binomial lies in its broad applications. Mainly, Michelle’s election support \(\pi\) isn’t the only variable of interest that lives on [0,1]. You might also imagine Bayesian analyses in which we’re interested in modeling the proportion of people that use public transit, the proportion of trains that are delayed, the proportion of people that prefer cats to dogs, and so on. The Beta-Binomial model provides the tools we need to study the proportion of interest, \(\pi\), in each of these settings.

  • Utilize and tune continuous priors. In Chapter 3, you will move from a simpler (and less realistic) discrete prior for \(\pi\), to a continuous Beta prior model. You will examine the properties of the Beta and how to tune this model to reflect your prior information about \(\pi\).
  • Interpret and communicate features of prior and posterior models using properties such as mean, mode, and variance.
  • Construct the Beta-Binomial model for proportion \(\pi\). By the end of Chapter 3, you will have built the prior, likelihood, and posterior for one of the most foundational models in Bayesian statistics!

The code throughout this chapter will require the following packages:

# Load packages
library(bayesrules)
library(tidyverse)

3.1 The Beta prior model

In building the Bayesian model of Michelle’s election support among Minnesotans, \(\pi\), we begin as usual: with the prior. Specifically, we need to translate the picture of our prior understanding in Figure 3.1 (right) into a formal probability model of \(\pi\). It’s natural to turn to the Beta probability model here since, like Michelle’s support \(\pi\), a Beta random variable is continuous and restricted to live on [0,1]. In this section, you’ll explore the properties of the Beta model and how to tune the Beta to reflect our prior understanding of Michelle’s support \(\pi\).

3.1.1 Beta foundations

Let’s begin with a general definition of the Beta probability model.

The Beta model

Let \(\pi\) be a random variable which can take any value between 0 and 1, i.e., \(\pi \in [0,1]\). Then the variability in \(\pi\) might be well modeled by a Beta model with shape parameters \(\alpha > 0\) and \(\beta > 0\):

\[\pi \sim \text{Beta}(\alpha, \beta)\]

The Beta model is specified by continuous pdf

\[\begin{equation} f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1} (1-\pi)^{\beta-1} \;\; \text{ for } \pi \in [0,1] \tag{3.1} \end{equation}\] where \(\Gamma(z) = \int_0^\infty x^{z-1}e^{-x}dx\) and \(\Gamma(z + 1) = z \Gamma(z)\). Fun fact: when \(z\) is a positive integer, \(\Gamma(z)\) simplifies to \(\Gamma(z) = (z-1)!\).
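
To get a feel for (3.1), it helps to evaluate it directly in R. The short sketch below, using an arbitrary tuning of \(\alpha = 3\) and \(\beta = 7\) (these values are illustrative, not from our election analysis), confirms that the formula matches R’s built-in Beta pdf function, dbeta():

# Compare the Beta pdf formula (3.1) to R's built-in dbeta()
# (the Beta(3,7) tuning is arbitrary, for illustration only)
a <- 3
b <- 7
pi_grid <- seq(0, 1, by = 0.01)

# f(pi) written out directly from (3.1), using gamma() for the Gamma function
f_manual <- gamma(a + b) / (gamma(a) * gamma(b)) *
  pi_grid^(a - 1) * (1 - pi_grid)^(b - 1)

# TRUE: the two calculations agree (up to floating-point error)
all.equal(f_manual, dbeta(pi_grid, shape1 = a, shape2 = b))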

FIGURE 3.4: Beta(\(\alpha,\beta\)) pdfs \(f(\pi)\) are plotted under a variety of tunings for shape parameters \(\alpha\) and \(\beta\) (black curve). The mean and mode of each model is represented by a blue solid line and dashed line, respectively.

This model is best understood by playing around with it. Check out Figure 3.4, which plots the Beta pdf \(f(\pi)\) under a variety of shape parameters, \(\alpha\) and \(\beta\). The first thing your eye might pick up on is the fact that \(f(\pi)\) is always continuous. Thus the Beta model inherits the properties of general continuous probability models.

Continuous probability models

Let \(\pi\) be a continuous random variable with pdf \(f(\pi)\). Then \(f(\pi)\) has the following properties:

  • \(\int_\pi f(\pi)d\pi = 1\), i.e., the area under \(f(\pi)\) is 1
  • \(f(\pi) \ge 0\)
  • \(P(a < \pi < b) = \int_a^b f(\pi) d\pi\) when \(a \le b\)
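
Each of these properties can be checked numerically. As a quick sketch, consider an arbitrary Beta(3,7) example and an arbitrary interval from 0.2 to 0.5:

# Verify the pdf properties above for an illustrative Beta(3,7) model
f <- function(pi) dbeta(pi, 3, 7)

# The area under f(pi) is 1
integrate(f, lower = 0, upper = 1)$value

# P(0.2 < pi < 0.5), by integrating the pdf or via the Beta cdf pbeta()
integrate(f, lower = 0.2, upper = 0.5)$value
pbeta(0.5, 3, 7) - pbeta(0.2, 3, 7)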

Interpreting \(f(\pi)\)

Since it’s possible that \(f(\pi) > 1\), a continuous pdf cannot be interpreted as a probability. Rather, \(f(\pi)\) can be used to compare the plausibility of two different values of \(\pi\): the greater \(f(\pi)\), the more plausible the corresponding value of \(\pi\).

Your eye also likely picked up on the flexible shapes the Beta pdf can take. We can tune the Beta to reflect the behavior in \(\pi\) by tweaking the two shape parameters \(\alpha\) and \(\beta\). For example, notice that when we set \(\alpha = \beta = 1\) (middle left plot), the Beta model is flat from 0 to 1. In this setting, the Beta model is equivalent to perhaps a more familiar model, the standard Uniform model.

The standard Uniform model

When it’s equally plausible for \(\pi\) to take on any value between 0 and 1, we can model \(\pi\) by the standard Uniform model

\[\pi \sim \text{Unif}(0,1)\]

with pdf \(f(\pi) = 1\) for \(\pi \in [0,1]\). The Unif(0,1) model is a special case of the Beta(\(\alpha,\beta\)) when \(\alpha = \beta = 1\).
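
A one-line check in R confirms this special case: the Beta(1,1) pdf evaluates to 1 everywhere on [0,1], just like the Unif(0,1) pdf:

# The Beta(1,1) and Unif(0,1) pdfs agree: both equal 1 across [0,1]
pi_grid <- seq(0, 1, by = 0.25)
dbeta(pi_grid, shape1 = 1, shape2 = 1)
dunif(pi_grid, min = 0, max = 1)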

Take a minute to see if you can identify some other patterns in how shape parameters \(\alpha\) and \(\beta\) impact the trend and variability in the Beta model.28

  1. How would you describe the trend of a Beta(\(\alpha,\beta\)) model when \(\alpha = \beta\)?
    a) Right-skewed with \(\pi\) tending to be less than 0.5.
    b) Symmetric with \(\pi\) tending to be around 0.5.
    c) Left-skewed with \(\pi\) tending to be greater than 0.5.

  2. How would you describe the trend of a Beta(\(\alpha,\beta\)) model when \(\alpha > \beta\)?
    a) Right-skewed with \(\pi\) tending to be less than 0.5.
    b) Symmetric with \(\pi\) tending to be around 0.5.
    c) Left-skewed with \(\pi\) tending to be greater than 0.5.

  3. For which model is there greater variability in the plausible values of \(\pi\), Beta(20,20) or Beta(5,5)?

We can support our observations of the trend and variability in \(\pi\) with numerical measurements. The mean (or “expected value”) and mode of \(\pi\) provide measures of trend. Conceptually speaking, the mean captures the average value of \(\pi\) whereas the mode captures the most plausible value of \(\pi\), i.e., the value of \(\pi\) at which pdf \(f(\pi)\) is maximized. These measures are represented by the solid and dashed vertical lines, respectively, in Figure 3.4. Notice that when \(\alpha\) is less than \(\beta\) (top row), the Beta pdf is right-skewed, thus the mean exceeds the mode of \(\pi\) and both are below 0.5. The opposite is true when \(\alpha\) is greater than \(\beta\) (bottom row). When \(\alpha\) and \(\beta\) are equal (center row), the Beta pdf is symmetric around a common mean and mode of 0.5. These trends are reflected in the formulas for the mean (denoted \(E(\pi)\)) and mode of a Beta(\(\alpha, \beta\)) variable \(\pi\):

\[\begin{equation} E(\pi) = \frac{\alpha}{\alpha + \beta} \;\; \text{ and } \;\; \text{Mode}(\pi) = \frac{\alpha - 1}{\alpha + \beta - 2} \; . \tag{3.2} \end{equation}\]

Figure 3.4 also reveals patterns in the variability of \(\pi\). For example, with values that tend to be closer to the mean of 0.5, the variability in \(\pi\) is smaller for the Beta(20,20) model than for the Beta(5,5) model. We can measure the variability of a Beta(\(\alpha,\beta\)) random variable \(\pi\) by variance

\[\begin{equation} \text{Var}(\pi) = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} \;. \tag{3.3} \end{equation}\]

Roughly speaking, variance measures the typical squared distance between possible \(\pi\) values and the mean, \(E(\pi)\). Since variance is thus measured in squared units, it’s typically easier to work with the standard deviation, which measures the typical (unsquared) distance between possible \(\pi\) values and \(E(\pi)\):

\[\text{SD}(\pi) := \sqrt{\text{Var}(\pi)} \; .\]
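
As a sketch of these formulas in action, the helper function below (our own construction, not part of the bayesrules package) applies (3.2) and (3.3) to the Beta(5,5) and Beta(20,20) models from the quiz above:

# A small helper applying (3.2) and (3.3) to a Beta(a, b) model
beta_summary <- function(a, b) {
  v <- a * b / ((a + b)^2 * (a + b + 1))
  c(mean = a / (a + b),
    mode = (a - 1) / (a + b - 2),
    var  = v,
    sd   = sqrt(v))
}

beta_summary(5, 5)     # mean = mode = 0.5, var approx 0.0227
beta_summary(20, 20)   # mean = mode = 0.5, var approx 0.0061 (less variable)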

Though these types of manipulations aren’t in the spirit of this book, the formulas for measuring trend and variability, (3.2) and (3.3), don’t magically pop out of nowhere. They are obtained by applying general definitions of mean, mode, and variance to the Beta pdf (3.1). These definitions are provided here – feel free to skip them without consequence.

Measuring trend and variability

Let \(\pi\) be a continuous random variable with pdf \(f(\pi)\). Consider two common measures of the trend in \(\pi\). The mean or expected value of \(\pi\) captures the weighted average of \(\pi\), where each possible \(\pi\) value is weighted by its corresponding pdf value:
\[E(\pi) = \int \pi \cdot f(\pi)d\pi\] The mode of \(\pi\) captures the most plausible value of \(\pi\), i.e., the value of \(\pi\) for which the pdf is maximized:
\[\text{Mode}(\pi) = \text{argmax}_\pi f(\pi)\]

Next, consider two common measures of the variability in \(\pi\). The variance in \(\pi\) roughly measures the typical or expected squared distance of possible \(\pi\) values from their mean:

\[\text{Var}(\pi) = E((\pi - E(\pi))^2) = E(\pi^2) - [E(\pi)]^2\]

The standard deviation in \(\pi\) roughly measures the typical or expected distance of possible \(\pi\) values from their mean:

\[\text{SD}(\pi) := \sqrt{\text{Var}(\pi)} \; .\]

NOTE: If \(\pi\) were discrete, we’d replace \(\int\) with \(\sum\).

3.1.2 Tuning the Beta prior

With a sense for how the Beta(\(\alpha,\beta\)) model works, let’s tune the shape parameters \(\alpha\) and \(\beta\) to reflect our prior information about Michelle’s election support \(\pi\). We saw in Figure 3.1 (left) that across 30 previous polls, Michelle’s average support was around 45 percentage points, though she polled as low as roughly 35 and as high as 55 percentage points. Our Beta(\(\alpha,\beta\)) prior should have similar trends and variability. For example, we want to pick \(\alpha\) and \(\beta\) for which \(\pi\) tends to be around 0.45, \(E(\pi) = \alpha/(\alpha + \beta) \approx 0.45\). Or, after some rearranging,

\[\alpha \approx \frac{9}{11} \beta \; .\]

We consider Beta models with \(\alpha\) and \(\beta\) pairs that meet this proportionality such as Beta(9,11), Beta(18,22), Beta(27,33) and so on. Through some trial and error within these constraints and plotting these candidate models using the plot_beta() function in the bayesrules package, we find that the features of the Beta(45,55) model closely match the trend and variability in the previous polls:

plot_beta(45,55)

FIGURE 3.5: Plot of probability density function for Beta(45,55).

Thus a reasonable prior model for Michelle’s election support is

\[\pi \sim \text{Beta}(45,55)\]

with prior pdf \(f(\pi)\) following from plugging 45 and 55 into (3.1),

\[\begin{equation} f(\pi) = \frac{\Gamma(100)}{\Gamma(45)\Gamma(55)}\pi^{44}(1-\pi)^{54} \;\; \text{ for } \pi \in [0,1] \; . \tag{3.4} \end{equation}\]

By (3.2), this model specifies that Michelle’s election support is most likely around 45 percentage points, with prior mean and prior mode

\[\begin{equation} E(\pi) = \frac{45}{45 + 55} = 0.4500 \;\; \text{ and } \;\; \text{Mode}(\pi) = \frac{45 - 1}{45 + 55 - 2} = 0.4490 \;. \tag{3.5} \end{equation}\]

Further, by (3.3), the potential variability in \(\pi\) is described by a prior variance of 25 percentage points\(^2\), or a prior standard deviation of 5 percentage points:

\[\begin{equation} \text{Var}(\pi) = \frac{45 \cdot 55}{(45 + 55)^2(45 + 55 + 1)} \approx 0.0025 \;. \tag{3.6} \end{equation}\]
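
You can confirm these hand calculations with the summarize_beta() function in the bayesrules package, which you’ll use again in the exercises:

# Summarize the Beta(45,55) prior:
# mean 0.45, mode roughly 0.449, var roughly 0.00245
summarize_beta(alpha = 45, beta = 55)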

3.2 The Binomial likelihood

In the second step of our Bayesian analysis of Michelle’s election support \(\pi\), you’re ready to collect some data. You plan to conduct a new poll of \(n = 50\) Minnesotans and record \(Y\), the number that support Michelle. The results will depend upon, and thus provide insight into, \(\pi\). To model the dependence of \(Y\) on \(\pi\), we make two assumptions about the poll: 1) the voters answer the poll independently of one another; and 2) the probability that any polled voter supports your candidate Michelle is \(\pi\). It follows from our work in Section 2.3.2 that, conditional on \(\pi\), \(Y\) is Binomial. Specifically,

\[Y | \pi \sim \text{Bin}(50, \pi)\]

with conditional pmf \(f(y|\pi)\) defined for \(y \in \{0,1,2,...,50\}\),

\[\begin{equation} f(y|\pi) = P((Y=y) | \pi) = {50 \choose y} \pi^y (1-\pi)^{50-y} \; . \tag{3.7} \end{equation}\]
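
Conveniently, (3.7) is built into R as dbinom(). For instance, here’s a quick sketch of the chance of observing \(y = 30\) supporters if Michelle’s underlying support were, hypothetically, \(\pi = 0.45\):

# Evaluate the Bin(50, pi) pmf at y = 30, supposing pi were 0.45
choose(50, 30) * 0.45^30 * 0.55^20    # by hand from (3.7)
dbinom(30, size = 50, prob = 0.45)    # built-in equivalent

# Sanity check: the pmf sums to 1 across y in {0, 1, ..., 50}
sum(dbinom(0:50, size = 50, prob = 0.45))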

Given its importance in our Bayesian analysis, it’s worth re-emphasizing the details provided by the Binomial model. To begin, the conditional pmf \(f(y|\pi)\) provides answers to a hypothetical question: if Michelle’s support were some given value of \(\pi\), then how many of the 50 polled voters \(Y = y\) might we expect to support her? This pmf is plotted under a range of possible \(\pi\) in Figure 3.6. These plots formalize our understanding that if Michelle’s support \(\pi\) were low (top row), the polling result \(Y\) is also likely to be low. If her support were high (bottom row), \(Y\) is also likely to be high.

FIGURE 3.6: The pmf \(f(y|\pi)\) of a Bin(50, \(\pi\)) model is plotted for values of \(\pi \in \{0.1, 0.2, \ldots, 0.9\}\). The pmfs at the observed value of polling data \(Y = 30\) are highlighted in blue.

In reality, we ultimately observe that the poll was a huge success: \(Y = 30\) of \(n = 50\) (60%) polled voters support Michelle! This result is highlighted in blue among the pmfs in Figure 3.6. To focus on just these results that match the observed polling data, we extract and compare these blue lines in a single plot (Figure 3.7). These represent the likelihoods of each potential level of Michelle’s support, \(\pi \in \{0.1, 0.2, \ldots, 0.9\}\), given the observed polling data, \(Y = 30\). In fact, these are just a few points along the complete continuous likelihood function \(L(\pi | (y=30))\) defined for any \(\pi\) between 0 and 1 (black curve).

FIGURE 3.7: The likelihood function, \(L(\pi | (y = 30))\), of Michelle’s election support \(\pi\) given the observed poll in which \(Y = 30\) of \(n = 50\) polled Minnesotans supported her. The blue vertical lines represent the likelihood evaluated at \(\pi\) in \(\{0.1,0.2,\ldots,0.9\}\).

Recall that the likelihood function is defined by turning the conditional Binomial pmf on its head. Now treating \(Y = 30\) as observed data and \(\pi\) as unknown (matching the reality of our situation), we can reinterpret \(f((y = 30) | \pi)\) as the likelihood of any given \(\pi\) in light of this data. Specifically, the likelihood function \(L(\pi | (y = 30))\) follows from plugging \(y = 30\) into (3.7):29

\[\begin{equation} L(\pi | (y=30)) = {50 \choose 30} \pi^{30} (1-\pi)^{20} \; \; \text{ for } \pi \in [0,1] \; . \tag{3.8} \end{equation}\]

Written this way, \(L(\pi|(y = 30))\) is a function of \(\pi\) that provides insight into the relative compatibility of different \(\pi \in [0,1]\) with the observed polling data \(Y = 30\). The fact that \(L(\pi|(y=30))\) drops off for values of \(\pi\) below 0.4 means that values of \(\pi\) in this range aren’t compatible with the polling data. Mainly, given that 60% of the 50 polled voters supported her, it’s unlikely that Michelle’s underlying support is below 40%. It’s similarly unlikely that Michelle’s support exceeds 80%. Further, the likelihood function \(L(\pi | (y=30))\) is maximized when \(\pi = 0.6\). Thus, given that Michelle polled at 60%, the most likely value of her underlying support is also 60%.
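
We can reproduce these observations with a quick sketch in R, evaluating (3.8) on a grid of \(\pi\) values via dbinom():

# Evaluate the likelihood L(pi | y = 30) at pi in {0.1, 0.2, ..., 0.9},
# mirroring the blue vertical lines in Figure 3.7
pi_grid <- seq(0.1, 0.9, by = 0.1)
dbinom(30, size = 50, prob = pi_grid)

# The likelihood peaks at pi = 0.6, where it's roughly 0.115
dbinom(30, size = 50, prob = 0.6)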

3.3 The Beta posterior model

We now have the two foundational pieces of our Bayesian model in place – the Beta prior model for Michelle’s support \(\pi\) and the Binomial likelihood model of the dependence of polling data \(Y\) on \(\pi\):

\[\begin{split} Y | \pi & \sim \text{Bin}(50, \pi) \\ \pi & \sim \text{Beta}(45, 55) \\ \end{split}\]

Upon observing the poll in which 60% of voters supported Michelle (\(Y = 30\)), our goal is to construct the posterior model of \(\pi\). We already revealed the posterior punchline in the introduction. You can reproduce this figure using the plot_beta_binomial() function in the bayesrules package:

plot_beta_binomial(alpha = 45, beta = 55, y = 30, n = 50)

As expected, the posterior model strikes a balance between the prior and likelihood. In this case, it’s slightly “closer” to the prior than to the likelihood. (We’ll gain intuition for why this is the case in the next chapter.) You might also recognize something new: like the prior, the posterior model of \(\pi\) is continuous and lives on [0,1]. That is, the posterior also appears to be a Beta(\(\alpha,\beta\)) model, one whose shape parameters have been updated to combine information from the prior and data. This is indeed the case! Conditioned on the observed poll results (\(Y = 30\)), the posterior model of Michelle’s election support is Beta(75, 75):

\[\pi | (Y = 30) \sim \text{Beta}(75,75)\]

with a corresponding conditional pdf which follows from (3.1):

\[\begin{equation} f(\pi | (y = 30)) = \frac{\Gamma(150)}{\Gamma(75)\Gamma(75)}\pi^{74} (1-\pi)^{74} \;\; \text{ for } \pi \in [0,1] \; . \tag{3.9} \end{equation}\]

Before backing up this claim with some math, let’s examine the evolution in your understanding of Michelle’s election support. The summarize_beta_binomial() function in the bayesrules package summarizes the trend and variability in the prior and posterior models of her election support \(\pi\). These calculations follow directly from plugging the prior and posterior Beta parameters into (3.2) and (3.3):

summarize_beta_binomial(alpha = 45, beta = 55, y = 30, n = 50)
      model alpha beta mean  mode      var
1     prior    45   55 0.45 0.449 0.002450
2 posterior    75   75 0.50 0.500 0.001656

A comparison illuminates the influence of the observed polling data on the posterior model. Mainly, after observing the poll in which 30 of 50 people supported Michelle, the posterior mean of her underlying support \(\pi\) nudged up from approximately 45% to 50%:

\[E(\pi) = 0.45 \;\; \text{ vs } \;\; E(\pi | (Y = 30)) = 0.50 \; .\]

Further, the variability within the model decreased, indicating a narrower range of posterior plausible \(\pi\) values in light of the polling data:

\[\text{Var}(\pi) \approx 0.0025 \;\; \text{ vs } \;\; \text{Var}(\pi | (Y = 30)) \approx 0.0017 \; .\]

If you’re happy taking our word that the posterior model of \(\pi\) is Beta(75,75), you can do so and still be prepared for the next material in the book. However, we strongly recommend that you consider the magic from which the posterior is built. Going through the process can help you further develop intuition for Bayesian modeling. As with our previous Bayesian models, the posterior conditional pdf of \(\pi\) strikes a balance between the prior pdf \(f(\pi)\) and the likelihood function \(L(\pi|(y = 30))\) via Bayes’ Rule (2.11):

\[f(\pi | (y = 30)) = \frac{f(\pi)L(\pi|(y = 30))}{f(y = 30)}.\]

Recall from the discussion in Section 2.3.5 that \(f(y = 30)\) is a normalizing constant, that is, a constant across \(\pi\) which ensures that posterior pdf \(f(\pi | (y = 30))\) is scaled to integrate to 1. We don’t need to calculate the normalizing constant in order to construct the posterior model. Rather, we can simplify the posterior construction by utilizing the fact that the posterior is proportional to the product of the prior (3.4) and likelihood (3.8):

\[\begin{split} f(\pi | (y = 30)) & \propto f(\pi) L(\pi | (y=30)) \\ & = \frac{\Gamma(100)}{\Gamma(45)\Gamma(55)}\pi^{44}(1-\pi)^{54} \cdot {50 \choose 30} \pi^{30} (1-\pi)^{20} \\ & = \left[\frac{\Gamma(100)}{\Gamma(45)\Gamma(55)}{50 \choose 30} \right] \cdot \pi^{74} (1-\pi)^{74} \\ & \propto \pi^{74} (1-\pi)^{74} \; . \\ \end{split}\]

In the final line of our calculation, we made a big simplification: we dropped all constants that don’t depend upon \(\pi\). We don’t need these. Rather, it’s the dependence of \(f(\pi | (y=30))\) on \(\pi\) that we care about:

\[f(\pi | (y=30)) = c\pi^{74} (1-\pi)^{74} \propto \pi^{74} (1-\pi)^{74} \; .\]

We could complete the definition of this posterior pdf by calculating the normalizing constant \(c\) for which the pdf integrates to 1:

\[1 = \int f(\pi | (y=30)) d\pi = \int c \cdot \pi^{74} (1-\pi)^{74} d\pi \;\; \Rightarrow \; \; c = \frac{1}{\int \pi^{74} (1-\pi)^{74} d\pi}.\] But again, we don’t need to do this calculation. The pdf of \(\pi\) is defined by its structural dependence on \(\pi\), that is, the kernel of the pdf. Notice here that posterior pdf \(f(\pi|(y=30))\) has the same kernel as the normalized Beta(75,75) pdf in (3.9):

\[f(\pi | (y=30)) = \frac{\Gamma(150)}{\Gamma(75)\Gamma(75)} \pi^{74} (1-\pi)^{74} \propto \pi^{74} (1-\pi)^{74} \; .\]

The fact that the posterior pdf \(f(\pi | (y=30))\) matches the pdf of a Beta(75,75) verifies our claim that \(\pi | (Y=30) \sim \text{Beta}(75,75)\). Magic! For a numerical sanity check of this kernel-matching argument, see the sketch below. Then, for an extra bit of practice in identifying the posterior model of \(\pi\) from an unnormalized posterior pdf or kernel, take the subsequent quiz.30
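
The sketch below confirms that rescaling the kernel \(\pi^{74}(1-\pi)^{74}\) by the Beta(75,75) normalizing constant recovers the Beta(75,75) pdf. Here we lean on R’s beta() function, which computes \(B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)\), the reciprocal of that constant:

# The normalizing constant in (3.9): Gamma(150) / (Gamma(75) * Gamma(75))
gamma(150) / (gamma(75) * gamma(75))   # equivalently, 1 / beta(75, 75)

# Rescaling the kernel by this constant recovers the Beta(75,75) pdf
kernel <- function(pi) pi^74 * (1 - pi)^74
pi_grid <- seq(0.1, 0.9, by = 0.2)
all.equal(kernel(pi_grid) / beta(75, 75), dbeta(pi_grid, 75, 75))   # TRUE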

For each scenario below, identify the correct Beta posterior model of \(\pi \in [0,1]\) from its unnormalized pdf.

  1. \(f(\pi|y) \propto \pi^{3 - 1}(1-\pi)^{12 - 1}\)
  2. \(f(\pi|y) \propto \pi^{11}(1-\pi)^{2}\)
  3. \(f(\pi|y) \propto 1\)

Now, instead of identifying a model from a kernel, practice identifying the kernels of models.31

Identify the kernels of each pdf below.

  1. \(f(\pi|y) = ye^{-\pi y}\) for \(\pi > 0\)
    1. \(y\)
    2. \(e^{-\pi}\)
    3. \(ye^{-\pi}\)
    4. \(e^{-\pi y}\)
  2. \(f(\pi|y) = \frac{2^y}{(y-1)!} \pi^{y-1}e^{-2\pi}\) for \(\pi > 0\)
    1. \(\pi^{y-1}e^{-2\pi}\)
    2. \(\frac{2^y}{(y-1)!}\)
    3. \(e^{-2\pi}\)
    4. \(\pi^{y-1}\)
  3. \(f(\pi) = 3\pi^2\) for \(\pi \in [0,1]\)

3.4 The Beta-Binomial model

In the previous section we developed the foundational Beta-Binomial model for Michelle’s election support \(\pi\). In doing so, we assumed a specific Beta prior (Beta(45,55)) and a specific polling result (\(Y=30\) of \(n=50\) polled voters supported your candidate) within a specific context. This was a special case of the more general Beta-Binomial model:

\[\begin{split} Y | \pi & \sim \text{Bin}(n, \pi) \\ \pi & \sim \text{Beta}(\alpha, \beta) \\ \end{split}\]

This general model has vast applications. It applies to any setting with a parameter of interest \(\pi\) that lives on [0,1], any tuning of a Beta prior, and any data \(Y\) that counts the number of “successes” in \(n\) fixed, independent trials, each having probability of success \(\pi\). For example, \(\pi\) might be a coin’s tendency toward Heads and data \(Y\) records the number of Heads observed in a series of \(n\) coin flips. Or \(\pi\) might be the proportion of adults that use social media and we learn about \(\pi\) by sampling \(n\) adults and recording the number \(Y\) that use social media. No matter the setting, upon observing \(Y = y\) successes in \(n\) trials, the posterior of \(\pi\) can be described by a Beta model which reveals the influence of the prior (through \(\alpha\) and \(\beta\)) and data (through \(y\) and \(n\)):

\[\begin{equation} \pi | (Y = y) \sim \text{Beta}(\alpha + y, \beta + n - y) \; . \tag{3.10} \end{equation}\]

Measures of posterior trend and variability follow from (3.2) and (3.3):

\[\begin{equation} \begin{split} E(\pi | (Y=y)) & = \frac{\alpha + y}{\alpha + \beta + n} \\ \text{Mode}(\pi | (Y=y)) & = \frac{\alpha + y - 1}{\alpha + \beta + n - 2} \\ \text{Var}(\pi | (Y=y)) & = \frac{(\alpha + y)(\beta + n - y)}{(\alpha + \beta + n)^2(\alpha + \beta + n + 1)}\\ \end{split} \tag{3.11} \end{equation}\]
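
These formulas are easy to codify. As a sketch, the hypothetical helper below (our own, not a bayesrules function) applies (3.10) and (3.11) to any Beta-Binomial setting:

# A helper applying (3.10) and (3.11): a Beta(a, b) prior updated with
# y successes in n trials yields a Beta(a + y, b + n - y) posterior
beta_binomial_update <- function(a, b, y, n) {
  a_post <- a + y
  b_post <- b + n - y
  c(alpha = a_post,
    beta  = b_post,
    mean  = a_post / (a_post + b_post),
    mode  = (a_post - 1) / (a_post + b_post - 2),
    var   = a_post * b_post / ((a_post + b_post)^2 * (a_post + b_post + 1)))
}

# The election analysis: Beta(45,55) prior updated with Y = 30 of n = 50
beta_binomial_update(a = 45, b = 55, y = 30, n = 50)   # Beta(75,75), mean 0.5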

Importantly, notice that the posterior and the prior are from the same model family: both are Beta models, merely with different parameter values. In this case, we say that the Beta(\(\alpha, \beta\)) model is a conjugate prior for the corresponding Bin(\(n,\pi\)) likelihood model. Our work below will highlight that conjugacy simplifies the construction of the posterior, and thus can be a desirable property in Bayesian modeling.

Conjugate prior

We say that \(f(\pi)\) is a conjugate prior for \(L(\pi|y)\) if the posterior, \(f(\pi|y) \propto f(\pi)L(\pi|y)\), is from the same model family as the prior.

The construction of the posterior for the general Beta-Binomial model is very similar to that of the election-specific model. First, the Beta prior pdf \(f(\pi)\) is defined by (3.1) and the likelihood function \(L(\pi|y)\) is defined by (2.6), the conditional pmf of the Bin(\(n,\pi\)) model:

\[\begin{equation} f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\pi^{\alpha - 1}(1-\pi)^{\beta - 1} \;\; \text{ and } \;\; L(\pi|y) = {n \choose y} \pi^{y} (1-\pi)^{n-y} \; . \tag{3.12} \end{equation}\]

Putting these two pieces together, the posterior pdf follows from Bayes’ Rule:

\[\begin{split} f(\pi | y) & \propto f(\pi)L(\pi|y) \\ & = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\pi^{\alpha - 1}(1-\pi)^{\beta - 1} \cdot {n \choose y} \pi^{y} (1-\pi)^{n-y} \\ & \propto \pi^{(\alpha + y) - 1} (1-\pi)^{(\beta + n - y) - 1} \; .\\ \end{split}\]

Again, we’ve dropped normalizing constants which don’t depend upon \(\pi\) and are left with the unnormalized structure of the posterior pdf. Note that this shares the same structure as the normalized Beta(\(\alpha + y\), \(\beta + n - y\)) pdf,

\[f(\pi|y) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+y)\Gamma(\beta+n-y)}\pi^{(\alpha + y) - 1} (1-\pi)^{(\beta + n - y) - 1}.\]

Thus we’ve verified our claim that the posterior model of \(\pi\) given an observed \(Y = y\) successes in \(n\) trials is \(\text{Beta}(\alpha + y, \beta + n - y)\).

3.5 Simulating the Beta-Binomial

Using Section 2.3.6 as a guide, let’s simulate the posterior model of Michelle’s support \(\pi\). We begin by simulating 10,000 values of \(\pi\) from the Beta(45,55) prior using rbeta() and, subsequently, a potential Bin(50,\(\pi\)) poll result \(Y\) from each \(\pi\) using rbinom():

# Simulate 10,000 values of pi from the Beta(45,55) prior and,
# from each, a Bin(50, pi) poll result y
set.seed(84735)
michelle_sim <- data.frame(pi = rbeta(10000, 45, 55)) %>% 
  mutate(y = rbinom(10000, size = 50, prob = pi))

Among the 10,000 pairs of \(\pi\) and \(y\) values, we can focus on just those that match our poll results in which \(Y = 30\) of 50 polled voters supported Michelle:

# Keep only the simulations that match the observed poll result, y = 30
michelle_posterior <- michelle_sim %>% 
  filter(y == 30)

The remaining set of \(\pi\) values provides an approximation of our Beta(75,75) posterior model of \(\pi\):

# Histogram of the remaining pi values: an approximation of the
# Beta(75,75) posterior
ggplot(michelle_posterior, aes(x = pi)) + 
  geom_histogram(color = "white", binwidth = 0.025)

Now, since only 211 of our 10,000 simulations matched our observed \(Y = 30\) data, this posterior approximation might be improved by upping our original simulations from 10,000 to, say, 50,000:

nrow(michelle_posterior)
[1] 211
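
We can also gauge the quality of this approximation by comparing the mean and standard deviation of the simulated \(\pi\) values to those of the exact Beta(75,75) posterior, which are 0.5 and roughly 0.041 by (3.11):

# Compare the simulated posterior mean and sd to the Beta(75,75) values
michelle_posterior %>% 
  summarize(mean_pi = mean(pi), sd_pi = sd(pi))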

3.6 Example: Milgram’s behavioral study of obedience

In a 1963 issue of The Journal of Abnormal and Social Psychology, Stanley Milgram described a study in which he investigated the propensity of people to obey orders from authority figures, even when those orders may harm other people (Milgram 1963). In the paper, Milgram describes the study as:

consist[ing] of ordering a naive subject to administer electric shock to a victim. A simulated shock generator is used, with 30 clearly marked voltage levels that range from 15 to 450 volts. The instrument bears verbal designations that range from Slight Shock to Danger: Severe Shock. The responses of the victim, who is a trained confederate of the experimenter, are standardized. The orders to administer shocks are given to the naive subject in the context of a ‘learning experiment’ ostensibly set up to study the effects of punishment on memory. As the experiment proceeds the naive subject is commanded to administer increasingly more intense shocks to the victim, even to the point of reaching the level marked Danger: Severe Shock.

In other words, study participants were given the task of testing another participant (who was in truth a trained actor) on their ability to memorize facts. If the actor didn’t remember a fact, the participant was ordered to administer a shock to the actor and to increase the shock level with every subsequent failure. Unbeknownst to the participant, the shocks were fake and the actor was only pretending to register pain from the shock. Shockingly, among the 40 participants in Milgram’s study, 26 (65%) administered what they thought to be the maximum shock to the actor.

3.6.1 A Bayesian analysis

We can translate Milgram’s study into the Beta-Binomial framework. The parameter of interest here is \(\pi\), the chance that a person would obey authority (in this case, administering the most severe shock), even if it meant bringing harm to others. Since Milgram passed away in 1984, we don’t have the opportunity to ask him about his understanding of \(\pi\) prior to conducting the study. Thus we’ll diverge from the actual study here, and suppose that another psychologist helped carry out this work. Prior to collecting data, they indicated that a Beta(1,10) model accurately reflected their understanding about \(\pi\), developed through previous work. Next, let \(Y\) be the number of the 40 study participants that would inflict the most severe shock. Assuming that each participant behaves independently of the others, we can model the dependence of \(Y\) on \(\pi\) using the Binomial. In summary, we have the following Beta-Binomial Bayesian model:

\[\begin{split} Y | \pi & \sim \text{Bin}(40, \pi) \\ \pi & \sim \text{Beta}(1,10) \; . \\ \end{split}\]

Before moving ahead with our analysis, let’s pause to examine the psychologist’s prior model:

# Beta(1,10) prior
plot_beta(alpha = 1, beta = 10)

What does the prior model reveal about the psychologist’s prior understanding of \(\pi\)?

  a) They don’t have an informed opinion.
  b) They’re fairly certain that a large proportion of people will do what authority tells them.
  c) They’re fairly certain that only a small proportion of people will do what authority tells them.

The correct answer to this quiz is c! The psychologist’s prior trend is low, with a prior mode of 0 and low variability. Thus the psychologist is fairly certain that very few people will just do whatever authority tells them. Of course, the psychologist’s understanding will evolve upon seeing the results of Milgram’s study. Before doing this together, try utilizing the general formulation in (3.10) to build the psychologist’s posterior model of \(\pi\).

26 of the 40 study participants inflicted what they understood to be the maximum shock. In light of this data, what’s the psychologist’s posterior model of \(\pi\):

\[\pi | (Y = 26) \sim \text{Beta}(\text{???}, \text{???})\]

Plugging the prior parameters (\(\alpha = 1\), \(\beta = 10\)) and data (\(y = 26\), \(n = 40\)) into (3.10) establishes the psychologist’s posterior model of \(\pi\):

\[\pi | (Y = 26) \sim \text{Beta}(27, 24) \; .\]

This posterior is summarized and plotted below, contrasted with the prior and likelihood. Given the strong evidence in the study data, note that the psychologist’s understanding evolved quite a bit from their prior (less than ~25% of people would inflict the most severe shock) to their posterior (between ~30% and ~70% of people would inflict the shock).

plot_beta_binomial(alpha = 1, beta = 10, y = 26, n = 40)

summarize_beta_binomial(alpha = 1, beta = 10, y = 26, n = 40)
      model alpha beta    mean   mode      var
1     prior     1   10 0.09091 0.0000 0.006887
2 posterior    27   24 0.52941 0.5306 0.004791

3.6.2 The role of ethics in statistics and data science

In working through the previous example, we hope you were a bit distracted by your inner voice – this experiment seems ethically dubious. You wouldn’t be alone in this thinking. Stanley Milgram is a controversial historical figure. We chose the above example to not only practice building Beta-Binomial models, but to practice taking a critical eye to our work and the work of others.

Every data collection, visualization, analysis, and communication engenders both harms and benefits to individuals and groups, both direct and indirect. As statisticians and data scientists, it is critical to always consider these harms and benefits. We encourage you to ask yourself the following questions each time you work with data:

  • What are this study’s potential benefits to society? To the study participants?
  • What are this study’s potential risks to society? To the study participants?
  • What ethical issues might arise when generalizing observations on the study participants to a larger population?
  • Who is included and excluded in conducting this study? What are the corresponding risks and benefits? Are individuals in groups that have been historically (and currently) marginalized put at greater risk?
  • Were the people who might be affected by your study involved in the study? If not, you may not be qualified to evaluate these questions.
  • What is the personal story or experience of each subject represented by a row of data?

The importance of considering the context and implications for your statistical and data science work cannot be overstated. As statisticians and data scientists, we are responsible for considering these issues so as not to harm individuals and communities of people. Fortunately, there are many resources available for learning more about data science ethics.

3.7 Chapter summary

In Chapter 3, you built the foundational Beta-Binomial model for \(\pi\), an unknown proportion that can take any value between 0 and 1:

\[\begin{split} Y | \pi & \sim \text{Bin}(n, \pi) \\ \pi & \sim \text{Beta}(\alpha, \beta) \\ \end{split} \;\; \Rightarrow \;\; \pi | (Y = y) \sim \text{Beta}(\alpha + y, \beta + n - y) \; .\]

This model reflects the three pieces common to every Bayesian analysis:

  1. Prior model
    The Beta prior model for \(\pi\) can be tuned to reflect the relative prior plausibility of each \(\pi \in [0,1]\).

\[f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\pi^{\alpha - 1}(1-\pi)^{\beta - 1}\]

  2. Likelihood model
    To learn about \(\pi\), we collect data \(Y\), the number of successes in \(n\) independent trials, each having probability of success \(\pi\). The dependence of \(Y\) on \(\pi\) is summarized by the Binomial likelihood model.

\[L(\pi|y) = {n \choose y} \pi^{y} (1-\pi)^{n-y}\]

  3. Posterior model
    Via Bayes’ Rule, the conjugate Beta prior combined with the Binomial likelihood produce a Beta posterior model for \(\pi\). The updated parameters of the Beta posterior \((\alpha + y, \beta + n - y)\) reflect the influence of the prior (via \(\alpha\) and \(\beta\)) and the observed data (via \(y\) and \(n\)).

\[f(\pi | y) \propto f(\pi)L(\pi|y) \propto \pi^{(\alpha + y) - 1} (1-\pi)^{(\beta + n - y) - 1} \; .\]

3.8 Exercises

3.8.1 Practice: Beta prior models

Exercise 3.1 (Tune your Beta prior: Take I) In tuning a prior model, you may find that trial and error combined with plotting (using plot_beta()) works well. Other times, you might set up and solve a system of equations that define the properties of your prior. In each situation below, tune a Beta(\(\alpha,\beta\)) model that accurately reflects the given prior information. In many of these situations, there’s no single “right” answer, but rather multiple “reasonable” answers.
  1. Your friend applied to a job and tells you: “I think I have a 40% chance of getting the job, but I’m pretty unsure.” When pressed further, they put their chances between 20% and 60%.
  2. A scientist has created a new test for a rare disease. They expect that the test is accurate 80% of the time with a variance of 0.05.
  3. Your aunt Jo is a successful mushroom hunter. She boasts: “I expect to find enough mushrooms to feed myself and my co-workers at the auto-repair shop 90% of the time, but if I had to give you a likely range it would be between 85% and 100% of the time.”
  4. Your friend Sal (who is a touch hyperbolic) just interviewed for a job, and doesn’t know how to describe their chances of getting an offer. They say “I couldn’t read my interviewer’s expression! I either really impressed them and they are absolutely going to hire me, or I made a terrible impression and they are burning my resume as we speak.”
Exercise 3.2 (Tune your Beta prior: Take II) As in the previous exercise, tune an appropriate Beta(\(\alpha,\beta\)) prior model for each situation below.
  1. Your friend tells you “I think that I have a 80% chance of getting a full night of sleep tonight, and I am pretty certain.” When pressed further, they put their chances between 70% and 90%.
  2. A scientist has created a new test for a rare disease. They expect that it’s accurate 90% of the time with a variance of 0.08.
  3. Max loves to play the video game Animal Crossing. They tell you: “The probability that I play Animal Crossing in the morning is somewhere between 75% and 95%, but most likely around 85%.”
  4. The bakery in East Hampton, Massachusetts often runs out of croissants on Sundays. Ben guesses that by 10am on a Sunday, there is a 30% chance they have run out of croissants, but is pretty unsure about that guess.
Exercise 3.3 (It’s OK to admit you don’t know) You want to specify a Beta prior for a situation in which you have no idea about the parameter of interest \(\pi\). In other words, you think \(\pi\) is equally likely to be anywhere between 0 and 1.
  1. What Beta model should you specify? Plot that model using plot_beta().
  2. What is the mean of the Beta prior that you specified? Explain why that does or does not align with having no clue.
  3. What is the variance of the Beta prior that you specified?
  4. Give an example of a Beta prior that has a smaller variance than the one you specified, plot that prior using plot_beta().
  5. Give an example of a Beta prior that has a larger variance than the one you specified, and plot that prior using plot_beta().
Exercise 3.4 (Which Beta? Take I) Six Beta pdfs are plotted below. Match each to one of the following models: Beta(0.5, 0.5), Beta(1,1), Beta(2,2), Beta(6,6), Beta(6,2), Beta(0.5, 6).

Exercise 3.5 (Which Beta? Take II) Six Beta pdfs are plotted below. Match each to one of the following models: Beta(1, 0.3), Beta(2,1), Beta(3,3), Beta(6,3), Beta(4,2), Beta(5, 6).

Exercise 3.6 (Beta properties: Take I) Let’s examine the properties of the Beta models in the “Which Beta? Take I” exercise.
  1. Which Beta model has the smallest mean? The biggest mean? Provide visual evidence that supports your claims and calculate the corresponding means.
  2. Which Beta model has the smallest variance? The biggest variance? Provide visual evidence that supports your claims and calculate the corresponding variances.
Exercise 3.7 (Beta properties: Take II) Let’s examine the properties of the Beta models in the “Which Beta? Take II” exercise.
  1. Which Beta model has the smallest mean? The biggest mean? Provide visual evidence that supports your claims and calculate the corresponding means.
  2. Which Beta model has the smallest variance? The biggest variance? Provide visual evidence that supports your claims and calculate the corresponding variances.
Exercise 3.8 (Using R for Beta)
  1. Use the plot_beta() function to plot the six Beta models in the “Which Beta? Take I” exercise.
  2. Use the summarize_beta() function to confirm your answers to “Beta properties: Take I” exercise.
Exercise 3.9 (Challenge: establishing features of the Beta model) Let random variable \(\pi\) follow a Beta(\(\alpha, \beta\)) model. Equations (3.2) and (3.3) provide formulas for calculating the mean, mode, and variance of \(\pi\). Confirm these properties by applying the following definitions of mean, mode, and variance directly to the Beta pdf \(f(\pi)\) as defined by (3.1):

\[\begin{split} E(\pi) & = \int \pi f(\pi) d\pi \\ \text{Mode}(\pi) & = \text{argmax}_\pi f(\pi) \\ \text{Var}(\pi) & = E\left[(\pi - E(\pi))^2\right] = E(\pi^2) - \left[E(\pi)\right]^2 \\ \end{split}\]

Exercise 3.10 (Interpreting priors) What do you call a sweet carbonated drink: pop, soda, coke, or something else? Let \(\pi\) be the proportion of U.S. residents that prefer the term “pop.” Two different beverage salespeople from different regions of the country have different priors for \(\pi\).32 The first salesperson works in North Dakota and specifies a Beta(8,2) prior. The second works in Louisiana and specifies a Beta(1,20) prior.
  1. Calculate the prior mean, mode, variance of \(\pi\) (\(E(\pi)\), \(\text{Mode}(\pi)\), and \(\text{Var}(\pi)\)), for both salespeople.
  2. Utilize plot_beta() to plot the prior pdfs for both salespeople.
  3. Use your work in parts (a) and (b) to compare, in words, the salespeoples’ prior understandings about the proportion of U.S. residents that say “pop.”

3.8.2 Practice: Beta-Binomial models

Exercise 3.11 (Different priors, different posteriors) Continuing our carbonated drink analysis, we poll 50 U.S. residents and 12 (24%) prefer the term “pop.”
  1. Specify the unique posterior model of \(\pi\) for both salespeople. Though you can directly apply the general Beta-Binomial framework we built throughout the chapter, we also encourage you to construct these posteriors “from scratch.”
  2. Utilize plot_beta_binomial() to plot the prior, likelihood, and posterior for both salespeople.
  3. Summarize and compare the salespeoples’ posterior understanding of \(\pi\).
Exercise 3.12 (Regular bike ridership) A university’s transportation office wants to know what proportion of students are regular bike riders so that they can install an appropriate number of bike racks. Since the university is located in sunny Southern California, office staff think that 1 in 4 students are regular bike riders on average. They also believe the mode of the proportion of regular bike riders is \(\frac{5}{22}\).
  1. What Beta model reflects the prior ideas of the transportation office staff? Plot this model using plot_beta().
  2. The transportation office randomly selects 50 students and it turns out that 15 of them are regular bike riders. What is the posterior model for the proportion of students who are regular bike riders?
  3. What is the mean, mode, and variance of the posterior model?
  4. Does the posterior model more closely reflect the prior information or the data? Explain your reasoning.
Exercise 3.13 (Same-sex marriage) The Supreme Court of the United States ruled in Obergefell v. Hodges in 2015 that it was illegal for states to deny the right to same-sex marriage. In 2017, a Gallup survey found that 10.2% of LGBT adults in the U.S. were married to a same-sex spouse.33 Now it’s the 2020s, and Bayard34 guesses that the percent of LGBT adults in the U.S. who are married to a same-sex spouse has most likely increased to about 15%, but could reasonably range from 10% to 25%.
  1. Identify a Beta model that reasonably reflects Bayard’s prior ideas. Plot this model using plot_beta().
  2. Bayard wants to update his prior, so he randomly selects 90 US LGBT adults and 30 of them are married to a same-sex partner. What is the posterior model for the proportion of LGBT U.S. adults who are married to a same-sex partner?
  3. What is the mean, mode, and variance of the posterior model?
  4. Does the posterior model more closely reflect the prior information or the data? Explain your reasoning.
Exercise 3.14 (Knowing someone who is transgender) In September 2016 a Pew Research survey found that 30% of U.S. adults are aware that they know someone who is transgender.35 It is now the 2020s, and Sylvia36 believes that this percentage has increased since the survey was taken, but she doesn’t know by how much. Sylvia’s best guess is that the current percent of people who know someone who is transgender is between 35% and 60%.
  1. Identify a Beta model that reasonably reflects Sylvia’s prior ideas. Plot this model using plot_beta().
  2. Sylvia wants to update her prior, so she randomly selects 200 US adults and 80 of them are aware that they know someone who is transgender. What is the posterior model for the proportion of U.S. adults who are aware that they know someone who is transgender? Plot this posterior using plot_beta().
  3. What is the mean, mode, and variance of the posterior model?
  4. Describe in words how the prior Beta model compares to the posterior Beta model.
Exercise 3.15 (Summarizing the Beta-Binomial: Take I) Below is output from the summarize_beta_binomial() function. Write the corresponding input code.
      model alpha beta   mean   mode      var
1     prior     2    3 0.4000 0.3333 0.040000
2 posterior    11   24 0.3143 0.3030 0.005986
Exercise 3.16 (Summarizing the Beta-Binomial: Take II) Below is output from the summarize_beta_binomial() function. Write the corresponding input code.
      model alpha beta   mean   mode       var
1     prior     1    2 0.3333 0.0000 0.0555556
2 posterior   100    3 0.9709 0.9802 0.0002719
Exercise 3.17 (Plotting the Beta-Binomial: Take I) Below is output from the plot_beta_binomial() function.

  1. Describe the prior model in words.
  2. Does the data, as visualized by the likelihood, agree with the prior? Briefly explain your reasoning.
  3. Describe the posterior model in words.
  4. Does the posterior model more closely agree with the likelihood or the prior model? Briefly explain your reasoning.
  5. Provide the specific plot_beta_binomial() code you would use to produce a similar plot.
Exercise 3.18 (Plotting the Beta-Binomial: Take II) Below is output from the plot_beta_binomial() function.

  1. Describe the prior model in words.
  2. Does the data, as visualized by the likelihood, agree with the prior? Briefly explain your reasoning.
  3. Describe the posterior model in words.
  4. Does the posterior model more closely agree with the likelihood or the prior model? Briefly explain your reasoning.
  5. Provide the specific plot_beta_binomial() code you would use to produce a similar plot.
Exercise 3.19 (More Beta-Binomial)
  1. Patrick has a Beta(3,3) prior for \(\pi\), the probability that someone in their town attended a protest in June 2020. Their sample data of 40 people in their town found that 30 people attended a protest. Summarize Patrick’s analysis using the summarize_beta_binomial() and plot_beta_binomial() functions.
  2. Harold has the same prior as Patrick, Beta(3,3), but lives in a different town. In their sample, 15 out of 20 people attended a protest. Summarize Harold’s analysis using the summarize_beta_binomial() and plot_beta_binomial() functions.
  3. How do Patrick and Harold’s posterior models compare? Briefly explain what causes these similarities and differences.

  26. We follow convention by using “pdf” to distinguish the continuous \(f(\pi)\) here from a discrete “pmf.”↩︎

  27. The scaled likelihood function is calculated by \(L(\pi|y) / \int_0^1 L(\pi|y)d\pi\).↩︎

  28. Answers: 1. b; 2. c; 3. Beta(5,5)↩︎

  29. As an example, \(L((\pi = 0.6) | (y = 30)) = {50 \choose 30} 0.6^{30} 0.4^{20} \approx 0.115\), which matches what we see in the figure.↩︎

  30. Answers: 1. Beta(3,12); 2. Beta(12,3); 3. Beta(1,1) or, equivalently, Unif(0,1)↩︎

  31. Answers: 1. d; 2. a; 3. \(\pi^2\)↩︎

  32. http://popvssoda.com/↩︎

  33. https://news.gallup.com/poll/212702/lgbt-adults-married-sex-spouse.aspx?utm_source=alert&utm_medium=email&utm_content=morelink&utm_campaign=syndication↩︎

  34. Henry Louis Gates Jr. writes about civil rights pioneer Bayard Rustin here: https://www.pbs.org/wnet/african-americans-many-rivers-to-cross/history/100-amazing-facts/who-designed-the-march-on-washington/↩︎

  35. https://www.pewforum.org/2016/09/28/5-vast-majority-of-americans-know-someone-who-is-gay-fewer-know-someone-who-is-transgender/↩︎

  36. To learn about Sylvia Rivera, a gay and transgender rights activist: https://en.wikipedia.org/wiki/Sylvia_Rivera↩︎