- 1 Welcome
- 2 Becoming a Data-Driven Business Analyst
- 3 The Computing Environment
- 4 R: Basic Usage
- 5 R Packages: causact, tidyverse, etc.
- 6 dplyr: Manipulating Data Frames
- 7 dplyr: Data Manipulation For Insight
- 8 ggplot2: Data Visualization Using The Grammar of Graphics
- 9 ggplot2: The Four Stages of Visualization
- 10 Representing Uncertainty
- 11 Joint Distributions Tell You Everything
- 12 Graphical Models Tell Joint Distribution Stories
- 13 Bayesian Inference On Graphical Models
- 14 Generative DAGs As Prior Joint Distributions
- 15 Install Tensorflow, greta, and causact
- 16 greta: Bayesian Updating And Probabilistic Statements About Posteriors
- 17 causact: Quick Inference With Generative DAGs
- 18 The beta Distribution
- 19 Parameter Estimation
- 20 Posterior Predictive Checks
- 21 Decision Making
- 22 A Simple Linear Model
- 23 Linear Predictors and Inverse Link Functions
- 24 Multi-Level Modelling
- 25 Compelling Decisions and Actions Under Uncertainty
- 26 Your Journey Continues

In the *Bayesian Inference* chapter, we calculated (by hand) a posterior distribution using a single data point on whether the one tested Chili’s store increased sales. In this original Chili’s story, we limited ourselves to just two possible models of success:

- The Pessimist Model: \(\theta = 20\)%
- The Optimist Model: \(\theta = 70\)%

We subsequently represented the Chili’s narrative using the generative DAG of Figure 16.1, but we never did get a posterior distribution for this model. In this chapter, we use the computer to remedy this issue.

Typically, you would go to https://greta-stats.org/ for more information about installing `greta`

along with other necessary software like `TensorFlow`

and `Diagrammer`

. Currently though, as of January 2020, there are some issues with that install process. Refer to the install greta chapter at http://causact.com/ for the most up-to-date version of this chapter.

To manually use Bayes rule to calculate the posterior distribution for the infinite possible values of \(\theta\) is impossible. That would simply take too long. Instead, we will rely on the power of the R eco-system to deliver us a representative sample from the joint posterior distribution. In fact, the software we will use allows us tremendous flexibility; seemingly accomodating any generative DAG consisting of the most commonly used probability distributions. In the next chapter, we will explore commonly used probability distributions and their roles as priors in simple models. In other words, we as analysts supply a generative DAG (our model and prior) and some data. Then, the computer spits out the posterior - using Bayes rule behind the scenes to properly update our prior assumptions to place more plausibility on model parameters that are consistent with the observed data.

To get the computer to do our work, The greta package was developed by Nick Golding who maintains the https://greta-stats.org/ website where you can find further documentation. we need to translate our data and generative DAG into a computational model (i.e. a computer-world model that calculates posterior distributions). The first way will learn to do this is using the intuitive and fantastic `greta`

package within R. Here are the six steps needed to have `greta`

yield a posterior distribution:

For **Step 2**, the following probability distribution names are recognized by `greta`

: `uniform, normal, lognormal, bernoulli, binomial, beta_binomial, negative_binomial, hypergeometric, poisson, gamma, inverse_gamma, weibull,exponential, pareto, student, laplace, beta, cauchy, chi_squared, logistic, f, multivariate_normal, multinomial, categorical, dirichlet, dirichlet_multinomial, wishart, lkj_correlation.`

Many of these will be introduced in subsequent chapters and all others can be found on Wikipedia. Do not let the fancy names scare you, they are just ways of compactly representing uncertainty.

**Step 1 - Tell greta about your observed random variables (i.e. data) using as_data()**

```
library(greta)
## assume first two stores are a success and
## the third store is not (i.e. make up some data)
storeData = c(1,1,0)
## tell greta about your observed data X
## recall that X represents sales increase data)
x = as_data(storeData)
## you can safely ignore the plate for observed data
## R automatically knows how many observed realizations
## are in the data because of the storeData vector length
```

**Step 2 - Tell greta about your prior for unobserved random variables**

```
## tell greta about what values of theta seem
## plausible prior to observing the data
theta = uniform(0,1)
## note: greta requires you use the assigment operator (=)
## the ~ of the DAG is inferred because uniform is the name
## of a probability distribution
```

**Step 3 - Tell greta how observed RV’s from step 1 are generated using unobserved RV’s from step 2**

```
## use the distribution() function to say how the distribution
## of an observed random variable might depend on an unobserved
## random variable. This represents the likelihood in Bayesian inference.
distribution(x) = bernoulli(theta)
```

**Step 4 - Define an R object that stores a complete greta model**

```
# use the model() function to tell greta which random variables
# you want to analyze in the posterior distribution
fit = model(theta)
# by convention, the object name fit
# is often used to store model definitions
```

**Step 5 - Verify the greta model looks okay**

where the following legend is used to understand `greta`

’s graphical model.

Take a moment to notice that `greta`

’s graphical model (Figure 16.3) is just a restating of the generative DAG in Figure 16.1. While I personally love `greta`

’s graphical model, the generative DAG that we have learned so far (e.g. Figure 16.1) is better for helping business stakeholders. The `greta`

graphical model is simply too confusing for those not trained to read them; for us though, it is a pretty awesome check that our computational model matches our generative DAG!

**Step 6 - Create a representative sample of the joint posterior distribution**

```
## this function might take some time
## it runs for about 20 seconds on my machine
## it stores the posterior distribution
## in an object called draws
draws = mcmc(fit)
```

This completes the updating from prior to posterior. Assuming all ran without error, we can now query our representative sample of the posterior distribution. This sample is often referred to simply as the *posterior distribution*.

When investigating the posterior distribution, please note that your representative sample may differ slightly from the sample shown here. Due to some inherent randomness in the way the posterior distribution is generated slight differences in our results are expected.

The posterior distribution is buried in the object we named `draws`

. Think of a *draw* as an individual sample or realization of the posterior distribution; hence, *draws* is a collection of realizations that collectively serve as our representative sample of the posterior distribution. At this point, please realize the posterior distribution is only expressed as a representative sample (revisit the Representing Uncertainty chapter for a comparison of the different ways of expressing uncertainty) - that is the best we can do for all but a few special cases. Also note that unlike *draw* and *draws* which are singular and plural, respectively - the word *sample* can refer to either the collection of draws or just one single draw of the joint posterior distribution.

To see a representative sample of the posterior distribution, we access the R object, `draws`

, created in step 6 above. Ignore the details of why the `draws`

object is a `list`

instead of a `data frame`

and simply use the `as.matrix()`

and `as_tibble()`

functions from the `tidyverse`

package to return a data frame:

```
library(tidyverse)
# extract samples as data frame
drawsDF = draws %>% as.matrix() %>% as_tibble()
drawsDF
```

```
## # A tibble: 4,000 x 1
## theta
## <dbl>
## 1 0.479
## 2 0.601
## 3 0.827
## 4 0.533
## 5 0.728
## 6 0.457
## 7 0.288
## 8 0.612
## 9 0.708
## 10 0.763
## # ... with 3,990 more rows
```

The `theta`

column of `drawsDF`

contains our representative sample of the posterior distribution. In this case, that representative sample includes 4,000 samples of `theta`

. Recall from Figure 16.1 that our initial generative model assumed `theta`

, the probability of success, was uniformly distributed between 0 and 1. Now, after observing 2 successes and 1 failure (see Step 1 above), we should expect our plausibility to favor \(\theta\) values closer to \(2/3\). Let’s visualize the plausibility using a histogram.

To get a refresher on histograms and density curves, check out this constructive video from Khan Academy: https://youtu.be/PUvUQMQ7xQk.

Using `ggplot`

, we can visualize how plausibility was reallocated from our uniform prior of \(\theta\) in light of the observed data; in other words, we can see the posterior distribution for \(\theta\) after observing 2 out of 3 stores increase sales:

```
## make histogram with bins spaced every 0.025
library(tidyverse)
ggplot(drawsDF, aes(x = theta)) +
geom_histogram(breaks = seq(0,1,0.025))
```

Notice how low values of \(\theta\) are represented less frequently than before and values from \(\theta = 50\%\) to \(\theta = 75\%\) seem to be represented more frequently. Given our training in the tidyverse, we can create a more effective visual; a comparison of prior and posterior (remember to google any functions/packages/etc. you are not familiar with):

```
library(tidyverse)
# named vector mapping colors to labels
colors = c("Uniform Prior" = "aliceblue",
"Posterior" = "cadetblue")
# use geom_density to create smoothed histogram
ggplot(drawsDF, aes(x = theta)) +
geom_area(stat = "function", fun = dunif,
alpha = 0.5, color = "black",
aes(fill = "Uniform Prior"))+
geom_density(aes(fill = "Posterior"),
alpha = 0.5) +
scale_fill_manual(values = colors) +
labs(y = "Probability Density",
fill = "Distribution Type")
```

Some takeaways from this graph regarding plausibility reallocation:

- Low values of \(\theta (\approx < 30\%)\) are now deemed less plausible; two out of three successes is simply inconsistent with low values of \(\theta\).
- Very high values of \(\theta (\approx > 90\%)\) are less plausbile; one failure in three tries is not likely to occur if \(\theta\) were to be super-high.
- Our best guess at \(\theta\) went from 0.5 (i.e. the mean of theta when uniformly distributed) to
`mean(drawsDF$theta) =`

0.598.

- We are still very unsure of \(\theta\) after only 3 stores. We can see this by looking at quantiles of the posterior distribution suggesting a 90% percentile interval from 0.241 to 0.899:

See here for more details about interpreting a quantile function: https://en.wikipedia.org/wiki/Quantile_function.

```
## 5% 50% 95%
## 0.2408417 0.6158816 0.8993699
```

This continued large band of uncertainty is a good thing. We only have three data points, we should not be very confident in \(\theta\). If we want a tighter interval of uncertainty, then we simply need to get more data. Exploring more data, different models, and other priors are all coming in subsequent chapters.

With a representative sample of the posterior joint distribution available to us, namely `drawsDF`

, we can expand on the strategies of the “Joint Distributions Tell Us Everything” chapter to make probabilistic statements from representative samples. For example, we might be curious to know \(P(\theta > 50\%)\). Using indicator functions, we can get our estimate\(^{**}\): ** Representative samples from posterior distributions will vary each time you run the code. Hence, your answer might be slightly different than the one shown here.

```
## estimate that theta > 50 percent
drawsDF %>%
mutate(indicatorFlag =
ifelse(theta > 0.5, TRUE, FALSE)) %>%
summarize(pctOfThetaVals = mean(indicatorFlag))
```

```
## # A tibble: 1 x 1
## pctOfThetaVals
## <dbl>
## 1 0.681
```

We would then state that “the probability that \(\theta\) is greater than \(50\%\) is approximately 68%”.

Similarly, we can answer more complicated queries. For example, what is the \(P(40\% < \theta < 60\%)\):

```
## estimate that 40% < theta < 60%
drawsDF %>%
mutate(indicatorFlag =
ifelse(theta > 0.4 & theta < 0.6,
TRUE, FALSE)) %>%
summarize(pctOfThetaVals = mean(indicatorFlag))
```

```
## # A tibble: 1 x 1
## pctOfThetaVals
## <dbl>
## 1 0.292
```

**The power of this workflow cannot be overstated** and you will use it frequently for making probabilistic statements. Statements that will come in handy when it is time to use data to inform decision making under uncertainty.

Make sure to check out the website: https://greta-stats.org/articles/get_started.html for more detailed information on using `greta`

.

If you are interested in understanding how the representative sample is computed, see Michael Betancourt’s excelllent videos on Scaleable Bayesian Inference with Hamiltonian Monte Carlo (HMC). HMC is the computational technique used by greta/tensorflow when computing the representative sample from the posterior distribution. The videos are available at https://betanalpha.github.io/speaking/.