The previous two chapters covered Bayes rule and generative DAGs. This chapter has you install necessary software:
tensorflow to computationally automate Bayes rule,
causact to visually depict pretty generative DAGs, and
greta to apply Bayes rule to generative DAGs informed by data stored in data frames.
Figure 15.1: The greta logo. See https://greta-stats.org/ for more information on greta.
The instructions below will help you set up your computer environment to get greta, along with the causact package, running smoothly. While Python is required, these instructions do not assume any previous installation of Python; they assume only that you have a recent version of R/RStudio (R version 4.1.X or higher) and a machine capable of running TensorFlow. If your machine does not meet the system requirements in the margin, proceed to the last section of this chapter to install greta using RStudio Cloud.
greta & TensorFlow System Requirements:
* Ubuntu 16.04 or later (64-bit)
* macOS 10.12.6 (Sierra) or later (64-bit) (no GPU support)
* macOS - Intel chips only - M1 ARM-based chips not supported yet
* Windows 7 or later (64-bit) (Python 3 only)
* Raspbian 9.0 or later
See RStudio Cloud install instructions below if requirements not met.
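Before installing anything, a quick pre-flight check can save some frustration. The snippet below is a minimal sketch using base R only; the variable names (`preflight` and friends) are my own, and the Apple-silicon test is an assumption that only detects a native ARM build of R, not an Intel build running under emulation. It does not check the operating system version or GPU support.

```r
## PRE-FLIGHT CHECK (a sketch using base R only)
## R version 4.1.X or higher
r_version_ok <- getRversion() >= "4.1.0"

## 64-bit R uses 8-byte pointers
is_64bit <- .Machine$sizeof.pointer == 8L

## Assumption: a native ARM build of R on macOS reports
## sysname "Darwin" and arch "aarch64"
os_name <- Sys.info()[["sysname"]]
arch <- R.version$arch
apple_silicon <- os_name == "Darwin" && arch == "aarch64"

preflight <- c(r_version_ok = r_version_ok,
               is_64bit = is_64bit,
               not_apple_silicon = !apple_silicon)
print(preflight)

if (all(preflight)) {
  message("Basic checks passed; a local install is worth trying.")
} else {
  message("A check failed; consider the RStudio Cloud install.")
}
```

If any entry prints FALSE, the RStudio Cloud route at the end of this chapter is likely the easier path.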
Before we get to the detailed installation instructions, it is instructive to understand why a software stack is needed in the first place. In our case, we use causact to visually translate business narratives into statistical models. To perform inference on any model, causact relies on greta, which makes scalable Bayesian inference available to R users without learning a separate coding language (e.g. TensorFlow, Stan, etc.). Behind the scenes, however, greta relies on the Python-based TensorFlow library for its numerical computation engine.
To get the TensorFlow software stack\(^{**}\) working properly, we need to make sure all component parts are compatible.
** A software stack is a collection of programs, applications, components and tools that work together to get a result.
Figure 15.2: Two software stack possibilities. The stack on the left is a stable one. The stack on the right is what you would get installing more recent versions of each component; i.e. a falling stack that does not work.
Figure 15.2 shows two potential software stacks. Both include four key components:
Python: TensorFlow is a Python library and hence needs a Python implementation installed on its host system in order to run. We will use miniconda, as needed, to get this.
TensorFlow: This is the heart of being able to do fast numerical computing. Once miniconda is installed, we will configure a conda environment to get the version of TensorFlow and other Python libraries required by greta.
TensorFlow Probability: An additional Python library built on top of TensorFlow specializing in probabilistic inference at scale.
greta: A very slick interface between R and TensorFlow. This interface gives us the power of TensorFlow without the complexity and cognitive overhead of learning another language.
** If you do not meet the minimum system requirements or decide to abandon the installation process detailed below, you can access greta capabilities using RStudio in the cloud. Simply navigate to https://rstudio.cloud/, create an account, and use the cloud install instructions for greta and causact at the end of this chapter.
As I write this, only the stack on the left is serviceable. The stack on the right represents more recent software versions, which you might get by following the easiest default install instructions, but they do not play well with one another. Simply stated, these other versions will not work together. So before leveraging greta we will have to do some system configuration. The next section is an R script that does this configuration for you. If issues arise installing greta, you can also seek help on https://forum.greta-stats.org/. Please note that you need a 64-bit computer with about 10GB of free hard drive space.\(^{**}\)
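To make the idea of a falling stack concrete, here is a small sketch of what "compatible" means in practice: each component must match a pinned version. The pins below are the ones used in the RStudio Cloud script later in this chapter, and I am assuming they describe the stable stack. The `found` versions are hypothetical, made up to stand in for a default install, and `matches_pin()` is my own helper, not part of greta.

```r
## SKETCH: compare found versions against pinned versions
## Pins taken from the RStudio Cloud script in this chapter
## (assumed here to describe the stable stack)
required <- c(tensorflow = "1.14.0",
              tensorflow_probability = "0.7.0",
              numpy = "1.16")

## HYPOTHETICAL versions, for illustration only
found <- c(tensorflow = "2.9.1",
           tensorflow_probability = "0.17.0",
           numpy = "1.23.1")

## A pin like "1.16" matches any 1.16.x release:
## compare only as many components as the pin specifies
matches_pin <- function(found_version, pin) {
  pin_parts <- as.integer(strsplit(pin, ".", fixed = TRUE)[[1]])
  found_parts <- as.integer(strsplit(found_version, ".", fixed = TRUE)[[1]])
  length(found_parts) >= length(pin_parts) &&
    all(found_parts[seq_along(pin_parts)] == pin_parts)
}

compatible <- mapply(matches_pin, found[names(required)], required)
print(compatible)

if (!all(compatible)) {
  message("Mismatched components: ",
          paste(names(compatible)[!compatible], collapse = ", "))
}
```

A single mismatch is enough to topple the stack, which is why the installation script pins every component rather than taking whatever is newest.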
This script installs greta and causact on your local machine (see the “RStudio Cloud Install” section for cloud installations). The script allows you to complete the installation process without ever leaving RStudio. Try it by running each line one at a time and awaiting the system’s response before continuing.
## INSTALLATION SCRIPT TO GET GRETA, CAUSACT,
## and TENSORFLOW ALL WORKING TOGETHER HAPPILY
## NOTE: Run each line one at a time using CTRL+ENTER.
## Await completion of one line
## before running the next.
## If prompted to "Restart R", say YES.
#### STEP 0: Restart R in a Clean Session
#### use RStudio menu: SESSION -> RESTART R
#### STEP 1: INSTALL R PACKAGES
install.packages("greta")
install.packages("causact")
#### STEP 2: INSTALL PYTHON DEPENDENCIES IN FINDABLE SPOT
greta::install_greta_deps()
## if asked to install Miniconda, please type "Y"
## and hit <ENTER> in the Console
## this can take up to 10 minutes
#### STEP 3: TEST THE INSTALLATION - must restart r first
## **** USE MENU: SESSION -> RESTART R
library(greta) ## should work without error if you restarted R..
library(causact)
graph = dag_create() %>%
  dag_node("Normal RV",
           rhs = normal(0,10))
graph %>% dag_render() ## see oval
drawsDF = graph %>% dag_greta() ## see "running..."
drawsDF %>% dagp_plot(densityPlot = TRUE) ## see plot
#### CONGRATS IF IT WORKS.
If the above script produced a plot in the last line - CONGRATS!!
RStudio Cloud (https://rstudio.cloud/) allows anyone with access to an internet browser to use R and RStudio. After setting up your account, you can get a working environment with the greta, causact, and tidyverse packages installed by running the code below. If asked to install Miniconda, respond yes by typing a y in the console.
RStudio Cloud is useful for those with Chromebooks or computers that seem underpowered for modern analytics. If you have a laptop that can handle it, then I recommend sticking with your locally installed RStudio.
### SETUP AN RSTUDIO CLOUD ACCOUNT
### AT https://rstudio.cloud/, THEN USE
### THIS INSTALL SCRIPT FOR INSTALLING
### CAUSACT,GRETA,TENSORFLOW ON RSTUDIO CLOUD
## Get R packages
install.packages("remotes")
install.packages("reticulate")
## Install older version as v2.7 had breaking changes
remotes::install_version("tensorflow", version = "2.6.0",
                         repos = "http://cran.us.r-project.org")
## INSTALL PYTHON TENSORFLOW ENVIRONMENT
## If prompted to install Miniconda
## enter Y in console and then hit <ENTER>
tensorflow::install_tensorflow(
  version = "1.14.0",
extra_packages =
c("tensorflow-probability==0.7.0",
"numpy==1.16",
"pyyaml", "requests",
"Pillow", "pip"))
## Get R packages
install.packages("greta")
install.packages("tidyverse")
install.packages("causact")
### IF NO ERRORS, THEN TRY BELOW TEST SCRIPT
## TEST SCRIPT
library(greta) ## should work now
library(causact)
library(tidyverse)
graph = dag_create() %>%
  dag_node("Normal RV","x",
           rhs = normal(0,10))
graph %>% dag_render() ## see oval - ignore warning on RStudio Cloud
drawsDF = graph %>% dag_greta() ## observe "running X chains ..."
drawsDF %>% ggplot() + geom_density(aes(x=x), fill = "darkgreen") ##see plot
## if NO ERRORS (warnings are okay), then installation is a success
The install instructions for this section should only be run once in your cloud account environment.
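If you are unsure whether you have already run the install script in a given cloud project, a quick look at what is installed can settle it. This is a minimal sketch using base R only; `is_installed()` and `missing_pkgs` are names I made up for illustration. It checks only the R packages named in the script above and says nothing about the Python side of the stack.

```r
## SKETCH: has the cloud install script already been run?
## R packages named in the cloud install script
cloud_pkgs <- c("remotes", "reticulate", "tensorflow",
                "greta", "tidyverse", "causact")

## system.file() returns "" when a package is not installed
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

missing_pkgs <- cloud_pkgs[!vapply(cloud_pkgs, is_installed, logical(1))]

if (length(missing_pkgs) == 0) {
  message("All R packages found; no need to rerun the install script.")
} else {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```

When nothing is reported missing, skip straight to the test script.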