Chapter 15 Install TensorFlow and greta

Figure 15.1: The greta logo. See https://greta-stats.org/ for more information on greta.

As of May 2020, there are version mismatches between greta 0.3.1, available via CRAN (i.e. install.packages("greta")), and more recent versions of TensorFlow (i.e. TensorFlow >= 1.15), so installing greta is not as simple as it once was. The instructions below will help you set up your computer environment to get greta running. While Python is required, these instructions do not assume any previous installation of Python; they only assume you have a recent version of R/RStudio (R version 3.4.x or higher) and a machine capable of running TensorFlow (see the sidenote in Section 15.1 below if you need to run greta in the cloud).

TensorFlow System Requirements:

  * Ubuntu 16.04 or later (64-bit)
  * macOS 10.12.6 (Sierra) or later (64-bit) (no GPU support)
  * Windows 7 or later (64-bit) (Python 3 only)
  * Raspbian 9.0 or later

15.1 Software Stack for Using greta

greta makes Bayesian inference scalable, yet elegantly simple. For scalability, it uses TensorFlow as its numerical computation engine. For simplicity, greta models are written in R, sparing us from learning an additional language. While this enables scalable Bayesian data analysis inside the R ecosystem, we do need to step outside of R/RStudio to install the Google TensorFlow software stack\(^{**}\).

** A software stack is a collection of programs, applications, components and tools that work together to get a result.

Figure 15.2: Two software stack possibilities. The stack on the left is a stable one. The stack on the right is what you would get installing more recent versions of each component; i.e. a falling stack that does not work.

Figure 15.2 shows two potential software stacks. Both include four key components:

  1. Python: TensorFlow is a Python library and hence needs a Python implementation installed on its host system in order to run. We will use Miniconda to get this.

  2. TensorFlow: This is the heart of being able to do fast numerical computing. Once Miniconda is installed, we will use conda commands from the Anaconda Prompt to get TensorFlow.

  3. TensorFlow Probability: An additional library built on top of TensorFlow specializing in probabilistic inference at scale. We will use a pip command to get this.

  4. greta: A very slick interface between R and TensorFlow. This interface gives us the power of TensorFlow without the complexity overhead of learning another language. We will install greta using R commands.

As I write this, only the stack on the left is serviceable. The stack on the right represents more recent software versions - which you might get by following the easiest default install instructions - but those versions do not work together. So before leveraging greta we will have to do some system configuration and installation to get everything working. I anticipate that this will be made easier in the near future, but for now, this is a robust install process that should work regardless of your system. That being said, you do need a 64-bit computer with about 10GB of free hard drive space.\(^{**}\)

** If you do not meet the minimum system requirements or decide to abandon the installation process detailed below, you can access greta capabilities using RStudio in the cloud. Simply navigate to https://rstudio.cloud/project/685122 and use the browser-based version of RStudio with greta and causact already installed - i.e. you can start with library(greta) and have no need to install anything beyond that.

15.2 The Super Short Installation Instructions

For more detailed instructions with some explanation, skip this section. If you want the super-short version, here it is:

  1. Install 64-bit Miniconda from https://conda.io/miniconda.html to get Python if you do not have an Anaconda installation already.
  2. Open Anaconda prompt (Terminal on Mac) and execute the following commands:
    1. conda create -n r-tensorflow python=3.7 tensorflow=1.14 pyyaml requests Pillow pip numpy=1.16
    2. conda activate r-tensorflow
    3. pip install tensorflow-probability==0.7
  3. Open R/RStudio and execute the following commands:
    1. install.packages("greta")
    2. library(greta)

If all of the above works without error, then you are ready for greta.

15.3 The More Detailed Installation Instructions

Explanations for all of the steps above, along with additional details, are given in the instructions below.

15.3.1 Installing Miniconda

Assuming you do not have conda capabilities already installed on your system, the best way to get an easily manageable Python distribution is to use Miniconda. Figure 15.3 shows the installation page available at https://conda.io/miniconda.html. Navigate to that page and download the Python 3.X 64-bit installer for your operating system. As I write this, the most recent version of Python is 3.8, but TensorFlow 1.14 does not play nicely with Python 3.8. So even though it might look like you are downloading an incompatible version of Python, switching Python versions is easily done post-installation: we will control the Python version that TensorFlow relies on later in this process.

Figure 15.3: The Miniconda download site.

Once downloaded, Windows users can double-click the file and follow the prompts - accept all defaults and install Miniconda. After successful install, you should have the Anaconda Prompt accessible via the Start Menu (Windows OS):

Figure 15.4: The Anaconda Prompt application will open a command prompt that can access the conda package manager for Python (see https://docs.conda.io/en/latest/miniconda.html for more information).

Mac users should complete the Miniconda installation with these steps:

  1. Open a Terminal window. If you don’t know how to do this, click Applications -> Utilities -> Terminal.
  2. Within Terminal, change directories into the folder where your Miniconda download was placed and execute the following (the name of the Miniconda installer may be different; look for the name in your downloads folder):
    1. cd ~/Downloads
    2. chmod +x Miniconda3-latest-MacOSX-x86_64.sh
    3. ./Miniconda3-latest-MacOSX-x86_64.sh
  3. Once the installation has begun, accept the license terms and the default installation location. Say yes when prompted whether or not the installer should prepend the Miniconda install location to your PATH.
  4. After installing Miniconda, close your current terminal and open another in order to activate the installation. You should have access to a new command within the terminal, namely the conda command.

Note: You just need to run chmod in order to make the installer script executable. Alternatively, right-clicking on the script and choosing Properties -> Permissions -> Allow executing file as program gives the same result as the chmod command in Terminal.

Note: Alternative Mac installation instructions may be found at https://docs.conda.io/en/latest/miniconda.html and a video of yet another alternative install process is at https://youtu.be/bbIG5d3bOmk (stop at 2:32 into the video after successfully executing conda help). Try the written steps of this chapter first before venturing into alternative installs.

If the steps ran successfully, verify that the command below can be run without error:

conda list
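
If conda list instead reports that the command cannot be found, the shell probably cannot see conda on its PATH. As a minimal sketch, assuming a POSIX-style shell (Terminal on Mac or Linux; the Windows Anaconda Prompt uses different syntax), the hypothetical helper require_cmd below reports whether a command is visible. It is not part of conda, just an illustration:

```shell
# Hypothetical helper (not part of conda): succeeds when the named
# command can be found on the current PATH, and explains itself when not.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 is not on your PATH" >&2
    return 1
  fi
}

# Example call (commented out so the helper can be pasted on its own):
# require_cmd conda
```

If conda is reported missing right after installation, closing and reopening Terminal, as described in the steps above, is the first thing to try.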

Regardless of operating system, once accessed, the Anaconda command screen is awaiting your input:

Figure 15.5: The Anaconda Prompt application will open a command prompt that can access the conda package manager for Python (see https://conda.io/miniconda.html for more information).

15.3.2 Create r-tensorflow Environment Using Conda

The first thing you will do is create a conda environment - a repository for a software stack where versions of various software packages are carefully maintained and coordinated (Conda does this coordination for you). Execute the following line at the Anaconda (Terminal) command line to create an environment named r-tensorflow that includes Python 3.7, tensorflow 1.14, a local version of pip (a supplemental Python package manager that we will use in addition to Conda because the version of tensorflow-probability we need is not available via Conda), and some other required Python packages:

conda create -n r-tensorflow python=3.7 tensorflow=1.14 pyyaml requests Pillow pip numpy=1.16

You will most likely be asked whether you wish to proceed; enter y and then press ENTER. This step creates the r-tensorflow environment where we will maintain our software stack.

15.3.3 Activating the r-tensorflow Environment

To make sure our subsequent package installations are done in the r-tensorflow environment (as opposed to some default global environment), we activate it:

conda activate r-tensorflow

After executing the command above, the command prompt will show the environment name (r-tensorflow) in parentheses:

Figure 15.6: Anaconda command prompt with the r-tensorflow environment activated.

15.3.4 Install tensorflow-probability

tensorflow-probability is built on top of TensorFlow. To get the right version of tensorflow-probability, we use the pip package manager because that version is not available through Conda at this time. Please note that Conda is the better package manager for installing tensorflow - it installs a faster implementation (see https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c). Hence, we use pip to install only tensorflow-probability, with the following line at the command prompt:

pip install tensorflow-probability==0.7

If you are asked whether you wish to proceed, enter y and then press ENTER (note: do this whenever prompted - I will not be explicit about this below). Verify that tensorflow-probability 0.7 is installed in the r-tensorflow environment by typing conda list and then pressing ENTER.

Figure 15.7: Anaconda command prompt with the r-tensorflow environment activated.
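
The check described above can also be scripted rather than read off the screen. The sketch below assumes a POSIX-style shell with awk available (Terminal on Mac or Linux; the Windows Anaconda Prompt would need different syntax). has_version is a hypothetical helper, not a conda command: it reads conda list output on standard input and succeeds only if the named package is listed with a version that starts with the given prefix.

```shell
# Hypothetical helper: reads `conda list`-style text on stdin and succeeds
# when package $1 is listed with a version beginning with prefix $2.
# The match is a prefix match, so the prefix 0.7 also accepts 0.7.0.
has_version() {
  awk -v pkg="$1" -v ver="$2" '
    $1 == pkg && index($2, ver) == 1 { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example calls with the pins used in this chapter (commented out because
# they need conda and the r-tensorflow environment to exist):
# conda list -n r-tensorflow | has_version tensorflow 1.14 && echo "tensorflow ok"
# conda list -n r-tensorflow | has_version tensorflow-probability 0.7 && echo "tfp ok"
```

A missing "ok" line points to the package whose version differs from the stable stack of Figure 15.2.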

You can close the Anaconda command prompt - we will not use it again as our TensorFlow environment is set up and complete. Next, we open RStudio to install greta.

15.3.5 Install greta

From within RStudio, execute the following lines:

install.packages("tidyverse")
install.packages("remotes")
remotes::install_github("flyaflya/causact") # update causact package used in book
install.packages("greta")
library(causact)
library(greta)

The first two lines install some prerequisites for the causact and greta packages. The third line installs the causact package and the fourth line installs the greta package. These four lines do not need to be run again. The last two lines make package functionality available within your R session, and as long as you do NOT see any error messages related to tensorflow or tensorflow-probability, you are ready for Bayesian inference made possible by greta and causact. Messages regarding masked objects can be safely ignored.

15.4 Testing the Install

Just to make sure everything works, run the following within R:

library(greta)
library(causact)
library(tidyverse)
## simulate Bernoulli data with 72% prob of success
y <- rbern(n = 100, prob = 0.72) #data

## create prior for theta - prob of success
theta <- uniform(0, 1)  #prior

## specify likelihood of data
distribution(y) <- bernoulli(prob = theta)  #likelihood

## create model - list parameters of interest
m <- model(theta)

## get representative sample of posterior distribution
draws <- mcmc(m)
## the line below requires the bayesplot package; if you get an
## error running it, run install.packages("bayesplot") first
draws %>% as.matrix() %>% as_tibble() %>% dagp_plot()
## if you see a plot of theta that looks like
## a hill or mountain, you are good to go.

If it runs, congratulations - you can now harness the power of greta and causact!!

15.5 Troubleshooting Tips

15.5.1 Finding The Right Python Environment

Sometimes RStudio can’t seem to find your r-tensorflow environment. You will get messages indicating that tensorflow is not installed, that it is finding the wrong version of tensorflow, or that tensorflow-probability is missing or the wrong version. Assuming your conda list command from above showed all the right versions, the problem is that RStudio is not looking at the r-tensorflow environment to run Python, TensorFlow, and all the other good stuff we worked so hard to install. To resolve this issue, at least temporarily, restart your R session (Session -> Restart R) and run the following prior to running library(greta):

reticulate::use_condaenv("r-tensorflow")

If you can then load greta without complaints about tensorflow, tensorflow-probability, and Python, you have solved your issue temporarily. You can use reticulate::use_condaenv("r-tensorflow") at the top of every script to get things working right. However, every time you restart R, it will forget that this is the Python environment you want to use.

Given that adding this one line to every script is tedious and easy to forget, we can set things up so that this line is run by default every time you start R:

  1. Open .Rprofile file from within RStudio: file.edit(file.path("~", ".Rprofile"))
  2. Add this command to the .Rprofile file: reticulate::use_condaenv("r-tensorflow")
  3. Save and close the .Rprofile file.

Every time you restart R, this .Rprofile file is run, including the added command, which ensures you are using the right Python environment.
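
For readers who prefer the command line to the RStudio editor, the three steps above amount to appending one line to a file. A minimal sketch, assuming a POSIX-style shell (Terminal on Mac or Linux) and that your .Rprofile lives in your home directory; append_once is a hypothetical helper that avoids adding the line a second time if you run it again:

```shell
# Hypothetical helper: append line $2 to file $1 unless an identical
# line is already present (so repeated runs do not duplicate it).
# Assumes the file, if it already exists, ends with a newline.
append_once() {
  file=$1
  line=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example call (commented out because it edits a real file):
# append_once "$HOME/.Rprofile" 'reticulate::use_condaenv("r-tensorflow")'
```

Checking for the line before appending keeps the .Rprofile tidy even if the command is run several times while troubleshooting.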

15.5.2 Finding The Right Python Environment In Even Tougher Cases

Sometimes, you are forced to be explicit about the path to your r-tensorflow environment. Find the path by running this command at the Anaconda Prompt (or Terminal for Mac users):

conda env list

You should discover your r-tensorflow environment is listed and its path is shown:

Figure 15.8: List of all conda environments and their paths.
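
If the list is long, the path can also be picked out programmatically. The sketch below assumes a POSIX-style shell with awk (Terminal on Mac or Linux) and an environment path without spaces; env_path is a hypothetical helper, not a conda command, that reads conda env list output on standard input and prints the path of the named environment:

```shell
# Hypothetical helper: reads `conda env list`-style text on stdin and prints
# the path (last field) of the environment named $1; fails if it is absent.
# Assumes the path contains no spaces.
env_path() {
  awk -v name="$1" '
    $1 == name { print $NF; found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example call (commented out because it needs conda on your PATH):
# conda env list | env_path r-tensorflow
```

Taking the last field rather than the second means the asterisk that marks the currently active environment does not get in the way.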

Take the file path you see which ends with r-tensorflow (DON’T USE MINE; IT IS SHOWN BELOW FOR DEMONSTRATION ONLY), and run the following command within RStudio while replacing any backslashes in the path with forward slashes (backslashes inside of a text or character string are interpreted as escape characters - they are used to signal something special that is not simple text):

Sys.setenv(RETICULATE_PYTHON = "C:/Users/ajf/AppData/Local/Continuum/miniconda3/envs/r-tensorflow")

Note: Sometimes the actual Python executable is in …/envs/r-tensorflow/bin as opposed to just …/envs/r-tensorflow. Confirm on your system that the folder you use here does indeed contain a Python executable file.
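
One way to carry out that confirmation from the command line is sketched below, assuming a POSIX-style shell (Terminal on Mac or Linux, or a bash-like shell on Windows). find_env_python is a hypothetical helper, and the three candidate locations it tries are assumptions based on the note above rather than an exhaustive list:

```shell
# Hypothetical helper: print the first Python executable found inside the
# environment folder $1, trying bin/python, python, and python.exe in turn.
# Returns non-zero when none of the candidates exists.
find_env_python() {
  for candidate in "$1/bin/python" "$1/python" "$1/python.exe"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no Python executable found under $1" >&2
  return 1
}

# Example call; substitute the path reported by `conda env list`:
# find_env_python "/path/to/your/envs/r-tensorflow"
```

If the printed file sits in a bin subfolder, that is the case the note above describes; if nothing is found, the path itself is probably mistyped.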

This will temporarily tell R to use the Python version and accompanying packages of the r-tensorflow environment. If it works, you can have this line run every time you restart R with the following steps:

  1. Open .Rprofile file from within RStudio: file.edit(file.path("~", ".Rprofile"))
  2. Add this command to the file: Sys.setenv(RETICULATE_PYTHON = "<PATH FROM ABOVE THAT WORKED>") (replacing the path text as appropriate for your system)
  3. Save and close the .Rprofile file.

Restart your R session one more time (Session -> Restart R) and test the install. More hints and help can be found using the greta discussion forum at https://forum.greta-stats.org/ .