
The beauty of the R ecosystem is that we do not have to build everything from scratch ourselves. We can leverage the work of others by using their datasets or their functions. These datasets and functions are distributed in `packages`, which are usually collections of related functions and/or datasets. Just like R and RStudio, packages are made freely available.
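To see that a package really is a bundle of functions and datasets, we can look inside two packages that ship with R itself, `stats` and `datasets`, so nothing needs to be installed. This is a small sketch; `getNamespaceExports()` simply lists the objects a package makes available:

```r
# `stats` and `datasets` come bundled with every R installation
stats_functions <- getNamespaceExports("stats")  # names of exported objects
length(stats_functions)        # hundreds of functions in a single package
head(sort(stats_functions))    # a few of them, alphabetically

# `datasets` holds ready-to-use data, such as the mtcars data frame
dim(datasets::mtcars)          # rows and columns of mtcars
```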

If you are interested in how and why software is free, check out the Wikipedia page on free and open-source software (FOSS): https://en.wikipedia.org/wiki/Free_and_open-source_software.

For example, let’s say we wanted R to make a sound. With some googling, you might discover that the `beep()` function exists in the `beepr` package. Note that if you type `beep()` `<ENTER>` into the console without installing the `beepr` package, you will get the following error message:

`Error in beep() : could not find function "beep"`

Go ahead and try it; please do not fear error messages.

Figure 5.1: Installing packages is analogous to buying a toolbox of power tools. You only have to buy the toolbox once; then you can use any of its tools by taking the toolbox out. Likewise, you only have to install a package once on your computer; after that, you will use the `library()` function to take it out whenever you want it.

Okay, so we need to get the package. Here is a code alternative to using RStudio’s user interface to get a package:
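For the `beepr` package, that command is a single call to `install.packages()`. The sketch below also passes `repos`, the address of a CRAN mirror (the cloud mirror used here is an assumption; any mirror works), so it also runs in a plain R script. Inside RStudio a mirror is normally already configured, and `install.packages("beepr")` alone is enough:

```r
# Run once per computer (needs an internet connection):
# downloads beepr from CRAN and installs it into your R library
install.packages("beepr", repos = "https://cloud.r-project.org")
```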

At this point, if you type `beep()` `<ENTER>` into the console, you will still get the same error message. This might seem strange, but let me introduce an analogy that might help.

As depicted in Figure 5.1, installing packages is analogous to buying a toolbox filled with tools. Note that *buying* a toolbox is not the same as *using* the tools in the toolbox. Once you’ve bought the toolbox, to use a tool inside of it, you first retrieve the toolbox from your basement/shed/garage/etc. Similarly, in R, `install.packages()` gets you a toolbox of functions that you now own; you only need to do this once per computer. To use a function, first retrieve your toolbox using `library(packageName)`; this makes the package’s functions available during your current session. The `library()` command needs to be rerun any time you restart R.
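The difference between owning a toolbox and taking it out can be checked directly. The sketch below uses `tools`, a package that ships with R (so it is already installed), and `search()`, which lists the packages currently attached to your session; `toTitleCase()` is just one function from `tools` picked for illustration:

```r
# Owning the toolbox: is `tools` installed on this computer?
owned <- "tools" %in% rownames(installed.packages())
print(owned)

# Taking it out: is `tools` attached to this session yet?
attached_before <- "package:tools" %in% search()
print(attached_before)   # typically FALSE in a fresh R session

# library() attaches the package for the rest of this session
library(tools)
attached_after <- "package:tools" %in% search()
print(attached_after)

# Its functions can now be called by name
toTitleCase("the grammar of graphics")
```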

Here is an example of this workflow, which uses two commands:

- `library(beepr)`: Take out the `beepr` toolbox (which we acquired earlier).
- `beep()`: Make a sound (assuming your computer volume is audible).

The code below executes these two commands:

```
# run the library() command in every R session
# where the beep() function will be used
library(beepr) # take out the toolbox
beep() # use the tool you want
```

Now, if you are not at work, in a physical library, or in another quiet place, have some fun playing different sounds with `beep()`. Note that you only need to call `library()` for a specific package once per session.
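A few calls to try, assuming `beepr` is installed. `beep()` takes a `sound` argument that accepts either a number or the name of one of the package’s built-in sounds; see `?beep` for the full list:

```r
library(beepr)  # once per session

beep(sound = 1)        # the default sound
beep(sound = 3)        # a different built-in sound, chosen by number
beep(sound = "mario")  # or chosen by name
```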

You might think `beepr` is a strange, useless package. However, sometimes you will run code that takes a few seconds, minutes, or even days. It helps to play a noise to alert yourself when the script has finished.
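For example, placing `beep()` on the last line of a slow script plays a sound once every line above it has run. In this sketch, simulating a million random draws stands in for genuinely slow work:

```r
library(beepr)

# Stand-in for a long-running job: simulate and summarize 1e6 normal draws
draws <- rnorm(1e6)
result <- mean(draws)

beep()  # runs only after the lines above have finished
```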

Two of the packages we will rely on throughout this book are the `tidyverse` package and the `causact` package. The `tidyverse` package is actually a collection of packages, including `dplyr` for data manipulation and `ggplot2` for data visualization, both of which we will use. The `causact` package provides access to the datasets used in this text and, more importantly, enables us to investigate our models of business processes, issues, and decisions.

The packages installed as part of the tidyverse include `ggplot2`, `dplyr`, `tidyr`, `readr`, `purrr`, `tibble`, `stringr`, `forcats`, `readxl`, and `lubridate`. For more information, see https://www.tidyverse.org/packages/.

To get these packages, execute the following lines from within RStudio (put your cursor in the console to answer any prompts during installation):
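The two installation lines are sketched below. The `repos` argument names a CRAN mirror (the cloud mirror is an assumption; any mirror works) so the lines also run in a plain R script; RStudio normally has a mirror configured already, in which case `install.packages("tidyverse")` and `install.packages("causact")` are enough:

```r
# Run once per computer (needs an internet connection)
install.packages("tidyverse", repos = "https://cloud.r-project.org")
install.packages("causact", repos = "https://cloud.r-project.org")
```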

The first line installs the `tidyverse` collection of packages. The second line installs the `causact` package.
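Once installed, `library(tidyverse)` takes out the core tidyverse toolboxes (such as `dplyr` and `ggplot2`) in one step, and `tidyverse_packages()` lists every package that came with the collection. Members outside the core, such as `readxl`, are installed but still need their own `library()` call. A quick sketch:

```r
library(tidyverse)  # attaches dplyr, ggplot2, and the other core packages

# Every package that installing the tidyverse brought along
all_tidy <- tidyverse_packages()
print(all_tidy)

# Non-core members are installed but not attached automatically
library(readxl)
```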

Occasionally, you will want to install the development version of `causact` because it has a bug fix or a new feature that you need. To do so, first run `install.packages("remotes")` and `library(remotes)`. Then run `install_github("flyaflya/causact")` to download a more up-to-date version of the package that is not yet available via CRAN, the standard R package repository. For most use cases, use the CRAN version of a package first, as it has been more thoroughly tested.
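Put together, the development-version workflow described above looks like the sketch below. `packageVersion()` is an extra step added here to confirm which version ended up installed; the exact version string depends on the current state of the GitHub repository:

```r
# One-time setup: remotes can install packages straight from GitHub
install.packages("remotes", repos = "https://cloud.r-project.org")
library(remotes)

# Install the development version of causact from the flyaflya/causact repository
install_github("flyaflya/causact")

# Confirm which version of causact is now installed
packageVersion("causact")
```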

More information on R packages can be found at DataCamp’s “R Packages: A Beginner’s Guide”: https://www.datacamp.com/community/tutorials/r-packages-guide.