Chapter 5 R Packages: causact, tidyverse, etc.

The beauty of the R ecosystem is that we do not have to build everything from scratch ourselves. We can leverage the work of others by using their datasets or their functions. These datasets and functions are distributed in packages, each of which is usually a collection of related functions and/or datasets. Just like R and RStudio, packages are made freely available.

If you are interested in how and why software is free, check out the Wikipedia page on free and open-source software (FOSS): https://en.wikipedia.org/wiki/Free_and_open-source_software.

5.1 Leveraging A Simple Package

For example, let’s say we want R to make a sound. With some googling, you might discover that the beep() function exists in the beepr package. Note that if you type beep() <ENTER> into the console without installing the beepr package, you will get the following error message:

Error in beep() : could not find function "beep"

Go ahead and try it - please do not fear error messages.

Figure 5.1: Installing packages is analogous to buying a toolbox of power tools. You only have to buy the toolbox once, then you can use any of its tools by taking the toolbox out. Likewise, you only have to install a package once on your computer; after that, you will use the library() function to take it out whenever you want it.

Okay, so we need to get the package. Here is a code alternative to using RStudio’s user interface to get a package:

# only run this line once on your computer
install.packages("beepr")

At this point, if you type beep() <ENTER> into the console, you will still get the same error message. This might seem strange, but let me introduce an analogy that might help.

As depicted in Figure 5.1, installing packages is analogous to buying a toolbox filled with tools. Note that buying a toolbox is not the same as using the tools in the toolbox. Once you’ve bought the toolbox, to use a tool inside of it, you first retrieve the toolbox from your basement/shed/garage/etc. Similarly, in R, install.packages() gets you a toolbox of functions that you now own - you only need to do this once per computer. To use a function, retrieve your toolbox first using library(packageName); this makes the package’s functions available to use during your current session - the library() command will need to be rerun anytime you restart R.
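The "buy once, take out every session" idea can be sketched as a small guard that installs a package only when it is missing. The helper name install_if_missing() is hypothetical (it is not part of R or of this book); it relies only on base R's requireNamespace() and install.packages(). To keep the sketch from downloading anything, it is demonstrated on stats, a package that ships with R.

```r
# Hypothetical helper (not from the chapter): buy the toolbox only if
# it is not already owned on this computer.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package is installed
  already_installed <- requireNamespace(pkg, quietly = TRUE)
  if (!already_installed) {
    install.packages(pkg)  # the one-time "purchase"
  }
  invisible(already_installed)
}

# "stats" ships with R, so nothing is downloaded here
was_installed <- install_if_missing("stats")
print(was_installed)
```

Calling install_if_missing("beepr") would follow the same path for the package discussed above; library(beepr) is still needed afterwards in every session where you want to use beep().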

Here is an example of this workflow where we use two commands:

  1. library(beepr): Take out the beepr toolbox (which we acquired earlier)
  2. beep(): Make a sound (assuming your computer volume is audible)

The code below executes these two commands:

# run the library() command with every R session
# where you want the beep() function to be available
library(beepr)  # take out the toolbox
beep()  # use the tool you want

You might think beepr is a strange, useless package. However, sometimes you will run code that takes a few seconds, minutes, or even days. It helps to play a noise to alert yourself when the script has finished.

Now, if you are not at work, in a physical library, or in another quiet place, have some fun trying these (note that you only need to call library() for a specific package once per session):

beep(sound = "sword")
beep(sound = "fanfare")
beep(sound = "complete")
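The point about long-running code can be sketched as follows. The function slow_task() is a made-up stand-in for a real computation, and the requireNamespace() guard is an assumption added so the script still runs on a computer where beepr has not been installed.

```r
# Hypothetical stand-in for a long-running computation
slow_task <- function() {
  Sys.sleep(2)  # pretend to work for two seconds
  sum(1:10)
}

result <- slow_task()

# Play the alert only if beepr is installed on this computer
if (requireNamespace("beepr", quietly = TRUE)) {
  library(beepr)
  beep(sound = "complete")
}

print(result)
```

Placing the beep() call on the last lines of a script means the sound plays only after everything above it has finished.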

5.2 Getting The causact And tidyverse Packages

Two of the packages we will rely on throughout this book are the tidyverse package and the causact package. The tidyverse package is actually a collection of packages, including ones we will use such as dplyr for data manipulation and ggplot2 for data visualization. The causact package provides access to the datasets used in this text and, more importantly, enables us to investigate our models of business processes, issues, and decisions.

The packages installed as part of the tidyverse include ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, readxl, and lubridate. See here for more info: https://www.tidyverse.org/packages/.

To get these packages, execute the following lines from within RStudio (put your cursor in the console to answer any prompts during installation):

install.packages("tidyverse")
install.packages("causact")

The first line installs the tidyverse collection of packages. The second line installs the causact package.
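After the installation finishes, one way to confirm that both toolboxes are on your computer is to look them up in the table returned by base R's installed.packages(). This is a minimal sketch rather than a required step; the variable names pkgs and is_installed are illustrative.

```r
# The two packages this book relies on
pkgs <- c("tidyverse", "causact")

# installed.packages() returns a matrix whose row names are package names
is_installed <- pkgs %in% rownames(installed.packages())
names(is_installed) <- pkgs
print(is_installed)

# Take out both toolboxes only if both are actually installed
if (all(is_installed)) {
  library(tidyverse)
  library(causact)
}
```

If either entry prints FALSE, rerun the corresponding install.packages() line and watch the console for prompts or error messages.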

Occasionally, you will want to install the development version of causact because it has a bug fix or a new feature that you need. To do so, first run install.packages("remotes"). Then run remotes::install_github() to download a more up-to-date version of the package than the one available via the standard R package repository known as CRAN. For most use cases, try the CRAN version of a package first, as it has been more thoroughly tested.

5.3 Digging Deeper

More information on R packages can be found at DataCamp’s “R Packages: A Beginner’s Guide”: https://www.datacamp.com/community/tutorials/r-packages-guide.