The beauty of the R ecosystem is that we do not have to build everything from scratch ourselves. We can leverage the work of others by using their datasets and functions. These datasets and functions are distributed in packages, each usually a collection of related functions and/or datasets. Just like R and RStudio, packages are made freely available.
If you are interested in how and why software is free, check out the Wikipedia page on free and open-source software (FOSS): https://en.wikipedia.org/wiki/Free_and_open-source_software.
For example, let’s say we wanted R to make a sound. With some googling, you might discover that the beep() function exists in the beepr package. Note that if you type beep() <ENTER> into the console without installing the beepr package, you will get the following error message:
Error in beep() : could not find function "beep"
Go ahead and try it; please do not fear error messages.
Figure 5.1: Installing packages is analogous to buying a toolbox of power tools. You only have to buy the toolbox once, then you can use any of its tools by taking the toolbox out. Likewise, you only have to install a package once on your computer; after that, you will use the library() function to take it out whenever you want it.
Okay, so we need to get the package. Here is a code alternative to using RStudio’s user interface to get a package:
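A minimal sketch of that step, assuming the package is installed from CRAN with base R's install.packages() function (note that the package name must be in quotes):

```r
# Download and install the beepr package from CRAN.
# This only needs to be done once per computer.
install.packages("beepr")
```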
At this point, if you type beep() <ENTER> into the console, you will still get the same error message. This might seem strange, but let me introduce an analogy that might help.
As depicted in Figure 5.1, installing packages is analogous to buying a toolbox filled with tools. Note that buying a toolbox is not the same as using the tools in the toolbox. Once you’ve bought the toolbox, to use a tool inside of it, you first retrieve the toolbox from your basement/shed/garage/etc. Similarly, in R, install.packages() gets you a toolbox of functions that you now own; you only need to do this once per computer. To use a function, first retrieve your toolbox using library(packageName); this makes the package’s functions available during your current session. The library() command must be rerun any time you restart R.
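The difference between owning a toolbox and taking it out can be seen with a package that already ships with R. The sketch below uses the tools package and its toTitleCase() function purely as a stand-in (neither is used elsewhere in this chapter), and it assumes a fresh R session in which tools has not yet been attached:

```r
# tools ships with every R installation, so the toolbox is already owned ...
installed <- requireNamespace("tools", quietly = TRUE)

# ... but it has not been taken out yet, so R cannot find its functions
found_before <- exists("toTitleCase")

library(tools)  # take out the toolbox for this session

# now the function is visible from the console
found_after <- exists("toTitleCase")

print(c(installed = installed,
        found_before = found_before,
        found_after = found_after))
```

Under that assumption, the function is only found after the library() call, even though the package was installed the whole time.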
Here is an example of this workflow where we use two commands:
library(beepr): Take out the beepr toolbox (which we acquired earlier)
beep(): Make a sound (assuming your computer volume is audible)
The code below executes these two commands:
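A sketch of those two commands, assuming beepr has already been installed:

```r
library(beepr)  # take out the beepr toolbox for this session
beep()          # play the package's default sound
```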
Now, if you are not at work, in a physical library, or in another quiet place, have some fun trying these (note that you only need to call library() for a specific package once per session):
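For instance, beep() accepts a sound argument that selects among the sounds bundled with beepr, either by number or by name. The particular choices below are only illustrative; see ?beep for the full list:

```r
library(beepr)

# Run these one at a time so the sounds do not overlap
beep(sound = 3)        # a fanfare
beep(sound = "mario")  # a video game jingle
beep(sound = 9)        # the Wilhelm scream
```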
You might think beepr is a strange, useless package. However, sometimes you will run code that takes a few seconds, minutes, or even days. It helps to play a noise to alert yourself when the script has finished.
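As a sketch of that use, the code below relies on Sys.sleep() as a stand-in for slow code (the three-second pause is an arbitrary placeholder) and then plays a sound once it finishes:

```r
library(beepr)

Sys.sleep(3)  # placeholder for code that takes a long time to run
beep()        # alert yourself that the work is done
```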
Two of the packages we will rely on throughout this book are the tidyverse package and the causact package. The tidyverse package is actually a collection of packages, including ones we will use such as dplyr for data manipulation and ggplot2 for data visualization. The causact package provides access to datasets used in this text and, more importantly, enables us to investigate our models of business processes, issues, and decisions.
The packages installed as part of the tidyverse include ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, readxl, and lubridate. For more information, see https://www.tidyverse.org/packages/.
To get these packages, execute the following lines from within RStudio (put your cursor in the console to answer any prompts during installation):
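A sketch of those two lines, assuming both packages are installed from CRAN with install.packages():

```r
install.packages("tidyverse")  # the tidyverse collection of packages
install.packages("causact")    # datasets and modeling tools used in this text
```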
The first line installs the tidyverse collection of packages. The second line installs the causact package.
Occasionally, you will want to install the development version of causact because it has a bug fix or a new feature that you need. To do so, first run install.packages("remotes"). Then run remotes::install_github("flyaflya/causact") to download a more up-to-date version of the package that is not available via the standard R package repository, known as CRAN. For most use cases, the CRAN version of a package should be used first, as it has been more thoroughly tested.
More information on R packages can be found at DataCamp’s “R Packages: A Beginner’s Guide”: https://www.datacamp.com/community/tutorials/r-packages-guide.