Functions and Packages Toolboxes (R as a garage full of tools)
SCRIPT 1.2 Use this to follow along on this page and for the Practice exercises below.
Overview
R is like a big garage filled with useful tools.
One of the best things about R is that it can be customized to accomplish a huge variety of kinds of tasks: perform all sorts of statistical analyses from simple to bleeding edge, produce professional graphs, format analyses into presentations, manuscripts and web pages, collaboration, GIS and mapping, and a lot more. The tools themselves are contained in toolboxes and in a given toolbox, the tools are related to each other, usually in a focus to the kind of tasks they are suited for.
The tools in R are functions, and the toolboxes are packages.
While R comes with a lot of packages, there are an enormous amount available for instant download at any time you want or need them (over 15,000 different packages at the moment). Making use of all these resources is usually a case of identifying a problem to solve, finding the package that can help you, and then learning how to use the tools. This page is all about introducing functions, packages and the R help system.
Contents
1.2.2 Using functions and getting help
1.2.4 Finding, downloading and using packages
1.2.1 Function tour
Think of functions in R as tools that do work for you.
Code for funtions is is simple once you get the idea, but you have to understand how they work to use them in the most powerful way. Also, to make the most of functions, you must get to know which ones perform common tasks, and how to use them. We will practice that in this section. We consider the USE of functions (for a given problem) as a separate issue from discovering a function, and here we focus on USE.
A generic might look like this: function_name()
. The function name is (obviously) function_name
and all functions must have the bracket notation ()
. There are some rules for function names and for naming R objects in general, but for now the most important thing to keep in mind is that details like capitalization are important, that is, R is case sensitive. Thus, function_name()
is not the same as Function_name()
or function_nmae()
(see what I did there?).
Functions are typically used by providing some information inside the brackets, usually data for the function to do work on or settings for the function. Function settings are assigned to function arguments and most functions have several arguments.
function_name(argument_1 = value_1, argument_2 = value_2, ...)
A general rule is that you INPUT information or data into function brackets that you want the function to do work and function OUTPUT is the work being done, sometimes including information output (like the results of a statistical test, or a plot).
Notice:
- Each argument has a unique name
- Argument values are assigned using the equals sign
=
, the assignment operator - Each argument is separated by a comma
,
- The
...
means there are additional arguments that can be used optionally (for now we can ignore those)
Function names
Finding functions by their names is often easy for very simple and common tasks. For example:
mean()
Calculates the arithmentic mean
log()
Calculates the log
sd()
Calculates the standard deviation
plot()
Draws plots
boxplot()
Draws boxplots
help()
Used to access help pages
You get the idea…
The most important thing here is that you would generally get the help page up as a reference to what arguments are required and how to customize your function use, that is, until you memorize the use and arguments for common functions.
1.2.2 Using functions and getting help
For now, let’s assume that you know:
- What tasks you want to do (maybe outlined with pseudocode), and
- What function(s) can perform those tasks.
Try this out in your own script:
## A workflow for using functions ####
## (make pseudocode of steps)
# Overall task: calculate the mean for a vector of numbers
# Step 1: Code the vector of data - c() function
# Step 2: Calculate the mean - mean() function
# Step 3: Plot the data - boxplot()
# Step 1: Code the vector of data - c() function
help(c) # We use this a lot - it "combines" numbers
c(2, 6, 7, 8.1, 5, 6)
# Step 2: Calculate the mean - mean() function
help(mean)
# Notice under Usage, the "x" argument
# Notice under Arguments, x is a numeric vector
mean(x = c(2, 6, 7, 8.1, 5, 6)) # Easy
# Step 3: Plot the data - boxplot()
help(boxplot) # x again!
boxplot(x = c(2, 6, 7, 8.1, 5, 6))
# Challenge 1: Add a title to you boxplot using the argument "main"
# Challenge 2: Add an axis label to the y-axis - can you find the argument?
This boxplot would look a lot better with a main title and a y-axis label…
1.2.3 R packages
There are a lot of R packages. These are “toolboxes” often built in the spirit of identifying a problem, literally making a tools that solves the problem, and then sharing the tool for other to use as well. In fact, all official R packages are “open source”, meaning that you may use them freely, but also you can improve them and add functionality. This section is about the basics of R packages that are additional to the base R installation.
Typically, you only download a package once you identify you need to use functions in it. There are are several ways to accomplish this. We are going to practice 2 different ways, one with R code that is simple and will work no matter how you use R, and one that uses menus in RStudio.
1.2.4 Finding, downloading and using packages
Finding packages happens a variety of ways in practice. A package may be recommended to you, you might be told to use a particular package for a task or assignment, or you may discover it on the web.
Installing and loading packages with code
There are 2 steps here - installing, then loading. Installing is very easy to do using the install.packages()
function. Loading a package making the functions in it available for use is done using the library()
package. Basic usage of these functions is:
# Step 1: install a package
help(install.packages) # just have a look
install.packages(pkgs = "package_name")
# The package is downloaded from a remote repository, often
# with additional packages that are required for use.
# Step 2: load a package
library("package_name")
# Challenge: Install and load the "ggplot2" package, and then use help() to look at the help page for the function ggplot(). What kind of R object is required by the "data" argument?
Installing and loading packages with the RStudio Packages tab
You can find the packages tab in RStudio in the lower left pane by default.
When you click on the Packages tab (A in the picture above), you can see a list of packages that are available to you (i.e., in RStudio desktop these have already been downloaded locally).
In order to load a package, you can find the package name in the list and click the radio button (B in the picture).
To install a package, you can click on the Install button (C in the image).
You should see the Installe Packages window, where you can enter the name of a package for installation, searching the official Comprehensive R Archive Network (Repository (CRAN)) by default:
1.2.5 Practice exercises
1 Explain in your own words what the freq argument in the hist() function does. It often helps to practice trail and error to understand what is happening with data. Try experimenting with this data vector with the hist() function to explore the freq argument:
c(1,2,4,3,5,6,7,8,6,5,5,5,3,4,5,7)
2 Tailor your code from the hist() example in problem 1 so that your histogram has a main title, axis labels, and set the col
argument to “blue”. We are just scratching the surface with plot customization - try to incorporate other arguments to make an attractive graph.
3 Use the mean() function on the following data vector
c(1,2,4,3,5,6,7,8,6,5,NA,5,3,4,5,7)
You will see an error message. The symbol “NA” has a special meaning in R, indicating a missing value. Use help() for the mean function and implement the na.rm argument to fix the problem. Show your code.
4 In your own words, what value is required for the “d” argument in the pwr.t.test() function in the {pwr} package? Show all code involved including any appropriate comment code required to answer this question. (hint: you will probably need to install the package, load it, and use ‘help()’ on the function name)
5 Every official R package has a webpage on the Comprehensive R Archive Network (CRAN) and there are often tutorials called “vignettes”. Google the CRAN page for the package {ggplot2} and find the vignette called “Aesthetic specifications”. Read the section right near the top called “Colour and fill”.
Follow the instructions to list all of the built-in colours in R and list them in the console. Ed’s personal favourite is “goldenrod”, index number [147]. Can you find the index number for “tomato2”?
6 Write a plausible practice question involving the use of the help menu for an R function.