R can be described as primarily (but not entirely) a “functional” programming language. Most of what you will typically do in R involves using functions, and that is the usual style of R coding. Functional programming can also be a good way to approach data analysis.
A function call consists of the name of the function, followed by parentheses (), which may contain arguments (also called parameters). For example, here is a call to the mean function, which computes the mean of a numeric vector:
my_numbers <- c(-50, 1, 2, 3, 4, 5, 100, 101, 600)
mean(my_numbers)
[1] 85.11111
Functions may have many parameters, in which case we generally name the parameters when we call the function. For example, the mean function has a second parameter called trim, which specifies the fraction of the data to “trim” from each end before computing the mean. The default value is 0, meaning no trimming. But we can specify a different value if we want:
mean(my_numbers, trim = 0.2)
[1] 30.85714
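To see what trim does, the trimmed mean can be reproduced by hand. This is only a sketch of the idea, not R's internal code, and the names sorted, n, k, and manual_trimmed are made up for illustration.

```r
my_numbers <- c(-50, 1, 2, 3, 4, 5, 100, 101, 600)

# Sort the values, then drop the k smallest and the k largest
sorted <- sort(my_numbers)
n <- length(sorted)
k <- floor(n * 0.2)   # 20% of 9 values, rounded down, is 1

# Mean of what is left after removing -50 and 600
manual_trimmed <- mean(sorted[(k + 1):(n - k)])

# Compare with the built-in trimmed mean
all.equal(manual_trimmed, mean(my_numbers, trim = 0.2))
```

With nine values and trim = 0.2, one value is dropped from each end, which is why the two outliers no longer pull the result upward.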
Unnamed arguments are matched to parameters by position, in the order given in the function’s definition, which you can see in the documentation (see the next section!). So if you don’t name the parameters, you have to pass them in the correct order. For example, this is equivalent to the previous call:
mean(my_numbers, 0.2)
[1] 30.85714
However, it’s usually better style to name the parameters, especially if there are many of them.
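One practical benefit of naming is that you can skip over parameters whose defaults you want to keep. The sketch below uses the na.rm parameter of mean, which comes after trim; the vector with_missing is made up for illustration.

```r
with_missing <- c(1, NA, 3)

# Named: jump straight to na.rm and leave trim at its default
named_call <- mean(with_missing, na.rm = TRUE)

# Positional: trim has to be spelled out just to reach na.rm
positional_call <- mean(with_missing, 0, TRUE)

identical(named_call, positional_call)
```

The named call is easier to read because nobody has to remember what the bare 0 and TRUE stand for.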
At the Console, you can type ? followed by the name of the function (with no space in between) to bring up the help documentation for a function. For example, ?mean will bring up the help page for the mean function.
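If you only need a quick reminder of a function's parameters and their defaults, you can also inspect them from code. This sketch uses formals() on mean.default, the method that mean uses for ordinary numeric vectors; the variable name params is just for illustration.

```r
# Parameters of the default mean method, with their default values
params <- formals(mean.default)

names(params)   # parameter names, in the order used for positional matching
params$trim     # default value of trim

# In an interactive session, either of these opens the help page:
# ?mean
# help("mean")
```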
In R, functions are objects too, just like data frames, vectors, and lists. So it should not surprise you that you can define a function the same way you create any other object: by assigning it to a name. Here’s an example:
# Convert a temperature from Fahrenheit to Celsius
# temp_f: numeric
# returns: temp_f converted to Celsius
f2c <- function(temp_f, round_digits = 2) {
  # compute temp_c
  temp_c <- (temp_f - 32) * (5 / 9)
  # round temp_c
  temp_c <- round(temp_c, digits = round_digits)
  return(temp_c)
}
Notice that we used the R keyword function, followed by the parameter list in parentheses and a block of code in curly braces {}, to specify the code that is run when the function is called. The return() call is optional; if you leave it out, R will return the value of the last expression evaluated in the function, but it’s probably better style to be clear by using return().
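For comparison, here is a sketch of the same conversion without an explicit return. The name f2c_implicit is hypothetical and used only to keep it apart from f2c.

```r
# Variant of f2c that relies on the value of the last expression
f2c_implicit <- function(temp_f, round_digits = 2) {
  temp_c <- (temp_f - 32) * (5 / 9)
  round(temp_c, digits = round_digits)  # last expression, so this is returned
}

f2c_implicit(100)
```

Both versions give the same result; the choice between them is a matter of style.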
Let’s now discuss the parameters that our function expects as inputs, inside the parentheses. In this case, we have two parameters: temp_f and round_digits.
We can see from the fact that temp_f has no default value that it is a required parameter. This means that if you call the function without providing a value for temp_f, you will get an error. The second parameter, round_digits, by contrast has a default value of 2. This means that if you call the function without specifying a value for round_digits, that’s fine, and round_digits will automatically be set to 2. For example, these are both valid ways we can call our f2c function:
f2c(100) # round_digits will be 2
[1] 37.78
f2c(100, round_digits = 4) # round_digits will be 4
[1] 37.7778
By using the names of the parameters, we can pass them in any order. For example:
f2c(round_digits = 3, temp_f = 100)
[1] 37.778
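By contrast, leaving out the required temp_f fails. The sketch below repeats the definition of f2c so it runs on its own, and wraps the call in tryCatch() so the error message can be looked at without stopping the script; the variable name msg is just for illustration.

```r
f2c <- function(temp_f, round_digits = 2) {
  temp_c <- (temp_f - 32) * (5 / 9)
  temp_c <- round(temp_c, digits = round_digits)
  return(temp_c)
}

# No value for temp_f: R raises an error, which tryCatch() captures
msg <- tryCatch(
  f2c(),
  error = function(e) conditionMessage(e)
)

msg   # the text of the error, which mentions temp_f
```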
By “scope” we mean where a variable is defined and can be accessed. Variables defined inside a function ONLY exist within that function, and cannot be accessed outside the function.
Remember that within the code for the f2c function, we define a variable called temp_c. But if we try to access temp_c outside the f2c function, we get an error:
temp_c
Error: object 'temp_c' not found
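You can check the same thing without triggering an error by using exists(). This sketch repeats the definition of f2c so it runs on its own, and assumes a fresh session in which nothing else has created a variable called temp_c; the name result is just for illustration.

```r
f2c <- function(temp_f, round_digits = 2) {
  temp_c <- (temp_f - 32) * (5 / 9)   # temp_c lives only inside f2c
  temp_c <- round(temp_c, digits = round_digits)
  return(temp_c)
}

result <- f2c(100)

exists("result")   # TRUE: the returned value was assigned outside the function
exists("temp_c")   # FALSE: the local variable is gone once f2c finishes
```

The only way to get a value out of the function is through what it returns.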
There are situations in R when you may want to define a small function locally. In some languages, this is called a “lambda function” or an “anonymous function”. Notice that we never even give the function a name; it’s only available in its limited local context:
# do_something() and its fun argument are placeholders, not real R names
do_something(some_data, fun = function(x) x * 3)
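A runnable version of the same idea uses sapply(), a base R function that applies a function to each element of a vector. The data and the name tripled are made up for illustration.

```r
some_data <- c(1, 2, 3)

# The anonymous function is defined right where it is used
tripled <- sapply(some_data, function(x) x * 3)

tripled
```

The function that triples its input never gets a name, and it is not available anywhere outside this call.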
One of the most powerful features of R is that many functions are “vectorized”. This means that they can efficiently operate on entire vectors (or matrices, or data frames) at once, without the need for an explicit loop. For example, the mean function can take a vector of numbers and return their mean:
v <- c(1, 9, 3, 2, 11, 4)
mean(v)
[1] 5
And it can also take a matrix:
m <- matrix(1:15, nrow = 5, ncol = 3)
mean(m)
[1] 8
Many mathematical functions are vectorized, such as sin, cos, log, etc. For example:
log(v, base = 10)
[1] 0.0000000 0.9542425 0.4771213 0.3010300 1.0413927 0.6020600
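To make “without the need for an explicit loop” concrete, here is a sketch of the loop that the vectorized call replaces. The names log_loop and log_vec are just for illustration.

```r
v <- c(1, 9, 3, 2, 11, 4)

# Explicit loop: compute one element at a time
log_loop <- numeric(length(v))
for (i in seq_along(v)) {
  log_loop[i] <- log(v[i], base = 10)
}

# Vectorized call: the whole vector in one step
log_vec <- log(v, base = 10)

all.equal(log_loop, log_vec)
```

Both approaches give the same numbers; the vectorized form is shorter and is the usual style in R.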
Many other commonly-used functions such as is.na() are also vectorized:
x <- c(1, 2, NA, 4, NA, 5)
is.na(x)
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE
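Functions you write yourself are vectorized too when they are built only from vectorized operations. As a sketch, here is the f2c function from earlier in this section applied to a whole vector at once; the definition is repeated so the example runs on its own, and the input temperatures in temps_f are made up for illustration.

```r
f2c <- function(temp_f, round_digits = 2) {
  temp_c <- (temp_f - 32) * (5 / 9)   # arithmetic works element by element
  temp_c <- round(temp_c, digits = round_digits)
  return(temp_c)
}

temps_f <- c(32, 212, 100)
temps_c <- f2c(temps_f)

temps_c   # one Celsius value per input temperature
```

No loop was needed because subtraction, multiplication, and round() all operate on every element of the vector.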