R Basics, Part 1: Data types and Functions
Overview
Teaching: 45 min
Exercises: 20 minQuestions
How are different kinds of data represented in R?
What is a variable in R?
How do I test assertions about data?
Objectives
Use and understand scalar data types
Use and understand vectors and lists
Define a variable
Assign data to a variable
Manage a workspace in an interactive R session
Use mathematical and comparison operators
Call functions
Manage packages
Variables in R
We can use variables to store data that we can then reference. A variable has a name, which is like a label, and a value.
Let’s look at an R expression that assigns a value to a variable:
height_cm <- 140
This statement has three parts:
- An assignment operator
<-
which assigns the value on the right to the variable on the left. - A variable name (
height_cm
) - A value (
140
)
Let’s take a look at what happened when you executed the statement height_cm <- 140
.
In your Environment pane (usually in the upper right area of R Studio), you should now see a “Values” section. This area is like a table of variable names and their values. You should see height_cm
listed with a value of 140
.
If you now type the just name of the variable, height_cm
at the console, R will evaluate the variable and print its current value. Try it.
Exercise 1
- Create a
height_inches
variable and set it to the value ofheight_cm
divided by 2.54 (1 inch = 2.54 cm).- Then, change the value of
height_cm
. Didheight_inches
recompute? Why or why not?Solution
height_inches <- height_cm / 2.54
height_inches
does not recompute. It was set to the *value* ofheight_cm
divided by 2.54, which is a number. It’s not actually a formula or function that’s tied to theheight_cm
variable. We’ll get to functions later!
Variable names
Different programming languages have slightly different rules around what characters may be used in variable names. In R, both numbers and letters can be used, as well as a dot (.
) and an underscore (_
).
Exercise
Can you determine whether a variable name may start with a number? Can a variable name start with a period or an underscore?
Why do you think R allows an underscore (
_
) but not a hyphen (-
) in variable names?Can you determine whether variable names are case-sensitive?
Solution
- something
- Something else
- And something else
Variable Types
In our example above, the variables we used stored numeric values. In real data, though, we have values that might be text, that might be true/false, might be empty, etc.
Let’s try creating variables with values that are other types:
study_id <- 'AB123X'
exposure <- TRUE
The idea here is that a variable not only stores data, but stores it as a particular type.
We can “ask” a variable what its type is, in R, using the typeof()
function, like this:
typeof(height_cm)
[1] "double"
R tells us that height_cm
is a “double”. What is a “double”? “double” refers to a double-precision floating-point number.
If we check the type of study_id
, we’ll see that it’s a character
type. character
is the term in R that describes text data.
If we check the type of exposure
, we’ll see that it’s a logical
type. logical
is the term in R that describes data that can be either TRUE
or FALSE
(or NA
).
Basic data types in R are:
- character
- numeric
- integer
- complex (for example,
4+3i
, wherei
represents the square root of -1) - logical
Functions
Earlier, we used a function, typeof()
. A function is a named set of code that can take zero or more parameters as input, performs some calculations or other processing, and (optionally) returns a value back.
Let’s look at some examples of functions:
- Some mathematical functions you might use that can take a single value are functions like:
sqrt()
,log()
,exp()
:
sqrt(415)
[1] 20.37155
x <- 524
log(x)
[1] 6.261492
exp(1)
[1] 2.718282
Mathematical functions are only one of many types of functions we’ll be using in R. For example, we can use functions that work on strings:
tolower('Capitalized Text')
[1] "capitalized text"
There are, of course, functions that take no parameters, and functions that take more than one parameter:
round(3.9472, digits = 2)
[1] 3.95
Notice in the above example that the digits
parameter is optional. If we leave it out, round()
still works, because it has a default value for digits
of 0:
round(3.9472)
[1] 4
Tip: Getting help on a function
Nobody (not even the authors of this material) has memorized all of the functions in R. Looking up how to use a function is typically something you would do frequently while using R.
While you can certainly search the internet to learn more about an R function, RStudio provides several ways to read about a fucntion:
At the RStudio Console, you can type
?
and then the name of a function. For example, you might type?round
and the Help pane, in the lower right area, will open up the documentation for that function.If you’re not sure which function you need, you can search for it by either:
- Typing in the search bar in the Help pane, or
- Typing
??
in the Console, followed by the text you want to search on.
Later we’ll create and use our own functions!
Moving between data types: Coercion
Sometimes you have data as one type, but would like to “force” it to be a different type. As an example, you might have “120” as text (character) data, but you’d like to treat it as a number, not text. We’ll see later that one place this can occur is when you’re reading data into R from a file, and R makes some assumptions about the types of the data values.
R has a set of functions that start with as.
which can help convert types:
as.numeric("120.6")
[1] 120.6
as.logical("True")
[1] TRUE
Exercise
What happens when you try to coerce something that doesn’t “fit” as the target type? For example:
- What happens when you try to coerce text with letters to a number?
- What happens when you try to coerce text other than “true” or “false” to a logical value?
- What happens when you try to coerce a floating-point number to an integer (and what function would you use to coerce to an integer?)
Solution
- This results in an NA (note the warning as well)
as.numeric("abc")
Warning: NAs introduced by coercion
[1] NA
- This also results in an NA
as.logical("maybe")
[1] NA
- You can use
as.integer()
and it will round the number to the nearest integer:as.integer("11.82")
[1] 11
Logical expressions
Earlier we created a variable and set its value to TRUE
. We can also evaluate R expressions whose value will be TRUE
or FALSE
(or NA
).
Although there are other types of expressions that evaluate to a logical value, logical expressions involving numerical quantities will often use operators like <
(less than), >
(greater than), <=
(less than or equal to), >=
(greater than or equal to), ==
(equal to), !=
(not equal to)
Discussion
- Why does R need a separate
==
operator to express equality? Why not use=
?Solution
In R,
=
is similar to<-
in that, depending on the context, it can set the value on the left to the value on the right. So an expression likex == 3
would returnTRUE
orFALSE
depending on whether or notx
is equal to 3, butx = 3
would setx
to 3.
We can also use logical operators in logical expressions, such as &
(and), |
(or), !
(not).
As with mathematical expressions, using parentheses can help clarify your code and avoid having to remember the order of operations:
heart_rate <- 72
blood_pressure_category <- "Normal"
(heart_rate < 96) & (blood_pressure_category == "Normal")
[1] TRUE
TODO: Make the rest a new section
Vectors and Atomic vectors
As R is built around working with statistics, an important construct in R is the vector. A vector is a list or sequence of values.
note coercion that occurs - exercise re: which types win
Functions, Revisited – with Vectors
TODO: Logicals - counting up values
TRUE + TRUE + FALSE + TRUE yields 4 so… you can do things like df$x <- c(4, 6, 9, 1, 0) sum(df$x > 5)
Key Points
Use RStudio to write and run R programs.
R has the usual arithmetic operators and mathematical functions.
Use c() to construct a vector
Use
<-
to assign values to variables.Use
ls()
to list the variables in a program.Use
rm()
to delete objects in a program.Use
install.packages()
to install packages (libraries).