This lesson is still being designed and assembled (Pre-Alpha version)

R Basics, Part 1: Data types and Functions

Overview

Teaching: 45 min
Exercises: 20 min
Questions
  • How are different kinds of data represented in R?

  • What is a variable in R?

  • How do I test assertions about data?

Objectives
  • Use and understand scalar data types

  • Use and understand vectors and lists

  • Define a variable

  • Assign data to a variable

  • Manage a workspace in an interactive R session

  • Use mathematical and comparison operators

  • Call functions

  • Manage packages

Variables in R

We can use variables to store data that we can then reference. A variable has a name, which is like a label, and a value.

Let’s look at an R expression that assigns a value to a variable:

height_cm <- 140

This statement has three parts:

Let’s take a look at what happened when you executed the statement height_cm <- 140.

In your Environment pane (usually in the upper right area of R Studio), you should now see a “Values” section. This area is like a table of variable names and their values. You should see height_cm listed with a value of 140.

If you now type the just name of the variable, height_cm at the console, R will evaluate the variable and print its current value. Try it.

Exercise 1

  1. Create a height_inches variable and set it to the value of height_cm divided by 2.54 (1 inch = 2.54 cm).
  2. Then, change the value of height_cm. Did height_inches recompute? Why or why not?

Solution

  1. height_inches <- height_cm / 2.54
  2. height_inches does not recompute. It was set to the *value* of height_cm divided by 2.54, which is a number. It’s not actually a formula or function that’s tied to the height_cm variable. We’ll get to functions later!

Variable names

Different programming languages have slightly different rules around what characters may be used in variable names. In R, both numbers and letters can be used, as well as a dot (.) and an underscore (_).

Exercise

  1. Can you determine whether a variable name may start with a number? Can a variable name start with a period or an underscore?

  2. Why do you think R allows an underscore (_) but not a hyphen (-) in variable names?

  3. Can you determine whether variable names are case-sensitive?

Solution

  1. something
  2. Something else
  3. And something else

Variable Types

In our example above, the variables we used stored numeric values. In real data, though, we have values that might be text, that might be true/false, might be empty, etc.

Let’s try creating variables with values that are other types:

study_id <- 'AB123X'
exposure <- TRUE

The idea here is that a variable not only stores data, but stores it as a particular type.

We can “ask” a variable what its type is, in R, using the typeof() function, like this:

typeof(height_cm)
[1] "double"

R tells us that height_cm is a “double”. What is a “double”? “double” refers to a double-precision floating-point number.

If we check the type of study_id, we’ll see that it’s a character type. character is the term in R that describes text data.

If we check the type of exposure, we’ll see that it’s a logical type. logical is the term in R that describes data that can be either TRUE or FALSE (or NA).

Basic data types in R are:

Functions

Earlier, we used a function, typeof(). A function is a named set of code that can take zero or more parameters as input, performs some calculations or other processing, and (optionally) returns a value back.

Let’s look at some examples of functions:

sqrt(415)
[1] 20.37155
x <- 524
log(x)
[1] 6.261492
exp(1)
[1] 2.718282

Mathematical functions are only one of many types of functions we’ll be using in R. For example, we can use functions that work on strings:

tolower('Capitalized Text')
[1] "capitalized text"

There are, of course, functions that take no parameters, and functions that take more than one parameter:

round(3.9472, digits = 2)
[1] 3.95

Notice in the above example that the digits parameter is optional. If we leave it out, round() still works, because it has a default value for digits of 0:

round(3.9472)
[1] 4

Tip: Getting help on a function

Nobody (not even the authors of this material) has memorized all of the functions in R. Looking up how to use a function is typically something you would do frequently while using R.

While you can certainly search the internet to learn more about an R function, RStudio provides several ways to read about a fucntion:

  • At the RStudio Console, you can type ? and then the name of a function. For example, you might type ?round and the Help pane, in the lower right area, will open up the documentation for that function.

  • If you’re not sure which function you need, you can search for it by either:

    • Typing in the search bar in the Help pane, or
    • Typing ?? in the Console, followed by the text you want to search on.

Later we’ll create and use our own functions!

Moving between data types: Coercion

Sometimes you have data as one type, but would like to “force” it to be a different type. As an example, you might have “120” as text (character) data, but you’d like to treat it as a number, not text. We’ll see later that one place this can occur is when you’re reading data into R from a file, and R makes some assumptions about the types of the data values.

R has a set of functions that start with as. which can help convert types:

as.numeric("120.6")
[1] 120.6
as.logical("True")
[1] TRUE

Exercise

What happens when you try to coerce something that doesn’t “fit” as the target type? For example:

  1. What happens when you try to coerce text with letters to a number?
  2. What happens when you try to coerce text other than “true” or “false” to a logical value?
  3. What happens when you try to coerce a floating-point number to an integer (and what function would you use to coerce to an integer?)

Solution

  1. This results in an NA (note the warning as well)
as.numeric("abc")
Warning: NAs introduced by coercion
[1] NA
  1. This also results in an NA
as.logical("maybe")
[1] NA
  1. You can use as.integer() and it will round the number to the nearest integer:
as.integer("11.82")
[1] 11

Logical expressions

Earlier we created a variable and set its value to TRUE. We can also evaluate R expressions whose value will be TRUE or FALSE (or NA).

Although there are other types of expressions that evaluate to a logical value, logical expressions involving numerical quantities will often use operators like < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), != (not equal to)

Discussion

  1. Why does R need a separate == operator to express equality? Why not use =?

Solution

In R, = is similar to <- in that, depending on the context, it can set the value on the left to the value on the right. So an expression like x == 3 would return TRUE or FALSE depending on whether or not x is equal to 3, but x = 3 would set x to 3.

We can also use logical operators in logical expressions, such as & (and), | (or), ! (not).

As with mathematical expressions, using parentheses can help clarify your code and avoid having to remember the order of operations:

heart_rate <- 72
blood_pressure_category <- "Normal"

(heart_rate < 96) & (blood_pressure_category == "Normal")
[1] TRUE

TODO: Make the rest a new section

Vectors and Atomic vectors

As R is built around working with statistics, an important construct in R is the vector. A vector is a list or sequence of values.

note coercion that occurs - exercise re: which types win

Functions, Revisited – with Vectors

TODO: Logicals - counting up values

TRUE + TRUE + FALSE + TRUE yields 4 so… you can do things like df$x <- c(4, 6, 9, 1, 0) sum(df$x > 5)

Key Points

  • Use RStudio to write and run R programs.

  • R has the usual arithmetic operators and mathematical functions.

  • Use c() to construct a vector

  • Use <- to assign values to variables.

  • Use ls() to list the variables in a program.

  • Use rm() to delete objects in a program.

  • Use install.packages() to install packages (libraries).