Data Types and Data Structures

Basic data types

R has the following basic data types:

  • Numeric to represent numbers that may contain decimal points
  • Integer to represent integer (whole number) values
  • Complex to represent numbers with both a real and an imaginary part: for example, 5 + 3i, where i is the square root of -1
  • Logical to represent TRUE and FALSE
  • Character to represent text data: for example, "abc" or '123'. The single or double quotes aren’t part of the value. They enclose character values so that R interprets them as data rather than code, and as text rather than numbers.
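To check which of these types a value has, we can use the built-in typeof() function. class() gives a related, higher-level label. The sketch below assumes only base R. Note that typeof() reports numeric values by their storage type, "double".

```r
# typeof() reports R's internal storage type for each basic type
typeof(3.14)    # "double"    (numeric values are stored as doubles)
typeof(5L)      # "integer"   (the L suffix makes an integer literal)
typeof(5 + 3i)  # "complex"
typeof(TRUE)    # "logical"
typeof("abc")   # "character"

# class() uses the label "numeric" for double values
class(3.14)     # "numeric"
```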

Vectors

A vector is an ordered collection of data values that are all the same data type.

We can use the c() (“combine”) function in R to combine a sequence of values into a vector. For example:

vector1 <- c(5, 5, 9)
vector1
[1] 5 5 9

By default, each item is indexed by its numbered position (1, 2, 3, and so on). If we want, we can also label each position:

vector2 <- c(a = 5, b = 5, c = 9)
vector2
a b c 
5 5 9 
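Both kinds of index can be used with single square brackets to pull items back out of a vector. A minimal sketch using the labelled vector from above:

```r
vector2 <- c(a = 5, b = 5, c = 9)

# Select by position (R positions start at 1); the label is kept
vector2[3]            # named value c = 9

# Select by label
vector2["b"]          # named value b = 5

# Select several items at once with a vector of labels (or positions)
vector2[c("a", "c")]  # named values a = 5, c = 9

# names() returns the labels
names(vector2)        # "a" "b" "c"
```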

Remember that vectors contain values that are all the same type. What happens if we try to combine values of different types into a vector using c()?

Matrices

A matrix is a 2-dimensional structure that contains data that are all the same data type.

We can use the matrix() function in R to create a matrix from a sequence of values. For example:

matrix1 <- matrix(c(5, 5, 9, 7, NA, 3), nrow = 3, ncol = 2)
matrix1
     [,1] [,2]
[1,]    5    7
[2,]    5   NA
[3,]    9    3
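The output above shows that matrix() fills its values column by column: the first three values become column 1. Passing byrow = TRUE fills row by row instead, and the [row, column] syntax selects entries. A short sketch reusing the same six values:

```r
values <- c(5, 5, 9, 7, NA, 3)

# Default: values fill the first column, then the second
matrix1 <- matrix(values, nrow = 3, ncol = 2)

# byrow = TRUE: values fill the first row, then the second, and so on
matrix_byrow <- matrix(values, nrow = 3, ncol = 2, byrow = TRUE)
matrix_byrow
#      [,1] [,2]
# [1,]    5    5
# [2,]    9    7
# [3,]   NA    3

dim(matrix1)    # 3 2 (rows, columns)
matrix1[2, 2]   # row 2, column 2: NA
matrix1[, 1]    # leaving the row blank selects the whole first column: 5 5 9
```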

Mixing types in Vectors or Matrices

What happens if we try to create a vector or matrix with mixed types?

vector3 <- c('a', 1, TRUE)
vector3
[1] "a"    "1"    "TRUE"

We see here (by the quotes) that all of the values were converted to character (text) type. When you mix types, R coerces all of the values to the “least restrictive” type that can accommodate every value. From most restrictive to least restrictive, the order of coercion is:

logical → integer → numeric → complex → character
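Each step of this chain can be checked by combining two adjacent types and inspecting the result with typeof(). A minimal sketch in base R. Note that typeof() reports numeric values as "double":

```r
# Each pair mixes two adjacent types; the result takes the less restrictive one
typeof(c(TRUE, 2L))   # "integer"
typeof(c(2L, 3.5))    # "double" (i.e. numeric)
typeof(c(3.5, 2i))    # "complex"
typeof(c(2i, "x"))    # "character"

# Logical values become 1 (TRUE) and 0 (FALSE) when coerced to integer
c(TRUE, FALSE, 2L)    # 1 0 2

# Matrices follow the same rule: one character value turns every entry into text
matrix(c(1, 2, "a", 4), nrow = 2)
```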

Coercion

We’ve used the term “coercion” to describe when a value is forced to another data type. R has many functions beginning with as. (such as as.numeric(), as.integer(), as.logical(), and as.character()) that you can use to coerce data to other types.

For example, to coerce any value to text, we can use as.character():

as.character(32.5)
[1] "32.5"

When feasible, we can also convert data to more restrictive types. For example, we can convert 0s and 1s to logical values using as.logical():

as.logical(c(1, 0, 0, NA, 1, 2))
[1]  TRUE FALSE FALSE    NA  TRUE  TRUE

Notice in this case that any nonzero number (including 2) becomes TRUE, while NA stays NA.

Another example:

as.numeric(c('32.5', '-5', 'some text'))
Warning: NAs introduced by coercion
[1] 32.5 -5.0   NA

Note the warning, caused by the fact that 'some text' cannot be converted to numeric.
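When some failures are expected, a common pattern is to silence the warning with suppressWarnings() and then look up which inputs became NA. The sketch below also shows two other coercion behaviours worth knowing: as.integer() drops the fractional part, and as.logical() on text only recognizes a few spellings of TRUE and FALSE. The variable names are illustrative.

```r
x <- c('32.5', '-5', 'some text')

# Convert, silencing the "NAs introduced by coercion" warning
nums <- suppressWarnings(as.numeric(x))

# Inputs that failed: NA in the output but not NA in the input
x[is.na(nums) & !is.na(x)]              # "some text"

# as.integer() truncates toward zero
as.integer(c(3.9, -3.9))                # 3 -3

# as.logical() accepts spellings such as "TRUE" and "false"; other text gives NA
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```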

Vector and Matrix operations

Arithmetic operators in R work element by element on vectors and matrices, so we can perform a calculation on an entire set of values at once. For example:

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v1 + v2
[1] 5 7 9
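The other arithmetic operators behave the same way. When the two vectors have different lengths, R recycles the shorter one, which is how a single number can be applied to every element. A minimal sketch:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

v1 * v2        # element-wise product: 4 10 18
sum(v1 * v2)   # 32

# A single value is recycled across the whole vector
v1 * 10        # 10 20 30

# A shorter vector is recycled when its length divides the longer one's
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```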

or for a matrix:

m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
m1^2 # square each element
     [,1] [,2] [,3]
[1,]    1    9   25
[2,]    4   16   36
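Because operators act element by element, m1 * m1 is not the matrix product from linear algebra. For that, R provides the %*% operator, and t() transposes a matrix. A short sketch using the same m1:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# * multiplies element by element, so this equals m1^2
elementwise <- m1 * m1

# t(m1) is 3 x 2; %*% multiplies a (2 x 3) by a (3 x 2) matrix, giving 2 x 2
product <- m1 %*% t(m1)
product
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56
```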

Lists

Unlike vectors and matrices, lists can hold elements of different types. Lists can also have any structure you like: a list may contain vectors, matrices, data frames, and even other lists.
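As a quick illustration, the hypothetical list below mixes an integer, a string, a logical value, a numeric vector, and a nested list. Unlike c(), list() keeps each element's own type and length:

```r
# A hypothetical list mixing types, lengths, and a nested list
mixed <- list(42L, "text", TRUE, c(1.5, 2.5), list(inner = "nested"))

length(mixed)          # 5 elements
sapply(mixed, class)   # "integer" "character" "logical" "numeric" "list"
lengths(mixed)         # length of each element: 1 1 1 2 1
```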

Let’s make our own lists using the list() function:

apple_info <- list(company = "Apple", ticker_symbol = "AAPL",
                   stock_price = 170.33,
                   employees = c("Tim Cook", "Craig Federighi", "Jony Ive"),
                   stock_history = data.frame(
                     date = as.Date(c("2024-06-01", "2024-06-02", "2024-06-03")),
                     price = c(168.23, 169.45, 170.33)
                   ))

amazon_info <- list(company = "Amazon", ticker_symbol = "AMZN",
                   stock_price = 135.67,
                   employees = c("Andy Jassy", "Werner Vogels", "Adam Selipsky"),
                   stock_history = data.frame(
                     date = as.Date(c("2024-06-01", "2024-06-02", "2024-06-03")),
                     price = c(133.45, 134.56, 135.67)
                   ))

Now we can combine these lists into a larger list:

companies_info <- list(apple_info,
                       amazon_info)

How do we access the elements of a list when they aren’t named? Each element is numbered by its position, and we use double square brackets [[ ]] to access it. For example, to access the first element of companies_info, which is apple_info, we would use:

companies_info[[1]]
$company
[1] "Apple"

$ticker_symbol
[1] "AAPL"

$stock_price
[1] 170.33

$employees
[1] "Tim Cook"        "Craig Federighi" "Jony Ive"       

$stock_history
        date  price
1 2024-06-01 168.23
2 2024-06-02 169.45
3 2024-06-03 170.33

The first element of companies_info is itself a list, so once we have extracted it, we can use the $ operator to access its elements by name. For example, to get the company name of the first element of companies_info, we would use:

companies_info[[1]]$company
[1] "Apple"
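It helps to contrast [[ ]] with single brackets [ ]. Single brackets return a smaller list that still wraps the element, so $ on the result finds nothing. The sketch below uses hypothetical, trimmed-down versions of apple_info and amazon_info to stay self-contained:

```r
# Hypothetical, trimmed-down versions of apple_info and amazon_info
apple_info  <- list(company = "Apple",  stock_price = 170.33)
amazon_info <- list(company = "Amazon", stock_price = 135.67)
companies_info <- list(apple_info, amazon_info)

# [[ ]] extracts the element itself: the 2-element apple_info list
length(companies_info[[1]])         # 2

# [ ] returns a 1-element list that wraps apple_info
length(companies_info[1])           # 1
companies_info[1]$company           # NULL: the wrapper has no "company" element

# [[ ]] can also be chained with a name in place of $
companies_info[[1]][["company"]]    # "Apple"
```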

A better practice is to give each element of the list a unique name. For example, we could use each company’s ticker symbol as its name:

companies_info <- list(AAPL = apple_info,
                       AMZN = amazon_info)

Now we can access the elements by name:

apple_stock_history <- companies_info$AAPL$stock_history
apple_stock_history
        date  price
1 2024-06-01 168.23
2 2024-06-02 169.45
3 2024-06-03 170.33
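Named elements also work with [[ ]], which, unlike $, accepts a name stored in a variable. To extract the same field from every company at once, vapply() applies a function to each element and keeps the names. The sketch below uses hypothetical, trimmed-down company lists. The helper variable ticker is illustrative only.

```r
# Hypothetical, trimmed-down versions of the named company lists
companies_info <- list(
  AAPL = list(company = "Apple",  stock_price = 170.33),
  AMZN = list(company = "Amazon", stock_price = 135.67)
)

names(companies_info)              # "AAPL" "AMZN"

# [[ ]] looks up a name held in a variable
ticker <- "AMZN"
companies_info[[ticker]]$company   # "Amazon"

# Extract stock_price from every element; numeric(1) declares the result type
prices <- vapply(companies_info, function(x) x$stock_price, numeric(1))
prices
#   AAPL   AMZN
# 170.33 135.67
```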