When you read in data, there may be columns that you don’t need. For the columns that you do need, you may want names that are more descriptive (or less verbose), or the existing names may be in a format that is hard to work with; for example, they may contain spaces or special characters.
We can use the dplyr package to select and rename columns. The select() function chooses which columns to keep, and the rename() function changes column names. We can do both at once within select() by using the syntax new_name = old_name:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
  subject_id first_name last_name age
1       P001       Arul       Rao  28
2       P072        Zhe       Liu  34
3       P213     Skylar     Brown  45
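The call that produced this output is not shown here, so the following is only a minimal sketch of how select() with the new_name = old_name syntax might yield a table like it. The data frame raw_data and its original column names (such as `Subject ID` and `Internal Notes`) are hypothetical and chosen purely for illustration. Only the renamed columns and their values come from the output above.

```r
library(dplyr)

# Hypothetical raw data: the original column names and the extra
# "Internal Notes" column are assumptions made for illustration.
raw_data <- data.frame(
  `Subject ID`     = c("P001", "P072", "P213"),
  `First Name`     = c("Arul", "Zhe", "Skylar"),
  `Last Name`      = c("Rao", "Liu", "Brown"),
  `Age (years)`    = c(28, 34, 45),
  `Internal Notes` = c("a", "b", "c"),
  check.names = FALSE  # keep the spaces and parentheses in the names
)

# Keep four columns and rename each one in the same step.
# Names containing spaces or special characters need backticks.
participants <- select(
  raw_data,
  subject_id = `Subject ID`,
  first_name = `First Name`,
  last_name  = `Last Name`,
  age        = `Age (years)`
)

participants
```

Columns that are not listed, such as the hypothetical `Internal Notes`, are dropped. By contrast, rename() uses the same new_name = old_name syntax but keeps every column.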
Importantly, we can use the - (minus) notation to specify only the columns that we don’t want to keep. Imagine that you have 100 columns and only want to drop a few of them. You can do this:
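As a minimal sketch of the minus notation, assume a small data frame named participants with the four columns shown earlier. The object names participants, deidentified, and deidentified_alt are hypothetical. With many more columns the call would look the same, since only the columns to drop are listed.

```r
library(dplyr)

# Hypothetical data frame standing in for a much wider table.
participants <- data.frame(
  subject_id = c("P001", "P072", "P213"),
  first_name = c("Arul", "Zhe", "Skylar"),
  last_name  = c("Rao", "Liu", "Brown"),
  age        = c(28, 34, 45)
)

# Drop first_name and last_name; every other column is kept.
deidentified <- select(participants, -first_name, -last_name)

# Equivalent form that groups the dropped columns with c().
deidentified_alt <- select(participants, -c(first_name, last_name))

names(deidentified)
```

Both forms keep the remaining columns in their original order, so nothing else needs to be named explicitly.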