Data Wrangling

“Data wrangling” is the process of reshaping and restructuring data so that it is ready for whatever comes next. The kind of wrangling you do depends on your next steps. For example, producing a summary table might call for a different target “shape” than running a statistical analysis or building a visualization. So, it all depends, and there is MUCH to talk about here. But it’s unusual NOT to need at least some data wrangling when you start from raw data.

R tools for data wrangling

Although we can do some data wrangling in “base R”, let’s go right to using some packages that make data wrangling much easier.

We’re going to start with the dplyr package, which is part of the tidyverse set of packages. It provides a “grammar” for data manipulation: a small set of verb-like functions (such as filter(), select(), mutate(), and summarise()) that can be combined step by step. The tidyr package (also part of the tidyverse) provides functions to help you “tidy” your data, which is a specific type of data wrangling. More on this to come.
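
To make the two packages a little more concrete, here is a minimal sketch of what wrangling with them can look like. The scores data frame and its column names are made up purely for illustration; they are not from any dataset used in these notes. The sketch assumes dplyr and tidyr are installed.

```r
# Illustrative only: `scores` and its columns are hypothetical example data.
library(dplyr)
library(tidyr)

# A small "wide" table: one row per student, one column per subject
scores <- data.frame(
  student = c("A", "B", "C"),
  math    = c(90, 75, 82),
  reading = c(85, 88, 79)
)

# tidyr: reshape from wide to "long" (one row per student-subject pair)
scores_long <- scores %>%
  pivot_longer(
    cols      = c(math, reading),
    names_to  = "subject",
    values_to = "score"
  )

# dplyr: chain verbs to compute the mean score for each subject
subject_means <- scores_long %>%
  group_by(subject) %>%
  summarise(mean_score = mean(score))

print(scores_long)
print(subject_means)
```

The wide version of the table might be the shape you want for a quick look at each student, while the long version is often the easier shape for grouped summaries and plotting. Which one is "right" depends on the next step, as described above.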