You can’t do data science without data, and data aren’t going to wrangle themselves…
Data wrangling is the process of taking data in whatever form they exist and, through a variety of steps, turning them into a form that suits your current needs. We’ll talk about how to get data in several common formats into R; how to transform, manage, and manipulate data in a cohesive way using dplyr; what it means for data to be “tidy” and how to make them so; and what to do when your data are spread across multiple tables.
The topic is made up of the following components:
It has been argued that data carpentry is a better term than data wrangling. I only sorta like that, although it’s a useful analogy to consider.
The code that I produced while working through examples in lecture is here.