This assignment reinforces ideas in the building blocks topic.
Due: September 23 at 11:59pm
Please submit (via courseworks) the web address of the GitHub repo containing your work for this assignment; git commits after the due date will cause the assignment to be considered late.
R Markdown documents included as part of your solutions must not install packages, and should only load the packages necessary for your submission to knit.
Problem | Points |
---|---|
Problem 0.1 | 25 |
Problem 0.2 | 25 |
Problem 1 | 25 |
Problem 2 | 25 |
Optional survey | No points |
This “problem” focuses on the use of R Markdown to write reproducible reports, GitHub for version control, and R Projects to organize your work.
To that end:
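- create a GitHub repo and a local R Project; we suggest naming the repo and directory p8105_hw1_YOURUNI (e.g. p8105_hw1_ajg2202 for Jeff), but that's not required
- create a single .Rmd file named p8105_hw1_YOURUNI.Rmd that renders to github_document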
Your solutions to Problems 1 and 2 should be implemented in your .Rmd file, and your git commit history should reflect the process you used to solve these Problems.
For this Problem, we will assess adherence to the instructions above regarding repo structure, git commit history, and whether we are able to knit your .Rmd to ensure that your work is reproducible.
This “problem” focuses on correct styling for your solutions to Problems 1 and 2. We will look for:
library calls; etc.)
This problem focuses on the use of inline R code, plotting, and the behavior of ggplot for variables of different types.
Use the code below to download a package containing the early_january_weather dataset:
install.packages("moderndive")
You only need to run this command once to install the package, and you can do so directly in the console. This code shouldn't be executed by your R Markdown file.
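One way to keep installation out of the knitted document is to check for the package instead of installing it. The sketch below uses only base R's requireNamespace; the variable name and the message wording are illustrative assumptions, not part of the assignment.

```r
# Check whether moderndive is available without installing or attaching it.
# requireNamespace() returns TRUE or FALSE instead of throwing an error.
has_moderndive <- requireNamespace("moderndive", quietly = TRUE)

if (!has_moderndive) {
  # Illustrative reminder only; the install itself belongs in the console.
  message('moderndive is not installed; run install.packages("moderndive") in the console.')
}
```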
Load the moderndive library, and use the following code to load the early_january_weather dataset:
data("early_january_weather")
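As a rough illustration of loading a packaged dataset and querying its size, the sketch below uses airquality, a dataset that ships with base R, as a stand-in so that it runs without extra packages. The variable names are assumptions made for this example.

```r
# airquality ships with base R (datasets package); it stands in for
# early_january_weather so the sketch runs without extra packages.
data("airquality")

n_rows <- nrow(airquality)   # number of observations
n_cols <- ncol(airquality)   # number of variables

# In an .Rmd file the same expressions can be written inline in the text,
# e.g.: The dataset has `r nrow(airquality)` rows and `r ncol(airquality)` columns.
summary_text <- sprintf("The dataset has %d rows and %d columns.", n_rows, n_cols)
print(summary_text)
```

Writing the counts as inline R code means the text updates automatically if the data change, which is the point of a reproducible report.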
Write a short description of the dataset using inline R code; accessing the dataset help file can be informative. In your discussion, please include:
- the size of the dataset (using nrow and ncol)
Make a scatterplot of temp (y) vs time_hour (x); color points using the humid variable (adding color = ... inside of aes in your ggplot code should help). Describe patterns that are apparent in this plot.
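The color mapping described above can be sketched as follows. The data frame here is a hypothetical stand-in that borrows the assignment's column names; its values are made up and are not the real weather data.

```r
library(ggplot2)

# Hypothetical stand-in data using the assignment's column names;
# the values are made up for illustration only.
weather_df <- data.frame(
  time_hour = as.POSIXct("2013-01-01 00:00", tz = "UTC") + 3600 * (0:23),
  temp      = c(37, 37, 38, 38, 38, 38, 39, 39, 40, 41, 41, 41,
                40, 39, 38, 37, 36, 35, 34, 34, 33, 33, 33, 32),
  humid     = seq(54, 77, length.out = 24)
)

# Mapping humid to color inside aes() ties point color to the data;
# a numeric variable produces a continuous color scale.
temp_plot <- ggplot(weather_df, aes(x = time_hour, y = temp, color = humid)) +
  geom_point()
```

Printing temp_plot draws the figure. Setting color outside of aes would instead apply one fixed color to every point.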
Export your scatterplot to your project directory using ggsave.
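A minimal sketch of exporting a plot with ggsave is shown below. The plot, the file name, and the dimensions are all arbitrary choices made for this example.

```r
library(ggplot2)

# Hypothetical example plot built from made-up values.
example_df <- data.frame(x = 1:10, y = (1:10)^2)
example_plot <- ggplot(example_df, aes(x = x, y = y)) + geom_point()

# A relative path is resolved against the working directory, which is
# typically the project directory when working inside an R Project.
# The file name and dimensions (in inches) are arbitrary choices.
ggsave("scatter_plot.pdf", plot = example_plot, width = 6, height = 4)
```

Passing the plot explicitly through the plot argument avoids relying on ggsave's default of saving the last plot displayed.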
This problem is intended to emphasize variable types and introduce coercion; some awareness of how R treats numeric, character, and factor variables is necessary for working with these data types in practice.
Create a data frame comprised of:
Try to take the mean of each variable in your dataframe. What works and what doesn’t?
Hint: for now, to take the mean of a variable in a dataframe, you need to pull the variable out of the dataframe. Try loading the tidyverse and using the pull function.
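The hint can be illustrated roughly as follows, using a small hypothetical data frame in place of the one the problem asks for. The sketch loads dplyr directly, since that is the tidyverse package that provides pull.

```r
library(dplyr)  # pull() comes from dplyr, which the tidyverse loads

# Hypothetical two-column data frame, for illustration only.
example_df <- data.frame(
  score = c(2.5, 3.5, 4.0),
  label = c("a", "b", "c")
)

# pull() extracts one column as a plain vector, which mean() can use.
score_mean <- mean(pull(example_df, score))

# mean() on a non-numeric column returns NA and emits a warning,
# which is suppressed here to keep the output quiet.
label_mean <- suppressWarnings(mean(pull(example_df, label)))
```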
In some cases, you can explicitly convert variables from one type to another. Write a code chunk that applies the as.numeric function to the logical, character, and factor variables (please show this chunk but not the output). What happens, and why? Does this help explain what happens when you try to take the mean?
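As a small sketch of explicit coercion, the block below applies as.numeric to three short hypothetical vectors. These are not the variables from the problem, and the comments describe general R behavior only.

```r
# Small hypothetical vectors of each type.
lgl_vec <- c(TRUE, FALSE, TRUE)
chr_vec <- c("a", "b", "c")
fct_vec <- factor(c("low", "high", "low"))

as.numeric(lgl_vec)                    # TRUE becomes 1, FALSE becomes 0
suppressWarnings(as.numeric(chr_vec))  # non-numeric text becomes NA (with a warning)
as.numeric(fct_vec)                    # returns the underlying integer level codes
```

In an R Markdown document, a chunk option such as results = "hide" is one way to show a chunk's code while hiding its output.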
If you’d like, you can complete this short survey after you’ve finished the assignment.