This assignment reinforces ideas in Iteration.
Due: November 20 at 11:59pm.
Please submit (via courseworks) the web address of the GitHub repo containing your work for this assignment; git commits after the due date will cause the assignment to be considered late.
R Markdown documents included as part of your solutions must not install packages, and should only load the packages necessary for your submission to knit.
| Optional survey | No points |
This “problem” focuses on the structure of your submission, especially the use of git and GitHub for reproducibility, R Projects to organize your work, R Markdown to write reproducible reports, relative paths to load data from local files, and reasonable naming structures for your files. To that end:
p8105_hw5_ajg2202 for Jeff), but that’s not required
p8105_hw5_YOURUNI.Rmd that renders to
Your solutions to Problems 1, 2, and 3 should be implemented in your .Rmd file, and your git commit history should reflect the process you used to solve these Problems.
For this Problem, we will assess adherence to the instructions above regarding repo structure, git commit history, and whether we are able to knit your .Rmd to ensure that your work is reproducible. Adherence to appropriate styling and clarity of code will be assessed in Problems 1+ using the style rubric.
This homework includes figures; the readability of your embedded plots (e.g. font sizes, axis labels, titles) will be assessed in Problems 1+.
Describe the raw data. Create a city_state variable (e.g. “Baltimore, MD”) and then summarize within cities to obtain the total number of homicides and the number of unsolved homicides (those for which the disposition is “Closed without arrest” or “Open/No arrest”).
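As a rough illustration of the summarizing step, the sketch below works on a handful of made-up rows rather than the real data. The column names city, state and disposition, the toy values (including the "Closed by arrest" label), and the object names homicides_toy and city_summary are all assumptions chosen for the example.

```r
library(dplyr)

# Hypothetical toy rows; the real data's column names and values may differ.
homicides_toy = tibble(
  city = c("Baltimore", "Baltimore", "Baltimore", "Chicago", "Chicago"),
  state = c("MD", "MD", "MD", "IL", "IL"),
  disposition = c(
    "Closed by arrest", "Open/No arrest", "Closed without arrest",
    "Open/No arrest", "Closed by arrest"
  )
)

city_summary = homicides_toy %>%
  # Combine city and state into a single label such as "Baltimore, MD".
  mutate(city_state = paste(city, state, sep = ", ")) %>%
  group_by(city_state) %>%
  summarize(
    n_homicides = n(),
    # Count rows whose disposition is one of the two "unsolved" categories.
    n_unsolved = sum(disposition %in% c("Closed without arrest", "Open/No arrest"))
  )

city_summary
```

Counting with sum() over a logical vector keeps both totals in a single summarize() call, which avoids a separate filter-and-join step.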
For the city of Baltimore, MD, use the prop.test function to estimate the proportion of homicides that are unsolved; save the output of prop.test as an R object, apply broom::tidy to this object, and pull the estimated proportion and confidence intervals from the resulting tidy dataframe.
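The single-city step might look roughly like the sketch below. The counts are invented for illustration and are not the real Baltimore numbers; the object names are likewise hypothetical.

```r
library(dplyr)
library(broom)

# Made-up counts for illustration only.
n_unsolved = 60
n_total = 100

# One-sample test of a proportion; the result is an object of class "htest".
baltimore_test = prop.test(x = n_unsolved, n = n_total)

# broom::tidy turns the htest object into a one-row tibble.
baltimore_tidy = broom::tidy(baltimore_test)

# Keep the point estimate and the confidence limits.
baltimore_tidy %>%
  select(estimate, conf.low, conf.high)
```

Saving the prop.test output before tidying keeps the full test object available for inspection if something looks off.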
Run prop.test for each of the cities in your dataset, and extract both the proportion of unsolved homicides and the confidence interval for each. Do this within a “tidy” pipeline, making use of purrr::map2, list columns, and unnest as necessary to create a tidy dataframe with estimated proportions and CIs for each city.
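One way such a pipeline could be arranged is sketched below on an invented two-row summary table. The city labels, counts, and the column names n_unsolved and n_homicides are assumptions for the example, not values from the real data.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Hypothetical per-city counts.
city_counts = tibble(
  city_state = c("City A, XX", "City B, YY"),
  n_unsolved = c(60, 30),
  n_homicides = c(100, 90)
)

city_estimates = city_counts %>%
  mutate(
    # map2 walks over the two count columns in parallel; each result is stored
    # in a list column.
    test_result = map2(n_unsolved, n_homicides, ~prop.test(x = .x, n = .y)),
    # Tidy each htest object into a one-row tibble.
    tidy_result = map(test_result, broom::tidy)
  ) %>%
  select(city_state, tidy_result) %>%
  # Expand the list column of tibbles into regular columns.
  unnest(tidy_result) %>%
  select(city_state, estimate, conf.low, conf.high)

city_estimates
```

Keeping the raw test objects in their own list column means the tidy step can be rerun or changed without refitting anything.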
Create a plot that shows the estimates and CIs for each city; check out geom_errorbar for a way to add error bars based on the upper and lower limits. Organize cities according to the proportion of unsolved homicides.
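A possible shape for this plot is sketched below. The estimates and limits are invented, and the object names plot_data and estimate_plot are hypothetical; fct_reorder from forcats is one of several ways to order the cities.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Hypothetical estimates and confidence limits.
plot_data = tibble(
  city_state = c("City A, XX", "City B, YY", "City C, ZZ"),
  estimate = c(0.60, 0.33, 0.45),
  conf.low = c(0.50, 0.24, 0.36),
  conf.high = c(0.69, 0.44, 0.55)
) %>%
  # Order the city factor by the estimated proportion, lowest first.
  mutate(city_state = fct_reorder(city_state, estimate))

estimate_plot = ggplot(plot_data, aes(x = city_state, y = estimate)) +
  geom_point() +
  # Error bars span the lower and upper confidence limits.
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
  labs(
    x = "City",
    y = "Estimated proportion unsolved",
    title = "Unsolved homicides by city (toy data)"
  ) +
  # Rotate the city labels so that many cities remain readable.
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

estimate_plot
```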
This zip file contains data from a longitudinal study that included a control arm and an experimental arm. Data for each participant is included in a separate file, and file names include the subject ID and arm.
Create a tidy dataframe containing data from all participants, including the subject ID, arm, and observations over time:
The list.files function will help.
Read in the data for each file using purrr::map, saving the result as a new variable in the dataframe.
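The steps above can be sketched as follows. So that the example runs anywhere, it first writes two tiny files to a temporary folder; the file names (con_01.csv, exp_01.csv), the week_1 and week_2 columns, and every value are assumptions, so the real files may need different parsing.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(readr)

# Write two hypothetical participant files to a temporary folder.
toy_dir = file.path(tempdir(), "toy_study")
dir.create(toy_dir, showWarnings = FALSE)
write_csv(tibble(week_1 = 0.2, week_2 = 1.1), file.path(toy_dir, "con_01.csv"))
write_csv(tibble(week_1 = 3.0, week_2 = 4.5), file.path(toy_dir, "exp_01.csv"))

study_df = tibble(file_name = list.files(toy_dir, pattern = "\\.csv$")) %>%
  mutate(
    # Read each file and keep the resulting tibble in a list column.
    data = map(file.path(toy_dir, file_name), ~read_csv(.x, show_col_types = FALSE)),
    # Drop the extension so the name holds only the arm and subject ID.
    file_name = tools::file_path_sans_ext(file_name)
  ) %>%
  unnest(data) %>%
  separate(file_name, into = c("arm", "subject_id"), sep = "_") %>%
  # Move from one column per week to one row per subject-week.
  pivot_longer(
    starts_with("week_"),
    names_to = "week",
    names_prefix = "week_",
    values_to = "observation"
  ) %>%
  mutate(week = as.integer(week))

study_df
```

If subject IDs repeat across arms, as they do in this toy example, the arm and the ID together are needed to identify a participant.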
Make a spaghetti plot showing observations on each subject over time, and comment on differences between groups.
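A spaghetti plot draws one line per subject. The sketch below uses invented long-format data with assumed column names (arm, subject_id, week, observation); grouping on the arm and ID together keeps subjects with the same ID in different arms on separate lines.

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format data: two subjects per arm, two weeks each.
long_toy = tibble(
  arm = rep(c("con", "exp"), each = 4),
  subject_id = rep(c("01", "02", "01", "02"), each = 2),
  week = rep(1:2, times = 4),
  observation = c(0.2, 0.4, 1.0, 0.8, 1.5, 3.1, 2.0, 3.6)
)

spaghetti_plot = ggplot(
  long_toy,
  aes(
    x = week, y = observation,
    # One line per participant: arm and ID together identify a subject.
    group = interaction(arm, subject_id),
    color = arm
  )
) +
  geom_line() +
  labs(x = "Week", y = "Observation", color = "Arm", title = "Observations over time (toy data)")

spaghetti_plot
```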
The code chunk below loads the tidyverse and introduces some missing values in each column of the built-in iris dataset. The purpose of this problem is to fill in those missing values.
library(tidyverse)

set.seed(10)

iris_with_missing = iris %>%
  map_df(~replace(.x, sample(1:150, 20), NA)) %>%
  mutate(Species = as.character(Species))
There are two cases to address:
Write a function that takes a vector as an argument; replaces missing values using the rules defined above; and returns the resulting vector. Apply this function to the columns of iris_with_missing using a
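The general shape of such a function is sketched below. Because the two replacement rules are not reproduced in this text, the sketch assumes one plausible pair: numeric vectors are filled with the mean of their non-missing values, and character vectors are filled with a fixed string. The placeholder "virginica", the function name fill_missing, and the toy data are all assumptions; substitute the rules actually specified.

```r
library(purrr)
library(tibble)

# Assumed rules: numeric -> mean of non-missing values; character -> fixed string.
fill_missing = function(x, fill_chr = "virginica") {
  if (is.numeric(x)) {
    replace(x, is.na(x), mean(x, na.rm = TRUE))
  } else if (is.character(x)) {
    replace(x, is.na(x), fill_chr)
  } else {
    stop("x must be a numeric or character vector")
  }
}

# Hypothetical toy data with one missing value per column.
toy = tibble(len = c(1, NA, 3), species = c("setosa", NA, "virginica"))

# map applies the function to every column; as_tibble reassembles the columns.
toy_filled = as_tibble(map(toy, fill_missing))

toy_filled
```

Branching on the vector type inside a single function lets the same call handle every column, and the explicit stop() makes unsupported types fail loudly rather than pass through unchanged.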
Please complete this survey regarding extra topics to cover at the end of the semester.
If you’d like, you can complete this short survey after you’ve finished the assignment.