“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on June 24, 2017.
The version of the Instacart data that we will use in this class can be found here.
Instacart is an online grocery service that allows you to shop online from local stores. In New York City, partner stores include Whole Foods, Fairway, and The Food Emporium. Instacart offers same-day delivery, and items that users purchase are delivered within 2 hours.
“The Instacart Online Grocery Shopping Dataset 2017” is an anonymized dataset with over 3 million online grocery orders from more than 200,000 Instacart users. However the dataset does not represent a random sampling of products, users, or purchases. Therefore, while the data allow examination of trends in online grocery purchasing, the results may not be generalizable to Instacart users more broadly.
“The Instacart Online Grocery Shopping Dataset 2017” website provides some summary results of interesting findings, including:
The original data is quite extensive, and the data linked to at the top of this page for use in the class represents a cleaned and limited version of the data. The dataset contains 1,384,617 observations of 131,209 unique users, where each row in the dataset is a product from an order. There is a single order per user in this dataset.
There are 15 variables in this dataset:
order_id: order identifier
product_id: product identifier
add_to_cart_order: order in which each product was added to cart
reordered: 1 if this prodcut has been ordered by this user in the past, 0 otherwise
user_id: customer identifier
eval_set: which evaluation set this order belongs in (Note that the data for use in this class is exclusively from the “train”
order_number: the order sequence number for this user (1=first, n=nth)
order_dow: the day of the week on which the order was placed
order_hour_of_day: the hour of the day on which the order was placed
days_since_prior_order: days since the last order, capped at 30, NA if
product_name: name of the product
aisle_id: aisle identifier
department_id: department identifier
aisle: the name of the aisle
department: the name of the department