Project proposal

Author

Proud Panda

library(tidyverse)

Dataset

#install.packages("tidytuesdayR")
library(tidytuesdayR)

gas_prices <- tt_load('2025-07-01')
---- Compiling #TidyTuesday Information for 2025-07-01 ----
--- There is 1 file available ---


── Downloading files ───────────────────────────────────────────────────────────

  1 of 1: "weekly_gas_prices.csv"
Warning: The `file` argument of `vroom()` must use `I()` for literal data as of vroom
1.5.0.
  
  # Bad:
  vroom("X,Y\n1.5,2.3\n")
  
  # Good:
  vroom(I("X,Y\n1.5,2.3\n"))
ℹ The deprecated feature was likely used in the readr package.
  Please report the issue at <https://github.com/tidyverse/readr/issues>.
weekly_gas_prices <- gas_prices$weekly_gas_prices

weekly_gas_prices
# A tibble: 22,360 × 5
   date       fuel     grade   formulation  price
   <date>     <chr>    <chr>   <chr>        <dbl>
 1 1990-08-20 gasoline regular all           1.19
 2 1990-08-20 gasoline regular conventional  1.19
 3 1990-08-27 gasoline regular all           1.25
 4 1990-08-27 gasoline regular conventional  1.25
 5 1990-09-03 gasoline regular all           1.24
 6 1990-09-03 gasoline regular conventional  1.24
 7 1990-09-10 gasoline regular all           1.25
 8 1990-09-10 gasoline regular conventional  1.25
 9 1990-09-17 gasoline regular all           1.27
10 1990-09-17 gasoline regular conventional  1.27
# ℹ 22,350 more rows

Our dataset comes from the U.S. Energy Information Administration and contains weekly price observations across multiple gasoline grades and fuel formulations, from August 1990 through June 2025. It has 22,360 rows and 5 variables (date, fuel, grade, formulation, and price); each row is a weekly price observation for one combination of fuel type, grade, and formulation. We chose this dataset because fuel prices underlie most large business operations, so we’re curious how gas prices correlate with market trends we have observed over the years.

Questions

1. How have gasoline and diesel prices changed over time, and do prices differ across fuel types and gasoline grades?

2. Is there seasonal variation in fuel prices and, if so, how has it changed from year to year?

Analysis plan

For each question, we list the variables involved, the variables to be created (if any), and the external data to be merged in (if any).

1. How has pricing of gasoline and diesel changed over time?

Variables involved: date, fuel, price

Variables to be created: none

We plan to use a line plot with date on the x-axis, price on the y-axis, and fuel mapped to color, and then compare gasoline and diesel prices over time. To keep the initial analysis straightforward, we will first compare aggregate (“all”) gasoline prices with diesel prices.
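The plot described above could be sketched roughly as follows. This is a minimal illustration, not our final code: `toy_prices` is a made-up stand-in for `weekly_gas_prices` with invented prices, and the assumption that aggregate series are labeled "all" in grade and formulation (with formulation missing for diesel) should be checked against the real data before reuse.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for weekly_gas_prices: invented prices, same column names.
# Assumption: aggregate rows use "all" for grade/formulation and diesel
# rows have a missing formulation; verify this on the real data.
toy_prices <- tibble::tribble(
  ~date,        ~fuel,      ~grade,    ~formulation,   ~price,
  "2020-01-06", "gasoline", "all",     "all",          2.60,
  "2020-01-06", "gasoline", "regular", "conventional", 2.50,
  "2020-01-06", "diesel",   "all",     NA_character_,  3.00,
  "2020-01-13", "gasoline", "all",     "all",          2.70,
  "2020-01-13", "gasoline", "regular", "conventional", 2.55,
  "2020-01-13", "diesel",   "all",     NA_character_,  3.10
) |>
  mutate(date = as.Date(date))

# Keep a single aggregate series per fuel so each fuel draws one line.
fuel_series <- toy_prices |>
  filter(
    grade == "all",
    fuel == "diesel" | formulation == "all"
  )

# Date on x, price on y, fuel mapped to color.
fuel_plot <- ggplot(fuel_series, aes(x = date, y = price, color = fuel)) +
  geom_line() +
  labs(x = "Date", y = "Price (USD per gallon)", color = "Fuel")
```

Filtering to one series per fuel before plotting matters here: without it, `geom_line()` would connect points from several grades and formulations within the same color group.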

We will then conduct a more detailed analysis within gasoline, using a second, similar line plot that compares prices across grades (regular, midgrade, and premium). Since observations for gasoline, diesel, and every gasoline grade are all available only from November 28, 1994 onward, we will limit our analysis to 1994-11-28 through 2025-06-23.
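One way to confirm the common start date and build the grade comparison is sketched below. It again uses a small made-up table (`toy_prices`, with the premium series deliberately starting a week late) rather than the real data, so the date it finds is illustrative only; the "all" labels are assumptions about how the dataset encodes aggregates.

```r
library(dplyr)
library(ggplot2)

# Toy data with invented prices; premium starts one week after regular.
toy_prices <- tibble::tribble(
  ~date,        ~fuel,      ~grade,    ~formulation, ~price,
  "2020-01-06", "gasoline", "regular", "all",        2.50,
  "2020-01-13", "gasoline", "regular", "all",        2.55,
  "2020-01-20", "gasoline", "regular", "all",        2.60,
  "2020-01-13", "gasoline", "premium", "all",        3.05,
  "2020-01-20", "gasoline", "premium", "all",        3.10
) |>
  mutate(date = as.Date(date))

# First observation date of each fuel/grade series.
first_dates <- toy_prices |>
  group_by(fuel, grade) |>
  summarise(first_date = min(date), .groups = "drop")

# The latest of those first dates is the first week in which
# every series has an observation.
common_start <- max(first_dates$first_date)

# Restrict to gasoline grades from the common start date onward.
grade_series <- toy_prices |>
  filter(
    fuel == "gasoline",
    formulation == "all",
    grade != "all",
    date >= common_start
  )

grade_plot <- ggplot(grade_series, aes(x = date, y = price, color = grade)) +
  geom_line() +
  labs(x = "Date", y = "Price (USD per gallon)", color = "Grade")
```

Run on `weekly_gas_prices`, the same `first_dates` summary would let us check the 1994-11-28 cutoff instead of hard-coding it.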

No external data is required to be merged in.

2. Is there seasonal variation in fuel prices and, if so, how has it changed from year to year?

Variables involved: date, fuel, price

Variables to be created: year, month

We plan to group the data by year, month, and fuel. We are then going to compute the mean monthly price. With this information we can then look at patterns throughout a given year and see if there are any seasonal patterns. We plan to use a faceted line plot with month on the x-axis, price on the y-axis, fuel separated by color, and faceted by year. Similar to Question 1, we will use the data from 1994-11-28 through 2025-06-23.
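The grouping and faceting steps could look roughly like this. The sketch uses a made-up `toy_prices` table with invented prices and only the three columns needed here; on the real data we would presumably first restrict to one aggregate series per fuel, so that the monthly mean does not mix grades and formulations.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Toy data with invented prices: two fuels, two years, a few weeks each.
toy_prices <- tibble::tribble(
  ~date,        ~fuel,      ~price,
  "2020-01-06", "gasoline", 2.50,
  "2020-01-13", "gasoline", 2.70,
  "2020-02-03", "gasoline", 2.40,
  "2021-01-04", "gasoline", 2.30,
  "2020-01-06", "diesel",   3.00,
  "2020-01-13", "diesel",   3.20
) |>
  mutate(date = as.Date(date))

# Create year and month, then average the weekly prices within each
# year/month/fuel group.
monthly_prices <- toy_prices |>
  mutate(year = year(date), month = month(date)) |>
  group_by(year, month, fuel) |>
  summarise(mean_price = mean(price), .groups = "drop")

# Month on x, mean price on y, fuel on color, one panel per year.
seasonal_plot <- ggplot(
  monthly_prices,
  aes(x = month, y = mean_price, color = fuel)
) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12) +
  facet_wrap(~year) +
  labs(x = "Month", y = "Mean price (USD per gallon)", color = "Fuel")
```

Keeping `month` numeric lets `geom_line()` connect consecutive months within each year; a labeled factor (for example via `month(date, label = TRUE)`) would need an explicit `group` aesthetic.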

No external data is required to be merged in.