Project proposal

Author

cjc352, kfg29, glu3, ch976 (proj-01-dank-camp)

library(tidyverse)
library(skimr)

Dataset

Carbon Majors emissions data (high granularity)

Why we chose this data set:

Our generation, Generation Z, has grown up with the climate crisis looming over our heads. When our group examined the available data sets from TidyTuesday, there were more than a handful of perfectly viable choices for our project. In the end our team chose this data set because of its relevance to our futures and the possible stories that could be revealed through the data. On a more technical scale, the “parent_type” column is of particular interest to our team because of the possible socioeconomic implications that may arise from categorical patterns.

Dataset History/Provenance:

We chose to explore the Carbon Majors dataset, which tracks emissions from major fossil fuel and cement producers since 1854. The dataset was originally created by Richard Heede of the Climate Accountability Institute in 2013. In April 2024, InfluenceMap took over the maintenance and annual updates of the dataset, releasing it on carbonmajors.org. The dataset covers 122 of the world’s largest oil, gas, coal, and cement producers, including investor-owned companies, state-owned entities, and nation-states. It provides a comprehensive view of these entities’ contributions to greenhouse gas emissions, accounting for about 72% of global fossil fuel and cement emissions since the Industrial Revolution. The data is available in three granularity levels (low, medium, and high), offering different levels of detail on emissions and production.

Description:

This data set provides information on historical carbon emissions from 122 of the world’s largest oil, gas, coal, and cement producers spanning back to 1854. The data set provides information across various parent entities, including investor-owned companies, state-owned entities, and nation states. It also tracks emissions from different fossil fuel commodities, along with production volumes and the unit of production.

emissions <- read.csv("data/emissions_high_granularity.csv")


glimpse(emissions)
Rows: 31,597
Columns: 16
$ year                               <chr> "1962", "1963", "1964", "1965", "19…
$ parent_entity                      <chr> "Abu Dhabi National Oil Company", "…
$ parent_type                        <chr> "State-owned Entity", "State-owned …
$ reporting_entity                   <chr> "Abu Dhabi", "Abu Dhabi", "Abu Dhab…
$ commodity                          <chr> "Oil & NGL", "Oil & NGL", "Oil & NG…
$ production_value                   <chr> "0.9125", "1.825", "7.3", "10.95", …
$ production_unit                    <chr> "Million bbl/yr", "Million bbl/yr",…
$ product_emissions_MtCO2            <chr> "0.3389277", "0.677855399", "2.7114…
$ flaring_emissions_MtCO2            <chr> "0.005404077", "0.010808155", "0.04…
$ venting_emissions_MtCO2            <chr> "0.001298972", "0.002597944", "0.01…
$ own_fuel_use_emissions_MtCO2       <chr> "0.00E+00", "0.00E+00", "0.00E+00",…
$ fugitive_methane_emissions_MtCO2e  <chr> "0.018254082", "0.036508164", "0.14…
$ fugitive_methane_emissions_MtCH4   <chr> "0.000651931", "0.001303863", "0.00…
$ total_operational_emissions_MtCO2e <chr> "0.024957131", "0.049914263", "0.19…
$ total_emissions_MtCO2e             <chr> "0.363884831", "0.727769662", "2.91…
$ source                             <chr> "Abu Dhabi National Oil Company Ann…
skim(emissions)
Data summary
Name emissions
Number of rows 31597
Number of columns 16
_______________________
Column type frequency:
character 16
________________________
Group variables None

Variable type: character

skim_variable                       n_missing  complete_rate  min  max  empty  n_unique  whitespace
year                                        0              1    4   48      0       172           0
parent_entity                               0              1    0   39      2       125           0
parent_type                                 0              1    0   22      2         5           0
reporting_entity                            0              1    0   42      2       308           0
commodity                                   0              1    0   19      2        11           0
production_value                            0              1    0   16      2     13835           0
production_unit                             0              1    0   18      2         6           0
product_emissions_MtCO2                     0              1    0   23      2     14523           0
flaring_emissions_MtCO2                     0              1    0   23      2      9141           0
venting_emissions_MtCO2                     0              1    0   23      2      9141           0
own_fuel_use_emissions_MtCO2                0              1    0   28      2      4372           0
fugitive_methane_emissions_MtCO2e           0              1    0   33      2     14217           0
fugitive_methane_emissions_MtCH4            0              1    0   32      2     14215           0
total_operational_emissions_MtCO2e          0              1    0   34      2     14217           0
total_emissions_MtCO2e                      0              1    0   22      2     14523           0
source                                      0              1    0  372      2      2241           0
dim_text <- paste("The dataset has", nrow(emissions), "rows and", ncol(emissions), "columns.")
print(dim_text)
[1] "The dataset has 31597 rows and 16 columns."
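
The glimpse() output above shows every column stored as character, including year and the emissions measures, so numeric conversion will likely be needed before any aggregation. The following is a minimal sketch of one way to do that. It assumes that entries which fail to parse come from malformed rows that can safely be dropped, which has not been verified against the file. The helper name `clean_emissions` and the toy data frame are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Toy stand-in for `emissions` (values are made up). The third row mimics a
# malformed record with a non-numeric year and empty strings.
toy_raw <- tibble::tibble(
  year = c("1962", "1963", "not a year"),
  parent_type = c("State-owned Entity", "State-owned Entity", ""),
  production_value = c("0.9125", "1.825", ""),
  total_emissions_MtCO2e = c("0.36", "7.3E-01", "")
)

# Hypothetical helper: convert year to integer and the measurement columns to
# numeric, then drop rows whose year could not be parsed.
clean_emissions <- function(df) {
  df %>%
    mutate(
      year = suppressWarnings(as.integer(year)),
      across(
        c(any_of("production_value"), ends_with(c("MtCO2", "MtCO2e", "MtCH4"))),
        ~ suppressWarnings(as.numeric(.x))
      )
    ) %>%
    filter(!is.na(year))
}

toy_clean <- clean_emissions(toy_raw)
```

Applied to the real data, the same call would be `clean_emissions(emissions)`. It would be worth counting how many rows the filter removes before relying on the result.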

Questions

  1. Do investor-owned companies, state-owned entities, and nation states have significantly different emission patterns over time, and how do changes in those patterns relate to the enactment of major climate policy legislation? We chose the events listed below because they should have global impacts on CO2 emissions.
    • Our team wants to highlight the following environmental legislation:
      • Montreal Protocol (1987, Effective 1989)
      • U.S. Clean Air Act Amendments (1990)
      • Kyoto Protocol (1997, Implemented 2005)
      • European Union Emissions Trading System (EU ETS) (2005)
      • Paris Climate Agreement (2015, Implemented 2016)
      • UK Climate Change Act (2008)
  2. How do Scope 1 and Scope 3 emissions differ by commodity type for non-investor-owned entities?
    • Scope 1 emissions: Emissions directly associated with the operation of an entity. Defined as the sum of the four scope 1 emissions (flaring, venting, own fuel use, and fugitive methane) that are associated with the extraction, storage, processing, and transportation of included commodities.
    • Scope 3 emissions: Emissions associated with the combustion of marketed products.

     (Definitions sourced from the Carbon Majors website)

Analysis plan

Question 1 plan:

For this question we will use the parent_type, year, and total_emissions_MtCO2e variables from our data set. We will also need to manually add data for 3-4 relevant world events that affected emissions legislation or public opinion. One example is the Paris Agreement of 2015, an international treaty on climate change; other candidates are the legislation listed under Question 1. For each event we will need its name and the date it occurred. We plan to combine these data and visualize total emissions in a time series line graph, with the significant world events drawn as vertical lines on the graph.
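
The Question 1 plan could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data frame stands in for the emissions data after its columns have been converted to numeric, its values are made up, and the object names (`toy_q1`, `events`, `yearly`, `q1_plot`) are hypothetical. The two event years come from the legislation list under Question 1.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the emissions data with numeric columns (values are made up).
toy_q1 <- tibble::tibble(
  year = rep(c(1996L, 2006L, 2016L), times = 2),
  parent_type = rep(c("State-owned Entity", "Investor-owned Company"), each = 3),
  total_emissions_MtCO2e = c(10, 12, 15, 20, 22, 21)
)

# Manually entered events: name and year, as described in the plan.
events <- tibble::tibble(
  event = c("Kyoto Protocol", "Paris Climate Agreement"),
  year = c(1997L, 2015L)
)

# Total emissions per parent type and year.
yearly <- toy_q1 %>%
  group_by(parent_type, year) %>%
  summarize(total_MtCO2e = sum(total_emissions_MtCO2e, na.rm = TRUE),
            .groups = "drop")

# One line per parent type, with a dashed vertical line at each event year.
q1_plot <- ggplot(yearly, aes(x = year, y = total_MtCO2e, color = parent_type)) +
  geom_line() +
  geom_vline(data = events, aes(xintercept = year), linetype = "dashed") +
  labs(x = "Year", y = "Total emissions (MtCO2e)", color = "Parent type")
```

Keeping the events in their own small data frame means adding or removing a landmark only requires editing one row rather than the plotting code.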

Question 2 plan:

For question 2 we will use the commodity, parent_type, total_operational_emissions_MtCO2e, and product_emissions_MtCO2 variables from our data set. These variables capture the commodity produced (natural gas, coal, etc.), the type of parent entity, and the amount of carbon dioxide emissions generated. We will need to filter the data to rows where parent_type != “Investor-owned Company”. From there we will group the data by commodity and summarize operational and product emissions for each category. We will then use a bar graph to visualize the overall operational and product emissions for each commodity type. Possible data cleaning includes unit normalization across different entities.
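
A rough sketch of the Question 2 steps is shown below. It assumes that total_operational_emissions_MtCO2e represents Scope 1 and product_emissions_MtCO2 represents Scope 3, following the definitions given under Question 2. The toy data frame, its values, and the object names (`toy_q2`, `by_commodity`, `q2_plot`) are hypothetical and stand in for the emissions data after numeric conversion.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the emissions data with numeric columns (values and
# category labels are made up for illustration).
toy_q2 <- tibble::tibble(
  parent_type = c("State-owned Entity", "Nation State",
                  "Investor-owned Company", "State-owned Entity"),
  commodity = c("Oil & NGL", "Natural Gas", "Oil & NGL", "Oil & NGL"),
  total_operational_emissions_MtCO2e = c(1, 2, 5, 3),
  product_emissions_MtCO2 = c(10, 20, 50, 30)
)

by_commodity <- toy_q2 %>%
  # Keep only non-investor-owned entities.
  filter(parent_type != "Investor-owned Company") %>%
  group_by(commodity) %>%
  summarize(
    scope_1 = sum(total_operational_emissions_MtCO2e, na.rm = TRUE),
    scope_3 = sum(product_emissions_MtCO2, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Long format so that scope can be mapped to the fill aesthetic.
  pivot_longer(c(scope_1, scope_3), names_to = "scope", values_to = "emissions")

# Side-by-side bars comparing the two scopes within each commodity.
q2_plot <- ggplot(by_commodity, aes(x = commodity, y = emissions, fill = scope)) +
  geom_col(position = "dodge") +
  labs(x = "Commodity", y = "Emissions (Mt)", fill = "Scope")
```

Because product emissions are typically far larger than operational emissions, a log scale or separate panels may be worth considering if the Scope 1 bars become unreadable.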