Project proposal

Author

Dank-stan

library(tidyverse)

Dataset

Dataset Description

The Carbon Majors database compiles historical production data from 122 of the world’s largest producers of oil, gas, coal, and cement. It helps measure both direct emissions from operations and emissions resulting from the use of their products. The database covers 75 investor-owned companies, 36 state-owned companies, and 11 national entities, including 82 oil producers, 81 gas producers, 49 coal producers, and 6 cement producers. With records dating back to 1854, it accounts for over 1.42 trillion tonnes of CO₂ equivalent emissions, representing 72% of global fossil fuel and cement emissions since the Industrial Revolution began in 1751. The dataset is built primarily from self-reported production data (e.g., annual reports, SEC filings) but also incorporates third-party sources like the U.S. Energy Information Administration (EIA) and industry journals when necessary.

Dataset Origins and Gathering Process

Production data is standardized (e.g., oil in barrels, coal in tonnes). Emissions are estimated using IPCC default emission factors, adjusted for non-energy uses (e.g., petrochemicals that store carbon). Scope 3 emissions (88%) are calculated from net fossil fuel production (not sales) to avoid double counting. Scope 1 emissions include flaring, venting, fugitive methane, and own fuel use. The data is updated annually and was first compiled in 2013 by Richard Heede of the Climate Accountability Institute (CAI). It includes investor-owned companies, state-owned entities, and nation-states in certain cases (e.g., the Soviet Union before 1991). Gaps in early data are interpolated where possible, but missing historical production is not estimated. The dataset's goal is to attribute emissions directly to producers and track their impact on total global fossil fuel and cement emissions since the Industrial Revolution.

Why this dataset: This is an urgent, real-world topic. Climate change and corporate emissions remain a foremost issue for humanity, especially with the new administration's withdrawal from the Paris climate accords. This visualization seeks to be a reminder of the dangers presented by climate change, as well as a call to action to hold those most responsible accountable. Furthermore, the level of detail in the data allows for a nuanced comparative analysis. To tackle this topic we need to ask: To what extent do the major companies emit pollutants? What are the major pollutants that these companies use and emit? Which companies produce the most pollutants? By answering these questions we can learn where to concentrate our efforts in controlling pollution levels and abating the climate crisis to some extent.

Dimensions of The Dataset

To answer our questions, we use the following variables from our dataset. Of its 16 columns, 10 are numerical and 6 are categorical (character); the ones most relevant to our questions are described below.

Numerical variables:

year – The year of the record (1854 to 2022).
production_value – The amount of production (e.g., 0.9125).
product_emissions_MtCO2 – Emissions from product-related activities, measured in megatonnes of CO₂.
flaring_emissions_MtCO2 – CO₂ emissions due to gas flaring.
venting_emissions_MtCO2 – CO₂ emissions from gas venting.
own_fuel_use_emissions_MtCO2 – CO₂ emissions from fuel used in operations.
fugitive_methane_emissions_MtCO2e – Methane leakage emissions, converted to CO₂ equivalent.
fugitive_methane_emissions_MtCH4 – Methane emissions measured in megatonnes of CH₄.
total_operational_emissions_MtCO2e – The sum of all operational emissions (flaring, venting, own fuel use, and fugitive methane) in CO₂ equivalent.
total_emissions_MtCO2e – The overall emissions in CO₂ equivalent; in the first rows of the data this equals product emissions plus total operational emissions.

Categorical variables:

production_unit – The unit of production measurement (e.g., “Million bbl/yr”).
source – The reference or data source (e.g., “Abu Dhabi National Oil Company Annual Report 1975, pp. 35-37”).

Overall, the dataset quantifies emissions across several categories for oil, gas, coal, and cement production, while documenting the production scale and the data source of each record.

carbon_df <- read.csv("data/emissions_high_granularity.csv")

# column names
names(carbon_df)

 [1] "year"                               "parent_entity"                     
 [3] "parent_type"                        "reporting_entity"                  
 [5] "commodity"                          "production_value"                  
 [7] "production_unit"                    "product_emissions_MtCO2"           
 [9] "flaring_emissions_MtCO2"            "venting_emissions_MtCO2"           
[11] "own_fuel_use_emissions_MtCO2"       "fugitive_methane_emissions_MtCO2e" 
[13] "fugitive_methane_emissions_MtCH4"   "total_operational_emissions_MtCO2e"
[15] "total_emissions_MtCO2e"             "source"

# how many rows & columns
cat("Number of rows:", nrow(carbon_df), "\n")

Number of rows: 15797

cat("Number of columns:", ncol(carbon_df), "\n")

Number of columns: 16

# preview the first 10 rows
print(head(carbon_df, 10))

   year                  parent_entity        parent_type reporting_entity
1  1962 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
2  1963 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
3  1964 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
4  1965 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
5  1966 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
6  1967 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
7  1968 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
8  1969 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
9  1970 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
10 1971 Abu Dhabi National Oil Company State-owned Entity        Abu Dhabi
   commodity production_value production_unit product_emissions_MtCO2
1  Oil & NGL           0.9125  Million bbl/yr               0.3389277
2  Oil & NGL           1.8250  Million bbl/yr               0.6778554
3  Oil & NGL           7.3000  Million bbl/yr               2.7114216
4  Oil & NGL          10.9500  Million bbl/yr               4.0671324
5  Oil & NGL          13.5050  Million bbl/yr               5.0161300
6  Oil & NGL          14.6000  Million bbl/yr               5.4228432
7  Oil & NGL          18.2500  Million bbl/yr               6.7785540
8  Oil & NGL          22.2650  Million bbl/yr               8.2698359
9  Oil & NGL          25.7325  Million bbl/yr               9.5577611
10 Oil & NGL          34.3100  Million bbl/yr              12.7436815
   flaring_emissions_MtCO2 venting_emissions_MtCO2 own_fuel_use_emissions_MtCO2
1              0.005404077             0.001298972                            0
2              0.010808155             0.002597944                            0
3              0.043232620             0.010391775                            0
4              0.064848929             0.015587663                            0
5              0.079980346             0.019224785                            0
6              0.086465239             0.020783551                            0
7              0.108081549             0.025979439                            0
8              0.131859490             0.031694915                            0
9              0.152394984             0.036631008                            0
10             0.203193312             0.048841345                            0
   fugitive_methane_emissions_MtCO2e fugitive_methane_emissions_MtCH4
1                         0.01825408                     0.0006519315
2                         0.03650816                     0.0013038630
3                         0.14603266                     0.0052154520
4                         0.21904898                     0.0078231780
5                         0.27016041                     0.0096485862
6                         0.29206531                     0.0104309040
7                         0.36508164                     0.0130386299
8                         0.44539960                     0.0159071285
9                         0.51476511                     0.0183844682
10                        0.68635348                     0.0245126243
   total_operational_emissions_MtCO2e total_emissions_MtCO2e
1                          0.02495713              0.3638848
2                          0.04991426              0.7277697
3                          0.19965705              2.9110786
4                          0.29948558              4.3666180
5                          0.36936554              5.3854955
6                          0.39931410              5.8221573
7                          0.49914263              7.2776966
8                          0.60895400              8.8787899
9                          0.70379110             10.2615522
10                         0.93838814             13.6820696
                                                         source
1  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-37
2  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-38
3  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-39
4  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-40
5  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-41
6  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-42
7  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-43
8  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-44
9  Abu Dhabi National Oil Company Annual Report 1975, pp. 35-45
10 Abu Dhabi National Oil Company Annual Report 1975, pp. 35-46
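The first three rows printed above suggest how the emission columns relate to each other. The sketch below is an illustrative check only: the values are typed in by hand from the preview, and the names preview, operational_sum, and overall_sum are ours. It tests whether total_operational_emissions_MtCO2e is the sum of the four operational columns, and whether total_emissions_MtCO2e is product emissions plus operational emissions.

```r
# First three preview rows, typed in by hand from the output above.
preview <- data.frame(
  product_emissions_MtCO2            = c(0.3389277, 0.6778554, 2.7114216),
  flaring_emissions_MtCO2            = c(0.005404077, 0.010808155, 0.043232620),
  venting_emissions_MtCO2            = c(0.001298972, 0.002597944, 0.010391775),
  own_fuel_use_emissions_MtCO2       = c(0, 0, 0),
  fugitive_methane_emissions_MtCO2e  = c(0.01825408, 0.03650816, 0.14603266),
  total_operational_emissions_MtCO2e = c(0.02495713, 0.04991426, 0.19965705),
  total_emissions_MtCO2e             = c(0.3638848, 0.7277697, 2.9110786)
)

# Sum of the four operational components.
operational_sum <- with(preview,
  flaring_emissions_MtCO2 + venting_emissions_MtCO2 +
    own_fuel_use_emissions_MtCO2 + fugitive_methane_emissions_MtCO2e)

# Product emissions plus total operational emissions.
overall_sum <- with(preview,
  product_emissions_MtCO2 + total_operational_emissions_MtCO2e)

# Largest absolute gaps; tiny gaps are expected from rounding in the printout.
max_gap_operational <- max(abs(operational_sum - preview$total_operational_emissions_MtCO2e))
max_gap_total <- max(abs(overall_sum - preview$total_emissions_MtCO2e))
```

Both gaps are below 1e-6 for these rows, so the two relationships hold up to print rounding. The same check could be run on the full carbon_df before relying on either total.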

# summary stats for each column (for numeric columns, etc.)
summary(carbon_df)

      year      parent_entity      parent_type        reporting_entity  
 Min.   :1854   Length:15797       Length:15797       Length:15797      
 1st Qu.:1970   Class :character   Class :character   Class :character  
 Median :1993   Mode  :character   Mode  :character   Mode  :character  
 Mean   :1986                                                           
 3rd Qu.:2007                                                           
 Max.   :2022                                                           
  commodity         production_value   production_unit   
 Length:15797       Min.   :    0.00   Length:15797      
 Class :character   1st Qu.:   11.80   Class :character  
 Mode  :character   Median :   59.97   Mode  :character  
                    Mean   :  327.88                     
                    3rd Qu.:  246.38                     
                    Max.   :27192.00                     
 product_emissions_MtCO2 flaring_emissions_MtCO2 venting_emissions_MtCO2
 Min.   :   0.000        Min.   : 0.00000        Min.   : 0.00000       
 1st Qu.:   5.996        1st Qu.: 0.00000        1st Qu.: 0.00000       
 Median :  21.502        Median : 0.01591        Median : 0.04525       
 Mean   :  79.392        Mean   : 0.51723        Mean   : 0.46246       
 3rd Qu.:  62.192        3rd Qu.: 0.19725        3rd Qu.: 0.32972       
 Max.   :7769.222        Max.   :27.02687        Max.   :41.45866       
 own_fuel_use_emissions_MtCO2 fugitive_methane_emissions_MtCO2e
 Min.   : 0.0000              Min.   :  0.0000                 
 1st Qu.: 0.0000              1st Qu.:  0.6071                 
 Median : 0.0000              Median :  2.3511                 
 Mean   : 0.6887              Mean   :  8.8842                 
 3rd Qu.: 0.1624              3rd Qu.:  7.4017                 
 Max.   :83.2035              Max.   :877.6837                 
 fugitive_methane_emissions_MtCH4 total_operational_emissions_MtCO2e
 Min.   : 0.00000                 Min.   :  0.000                   
 1st Qu.: 0.02168                 1st Qu.:  0.752                   
 Median : 0.08397                 Median :  2.870                   
 Mean   : 0.31729                 Mean   : 10.553                   
 3rd Qu.: 0.26434                 3rd Qu.:  8.966                   
 Max.   :31.34585                 Max.   :877.684                   
 total_emissions_MtCO2e    source         
 Min.   :   0.000       Length:15797      
 1st Qu.:   7.209       Class :character  
 Median :  25.117       Mode  :character  
 Mean   :  89.944                         
 3rd Qu.:  72.255                         
 Max.   :8646.906

Questions

Q1. How have total operational emissions (total_operational_emissions_MtCO2e) evolved over time for major parent_entity groups, and which entities account for the largest share of these emissions in different years?

Q2. Among the parent entities identified in the first chart as the largest historical polluters, which specific emission sources (e.g. product, flaring, venting, own fuel use, or fugitive methane) drive their total_operational_emissions_MtCO2e?

Analysis plan

The feedback we received from the other teams was very helpful. As pointed out, this project was originally attempting to answer both questions with a single graph. After realizing from Dank Vibe's feedback that this was inappropriate, we have decided to move the visualization for the first question away from an alluvial diagram and toward a more appropriate type of visualization. We will keep the alluvial diagram to answer a more refined version of our regional second question, one better suited to visualization.

Regarding the feedback given to us by DANK TEA, we will attempt to implement normalized data to their specifications, as we believe this may be a strong way to approach the visualization. However, we are considering whether such processing is necessary, as the purpose of the data set is to assign responsibility to the largest emitters, regardless of efficiency. In this regard, we may still use un-normalized data if it becomes apparent that such pre-processing obscures the message that the original data set was built to convey.

Preparation and background for plan 1:

We plan to narrow our scope to the top 10 contributors to total operational emissions. Before ranking, we are going to normalize the data to account for the size of different companies, since larger entities will naturally have higher total emissions than smaller ones.
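The proposal does not yet fix a normalization method, so the following is only a minimal sketch of one possible reading: operational emissions per unit of production. The toy rows are copied from the data preview (Abu Dhabi, 1962 to 1964), and the column name operational_intensity is our own. The ratio is only comparable between rows that share the same production_unit.

```r
library(dplyr)

# Toy rows copied from the data preview (Abu Dhabi, 1962-1964).
toy <- data.frame(
  year = c(1962, 1963, 1964),
  production_value = c(0.9125, 1.8250, 7.3000),
  production_unit = "Million bbl/yr",
  total_operational_emissions_MtCO2e = c(0.02495713, 0.04991426, 0.19965705)
)

# Assumed normalization: operational emissions per unit of production.
# Rows with zero production are dropped to avoid dividing by zero.
intensity <- toy %>%
  filter(production_value > 0) %>%
  mutate(
    operational_intensity = total_operational_emissions_MtCO2e / production_value
  )
```

In these three rows the ratio is nearly identical (about 0.0274 MtCO2e per million barrels), which would be consistent with emissions being estimated from production using fixed factors. If that pattern holds more widely, this kind of normalization may flatten differences between entities, which is relevant to the concern about obscuring the dataset's message.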

Plan 1:

We will use a multiple-line graph. The x-axis represents the years in which the emissions occurred. The y-axis represents the total operational emissions. Each parent company will be represented by a line, whose shape shows the trend in its emissions through the years. The variables we shall use are year, total_operational_emissions_MtCO2e, and parent_entity. We shall calculate this for the 10 companies with the highest total operational emissions. During this process, we will need to group the data into discrete time chunks. Rather than plotting every year individually, we will bin each emitter's records into longer periods (decades, half decades, or another appropriate time frame) and analyze the trends across those periods. This includes summing the relevant emissions statistics within each period so that the periods can be compared directly.
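The ranking and binning steps described above can be sketched as follows. This is an illustration under assumptions, not our final pipeline: the entities "A", "B", and "C" and their values are made up, decades are used as the bin width, and top_n_entities is set to 2 instead of 10 to keep the toy example small.

```r
library(dplyr)

# Toy data with hypothetical entities and values, for illustration only.
toy <- data.frame(
  year = c(1990, 1995, 2001, 2005, 1992, 2003, 1998, 2008),
  parent_entity = c("A", "A", "A", "A", "B", "B", "C", "C"),
  total_operational_emissions_MtCO2e = c(10, 12, 15, 20, 5, 7, 1, 2)
)

top_n_entities <- 2  # the proposal uses 10; 2 keeps the toy example small

# Rank entities by cumulative operational emissions and keep the largest.
top_entities <- toy %>%
  group_by(parent_entity) %>%
  summarise(
    cumulative = sum(total_operational_emissions_MtCO2e),
    .groups = "drop"
  ) %>%
  slice_max(cumulative, n = top_n_entities) %>%
  pull(parent_entity)

# Bin years into decades and total the emissions per entity and decade.
by_decade <- toy %>%
  filter(parent_entity %in% top_entities) %>%
  mutate(decade = floor(year / 10) * 10) %>%
  group_by(parent_entity, decade) %>%
  summarise(
    operational_MtCO2e = sum(total_operational_emissions_MtCO2e),
    .groups = "drop"
  )
```

The resulting by_decade table has one row per entity and decade, which is the shape a multiple-line chart needs (decade on the x-axis, operational_MtCO2e on the y-axis, one line per parent_entity). Switching to half decades would only change the divisor in the decade calculation.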

Preparation and background for plan 2:

We will continue to use the normalized data as in plan 1, but in this case we will use yearly data. Additionally, this visualization will aim to view the graph(s) from plan 1 at a more detailed level. More specifically, we will look at the top contributing entities and at the sources of their emissions: product, flaring, venting, own fuel use, or fugitive methane. This way we can see which specific activities are potentially linked to higher total emissions and how that has changed from year to year.

Plan 2:

To identify the key emission sources driving total operational emissions, we will use an alluvial diagram. Entities previously identified as historically significant polluters will be positioned on the left side, while the corresponding pollution types will be displayed on the right. The width or thickness of each flow in the diagram will represent the total operational emissions of each entity, with segments directing emissions into their respective categories. This visualization will clarify how emissions are distributed, highlighting the most problematic sources and informing future regulatory responses. Additionally, this approach will help us identify trends in industrial production, pinpointing where emissions are most concentrated.

Variables for this question include:

Categorical: parent_entity
Numerical: product_emissions_MtCO2, flaring_emissions_MtCO2, venting_emissions_MtCO2, own_fuel_use_emissions_MtCO2, fugitive_methane_emissions_MtCO2e, total_operational_emissions_MtCO2e
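An alluvial diagram generally needs one row per flow (entity, emission source, amount), while the dataset stores each source in its own column. The sketch below shows one way to reshape the data. It rests on several assumptions: the entities and values are made up, the names emission_source and emissions are ours, and only the four operational components are included. The proposal does not name a plotting package, so the plotting step itself is left out.

```r
library(dplyr)
library(tidyr)

# Toy data with hypothetical entities and values, for illustration only.
toy <- data.frame(
  parent_entity = c("A", "A", "B"),
  flaring_emissions_MtCO2 = c(1, 2, 0.5),
  venting_emissions_MtCO2 = c(0.5, 0.5, 0.2),
  own_fuel_use_emissions_MtCO2 = c(0, 1, 0),
  fugitive_methane_emissions_MtCO2e = c(3, 4, 1)
)

# Reshape from one column per source to one row per (entity, source) flow,
# then total each flow across all records of the entity.
flows <- toy %>%
  pivot_longer(
    cols = -parent_entity,
    names_to = "emission_source",
    values_to = "emissions"
  ) %>%
  group_by(parent_entity, emission_source) %>%
  summarise(emissions = sum(emissions), .groups = "drop")
```

Each row of flows can then be drawn as one band whose width is the emissions value. total_operational_emissions_MtCO2e is deliberately not pivoted here, because including a total alongside its components would double count; product_emissions_MtCO2 could be added as a further column in the same way if product emissions are to appear as a flow.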

Why an alluvial flow chart:

An alluvial flow chart is ideal for communicating how emissions are distributed across entities and categories by visually mapping relationships between them. Its structured yet flexible design makes it easy to track emission flows, with width variations clearly indicating magnitude. This format enhances readability by grouping related emissions while preserving hierarchical complexity without overwhelming the viewer. By emphasizing proportional relationships, it allows for intuitive comparisons, making it easier to identify dominant polluters and major emission sources. The visual clarity helps stakeholders quickly grasp key insights, guiding data-driven decision-making. This makes it a powerful tool for effectively translating complex emission data into actionable information.