Project title

Author

Team name
Names of team members



Introduction - to dataset

Climate change has become an increasingly urgent threat, with fossil fuel combustion and cement production playing a pivotal role in contributing greenhouse gases to the atmosphere. The Carbon Majors database provides a comprehensive view of the world’s largest producers of oil, gas, coal, and cement, along with estimates of their direct (operational) and indirect (product-related) emissions. Originally compiled by the Climate Accountability Institute (CAI), this dataset traces emissions back to 1854, covering over 1.42 trillion tonnes of CO₂ equivalent.

In this project, we use the Carbon Majors database to explore how total operational emissions have evolved over time among major producers and to investigate which aspects of production (e.g., flaring, venting, own fuel use, fugitive methane) contribute most significantly to those emissions. By visualizing these trends, we aim to clarify which entities hold the largest share of historical emissions and emphasize the responsibility these producers bear in global climate change.

The Carbon Majors dataset compiles historical production data from 122 of the world’s largest producers of oil, gas, coal, and cement. Records date back to 1854 and account for over 1.42 trillion tonnes of CO₂ equivalent (CO₂e) emissions—representing 72% of global fossil fuel and cement emissions since the start of the Industrial Revolution.

Provenance: Data primarily from self-reported production (annual reports, SEC filings) with supplemental data from the U.S. Energy Information Administration and other industry journals.

Scope: Covers investor-owned companies, state-owned companies, and national entities. Emissions calculations follow IPCC methods, with scope 3 (indirect, product-use) emissions comprising 88% of total emissions for many entities.

Structure: 10 numerical fields (e.g., production_value, different emissions types) and 2 categorical fields (production_unit, source).

Why this dataset? Climate change and corporate emissions are among the most pressing global issues. By focusing on historical emissions from major producers, we can better understand the distribution of responsibility and inform policy and public discourse.

Question 1

How have total operational emissions (total_operational_emissions_MtCO2e) evolved over time for major producer groups, and which entities account for the largest share of these emissions in different years?

Question 1 Introduction

This question focuses on understanding the historical trends in total operational emissions across top emitters. Since this dataset includes annual emissions from various sources (flaring, venting, fugitive methane, etc.), it is crucial to get a high-level view first: which companies emit the most overall, and how have those emissions changed over time? Understanding which parent entities dominate these operational emissions allows us to contextualize their overall impact.

We will isolate data relevant to total_operational_emissions_MtCO2e and group it by both year (or decade bins) and the parent entities. This approach helps us identify patterns or spikes in emissions and attribute them to specific companies.
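As a rough illustration of this grouping, the sketch below bins years into decades and sums emissions per entity and decade. The data frame `toy`, its entity names, and its values are invented for the example; only the column names `year`, `parent_entity`, and `total_operational_emissions_MtCO2e` come from the dataset. It uses uniform ten-year bins, which is an assumption and not the labelling used in the analysis.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only.
toy <- tibble(
  year = c(1995, 1998, 2003, 2007, 1996, 2004),
  parent_entity = c("A", "A", "A", "A", "B", "B"),
  total_operational_emissions_MtCO2e = c(10, 12, 15, 18, 5, 7)
)

decade_totals <- toy |>
  # Integer division maps each year to the first year of its decade (1995 -> 1990).
  mutate(decade = (year %/% 10) * 10) |>
  group_by(parent_entity, decade) |>
  summarise(
    decade_emissions = sum(total_operational_emissions_MtCO2e, na.rm = TRUE),
    .groups = "drop"
  )

decade_totals
```

Unlike fixed labels such as "2010-2022", this approach would split the most recent period into the 2010s and the 2020s, so the choice of bins should match the comparison being made.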

Approach for Question 1

fill this in later

Analysis


# 1. Keep only the year, entity, and emissions columns.
merged_df <- carbon_df |>
  select(
    year, 
    parent_entity,
    product_emissions_MtCO2,
    flaring_emissions_MtCO2,
    venting_emissions_MtCO2,
    own_fuel_use_emissions_MtCO2,
    fugitive_methane_emissions_MtCO2e,
    total_operational_emissions_MtCO2e,
    total_emissions_MtCO2e
  ) |>
  # 2. Group by both year and parent_entity.
  group_by(year, parent_entity) |>
  # 3. Sum up the numeric emissions columns for each group.
  summarise(across(
    .cols = c(
      product_emissions_MtCO2, 
      flaring_emissions_MtCO2, 
      venting_emissions_MtCO2, 
      own_fuel_use_emissions_MtCO2,
      fugitive_methane_emissions_MtCO2e,
      total_operational_emissions_MtCO2e,
      total_emissions_MtCO2e
    ),
    .fns = ~ sum(.x, na.rm = TRUE)
  ),
  .groups = "drop") |>
  # 4. Create the decade variable and filter out any rows with NA decade.
  mutate(decade = case_when(
    year >= 1962 & year < 1970 ~ "1962-1969",
    year >= 1970 & year < 1980 ~ "1970-1979",
    year >= 1980 & year < 1990 ~ "1980-1989",
    year >= 1990 & year < 2000 ~ "1990-1999",
    year >= 2000 & year < 2010 ~ "2000-2009",
    year >= 2010 & year <= 2022 ~ "2010-2022",
    TRUE ~ NA_character_
  )) |>
  filter(!is.na(decade))


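# merged_df already holds one row per entity and year, so max() just carries
# that year's total emissions through.
yearly_op <- merged_df |>
  group_by(parent_entity, year, decade) |>
  summarise(yearly_emissions = max(total_emissions_MtCO2e, na.rm = TRUE), .groups = "drop")


# Keep the last available year of each decade for every entity.
# Note: decade_total is that single year's emissions, not a sum over the decade.
decade_op <- yearly_op |>
  group_by(parent_entity, decade) |>
  slice_max(order_by = year, n = 1) |>
  ungroup() |>
  rename(decade_total = yearly_emissions)

# Running sum of the decade values for each entity.
decade_op <- decade_op |>
  group_by(parent_entity) |>
  mutate(cumulative_emissions = cumsum(decade_total)) |>
  ungroup()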

decade_op <- decade_op |>
  filter(!(parent_entity == "Former Soviet Union" & decade %in% c("1990-1999", 
                                                                 "2000-2009", 
                                                                 "2010-2022")))
get_top5_for_decade <- function(data, decade_label) {
  data |>
    filter(decade == decade_label) |>
    arrange(desc(cumulative_emissions)) |>
    slice_head(n = 5) 
}

df_1962_1969 <- get_top5_for_decade(decade_op, "1962-1969")
df_1970_1979 <- get_top5_for_decade(decade_op, "1970-1979")
df_1980_1989 <- get_top5_for_decade(decade_op, "1980-1989")
df_1990_1999 <- get_top5_for_decade(decade_op, "1990-1999")
df_2000_2009 <- get_top5_for_decade(decade_op, "2000-2009")
df_2010_2022 <- get_top5_for_decade(decade_op, "2010-2022")

# Combine the separate decade data frames
top5_by_decade <- bind_rows(
  df_1962_1969,
  df_1970_1979,
  df_1980_1989,
  df_1990_1999,
  df_2000_2009,
  df_2010_2022
)


top_contributors <- top5_by_decade |>
  group_by(decade) |>
  filter(decade == "2010-2022") |>
  arrange(desc(cumulative_emissions)) |>
  slice_head(n = 10) |>
  mutate(parent_entity = factor(parent_entity, levels = parent_entity[order(-cumulative_emissions)]))


# Add "China (Cement)" so it is grouped with the top entities in the next step.
parent_entities <- top_contributors |>
  ungroup() |>
  select(parent_entity) |>
  bind_rows(tibble(decade = "2010-2022", parent_entity = "China (Cement)"))
overall_df <- carbon_df |>
  mutate(entity = ifelse(parent_entity %in% parent_entities$parent_entity, "top_entities", "other_entities")) |>
  group_by(entity, year) |>
  summarise(
    yearly_operational = sum(total_operational_emissions_MtCO2e, na.rm = TRUE),
    .groups = "drop"
  )
# Theme time! Below is our custom theme for this part of the project.
# This theme skews towards the warmer end of the color spectrum. Why the warmer end? This data belongs to an analysis of "global WARM-ing", so I believed that such a color palette
# would be appropriate. This is more than a pun; it is an attempt to make a logical connection with readers about the subject of this project.
theme_warm_readable <- function(
  base_family = "Atkinson Hyperlegible", 
  # font chosen for visual accessibility
  base_size = 14,
  ...
) {
  theme_minimal(base_family = base_family, base_size = base_size, ...) +
    theme(
      # Overall background color (warm, off-white)
      plot.background  = element_rect(fill = "#FDF8F4", color = NA),
      panel.background = element_rect(fill = "#FDF8F4", color = NA),
      
      # Keep major x grid lines, remove y grid lines
      panel.grid.minor = element_blank(),
      panel.grid.major.y = element_blank(),
      panel.grid.major.x = element_line(color = "grey65"),
      
      # Larger, bold title; slightly smaller, plain subtitle
      plot.title = element_text(face = "bold", size = rel(1.6), color = "black"),
      # (consider changing the subtitle to grey later to make it more visually distinct?)
      plot.subtitle = element_text(face = "plain", size = rel(1.2), color = "black"),
      
      # Caption is italic, smaller, grey (change later to make more visually distinct?)
      plot.caption = element_text(face = "italic", size = rel(0.8), color = "grey40", hjust = 0),
      
      # Bold legend titles for clarity
      legend.title = element_text(face = "bold", size = rel(1.1)),
      legend.text  = element_text(size = rel(1.0)),
      
      # Bold axis titles
      axis.title = element_text(face = "bold", size = rel(1.1)),
      
      # Make axis text a touch larger and dark for contrast
      axis.text = element_text(size = rel(1.0), color = "black"),
      
      # Provide spacing for the x and y axis titles
      axis.title.x = element_text(margin = margin(t = 10), hjust = 0),
      axis.title.y = element_text(margin = margin(r = 10), hjust = 1),
      
      # Facet strip with a warm fill and a visible border
      strip.background = element_rect(fill = "#F3E3D3", color = "#BFA58A"),
      strip.text = element_text(face = "bold", size = rel(1.1), color = "black", hjust = 0),
      
      # A subtle border around the entire panel
      panel.border = element_rect(color = "#BFA58A", fill = NA, linewidth = 0.6)
    )
}

# Warm palette with 10 colors, chosen for visual accessibility and proper contrast.
# Note: not every color may appear in the chart, but scale_fill_manual() throws an error if it has fewer colors than there are distinct parent entities to fill, so extra colors are included.
warm_palette <- c(
  "#B35806", 
  "#E08214", 
  "#FDB863", 
  "#8073AC", 
  "#E7304D", 
  "#F2C545",
  "#8E6C16",
  "#D16A5E",
  "#A63603", 
  "#8856A7"   
)
ggplot(top5_by_decade, 
       aes(x = cumulative_emissions, 
           y = reorder(parent_entity, +cumulative_emissions), 
           fill = parent_entity)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  facet_wrap(~ decade, scales = "free_y") +
  labs(
    title = "Top 5 Historical Polluters per Decade (Cumulative Emissions)",
    subtitle = "Each facet shows the top 5 polluters' cumulative pollution at the decade's end.",
    x = "Cumulative Emissions (Million Metric Tons)",
    y = "Parent Entity"
  ) +
  # Flush bars at 0 and 30k, show ticks at 0, 10k, 20k, 30k
  scale_x_continuous(
    limits = c(0, 30000),
    expand = c(0, 0),
    breaks = seq(0, 30000, by = 10000),
    labels = function(x) ifelse(x == 0, "0", paste0(x / 1000, "k"))
  ) +
  coord_cartesian(clip = "off") +                
  theme_warm_readable(base_size = 14) +
  theme(
    plot.margin = margin(t = 10, r = 40, b = 10, l = 10)  # Add some margin on the right
  ) +
  scale_fill_manual(values = warm_palette)

overall_df |>
  ggplot(aes(x = year, y = yearly_operational, fill = entity, group = entity)) +
  geom_area() +
  theme_warm_readable(base_size = 14) +
  scale_fill_manual(
    values = c(
      "other_entities" = "#E7304D",
      "top_entities"   = "#F2C545"
    ),
    labels = c(
      "other_entities" = "Other Entities",
      "top_entities"   = "Top Entities"
    )
  ) +
  labs(
    title    = "Emission Contribution of Top 5 Entities vs All Other Entities",
    subtitle = "China (Coal) and China (Cement) Merged",
    x        = "Year",
    y        = "Yearly Operational Emissions (MtCO2e)",
    fill     = "Entity Category"
  ) +
  # Make the area flush on left/right:
  scale_x_continuous(limits = c(1960, 2022), expand = c(0, 0)) +
  # Make area flush on top/bottom:
  scale_y_continuous(expand = c(0, 0))
Warning: Removed 182 rows containing non-finite outside the scale range
(`stat_align()`).

Discussion for Question 1


Question 2

Among the parent entities identified in the first chart as the largest historical polluters, how do total operational emissions compare to the amount of oil or coal they produce? Has this relationship increased or decreased over time? How do these entities compare to other companies? What share of emissions, production, and emission intensity (emissions per unit of production) do the key companies account for relative to the other companies?

Introduction to Question 2


While total operational emissions provide a broad view of corporate responsibility in climate change, understanding the sources of these emissions is critical for identifying areas where reductions are possible. Different operational activities contribute to emissions at varying levels, with some practices, such as flaring and venting, being avoidable through better infrastructure and regulation.

This analysis focuses on the top five polluters identified in Question 1, examining their emission trends over the last decade (2010–2022) to determine which sources drive their total operational emissions. Specifically, we analyze flaring, venting, and own fuel use emissions to assess whether companies have taken steps to mitigate high-emission activities or if their reliance on these processes has remained constant.

Breaking down emissions in this way provides actionable insights. If certain companies disproportionately rely on venting or flaring, targeted regulations could help decrease their emissions more effectively than broad policies. By visualizing how these emission sources have changed over time, we can identify patterns, stagnation, or progress in corporate emission reduction efforts.

Approach for Question 2


To analyze the key drivers of operational emissions for the top five polluters identified in Question 1, we will focus on emissions from flaring, venting, and own fuel use over the last three decades (1990–2022). This time frame allows us to assess contemporary trends in operational emissions and evaluate whether these companies have taken steps to reduce their reliance on high-emission processes.

First, we will extract the five entities with the highest cumulative emissions in the last decade. Once these major polluters are identified, we will examine their operational emissions, breaking them down into three primary categories: flaring emissions (flaring_emissions_MtCO2), venting emissions (venting_emissions_MtCO2), and own fuel use emissions (own_fuel_use_emissions_MtCO2). These categories capture significant sources of emissions that vary based on production methods, infrastructure efficiency, and regulatory compliance.
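A minimal sketch of this extraction step is shown below, assuming "cumulative emissions in the last decade" means the sum of `total_emissions_MtCO2e` over 2010 to 2022. The data frame `toy`, its entity names, and its values are invented; this is an illustration of the idea, not the code used to produce the figures.

```r
library(dplyr)

# Hypothetical toy data; entity names and values are invented.
toy <- tibble(
  parent_entity = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  year = rep(c(2010, 2011), times = 6),
  total_emissions_MtCO2e = c(5, 6, 50, 55, 20, 22, 1, 2, 30, 35, 10, 12)
)

top5 <- toy |>
  filter(year >= 2010, year <= 2022) |>
  group_by(parent_entity) |>
  summarise(
    cumulative = sum(total_emissions_MtCO2e, na.rm = TRUE),
    .groups = "drop"
  ) |>
  # with_ties = FALSE guarantees exactly five rows even if totals are tied.
  slice_max(cumulative, n = 5, with_ties = FALSE)

top5
```

The resulting `parent_entity` column could then be used to filter the full dataset down to the entities of interest.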

For each of these five entities, we will generate line charts tracking changes in their emissions over time. The x-axis will represent years from 2010 to 2022, while the y-axis will reflect emissions in million tonnes of CO₂ (MtCO2). Each chart will display three distinct lines, one for each emission category, allowing for direct comparisons across sources. This visualization will highlight whether these companies have made efforts to reduce emissions from specific activities or if their reliance on flaring, venting, and own fuel use has remained stable or increased.
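One way such a chart could be built is sketched below: the three emission columns are reshaped to long format so that the emission source can be mapped to color, with one facet per entity. The toy data, the entity names, and the object names `toy_long` and `p_sources` are assumptions made for illustration; only the three emission column names come from the dataset.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data; entity names and values are invented.
toy <- tibble(
  parent_entity = rep(c("Entity A", "Entity B"), each = 3),
  year = rep(2010:2012, times = 2),
  flaring_emissions_MtCO2 = c(3, 3.5, 4, 1, 1.2, 1.1),
  venting_emissions_MtCO2 = c(1, 1.1, 1.3, 0.5, 0.4, 0.6),
  own_fuel_use_emissions_MtCO2 = c(2, 2.2, 2.1, 0.8, 0.9, 1)
)

# One row per entity, year, and emission source.
toy_long <- toy |>
  pivot_longer(
    cols = ends_with("_emissions_MtCO2"),
    names_to = "source",
    values_to = "emissions"
  )

# One line per emission source, one facet per entity.
p_sources <- ggplot(toy_long, aes(x = year, y = emissions, color = source)) +
  geom_line(linewidth = 1) +
  facet_wrap(~ parent_entity) +
  scale_x_continuous(breaks = 2010:2012) +
  labs(x = "Year", y = "Emissions (MtCO2)", color = "Emission source")

p_sources
```

Faceting by entity keeps each panel to three lines, which is easier to read than drawing fifteen lines on a single set of axes.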

By analyzing these trends, we aim to determine whether major polluters have successfully adopted cleaner operational practices or if they continue to emit large amounts of carbon dioxide through inefficient or avoidable means. The results will help identify which companies are making strides in reducing high-emission processes and which remain the largest contributors to operational carbon pollution in recent years.

Analysis for Question 2


# Filter data to 1962–2022, then prepare "age" (years since founding).
# These are the key polluters identified in Question 1; they define the "Key_Polluters" group below.
key_polluters <- c(
  "China (Coal)",  # optionally merge with "China (Cement)" if needed
  "Gazprom",
  "National Iranian Oil Co.",
  "Saudi Aramco",
  "Chevron",
  "ExxonMobil"
)

# Here we keep rows with non-zero production and compute each entity's age, using the first year it appears in the 1962–2022 data as a proxy for its founding year.
data_age <- carbon_df |>
  filter(year >= 1962, year <= 2022) |>               
  group_by(parent_entity) |>
  mutate(founding_year = min(year, na.rm = TRUE)) |>
  ungroup() |>
  mutate(age = year - founding_year) |>
  filter(!is.na(production_value), production_value != 0)

# Compute cumulative emissions, production, & intensity for each year they have been in production.
# Intensity is a value that represents the ratio between the emissions and production value. This is how much they pollute vs how much they make, and is a factor showing how well companies adopt practices that prevent pollution, or just how much CO2 companies emit compared to how much oil they produce.
data_age <- data_age |>
  arrange(parent_entity, age) |>
  group_by(parent_entity) |>
  mutate(
    cum_emissions  = cumsum(total_operational_emissions_MtCO2e),
    cum_production = cumsum(production_value),
    intensity      = ifelse(cum_production > 0, cum_emissions / cum_production, NA)
  ) |>
  ungroup() |>
  mutate(
    group = ifelse(parent_entity %in% key_polluters, "Key_Polluters", "Others")
  )

# We now summarize emissions, production, and intensity by averaging the cumulative values across the entities in each group. For each group and each year of age, what are the mean cumulative emissions, mean cumulative production, and mean intensity?
summaries_by_age <- data_age |>
  group_by(group, age) |>
  summarise(
    mean_cum_emissions  = mean(cum_emissions, na.rm = TRUE),
    mean_cum_production = mean(cum_production, na.rm = TRUE),
    mean_intensity      = mean(intensity, na.rm = TRUE),
    .groups = "drop"
  )

# This makes the data easier to graph.
summaries_wide <- summaries_by_age |>
  pivot_wider(
    names_from = group,
    values_from = c(mean_cum_emissions, mean_cum_production, mean_intensity)
  )

# Here we find the proportion of the emissions that are from key polluters compared to those from other companies.

diff_data <- summaries_wide |>
  mutate(
    ratio_emissions  = mean_cum_emissions_Key_Polluters / (mean_cum_emissions_Key_Polluters + mean_cum_emissions_Others),
    ratio_production = mean_cum_production_Key_Polluters / (mean_cum_production_Key_Polluters + mean_cum_production_Others),
    ratio_intensity  = mean_intensity_Key_Polluters / (mean_intensity_Key_Polluters + mean_intensity_Others)
  ) |>
  select(age, ratio_emissions, ratio_production, ratio_intensity) |>
  pivot_longer(
    cols = starts_with("ratio_"),
    names_to = "metric",
    values_to = "ratio"
  ) |>
  mutate(metric = recode(metric,
    "ratio_emissions"  = "Emissions Ratio (Key/Total)",
    "ratio_production" = "Production Ratio (Key/Total)",
    "ratio_intensity"  = "Intensity Ratio (Key/Total)"
  ))
print(data_age)
# A tibble: 12,978 × 22
    year parent_entity   parent_type reporting_entity commodity production_value
   <int> <chr>           <chr>       <chr>            <chr>                <dbl>
 1  1985 APA Corporation Investor-o… Apache           Oil & NGL             1.99
 2  1985 APA Corporation Investor-o… Apache           Natural …            38.6 
 3  1986 APA Corporation Investor-o… Apache           Oil & NGL             1.99
 4  1986 APA Corporation Investor-o… Apache           Natural …            38.6 
 5  1987 APA Corporation Investor-o… Apache           Oil & NGL             2.01
 6  1987 APA Corporation Investor-o… Apache           Natural …            40.9 
 7  1988 APA Corporation Investor-o… Apache           Oil & NGL             2.25
 8  1988 APA Corporation Investor-o… Apache           Natural …            53.0 
 9  1989 APA Corporation Investor-o… Apache           Oil & NGL             3.07
10  1989 APA Corporation Investor-o… Apache           Natural …            85.0 
# ℹ 12,968 more rows
# ℹ 16 more variables: production_unit <chr>, product_emissions_MtCO2 <dbl>,
#   flaring_emissions_MtCO2 <dbl>, venting_emissions_MtCO2 <dbl>,
#   own_fuel_use_emissions_MtCO2 <dbl>,
#   fugitive_methane_emissions_MtCO2e <dbl>,
#   fugitive_methane_emissions_MtCH4 <dbl>,
#   total_operational_emissions_MtCO2e <dbl>, total_emissions_MtCO2e <dbl>, …
print(summaries_by_age)
# A tibble: 122 × 5
   group           age mean_cum_emissions mean_cum_production mean_intensity
   <chr>         <int>              <dbl>               <dbl>          <dbl>
 1 Key_Polluters     0               50.0               2022.         0.0537
 2 Key_Polluters     1              146.                6040.         0.0578
 3 Key_Polluters     2              248.               10270.         0.0579
 4 Key_Polluters     3              355.               14689.         0.0568
 5 Key_Polluters     4              469.               19362.         0.0569
 6 Key_Polluters     5              588.               24366.         0.0570
 7 Key_Polluters     6              790.               33753.         0.0557
 8 Key_Polluters     7              936.               39877.         0.0557
 9 Key_Polluters     8             1085.               46079.         0.0547
10 Key_Polluters     9             1338.               57740.         0.0536
# ℹ 112 more rows
print(summaries_wide)
# A tibble: 61 × 7
     age mean_cum_emissions_Key_…¹ mean_cum_emissions_O…² mean_cum_production_…³
   <int>                     <dbl>                  <dbl>                  <dbl>
 1     0                      50.0                   10.4                  2022.
 2     1                     146.                    27.5                  6040.
 3     2                     248.                    45.4                 10270.
 4     3                     355.                    63.2                 14689.
 5     4                     469.                    80.0                 19362.
 6     5                     588.                    99.4                 24366.
 7     6                     790.                   119.                  33753.
 8     7                     936.                   133.                  39877.
 9     8                    1085.                   152.                  46079.
10     9                    1338.                   179.                  57740.
# ℹ 51 more rows
# ℹ abbreviated names: ¹​mean_cum_emissions_Key_Polluters,
#   ²​mean_cum_emissions_Others, ³​mean_cum_production_Key_Polluters
# ℹ 3 more variables: mean_cum_production_Others <dbl>,
#   mean_intensity_Key_Polluters <dbl>, mean_intensity_Others <dbl>
p1 <- ggplot(data_age, aes(x = age, y = cum_emissions, color = group)) +
  geom_smooth(method = "loess", span = 0.5, se = FALSE, linewidth = 1.2) +
  labs(
    title = "Cumulative Operational Emissions",
    x = "Years Since Founding",
    y = "Cumulative Emissions (MtCO2e)"
  ) +
  scale_color_manual(values = c("Key_Polluters" = "#E7304D", "Others" = "#FDB863")) +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 10)) +   # <- NEW X-AXIS LIMITS
  theme_warm_readable()
p2 <- ggplot(data_age, aes(x = age, y = cum_production, color = group)) +
  geom_smooth(method = "loess", span = 0.5, se = FALSE, linewidth = 1.2) +
  labs(
    title = "Cumulative Production",
    x = "Years Since Founding",
    y = "Cumulative Production (Production Value)"
  ) +
  scale_color_manual(values = c("Key_Polluters" = "#E7304D", "Others" = "#FDB863")) +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 10)) +   # <- NEW X-AXIS LIMITS
  theme_warm_readable()

p3 <- ggplot(data_age, aes(x = age, y = intensity, color = group)) +
  geom_smooth(method = "loess", span = 0.5, se = FALSE, linewidth = 1.2) +
  labs(
    title = "Emission Intensity",
    subtitle = "Cumulative Emissions / Cumulative Production",
    x = "Years Since Founding",
    y = "Emission Intensity"
  ) +
  scale_color_manual(values = c("Key_Polluters" = "#E7304D", "Others" = "#FDB863")) +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 10)) +   # <- NEW X-AXIS LIMITS
  theme_warm_readable()

combined_plot <- (p1 + p2 + p3) +
  plot_annotation(
    title = "Comparing Growth Trajectories: Key Polluters vs. Others",
    subtitle = "Metrics Aligned by Years Since Each Company's Founding (1962–2022)"
  )

#Graph two, indicating the percentage of emissions, production and intensity values for the key polluters as compared to all companies.

p_ratio <- ggplot(diff_data, aes(x = age, y = ratio, color = metric)) +
  geom_line(linewidth = 1.2, alpha = 0.9) +
  geom_hline(yintercept = 1, linetype = "dashed") +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(0, 60), breaks = seq(0, 60, 10)) +   # <- NEW X-AXIS LIMITS
  labs(
    title = "Key Polluters' Share of the Total Over Time",
    x = "Years Since Founding",
    y = "Percentage",
    color = "Metric"
  ) +
  theme_warm_readable() +
  scale_color_manual(values = c(
    "Emissions Ratio (Key/Total)"  = "#E7304D",
    "Production Ratio (Key/Total)" = "#8073AC",
    "Intensity Ratio (Key/Total)"  = "#F2C545"
  ))

# Print final plots
combined_plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

p_ratio

Discussion for Question 2


Chart 1: Cumulative Emissions, Production, and Intensity

What the Chart Is:

This first set of three line graphs compares Key Polluters versus all other companies (labeled “Others”) based on how they grow from their respective founding years. Each panel uses a loess smoothing technique to illustrate average trends: (1) Cumulative Operational Emissions on the left, (2) Cumulative Production in the center, and (3) Emission Intensity (cumulative emissions divided by cumulative production) on the right. By aligning the x‐axis to “Years Since Founding” (0 to 60), we can see how these groups evolve over the same relative time span—helping us understand not just who is bigger but how they get bigger from the moment they appear in the data (1962–2022).
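To make the intensity definition concrete, here is a small worked example, assuming a single entity whose production is recorded in one unit. The data frame `toy` and its values are invented; the column names mirror those used in the analysis.

```r
library(dplyr)

# Hypothetical toy data for one entity; values are invented.
toy <- tibble(
  parent_entity = "A",
  year = 1962:1965,
  total_operational_emissions_MtCO2e = c(2, 3, 5, 10),
  production_value = c(100, 100, 200, 100)
)

toy_intensity <- toy |>
  arrange(parent_entity, year) |>
  group_by(parent_entity) |>
  mutate(
    # Years since the entity first appears in the data.
    age = year - min(year),
    cum_emissions = cumsum(total_operational_emissions_MtCO2e),
    cum_production = cumsum(production_value),
    # Running emissions divided by running production.
    intensity = cum_emissions / cum_production
  ) |>
  ungroup()

toy_intensity
```

Because both the numerator and the denominator are running totals, one high-emission year moves the intensity only gradually: in the final year the annual ratio is 10 / 100 = 0.10, yet the cumulative intensity reaches only 20 / 500 = 0.04.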

Interpreting Current Results:

Looking at the plots, Key Polluters (in red) show a substantially steeper rise in both cumulative emissions and production than Others (in orange). Even within the 60‐year window, the red lines outpace the orange, suggesting these major entities rapidly scale up operations and emissions. The rightmost panel reveals that Key Polluters also maintain a generally higher emission intensity, meaning they emit more CO₂ for each unit of production relative to the average of all other companies. Although the lines vary slightly as we move along the x‐axis, Key Polluters remain consistently above Others, reinforcing their status as heavy emitters and producers.

What the Results Mean:

These three panels together indicate that Key Polluters tend to reach high output and high emissions relatively quickly in their corporate life cycles. The fact that their emission intensity (i.e., emissions per unit of production) also remains above the norm underscores the carbon‐heavy nature of their operations. In practical terms, any policy or technology targeting emission reductions will likely have a significant impact if it’s applied to these larger, higher‐intensity entities, since they are both big producers and relatively less efficient in terms of carbon per unit output.

Graph 2: The Percentage of Emission, Production and Intensity Ratios of Key Polluters to All Companies

What Is the Graph?

Here we have a line graph showing the key polluters' share of emissions, production, and intensity relative to all companies. The x-axis is the time since the companies were founded, while the y-axis indicates the percentage of the total. The graph has three lines, one for each variable: Emissions Ratio (Key/Total), Production Ratio (Key/Total), and Intensity Ratio (Key/Total). The emissions and production ratios range between 80% and 90%, while the intensity ratio ranges from 10% to 60% over a span of 60 years.
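The ratios in this graph are computed as the key polluters' group mean divided by the sum of the two group means. The short sketch below shows how to read such a value; `key_share()` is a hypothetical helper written for this illustration, and the input numbers are invented.

```r
# Hypothetical helper: share of the key group's mean in the sum of both group means.
key_share <- function(key_mean, others_mean) {
  key_mean / (key_mean + others_mean)
}

# Key mean nine times the Others mean.
share_dominant <- key_share(900, 100)

# Equal group means.
share_parity <- key_share(100, 100)

# Intensity-style inputs on a much smaller scale.
share_intensity <- key_share(0.06, 0.04)

c(share_dominant, share_parity, share_intensity)
```

Under this definition, 50% means the two group means are equal, and 100% is reached only when the Others mean is zero. A share between 80% and 90% therefore corresponds to a key polluters' mean roughly four to nine times the Others mean.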

Interpretation

Both the emissions and production ratios remain almost constant, with the key polluters consistently accounting for far more than all the other companies. There is a slight decrease in both in the 1990s, followed by a small but steady increase. The intensity ratio, however, increases sharply, rising by about 50 percentage points over the span of 60 years.

What this may mean: Hypothesis

The decrease in the 1990s can be hypothesized to be due to the recession that happened around that time. The intensity graph indicates that emissions increased relative to production over the sixty-year period. Given that the key polluters contribute 80% to 90% of the total mean emissions and production, this increase indicates that either the goods being produced release more CO2 or CH4 per unit of production, or measures to reduce emissions are being scaled back. Both explanations have disastrous implications, and they indicate where we would need to concentrate our efforts in order to reduce total operational emissions.

Presentation

Our presentation can be found here.

Data

Include a citation for your data here. See http://libraryguides.vu.edu.au/c.php?g=386501&p=4347840 for guidance on proper citation for datasets. If you got your data off the web, make sure to note the retrieval date.

References

List any references here. You should, at a minimum, list your data source.