Project proposal

Author

Gold Echidna

library(tidyverse)

# loading data
owid_energy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-06/owid-energy.csv')

Dataset

A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you choose this dataset.

For our project, we chose the energy dataset from TidyTuesday (the 2023-06-06 release), which has energy records for every country from 1900 to today. The data comes from Our World in Data, an organization that researches global problems like poverty and inequality through data collection and visualization. Their mission is to provide “research and data to make progress against the world’s largest problems,” and they release this dataset every year as part of that effort.

The dataset is extensive. It contains 21,890 observations (rows) of 129 variables (columns). These range from economic measures like GDP to the production and consumption of every type of energy used over the past 120 years. The data covers over 300 countries and regions. Some have complete records, while others have gaps or stop after certain years (e.g., records for West Germany begin in the 1940s and end in the 1990s). Contemporary records are well kept, but we will have to account for missing data when looking far back.
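To gauge how serious the gaps are for a given variable, we could summarise its coverage per entity with a small helper like the one below. This is only a sketch. The function name variable_coverage is our own, and it relies only on the dataset's country and year columns.

```r
library(dplyr)

# For each entity, report the first and last year in which `var` is non-missing
# and how many years are populated. Entities with no data at all for `var`
# do not appear. A record has gaps when it has fewer populated years than
# its first-to-last span.
variable_coverage <- function(data, var) {
  data %>%
    filter(!is.na({{ var }})) %>%
    group_by(country) %>%
    summarise(
      first_year = min(year),
      last_year  = max(year),
      n_years    = n(),
      .groups = "drop"
    ) %>%
    mutate(has_gaps = n_years < last_year - first_year + 1)
}
```

For example, variable_coverage(owid_energy, coal_share_energy) would list each entity's coal-data window and flag records with holes in them.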

Another trait of the dataset is that its authors added a number of regional aggregates. These make it possible to analyze global and regional trends in energy consumption that reflect larger-scale patterns across many countries, rather than focusing on just a few. It also means that when we visualize the data, we need to filter out irrelevant entries once we have set a scope for the visualization. Finally, some entries are statistics for the entire world, which opens up opportunities to connect to other global statistics.
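One way to separate countries from aggregates is sketched below. It rests on an assumption we still need to verify: that the aggregate rows (continents, income groups, World) are the ones with a missing iso_code. The helper name split_countries is hypothetical.

```r
library(dplyr)

# Split the energy data into individual countries and aggregate entries.
# ASSUMPTION (to be verified): aggregates are exactly the rows whose
# iso_code is missing; countries carry an ISO code.
split_countries <- function(data) {
  list(
    countries  = filter(data, !is.na(iso_code)),
    aggregates = filter(data, is.na(iso_code))
  )
}
```

Running distinct(split_countries(owid_energy)$aggregates, country) would let us confirm that only aggregates end up on that side of the split before we rely on it.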

In summary, we chose this dataset because it is extensive and thorough, which lets us answer some very interesting questions. We are also interested in the historical impact of energy on climate change. With this dataset, we can compare countries’ current energy use with their historical use. We can analyze trends and see patterns in how the developed world is impacting the developing world and all of our futures. Finally, the yearly data could be connected to other relevant datasets on energy production and consumption.

Questions

Question 1: How does the energy mix vary by country for the top and bottom 15 economies (measured by GDP)?

Question 2: Are fossil fuel use and average temperature correlated?

Analysis plan

A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

For question 1, we plan to study the primary energy consumption, by energy source, of the 15 largest and 15 smallest economies as of 2022. We will break consumption down into the following five sources. Each has a corresponding variable in the dataset giving that source's share of primary energy consumption, in percent:

- Coal: coal_share_energy
- Natural gas: gas_share_energy
- Oil: oil_share_energy
- Renewables: renewables_share_energy
- Nuclear electric power: nuclear_share_energy

The time frame of this question will be 1950 to 2022.
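The five share variables sit in separate columns. A plot filled by energy source is easier to build from a long table with one row per country, year, and source. A possible reshape with tidyr is below. The helper name energy_mix_long and the short source labels are our own choices.

```r
library(dplyr)
library(tidyr)

# Short labels for the five share-of-primary-energy columns in the plan.
share_vars <- c(
  coal       = "coal_share_energy",
  gas        = "gas_share_energy",
  oil        = "oil_share_energy",
  renewables = "renewables_share_energy",
  nuclear    = "nuclear_share_energy"
)

# One row per country, year and energy source, restricted to `years`.
energy_mix_long <- function(data, years = 1950:2022) {
  data %>%
    filter(year %in% years) %>%
    select(country, year, all_of(unname(share_vars))) %>%
    pivot_longer(
      cols = all_of(unname(share_vars)),
      names_to = "source",
      values_to = "share_pct"
    ) %>%
    # Replace the original column names with the short labels above.
    mutate(source = names(share_vars)[match(source, share_vars)])
}
```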

In addition to these variables, we will use the gdp variable to select the 15 largest and 15 smallest economies in 2022, the most recent year of the study. Regional and world aggregates will be filtered out first so that only individual countries are ranked. This gives us two groups (top 15 and bottom 15), and we will make one plot for each. Data quality is consistently good for the top 15 economies but not for the bottom 15. We will therefore use only those bottom-15 economies for which we have complete data, and in the analysis we will describe which countries we left out and why.
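The selection step could look roughly like the sketch below. Both helper names are hypothetical, and the functions assume regional aggregates have already been removed. The reference year is a parameter because we have not yet confirmed that gdp is populated for 2022 in this release of the data. complete_countries treats a country as complete only if all five shares are present in every year from 1950 to 2022.

```r
library(dplyr)

# The five share-of-primary-energy columns listed in the plan.
share_cols <- c("coal_share_energy", "gas_share_energy", "oil_share_energy",
                "renewables_share_energy", "nuclear_share_energy")

# Names of the n largest and n smallest economies by gdp in ref_year.
# ASSUMPTION: `data` holds country rows only (aggregates already removed).
rank_economies <- function(data, ref_year = 2022, n = 15) {
  ranked <- data %>%
    filter(year == ref_year, !is.na(gdp)) %>%
    arrange(desc(gdp))
  list(top = head(ranked$country, n), bottom = tail(ranked$country, n))
}

# Split `countries` into those with every column in `vars` present in every
# year of `years`, and those missing at least one value.
# Assumes one row per country-year.
complete_countries <- function(data, countries, vars = share_cols,
                               years = 1950:2022) {
  complete <- data %>%
    filter(country %in% countries, year %in% years) %>%
    filter(if_all(all_of(vars), ~ !is.na(.x))) %>%
    count(country) %>%
    filter(n == length(years)) %>%
    pull(country)
  list(kept = intersect(countries, complete),
       dropped = setdiff(countries, complete))
}
```

Calling complete_countries(countries_only, rank_economies(countries_only)$bottom) would return the bottom-15 countries we keep and the ones we drop. Here countries_only stands for the dataset with aggregates removed. The dropped list is what we plan to report in the analysis.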

To visualize this data, we will create bar plots of the energy mix by source and by year, one plot for each group of countries. We do not foresee the need to create new variables or to add external data.
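A minimal ggplot2 sketch of one of these plots is below. It assumes the data has already been reshaped to one row per country, year, and source, with columns named country, year, source, and share_pct (these names are our own). Stacking the bars by source and faceting by country is one possible layout, not a settled design.

```r
library(ggplot2)

# One stacked bar per year, split by energy source, one panel per country.
# Expects long data with columns country, year, source and share_pct
# (share of primary energy consumption, in percent).
plot_energy_mix <- function(mix_long, title) {
  ggplot(mix_long, aes(x = year, y = share_pct, fill = source)) +
    geom_col(width = 1) +          # width = 1 removes gaps between years
    facet_wrap(~ country) +
    labs(
      title = title,
      x = "Year",
      y = "Share of primary energy consumption (%)",
      fill = "Energy source"
    )
}
```

Facets keep fifteen countries readable side by side. Placing all countries in one panel would crowd 73 years of bars together.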

For question 2, we want to understand the extent to which global warming is influenced by fossil fuel consumption. This question is highly relevant to our environment. There is a heated political debate in the United States over whether burning fossil fuels causes global warming and, if so, to what extent. We will explore this claim using data from credible sources.

We will start by merging in climate data from the National Centers for Environmental Information (NCEI), specifically the global average surface temperature for each year from 1950 to 2022. We will then create a new variable that estimates global fossil fuel use in each year. It will sum fossil_fuel_consumption over every country we have data on, excluding the regional and world aggregates so that no consumption is counted twice. Finally, we will plot two lines over 1950–2022, one for global average temperature and one for global fossil fuel consumption. This should make it easy to see whether the two are correlated. It should also show whether there is any “lag” between when fossil fuels are burned and when global temperatures rise (if at all).
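The pipeline for question 2 could be sketched as below. Several pieces are assumptions. The NCEI series is taken to be loaded already as a data frame temps with columns year and temp; these are hypothetical names, since we have not chosen the exact download. Aggregate rows are assumed to be those with a missing iso_code. Putting both series on a common z-score axis is one way, among others, to draw two lines with different units on one plot. The lagged correlation pairs fossil fuel use in year t with temperature in year t + k. A correlation between two trending series does not by itself show cause, so we would treat it as descriptive.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Yearly global fossil fuel consumption, summed over country rows only.
# ASSUMPTION: aggregate rows (World, continents, ...) have a missing iso_code;
# keeping them would double-count consumption.
# Countries with no value in a year contribute nothing (na.rm = TRUE), so
# early totals may be underestimates where coverage is thin.
global_fossil <- function(energy, years = 1950:2022) {
  energy %>%
    filter(!is.na(iso_code), year %in% years) %>%
    group_by(year) %>%
    summarise(
      fossil_total = sum(fossil_fuel_consumption, na.rm = TRUE),
      .groups = "drop"
    )
}

# Join with the temperature series.
# HYPOTHETICAL input: `temps` has one row per year, columns `year` and `temp`.
merge_temperature <- function(fossil, temps) {
  fossil %>%
    inner_join(temps, by = "year") %>%
    arrange(year)
}

# Correlation between fossil use in year t and temperature in year t + k,
# for k = 0..max_lag. Assumes one row per consecutive year; keep max_lag
# well below the number of years so enough pairs remain.
lagged_cor <- function(merged, max_lag = 10) {
  n <- nrow(merged)
  lags <- 0:max_lag
  r <- vapply(lags, function(k) {
    temp_ahead <- merged$temp[seq_len(n) + k]  # indices past the end give NA
    cor(merged$fossil_total, temp_ahead, use = "complete.obs")
  }, numeric(1))
  data.frame(lag = lags, r = r)
}

# Both series standardised to z-scores so the two lines can share one axis.
plot_fossil_vs_temp <- function(merged) {
  merged %>%
    mutate(
      `Fossil fuel consumption` = as.numeric(scale(fossil_total)),
      Temperature = as.numeric(scale(temp))
    ) %>%
    select(year, `Fossil fuel consumption`, Temperature) %>%
    pivot_longer(-year, names_to = "series", values_to = "z") %>%
    ggplot(aes(x = year, y = z, colour = series)) +
    geom_line() +
    labs(x = "Year", y = "Standardised value (z-score)", colour = NULL)
}
```

A second y-axis would be the other common choice for two units. We lean toward z-scores because a dual axis makes the apparent alignment of the lines depend on how each axis is scaled.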