library(tidyverse)
Project proposal
Dataset
A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.
Make sure to load the data and use inline code for some of this information.
#load data
ufo_sightings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/ufo_sightings.csv')
Rows: 96429 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): city, state, country_code, shape, reported_duration, summary, day_...
dbl (1): duration_seconds
lgl (1): has_images
dttm (2): reported_date_time, reported_date_time_utc
date (1): posted_date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
places <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/places.csv')
Rows: 14417 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): city, alternate_city_names, state, country, country_code, timezone
dbl (4): latitude, longitude, population, elevation_m
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
day_parts_map <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/day_parts_map.csv')
Rows: 26409 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): rounded_lat, rounded_long
date (1): rounded_date
time (9): astronomical_twilight_begin, nautical_twilight_begin, civil_twilig...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This dataset combines UFO sighting reports from the National UFO Reporting Center with sunrise and sunset data from https://sunrise-sunset.org/. The combination lets us determine the lighting conditions (the part of the day) at the time of each recorded sighting. The dataset includes three data frames: day_parts_map (the daytime conditions, 26409 observations of 12 variables), places (the locations of sightings, 14417 observations of 10 variables), and ufo_sightings (the sightings themselves, 96429 observations of 12 variables). We chose this dataset because we are interested in UFOs.
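The dimensions quoted above could be computed from the loaded data frames rather than typed by hand. The following is a minimal sketch, assuming a small made-up tibble (`ufo_toy`) in place of the real data and a hypothetical helper named `describe_dims`; neither is part of the original analysis.

```r
library(tibble)

# Toy stand-in for a loaded data frame (values are made up for illustration).
ufo_toy <- tibble(
  city = c("Ithaca", "Austin", "Reno"),
  shape = c("light", "disk", "triangle"),
  duration_seconds = c(30, 120, 600)
)

# Hypothetical helper: format a data frame's dimensions as a short phrase.
describe_dims <- function(df) {
  paste0(nrow(df), " observations of ", ncol(df), " variables")
}

dims_text <- describe_dims(ufo_toy)
print(dims_text)
```

In the R Markdown or Quarto source, the same idea can be written as an inline expression such as `r nrow(ufo_sightings)`, so the text stays in sync with the data if the files are updated.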
Questions
The two questions you want to answer.
Question 1: How does the frequency of UFO sightings change over time, by location and time of day? What are the short-term and long-term frequencies of UFO sightings, and do they differ by location? Variables: y = date, x = location, z = time of day.
Question 2: Does the reported shape of a UFO depend on location? Is there a relationship between the shape of a UFO and where it is sighted? Variables: x = location, a = shape.
Analysis plan
A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).
For question 1, we need to merge two of the data frames. The date and time-of-day variables are both in ufo_sightings, but to map the data we need a more precise “location”: latitude and longitude coordinates. Since latitude and longitude are in the places data frame rather than in ufo_sightings, we need to merge the two. We can do this with a left join that matches the “city” variable in ufo_sightings to the “city” variable in places.
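The join described above can be sketched as follows. This is an illustration on made-up toy tibbles (`sightings_toy`, `places_toy`) with approximate coordinates, not the real data. It also makes one assumption beyond the plan as written: because both data frames list state and country_code alongside city, the sketch matches on all three columns so that same-named cities in different states are not paired with each other.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for ufo_sightings and places (illustrative values only).
sightings_toy <- tibble(
  city = c("Springfield", "Springfield", "Reno"),
  state = c("IL", "MA", "NV"),
  country_code = c("US", "US", "US"),
  shape = c("light", "disk", "triangle")
)

places_toy <- tibble(
  city = c("Springfield", "Springfield", "Reno"),
  state = c("IL", "MA", "NV"),
  country_code = c("US", "US", "US"),
  latitude = c(39.80, 42.10, 39.53),
  longitude = c(-89.64, -72.59, -119.81)
)

# Left join: keep every sighting and attach coordinates where a place matches.
# Matching on state and country_code as well as city is an assumption made here.
sightings_located <- sightings_toy %>%
  left_join(places_toy, by = c("city", "state", "country_code"))

print(sightings_located)
```

With this toy data, joining on city alone would match each Springfield sighting to both Springfield places and duplicate rows, which is why the extra keys are used in the sketch.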
We will do the same for question 2: the shape variable is in the ufo_sightings data frame, and we again need a location variable, so we still need the latitude and longitude of the cities in ufo_sightings.
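One possible way to look at shape by location is to count sightings per shape within each location and convert the counts to shares. The sketch below uses a made-up tibble and, as an assumption, uses state as the location grouping; the actual analysis may group by coordinates or another location variable instead.

```r
library(dplyr)
library(tibble)

# Toy sightings (made up for illustration); state stands in for "location".
sightings_toy <- tibble(
  state = c("NY", "NY", "NY", "TX", "TX"),
  shape = c("light", "light", "disk", "triangle", "triangle")
)

# Count sightings per state and shape, then compute each shape's share
# of all sightings within its state.
shape_by_state <- sightings_toy %>%
  count(state, shape, name = "n_sightings") %>%
  group_by(state) %>%
  mutate(share = n_sightings / sum(n_sightings)) %>%
  ungroup()

print(shape_by_state)
```

Comparing shares rather than raw counts keeps states with many sightings from dominating the comparison.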
For both questions, we must create the location variable x from the existing variables “city”, “latitude”, and “longitude.”
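As a rough illustration of how the location variable might be built and then used for question 1, the sketch below combines city with rounded coordinates and counts sightings per location, year, and time of day. The toy data, the definition of `location`, and the column name `time_of_day` are all assumptions made for this example, not the final design.

```r
library(dplyr)
library(tibble)

# Toy joined data (made up); time_of_day is a hypothetical column name.
sightings_toy <- tibble(
  city = c("Reno", "Reno", "Austin", "Austin"),
  latitude = c(39.53, 39.53, 30.27, 30.27),
  longitude = c(-119.81, -119.81, -97.74, -97.74),
  reported_date_time = as.POSIXct(
    c("2019-07-04 21:30:00", "2020-01-15 05:10:00",
      "2020-03-02 22:00:00", "2020-08-19 23:45:00"),
    tz = "UTC"
  ),
  time_of_day = c("night", "night", "night", "night")
)

sightings_x <- sightings_toy %>%
  mutate(
    # Assumed definition of x: city name plus coordinates rounded to 1 decimal.
    location = paste0(city, " (", round(latitude, 1), ", ", round(longitude, 1), ")"),
    # Year of the sighting, used for long-term frequency.
    year = as.integer(format(reported_date_time, "%Y"))
  )

# Number of sightings per location, year, and time of day.
yearly_counts <- sightings_x %>%
  count(location, year, time_of_day, name = "n_sightings")

print(yearly_counts)
```

Grouping by month or week instead of year would give the short-term view of the same counts.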