Project proposal

Author

gold-dingo

library(tidyverse)

Dataset

A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.

Make sure to load the data and use inline code for some of this information.

#load data

ufo_sightings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/ufo_sightings.csv')
Rows: 96429 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): city, state, country_code, shape, reported_duration, summary, day_...
dbl  (1): duration_seconds
lgl  (1): has_images
dttm (2): reported_date_time, reported_date_time_utc
date (1): posted_date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
places <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/places.csv')
Rows: 14417 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): city, alternate_city_names, state, country, country_code, timezone
dbl (4): latitude, longitude, population, elevation_m

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
day_parts_map <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/day_parts_map.csv')
Rows: 26409 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (2): rounded_lat, rounded_long
date (1): rounded_date
time (9): astronomical_twilight_begin, nautical_twilight_begin, civil_twilig...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

This dataset combines UFO sighting reports from the National UFO Reporting Center with sunrise and sunset data from https://sunrise-sunset.org/. The combination lets us determine the lighting conditions (time of day) during each recorded sighting. The dataset includes three data frames: day_parts_map (daytime conditions, 26409 observations of 12 variables), places (locations of sightings, 14417 observations of 10 variables), and ufo_sightings (96429 observations of 12 variables). We chose this dataset because we are interested in UFOs.
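The dimensions quoted above could be computed from the loaded data frames instead of typed by hand, so they stay correct if the source files change. The sketch below is one way to do that. The helper name describe_dims and the toy data frame are hypothetical and only stand in for the real ufo_sightings; in the source document the same call could sit in an inline R expression.

```r
library(dplyr)

# Hypothetical helper (not part of the original proposal): format a data
# frame's dimensions as text that can be dropped into a sentence.
describe_dims <- function(df) {
  paste0(
    format(nrow(df), big.mark = ","), " observations of ",
    ncol(df), " variables"
  )
}

# Toy stand-in for ufo_sightings; the values are made up for illustration.
toy_sightings <- tibble(
  city  = c("Ithaca", "Austin", "Reno"),
  shape = c("light", "disk", "orb")
)

dims_text <- describe_dims(toy_sightings)
dims_text
```

Calling describe_dims(ufo_sightings) on the real data would produce the row and column counts reported by read_csv above.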

Questions

The two questions you want to answer.

Question 1: How does the frequency of UFO sightings change over time, and does it depend on location and time of day? What are the short-term and long-term trends in sighting frequency, and do they differ by location? y = date, x = location, z = time of day
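As a rough sketch of how question 1 might be approached, sightings could be counted per year, location, and time of day. The toy data below is made up. The column name time_of_day is a placeholder for the day-part column whose name is truncated in the column listing above, and using state as the location is an assumption.

```r
library(dplyr)

# Toy stand-in for ufo_sightings; all values are invented for illustration.
# time_of_day is a placeholder name, not necessarily the real column name.
toy_sightings <- tibble(
  reported_date_time_utc = as.POSIXct(
    c("2019-07-04 03:10:00", "2019-11-02 21:45:00",
      "2020-01-15 02:30:00", "2020-06-20 20:05:00"),
    tz = "UTC"
  ),
  state       = c("NY", "NY", "TX", "NY"),
  time_of_day = c("night", "night", "night", "civil dusk")
)

# Extract the year, then count sightings per year, state, and time of day.
sightings_by_year <- toy_sightings %>%
  mutate(year = as.integer(format(reported_date_time_utc, "%Y"))) %>%
  count(year, state, time_of_day, name = "n_sightings")

sightings_by_year
```

Grouping by month or week instead of year would give the short-term view mentioned in the question.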

Question 2: How does the reported shape of a UFO depend on location? Is there a relationship between the shape of a UFO and where it is sighted? x = location, a = shape
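For question 2, one possible starting point is the share of each shape within each location, which makes locations with different numbers of sightings comparable. This is a sketch on invented toy data, and using state as the location is an assumption.

```r
library(dplyr)

# Toy stand-in for ufo_sightings; all values are invented for illustration.
toy_sightings <- tibble(
  state = c("NY", "NY", "NY", "TX", "TX"),
  shape = c("light", "light", "disk", "triangle", "light")
)

# Count each shape per state, then convert counts to within-state proportions.
shape_by_state <- toy_sightings %>%
  count(state, shape, name = "n") %>%
  group_by(state) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

shape_by_state
```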

Analysis plan

A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

For question 1, we need to merge two of the data frames. The date and time-of-day variables are both in ufo_sightings, but to get a better idea of “location,” we need latitude and longitude coordinates in order to map the data. Since latitude and longitude are in the places data frame rather than in ufo_sightings, we need to merge the two. We can do this with a left join that matches the “city” variable in ufo_sightings to the “city” variable in places.
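The join described above can be sketched on toy data. One caveat is that city names repeat across states, so matching on city alone can attach the wrong coordinates or duplicate rows. Both data frames also have state and country_code columns, so joining on all three is a reasonable safeguard. That is a suggestion, not the plan stated in the proposal. The rows and coordinates below are illustrative only.

```r
library(dplyr)

# Toy stand-ins for ufo_sightings and places; values are illustrative only.
toy_sightings <- tibble(
  city         = c("Springfield", "Ithaca"),
  state        = c("IL", "NY"),
  country_code = c("US", "US"),
  shape        = c("disk", "light")
)

toy_places <- tibble(
  city         = c("Springfield", "Springfield", "Ithaca"),
  state        = c("IL", "MO", "NY"),
  country_code = c("US", "US", "US"),
  latitude     = c(39.80, 37.21, 42.44),
  longitude    = c(-89.64, -93.29, -76.50)
)

# Joining on city, state, and country_code keeps one row per sighting.
located <- left_join(
  toy_sightings, toy_places,
  by = c("city", "state", "country_code")
)

# Joining on city alone matches the Illinois sighting to both Springfields.
by_city_only <- left_join(toy_sightings, toy_places, by = "city")

nrow(located)
nrow(by_city_only)
```

Comparing the row count before and after the join is a quick check that no sightings were duplicated.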

We will do the same for question 2, since the shape variable is in the ufo_sightings data frame, and we are still using a location variable. We still need the latitude and longitude of the cities from ufo_sightings.

For both questions, we are working with a location variable x that we must create from the existing variables “city,” “latitude,” and “longitude.”