library(tidyverse)
library(skimr)
Project title
Proposal
Data 1
Introduction and data
Identify the source of the data.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data were collected in October 2018 across the 350 countable hectares of NYC’s Central Park. The collection was a partnership among The Explorers Club, NYU’s Department of Environmental Studies, Macaulay Honors College, the Central Park Conservancy, and the NYC Department of Parks and Recreation, who recorded sightings on 3,000 pages of tally sheets.
Write a brief description of the observations.
There are approximately 3,000 observations (3,023 rows), each providing information on a squirrel’s fur color, age, actions, and location. Far more adult squirrels than juveniles appear to have been sighted.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How does the time of day (the AM/PM shift) influence the color and location of the squirrel spotted?
How does squirrel color differ throughout the 350 countable hectares of NYC’s Central Park?
A description of the research topic along with a concise statement of your hypotheses on this topic.
The research questions will help us understand the population and behavior of the Eastern Gray squirrel, which is often overlooked in academic studies because it is so common. My hypothesis is that squirrels are more active in the AM shift and that squirrels of different colors inhabit different parts of the park. Because the data record three fur colors and several highlight colors across 350 hectares, I expect fur color to vary with location and habitat.
Identify the types of variables in your research question. Categorical? Quantitative?
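Squirrel color and shift (AM/PM) are categorical variables. Location across the 350 countable hectares can be taken from the Hectare variable, a categorical grid ID such as "37F". The Hectare Squirrel Number variable is quantitative, but it numbers the squirrels sighted within a hectare rather than identifying the hectare itself.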
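Since both Shift and Primary Fur Color are categorical, one way to start on the first research question is a table of counts and within-shift proportions. The following is a minimal sketch, not the final analysis: `toy_squirrel` is a made-up stand-in for the real data, and only the two column names are taken from the data set.

```r
library(dplyr)

# Made-up stand-in for the squirrel data; only the column names are real.
toy_squirrel <- tibble(
  Shift = c("AM", "AM", "AM", "PM", "PM", "PM", "PM", "PM"),
  `Primary Fur Color` = c("Gray", "Gray", "Cinnamon", "Gray", "Gray",
                          "Black", "Cinnamon", NA)
)

color_by_shift <- toy_squirrel %>%
  filter(!is.na(`Primary Fur Color`)) %>%  # drop sightings with no recorded color
  count(Shift, `Primary Fur Color`) %>%    # one row per shift/color pair
  group_by(Shift) %>%
  mutate(prop = n / sum(n)) %>%            # share of each color within a shift
  ungroup()

color_by_shift
```

Comparing `prop` across shifts, rather than raw counts, keeps the comparison fair if one shift simply has more sightings than the other.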
Glimpse of data
# add code here
library(tidyverse)
squirrel <- read_csv(file = "squirrel.csv")
Rows: 3023 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): Unique Squirrel ID, Hectare, Shift, Age, Primary Fur Color, Highli...
dbl (4): X, Y, Date, Hectare Squirrel Number
lgl (13): Running, Chasing, Climbing, Eating, Foraging, Kuks, Quaas, Moans, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(squirrel)
Rows: 3,023
Columns: 31
$ X <dbl> -73.95613, -73.96886, -73…
$ Y <dbl> 40.79408, 40.78378, 40.77…
$ `Unique Squirrel ID` <chr> "37F-PM-1014-03", "21B-AM…
$ Hectare <chr> "37F", "21B", "11B", "32E…
$ Shift <chr> "PM", "AM", "PM", "PM", "…
$ Date <dbl> 10142018, 10192018, 10142…
$ `Hectare Squirrel Number` <dbl> 3, 4, 8, 14, 5, 3, 2, 2, …
$ Age <chr> NA, NA, NA, "Adult", "Adu…
$ `Primary Fur Color` <chr> NA, NA, "Gray", "Gray", "…
$ `Highlight Fur Color` <chr> NA, NA, NA, NA, "Cinnamon…
$ `Combination of Primary and Highlight Color` <chr> "+", "+", "Gray+", "Gray+…
$ `Color notes` <chr> NA, NA, NA, "Nothing sele…
$ Location <chr> NA, NA, "Above Ground", N…
$ `Above Ground Sighter Measurement` <chr> NA, NA, "10", NA, NA, NA,…
$ `Specific Location` <chr> NA, NA, NA, NA, "on tree …
$ Running <lgl> FALSE, FALSE, FALSE, FALS…
$ Chasing <lgl> FALSE, FALSE, TRUE, FALSE…
$ Climbing <lgl> FALSE, FALSE, FALSE, FALS…
$ Eating <lgl> FALSE, FALSE, FALSE, TRUE…
$ Foraging <lgl> FALSE, FALSE, FALSE, TRUE…
$ `Other Activities` <chr> NA, NA, NA, NA, NA, NA, N…
$ Kuks <lgl> FALSE, FALSE, FALSE, FALS…
$ Quaas <lgl> FALSE, FALSE, FALSE, FALS…
$ Moans <lgl> FALSE, FALSE, FALSE, FALS…
$ `Tail flags` <lgl> FALSE, FALSE, FALSE, FALS…
$ `Tail twitches` <lgl> FALSE, FALSE, FALSE, FALS…
$ Approaches <lgl> FALSE, FALSE, FALSE, FALS…
$ Indifferent <lgl> FALSE, FALSE, FALSE, FALS…
$ `Runs from` <lgl> FALSE, FALSE, FALSE, TRUE…
$ `Other Interactions` <chr> NA, NA, NA, NA, NA, NA, N…
$ `Lat/Long` <chr> "POINT (-73.9561344937861…
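The glimpse shows Date read as a number (for example 10142018), which appears to be month, day, and year run together. A minimal sketch of converting such values to real dates follows; it assumes every value uses the month-day-year order seen in the first rows, and `raw_dates` is just a copy of the first three values shown above.

```r
# First three Date values from the glimpse, stored as numbers.
raw_dates <- c(10142018, 10192018, 10142018)

# Pad to eight digits so a single-digit month would still parse,
# then read the digits as month, day, year.
parsed <- as.Date(sprintf("%08d", as.integer(raw_dates)), format = "%m%d%Y")

format(parsed)
```

With the column stored as a Date, sightings can be grouped or ordered by day instead of by an eight-digit number.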
Data 2
Introduction and data
Identify the source of the data.
The data come from the FBI’s Crime Data Explorer: https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
It was collected by the FBI’s Uniform Crime Reporting Program. The data we found was collected in 2020 and 2021 and includes 44,924 observations for criminal acts and 127,076 observations for offenses.
Write a brief description of the observations.
The downloaded 2020 and 2021 crime reports consist of multiple separate CSV files that include a variety of variables such as criminal_act_id, criminal_act_name, incident_id, location_id, offense_name, offense_code, and offense_category_name.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How is the frequency of a criminal act affected by the type of offense and the location?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
- The question will help us understand whether certain types of crime are more prevalent than others and whether the locations where they take place differ. My hypothesis is that burglary and theft take place more often in public areas, whereas pornography, rape, child abuse, and murder take place predominantly in someone’s home or another private area.
- Identify the types of variables in your research question. Categorical? Quantitative?
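- My research question would involve mostly categorical variables, including criminal act type, offense type, and location_id (a numeric code rather than a measurement). The frequency of each combination is a quantitative count derived from them.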
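Because the criminal acts, offenses, and offense types live in separate files, counting criminal acts by offense type and location requires joining them first. The sketch below shows one way this might look. The `toy_*` tibbles and their values are made up; only the column names (offense_id, offense_code, location_id, criminal_act_id, offense_name) follow the files described above.

```r
library(dplyr)

# Made-up rows that mirror the column names in the NIBRS files.
toy_off <- tibble(
  offense_id   = c(1, 2, 3, 4),
  offense_code = c("35A", "35A", "13B", "520"),
  location_id  = c(35, 35, 17, 25)
)
toy_ca <- tibble(
  offense_id      = c(1, 2, 4, 4),
  criminal_act_id = c(6, 6, 6, 9)
)
toy_off_type <- tibble(
  offense_code = c("35A", "13B", "520"),
  offense_name = c("Drug/Narcotic Violations", "Simple Assault",
                   "Weapon Law Violations")
)

act_counts <- toy_ca %>%
  inner_join(toy_off, by = "offense_id") %>%        # attach offense code and location
  left_join(toy_off_type, by = "offense_code") %>%  # attach a readable offense name
  count(offense_name, location_id, criminal_act_id, name = "n_acts")

act_counts
```

Starting from the criminal act table means offenses with no recorded criminal act (offense 3 in the toy data) drop out of the counts, which is worth keeping in mind when interpreting frequencies.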
Glimpse of data
# add code here
library(tidyverse)
ca_type <- read_csv("data/NIBRS_CRIMINAL_ACT_TYPE.csv")
Rows: 15 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): criminal_act_code, criminal_act_name, criminal_act_desc
dbl (1): criminal_act_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(ca_type)
Rows: 15
Columns: 4
$ criminal_act_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
$ criminal_act_code <chr> "B", "C", "D", "E", "O", "P", "T", "U", "N", "G", "J…
$ criminal_act_name <chr> "Buying/Receiving", "Cultivating/Manufacturing/Publi…
$ criminal_act_desc <chr> "Buying/Receiving", "Cultivating/Manufacturing/Publi…
ca <- read_csv("data/NIBRS_CRIMINAL_ACT.csv")
Rows: 44924 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): data_year, criminal_act_id, offense_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(ca)
Rows: 44,924
Columns: 3
$ data_year <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, …
$ criminal_act_id <dbl> 6, 9, 9, 9, 9, 9, 9, 6, 9, 8, 9, 9, 9, 9, 6, 9, 6, 9, …
$ offense_id <dbl> 172858527, 172859382, 172858531, 172858532, 172858879,…
off_type <- read_csv("data/NIBRS_OFFENSE_TYPE.csv")
Rows: 86 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): offense_code, offense_name, crime_against, hc_code, offense_categor...
lgl (2): ct_flag, hc_flag
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(off_type)
Rows: 86
Columns: 8
$ offense_code <chr> "09A", "09B", "09C", "11A", "11B", "11C", "11D",…
$ offense_name <chr> "Murder and Nonnegligent Manslaughter", "Neglige…
$ crime_against <chr> "Person", "Person", "Person", "Person", "Person"…
$ ct_flag <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
$ hc_flag <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,…
$ hc_code <chr> "01", NA, NA, "02", "02", "02", "02", "03", "04"…
$ offense_category_name <chr> "Homicide Offenses", "Homicide Offenses", "Homic…
$ offense_group <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
off <- read_csv("data/NIBRS_OFFENSE.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 127076 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): offense_code, attempt_complete_flag, method_entry_code
dbl (4): data_year, offense_id, incident_id, location_id
lgl (1): num_premises_entered
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(off)
Rows: 127,076
Columns: 8
$ data_year <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, …
$ offense_id <dbl> 172858525, 172858699, 172858526, 172858527, 1728…
$ incident_id <dbl> 143919821, 143919976, 143919822, 143919823, 1439…
$ offense_code <chr> "290", "23H", "23C", "35A", "26F", "13B", "23H",…
$ attempt_complete_flag <chr> "C", "C", "C", "C", "C", "C", "A", "C", "C", "C"…
$ location_id <dbl> 35, 17, 17, 25, 35, 35, 7, 24, 35, 35, 26, 33, 3…
$ num_premises_entered <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ method_entry_code <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
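The parsing warning for NIBRS_OFFENSE.csv, together with num_premises_entered being read as an all-NA logical column, is consistent with the column type having been guessed from early rows where the field is empty. That is an assumption; calling `problems(off)` would confirm it. If so, declaring the type explicitly should avoid the issue. A minimal sketch, using made-up literal CSV text in place of the real file and assuming the column holds whole-number counts:

```r
library(readr)

# Made-up CSV text standing in for NIBRS_OFFENSE.csv.
csv_text <- "offense_id,num_premises_entered\n1,\n2,\n3,2\n"

off_demo <- read_csv(
  I(csv_text),
  col_types = cols(
    offense_id = col_double(),
    num_premises_entered = col_integer()  # assumed to be a count
  )
)

off_demo
```

For the real file the same `col_types` argument would be passed alongside the file path, with the remaining columns left to be guessed.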
Data 3
Introduction and data
Identify the source of the data.
- The source of the data is the Food and Agriculture Organization of the United Nations (FAO): https://www.fao.org/platform-food-loss-waste/flw-data/en
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data was compiled in November 2021 from over 700 publications and reports (including academic studies and official reports from organizations such as the World Bank, GIZ, FAO, and IFPRI) and contains over 29 thousand data points.
Write a brief description of the observations.
- There are approximately 28,000 observations with variables such as country, region, commodity, year, loss percentage of a certain commodity, quantity of loss, and cause of loss.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
What is the scale of food loss and waste across the food supply chain, and how has this changed over time and across regions?
How does food waste affect a country’s food insecurity and nutrition and how has this changed over time and across nations?
The second question draws on an additional data set, originally collected by the Food and Agriculture Organization of the United Nations (FAO).
This additional data set provides information on country, region, year, food insecurity values, and nutrition values.
A description of the research topic along with a concise statement of your hypotheses on this topic.
The purpose of this research question is to understand the extent of food loss and waste that occurs throughout the food supply chain, from production to consumption, and to identify how this varies across different regions and time periods. By answering this question, researchers can develop a better understanding of where food waste occurs most frequently and identify potential areas for improvement in the food supply chain. Our hypothesis is that food loss has increased over time.
The purpose of this research question is to explore how food waste impacts a country’s food security and nutrition. By answering this question, researchers can identify the potential negative consequences of food waste, such as exacerbated hunger and malnutrition, and can develop strategies to reduce food waste and improve food security. Our hypothesis is that there is more food waste in countries that are more food secure and generally have better nutrition.
Identify the types of variables in your research question. Categorical? Quantitative?
The first research question would require quantitative data related to food waste and categorical data related to year and country.
The second research question requires quantitative data related to food waste, categorical data related to year and country, and categorical and quantitative data related to malnutrition and food security.
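As a first look at the hypothesis that food loss has increased over time, loss_percentage could be averaged by year and checked for a trend. The sketch below is a rough illustration under stated assumptions: `toy_food_waste` is a made-up stand-in (the Myanmar values match the data glimpse, the Burundi values are invented), and a plain yearly mean ignores differences in commodity and supply-chain stage.

```r
library(dplyr)

# Stand-in data: Myanmar values follow the glimpse, Burundi values are made up.
toy_food_waste <- tibble(
  country = c("Myanmar", "Myanmar", "Myanmar", "Burundi", "Burundi", "Burundi"),
  year = c(2007, 2008, 2009, 2007, 2008, 2009),
  loss_percentage = c(5.61, 5.43, 5.22, 3.0, 3.5, 4.0)
)

yearly_loss <- toy_food_waste %>%
  group_by(year) %>%
  summarise(
    mean_loss = mean(loss_percentage),  # average loss percentage in that year
    n_obs = n(),                        # how many observations back the average
    .groups = "drop"
  )

# Slope of a straight-line fit: change in mean loss percentage per year.
trend <- lm(mean_loss ~ year, data = yearly_loss)
slope <- coef(trend)[["year"]]

yearly_loss
slope
```

A positive slope would be in line with the hypothesis, though with the real data it would make sense to group by region or commodity as well before drawing conclusions.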
Glimpse of data
# add code here
library(tidyverse)
food_waste <- read.csv("data/food_waste.csv")
glimpse(food_waste)
Rows: 27,773
Columns: 18
$ m49_code <int> 104, 104, 104, 104, 104, 104, 104, 108, 108, …
$ country <chr> "Myanmar", "Myanmar", "Myanmar", "Myanmar", "…
$ region <chr> "", "", "", "", "", "", "", "", "", "", "", "…
$ cpc_code <chr> "0142", "0142", "0142", "0142", "0142", "0142…
$ commodity <chr> "Groundnuts, excluding shelled", "Groundnuts,…
$ year <int> 2009, 2008, 2007, 2006, 2005, 2004, 2003, 202…
$ loss_percentage <dbl> 5.22, 5.43, 5.61, 5.40, 5.00, 5.00, 5.00, 3.5…
$ loss_percentage_original <chr> "5.22%", "5.43%", "5.61%", "5.4%", "5%", "5%"…
$ loss_quantity <chr> "68100", "65240", "61080", "55270", "51970", …
$ activity <chr> "", "", "", "", "", "", "", "Shelling, Thresh…
$ food_supply_stage <chr> "Whole supply chain", "Whole supply chain", "…
$ treatment <chr> "", "", "", "", "", "", "", "", "", "", "", "…
$ cause_of_loss <chr> "", "", "", "", "", "", "", "", "", "", "", "…
$ sample_size <chr> "", "", "", "", "", "", "", "", "", "", "", "…
$ method_data_collection <chr> "FAO's annual Agriculture Production Question…
$ reference <chr> "FAO Sources", "FAO Sources", "FAO Sources", …
$ url <chr> "", "", "", "", "", "", "", "https://www.aphl…
$ notes <chr> "", "", "", "", "", "", "", "", "", "", "", "…
nutrition <- read.csv("data/nutrition.csv")
glimpse(nutrition)
Rows: 23,422
Columns: 15
$ Domain.Code <chr> "FS", "FS", "FS", "FS", "FS", "FS", "FS", "FS", "FS",…
$ Domain <chr> "Suite of Food Security Indicators", "Suite of Food S…
$ Area.Code..M49. <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
$ Area <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghani…
$ Element.Code <int> 6121, 6121, 6121, 6121, 6121, 6121, 6121, 6121, 6121,…
$ Element <chr> "Value", "Value", "Value", "Value", "Value", "Value",…
$ Item.Code <chr> "210041", "210041", "210041", "210041", "210041", "21…
$ Item <chr> "Prevalence of undernourishment (percent) (3-year ave…
$ Year.Code <int> 20072009, 20082010, 20092011, 20102012, 20112013, 201…
$ Year <chr> "2007-2009", "2008-2010", "2009-2011", "2010-2012", "…
$ Unit <chr> "%", "%", "%", "%", "%", "%", "%", "%", "%", "%", "%"…
$ Value <chr> "26.5", "23.3", "21.2", "20.2", "21.1", "20.7", "20.7…
$ Flag <chr> "E", "E", "E", "E", "E", "E", "E", "E", "E", "E", "E"…
$ Flag.Description <chr> "Estimated value", "Estimated value", "Estimated valu…
$ Note <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "…
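The two glimpses suggest some cleaning is needed before the data sets can be combined: in nutrition, Value is stored as text and Year is a three-year range such as "2007-2009", while food_waste uses a single integer year and an m49_code. One possible approach, sketched below, is to convert Value to a number, use the middle year of each range, and join on country code and year. The toy rows are partly made up (the Myanmar values, including the non-numeric "<2.5" entry, are hypothetical), and using the middle year is a simplifying assumption.

```r
library(dplyr)

# Afghanistan rows follow the glimpse; Myanmar rows are hypothetical.
toy_nutrition <- tibble(
  Area.Code..M49. = c(4L, 4L, 104L, 104L),
  Year  = c("2007-2009", "2008-2010", "2007-2009", "2008-2010"),
  Value = c("26.5", "23.3", "15.0", "<2.5")
)
toy_food_waste <- tibble(
  m49_code = c(104L, 104L),
  year = c(2008L, 2009L),
  loss_percentage = c(5.43, 5.22)
)

nutrition_clean <- toy_nutrition %>%
  transmute(
    m49_code = Area.Code..M49.,
    start_year = as.integer(substr(Year, 1, 4)),
    # Middle year of a three-year range; single years are kept as they are.
    year = if_else(grepl("-", Year), start_year + 1L, start_year),
    # Non-numeric entries become NA instead of stopping the conversion.
    value = suppressWarnings(as.numeric(Value))
  ) %>%
  select(-start_year)

joined <- toy_food_waste %>%
  left_join(nutrition_clean, by = c("m49_code", "year"))

joined
```

The real nutrition file contains several indicators in the Item column, so it would need to be filtered to a single Item before joining; otherwise each food waste row would match multiple nutrition rows.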