Project proposal

Author

yellow-wallaby (Tammy Zhang tz332, Zhuoya Li zl928, Florence Li hl2472, Joey Kassenoff jdk333)

library(tidyverse)

Dataset

Our dataset is the 2018 Central Park Squirrel Census, produced by The Squirrel Census, a multimedia project in which hundreds of volunteers gather to record observations of Eastern gray squirrels (Sciurus carolinensis). The dataset has 31 columns and 3023 rows, with each row corresponding to one sighting; fields recorded include the sighting time and geographical coordinates, the squirrel’s age, the primary and highlight fur colors, and a range of behavior flags.

squirrel_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-23/squirrel_data.csv')
Rows: 3023 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): Unique Squirrel ID, Hectare, Shift, Age, Primary Fur Color, Highli...
dbl  (4): X, Y, Date, Hectare Squirrel Number
lgl (13): Running, Chasing, Climbing, Eating, Foraging, Kuks, Quaas, Moans, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Motivation

Part of the appeal of the Squirrel Census dataset is its novelty: an analysis of Central Park squirrels is an unusual way to apply geospatial visualization techniques to a fresh problem. Novelty isn’t all that the census offers, however. We believe the data could prove useful to third parties. For example, ecologists might use squirrel sightings over time to gauge the health of the species. Similarly, the City of New York could use insights from the data to assess how well wildlife coexists with human activity in Central Park. The breadth of variables on squirrel location and activity therefore makes this a fun and surprisingly useful exercise in data visualization.

Questions

  • How does the time of day (Shift) influence squirrel activities and interactions?

  • Is there a relationship between squirrel fur color and their location or activities?

Analysis plan

Analysis Plan for Question 1:

How does the time of day (Shift) influence squirrel activities and interactions?

Variables Involved: Primary Variables: Shift (AM/PM), Running, Chasing, Climbing, Eating, Foraging, Approaches, Indifferent, Runs from.

Secondary Variables: Date, Unique Squirrel ID (to ensure unique instances are analyzed).

Variables to be Created: Activity Index: A composite score could be created to quantify activity levels by assigning points to each observed activity (e.g., Running = 1 point, Eating = 1 point) and summing them for each sighting. This would help in comparing overall activity levels between AM and PM shifts.
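The Activity Index described above could be sketched as follows. This is a minimal illustration, assuming each activity flag is worth one point; the toy tibble `sightings` and the new column name `activity_index` are hypothetical, while the flag names follow the dataset's columns.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (made-up values, real column names)
sightings <- tibble(
  Shift    = c("AM", "AM", "PM", "PM"),
  Running  = c(TRUE, FALSE, TRUE, FALSE),
  Chasing  = c(FALSE, FALSE, TRUE, FALSE),
  Climbing = c(TRUE, FALSE, FALSE, FALSE),
  Eating   = c(FALSE, TRUE, TRUE, FALSE),
  Foraging = c(TRUE, TRUE, FALSE, FALSE)
)

activity_cols <- c("Running", "Chasing", "Climbing", "Eating", "Foraging")

# One point per observed activity: a logical TRUE counts as 1 when summed
sightings <- sightings %>%
  mutate(activity_index = rowSums(across(all_of(activity_cols))))

sightings %>% select(Shift, activity_index)
```

Equal weighting is only one possible choice; different point values per activity would change the comparison between shifts.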

Interaction Score: Similarly, a score representing the degree of interaction with humans (Approaches = positive score, Indifferent = neutral score, Runs from = negative score) could be created to assess human-squirrel interaction dynamics across shifts.
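One way the Interaction Score might look in code, assuming a scoring of +1 for Approaches, 0 for Indifferent, and -1 for Runs from. The exact point values, the toy data, and the column name `interaction_score` are assumptions made for illustration.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (made-up values, real column names)
sightings <- tibble(
  Shift       = c("AM", "AM", "PM", "PM"),
  Approaches  = c(TRUE, FALSE, FALSE, FALSE),
  Indifferent = c(FALSE, TRUE, FALSE, FALSE),
  `Runs from` = c(FALSE, FALSE, TRUE, FALSE)
)

# Approaches adds a point, Runs from subtracts one, Indifferent contributes zero
sightings <- sightings %>%
  mutate(interaction_score = as.integer(Approaches) - as.integer(`Runs from`))

sightings %>% select(Shift, interaction_score)
```

Under this scoring, a sighting with no interaction flag set also scores 0, so it may be worth keeping Indifferent as a separate column when interpreting neutral scores.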

Analysis Steps:

Data Cleaning and Preparation: Ensure each squirrel record is unique by checking “Unique Squirrel ID” and removing duplicates. Check for missing or inconsistent entries in the activity and interaction variables, and exclude incomplete records.
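A rough sketch of the duplicate and missing-value checks, using a toy tibble with placeholder IDs (the values "A", "B", "C" are hypothetical and do not reflect the real ID format). Keeping the first row per ID is an assumption; another rule may be more appropriate once the duplicates are inspected.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (placeholder IDs, made-up values)
sightings <- tibble(
  `Unique Squirrel ID` = c("A", "A", "B", "C"),
  Shift                = c("AM", "AM", "PM", "PM"),
  Running              = c(TRUE, TRUE, FALSE, NA)
)

# IDs that appear more than once, for inspection before anything is dropped
duplicate_ids <- sightings %>%
  count(`Unique Squirrel ID`) %>%
  filter(n > 1)

# Keep the first row per ID, then drop rows with a missing activity flag
cleaned <- sightings %>%
  distinct(`Unique Squirrel ID`, .keep_all = TRUE) %>%
  drop_na(Running)
```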

Descriptive Statistics: Calculate how often each squirrel activity and interaction occurs, separating morning (AM) and afternoon (PM) shifts.
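The per-shift frequencies could be computed along these lines. The toy data and the output column names (`activity`, `n_observed`, `share`) are hypothetical, and only two of the behavior flags are shown to keep the sketch short.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (made-up values, real column names)
sightings <- tibble(
  Shift   = c("AM", "AM", "PM", "PM", "PM"),
  Running = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  Eating  = c(FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Reshape to one row per sighting and activity, then summarise by shift
activity_by_shift <- sightings %>%
  pivot_longer(c(Running, Eating), names_to = "activity", values_to = "observed") %>%
  group_by(Shift, activity) %>%
  summarise(
    n_observed = sum(observed),   # number of sightings with the flag set
    share      = mean(observed),  # proportion of the shift's sightings
    .groups    = "drop"
  )

activity_by_shift
```

Reporting shares alongside counts matters here because the two shifts need not contain the same number of sightings.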

Activity and Interaction Analysis: Use the Activity Index and Interaction Score to compare the level of squirrel activities and their interactions with humans between the two shifts. This could involve statistical tests of whether the differences are significant (e.g., t-tests, or non-parametric equivalents such as the Wilcoxon rank-sum test if the t-test assumptions are not met).
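As an illustration of the shift comparison, the sketch below runs a Wilcoxon rank-sum test on a made-up `activity_index` column. The non-parametric test is shown because the index is a small integer count; this is one option among those listed above, not a settled choice.

```r
library(tidyverse)

# Toy data: a hypothetical activity_index for six sightings per shift
sightings <- tibble(
  Shift          = rep(c("AM", "PM"), each = 6),
  activity_index = c(0, 1, 1, 2, 1, 0, 2, 3, 2, 1, 3, 2)
)

# exact = FALSE requests the normal approximation, which tolerates tied values
shift_test <- wilcox.test(activity_index ~ Shift, data = sightings, exact = FALSE)

shift_test$p.value
```

Swapping `wilcox.test()` for `t.test()` with the same formula would give the parametric version of the comparison.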

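Temporal Analysis (optional): Explore whether certain dates (using Date, which is read in as a number and would first need to be parsed) show heightened activities or interactions. Whether this indicates seasonal behavior changes depends on how much of the year the census dates cover.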
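Because `read_csv()` reports Date as a dbl column, it would need converting before any temporal analysis. The sketch below assumes the numbers encode month, day, and year as MMDDYYYY; that format is an assumption to verify against the data, and the toy values and new column names are hypothetical.

```r
library(tidyverse)

# Toy stand-in: Date stored as a number, assumed to be MMDDYYYY
sightings <- tibble(
  Date  = c(10062018, 10142018, 10142018),
  Shift = c("AM", "PM", "PM")
)

parsed <- sightings %>%
  mutate(
    # Pad to 8 digits so a single-digit month keeps its leading zero
    date_string   = sprintf("%08.0f", Date),
    sighting_date = as.Date(date_string, format = "%m%d%Y")
  )

# Sightings per calendar day and shift
sightings_per_day <- parsed %>% count(sighting_date, Shift)
```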

External Data: No external data is required for this analysis plan.

Analysis Plan for Question 2:

Is there a relationship between squirrel fur color and their location or activities?

Variables Involved: Primary Variables: Primary Fur Color, Location (Ground Plane, Above Ground), Eating, Foraging, Climbing.

Secondary Variables: Date, Unique Squirrel ID (to ensure unique instances are analyzed).

Variables to be Created: Location Preference Index: A metric to evaluate preference for ground vs. above ground locations, possibly based on the proportion of sightings in each location type per fur color.
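A possible form of the Location Preference Index, taken here as the share of each fur color's sightings recorded above ground. The toy values, the choice to drop rows with a missing color or location, and the column names `n_sightings` and `share_above_ground` are assumptions.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (made-up values, real column names)
sightings <- tibble(
  `Primary Fur Color` = c("Gray", "Gray", "Gray", "Cinnamon", "Cinnamon", "Black"),
  Location            = c("Ground Plane", "Ground Plane", "Above Ground",
                          "Above Ground", "Ground Plane", NA)
)

location_pref <- sightings %>%
  drop_na(`Primary Fur Color`, Location) %>%   # skip sightings missing either field
  group_by(`Primary Fur Color`) %>%
  summarise(
    n_sightings        = n(),
    share_above_ground = mean(Location == "Above Ground"),
    .groups            = "drop"
  )

location_pref
```

Keeping `n_sightings` next to the share makes it easier to spot fur colors with too few sightings for the proportion to be meaningful.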

Activity Profile: A profile for each fur color category summarizing the frequency of each activity (Eating, Foraging, Climbing).
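The Activity Profile could be built as a per-color summary of the three flags. This sketch uses toy data and reports proportions rather than raw counts, which is an assumption about how "frequency" would be presented.

```r
library(tidyverse)

# Toy stand-in for squirrel_data (made-up values, real column names)
sightings <- tibble(
  `Primary Fur Color` = c("Gray", "Gray", "Gray", "Gray", "Cinnamon", "Cinnamon"),
  Eating              = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Foraging            = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  Climbing            = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Mean of a logical column is the proportion of sightings with the flag set
activity_profile <- sightings %>%
  group_by(`Primary Fur Color`) %>%
  summarise(
    across(c(Eating, Foraging, Climbing), mean),
    n_sightings = n(),
    .groups     = "drop"
  )

activity_profile
```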

Analysis Steps:

Data Cleaning and Preparation: Check that each squirrel sighting is listed only once by using the “Unique Squirrel ID”, and clean the data for the relevant variables.

Descriptive Statistics: Assess the distribution of fur colors, locations, and activities within the dataset.

Fur Color and Location Analysis: Examine whether there is a statistical relationship between fur color and preferred location using chi-square tests, or logistic regression models when controlling for potential confounders such as date (seasonality effects).
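The chi-square step might look like the following on a contingency table of fur color by location. The counts are invented purely so the example runs; they say nothing about the real data.

```r
library(tidyverse)

# Toy data: 50 gray and 40 cinnamon sightings with made-up location splits
sightings <- tibble(
  `Primary Fur Color` = rep(c("Gray", "Cinnamon"), times = c(50, 40)),
  Location = c(
    rep(c("Ground Plane", "Above Ground"), times = c(30, 20)),
    rep(c("Ground Plane", "Above Ground"), times = c(15, 25))
  )
)

# Cross-tabulate fur color against location, then test for independence
color_by_location <- table(sightings$`Primary Fur Color`, sightings$Location)
location_test <- chisq.test(color_by_location)

location_test$expected  # expected counts, useful for checking the approximation
location_test$p.value
```

If some expected counts turn out to be small for the rarer fur colors, `fisher.test()` on the same table is a common fallback.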

Fur Color and Activity Relationship: Analyze the relationship between fur color and specific activities using statistical tests (e.g., chi-square tests for categorical comparisons or ANOVA for comparing activity indices across fur colors).

Multivariate Analysis: Use multivariate analysis techniques (such as logistic or multiple regression) to assess the impact of fur color on location preference and activities, factoring in additional variables such as the date.
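A logistic regression along these lines could be sketched as below, modeling whether a sighting was above ground as a function of fur color while adjusting for the day. The counts, the two-day design, and the column names `fur_color`, `day`, and `above_ground` are all hypothetical.

```r
library(tidyverse)

# Toy counts (made up), expanded to one row per sighting with uncount()
counts <- tribble(
  ~fur_color, ~day,    ~above_ground, ~n,
  "Gray",     "day 1", 1L,            10L,
  "Gray",     "day 1", 0L,            15L,
  "Gray",     "day 2", 1L,             8L,
  "Gray",     "day 2", 0L,            12L,
  "Cinnamon", "day 1", 1L,            12L,
  "Cinnamon", "day 1", 0L,             8L,
  "Cinnamon", "day 2", 1L,            14L,
  "Cinnamon", "day 2", 0L,             6L
)
sightings <- counts %>% uncount(n)

# Binomial GLM: log-odds of an above-ground sighting by fur color, adjusting for day
model <- glm(above_ground ~ fur_color + day, family = binomial, data = sightings)

summary(model)$coefficients
```

Character predictors are treated as factors with the alphabetically first level as the reference, so the `fur_colorGray` coefficient here compares gray against cinnamon.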

External Data: No external data is required for this analysis plan.