Project title

Proposal

library(tidyverse)
library(skimr)

Data 1

Introduction and data

  • Identify the source of the data.

The data is sourced from the Connecticut Department of Correction and is published on the Connecticut Open Data portal: https://data.ct.gov/Public-Safety/Accused-Pre-Trial-Inmates-in-Correctional-Faciltie/b674-jy6w

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The data was originally collected from correctional facilities across Connecticut in March 2023. It is not clear exactly how it was collected, but because every record describes a person held before trial, prison officials most likely pulled the records directly from their facility rosters.

  • Write a brief description of the observations.

Each observation is a person held pre-trial in a correctional facility, described by race, gender, age, bond amount, offense, facility, and detainer status. A first look suggests that a majority of these inmates are male and that racial minorities tend to be overrepresented.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
  • A description of the research topic along with a concise statement of your hypotheses on this topic.
  • Identify the types of variables in your research question. Categorical? Quantitative?

Which demographic groups, if any, are statistically overrepresented among people held in correctional facilities before their trial? This question examines the oft-criticized practice of detaining people who are deemed a “flight risk” or who cannot pay bail before a trial has convicted or acquitted them. We hypothesize that racial minorities (Latino, Black), younger individuals, and men are overrepresented among pre-trial detainees. The primary variables of interest are gender (categorical), age (quantitative), and race (categorical).

Glimpse of data

# Read the pre-trial inmate data and preview its structure
pretrial <- read.csv("data/pretrial.csv")
glimpse(pretrial)
Rows: 235,487
Columns: 10
$ DOWNLOAD.DATE         <chr> "05/15/2020", "05/15/2020", "05/15/2020", "05/15…
$ IDENTIFIER            <chr> "ZZHCZBZZ", "ZZHZZRLR", "ZZSRJBEE", "ZZHBJLRZ", …
$ LATEST.ADMISSION.DATE <chr> "08/16/2018", "03/28/2019", "04/03/2020", "01/15…
$ RACE                  <chr> "BLACK", "HISPANIC", "HISPANIC", "WHITE", "HISPA…
$ GENDER                <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"…
$ AGE                   <int> 27, 41, 21, 36, 29, 54, 35, 55, 43, 29, 46, 22, …
$ BOND.AMOUNT           <int> 150000, 30100, 150000, 50500, 100000, 100000, 10…
$ OFFENSE               <chr> "CRIMINAL POSS OF PISTOL/REVOLVER      DF", "VIO…
$ FACILITY              <chr> "NEW HAVEN CC", "CORRIGAN CI", "CORRIGAN CI", "B…
$ DETAINER              <chr> "NONE", "NONE", "NONE", "NONE", "NONE", "NONE", …
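As a rough sketch of how the research question above might be approached, the share of detainees in each group could be tabulated once repeated records are collapsed to one row per person. The toy data frame below is made up for illustration and only borrows column names from the glimpse. Treating IDENTIFIER as a unique person ID, and assuming that the same person can appear in more than one row, are assumptions rather than documented facts about the data.

```r
library(dplyr)

# Toy stand-in for `pretrial`: made-up rows, column names taken from the glimpse
toy_pretrial <- data.frame(
  IDENTIFIER = c("A1", "A1", "B2", "C3", "D4"),
  RACE       = c("BLACK", "BLACK", "HISPANIC", "WHITE", "BLACK"),
  GENDER     = c("M", "M", "M", "F", "M"),
  AGE        = c(27, 27, 41, 36, 22)
)

# Keep one row per person so repeated records are not double counted
# (assumes IDENTIFIER uniquely identifies a person)
people <- distinct(toy_pretrial, IDENTIFIER, .keep_all = TRUE)

# Share of detainees in each race category
race_share <- people %>%
  count(RACE) %>%
  mutate(share = n / sum(n))

# Share of detainees by gender
gender_share <- people %>%
  count(GENDER) %>%
  mutate(share = n / sum(n))

race_share
gender_share
```

These shares only describe the detained population. Judging overrepresentation would also need a reference distribution, such as state population shares, which this data set does not contain.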

Data 2

Introduction and data

  • Identify the source of the data.

The data is available at https://hollywoodagegap.com/, and the original data curator is Lynn Fisher.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

It is not entirely clear how this data was originally collected, but the creator of the website and the data collector is Lynn Fisher. She allows online community members to contribute to the data set on GitHub if they believe any films are missing.

  • Write a brief description of the observations.

Each observation is a romantic pairing of two actors in a film. The data set records each actor's name, gender, birthdate, and age, the age difference between the two, and the film's name, director, and release year.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
  • A description of the research topic along with a concise statement of your hypotheses on this topic.
  • Identify the types of variables in your research question. Categorical? Quantitative?

How have romantic age gaps in Hollywood movies been shaped by the era in which a film was made and by the genders of the characters involved? This research question analyzes trends in romantic age gaps across the different eras of cinema and how the role of gender in those age gaps has changed over time. We hypothesize that as relationships with such unequal power balances have become less socially acceptable, the average age gap between film characters has decreased. We also hypothesize that the number of older-woman-younger-man couples and of non-straight couples has increased over time. The primary variables of interest are gender (categorical), release year (quantitative), and age gap (quantitative).

Glimpse of data

# Read the Hollywood age gap data and preview its structure
movies <- read.csv("data/movies.csv")
glimpse(movies)
Rows: 1,161
Columns: 12
$ Movie.Name        <chr> "Harold and Maude", "Venus", "The Quiet American", "…
$ Release.Year      <int> 1971, 2006, 2002, 1998, 2010, 1992, 2009, 1999, 1992…
$ Director          <chr> "Hal Ashby", "Roger Michell", "Phillip Noyce", "Joel…
$ Age.Difference    <int> 52, 50, 49, 45, 43, 42, 40, 39, 38, 38, 36, 36, 35, …
$ Actor.1.Name      <chr> "Bud Cort", "Peter O'Toole", "Michael Caine", "David…
$ Actor.1.Gender    <chr> "man", "man", "man", "man", "man", "man", "man", "ma…
$ Actor.1.Birthdate <chr> "1948-03-29", "1932-08-02", "1933-03-14", "1930-09-1…
$ Actor.1.Age       <int> 23, 74, 69, 68, 81, 59, 62, 69, 57, 77, 59, 56, 65, …
$ Actor.2.Name      <chr> "Ruth Gordon", "Jodie Whittaker", "Do Thi Hai Yen", …
$ Actor.2.Gender    <chr> "woman", "woman", "woman", "woman", "man", "woman", …
$ Actor.2.Birthdate <chr> "1896-10-30", "1982-06-03", "1982-10-01", "1975-11-0…
$ Actor.2.Age       <int> 75, 24, 20, 23, 38, 17, 22, 30, 19, 39, 23, 20, 30, …
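One possible first pass at the hypotheses above is to group couples by decade and by which partner is older, then compare the mean age gap across groups. The sketch below uses a made-up toy data frame with the column names from the glimpse. The `decade` and `couple_type` variables and the three couple labels are hypothetical names introduced here for illustration.

```r
library(dplyr)

# Toy stand-in for `movies`: made-up rows, column names taken from the glimpse
toy_movies <- data.frame(
  Release.Year   = c(1971, 1975, 1992, 1999, 2006, 2010),
  Age.Difference = c(52, 10, 42, 39, 50, 43),
  Actor.1.Gender = c("man", "man", "man", "man", "man", "man"),
  Actor.1.Age    = c(23, 40, 59, 69, 74, 81),
  Actor.2.Gender = c("woman", "woman", "woman", "woman", "woman", "man"),
  Actor.2.Age    = c(75, 30, 17, 30, 24, 38)
)

by_decade <- toy_movies %>%
  mutate(
    # Round the release year down to its decade, e.g. 1999 -> 1990
    decade = floor(Release.Year / 10) * 10,
    # Gender of the older actor (ties are assigned to Actor 1)
    older_gender = if_else(Actor.1.Age >= Actor.2.Age,
                           Actor.1.Gender, Actor.2.Gender),
    # Classify each couple by gender composition and by who is older
    couple_type = case_when(
      Actor.1.Gender == Actor.2.Gender ~ "same gender",
      older_gender == "man"            ~ "older man",
      TRUE                             ~ "older woman"
    )
  ) %>%
  group_by(decade, couple_type) %>%
  summarise(mean_gap = mean(Age.Difference), n = n(), .groups = "drop")

by_decade
```

Grouping by decade is only one way to define an era; a regression of Age.Difference on Release.Year would treat time as continuous instead.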

Data 3

Introduction and data

  • Identify the source of the data.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

  • Write a brief description of the observations.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
  • A description of the research topic along with a concise statement of your hypotheses on this topic.
  • Identify the types of variables in your research question. Categorical? Quantitative?

Glimpse of data

# add code here