Project proposal

Author

Trusting Bunny

library(tidyverse)
library(readr)

Dataset


mta_art <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/mta_art.csv",
  show_col_types = FALSE
)

station_lines <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/station_lines.csv",
  show_col_types = FALSE
)
glimpse(mta_art)
Rows: 381
Columns: 9
$ agency          <chr> "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT"…
$ station_name    <chr> "Clark St", "125 St", "Astor Pl", "Kings Hwy", "Newkir…
$ line            <chr> "2,3", "4,5,6", "6", "B,Q", "B,Q", "1", "6", "C,E", "1…
$ artist          <chr> "Ray Ring", "Houston Conwill", "Milton Glaser", "Rhoda…
$ art_title       <chr> "Clark Street Passage", "The Open Secret", "Untitled",…
$ art_date        <dbl> 1987, 1986, 1986, 1987, 1988, 1988, 1988, 1989, 1989, …
$ art_material    <chr> "Terrazzo floor tile", "Bronze - polychromed", "Porcel…
$ art_description <chr> "The first model that Brooklyn-born artist Ray Ring su…
$ art_image_link  <chr> "https://new.mta.info/agency/arts-design/collection/cl…
glimpse(station_lines)
Rows: 720
Columns: 3
$ agency       <chr> "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "…
$ station_name <chr> "Clark St", "Clark St", "125 St", "125 St", "125 St", "As…
$ line         <chr> "2", "3", "4", "5", "6", "6", "B", "Q", "B", "Q", "1", "6…
head(mta_art)
# A tibble: 6 × 9
  agency station_name        line  artist        art_title art_date art_material
  <chr>  <chr>               <chr> <chr>         <chr>        <dbl> <chr>       
1 NYCT   Clark St            2,3   Ray Ring      Clark St…     1987 Terrazzo fl…
2 NYCT   125 St              4,5,6 Houston Conw… The Open…     1986 Bronze - po…
3 NYCT   Astor Pl            6     Milton Glaser Untitled      1986 Porcelain e…
4 NYCT   Kings Hwy           B,Q   Rhoda Andors  Kings Hi…     1987 Porcelain E…
5 NYCT   Newkirk Av          B,Q   David Wilson  Transit …     1988 Zinc-glazed…
6 NYCT   137 St-City College 1     Steve Wood    Fossils       1988 Bronze      
# ℹ 2 more variables: art_description <chr>, art_image_link <chr>
head(station_lines)
# A tibble: 6 × 3
  agency station_name line 
  <chr>  <chr>        <chr>
1 NYCT   Clark St     2    
2 NYCT   Clark St     3    
3 NYCT   125 St       4    
4 NYCT   125 St       5    
5 NYCT   125 St       6    
6 NYCT   Astor Pl     6    

The dataset we selected is the MTA Permanent Art Catalog, featured in Week 29 of TidyTuesday 2025. It documents permanent public art installations across the New York City Metropolitan Transportation Authority system. The data originates from the New York State Open Data portal and is maintained by the MTA Art & Design department.

The data consists of two relational tables: mta_art and station_lines.

mta_art table

It contains 381 artworks and 9 variables describing each installation, including artist, material, agency, station, line, description, and year of installation.

  • Numerical: art_date
  • Categorical: agency, station_name, line, artist, art_title, art_material, art_description, art_image_link

station_lines table

The table contains 720 rows and 3 variables describing which transit lines serve each station.

  • Categorical: agency, station_name, line

We chose this dataset because it allows us to explore how public art interacts with transit infrastructure. The combination of temporal, categorical, and relational structure makes it well suited for examining patterns in artistic diversity and material usage over time and across subway lines.

Questions

The two questions we want to answer:

  1. Do stations with multiple lines feature higher artist diversity (measured by number of unique artists)?

  2. Are certain art materials correlated with particular time periods; that is, did material usage evolve over time?

Analysis plan

Question 1

Do stations with multiple lines feature higher artist diversity? (Diversity is defined as the number of unique artist names associated with a station. If an artwork lists multiple artists, each artist will be counted separately. Multiple installations by the same artist at the same station will count once.)

Variables involved: station_name, line, artist

Variables to be created:

  • n_lines: number of unique lines serving each station (from station_lines)

  • artist_diversity: number of unique artists per station

External data:

None required beyond joining the provided relational tables.

Approach: First, use station_lines to calculate the number of lines serving each station. Then group mta_art by station and count the number of unique artists. Join these summaries together to create a station-level dataset. Finally, examine the relationship between n_lines and artist_diversity using a scatterplot and possibly a simple linear trend line to assess association.
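The steps above can be sketched on toy data as follows. This is a minimal sketch, not the final analysis code: the toy rows, the object names (art_q1_toy, lines_q1_toy, station_summary), the assumption that multiple artists are separated by commas within the artist field, and the choice to join on both agency and station_name are all illustrative assumptions that should be checked against the real tables.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-ins for mta_art and station_lines (hypothetical rows, not real data)
art_q1_toy <- tibble(
  agency       = c("NYCT", "NYCT", "NYCT", "NYCT"),
  station_name = c("Station A", "Station A", "Station A", "Station B"),
  artist       = c("Artist X", "Artist X", "Artist Y, Artist Z", "Artist X")
)
lines_q1_toy <- tibble(
  agency       = c("NYCT", "NYCT", "NYCT"),
  station_name = c("Station A", "Station A", "Station B"),
  line         = c("1", "2", "3")
)

# n_lines: number of unique lines serving each station
n_lines_tbl <- lines_q1_toy |>
  group_by(agency, station_name) |>
  summarise(n_lines = n_distinct(line), .groups = "drop")

# artist_diversity: number of unique artists per station.
# Assumption: co-credited artists are separated by commas in the artist field.
artist_tbl <- art_q1_toy |>
  separate_longer_delim(artist, delim = ",") |>
  mutate(artist = str_squish(artist)) |>
  group_by(agency, station_name) |>
  summarise(artist_diversity = n_distinct(artist), .groups = "drop")

# Station-level dataset combining both summaries
station_summary <- inner_join(
  n_lines_tbl, artist_tbl,
  by = c("agency", "station_name")
)
```

The resulting station_summary could then be passed to ggplot() with geom_point() and geom_smooth(method = "lm") to draw the scatterplot and trend line. Joining on agency as well as station_name is a precaution in case the same station name appears under more than one agency.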

Question 2

Are certain art materials correlated with time periods?

Variables involved: art_date, art_material

Variables to be created:

  • decade: derived from art_date

  • Cleaned material categories if necessary (group similar materials).

  • Categories will be defined based on recurring keywords in the material descriptions (for example, Bronze - polychromed and Bronze - painted will both be categorized as Bronze).
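One way the keyword-based grouping might look is sketched below. The keyword list, the material_group column name, the "Other" fallback, and the "Glass mosaic" row are hypothetical choices for illustration; the actual categories would be decided after inspecting the real art_material values.

```r
library(dplyr)
library(stringr)

# Toy material strings; "Glass mosaic" is a hypothetical value for illustration
materials_toy <- tibble(
  art_material = c(
    "Bronze - polychromed",
    "Bronze - painted",
    "Terrazzo floor tile",
    "Glass mosaic"
  )
)

# Assign each description to the first keyword it matches (case-insensitive);
# anything that matches no keyword falls into "Other"
materials_toy <- materials_toy |>
  mutate(
    material_group = case_when(
      str_detect(art_material, regex("bronze", ignore_case = TRUE))   ~ "Bronze",
      str_detect(art_material, regex("terrazzo", ignore_case = TRUE)) ~ "Terrazzo",
      TRUE ~ "Other"
    )
  )
```

Because case_when() uses the first matching condition, a description that mentions several materials would be assigned to whichever keyword is listed first, so the order of the keywords matters.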

External data: None required.

Approach: Create a decade variable from art_date to group artworks by time period. Clean or standardize the art_material variable where similar materials appear under slightly different names. Then count artworks by decade and material type, and visualize the trends using proportional stacked bar charts or proportional area plots, which show how each material's share changes over time, to assess whether certain materials became more or less common. Finally, we will fit a simple linear regression model to the data.
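The decade and proportion steps might be sketched as follows, again on toy data. The rows, the material_group column (assumed to come from a prior cleaning step), and the names n_artworks and prop are illustrative assumptions rather than the final implementation.

```r
library(dplyr)

# Toy data: art_date as a year, material_group as an already-cleaned category
art_q2_toy <- tibble(
  art_date       = c(1986, 1987, 1988, 1994, 1999, 2003),
  material_group = c("Bronze", "Terrazzo", "Bronze", "Mosaic", "Mosaic", "Mosaic")
)

material_by_decade <- art_q2_toy |>
  # Round each year down to the start of its decade (1987 becomes 1980)
  mutate(decade = floor(art_date / 10) * 10) |>
  # Count artworks for each decade and material combination
  count(decade, material_group, name = "n_artworks") |>
  # Convert counts to within-decade proportions
  group_by(decade) |>
  mutate(prop = n_artworks / sum(n_artworks)) |>
  ungroup()
```

Plotting prop against factor(decade) with geom_col() and fill = material_group would give a proportional stacked bar chart. Using within-decade proportions rather than raw counts keeps decades with many installations from dominating the comparison.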