library(tidyverse)
library(readr)
Project proposal
Dataset
mta_art <- read_csv(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/mta_art.csv",
show_col_types = FALSE
)
station_lines <- read_csv(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/station_lines.csv",
show_col_types = FALSE
)
glimpse(mta_art)
Rows: 381
Columns: 9
$ agency <chr> "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT"…
$ station_name <chr> "Clark St", "125 St", "Astor Pl", "Kings Hwy", "Newkir…
$ line <chr> "2,3", "4,5,6", "6", "B,Q", "B,Q", "1", "6", "C,E", "1…
$ artist <chr> "Ray Ring", "Houston Conwill", "Milton Glaser", "Rhoda…
$ art_title <chr> "Clark Street Passage", "The Open Secret", "Untitled",…
$ art_date <dbl> 1987, 1986, 1986, 1987, 1988, 1988, 1988, 1989, 1989, …
$ art_material <chr> "Terrazzo floor tile", "Bronze - polychromed", "Porcel…
$ art_description <chr> "The first model that Brooklyn-born artist Ray Ring su…
$ art_image_link <chr> "https://new.mta.info/agency/arts-design/collection/cl…
glimpse(station_lines)
Rows: 720
Columns: 3
$ agency <chr> "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "NYCT", "…
$ station_name <chr> "Clark St", "Clark St", "125 St", "125 St", "125 St", "As…
$ line <chr> "2", "3", "4", "5", "6", "6", "B", "Q", "B", "Q", "1", "6…
head(mta_art)
# A tibble: 6 × 9
agency station_name line artist art_title art_date art_material
<chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 NYCT Clark St 2,3 Ray Ring Clark St… 1987 Terrazzo fl…
2 NYCT 125 St 4,5,6 Houston Conw… The Open… 1986 Bronze - po…
3 NYCT Astor Pl 6 Milton Glaser Untitled 1986 Porcelain e…
4 NYCT Kings Hwy B,Q Rhoda Andors Kings Hi… 1987 Porcelain E…
5 NYCT Newkirk Av B,Q David Wilson Transit … 1988 Zinc-glazed…
6 NYCT 137 St-City College 1 Steve Wood Fossils 1988 Bronze
# ℹ 2 more variables: art_description <chr>, art_image_link <chr>
head(station_lines)
# A tibble: 6 × 3
agency station_name line
<chr> <chr> <chr>
1 NYCT Clark St 2
2 NYCT Clark St 3
3 NYCT 125 St 4
4 NYCT 125 St 5
5 NYCT 125 St 6
6 NYCT Astor Pl 6
The dataset we selected is the MTA Permanent Art Catalog, featured in Week 29 of TidyTuesday 2025. It documents permanent public art installations across the New York City Metropolitan Transportation Authority system. The data originates from the New York State Open Data portal and is maintained by the MTA Arts & Design department.
The data consists of two relational tables: mta_art and station_lines.
mta_art table
It contains 381 artworks and 9 variables describing each installation, including artist, material, agency, station, line, description, and year of installation.
- Numerical: art_date
- Categorical: agency, station_name, line, artist, art_title, art_material, art_description, art_image_link
station_lines table
The table contains 720 rows and 3 variables describing which transit lines serve each station.
- Categorical: agency, station_name, line
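Hard-coded row and column counts can drift from the data, so it may be safer to compute them. The following is a minimal sketch: `describe_table()` is a hypothetical helper, and the small `toy` tibble only stands in for `mta_art` so the example runs on its own.

```r
library(tibble)

# Hypothetical helper: report dimensions and split columns by type.
describe_table <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  list(
    n_rows      = nrow(df),
    n_cols      = ncol(df),
    numerical   = names(df)[is_num],
    categorical = names(df)[!is_num]
  )
}

# Toy stand-in with the same column types as a slice of mta_art.
toy <- tibble(
  station_name = c("Clark St", "125 St"),
  artist       = c("Ray Ring", "Houston Conwill"),
  art_date     = c(1987, 1986)
)

info <- describe_table(toy)
info$n_rows
info$numerical
```

Calling `describe_table(mta_art)` and `describe_table(station_lines)` on the loaded tables would give the values to report in the text, for example through inline R expressions.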
We chose this dataset because it allows us to explore how public art interacts with transit infrastructure. The combination of temporal, categorical, and relational structure makes it well suited for examining patterns in artistic diversity and material usage over time and across subway lines.
Questions
The two questions we want to answer:
Do stations with multiple lines feature higher artist diversity (measured by number of unique artists)?
Are certain art materials associated with particular time periods, that is, did material usage evolve over time?
Analysis plan
Question 1
Do stations with multiple lines feature higher artist diversity? We define diversity as the number of unique artist names associated with a station. If an artwork lists multiple artists, each artist is counted separately, and multiple installations by the same artist at the same station count only once.
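The counting rule above can be sketched as follows. The toy rows are illustrative, and the `" and "` separator for co-credited artists is an assumption; the actual format of multi-artist entries in `mta_art` would need to be checked first.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy rows; the " and " separator for co-credited artists is an assumption,
# not something confirmed from the real mta_art data.
toy_art <- tibble(
  station_name = c("Clark St", "Clark St", "Clark St", "125 St"),
  artist       = c("Ray Ring", "Ray Ring", "A. One and B. Two", "Houston Conwill")
)

artist_diversity <- toy_art |>
  separate_longer_delim(artist, delim = " and ") |>  # one row per credited artist
  mutate(artist = trimws(artist)) |>
  distinct(station_name, artist) |>                  # repeat artists count once
  count(station_name, name = "artist_diversity")

artist_diversity
```

In this toy example, the two "Ray Ring" rows collapse into one artist and the co-credited row contributes two, so "Clark St" ends up with three unique artists.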
Variables involved: station_name, line, artist
Variables to be created:
n_lines: number of unique lines serving each station (from station_lines)
artist_diversity: number of unique artists per station
External data:
None required beyond joining the provided relational tables.
Approach: First, use station_lines to calculate the number of lines serving each station. Then group mta_art by station and count the number of unique artists. Join these summaries together to create a station-level dataset. Finally, examine the relationship between n_lines and artist_diversity using a scatterplot and possibly a simple linear trend line to assess association.
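A minimal sketch of this station-level summary follows. The toy tibbles stand in for `station_lines` and `mta_art`, and their values are illustrative. Joining on `agency` plus `station_name` is an assumption: station names such as "125 St" may refer to more than one physical station, which would be worth checking in the real data.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy stand-ins for station_lines and mta_art (values are illustrative only).
toy_lines <- tibble(
  agency       = "NYCT",
  station_name = c("Clark St", "Clark St", "125 St", "125 St", "125 St", "Astor Pl"),
  line         = c("2", "3", "4", "5", "6", "6")
)
toy_art <- tibble(
  agency       = "NYCT",
  station_name = c("Clark St", "125 St", "125 St", "Astor Pl"),
  artist       = c("Ray Ring", "Houston Conwill", "Artist X", "Milton Glaser")
)

# Number of unique lines serving each station.
n_lines <- toy_lines |>
  group_by(agency, station_name) |>
  summarise(n_lines = n_distinct(line), .groups = "drop")

# Number of unique artists per station.
diversity <- toy_art |>
  group_by(agency, station_name) |>
  summarise(artist_diversity = n_distinct(artist), .groups = "drop")

# Station-level dataset; inner_join keeps stations present in both tables.
stations <- inner_join(n_lines, diversity, by = c("agency", "station_name"))

# Scatterplot with a linear trend line; jitter separates overlapping counts.
p <- ggplot(stations, aes(x = n_lines, y = artist_diversity)) +
  geom_jitter(width = 0.1, height = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Lines serving station", y = "Unique artists")
# print(p) draws the plot.

# Both variables are small counts, so a rank correlation is one possible
# numeric complement to the plot.
rho <- cor(stations$n_lines, stations$artist_diversity, method = "spearman")
```

Because both variables are small integer counts, many points will overlap, which is why the sketch uses jitter rather than plain points.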
Question 2
Are certain art materials correlated with time periods?
Variables involved: art_date, art_material
Variables to be created:
decade: derived from art_date
Cleaned material categories, if necessary (grouping similar materials).
Categories will be defined based on recurring keywords in the material descriptions. For example, "Bronze - polychromed" and "Bronze - painted" will both be categorized as Bronze.
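One way to derive both variables is sketched below. The keyword list and the category labels are assumptions chosen for illustration, not a final scheme, and the toy material strings are only a small sample.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy rows; material strings are illustrative, not a complete list.
toy_art <- tibble(
  art_date     = c(1987, 1986, 1988, 1994),
  art_material = c("Terrazzo floor tile", "Bronze - polychromed",
                   "Bronze - painted", "Glass blocks")
)

categorised <- toy_art |>
  mutate(
    # 1987 -> 1980, 1994 -> 1990
    decade = floor(art_date / 10) * 10,
    # First matching keyword wins; unmatched materials fall into "Other".
    material_group = case_when(
      str_detect(art_material, regex("bronze", ignore_case = TRUE))   ~ "Bronze",
      str_detect(art_material, regex("terrazzo", ignore_case = TRUE)) ~ "Terrazzo",
      TRUE                                                            ~ "Other"
    )
  )

categorised
```

Since `case_when()` evaluates its conditions in order, a material that mentions several keywords is assigned to the first match, so the order of the rules matters.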
External data: None required.
Approach: Create a decade variable from art_date to group artworks by time period. Clean or standardize the art_material variable where similar materials appear under slightly different names. Then count artworks by decade and material category. Visualize the trends with a proportional stacked bar chart or a proportional area plot, which shows how each material's share changes over time and whether certain materials became more or less common. Finally, we will fit a simple linear regression model to the data.
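The counting, plotting, and modeling steps might look like the sketch below. The toy data is illustrative. The proposal does not specify the regression, so modeling one material's share as a function of decade is only one possible reading.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy input: one row per artwork, already reduced to decade and material group.
toy <- tibble(
  decade = rep(c(1980, 1990, 2000), each = 4),
  material_group = c(
    "Bronze", "Bronze", "Bronze", "Terrazzo",
    "Bronze", "Bronze", "Terrazzo", "Terrazzo",
    "Bronze", "Terrazzo", "Terrazzo", "Terrazzo"
  )
)

# Count artworks by decade and material, then convert to within-decade shares.
# Combinations that never occur produce no row here rather than a zero.
shares <- toy |>
  count(decade, material_group) |>
  group_by(decade) |>
  mutate(share = n / sum(n)) |>
  ungroup()

# Proportional stacked bar chart: each bar sums to 1.
p <- ggplot(shares, aes(x = factor(decade), y = share, fill = material_group)) +
  geom_col() +
  labs(x = "Decade", y = "Share of artworks", fill = "Material")
# print(p) draws the plot.

# Assumed model: linear trend in the share of a single material.
fit <- lm(share ~ decade, data = filter(shares, material_group == "Terrazzo"))
coef(fit)[["decade"]]  # change in share per calendar year
```

With only a handful of decades, such a model has very few observations per material, so the slope is better read as a descriptive summary than as a formal test.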