Project proposal
library(tidyverse)
pixar <- read_csv("data/pixar_films.csv")
response <- read_csv("data/public_response.csv")
box_office <- read_csv("data/box_office.csv")
head(pixar)
# A tibble: 6 × 5
number film release_date run_time film_rating
<dbl> <chr> <date> <dbl> <chr>
1 1 Toy Story 1995-11-22 81 G
2 2 A Bug's Life 1998-11-25 95 G
3 3 Toy Story 2 1999-11-24 92 G
4 4 Monsters, Inc. 2001-11-02 92 G
5 5 Finding Nemo 2003-05-30 100 G
6 6 The Incredibles 2004-11-05 115 PG
head(response)
# A tibble: 6 × 5
film rotten_tomatoes metacritic cinema_score critics_choice
<chr> <dbl> <dbl> <chr> <dbl>
1 Toy Story 100 95 A NA
2 A Bug's Life 92 77 A NA
3 Toy Story 2 100 88 A+ 100
4 Monsters, Inc. 96 79 A+ 92
5 Finding Nemo 99 90 A+ 97
6 The Incredibles 97 90 A+ 88
head(box_office)
# A tibble: 6 × 5
film budget box_office_us_canada box_office_other box_office_worldwide
<chr> <dbl> <dbl> <dbl> <dbl>
1 Toy Story 3 e7 223225679 171210907 394436586
2 A Bug's Life 1.20e8 162798565 200460294 363258859
3 Toy Story 2 9 e7 245852179 265506097 511358276
4 Monsters, I… 1.15e8 255873250 272900000 528773250
5 Finding Nemo 9.40e7 339714978 531300000 871014978
6 The Incredi… 9.2 e7 261441092 370001000 631442092
Dataset
Description of the Data
We use three datasets from the pixarfilms package: pixar_films.csv, public_response.csv, and box_office.csv. In each dataset, each row represents a single Pixar film.
pixar_films.csv
This dataset contains general production information about each film.
Variables and their meaning:
number- a sequential identification number assigned to each film
film- the title of the film
release_date- the date the film was released
run_time- runtime of the film in minutes
film_rating- MPAA rating (G, PG, PG-13, R)
public_response.csv
This dataset contains measures of critic and audience reception.
Variables and their meaning:
film- the title of the film
rotten_tomatoes- Rotten Tomatoes score (percentage out of 100)
metacritic- Metacritic score (out of 100)
cinema_score- CinemaScore grade (letter based scale such as A+, A, A−, etc.)
critics_choice- Critics’ Choice score (out of 100)
box_office.csv
This dataset contains financial performance information for each film.
Variables and their meaning:
film- the title of the film
budget- production budget (in USD)
box_office_us_canada- box office revenue earned in the United States and Canada (in USD)
box_office_other- box office revenue earned outside the United States and Canada (in USD)
box_office_worldwide- total worldwide box office revenue (in USD)
Why we chose this dataset
We chose to focus on Pixar films because Pixar is a highly influential studio within the animation industry and has played a major role in shaping pop culture. Its films are widely recognized not only as children’s movies, but as stories that resonate with audiences across age groups, making them especially interesting to study in terms of both critical reception and financial performance.
From an analytical perspective, Pixar provides a relatively consistent point of comparison: its films fall within a similar genre and target audience, which allows us to focus more clearly on trends rather than differences driven by fundamentally different types of movies. Additionally, Pixar has released films regularly over a long time span, making the dataset well suited for examining changes over time in metrics such as box office performance and audience or critic reception.
On a more personal level, we are genuinely familiar with and interested in Pixar’s films, which motivates us to engage more deeply with the data and to think critically about how these movies have evolved and been received over time.
Questions
The two questions you want to answer.
The first question we want to answer is “How do different public rating systems (like Rotten Tomatoes, Metacritic, etc.) evaluate Pixar films, and do they differ systematically?”
We are interested in this question because many viewers rely on these rating systems when deciding whether to watch a movie, yet each system uses different scoring methods and may represent different audiences or evaluative criteria. Understanding whether these systems tend to agree or diverge can reveal whether “highly rated” films are universally perceived as strong, or whether different platforms consistently score films higher or lower than others. This question is significant because it helps clarify how cultural reception is measured and whether ratings across platforms can be interpreted interchangeably.
The second question is “What factors correlate most strongly with the box office performance of Pixar films?”
We are interested in this question because financial performance is often used as a proxy for a film’s success, yet revenue may be influenced by multiple factors beyond audience enjoyment, such as production budget, release timing, or runtime. By examining how financial outcomes relate to ratings and production characteristics, we can better understand whether commercial success aligns with critical reception or whether other structural factors play a stronger role. This question is significant because it sheds light on the broader relationship between artistic evaluation and market performance within a major animation studio.
Analysis plan
A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).
Question 1
To answer the first question, we would use the public_response dataset, which contains all of the rating variables for Pixar films: rotten_tomatoes, metacritic, cinema_score, and critics_choice. Because these rating systems use different scales (scores out of 100 and letter grades), we would first convert them to a common numeric scale. We could either map each letter grade to a percentage (A+ = 100, A = 95, A- = 90, etc.) or standardize each rating using z-scores, which express each score in standard deviations from its platform’s mean. Standardizing with z-scores may be preferable, because mapping every A+ grade to 100, for example, would compress variation, especially given that many Pixar films receive top ratings. After standardizing, we would compute the mean standardized score for each rating site to assess whether some platforms systematically rate films higher or lower than others.
We could then plot this information as a grouped bar chart that displays the standardized ratings from each platform for each movie, so that the different ratings for each film are easy to compare with one another. To highlight systematic differences, we would add a layer that overlays a point representing each site’s overall average rating. This allows us to identify consistent differences between rating systems as well as more specific differences for individual movies.
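The conversion and standardization steps described for Question 1 could be sketched as follows. This is only an illustration under stated assumptions: the tibble holds just the six rows printed by head(response) above, the letter-to-percent mapping (grade_map) uses the example values from the text, and the names ratings_long and platform_means are hypothetical. One caveat for this sketch: z-scores computed within a platform average to zero by construction, so the platform-level averages here are taken on the common 0-100 scale instead.

```r
library(tidyverse)

# First six rows of public_response, copied from the head() output above
response <- tribble(
  ~film,             ~rotten_tomatoes, ~metacritic, ~cinema_score, ~critics_choice,
  "Toy Story",       100,              95,          "A",           NA,
  "A Bug's Life",    92,               77,          "A",           NA,
  "Toy Story 2",     100,              88,          "A+",          100,
  "Monsters, Inc.",  96,               79,          "A+",          92,
  "Finding Nemo",    99,               90,          "A+",          97,
  "The Incredibles", 97,               90,          "A+",          88
)

# Assumed mapping from letter grades to percentages (example values from the text);
# any grade not listed here becomes NA
grade_map <- c("A+" = 100, "A" = 95, "A-" = 90)

ratings_long <- response %>%
  # Convert CinemaScore letters to numbers so every platform is numeric
  mutate(cinema_score = unname(grade_map[cinema_score])) %>%
  # One row per film-platform pair
  pivot_longer(-film, names_to = "platform", values_to = "score") %>%
  # Standardize within each platform
  group_by(platform) %>%
  mutate(z_score = (score - mean(score, na.rm = TRUE)) / sd(score, na.rm = TRUE)) %>%
  ungroup()

# Average score per platform on the common 0-100 scale
platform_means <- ratings_long %>%
  group_by(platform) %>%
  summarise(mean_score = mean(score, na.rm = TRUE), .groups = "drop")

# Grouped bar chart of standardized ratings, one group of bars per film
rating_plot <- ggplot(ratings_long, aes(x = film, y = z_score, fill = platform)) +
  geom_col(position = "dodge", na.rm = TRUE) +
  coord_flip()
```

With the full dataset, the same pipeline would apply after replacing the inline tibble with the response object read from data/public_response.csv, provided grade_map covers every grade that appears in cinema_score.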
Question 2
To answer the second question, we will analyze both the box_office and pixar_films datasets to examine the relationship between different variables and the financial success of Pixar films. Specifically, we will look at the following variables:
release_year: a variable we would derive from release_date in pixar_films; it will help us analyze whether newer films tend to perform better or worse than older ones
box_office_worldwide: our main measure of each film’s financial success
box_office_us_canada and box_office_other: these region-specific box office variables may reveal trends in financial performance across different geographic markets
film_rating: whether the film is rated G or PG may affect box office performance
run_time: the runtime variable could be explored to see whether shorter or longer films tend to perform better in terms of box office earnings
In terms of analysis, we would start by merging the pixar_films and box_office datasets by joining on the film column (the movie title, which both datasets contain). Then we would normalize the box office data to account for inflation or changes in the global market over time and calculate correlations between the box office earnings and the other variables described above, such as film rating, release year, and runtime. We could also model box office performance using linear regression to identify which variables are significant predictors of financial success. Finally, we could plot the relationship between these factors and box office earnings, possibly using a scatterplot matrix or correlation heatmap to visually identify significant relationships.
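The join, correlation, and regression steps for Question 2 might look roughly like the sketch below. It rests on several assumptions: the tibbles contain only rows printed by head() above, and the two box office rows whose titles are truncated in that output are left out, so only four films match. The inflation adjustment is omitted because it would need external price data that is not part of these datasets. The names films, unmatched, cors, and fit are hypothetical. With so few rows, the model uses a single predictor purely to show the mechanics, and film_rating is not modeled because all four matched films are rated G.

```r
library(tidyverse)

# First six rows of pixar_films, copied from the head() output above
pixar <- tribble(
  ~number, ~film,             ~release_date, ~run_time, ~film_rating,
  1,       "Toy Story",       "1995-11-22",  81,        "G",
  2,       "A Bug's Life",    "1998-11-25",  95,        "G",
  3,       "Toy Story 2",     "1999-11-24",  92,        "G",
  4,       "Monsters, Inc.",  "2001-11-02",  92,        "G",
  5,       "Finding Nemo",    "2003-05-30",  100,       "G",
  6,       "The Incredibles", "2004-11-05",  115,       "PG"
) %>%
  mutate(release_date = as.Date(release_date))

# Box office rows whose titles are printed in full above
box_office <- tribble(
  ~film,          ~budget, ~box_office_us_canada, ~box_office_other, ~box_office_worldwide,
  "Toy Story",    3e7,     223225679,             171210907,         394436586,
  "A Bug's Life", 1.2e8,   162798565,             200460294,         363258859,
  "Toy Story 2",  9e7,     245852179,             265506097,         511358276,
  "Finding Nemo", 9.4e7,   339714978,             531300000,         871014978
)

# Join on the shared film column and derive release_year from release_date
films <- pixar %>%
  inner_join(box_office, by = "film") %>%
  mutate(release_year = lubridate::year(release_date))

# Films in pixar with no match in box_office (worth checking before modeling)
unmatched <- anti_join(pixar, box_office, by = "film")

# Pearson correlation of each numeric predictor with worldwide box office
predictors <- films %>% select(budget, run_time, release_year)
cors <- sapply(predictors, function(x) cor(x, films$box_office_worldwide))

# Simple linear regression of worldwide box office on runtime
fit <- lm(box_office_worldwide ~ run_time, data = films)
```

On the full data, summary(fit) would report coefficient estimates and p-values, and additional predictors such as budget, release_year, and film_rating could be added to the model formula once there are enough films to support them.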