Project proposal

Author

Proud Corgi

library(tidyverse)
pixar <- read_csv("data/pixar_films.csv")
response <- read_csv("data/public_response.csv")
box_office <- read_csv("data/box_office.csv")

head(pixar)
# A tibble: 6 × 5
  number film            release_date run_time film_rating
   <dbl> <chr>           <date>          <dbl> <chr>      
1      1 Toy Story       1995-11-22         81 G          
2      2 A Bug's Life    1998-11-25         95 G          
3      3 Toy Story 2     1999-11-24         92 G          
4      4 Monsters, Inc.  2001-11-02         92 G          
5      5 Finding Nemo    2003-05-30        100 G          
6      6 The Incredibles 2004-11-05        115 PG         
head(response)
# A tibble: 6 × 5
  film            rotten_tomatoes metacritic cinema_score critics_choice
  <chr>                     <dbl>      <dbl> <chr>                 <dbl>
1 Toy Story                   100         95 A                        NA
2 A Bug's Life                 92         77 A                        NA
3 Toy Story 2                 100         88 A+                      100
4 Monsters, Inc.               96         79 A+                       92
5 Finding Nemo                 99         90 A+                       97
6 The Incredibles              97         90 A+                       88
head(box_office)
# A tibble: 6 × 5
  film         budget box_office_us_canada box_office_other box_office_worldwide
  <chr>         <dbl>                <dbl>            <dbl>                <dbl>
1 Toy Story    3   e7            223225679        171210907            394436586
2 A Bug's Life 1.20e8            162798565        200460294            363258859
3 Toy Story 2  9   e7            245852179        265506097            511358276
4 Monsters, I… 1.15e8            255873250        272900000            528773250
5 Finding Nemo 9.40e7            339714978        531300000            871014978
6 The Incredi… 9.2 e7            261441092        370001000            631442092

Dataset

Description of the Data

We use three datasets from the pixarfilms package: pixar_films.csv, public_response.csv, and box_office.csv. In each dataset, each row represents a single Pixar film.

pixar_films.csv

This dataset contains general production information about each film.

Variables and their meaning:

  • number - a sequential identification number assigned to each film
  • film - the title of the film
  • release_date - the date the film was released
  • run_time - runtime of film in minutes
  • film_rating - MPAA rating (G, PG, PG-13, R)

public_response.csv

This dataset contains measures of critic and audience reception.

Variables and their meaning:

  • film - the title of the film
  • rotten_tomatoes - Rotten Tomatoes score (percentage out of 100)
  • metacritic - Metacritic score (out of 100)
  • cinema_score - CinemaScore grade (letter-based scale such as A+, A, A−, etc.)
  • critics_choice - Critics’ Choice score (out of 100)

box_office.csv

This dataset contains financial performance information for each film.

Variables and their meaning:

  • film - the title of the film
  • budget - production budget (in USD)
  • box_office_us_canada - box office revenue earned in the United States and Canada (in USD)
  • box_office_other - box office revenue earned outside the United States and Canada (in USD)
  • box_office_worldwide - total worldwide box office revenue (in USD)

Why we chose this data

We chose to focus on Pixar films because Pixar is a highly influential studio within the animation industry and has played a major role in shaping pop culture. Its films are widely recognized not only as children’s movies, but as stories that resonate with audiences across age groups, making them especially interesting to study in terms of both critical reception and financial performance.

From an analytical perspective, Pixar provides a relatively consistent point of comparison: its films fall within a similar genre and target audience, which allows us to focus more clearly on trends rather than differences driven by fundamentally different types of movies. Additionally, Pixar has released films regularly over a long time span, making the dataset well suited for examining changes over time in metrics such as box office performance and audience or critic reception.

On a more personal level, we are genuinely familiar with and interested in Pixar’s films, which motivates us to engage more deeply with the data and to think critically about how these movies have evolved and been received over time.

Questions

The two questions you want to answer.

The first question we want to answer is “How do different public rating systems (Rotten Tomatoes, Metacritic, CinemaScore, and Critics’ Choice) evaluate Pixar films, and do they differ systematically?”

We are interested in this question because many viewers rely on these rating systems when deciding whether to watch a movie, yet each system uses different scoring methods and may represent different audiences or evaluative criteria. Understanding whether these systems tend to agree or diverge can reveal whether “highly rated” films are universally perceived as strong, or whether different platforms consistently score films higher or lower than others. This question is significant because it helps clarify how cultural reception is measured and whether ratings across platforms can be interpreted interchangeably.

The second question is “What factors correlate most strongly with the box office performance of Pixar films?”

We are interested in this question because financial performance is often used as a proxy for a film’s success, yet revenue may be influenced by multiple factors beyond audience enjoyment, such as production budget, release timing, or runtime. By examining how financial outcomes relate to ratings and production characteristics, we can better understand whether commercial success aligns with critical reception or whether other structural factors play a stronger role. This question is significant because it sheds light on the broader relationship between artistic evaluation and market performance within a major animation studio.

Analysis plan

A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

Question 1

To answer the first question, we would use the public_response dataset, which contains all of the rating variables for Pixar films: rotten_tomatoes, metacritic, cinema_score, and critics_choice. Because these rating systems use different scales (0-100 scores and letter grades), we would first convert them to a common numeric scale. One option is to map each letter grade to a percentage (for example, A+ = 100, A = 95, A- = 90); another is to standardize each rating as a z-score, that is, in standard deviation units from that platform's mean. Standardizing may be preferable, because mapping every A+ to 100 would compress variation, especially given that many Pixar films receive top ratings.

One caveat is that z-scores computed within each platform have a mean of zero by construction, so they show how a platform rates a film relative to its own typical score, not whether that platform scores higher overall. To assess whether some platforms systematically rate films higher or lower than others, we would therefore compare each platform's mean rating on the common 0-100 scale. We could then plot a grouped bar chart that displays the converted ratings from each platform for each film, so that a film's ratings are easy to compare with one another. To highlight systematic differences, we would add a layer with a point representing each platform's overall average rating. This allows us to identify consistent differences between rating systems as well as film-specific differences.
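The conversion and comparison steps for Question 1 could be sketched roughly as follows. This is a minimal illustration, not a final implementation: it uses only the six rows shown in the head(response) output, the letter-to-number mapping in grade_map is our own assumption and not an official CinemaScore scale, and the dashed reference lines are a simple stand-in for the overlaid average points described in the plan.

```r
library(tidyverse)

# First six rows of public_response, copied from the head() output above
response <- tribble(
  ~film,             ~rotten_tomatoes, ~metacritic, ~cinema_score, ~critics_choice,
  "Toy Story",       100,              95,          "A",           NA,
  "A Bug's Life",     92,              77,          "A",           NA,
  "Toy Story 2",     100,              88,          "A+",          100,
  "Monsters, Inc.",   96,              79,          "A+",           92,
  "Finding Nemo",     99,              90,          "A+",           97,
  "The Incredibles",  97,              90,          "A+",           88
)

# Hypothetical mapping from letter grades to a 0-100 scale (an assumption);
# grades missing from the map become NA
grade_map <- c("A+" = 100, "A" = 95, "A-" = 90)

ratings_long <- response %>%
  mutate(cinema_score = unname(grade_map[cinema_score])) %>%
  pivot_longer(-film, names_to = "platform", values_to = "score") %>%
  group_by(platform) %>%
  # z-score within each platform: distance from that platform's own mean
  mutate(z_score = (score - mean(score, na.rm = TRUE)) / sd(score, na.rm = TRUE)) %>%
  ungroup()

# Platform averages on the common scale; mean_z is ~0 by construction
platform_means <- ratings_long %>%
  group_by(platform) %>%
  summarise(
    mean_score = mean(score, na.rm = TRUE),
    mean_z     = mean(z_score, na.rm = TRUE)
  )

# Grouped bars per film, with each platform's overall mean as a dashed line
p <- ggplot(ratings_long, aes(x = film, y = score, fill = platform)) +
  geom_col(position = "dodge", na.rm = TRUE) +
  geom_hline(
    data = platform_means,
    aes(yintercept = mean_score, colour = platform),
    linetype = "dashed"
  ) +
  labs(x = "Film", y = "Rating (0-100 scale)")
```

Comparing mean_score across platforms addresses systematic level differences, while z_score is better suited to asking whether a given film is rated unusually high or low by one platform relative to that platform's own distribution.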

Question 2

To answer the second question, we will analyze both the box_office and pixar_films datasets to examine the relationship between different variables and the financial success of Pixar films. Specifically, we will look at the following variables:

  • release_year: This variable, which we would create from release_date, will help us analyze whether newer films tend to perform better or worse than older ones.
  • box_office_worldwide: We will use this variable as the main measure of each film's financial success.
  • box_office_us_canada and box_office_other: These region-specific box office variables may reveal trends in financial performance across different geographic markets.
  • film_rating: Whether the film is rated G or PG may affect box office performance.
  • run_time: The runtime variable could be explored to see whether shorter or longer films tend to perform better in terms of box office earnings.

For the analysis, we would start by merging the pixar_films and box_office datasets, joining on the film column (the movie title, which both datasets contain), and creating release_year from release_date. Then we would adjust the box office figures to account for inflation or changes in the global market over time, which would require merging in an external price index, and calculate correlations between box office earnings and the numeric variables described above, such as release year and runtime. Because film_rating is categorical, we would compare box office earnings across rating groups instead of computing a correlation. We could also model box office performance using linear regression to identify which variables are significant predictors of financial success. Finally, we could plot the relationships between these factors and box office earnings, possibly using a scatterplot matrix or correlation heatmap to visually identify the strongest relationships.
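A rough sketch of the merge, the derived release_year variable, the correlations, and the regression might look like the following. It is illustrative only: the data are the rows shown in the head() output, the box_office table includes only the four films whose titles print in full there, and the inflation adjustment is left out because it depends on external data we have not yet selected. With so few rows the model estimates are not meaningful; the point is the shape of the workflow.

```r
library(tidyverse)

# Rows copied from the head(pixar) output above
pixar <- tribble(
  ~number, ~film,             ~release_date, ~run_time, ~film_rating,
  1,       "Toy Story",       "1995-11-22",   81,       "G",
  2,       "A Bug's Life",    "1998-11-25",   95,       "G",
  3,       "Toy Story 2",     "1999-11-24",   92,       "G",
  4,       "Monsters, Inc.",  "2001-11-02",   92,       "G",
  5,       "Finding Nemo",    "2003-05-30",  100,       "G",
  6,       "The Incredibles", "2004-11-05",  115,       "PG"
) %>%
  mutate(release_date = as.Date(release_date))

# Rows copied from the head(box_office) output above; films whose titles are
# truncated in that printout are omitted here
box_office <- tribble(
  ~film,          ~budget, ~box_office_us_canada, ~box_office_other, ~box_office_worldwide,
  "Toy Story",    3e7,     223225679,             171210907,         394436586,
  "A Bug's Life", 1.2e8,   162798565,             200460294,         363258859,
  "Toy Story 2",  9e7,     245852179,             265506097,         511358276,
  "Finding Nemo", 9.4e7,   339714978,             531300000,         871014978
)

# Join on the shared film column and derive release_year from release_date
films <- pixar %>%
  inner_join(box_office, by = "film") %>%
  mutate(release_year = lubridate::year(release_date))

# Films that fail to match are dropped by inner_join, so list them explicitly
unmatched <- anti_join(pixar, box_office, by = "film")

# Correlation of each numeric candidate factor with worldwide box office
correlations <- films %>%
  summarise(across(
    c(budget, run_time, release_year),
    ~ cor(.x, box_office_worldwide)
  ))

# Linear regression sketch; film_rating could be added as a predictor on the
# full data, where it takes more than one value
model <- lm(box_office_worldwide ~ run_time + release_year, data = films)
```

Checking unmatched before modelling is worthwhile because a join on titles silently drops any film whose title is spelled differently in the two files.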