Project title
Introduction
(1-2 paragraphs) Brief introduction to the dataset. You may repeat some of the information about the dataset provided in the introduction to the dataset on the TidyTuesday repository, paraphrasing on your own terms. Imagine that your project is a standalone document and the grader has no prior knowledge of the dataset.
We use three datasets from the pixarfilms package: pixar_films.csv, public_response.csv, and box_office.csv. In each dataset, each row represents a single Pixar film.
pixar_films.csv
This dataset contains general production information about each film.
Variables and their meaning:
number- a sequential identification number assigned to each film
film- the title of the film
release_date- the date the film was released
run_time- runtime of the film in minutes
film_rating- MPAA rating (G, PG, PG-13, R)
public_response.csv
This dataset contains measures of critic and audience reception.
Variables and their meaning:
film- the title of the film
rotten_tomatoes- Rotten Tomatoes score (percentage out of 100)
metacritic- Metacritic score (out of 100)
cinema_score- CinemaScore grade (letter-based scale such as A+, A, A−, etc.)
critics_choice- Critics’ Choice score (out of 100)
box_office.csv
This dataset contains financial performance information for each film.
Variables and their meaning:
film- the title of the film
budget- production budget (in USD)
box_office_us_canada- box office revenue earned in the United States and Canada (in USD)
box_office_other- box office revenue earned outside the United States and Canada (in USD)
box_office_worldwide- total worldwide box office revenue (in USD)
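Because all three datasets share the film column, they can be combined into one table with one row per film. The following is a minimal sketch of that join, assuming {dplyr} is installed. The rows are made-up placeholder values (not real Pixar figures); only the column names come from the variable lists above.

```r
library(dplyr)

# Placeholder rows with invented values; only the column names match the
# variable lists in this section.
pixar_films <- data.frame(
  number = 1:3,
  film = c("Film A", "Film B", "Film C"),
  release_date = as.Date(c("2001-01-01", "2005-06-01", "2010-11-01")),
  run_time = c(90, 100, 110),
  film_rating = c("G", "PG", "G")
)

public_response <- data.frame(
  film = c("Film A", "Film B", "Film C"),
  rotten_tomatoes = c(95, 80, 88),
  metacritic = c(90, 70, 85),
  cinema_score = c("A+", "A-", "A"),
  critics_choice = c(93, 78, 90)
)

box_office <- data.frame(
  film = c("Film A", "Film B", "Film C"),
  budget = c(100e6, 150e6, 200e6),
  box_office_us_canada = c(200e6, 250e6, 400e6),
  box_office_other = c(200e6, 250e6, 500e6),
  box_office_worldwide = c(400e6, 500e6, 900e6)
)

# Join on the shared film title so each row still represents one film.
pixar <- pixar_films |>
  left_join(public_response, by = "film") |>
  left_join(box_office, by = "film")
```

In the actual project, the three placeholder data frames would be replaced by the real data; the join itself would stay the same as long as film titles match across files.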
Why we chose this dataset
We chose to focus on Pixar films because Pixar is a highly influential studio within the animation industry and has played a major role in shaping pop culture. Its films are widely recognized not only as children’s movies, but as stories that resonate with audiences across age groups, making them especially interesting to study in terms of both critical reception and financial performance.
From an analytical perspective, Pixar provides a relatively consistent point of comparison: its films fall within a similar genre and target audience, which allows us to focus more clearly on trends rather than differences driven by fundamentally different types of movies. Additionally, Pixar has released films regularly over a long time span, making the dataset well suited for examining changes over time in metrics such as box office performance and audience or critic reception.
On a more personal level, we are genuinely familiar with and interested in Pixar’s films, which motivates us to engage more deeply with the data and to think critically about how these movies have evolved and been received over time.
Question 1 <- How do different public rating systems (like Rotten Tomatoes, Metacritic, etc.) evaluate Pixar films, and do they differ systematically?
Introduction
(1-2 paragraphs) Introduction to the question and what parts of the dataset are necessary to answer the question. Also discuss why you’re interested in this question.
This question asks how the rating systems in our data evaluate Pixar films and whether they differ systematically. To answer it, we use the four rating variables in public_response.csv (rotten_tomatoes, metacritic, cinema_score, and critics_choice), matched to each film by the film variable. We are interested in this question because many viewers rely on these rating systems when deciding whether to watch a movie, yet each system uses different scoring methods and may represent different audiences or evaluative criteria. Understanding whether these systems tend to agree or diverge can reveal whether “highly rated” films are universally perceived as strong, or whether some platforms consistently score films higher or lower than others. This question is significant because it helps clarify how cultural reception is measured and whether ratings across platforms can be interpreted interchangeably.
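One way to check whether a rating system scores films consistently higher or lower than the others is to reshape the numeric scores into long format and summarise them per system. The sketch below assumes {dplyr} and {tidyr} are installed and uses made-up placeholder values. It leaves out cinema_score because that variable is a letter grade, and turning it into a number would require a mapping that this document does not define.

```r
library(dplyr)
library(tidyr)

# Placeholder rows with invented values; column names follow public_response.csv.
public_response <- data.frame(
  film = c("Film A", "Film B", "Film C"),
  rotten_tomatoes = c(95, 80, 88),
  metacritic = c(90, 70, 85),
  cinema_score = c("A+", "A-", "A"),
  critics_choice = c(93, 78, 90)
)

# One row per film and rating system, so systems can be compared directly
# (and later mapped to color or facets in a plot).
ratings_long <- public_response |>
  select(-cinema_score) |>
  pivot_longer(
    cols = c(rotten_tomatoes, metacritic, critics_choice),
    names_to = "rating_system",
    values_to = "score"
  )

# Average score per system; na.rm = TRUE skips films without a score.
system_summary <- ratings_long |>
  group_by(rating_system) |>
  summarise(mean_score = mean(score, na.rm = TRUE), .groups = "drop")
```

The long format is a design choice rather than a requirement: it keeps all three 0 to 100 scales in a single score column, which suits grouped summaries and faceted or color-mapped plots.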
Approach
(1-2 paragraphs) Describe what types of plots you are going to make to address your question. For each plot, provide a clear explanation as to why this plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. The two plots should be of different types, and at least one of the two plots needs to use either color mapping or facets.
Analysis
(2-3 code blocks, 2 figures, text/code comments as needed) In this section, provide the code that generates your plots. Use scale functions to provide nice axis labels and guides. You are welcome to use theme functions to customize the appearance of your plot, but you are not required to do so. All plots must be made with {ggplot2}. Do not use base R or {lattice} plotting functions.
Discussion
(1-3 paragraphs) In the Discussion section, interpret the results of your analysis. Identify any trends revealed (or not revealed) by the plots. Speculate about why the data looks the way it does.
Question 2 <- What factors correlate most strongly with the box office performance of Pixar films?
Introduction
(1-2 paragraphs) Introduction to the question and what parts of the dataset are necessary to answer the question. Also discuss why you’re interested in this question.
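This question asks which characteristics of a Pixar film are most strongly associated with its box office revenue. To answer it, we combine the revenue and budget variables in box_office.csv with the ratings in public_response.csv and the release date, runtime, and MPAA rating in pixar_films.csv, joined by the film variable. We are interested in this question because financial performance is often used as a proxy for a film’s success, yet revenue may be influenced by multiple factors beyond audience enjoyment, such as production budget, release timing, or runtime. By examining how financial outcomes relate to ratings and production characteristics, we can better understand whether commercial success aligns with critical reception or whether other structural factors play a stronger role. This question is significant because it sheds light on the broader relationship between artistic evaluation and market performance within a major animation studio.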
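As a first look at which factors track revenue most closely, the Pearson correlation between worldwide box office and each numeric candidate can be computed and ranked. This is a minimal sketch with made-up placeholder values, not a result. The choice of predictors is an assumption for illustration, and with so few films any correlation should be read cautiously.

```r
# Placeholder rows with invented values; column names follow the joined datasets.
films <- data.frame(
  film = c("Film A", "Film B", "Film C", "Film D"),
  run_time = c(90, 100, 110, 95),
  budget = c(100e6, 150e6, 200e6, 120e6),
  rotten_tomatoes = c(95, 80, 88, 70),
  box_office_worldwide = c(400e6, 500e6, 900e6, 300e6)
)

# Candidate factors (an illustrative choice, not a fixed list).
predictors <- c("run_time", "budget", "rotten_tomatoes")

# Correlation of each factor with worldwide revenue; use = "complete.obs"
# drops films with a missing value in either variable.
correlations <- sapply(
  films[predictors],
  cor,
  y = films$box_office_worldwide,
  use = "complete.obs"
)

# Rank factors by the strength of the relationship, ignoring its direction.
ranked <- sort(abs(correlations), decreasing = TRUE)
```

Correlation only captures linear association between numeric variables, so categorical factors such as film_rating would need a different comparison, for example grouped summaries.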
Approach
(1-2 paragraphs) Describe what types of plots you are going to make to address your question. For each plot, provide a clear explanation as to why this plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. The two plots should be of different types, and at least one of the two plots needs to use either color mapping or facets.
Analysis
(2-3 code blocks, 2 figures, text/code comments as needed) In this section, provide the code that generates your plots. Use scale functions to provide nice axis labels and guides. You are welcome to use theme functions to customize the appearance of your plot, but you are not required to do so. All plots must be made with {ggplot2}. Do not use base R or {lattice} plotting functions.
Discussion
(1-3 paragraphs) In the Discussion section, interpret the results of your analysis. Identify any trends revealed (or not revealed) by the plots. Speculate about why the data looks the way it does.
Presentation
Our presentation can be found here.
Data
Include a citation for your data here. See https://data.research.cornell.edu/data-management/storing-and-managing/data-citation/ for guidance on proper citation for datasets. If you got your data off the web, make sure to note the retrieval date.
References
List any references here. You should, at a minimum, list your data source.