iMDB Ratings


Elegant Evee
Diya Bansal, Sarah Young, Cyrus Irani, Tairan Zhang


May 5, 2023

Introduce the topic and motivation

Our project explores relationships among variables for the 100 most popular movies (from iMDB) in each year from 2003 to 2022. Our research questions are:

  1. Which release months see the greatest profits?
  2. How strongly are a film’s release year and income related to its iMDB rating?

Introduce the data

  • 100 most popular movies for each year from 2003-2022

  • 1,989 rows and 13 columns (11 rows fewer than the 2,000 expected from 100 films × 20 years)

  • Movie title, iMDB rating, year of release, month of release, budget, income, etc.

  • Each row represents a unique film that has all the above data on iMDB

Q1 – Highlights from EDA

Q1 – Inference/modeling/other analysis

# A tibble: 1 × 1
1       0

\[ p\text{-value} = 0 < 0.05 = \alpha \]

We reject the null hypothesis in favor of the alternative hypothesis.

Therefore, the data provide evidence that there is a difference in mean profit between favorable and unfavorable release months.
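To make the logic of this test concrete, here is a minimal Python sketch of a two-sided permutation test for a difference in means. It is not the code behind the slide: the function name `permutation_p_value` and the profit values are made up for illustration, and the sketch assumes the test was simulation-based.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, reps=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / reps


# Hypothetical profits in millions of USD (NOT the project's data).
favorable = [120, 250, 90, 310, 180, 220, 150, 400]
unfavorable = [40, 60, 35, 80, 55, 20, 75, 50]

p_value = permutation_p_value(favorable, unfavorable)
print(p_value < 0.05)
```

In a simulation like this, a reported p-value of exactly 0 means that none of the shuffled differences was as extreme as the observed one, so it is better read as "smaller than 1/reps" than as literally zero.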

Q1 – Inference/modeling/other analysis

# A tibble: 1 × 2
   lower_ci   upper_ci
      <dbl>      <dbl>
1 16356012. 135972444.

We are 95% confident that the mean profit of movies released in favorable months is between about 16.4 million USD and 136.0 million USD higher than the mean profit of movies released in unfavorable months.

If we repeated the sampling and interval construction many times, about 95% of the resulting intervals would contain the true difference in mean profit.
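One common way to obtain an interval of this kind is the percentile bootstrap. The sketch below is an illustration under that assumption, not the authors' code: `bootstrap_diff_ci` is a hypothetical helper and the profit values are invented.

```python
import random
from statistics import mean


def bootstrap_diff_ci(group_a, group_b, level=0.95, reps=5000, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[int(tail * reps)]
    upper = diffs[int((1 - tail) * reps) - 1]
    return lower, upper


# Hypothetical profits in millions of USD (NOT the project's data).
favorable = [120, 250, 90, 310, 180, 220, 150, 400]
unfavorable = [40, 60, 35, 80, 55, 20, 75, 50]

lower_ci, upper_ci = bootstrap_diff_ci(favorable, unfavorable)
print(lower_ci, upper_ci)
```

An interval that lies entirely above 0, as the one on the slide does, is consistent with rejecting the null hypothesis of no difference.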

Q1 – Conclusions/Future Work

Observation: Movies released in favorable months (May, June, July, December) have a higher average profit than those released in unfavorable months (all other months).

Inference: We can expect movies released in May, June, July, or December to earn a higher profit, on average, than movies released in other months.

Support: Confidence interval and p-value (which showed us that there is a statistically significant difference in profits between favorable and unfavorable release months).

  • Small drop in profits around 2020 (plausibly attributable to COVID-19)
  • Future work could analyze profits by release year in more detail to find further trends and explore the causes of these dips

Q2 – Highlights from EDA

Q2 – Inference/modeling/other analysis

# A tibble: 3 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept) -8.44e- 1  7.30e+ 0    -0.116 9.08e- 1
2 year         3.70e- 3  3.63e- 3     1.02  3.09e- 1
3 income       5.84e-10  7.34e-11     7.96  3.09e-15

\[ \begin{split} \widehat{\text{Rating}} = {} & -8.440238 \times 10^{-1} + 3.696143 \times 10^{-3} \times \text{Year} \\ & + 5.842187 \times 10^{-10} \times \text{Income} \end{split} \]

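  • When year and income are both 0, the expected rating is -0.844. This intercept is an extrapolation far outside the data (2003 to 2022) and has no practical meaning.

  • Holding income constant, expected rating increases by 0.0037 for each additional year (not statistically significant, p = 0.309). Holding year constant, it increases by 5.84 × 10^-10 for each additional dollar of income, or about 0.058 per additional $100 million (p < 0.001).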
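As a worked example of reading the fitted equation, the following Python sketch plugs in the coefficients from the table above. The function name `predicted_rating` and the example film (released in 2015 with $100 million in income) are hypothetical.

```python
# Coefficients copied from the regression table above.
INTERCEPT = -8.440238e-1
BETA_YEAR = 3.696143e-3
BETA_INCOME = 5.842187e-10


def predicted_rating(year, income):
    """Predicted iMDB rating from release year and income in USD."""
    return INTERCEPT + BETA_YEAR * year + BETA_INCOME * income


# Hypothetical film: released in 2015, earning $100 million.
base = predicted_rating(2015, 100_000_000)
# Same film with an extra $100 million of income.
more_income = predicted_rating(2015, 200_000_000)

print(round(base, 2))                # about 6.66
print(round(more_income - base, 3))  # about 0.058
```

Although the income effect is highly significant, doubling this film's income moves the predicted rating by less than 0.06 points, so the effect is small in practical terms.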

Q2 – Inference/modeling/other analysis

# A tibble: 1 × 1
1 0.0317
# A tibble: 1 × 1
1 0.183
  • Correlation between release year and iMDB rating is negligible (r ≈ 0.03), and income has a weak positive relationship with rating (r ≈ 0.18).
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1   0.0938    0.256
  • Confidence interval for the difference in mean iMDB rating between blockbuster movies (income of at least $100,000,000) and non-blockbuster movies: about 0.09 to 0.26 rating points.
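For reference, the correlation coefficients reported on this slide are Pearson's r. The sketch below shows how r is computed; `pearson_r` is a hypothetical helper and the incomes and ratings are invented, not taken from the project data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical incomes (millions of USD) and ratings (NOT the project's data).
incomes = [50, 120, 300, 80, 450, 200]
ratings = [6.1, 6.8, 7.0, 6.5, 7.4, 6.3]

r = pearson_r(incomes, ratings)
print(round(r, 3), round(r ** 2, 3))
```

Squaring r gives the share of variance explained by a simple linear fit. For the reported income correlation, 0.183² ≈ 0.033, so income alone accounts for only about 3% of the variation in rating.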

Q2 – Conclusions/Future Work

Observation: Movies with high box office incomes tend to earn higher iMDB ratings.

Inference: We can expect movies earning at least $100,000,000 at the box office to have higher iMDB ratings, on average, than those earning less than $100,000,000.

Support: Confidence interval for the difference in mean rating between blockbusters and non-blockbusters, which lies entirely above 0 (0.094 to 0.256).

Could explore further by . . .

  • Comparing results to Metacritic scoring (different methodology)

  • Comparing budget to rating as a measure of care put into a movie


