Project proposal

Author

Giving-stan

library(tidyverse)

Dataset

Rows: 630
Columns: 10
$ tconst          <chr> "tt0016123", "tt0023236", "tt0031208", "tt0033879", "t…
$ title_type      <chr> "movie", "movie", "movie", "movie", "movie", "movie", …
$ primary_title   <chr> "The Monster", "The Monster Walks", "The Human Monster…
$ original_title  <chr> "The Monster", "The Monster Walks", "The Dark Eyes of …
$ year            <dbl> 1925, 1932, 1939, 1941, 1941, 1942, 1942, 1942, 1944, …
$ runtime_minutes <dbl> 86, 57, 73, 59, 65, 77, 73, 63, 86, 62, 61, 295, 201, …
$ genres          <chr> "Comedy,Horror,Mystery", "Horror,Mystery", "Crime,Dram…
$ simple_title    <chr> "the monster", "the monster walks", "the human monster…
$ average_rating  <dbl> 6.2, 4.1, 5.7, 6.1, 6.0, 3.5, 6.1, 6.1, 5.7, 4.8, 4.9,…
$ num_votes       <dbl> 1412, 1120, 1579, 1953, 799, 1969, 1938, 1588, 504, 12…
Rows: 1,291
Columns: 2
$ tconst <chr> "tt0016123", "tt0016123", "tt0016123", "tt0023236", "tt0023236"…
$ genres <chr> "Comedy", "Horror", "Mystery", "Horror", "Mystery", "Crime", "D…

Monster Movies Dataset (monster_movies.csv): This dataset contains information about monster movies: their IMDb identifiers, titles, release years, genres, runtimes, average ratings, and numbers of votes. It has 630 rows and 10 columns.

Monster Movie Genres Dataset (monster_movie_genres.csv): This dataset contains movie-genre relationships, with one row per movie-genre pair. It has 1,291 rows and 2 columns.
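Judging from the glimpse output above, the genres table looks like the long form of the comma-separated genres column in the main table. The sketch below illustrates that relationship on the first two movies shown in the output; it is an assumption about how the two files relate, not a description of how they were actually built.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Two rows copied from the glimpse() output above. The real analysis would
# use the full table read from monster_movies.csv.
monster_movies_sample <- tibble(
  tconst        = c("tt0016123", "tt0023236"),
  primary_title = c("The Monster", "The Monster Walks"),
  genres        = c("Comedy,Horror,Mystery", "Horror,Mystery")
)

# Split the comma-separated genres column into one row per movie-genre pair,
# which mirrors the layout of monster_movie_genres.csv.
genres_long <- monster_movies_sample %>%
  select(tconst, genres) %>%
  separate_rows(genres, sep = ",")

# Number of genres attached to each movie.
genre_counts <- genres_long %>%
  count(tconst, name = "n_genres")
```

Here `monster_movies_sample`, `genres_long`, and `genre_counts` are illustrative names. With the full files, joining the genres table back to the main table on `tconst` would give each movie's individual genres alongside its rating and vote count.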

We chose these datasets because of our shared interest in horror and monster films. We all enjoy watching movies, and we see this as an opportunity to study how the genre has evolved in terms of runtime, popularity, and themes.

Questions

How have monster movie trends evolved over time?

What factors influence a movie’s popularity and rating?

Analysis plan

What factors influence a movie’s popularity and rating?

  • Q: Do horror-comedy monster movies tend to have higher ratings than pure horror ones?
  • Variables: average_rating, num_votes, runtime_minutes, genres.
  • Variables to be created:
      • is_multigenre: Binary variable (1 = movie has multiple genres, 0 = single genre).
      • genre_dummy_variables: Create dummy variables for each genre to enable statistical comparisons.
      • rating_bins: Categorizes movies into rating groups (e.g., low, medium, high ratings).
  • External data to be merged: Budget and revenue data could be merged to assess if higher-budget monster movies tend to have better ratings.
      • Sources: The Numbers, TMDB API, or IMDb financial reports.
  • Visualization:
      • Boxplots or violin plots: Compare ratings across genres.
      • Scatter plots with trend lines: Show how budget and rating correlate.
      • Faceted bar charts: Display differences in average ratings across sub-genres.
  • Statistical Analysis:
      • Linear regression: Predict average_rating using budget, num_votes, and is_multigenre.
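The plan above could be sketched roughly as follows. This is a minimal illustration, not the final analysis:

  • Only the first two rows come from the glimpse output; the `tt_demo_*` rows are made up so the code runs on its own.
  • The rating cutoffs (4 and 7), the `horror_type` grouping, and the log transform of `num_votes` are assumptions.
  • Budget is left out of the model because it is not in the current data.

```r
library(dplyr)
library(stringr)
library(tibble)
library(ggplot2)

# Toy data: rows 1-2 are taken from the glimpse() output; the tt_demo_* rows
# are hypothetical placeholders, not real movies.
movies <- tibble(
  tconst         = c("tt0016123", "tt0023236", "tt_demo_1",
                     "tt_demo_2", "tt_demo_3", "tt_demo_4"),
  genres         = c("Comedy,Horror,Mystery", "Horror,Mystery", "Horror",
                     "Comedy,Horror", "Horror", "Comedy,Horror"),
  average_rating = c(6.2, 4.1, 5.0, 7.1, 3.8, 6.5),
  num_votes      = c(1412, 1120, 300, 5200, 150, 2400)
)

movies_features <- movies %>%
  mutate(
    # 1 if the genres string lists more than one genre, 0 otherwise.
    is_multigenre = as.integer(str_detect(genres, ",")),
    # Dummy variables for two genres; other genres would follow the same pattern.
    is_horror = as.integer(str_detect(genres, "Horror")),
    is_comedy = as.integer(str_detect(genres, "Comedy")),
    # Rating groups; the cutoffs 4 and 7 are assumed, not taken from the proposal.
    rating_bins = cut(average_rating,
                      breaks = c(0, 4, 7, 10),
                      labels = c("low", "medium", "high"),
                      include.lowest = TRUE),
    # Grouping for the horror-comedy vs. pure horror question (assumed definition).
    horror_type = case_when(
      is_horror == 1 & is_comedy == 1 ~ "horror-comedy",
      genres == "Horror"              ~ "pure horror",
      TRUE                            ~ "other"
    )
  )

# Mean rating per group, as a first descriptive comparison.
rating_by_type <- movies_features %>%
  group_by(horror_type) %>%
  summarise(mean_rating = mean(average_rating), n = n(), .groups = "drop")

# Boxplot of ratings across the groups.
rating_plot <- ggplot(movies_features, aes(x = horror_type, y = average_rating)) +
  geom_boxplot() +
  labs(x = "Group", y = "Average rating")

# Linear regression without budget, since budget is not in the current data.
fit <- lm(average_rating ~ log10(num_votes) + is_multigenre,
          data = movies_features)
```

Calling `summary(fit)` would show the coefficient estimates. Once budget data is merged in, a budget term could be added to the model formula in the same way.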