library(tidyverse)
Project proposal
Dataset
Rows: 630
Columns: 10
$ tconst <chr> "tt0016123", "tt0023236", "tt0031208", "tt0033879", "t…
$ title_type <chr> "movie", "movie", "movie", "movie", "movie", "movie", …
$ primary_title <chr> "The Monster", "The Monster Walks", "The Human Monster…
$ original_title <chr> "The Monster", "The Monster Walks", "The Dark Eyes of …
$ year <dbl> 1925, 1932, 1939, 1941, 1941, 1942, 1942, 1942, 1944, …
$ runtime_minutes <dbl> 86, 57, 73, 59, 65, 77, 73, 63, 86, 62, 61, 295, 201, …
$ genres <chr> "Comedy,Horror,Mystery", "Horror,Mystery", "Crime,Dram…
$ simple_title <chr> "the monster", "the monster walks", "the human monster…
$ average_rating <dbl> 6.2, 4.1, 5.7, 6.1, 6.0, 3.5, 6.1, 6.1, 5.7, 4.8, 4.9,…
$ num_votes <dbl> 1412, 1120, 1579, 1953, 799, 1969, 1938, 1588, 504, 12…
Rows: 1,291
Columns: 2
$ tconst <chr> "tt0016123", "tt0016123", "tt0016123", "tt0023236", "tt0023236"…
$ genres <chr> "Comedy", "Horror", "Mystery", "Horror", "Mystery", "Crime", "D…
Monster Movies Dataset (monster_movies.csv): This dataset contains information about monster movies, including their IMDb identifiers, titles, release years, genres, runtimes, ratings, and numbers of votes. The dataset has 630 rows and 10 columns.
Monster Movie Genres Dataset (monster_movie_genres.csv): This dataset contains movie-genre relationships, with one row per movie-genre pair, so a movie with three genres appears three times. The dataset has 1,291 rows and 2 columns.
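The relationship between the two files can be illustrated with the first two movies from the glimpse output above. This is a minimal sketch, assuming the genres file corresponds to splitting the comma-separated genres column of the movies file; the proposal does not state how monster_movie_genres.csv was actually built, and the object names are illustrative.

```r
library(tidyverse)

# First two movies from the glimpse output above (only the columns needed here).
movies_sample <- tibble(
  tconst = c("tt0016123", "tt0023236"),
  genres = c("Comedy,Horror,Mystery", "Horror,Mystery")
)

# Split the comma-separated genres into one row per movie-genre pair.
# separate_longer_delim() requires tidyr >= 1.3.0.
genres_long <- movies_sample %>%
  separate_longer_delim(genres, delim = ",")

genres_long
```

For these two movies, the result has five rows, matching the first five rows shown for the genres file.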
We chose this dataset because of our shared interest in horror and monster films. We all enjoy watching movies, and we see this as an opportunity to study how the genre has evolved in terms of runtime, popularity, and themes.
Questions
How have monster movie trends evolved over time?
What factors influence a movie’s popularity and rating?
Analysis plan
How have monster movie trends evolved over time?
- Q: Are modern monster movies rated higher or lower than classic ones?
- Variables: year, primary_title, average_rating, runtime_minutes, genres.
- Variables to be created:
  - decade: Groups movies by decade to track long-term trends.
  - genre_counts_per_decade: Measures how frequently each genre appears over time.
- External data to be merged: Box office revenue data could be incorporated to analyze financial success trends over time.
- Sources: IMDb, Box Office Mojo, TMDB API, or Kaggle datasets.
- Create a decade column from the year variable, and group data by decade to visualize trends using line charts and bar graphs.
- Visualization:
- Faceted line charts: Show trends in ratings over different decades.
- Stacked bar charts: Show how genre popularity has changed over time.
- Statistical Analysis:
- t-tests: Compare average ratings between different decades.
- Linear regression: Investigate if release year significantly influences ratings, controlling for genre.
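The decade and genre_counts_per_decade variables described in this plan could be derived roughly as follows. This is a sketch, not the final analysis code: it uses only the first four movies and the first five genre rows visible in the glimpse output, and the object names (movies, movie_genres, rating_by_decade) and the n_movies and mean_rating columns are illustrative assumptions.

```r
library(tidyverse)

# First four movies from the glimpse output (subset of columns).
movies <- tibble(
  tconst = c("tt0016123", "tt0023236", "tt0031208", "tt0033879"),
  year = c(1925, 1932, 1939, 1941),
  average_rating = c(6.2, 4.1, 5.7, 6.1)
)

# Genre rows for the first two movies, as shown for monster_movie_genres.csv.
movie_genres <- tibble(
  tconst = c("tt0016123", "tt0016123", "tt0016123", "tt0023236", "tt0023236"),
  genres = c("Comedy", "Horror", "Mystery", "Horror", "Mystery")
)

# decade: floor each year to the start of its decade (1925 -> 1920).
movies <- movies %>%
  mutate(decade = floor(year / 10) * 10)

# Mean rating per decade, the quantity a ratings-over-time chart would plot.
rating_by_decade <- movies %>%
  group_by(decade) %>%
  summarise(
    mean_rating = mean(average_rating),
    n_movies = n(),
    .groups = "drop"
  )

# genre_counts_per_decade: attach decades to genre rows, then count.
# inner_join() keeps only genre rows whose movie is present in `movies`.
genre_counts_per_decade <- movie_genres %>%
  inner_join(movies, by = "tconst") %>%
  count(decade, genres, name = "n_movies")
```

On the full data, the regression in the plan could then be fit with lm(), for example with average_rating as the response and year plus genre indicators as predictors.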
What factors influence a movie’s popularity and rating?
- Q: Do horror-comedy monster movies tend to have higher ratings than pure horror ones?
- Variables: average_rating, num_votes, runtime_minutes, genres.
- Variables to be created:
  - is_multigenre: Binary variable (1 = movie has multiple genres, 0 = single genre).
  - genre_dummy_variables: Create dummy variables for each genre to enable statistical comparisons.
  - rating_bins: Categorizes movies into rating groups (e.g., low, medium, high ratings).
- External data to be merged: Budget and revenue data could be merged to assess if higher-budget monster movies tend to have better ratings.
- Sources: The Numbers, TMDB API, or IMDb financial reports.
- Visualization:
- Boxplots or violin plots: Compare ratings across genres.
- Scatter plots with trend lines: Show how budget and rating correlate.
- Faceted bar charts: Display differences in average ratings across sub-genres.
- Statistical Analysis:
- Linear regression: Predict average_rating using budget, num_votes, and is_multigenre.
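The derived variables in this plan could be built along the following lines. This is a sketch under stated assumptions: the first two rows come from the glimpse output, the third row is hypothetical (added only to show a single-genre case), the dummy column names is_horror and is_comedy are illustrative, and the rating cut points of 5 and 7 are placeholders, since the proposal does not define the bins.

```r
library(tidyverse)

movies <- tibble(
  # The third row is hypothetical, not from the dataset.
  tconst = c("tt0016123", "tt0023236", "tt_hypothetical"),
  genres = c("Comedy,Horror,Mystery", "Horror,Mystery", "Horror"),
  average_rating = c(6.2, 4.1, 7.5)
)

movies <- movies %>%
  mutate(
    # is_multigenre: 1 if the genres string lists more than one genre.
    is_multigenre = as.integer(str_detect(genres, ",")),
    # Two example genre dummies, enough for the horror-comedy comparison.
    is_horror = as.integer(str_detect(genres, "Horror")),
    is_comedy = as.integer(str_detect(genres, "Comedy")),
    # rating_bins: [0, 5) low, [5, 7) medium, [7, 10] high (assumed cut points).
    rating_bins = cut(
      average_rating,
      breaks = c(0, 5, 7, 10),
      labels = c("low", "medium", "high"),
      right = FALSE,
      include.lowest = TRUE
    )
  )

movies
```

Once budget data has been merged, the regression in the last bullet could be fit with lm(), using average_rating as the response and budget, num_votes, and is_multigenre as predictors.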