# Load packages used throughout (dplyr, tidyr, ggplot2, forcats)
library(tidyverse)
# Create 'decade' column
monster_movies <- monster_movies %>%
  mutate(decade = (year %/% 10) * 10)
# Merge genre data with monster_movies to include decade column
monster_movie_genres <- movie_genres %>%
  left_join(monster_movies %>% select(tconst, decade), by = "tconst") %>%
  drop_na(genres)
Project title
Introduction
For this project, we use the Monster Movie dataset, which aggregates information about monster-related movies. The data come from TidyTuesday and include fields such as film title, release year, IMDb rating, budget, box office receipts, and more. The dataset is also categorized by type of monster and type of movie, which lets us study how monster movies have evolved over time. The dataset is rich, but it has drawbacks. For example, it may cover only a subset of monster movies, and box office information for individual movies may be missing or incomplete. In addition, IMDb ratings reflect the particular audience segments that vote, and older films are sometimes rated higher because of nostalgia.
The main objective of the project is to examine how monster movies have evolved over the years, to identify the genres viewers prefer, and to determine whether financial factors affect a film's rating. Using statistical summaries and visualizations, we aim to uncover the patterns and factors that have shaped the development of monster films.
Question 1: What is the trend of IMDb ratings across decades?
Introduction
Understanding how audience reception of a film genre has changed is essential. We look at IMDb ratings across decades to see whether audiences have enjoyed monster movies more or less over time. This analysis could show whether high ratings for monster movies are due to nostalgia or whether they result from advances in modern filmmaking.
For question 1, we aim to determine the overall trend of monster movie IMDb ratings over time. Specifically, we examine how ratings rise or fall across decades and consider the factors that might explain these trends.
Approach
To answer this question, we use two types of visualizations: a line chart and a stacked bar chart. First, the line chart shows the average IMDb rating of monster movies in each decade. Line charts are well suited to this kind of analysis because they highlight trends over time, allowing us to see whether ratings have increased, decreased, or remained stable. Plotting one point per decade and connecting the points with a line shows the changing patterns and fluctuations in movie ratings.
In addition, a stacked bar chart lets us compare the relative proportions of monster movie genres from decade to decade. This helps us judge whether changes in ratings are related to changes in genre popularity. We use color mapping to distinguish the genres and make the visualization more informative.
By combining these two visualizations, we can better understand how monster movie ratings have evolved and whether these trends are consistent with changes in genre preferences over time.
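As a rough illustration of the aggregation behind the line chart, the sketch below bins years into decades with integer division and averages ratings per decade. It uses base R and a small invented data frame (`toy`), so the values are hypothetical and are not taken from the dataset. It also reports the number of films per decade (`n_films`, a name introduced here), since an average based on very few films can swing widely.

```r
# Hypothetical toy data: NOT taken from the Monster Movie dataset
toy <- data.frame(
  year = c(1925, 1931, 1938, 1999, 2000, 2004),
  average_rating = c(7.0, 6.0, 5.0, 4.0, 6.5, 5.5)
)

# Integer division drops the last digit of the year;
# multiplying by 10 restores the first year of the decade
toy$decade <- (toy$year %/% 10) * 10

# Mean rating per decade
avg <- aggregate(average_rating ~ decade, data = toy, FUN = mean)
names(avg)[2] <- "avg_rating"

# Number of films per decade
n <- aggregate(average_rating ~ decade, data = toy, FUN = length)
names(n)[2] <- "n_films"

# One row per decade with both the average and the sample size
decade_summary <- merge(avg, n, by = "decade")
print(decade_summary)
```

Reading the average alongside the count makes it easier to tell a real shift in ratings from a decade that simply contains one or two films.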
Analysis
# Aggregate average rating per decade
decade_ratings <- monster_movies %>%
  group_by(decade) %>%
  summarise(avg_rating = mean(average_rating, na.rm = TRUE))
# Aggregate genre counts per decade
genre_counts_per_decade <- monster_movie_genres %>%
  group_by(decade, genres) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(decade, desc(count))
# Cornell color palette - accent colors
cornell_pal <- c("#006699", "#6EB43F", "#F8981D", "#EF4035", "#073949")
# Line chart: Ratings trend across decades.
ggplot(decade_ratings, aes(x = decade, y = avg_rating)) +
  # linewidth replaces the deprecated size argument for lines
  geom_line(color = cornell_pal[1], linewidth = 1) +
  geom_point(color = cornell_pal[3], size = 2) +
  labs(title = "Monster Movie Trends Over Time",
       subtitle = "Are modern monster movies rated higher or lower than classic ones?",
       x = "Decade",
       y = "Average IMDb Rating",
       caption = "Source: the Internet Movie Database") +
  scale_x_continuous(breaks = seq(min(decade_ratings$decade), max(decade_ratings$decade), by = 10)) +
  theme_minimal(
    base_family = "sans",
    base_size = 12
  ) +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(hjust = 0.5, size = 16, color = cornell_pal[5]),
    plot.subtitle = element_text(hjust = 0.5, size = 12, color = cornell_pal[5]),
    axis.text = element_text(size = rel(1.0), color = "black"),
    axis.title = element_text(size = rel(1.0), color = "black"),
    legend.position = "bottom",
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(color = "grey", linewidth = 0.3),
    panel.grid.minor.y = element_line(color = "grey", linewidth = 0.15)
  )
# Stacked bar chart: Genre proportions by decade.
genre_counts_per_decade %>%
  group_by(decade) %>%
  mutate(prop = count / sum(count)) %>%
  filter(prop >= 0.1) %>% # Only show genres with at least 10% in each decade
  ggplot(aes(x = factor(decade), y = prop, fill = fct_reorder(genres, prop, .desc = TRUE))) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_viridis_d() +
  labs(
    title = "Monster Movie Trends Over Time",
    subtitle = "What are the dominant monster movie genres by decade?",
    x = "Decade",
    y = "Proportion of Movies",
    caption = "Source: the Internet Movie Database",
    fill = "Genre"
  ) +
  theme_minimal(
    base_family = "sans",
    base_size = 12
  ) +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(hjust = 0.5, size = 16, color = cornell_pal[5]),
    plot.subtitle = element_text(hjust = 0.5, size = 12, color = cornell_pal[5]),
    axis.text = element_text(size = rel(1.0), color = "black"),
    axis.title = element_text(size = rel(1.0), color = "black"),
    legend.position = "bottom",
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(color = "grey", linewidth = 0.3),
    panel.grid.minor.y = element_line(color = "grey", linewidth = 0.15)
  )
Discussion
Judging from the trend in IMDb ratings, average monster movie ratings fluctuate considerably from decade to decade. After peaking in the 1920s, ratings trended downward over the following decades, recovered only in the 1970s, and reached another high in the 2000s. This recovery may reflect advances in film technology, evolving storytelling techniques, and changing audience expectations. Ratings have declined again over the last decade, which may reflect the growing number of modern monster films, their uneven quality, and audiences' higher expectations for creative storytelling.
The genre distribution chart shows that early monster movies were mainly horror and mystery, while the share of science fiction and action movies increased markedly after the 1980s. This shift likely reflects changing audience preferences and advances in visual effects. In recent years, the share of fantasy has also increased. Together, these changes suggest that monster movies are shaped not only by technology and market demand but also by audiences' changing tastes.
Question 2: What factors influence a movie’s popularity and rating?
Introduction
We believe that a film's popularity and rating are influenced by several factors, such as audience interest, genre, and reach. In this analysis, we explore how these factors affect a film's overall performance. Specifically, we investigate whether films with many votes generally have higher ratings and whether ratings differ noticeably across genres.
To answer this question, we use two key variables in the dataset: the number of votes for a movie and its IMDb rating. The number of votes reflects how widely a film has been seen and discussed, while the rating shows how much viewers approve of the film's quality. We also analyze the effect of genre on ratings, since different genres may appeal to different audiences and so receive different ratings.
Approach
To study the factors that influence a movie’s popularity and rating, we use two visualization methods: scatter plots and bar plots.
The scatter plot shows the relationship between a movie’s number of votes and its IMDb rating. We use a logarithmic transformation on the X-axis to better represent movies of varying popularity. From this graph, we can tell if movies with high ratings are generally more popular or if certain movies have low ratings but high votes.
We use a bar chart of the average IMDb rating for each movie genre to analyze whether genre has a noticeable effect on the rating. Different genres may appeal to different audiences, so the distribution of ratings may also vary considerably. With these two plots, we hope to reveal which factors most influence a film's overall rating and whether popular films are necessarily rated higher.
Analysis
# Load the monster movie datasets
movies <- read.csv("data/monster_movies.csv")
genres <- read.csv("data/monster_movie_genres.csv")
# Rename the 'genres' column to 'genre' for consistency
genres <- genres %>% rename(genre = genres)
# Merge movie data with genre data to associate each movie with its genres
movies_genres <- merge(movies, genres, by = "tconst")
# Remove entries with missing or empty genre values
movies_genres <- movies_genres %>% filter(!is.na(genre) & genre != "")
# Scatter plot: Relationship between number of votes and IMDb rating
p2 <- ggplot(movies, aes(x = num_votes, y = average_rating)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", color = "red") +
  # scales:: prefix so the label helpers work without attaching the scales package
  scale_x_log10(labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_number()) +
  labs(
    title = "Number of Votes vs. Rating",
    x = "Number of Votes (log scale)",
    y = "Average Rating",
    caption = "Source: the Internet Movie Database"
  )
# Calculate the average IMDb rating for each movie genre
genre_avg_rating <- movies_genres %>%
  group_by(genre) %>% # Group data by genre
  summarise(avg_rating = mean(average_rating, na.rm = TRUE)) %>%
  arrange(desc(avg_rating))
# Bar chart: Average rating by movie genre
p3 <- ggplot(genre_avg_rating, aes(x = reorder(genre, avg_rating), y = avg_rating)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Average Rating by Genre",
    x = "Genre",
    y = "Average Rating",
    caption = "Source: the Internet Movie Database"
  ) +
  theme_minimal()
# Print the scatter plot for number of votes vs. rating
print(p2)
# Message: `geom_smooth()` using formula = 'y ~ x'
# Print the bar chart for average rating
p3
Discussion
The scatter plot shows a slight negative correlation between the number of votes and the IMDb rating: movies with more votes tend to have slightly lower ratings. One possible explanation is that more popular films reach a broader audience whose ratings are more stringent or more divided, which pulls the average down. In addition, some films have low ratings but many votes, perhaps because of strong commercial appeal or topicality; even with poor word of mouth, they still attract many viewers who watch and rate them.
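One way to put a number on the visual impression from the scatter plot is a correlation between log-transformed votes and rating. The sketch below uses base R's `cor.test()` on a hypothetical toy data frame (the values are invented for illustration, not drawn from the dataset). In the report's setting, the same call would presumably be applied to the `num_votes` and `average_rating` columns of `movies`.

```r
# Hypothetical toy data: NOT taken from the Monster Movie dataset
toy <- data.frame(
  num_votes = c(100, 1000, 10000, 100000, 1000000),
  average_rating = c(7.0, 6.5, 6.6, 6.0, 5.8)
)

# Votes span several orders of magnitude, so correlate on the log10 scale,
# matching the log-scaled x-axis of the scatter plot
ct <- cor.test(log10(toy$num_votes), toy$average_rating, method = "pearson")

# Pearson correlation coefficient and its p-value
r <- unname(ct$estimate)
print(r)
print(ct$p.value)
```

A coefficient near zero would support describing the relationship as slight, whatever its sign.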
The bar chart shows notable differences in average rating between genres. For example, music, biography, and documentary films generally receive high ratings, while science fiction, Western, and horror films receive low ratings. Highly rated genres may rely more on narrative quality, depth, and realism, while genres such as horror or science fiction are entertaining but may be rated lower because of formulaic plots and heavy reliance on special effects. In addition, some genres, such as horror, may appeal to a specific audience that rates on a more stringent or polarizing scale.
Taken together, these results suggest that a film's rating is influenced by several factors, including popularity (number of votes), audience, and the characteristics of the film's genre.
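Because the bar chart compares only averages, a formal test can help judge whether the genre differences exceed what chance alone would produce. The following is a minimal sketch, assuming a long table with one row per movie-genre pair like `movies_genres`. It uses base R's `kruskal.test()`, a rank-based test swapped in here for illustration and not a method used in the analysis above, on invented toy data. It also reports group sizes, because averages for genres with few films are unstable.

```r
# Hypothetical toy data: NOT taken from the Monster Movie dataset
toy <- data.frame(
  genre = rep(c("Documentary", "Horror", "Sci-Fi"), each = 4),
  average_rating = c(7.2, 6.8, 7.5, 7.0,
                     5.1, 4.8, 5.6, 5.3,
                     5.9, 5.2, 6.1, 5.5)
)

# Number of films and mean rating per genre
genre_n <- table(toy$genre)
genre_mean <- tapply(toy$average_rating, toy$genre, mean)
print(genre_n)
print(genre_mean)

# Kruskal-Wallis rank-sum test: do rating distributions differ across genres?
kw <- kruskal.test(average_rating ~ genre, data = toy)
print(kw)
```

One caveat for real data: a movie tagged with several genres appears in more than one group, so the groups are not fully independent and the p-value should be read as a rough guide.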
Presentation
Our presentation can be found here.
Data
TidyTuesday. (2024, October 29). Monster Movies Dataset [Data set]. GitHub. Retrieved March 6, 2025, from https://github.com/rfordatascience/tidytuesday/tree/main/data/2024/2024-10-29
References
TidyTuesday. (2024, October 29). Monster Movies Dataset [Data set]. GitHub. Retrieved March 6, 2025, from https://github.com/rfordatascience/tidytuesday/tree/main/data/2024/2024-10-29
IMDb. (2024). IMDb Monster Movies Dataset [Data set]. IMDb. Retrieved March 6, 2025, from https://www.imdb.com/interfaces/