Valentine’s Day Spending Analysis

Author

Gold Kangaroo
Fiona Gao, Ethan Talreja, Claire Yun


Introduction

We obtained this Valentine’s Day consumer data from TidyTuesday. The data were originally downloaded from Suraj Das’s Kaggle dataset, which draws on National Retail Federation (NRF) surveys of U.S. adult consumers about their Valentine’s Day spending behavior, conducted over more than a decade. It includes consumer data on Valentine’s Day spending plans from 2010 to 2022. We looked for the original NRF data to add information for 2023 and 2024 but could not find it published, so we focus only on 2010 to 2022.

This dataset is split into 3 parts: data about historical spending in each year for 7 categories of gifts, data about the proportion of people who plan to buy those types of gifts by age, and data about the proportion of people who plan to buy those types of gifts by gender. The 7 categories are consistent across all 3 datasets: candy, flowers, jewelry, greeting cards, an evening out, clothing, and gift cards. The historical spending dataset includes variables for year, the percentage of people celebrating, the average amount a person plans to spend on Valentine’s Day that year, and the average amount someone plans to spend on each of the 7 categories. The age dataset includes variables for age range (18-24, 25-34, 35-44, 45-54, 55-64, and 65+), the percentage of people who are spending on Valentine’s Day, and the percentage of people who plan to spend on each of the 7 categories. The gender dataset is structured the same as the age dataset, but has a variable for gender (men or women) instead of age.

How Have Gifting Preferences Changed Over Time?

Introduction

Our first question focuses on the ways that Valentine’s Day spending has changed over time. To do this, we’re looking at the first of the three datasets, the one that focuses on historical spending. In particular, we’re looking at the ways that spending varies for each category of gift. We’re interested in seeing how spending has changed by category because our time frame includes years after 2020, meaning we can see if the COVID-19 pandemic has changed anything. We imagined that it would change spending on evenings out for obvious reasons, but we were curious if it would significantly affect any of the other categories.

Approach

To address the question, we look at historical spending per category in two different ways. First, we compare spending habits within each year using a segmented bar chart, because we want to see whether the share of spending that goes to each category changes from year to year. This may reveal whether a category has become more or less popular over the years.

Second, we are going to look at the overall trend over the years with a line chart to compare the actual amounts that people are spending per category. This is slightly different from seeing if the percentages vary because, as we mentioned, we are also interested in seeing how the pandemic has affected spending habits. If, for example, all categories saw a similar decrease in spending then we may not see a big difference in the segmented bar chart since their relative proportions may not change much. However, in the line graph, which actually shows the average amount that people are spending, we would be able to see these trends of increases and decreases. Thus, we picked it to look at the actual magnitude of what people are spending and how that has varied over time.
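The distinction between the two charts can be illustrated with a minimal sketch. The numbers below are hypothetical and are not taken from the NRF data; the second year is simply the first year scaled down by 20% in every category. The sketch uses base R only so that it runs without additional packages.

```r
# Hypothetical per-person spending (dollars) for three categories in two years.
# The 2021 values are the 2020 values reduced by 20% across the board.
spend <- data.frame(
  year     = rep(c(2020, 2021), each = 3),
  category = rep(c("Jewelry", "Evening Out", "Candy"), times = 2),
  amount   = c(40, 30, 15, 32, 24, 12)
)

# Share of each category within its year: this is what a bar drawn with
# position = "fill" displays.
spend$share <- spend$amount / ave(spend$amount, spend$year, FUN = sum)

shares_2020 <- spend$share[spend$year == 2020]
shares_2021 <- spend$share[spend$year == 2021]

# The within-year shares are identical, so the segmented bars look the same.
all.equal(shares_2020, shares_2021)

# The raw amounts, which a line chart shows, fell in every category.
change <- spend$amount[spend$year == 2021] - spend$amount[spend$year == 2020]
change
```

In this toy case the segmented bar chart would show no change at all, while the line chart would show a drop in every category, which is why both views are useful.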

Analysis

theme_valentine <- function(
    base_size = 16,
    base_family = "roboto",
    base_line_size = base_size / 22,
    base_rect_size = base_size / 22) {
  theme_minimal(
    base_family = base_family,
    base_size = base_size,
    base_rect_size = base_rect_size,
    base_line_size = base_line_size
  ) +
    theme(
      plot.title.position = "plot",
      plot.background = element_rect(fill = "#FFF6F6"),
      panel.grid.major.x = element_blank(),
      panel.grid.minor.x = element_blank(),
      panel.grid.major.y = element_line(colour = "#3A1823", linewidth = rel(0.2)),
      panel.grid.minor.y = element_blank(),
      axis.text = element_text(size = 14),
      legend.text = element_text(size = 12),
      plot.caption = element_text(hjust = 1, face = "italic"),
      plot.caption.position = "plot",
      plot.title = element_markdown(face = "bold", size = rel(1.6)),
      plot.subtitle = element_markdown(size = rel(1.3))
    )
}
# clean data
hs_new <- historical_spending |>
  pivot_longer(
    cols = 4:10,
    names_to = "category",
    values_to = "amount"
  ) |>
  rename_with(tolower) |>
  mutate(
    category = factor(category,
      levels = c(
        "Jewelry", "Flowers", "EveningOut",
        "GiftCards", "Clothing",
        "GreetingCards", "Candy"
      ),
      labels = c(
        "Jewelry", "Flowers", "Evening Out",
        "Gift Cards", "Clothing",
        "Greeting Cards", "Candy"
      )
    )
  )

# new color palette
vday <- c(
  "Jewelry" = "#ff595e",
  "Flowers" = "#ffca3a",
  "Evening Out" = "#8ac926",
  "Gift Cards" = "#000000",
  "Clothing" = "#1982c4",
  "Greeting Cards" = "#6a4c93",
  "Candy" = "#FFA500"
)

# created fill mapped bar chart
ggplot(
  data = hs_new,
  mapping = aes(x = year, y = amount, fill = category, group = year)
) +
  geom_col(position = "fill") +
  scale_fill_manual(values = vday) +
  scale_y_continuous(labels = label_percent()) +
  scale_x_continuous(breaks = seq(2010, 2022, 2)) +
  labs(
    title = "How do spending habits differ by year and category?",
    subtitle = "Overall, spending distribution habits tend to be <span style='color: #e04964'>**constant**</span> on average",
    x = NULL,
    y = NULL,
    fill = "Spending Category",
    caption = "Source: National Retail Federation"
  ) +
  theme_valentine()

# define categories for labels
down <- c("Jewelry", "Evening Out")
up <- c("Candy", "Gift Cards", "Flowers", "Clothing", "Greeting Cards")

hs_new |>
  mutate(
    category = fct_relevel(
      .f = category, "Jewelry", "Evening Out", "Candy",
      "Gift Cards", "Flowers", "Clothing", "Greeting Cards"
    )
  ) |>
  ggplot(
    mapping = aes(x = year, y = amount, color = category)
  ) +
  geom_line() +
  geom_point() +
  # fontface (not face) is the parameter that sets bold text in label geoms
  geom_label_repel(
    data = filter(hs_new, year == 2021, category %in% up),
    aes(label = dollar(amount)), color = "black", fill = "white",
    family = "roboto", size = 4, fontface = "bold", nudge_y = 8
  ) +
  geom_label_repel(
    data = filter(hs_new, year == 2021, category %in% down),
    aes(label = dollar(amount)), color = "black", fill = "white",
    family = "roboto", size = 4, fontface = "bold", nudge_y = -8
  ) +
  facet_wrap(~category, ncol = 4) +
  scale_color_manual(values = vday) +
  scale_x_continuous(breaks = seq(2012, 2022, 3)) +
  scale_y_continuous(labels = label_dollar()) +
  labs(
    title = "How did COVID-19 Impact Average Spending on Valentine's Day 2021?*",
    subtitle = "Spending on <span style='color: #ff595e'>**jewelry**</span> and <span style='color: #8ac926'>**evenings out**</span> drastically decreased while spending on <span style='color: #000000'>**gift cards**</span> increased.",
    x = NULL,
    y = NULL,
    color = "Spending Category",
    caption = "Source: National Retail Federation\n*Average Spending is Per Person"
  ) +
  theme_valentine() +
  theme(
    legend.position = "none",
    strip.text.x = element_text(
      family = "roboto",
      size = 14, color = "black", face = "bold"
    )
  )

Discussion

The bar chart shows that the distribution of Valentine’s Day spending across categories stayed largely constant between 2010 and 2022. The most popular categories across the years appear to be jewelry and evening out, which have relatively higher proportions than the other categories. Conversely, greeting cards and gift cards seem to be the categories that people spent the least on, on average, throughout the years. The relative proportions of these categories reveal spending preferences on Valentine’s Day. It makes sense that jewelry and evening out would be popular categories because those are the categories most commonly marketed to consumers by restaurants and jewelry companies in the months and days leading up to Valentine’s Day. Jewelry is also traditionally a gift closely associated with love, as seen in the wedding and engagement ring industry. Additionally, gift cards and greeting cards are, respectively, “thoughtless” and low-cost gifts, which could explain why the relative proportion of spending on them is so low.

The faceted line chart of spending trends by category shows an overall upward trend in average spending per person between 2010 and 2022. However, the most interesting thing about this visualization is the dip in spending across all categories except gift cards in 2021. Valentine’s Day 2021 was the first Valentine’s Day after the start of the COVID-19 pandemic, and at the time many parts of the world were practicing social distancing. Besides being socially isolated from loved ones, many people saw their employment and financial situations affected by the pandemic. A decrease in financial stability is a potential reason why average spending per person decreased in the wake of the pandemic, especially for high-value categories like jewelry, which saw a drastic decrease from around $40 to $30.71. Another category that decreased drastically was evening out, which fell from around $30 to $21.39. Potential reasons for this drop include the closure of public spaces under social distancing guidelines and the fact that many couples were isolating away from each other, preventing them from spending Valentine’s Day physically together. This could also explain why average spending per person on gift cards increased: gift cards are easy to give virtually and can be used to buy things without leaving the house.
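To put the two largest drops on a comparable footing, the following sketch converts the figures quoted above into percentage changes. The "before" values are the rounded "around $40" and "around $30" from the text rather than exact values from the dataset, so the results are rough estimates only.

```r
# Per-person dollar amounts quoted in the discussion.
# "before" uses the approximate figures from the text (assumption: rounded values).
before <- c(Jewelry = 40, "Evening Out" = 30)
after  <- c(Jewelry = 30.71, "Evening Out" = 21.39)

# Percentage change relative to the earlier value.
pct_change <- (after - before) / before * 100
round(pct_change, 1)
```

Under these rounded inputs, jewelry fell by roughly 23% and evening out by roughly 29%, so the relative drop was somewhat larger for evening out even though the dollar drop was larger for jewelry.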

How Does Spending Vary by Age and Gender?

Introduction

The second question is “How Does Spending Vary by Age and Gender?” When considering the Valentine’s Day dataset, we were interested in understanding patterns around age and gender. Our general hypothesis was that spending patterns would differ across age groups and between genders (men and women). This makes sense, as different age groups and genders face different social expectations that may lead to different spending habits for Valentine’s Day. In addition, our first question focused on the amount spent per person rather than the relative number of people who plan to spend on each of the 7 categories. Something like jewelry might be the most expensive, but that doesn’t necessarily mean everyone plans to spend on jewelry.

To analyze this question, we group the data by age and by gender. We also group the data by category so that we can compare spending habits across categories as well as across age and gender groups.

Approach

To analyze spending habits across age groups and categories, we used a stacked bar graph to show how purchase plans are distributed across the categories within each age group. Because the age data record the percentage of people who plan to buy each category rather than dollar amounts, each segment shows a category’s share of all planned purchases in that age group. The x-axis shows the age groups, and each segment within a “stack” represents a category. A stacked bar graph suits data grouped by both category and age group because the audience can easily see how the distribution changes between age groups.
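As a rough illustration of how the segments of such a normalized bar, and the positions of labels centred on them, can be derived, here is a minimal base R sketch. The percentages are hypothetical and do not come from the dataset, and the variable names are chosen for illustration only.

```r
# Hypothetical percentages of one age group planning to buy each category,
# listed bottom-to-top in the order the segments are stacked.
pct <- c(Jewelry = 30, "Greeting Cards" = 30, "Gift Cards" = 20, Candy = 20)

total      <- sum(pct)
share      <- pct / total            # height of each segment in a filled bar
upper_edge <- cumsum(pct) / total    # top of each segment on the 0-1 axis
lower_edge <- upper_edge - share     # bottom of each segment
midpoint   <- (lower_edge + upper_edge) / 2  # where a centred label would sit

round(rbind(share, lower_edge, midpoint, upper_edge), 3)
```

The shares sum to 1 regardless of the raw totals, which is what allows age groups with different overall purchase rates to be compared side by side.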

The second graph analyzes spending habits by gender and category. Since gender has only 2 categories in this dataset, we chose a dumbbell graph, which lets us compare the two genders directly within each category. In our graph, the y-axis shows the categories and the x-axis shows the proportion of men or women who plan to spend money on each category. We highlight the difference between the proportions so that the audience can easily see the gaps and the interesting patterns that emerge.

Analysis

# clean data
new_age <- gifts_age |>
  pivot_longer(
    cols = 3:9,
    names_to = "category",
    values_to = "amount"
  ) |>
  rename_with(tolower) |>
  group_by(age)

# creating specific colors for categories
colors <- c(
  "Candy" = "#ffd9e5",
  "Clothing" = "#ffd9f4",
  "Evening Out" = "#ffd9fc",
  "Flowers" = "#f5d6ff",
  "Gift Cards" = "#e4cbf7",
  "Greeting Cards" = "#963D5A",
  "Jewelry" = "#e04964"
)

# calculate percentages and cumulative sums labels
new_age <- new_age %>%
  group_by(age) %>%
  mutate(total_amount = sum(amount)) %>%
  ungroup() %>%
  mutate(percentage = amount / total_amount) %>%
  arrange(age, desc(category)) %>%
  group_by(age) %>%
  mutate(cumulative = cumsum(amount)) %>%
  ungroup() %>%
  mutate(
    category = factor(
      category,
      levels = c(
        "Candy", "Clothing", "EveningOut", "Flowers",
        "GiftCards", "GreetingCards", "Jewelry"
      ),
      labels = c(
        "Candy", "Clothing", "Evening Out", "Flowers",
        "Gift Cards", "Greeting Cards", "Jewelry"
      )
    ),
  )

# prepare label data
label_data <- new_age %>%
  filter(category %in% c("Greeting Cards", "Jewelry")) %>%
  group_by(age) %>%
  mutate(
    previous_cumulative = lag(cumulative, default = 0),
    midpoint = (cumulative + previous_cumulative) / 2 / total_amount,
    label = percent(percentage, accuracy = 0.1)
  ) %>%
  ungroup()


# create bar graph
ggplot(data = new_age, aes(x = age, y = amount, fill = category)) +
  geom_col(position = "fill", color = "black") +
  geom_label(
    data = label_data,
    aes(x = age, y = midpoint, label = label),
    vjust = 0.5,
    fill = "white",
    color = "black",
    size = 3
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_manual(values = colors) +
  labs(
    title = "How do spending habits differ by age group?",
    subtitle = "As age increases, spending on <span style='color: #e04964'>**jewelry**</span> decreases and <span style='color: #963D5A'>**greeting cards**</span> increases on average",
    x = "Age Group",
    y = NULL,
    fill = "Spending Category",
    caption = "Source: National Retail Federation"
  ) +
  theme_valentine()

# clean data
new_gender <- gifts_gender |>
  pivot_longer(
    cols = 3:9,
    names_to = "category",
    values_to = "percent"
  ) |>
  rename_with(tolower) |>
  mutate(
    highlight = category == "Flowers",
    percent_label = paste(percent, "%", sep = ""),
    heart = fontawesome("fa-heart"),
    category = factor(
      category,
      levels = c(
        "Jewelry", "Flowers", "EveningOut", "GiftCards",
        "Clothing", "GreetingCards", "Candy"
      ),
      labels = c(
        "Jewelry", "Flowers", "Evening Out", "Gift Cards",
        "Clothing", "Greeting Cards", "Candy"
      )
    ),
    category = fct_rev(category)
  ) |>
  group_by(gender)
gender_palette <- c("#963D5A", "#e04964")
# create dumbbell chart
ggplot(new_gender, aes(x = percent, y = category)) +
  geom_line(aes(group = category),
    color = "grey70", show.legend = FALSE,
    linewidth = 4, alpha = 0.5
  ) +
  annotate(
    geom = "segment",
    x = 19, xend = 56, y = "Flowers", yend = "Flowers",
    color = "#72BF64", alpha = 0.5, linewidth = 4
  ) +
  geom_text(aes(label = heart, color = gender),
    family = "fontawesome-webfont", size = 19, show.legend = FALSE
  ) +
  geom_text(aes(label = percent_label), color = "white", size = 6) +
  annotate(
    geom = "segment",
    x = 39, xend = 38, y = 5.5, yend = 5.8,
    arrow = arrow(angle = 15, length = unit(0.3, "lines")),
    color = "#963D5A"
  ) +
  annotate(
    geom = "text",
    x = 46.5, y = 5.5,
    label = "37% more men plan to buy flowers",
    color = "#963D5A", size = 6
  ) +
  # might get rid of these 2 labels
  annotate(
    geom = "label",
    x = 55, y = 7.2,
    label = "Higher proportion of men",
    color = "#963D5A", size = 5
  ) +
  annotate(
    geom = "label",
    x = 55, y = 4.2,
    label = "Higher proportion of women",
    color = "#e04964", size = 5
  ) +
  scale_color_manual(values = gender_palette) +
  theme_valentine() +
  scale_x_continuous(labels = label_percent(scale = 1)) +
  labs(
    title = "How do spending habits differ by gender?",
    subtitle = "Significantly fewer <span style='color: #e04964'>**women**</span> buy flowers, compared to <span style='color: #963D5A'>**men**</span>, on average",
    x = NULL,
    y = NULL,
    caption = "Source: National Retail Federation"
  ) +
  theme(
    axis.text.x = element_blank()
  )

Discussion

Looking at the bar chart of Valentine’s Day purchase plans by age group and category, the relative proportions stay mostly consistent between age groups for each category. Because the age data record the percentage of people planning to buy each category, the shares here are shares of planned purchases rather than of dollars spent. One thing to note is that as age increases, the share for jewelry decreases while the share for greeting cards increases on average. Greeting cards account for around 11.7% of planned purchases in the 18-24 age group but around 25.1% in the 65+ age group. Jewelry likewise accounts for 11.7% in the 18-24 age group, compared with just 4.6% in the 65+ age group. This trend could reflect a greater appreciation for less material expressions of love as we age. It could also reflect what each generation values in gift-giving based on their upbringing. Another possible reason why jewelry is less common in older age groups is that people have simply had the opportunity to give jewelry in earlier years, lowering the utility of giving it in later years.

The dumbbell chart reveals differences between men and women in Valentine’s Day spending plans. For instance, 56% of men plan to buy flowers for Valentine’s Day as opposed to 19% of women, a gap of 37 percentage points. More generally, for categories like jewelry, flowers, and evening out, more men plan to spend than women, while for categories like clothing, greeting cards, and candy, more women plan to spend than men. This could reflect traditional gender roles, especially in heterosexual couples, where men are expected to pay for meals and women often take on caretaker roles. Gender stereotypes may also play into gifting patterns, as men traditionally tend not to wear jewelry and receiving flowers is often seen as at odds with traditional standards of masculinity.
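The flowers figures quoted above (56% of men and 19% of women) can be compared in two ways, and the short sketch below shows both. The difference of 37 is an absolute gap in percentage points, which is distinct from a relative difference.

```r
# Percentages of each gender planning to buy flowers, as quoted in the text.
men   <- 56
women <- 19

# Absolute gap, in percentage points.
gap_points <- men - women

# Relative comparison: how many times as likely men are to plan to buy flowers.
ratio <- men / women

gap_points
round(ratio, 2)
```

The absolute gap is 37 percentage points, while in relative terms men are nearly three times as likely as women to plan to buy flowers.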

Presentation

Our presentation can be found here.

Data

Suraj Das. (2022). Happy Valentine’s Day 2022. Kaggle.com. https://www.kaggle.com/datasets/infinator/happy-valentines-day-2022/data

References

Suraj Das. (2022). Happy Valentine’s Day 2022. Kaggle.com. https://www.kaggle.com/datasets/infinator/happy-valentines-day-2022/data