Exploring the Age Factor
Substance Use, Abuse, and the Impact of Age on Patterns and Behaviors
Introduce the topic and motivation
As recreational drug use becomes more prevalent and accepted in the United States, particularly in the case of recreational cannabis use, it is important to understand the broader trends and the implications those trends may have for society.
Introduce the data
For our project, we used two different data sets:
- The Drugs data set from CORGIS was compiled by CORGIS as open-source data for anyone to use in analysis. It is drawn from the National Survey on Drug Use and Health (NSDUH).
- The State Marijuana Laws (2019) data set was compiled by Liam Muecke from data on Wikipedia and was recently updated to reflect the legal status of marijuana by state from 2002 to 2008. It includes recreational, legal, illegal, and decriminalized status as variables.
Highlights from EDA
- Mean rate of drug use in the past year, divided into 3 age groups (12-17, 18-25, 26+)
- Focus on 3 types of drugs: Alcohol, Illicit Drugs (Cocaine), and Marijuana
Inference/modeling/other analysis
               Df Sum Sq Mean Sq F value Pr(>F)
`Age Group`     2 114.65   57.32   12550 <2e-16 ***
Residuals    2598  11.87    0.00
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               Df Sum Sq Mean Sq F value Pr(>F)
`Age Group`     2  38.49  19.244    5602 <2e-16 ***
Residuals    2598   8.92   0.003
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               Df Sum Sq Mean Sq F value Pr(>F)
`Age Group`     2  9.315   4.657    3560 <2e-16 ***
Residuals    2598  3.398   0.001
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               Df Sum Sq Mean Sq F value Pr(>F)
`Age Group`     2 1.1438  0.5719    4432 <2e-16 ***
Residuals    2598 0.3353  0.0001
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Alcohol p-value: less than 2e-16
- Tobacco p-value: less than 2e-16
- Marijuana p-value: less than 2e-16
- Cocaine p-value: less than 2e-16
ANOVA tests are used to determine whether there are significant differences between the means of three or more groups. In this case, the groups are the age groups, and the variable of interest is the use rate. Since each p-value is less than our chosen significance level of 0.05, there is a significant relationship between age group and the use rate of each substance type.
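A one-way ANOVA of this kind can be sketched in R roughly as follows. This is a minimal illustration, not the code used for the slides: the data are simulated, and the names `use_rates`, `age_group`, and `use_rate` are assumptions made for the example.

```r
# Simulated stand-in data (NOT the NSDUH values); names are hypothetical.
set.seed(42)
use_rates <- data.frame(
  age_group = factor(rep(c("12-17", "18-25", "26+"), each = 50)),
  use_rate  = c(rnorm(50, mean = 0.10, sd = 0.02),
                rnorm(50, mean = 0.30, sd = 0.02),
                rnorm(50, mean = 0.15, sd = 0.02))
)

# One-way ANOVA: does mean use rate differ across age groups?
fit <- aov(use_rate ~ age_group, data = use_rates)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
print(anova_table)

# The first row is the age-group term; compare its p-value to 0.05.
p_value <- anova_table[["Pr(>F)"]][1]
is_significant <- p_value < 0.05
```

With three age groups the factor has 2 degrees of freedom, which matches the `Df` column in the tables above.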
Inference/modeling/other analysis
Call:
lm(formula = `Use Rate` ~ Legal_Status * `Age Group`, data = drugUse_legalStatus_joined)

Residuals:
     Min       1Q   Median       3Q      Max
-0.08990 -0.01361 -0.00310  0.01024  0.16173

Coefficients:
                                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                                  0.167322   0.001243 134.653  < 2e-16 ***
Legal_StatusMedical                          0.058988   0.002145  27.503  < 2e-16 ***
Legal_StatusDecriminalized                   0.015613   0.002810   5.557 3.03e-08 ***
Legal_StatusRecreational                     0.131145   0.004450  29.470  < 2e-16 ***
`Age Group`12-17                            -0.100493   0.001757 -57.185  < 2e-16 ***
`Age Group`26+                              -0.123360   0.001757 -70.198  < 2e-16 ***
Legal_StatusMedical:`Age Group`12-17        -0.041704   0.003033 -13.750  < 2e-16 ***
Legal_StatusDecriminalized:`Age Group`12-17 -0.010512   0.003974  -2.646  0.00821 **
Legal_StatusRecreational:`Age Group`12-17   -0.103296   0.006293 -16.414  < 2e-16 ***
Legal_StatusMedical:`Age Group`26+          -0.028477   0.003033  -9.389  < 2e-16 ***
Legal_StatusDecriminalized:`Age Group`26+   -0.009111   0.003974  -2.293  0.02193 *
Legal_StatusRecreational:`Age Group`26+     -0.038879   0.006293  -6.178 7.53e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02703 on 2589 degrees of freedom
Multiple R-squared: 0.8513, Adjusted R-squared: 0.8506
F-statistic: 1347 on 11 and 2589 DF, p-value: < 2.2e-16
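Because illegal status and the 18-25 age group do not appear among the coefficients, they act as the reference levels in this model. The sketch below shows one way such a model could be set up so that those levels are the baseline. It uses simulated data, and the names `sim`, `Use_Rate`, and `Age_Group` are assumptions for the example rather than the columns of `drugUse_legalStatus_joined`.

```r
# Simulated stand-in data; values and column names are hypothetical.
set.seed(1)
statuses <- c("Illegal", "Medical", "Decriminalized", "Recreational")
ages     <- c("18-25", "12-17", "26+")

sim <- expand.grid(Legal_Status = statuses, Age_Group = ages,
                   rep = 1:20, stringsAsFactors = FALSE)

# Made-up shifts, loosely shaped like the output above.
status_shift <- c(Illegal = 0, Medical = 0.06,
                  Decriminalized = 0.02, Recreational = 0.13)
age_shift    <- c("18-25" = 0, "12-17" = -0.10, "26+" = -0.12)

sim$Use_Rate <- 0.17 +
  unname(status_shift[sim$Legal_Status]) +
  unname(age_shift[sim$Age_Group]) +
  rnorm(nrow(sim), sd = 0.01)

# The first factor level becomes the reference category in lm().
sim$Legal_Status <- factor(sim$Legal_Status, levels = statuses)
sim$Age_Group    <- factor(sim$Age_Group, levels = ages)

# `*` expands to both main effects plus their interaction.
fit <- lm(Use_Rate ~ Legal_Status * Age_Group, data = sim)
coef_names <- names(coef(fit))
print(coef_names)
```

Setting the factor levels explicitly matters here: with default alphabetical ordering, "12-17" and "Decriminalized" would become the baselines instead.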
The difference in use rate between medical legalization and illegal status for the 12-17 age group is 0.041704 units lower than the difference for the 18-25 age group. This means that the effect of medical legalization on use rate is less pronounced for the 12-17 age group compared to the 18-25 age group.
The difference in use rate between decriminalized status and illegal status for the 12-17 age group is 0.010512 units lower than the difference for the 18-25 age group. This implies that the effect of decriminalized status on use rate is also less pronounced for the 12-17 age group compared to the 18-25 age group.
The difference in use rate between recreational legalization and illegal status for the 12-17 age group is 0.103296 units lower than the difference for the 18-25 age group. This indicates that the effect of recreational legalization on use rate is less pronounced for the 12-17 age group compared to the 18-25 age group.
The difference in use rate between medical legalization and illegal status for the 26+ age group is 0.028477 units lower than the difference for the 18-25 age group. This means that the effect of medical legalization on use rate is less pronounced for the 26+ age group compared to the 18-25 age group.
The difference in use rate between decriminalized status and illegal status for the 26+ age group is 0.009111 units lower than the difference for the 18-25 age group. This implies that the effect of decriminalized status on use rate is also less pronounced for the 26+ age group compared to the 18-25 age group.
The difference in use rate between recreational legalization and illegal status for the 26+ age group is 0.038879 units lower than the difference for the 18-25 age group. This indicates that the effect of recreational legalization on use rate is less pronounced for the 26+ age group compared to the 18-25 age group.
These interaction terms suggest that the effect of legalization on use rate is not the same across different age groups. For both the 12-17 age group and the 26+ age group, the effect of legalization (be it medical, decriminalized, or recreational) on use rate appears to be less pronounced compared to the 18-25 age group.
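To make the interaction terms concrete, the implied difference from illegal status within each age group is the main effect plus the matching interaction coefficient. The short sketch below does that arithmetic using the estimates copied from the regression output; the object names are chosen for this example only.

```r
# Main effects: difference from illegal status for the 18-25 reference group.
main_effects <- c(Medical = 0.058988,
                  Decriminalized = 0.015613,
                  Recreational = 0.131145)

# Interaction terms: how that difference changes for the other age groups.
interaction_terms <- rbind(
  "12-17" = c(Medical = -0.041704, Decriminalized = -0.010512,
              Recreational = -0.103296),
  "26+"   = c(Medical = -0.028477, Decriminalized = -0.009111,
              Recreational = -0.038879)
)

# Add the main effect to each column of the interaction matrix.
simple_effects <- rbind(
  "18-25" = main_effects,
  sweep(interaction_terms, 2, main_effects, "+")
)
print(round(simple_effects, 6))
```

Under these estimates, the recreational-versus-illegal difference is about 0.131 for ages 18-25, 0.028 for ages 12-17, and 0.092 for ages 26+. Every implied difference remains positive, so the smaller groups show a weaker association rather than a reversed one.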
Conclusions + future work
Future Work
- Future analysis could be done on how
Each ANOVA resulted in a p-value of less than 2e-16, essentially zero. A significant difference in use rate across age groups means that the average use rates of a given substance type are not the same for all age groups. In other words, the proportion of people using a specific substance within at least one age group is significantly different from the proportion of people using the same substance within another age group.
This finding implies that age is an important factor in the use of a particular substance type. It suggests that substance use behavior might be influenced by factors that vary across different age groups, such as social environment, peer influence, availability, or developmental stage.
We ran a second analysis using a linear regression model to determine whether legalization was associated with a higher use rate and whether any increase was uniform across age groups. The main effect coefficients for Legal_StatusMedical, Legal_StatusDecriminalized, and Legal_StatusRecreational show the differences in use rate relative to the reference category (illegal status) for the 18-25 age group, which is the reference age group. All of these coefficients are positive, which indicates that the use rate for marijuana is higher in states with medical, decriminalized, or recreational legalization than in states with illegal status for the 18-25 age group.
The interaction terms suggest that the effect of legalization on use rate is not the same across different age groups. For both the 12-17 age group and the 26+ age group, the effect of legalization (be it medical, decriminalized, or recreational) on use rate appears to be less pronounced compared to the 18-25 age group.
The results indicate that legalization (medical, decriminalized, or recreational) is associated with higher use rates, particularly for the 18-25 age group. However, this association is not uniform across all age groups, with the 12-17 age group showing a less pronounced increase in use rate under legalization.
First analysis findings: Age is an important factor in the use of a particular substance type, suggesting that substance use behavior might be influenced by factors that vary across different age groups, such as social environment, peer influence, availability, or developmental stage.
Second analysis findings: The results indicate that the use rate for marijuana is higher in states with medical, decriminalized, or recreational legalization than in states with illegal status for the 18-25 age group. However, this association is not uniform across all age groups, with the 12-17 age group showing a less pronounced increase in use rate under legalization.