Exploring the Age Factor:

Substance Use, Abuse, and the Impact of Age on Patterns and Behaviors

library(tidyverse)
library(skimr)

Data 1

Introduction and data

  • Food Access CSV file from the CORGIS Dataset Project

  • Curated by Ryan Whitcomb, Joung Min Choi, and Bo Guan on 9/14/2021, using data from the United States Department of Agriculture’s Economic Research Service

  • The dataset contains information about US counties’ ability to access supermarkets, supercenters, grocery stores, or other sources of healthy and affordable food.

Research question

  • A well-formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • What US regions have the highest level of food insecurity?

    • Which counties can be considered food deserts? (We still need to settle on a definition of a food desert.)

    • What state has the most food insecurity?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • Topic: American Food Insecurity

    • Our hypothesis is that rural counties will likely have higher food insecurity than urban counties.

  • Identify the types of variables in your research question. Categorical? Quantitative?
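    • Categorical: County, State

    • Quantitative: population, housing, and counts of people and housing units with low access to a supermarket at each distance threshold, broken down by group (remaining variables in the dataset)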
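This dataset is at the county level, while the USDA's own low-income, low-access measures are defined for census tracts, so answering the food-desert question here needs a county-level proxy. Below is a minimal sketch of one possible proxy, assuming Low.Access.Numbers.People.1.Mile counts residents living more than 1 mile from the nearest supermarket (as the column name suggests): divide it by Population and rank states by the pooled share. The counties and values are made up; only the column names come from the skim output below. Note that the dataset has no urban/rural flag, so testing the rural hypothesis would require joining one from another source.

```r
library(dplyr)

# Toy stand-in for foodAccess: real column names, made-up counties and values
food_toy <- tibble(
  County = c("County A", "County B", "County C", "County D"),
  State  = c("State X", "State X", "State Y", "State Y"),
  Population = c(50000, 2000, 120000, 8000),
  Low.Access.Numbers.People.1.Mile   = c(10000, 1500, 30000, 6000),
  Low.Access.Numbers.People.10.Miles = c(0, 900, 0, 2500)
)

# Per-county shares of residents beyond each distance (a proxy measure,
# not the official USDA tract-level food-desert definition)
food_share <- food_toy %>%
  mutate(
    share_1_mile   = Low.Access.Numbers.People.1.Mile / Population,
    share_10_miles = Low.Access.Numbers.People.10.Miles / Population
  )

# Pool counties within each state so that larger counties count for more
state_rank <- food_share %>%
  group_by(State) %>%
  summarise(
    share_1_mile = sum(Low.Access.Numbers.People.1.Mile) / sum(Population),
    .groups = "drop"
  ) %>%
  arrange(desc(share_1_mile))

state_rank
```

Swapping food_toy for foodAccess should run the same pipeline on the real data. The 10-mile share is computed as well because a 1-mile cutoff is more meaningful for urban counties than for rural ones.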

Glimpse of data

foodAccess <- read.csv('data/food_access.csv')

skimr::skim(foodAccess)
Data summary
Name foodAccess
Number of rows 3142
Number of columns 25
_______________________
Column type frequency:
character 2
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
County 0 1 10 33 0 1877 0
State 0 1 4 20 0 51 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 98264.02 312946.53 82 11114.50 25872.0 66780.00 9818605 ▇▁▁▁▁
Housing.Data.Residing.in.Group.Quarters 0 1 2541.21 6512.50 0 177.00 602.0 2247.00 171670 ▇▁▁▁▁
Housing.Data.Total.Housing.Units 0 1 37147.13 111990.96 39 4368.75 10017.0 25829.00 3241204 ▇▁▁▁▁
Vehicle.Access.1.Mile 0 1 662.16 1095.32 0 118.00 332.0 739.75 13735 ▇▁▁▁▁
Vehicle.Access.1.2.Mile 0 1 1503.13 3903.09 0 180.25 481.0 1197.75 83246 ▇▁▁▁▁
Vehicle.Access.10.Miles 0 1 31.01 80.16 0 1.00 11.0 34.75 1826 ▇▁▁▁▁
Vehicle.Access.20.Miles 0 1 5.16 47.42 0 0.00 0.0 0.00 1473 ▇▁▁▁▁
Low.Access.Numbers.Children.1.Mile 0 1 9527.62 16747.45 0 1649.25 4108.0 9723.25 250060 ▇▁▁▁▁
Low.Access.Numbers.Children.1.2.Mile 0 1 16668.66 41717.86 0 2176.50 5301.5 13327.25 911988 ▇▁▁▁▁
Low.Access.Numbers.Children.10.Miles 0 1 372.74 596.69 0 34.00 210.0 524.75 11490 ▇▁▁▁▁
Low.Access.Numbers.Children.20.Miles 0 1 40.76 235.28 0 0.00 0.0 0.00 5918 ▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.1.Mile 0 1 11199.22 17273.37 0 2501.00 6300.5 13138.25 260673 ▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.1.2.Mile 0 1 20660.44 48784.32 0 3472.25 8403.5 19185.50 1139072 ▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.10.Miles 0 1 617.69 1142.24 0 51.25 319.0 804.00 24663 ▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.20.Miles 0 1 76.11 476.40 0 0.00 0.0 0.00 12405 ▇▁▁▁▁
Low.Access.Numbers.People.1.Mile 0 1 39091.71 64757.27 0 7306.50 17921.5 42034.75 903299 ▇▁▁▁▁
Low.Access.Numbers.People.1.2.Mile 0 1 68483.47 164153.98 82 9527.50 22535.5 57185.00 3696268 ▇▁▁▁▁
Low.Access.Numbers.People.10.Miles 0 1 1637.40 2386.60 0 174.00 955.0 2288.00 37500 ▇▁▁▁▁
Low.Access.Numbers.People.20.Miles 0 1 172.54 823.48 0 0.00 0.0 0.00 17768 ▇▁▁▁▁
Low.Access.Numbers.Seniors.1.Mile 0 1 5339.46 8298.88 0 1194.25 2693.5 5919.75 123489 ▇▁▁▁▁
Low.Access.Numbers.Seniors.1.2.Mile 0 1 9148.15 20213.49 12 1556.25 3423.5 8226.75 431862 ▇▁▁▁▁
Low.Access.Numbers.Seniors.10.Miles 0 1 274.73 382.57 0 28.00 165.5 388.75 5801 ▇▁▁▁▁
Low.Access.Numbers.Seniors.20.Miles 0 1 30.33 137.68 0 0.00 0.0 0.00 4165 ▇▁▁▁▁

Data 2

Introduction and data

  • Drugs CSV file from the CORGIS Dataset Project

  • Created by Austin Cory Bart, Ryan Whitcomb, Joung Min Choi, and Bo Guan on 10/29/2021. The data are state-level estimates from the National Survey on Drug Use and Health (NSDUH) and cover 2002 to 2018. Both totals (in thousands of people) and rates (as a proportion of the corresponding age group's population) are given.

  • This dataset is about substance use and abuse: alcohol use and alcohol use disorder, tobacco and cigarette use, marijuana use (including new users), and cocaine use, broken down by state and by three age groups (12-17, 18-25, and 26+).

  • State Marijuana Laws CSV from data.world

  • Data compiled by Selene Arrazolo from a 2016 map by Michael Maciag of Governing Data (https://www.governing.com/archive/state-marijuana-laws-map-medical-recreational.html) (the article has since been updated), and updated by Liam Muecke to be current as of 2019 based on a Wikipedia article (https://en.wikipedia.org/wiki/Timeline_of_cannabis_laws_in_the_United_States).

  • This dataset reflects the legal status of marijuana in each state, placing each state in one of four categories (Medical, Recreational, Illegal/No Laws Legalizing, and Decriminalized).
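The legal-status file stores its categories as four flag columns (Medical, Recreational, Illegal, Decriminalized), per the skim output of legalStatus below, so comparing drug use across legal categories is easier with a single status variable. A minimal sketch, assuming each flag column holds a short marker (written "Yes" here) when the category applies and an empty string otherwise, which is what n_unique = 2 and the empty counts in the skim suggest. The states are made up.

```r
library(dplyr)

# Toy stand-in for legalStatus (made-up states). Assumption: a flag column
# holds a short marker ("Yes" here) when the category applies and an
# empty string otherwise.
legal_toy <- tibble(
  State          = c("State A", "State B", "State C", "State D"),
  Medical        = c("Yes", "", "", ""),
  Recreational   = c("", "Yes", "", ""),
  Illegal        = c("", "", "Yes", ""),
  Decriminalized = c("", "", "", "Yes")
)

# Collapse the four flags into one categorical variable. If a state ever
# had more than one flag set, the first matching condition wins.
legal_status <- legal_toy %>%
  mutate(status = case_when(
    Recreational != ""   ~ "Recreational",
    Medical != ""        ~ "Medical",
    Decriminalized != "" ~ "Decriminalized",
    Illegal != ""        ~ "Illegal",
    TRUE                 ~ NA_character_
  )) %>%
  select(State, status)

legal_status
```

On the real file, legalStatus would replace legal_toy. The skim shows 18 + 12 + 17 + 4 = 51 non-empty entries across the four columns for 51 states, so each state most likely falls into exactly one category and the order of the conditions should not matter.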

Research question

  • A well-formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • What US regions have the highest level of drug use per category?

    • How has drug use in specific regions changed over time?

    • Which category of drug use is the most common?

    • What factors influence changes in adolescent substance abuse? (Marijuana legalization, popularity of vaping, etc.)

    • Which substance is most commonly used within each age group?

    • Has adolescent marijuana use increased in states that have legalized cannabis consumption?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • Topic: Drug use in the United States

    • Our hypothesis is that cigarette use has declined in most states, and that states with larger populations will have higher levels of drug use.

  • Identify the types of variables in your research question. Categorical? Quantitative?
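    • Categorical: State, and each state's marijuana legal status (from the State Marijuana Laws data)

    • Quantitative: Year, population by age group, and totals and rates of use for each substance and age group (remaining variables in the dataset)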
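Most of the age-related questions above are easier to answer with the data in long format: one row per state, year, substance measure, and age group. A minimal sketch using tidyr's pivot_longer(), assuming the column names follow the Rates.<measure>.<age group> pattern shown in the skim output (where the trailing "26." stands for 26+). The two states and all values are made up, and the legal-status column joined at the end is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for drugUse: two made-up states, two years, and only the
# past-month marijuana rate columns (real column names, fake values)
drug_toy <- tibble(
  State = c("State A", "State A", "State B", "State B"),
  Year  = c(2002, 2018, 2002, 2018),
  Rates.Marijuana.Used.Past.Month.12.17 = c(0.08, 0.06, 0.07, 0.09),
  Rates.Marijuana.Used.Past.Month.18.25 = c(0.17, 0.20, 0.16, 0.25),
  Rates.Marijuana.Used.Past.Month.26.   = c(0.04, 0.07, 0.03, 0.10)
)

# Split each column name into a measure and an age group
drug_long <- drug_toy %>%
  pivot_longer(
    cols = starts_with("Rates."),
    names_to = c("measure", "age_group"),
    names_pattern = "^Rates\\.(.+)\\.(12\\.17|18\\.25|26\\.)$",
    values_to = "rate"
  ) %>%
  mutate(age_group = recode(age_group,
                            "12.17" = "12-17",
                            "18.25" = "18-25",
                            "26."   = "26+"))

# Change in adolescent past-month marijuana use from first to last year
teen_change <- drug_long %>%
  filter(age_group == "12-17") %>%
  group_by(State) %>%
  summarise(change = rate[Year == max(Year)] - rate[Year == min(Year)],
            .groups = "drop")

# Hypothetical legal status for the toy states, joined by State
legal_toy <- tibble(State  = c("State A", "State B"),
                    status = c("Illegal", "Recreational"))
teen_by_status <- left_join(teen_change, legal_toy, by = "State")

teen_by_status
```

The same pivot should work on the Totals columns by changing the prefix. With 17 years of data per state, fitting a trend line per state (for example with lm()) would be sturdier than a first-to-last difference.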

Glimpse of data

drugUse <- read.csv('data/drugs.csv')

skimr::skim(drugUse)
Data summary
Name drugUse
Number of rows 867
Number of columns 53
_______________________
Column type frequency:
character 1
numeric 52
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
State 0 1 4 20 0 51 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Year 0 1 2010.00 4.90 2002.00 2006.00 2010.00 2014.00 2018.00 ▇▆▆▆▇
Population.12.17 0 1 489714.13 563795.85 30551.00 131540.50 339685.00 541095.00 3293484.00 ▇▂▁▁▁
Population.18.25 0 1 658880.04 755989.75 57395.00 174293.50 456240.00 746808.00 4469106.00 ▇▁▁▁▁
Population.26. 0 1 3874155.48 4320775.92 310110.00 1027871.00 2698757.00 4509094.00 25917724.00 ▇▂▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.12.17 0 1 19.22 25.29 0.00 5.00 11.00 24.00 204.00 ▇▁▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.18.25 0 1 94.48 108.27 6.00 26.00 64.00 119.50 717.00 ▇▁▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.26. 0 1 224.15 254.02 19.00 57.50 154.00 271.50 1586.00 ▇▁▁▁▁
Rates.Alcohol.Use.Disorder.Past.Year.12.17 0 1 0.04 0.02 0.01 0.03 0.04 0.05 0.11 ▇▇▅▁▁
Rates.Alcohol.Use.Disorder.Past.Year.18.25 0 1 0.15 0.04 0.07 0.12 0.15 0.18 0.27 ▃▇▇▂▁
Rates.Alcohol.Use.Disorder.Past.Year.26. 0 1 0.06 0.01 0.03 0.05 0.06 0.07 0.11 ▂▇▃▁▁
Totals.Alcohol.Use.Past.Month.12.17 0 1 65.80 77.68 3.00 17.00 43.00 81.00 540.00 ▇▁▁▁▁
Totals.Alcohol.Use.Past.Month.18.25 0 1 393.02 440.01 32.00 99.00 258.00 480.00 2639.00 ▇▁▁▁▁
Totals.Alcohol.Use.Past.Month.26. 0 1 2124.66 2372.87 167.00 525.00 1380.00 2623.00 14513.00 ▇▂▁▁▁
Rates.Alcohol.Use.Past.Month.12.17 0 1 0.14 0.04 0.05 0.11 0.13 0.16 0.25 ▂▇▇▃▁
Rates.Alcohol.Use.Past.Month.18.25 0 1 0.61 0.08 0.30 0.56 0.61 0.66 0.76 ▁▁▅▇▃
Rates.Alcohol.Use.Past.Month.26. 0 1 0.55 0.08 0.28 0.51 0.56 0.61 0.72 ▁▂▅▇▃
Totals.Tobacco.Cigarette.Past.Month.12.17 0 1 36.80 41.88 1.00 10.00 23.00 47.50 295.00 ▇▁▁▁▁
Totals.Tobacco.Cigarette.Past.Month.18.25 0 1 209.94 219.53 14.00 56.00 147.00 265.00 1281.00 ▇▂▁▁▁
Totals.Tobacco.Cigarette.Past.Month.26. 0 1 857.22 844.42 76.00 223.00 678.00 1066.00 4452.00 ▇▂▁▁▁
Rates.Tobacco.Cigarette.Past.Month.12.17 0 1 0.08 0.04 0.01 0.05 0.08 0.11 0.20 ▆▇▇▃▁
Rates.Tobacco.Cigarette.Past.Month.18.25 0 1 0.34 0.08 0.13 0.28 0.35 0.40 0.53 ▂▅▇▇▁
Rates.Tobacco.Cigarette.Past.Month.26. 0 1 0.23 0.04 0.12 0.21 0.23 0.26 0.34 ▁▅▇▅▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.12.17 0 1 5.06 7.51 0.00 1.00 3.00 6.00 56.00 ▇▁▁▁▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.18.25 0 1 37.11 46.66 2.00 10.00 22.00 46.00 345.00 ▇▁▁▁▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.26. 0 1 59.11 72.74 2.00 14.00 36.00 75.00 585.00 ▇▁▁▁▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.12.17 0 1 0.01 0.01 0.00 0.01 0.01 0.01 0.03 ▇▆▃▁▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.18.25 0 1 0.06 0.02 0.02 0.04 0.06 0.07 0.12 ▃▇▆▂▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.26. 0 1 0.01 0.01 0.01 0.01 0.01 0.02 0.05 ▇▆▁▁▁
Totals.Marijuana.New.Users.12.17 0 1 24.72 28.67 2.00 7.00 17.00 29.00 197.00 ▇▁▁▁▁
Totals.Marijuana.New.Users.18.25 0 1 24.70 29.33 2.00 6.00 16.00 29.50 204.00 ▇▁▁▁▁
Totals.Marijuana.New.Users.26. 0 1 5.53 8.92 0.00 1.00 3.00 6.00 119.00 ▇▁▁▁▁
Rates.Marijuana.New.Users.12.17 0 1 0.06 0.01 0.03 0.05 0.06 0.07 0.10 ▁▇▆▂▁
Rates.Marijuana.New.Users.18.25 0 1 0.08 0.02 0.03 0.06 0.07 0.09 0.16 ▁▇▃▁▁
Rates.Marijuana.New.Users.26. 0 1 0.00 0.00 0.00 0.00 0.00 0.00 0.02 ▇▂▁▁▁
Totals.Marijuana.Used.Past.Month.12.17 0 1 34.86 41.58 2.00 9.50 22.00 42.00 307.00 ▇▁▁▁▁
Totals.Marijuana.Used.Past.Month.18.25 0 1 123.24 147.85 8.00 35.00 79.00 152.50 1106.00 ▇▁▁▁▁
Totals.Marijuana.Used.Past.Month.26. 0 1 216.13 290.39 10.00 56.50 121.00 276.50 3086.00 ▇▁▁▁▁
Rates.Marijuana.Used.Past.Month.12.17 0 1 0.07 0.02 0.04 0.06 0.07 0.08 0.14 ▂▇▅▂▁
Rates.Marijuana.Used.Past.Month.18.25 0 1 0.19 0.05 0.08 0.15 0.18 0.22 0.39 ▂▇▃▂▁
Rates.Marijuana.Used.Past.Month.26. 0 1 0.06 0.03 0.02 0.04 0.05 0.07 0.18 ▇▅▁▁▁
Totals.Marijuana.Used.Past.Year.12.17 0 1 65.55 76.86 4.00 18.00 43.00 81.00 545.00 ▇▁▁▁▁
Totals.Marijuana.Used.Past.Year.18.25 0 1 202.54 237.62 16.00 56.00 131.00 252.50 1687.00 ▇▁▁▁▁
Totals.Marijuana.Used.Past.Year.26. 0 1 348.76 449.85 17.00 91.50 212.00 439.50 4476.00 ▇▁▁▁▁
Rates.Marijuana.Used.Past.Year.12.17 0 1 0.14 0.03 0.09 0.12 0.13 0.16 0.23 ▃▇▅▂▁
Rates.Marijuana.Used.Past.Year.18.25 0 1 0.31 0.07 0.17 0.27 0.30 0.35 0.53 ▂▇▅▂▁
Rates.Marijuana.Used.Past.Year.26. 0 1 0.09 0.04 0.04 0.07 0.08 0.11 0.25 ▇▆▂▁▁
Totals.Tobacco.Use.Past.Month.12.17 0 1 47.51 51.33 1.00 13.00 31.00 62.00 358.00 ▇▁▁▁▁
Totals.Tobacco.Use.Past.Month.18.25 0 1 249.24 253.20 18.00 67.50 181.00 313.00 1488.00 ▇▂▁▁▁
Totals.Tobacco.Use.Past.Month.26. 0 1 1029.91 1001.45 95.00 258.50 828.00 1289.00 5099.00 ▇▂▁▁▁
Rates.Tobacco.Use.Past.Month.12.17 0 1 0.11 0.04 0.02 0.07 0.11 0.14 0.24 ▅▇▇▃▁
Rates.Tobacco.Use.Past.Month.18.25 0 1 0.40 0.08 0.17 0.35 0.42 0.46 0.59 ▁▃▆▇▂
Rates.Tobacco.Use.Past.Month.26. 0 1 0.28 0.05 0.15 0.25 0.28 0.31 0.41 ▁▅▇▅▁
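The Rates columns above all lie between 0 and 1, so they are proportions, while the Totals columns are described as thousands of people. A quick arithmetic check of those units, assuming each rate is relative to the matching age-group population column: rate ≈ total × 1000 / population. The helper name rate_from_total is hypothetical.

```r
# Recompute a rate from a total, assuming totals are in thousands of people
# and the population columns (e.g. Population.26.) are in people
rate_from_total <- function(total_thousands, population) {
  total_thousands * 1000 / population
}

# Plugging in the skim means for the 26+ past-month alcohol columns gives
# about 0.548, close to the reported mean rate of 0.55 (a ratio of means
# is not a mean of ratios, so exact agreement is not expected)
rate_from_total(2124.66, 3874155.48)

# Row-level version on the real data (not run here):
# drugUse %>%
#   mutate(gap = rate_from_total(Totals.Alcohol.Use.Past.Month.26.,
#                                Population.26.) -
#            Rates.Alcohol.Use.Past.Month.26.)
```

Because totals are rounded to whole thousands, small states and the 12-17 group will show larger gaps in the row-level check even if the units are right.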

legalStatus <- read.csv('data/state_marijuana_laws_2019_2.csv')

skimr::skim(legalStatus)
Data summary
Name legalStatus
Number of rows 51
Number of columns 5
_______________________
Column type frequency:
character 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
State 0 1 4 20 0 51 0
Medical 0 1 0 3 33 2 0
Recreational 0 1 0 3 39 2 0
Illegal 0 1 0 3 34 2 0
Decriminalized 0 1 0 3 47 2 0

Data 3

Introduction and data

  • Monkeypox CSV file from the CORGIS Dataset Project

  • It was curated by Sam Donald on 9/27/2022, using data from the World Health Organization.

  • This dataset tracks the status of monkeypox by country. Each observation is one country on one date, with the number of new and total cases and deaths reported as of that day (also given per million people).

Research question

  • A well-formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • Which countries had the highest number of deaths related to Monkeypox?

    • How has the number of new Monkeypox cases changed over time?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • Topic: Monkeypox around the world.

    • Hypothesis: The number of new Monkeypox cases has decreased over time.

  • Identify the types of variables in your research question. Categorical? Quantitative?
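    • Categorical variables: country ISO code (Country.Iso.code), country name (Country.Full), and date (Date.Full, stored as text)

    • Quantitative variables: year, month, and day, plus new and total cases and deaths, both raw and per million people (remaining variables in the dataset)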
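Because each row is one country on one date, the two questions above call for different aggregations. A minimal sketch on made-up rows, assuming Data.Deaths.Total is a running (cumulative) count, as its name and its pairing with Data.Deaths.New suggest: deaths per country take the maximum of the cumulative column, while the trend in cases sums the daily Data.Cases.New values within each month.

```r
library(dplyr)

# Toy stand-in for monkey_pox: each row is one country on one reporting
# date (one date per month here, for brevity); values are made up
pox_toy <- tibble(
  Country.Full      = rep(c("Country A", "Country B"), each = 3),
  Date.Month        = rep(c(7, 8, 9), times = 2),
  Data.Cases.New    = c(120, 80, 20, 40, 60, 10),
  Data.Deaths.Total = c(0, 1, 2, 0, 0, 1)
)

# Deaths per country: take the largest cumulative total, do not sum it
deaths_by_country <- pox_toy %>%
  group_by(Country.Full) %>%
  summarise(deaths = max(Data.Deaths.Total), .groups = "drop") %>%
  arrange(desc(deaths))

# Trend in cases: add up daily new cases within each month
cases_by_month <- pox_toy %>%
  group_by(Date.Month) %>%
  summarise(new_cases = sum(Data.Cases.New), .groups = "drop")
```

Summing Data.Deaths.Total across days would count the same deaths many times. Also, if the first or last month in the real data (May and September 2022, per the skim) is only partly covered, its monthly total will look artificially low; dividing by the number of distinct dates in each month is one way to adjust.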

Glimpse of data

monkey_pox <- read.csv('data/monkeypox.csv')

skimr::skim(monkey_pox)
Data summary
Name monkey_pox
Number of rows 5874
Number of columns 14
_______________________
Column type frequency:
character 3
numeric 11
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Country.Iso.code 0 1 3 8 0 99 0
Country.Full 0 1 4 28 0 99 0
Date.Full 0 1 10 10 0 126 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Date.Year 0 1 2022.00 0.00 2022 2022.00 2022.00 2022.00 2022.00 ▁▁▇▁▁
Date.Month 0 1 7.13 0.98 5 6.00 7.00 8.00 9.00 ▁▅▇▇▁
Date.Day 0 1 15.91 9.11 1 8.00 16.00 24.00 31.00 ▇▆▆▆▆
Data.Cases.New 0 1 19.42 113.86 0 0.00 0.00 1.00 2063.00 ▇▁▁▁▁
Data.Cases.Total 0 1 717.81 3894.24 1 3.00 15.00 116.00 57039.00 ▇▁▁▁▁
Data.Cases.New.per.million 0 1 0.26 1.36 0 0.00 0.00 0.01 54.52 ▇▁▁▁▁
Data.Cases.Total.per.million 0 1 9.35 17.88 0 0.28 1.46 8.55 142.12 ▇▁▁▁▁
Data.Deaths.New 0 1 0.01 0.10 0 0.00 0.00 0.00 3.00 ▇▁▁▁▁
Data.Deaths.Total 0 1 0.12 0.95 0 0.00 0.00 0.00 19.00 ▇▁▁▁▁
Data.Deaths.New.per.million 0 1 0.00 0.00 0 0.00 0.00 0.00 0.09 ▇▁▁▁▁
Data.Deaths.Total.per.million 0 1 0.00 0.00 0 0.00 0.00 0.00 0.09 ▇▁▁▁▁