Exploring the Age Factor:

Substance Use, Abuse, and the Impact of Age on Patterns and Behaviors

library(tidyverse)
library(skimr)

Data 1

Introduction and data

Food Access CSV File From the CORGIS Dataset Project
Curated by Ryan Whitcomb, Joung Min Choi, and Bo Guan on 9/14/2021, using data from the United States Department of Agriculture’s Economic Research Service.
The dataset contains information about US counties’ ability to access supermarkets, supercenters, grocery stores, or other sources of healthy and affordable food.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- What US regions have the highest level of food insecurity?
- What counties are considered to have food deserts (need to find definition of food desert)?
- What state has the most food insecurity?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- Topic: American Food Insecurity
- Our hypothesis is that rural counties will likely have higher food insecurity than urban counties.
Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical: County Names
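- Quantitative: Population, housing counts, and the number of people in each group (children, seniors, low-income people, households without vehicle access) living beyond a given distance from a supermarket (the remaining variables in the dataset)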

Glimpse of data

foodAccess <- read.csv('data/food_access.csv')

skimr::skim(foodAccess)

Data summary
Name	foodAccess
Number of rows	3142
Number of columns	25
_______________________
Column type frequency:
character	2
numeric	23
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
County	0	1	10	33	0	1877	0
State	0	1	4	20	0	51	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Population	1	98264.02	312946.53	82	11114.50	25872.0	66780.00	9818605	▇▁▁▁▁
Housing.Data.Residing.in.Group.Quarters	1	2541.21	6512.50	0	177.00	602.0	2247.00	171670	▇▁▁▁▁
Housing.Data.Total.Housing.Units	1	37147.13	111990.96	39	4368.75	10017.0	25829.00	3241204	▇▁▁▁▁
Vehicle.Access.1.Mile	1	662.16	1095.32	0	118.00	332.0	739.75	13735	▇▁▁▁▁
Vehicle.Access.1.2.Mile	1	1503.13	3903.09	0	180.25	481.0	1197.75	83246	▇▁▁▁▁
Vehicle.Access.10.Miles	1	31.01	80.16	0	1.00	11.0	34.75	1826	▇▁▁▁▁
Vehicle.Access.20.Miles	1	5.16	47.42	0	0.00	0.0	0.00	1473	▇▁▁▁▁
Low.Access.Numbers.Children.1.Mile	1	9527.62	16747.45	0	1649.25	4108.0	9723.25	250060	▇▁▁▁▁
Low.Access.Numbers.Children.1.2.Mile	1	16668.66	41717.86	0	2176.50	5301.5	13327.25	911988	▇▁▁▁▁
Low.Access.Numbers.Children.10.Miles	1	372.74	596.69	0	34.00	210.0	524.75	11490	▇▁▁▁▁
Low.Access.Numbers.Children.20.Miles	1	40.76	235.28	0	0.00	0.0	0.00	5918	▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.1.Mile	1	11199.22	17273.37	0	2501.00	6300.5	13138.25	260673	▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.1.2.Mile	1	20660.44	48784.32	0	3472.25	8403.5	19185.50	1139072	▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.10.Miles	1	617.69	1142.24	0	51.25	319.0	804.00	24663	▇▁▁▁▁
Low.Access.Numbers.Low.Income.People.20.Miles	1	76.11	476.40	0	0.00	0.0	0.00	12405	▇▁▁▁▁
Low.Access.Numbers.People.1.Mile	1	39091.71	64757.27	0	7306.50	17921.5	42034.75	903299	▇▁▁▁▁
Low.Access.Numbers.People.1.2.Mile	1	68483.47	164153.98	82	9527.50	22535.5	57185.00	3696268	▇▁▁▁▁
Low.Access.Numbers.People.10.Miles	1	1637.40	2386.60	0	174.00	955.0	2288.00	37500	▇▁▁▁▁
Low.Access.Numbers.People.20.Miles	1	172.54	823.48	0	0.00	0.0	0.00	17768	▇▁▁▁▁
Low.Access.Numbers.Seniors.1.Mile	1	5339.46	8298.88	0	1194.25	2693.5	5919.75	123489	▇▁▁▁▁
Low.Access.Numbers.Seniors.1.2.Mile	1	9148.15	20213.49	12	1556.25	3423.5	8226.75	431862	▇▁▁▁▁
Low.Access.Numbers.Seniors.10.Miles	1	274.73	382.57	0	28.00	165.5	388.75	5801	▇▁▁▁▁
Low.Access.Numbers.Seniors.20.Miles	1	30.33	137.68	0	0.00	0.0	0.00	4165	▇▁▁▁▁
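
As a rough sketch of how the state-level question could be approached, the county rows can be summed by state and the low-access count divided by population. The block below uses a small made-up data frame (`food_toy`, with placeholder counties and invented numbers) in place of `foodAccess`, and it assumes that the share of people counted in `Low.Access.Numbers.People.1.Mile` is an acceptable proxy for food insecurity, which the dataset itself does not claim.

```r
# Toy stand-in for foodAccess; county names and values are invented.
food_toy <- data.frame(
  County = c("A County", "B County", "C County", "D County"),
  State = c("Alabama", "Alabama", "Ohio", "Ohio"),
  Population = c(1000, 3000, 2000, 2000),
  Low.Access.Numbers.People.1.Mile = c(500, 900, 400, 200)
)

# Sum population and low-access counts within each state.
state_totals <- aggregate(
  cbind(Population, Low.Access.Numbers.People.1.Mile) ~ State,
  data = food_toy,
  FUN = sum
)

# Share of each state's population counted as low access at 1 mile.
# Assumption: this share is used here as a proxy for food insecurity.
state_totals$low_access_share <-
  state_totals$Low.Access.Numbers.People.1.Mile / state_totals$Population

# Sort so the state with the largest share comes first.
state_totals <- state_totals[order(-state_totals$low_access_share), ]
print(state_totals)
```

Using a share rather than a raw count keeps large states from dominating the ranking simply because they have more residents.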

Data 2

Introduction and data

Drugs CSV file from the CORGIS Dataset Project
Curated by Austin Cory Bart, Ryan Whitcomb, Joung Min Choi, and Bo Guan on 10/29/2021. The data were collected from individual states as part of the NSDUH study and range from 2002 to 2018. Both totals (in thousands of people) and rates (as a percentage of the population, stored in this file as proportions between 0 and 1) are given.
This dataset is about substance abuse, specifically cigarette, marijuana, cocaine, and alcohol use among different age groups and states in the US.
State Marijuana Laws CSV from data.world
Data compiled by Selene Arrazolo from a 2016 map by Michael Maciag of Governing Data (https://www.governing.com/archive/state-marijuana-laws-map-medical-recreational.html) (the article has since been updated), and updated by Liam Muecke to be current to 2019 based on the Wikipedia article (https://en.wikipedia.org/wiki/Timeline_of_cannabis_laws_in_the_United_States).
This dataset reflects the legal status of marijuana by state, placing each state in one of four categories (Medical, Recreational, No Laws Legalizing, and Decriminalized).

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- What US regions have the highest level of drug use per category?
- How has drug use in specific regions changed over time?
- Which category of drug use is the most common?
- What factors influence changes in adolescent substance abuse? (Marijuana legalization, popularity of vaping, etc.)
- What type of substance does each age category prefer?
- Has adolescent marijuana abuse increased in states that have legalized cannabis consumption?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- Topic: Drug use in the United States
- Our hypothesis is that cigarette use has declined in most states, and that states with larger populations will have higher drug use.
Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical: States
- Quantitative: Year, Population, (other variables in the dataset)

Glimpse of data

drugUse <- read.csv('data/drugs.csv')

skimr::skim(drugUse)

Data summary
Name	drugUse
Number of rows	867
Number of columns	53
_______________________
Column type frequency:
character	1
numeric	52
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
State	0	1	4	20	0	51	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Year	1	2010.00	4.90	2002.00	2006.00	2010.00	2014.00	2018.00	▇▆▆▆▇
Population.12.17	1	489714.13	563795.85	30551.00	131540.50	339685.00	541095.00	3293484.00	▇▂▁▁▁
Population.18.25	1	658880.04	755989.75	57395.00	174293.50	456240.00	746808.00	4469106.00	▇▁▁▁▁
Population.26.	1	3874155.48	4320775.92	310110.00	1027871.00	2698757.00	4509094.00	25917724.00	▇▂▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.12.17	1	19.22	25.29	0.00	5.00	11.00	24.00	204.00	▇▁▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.18.25	1	94.48	108.27	6.00	26.00	64.00	119.50	717.00	▇▁▁▁▁
Totals.Alcohol.Use.Disorder.Past.Year.26.	1	224.15	254.02	19.00	57.50	154.00	271.50	1586.00	▇▁▁▁▁
Rates.Alcohol.Use.Disorder.Past.Year.12.17	1	0.04	0.02	0.01	0.03	0.04	0.05	0.11	▇▇▅▁▁
Rates.Alcohol.Use.Disorder.Past.Year.18.25	1	0.15	0.04	0.07	0.12	0.15	0.18	0.27	▃▇▇▂▁
Rates.Alcohol.Use.Disorder.Past.Year.26.	1	0.06	0.01	0.03	0.05	0.06	0.07	0.11	▂▇▃▁▁
Totals.Alcohol.Use.Past.Month.12.17	1	65.80	77.68	3.00	17.00	43.00	81.00	540.00	▇▁▁▁▁
Totals.Alcohol.Use.Past.Month.18.25	1	393.02	440.01	32.00	99.00	258.00	480.00	2639.00	▇▁▁▁▁
Totals.Alcohol.Use.Past.Month.26.	1	2124.66	2372.87	167.00	525.00	1380.00	2623.00	14513.00	▇▂▁▁▁
Rates.Alcohol.Use.Past.Month.12.17	1	0.14	0.04	0.05	0.11	0.13	0.16	0.25	▂▇▇▃▁
Rates.Alcohol.Use.Past.Month.18.25	1	0.61	0.08	0.30	0.56	0.61	0.66	0.76	▁▁▅▇▃
Rates.Alcohol.Use.Past.Month.26.	1	0.55	0.08	0.28	0.51	0.56	0.61	0.72	▁▂▅▇▃
Totals.Tobacco.Cigarette.Past.Month.12.17	1	36.80	41.88	1.00	10.00	23.00	47.50	295.00	▇▁▁▁▁
Totals.Tobacco.Cigarette.Past.Month.18.25	1	209.94	219.53	14.00	56.00	147.00	265.00	1281.00	▇▂▁▁▁
Totals.Tobacco.Cigarette.Past.Month.26.	1	857.22	844.42	76.00	223.00	678.00	1066.00	4452.00	▇▂▁▁▁
Rates.Tobacco.Cigarette.Past.Month.12.17	1	0.08	0.04	0.01	0.05	0.08	0.11	0.20	▆▇▇▃▁
Rates.Tobacco.Cigarette.Past.Month.18.25	1	0.34	0.08	0.13	0.28	0.35	0.40	0.53	▂▅▇▇▁
Rates.Tobacco.Cigarette.Past.Month.26.	1	0.23	0.04	0.12	0.21	0.23	0.26	0.34	▁▅▇▅▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.12.17	1	5.06	7.51	0.00	1.00	3.00	6.00	56.00	▇▁▁▁▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.18.25	1	37.11	46.66	2.00	10.00	22.00	46.00	345.00	▇▁▁▁▁
Totals.Illicit.Drugs.Cocaine.Used.Past.Year.26.	1	59.11	72.74	2.00	14.00	36.00	75.00	585.00	▇▁▁▁▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.12.17	1	0.01	0.01	0.00	0.01	0.01	0.01	0.03	▇▆▃▁▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.18.25	1	0.06	0.02	0.02	0.04	0.06	0.07	0.12	▃▇▆▂▁
Rates.Illicit.Drugs.Cocaine.Used.Past.Year.26.	1	0.01	0.01	0.01	0.01	0.01	0.02	0.05	▇▆▁▁▁
Totals.Marijuana.New.Users.12.17	1	24.72	28.67	2.00	7.00	17.00	29.00	197.00	▇▁▁▁▁
Totals.Marijuana.New.Users.18.25	1	24.70	29.33	2.00	6.00	16.00	29.50	204.00	▇▁▁▁▁
Totals.Marijuana.New.Users.26.	1	5.53	8.92	0.00	1.00	3.00	6.00	119.00	▇▁▁▁▁
Rates.Marijuana.New.Users.12.17	1	0.06	0.01	0.03	0.05	0.06	0.07	0.10	▁▇▆▂▁
Rates.Marijuana.New.Users.18.25	1	0.08	0.02	0.03	0.06	0.07	0.09	0.16	▁▇▃▁▁
Rates.Marijuana.New.Users.26.	1	0.00	0.00	0.00	0.00	0.00	0.00	0.02	▇▂▁▁▁
Totals.Marijuana.Used.Past.Month.12.17	1	34.86	41.58	2.00	9.50	22.00	42.00	307.00	▇▁▁▁▁
Totals.Marijuana.Used.Past.Month.18.25	1	123.24	147.85	8.00	35.00	79.00	152.50	1106.00	▇▁▁▁▁
Totals.Marijuana.Used.Past.Month.26.	1	216.13	290.39	10.00	56.50	121.00	276.50	3086.00	▇▁▁▁▁
Rates.Marijuana.Used.Past.Month.12.17	1	0.07	0.02	0.04	0.06	0.07	0.08	0.14	▂▇▅▂▁
Rates.Marijuana.Used.Past.Month.18.25	1	0.19	0.05	0.08	0.15	0.18	0.22	0.39	▂▇▃▂▁
Rates.Marijuana.Used.Past.Month.26.	1	0.06	0.03	0.02	0.04	0.05	0.07	0.18	▇▅▁▁▁
Totals.Marijuana.Used.Past.Year.12.17	1	65.55	76.86	4.00	18.00	43.00	81.00	545.00	▇▁▁▁▁
Totals.Marijuana.Used.Past.Year.18.25	1	202.54	237.62	16.00	56.00	131.00	252.50	1687.00	▇▁▁▁▁
Totals.Marijuana.Used.Past.Year.26.	1	348.76	449.85	17.00	91.50	212.00	439.50	4476.00	▇▁▁▁▁
Rates.Marijuana.Used.Past.Year.12.17	1	0.14	0.03	0.09	0.12	0.13	0.16	0.23	▃▇▅▂▁
Rates.Marijuana.Used.Past.Year.18.25	1	0.31	0.07	0.17	0.27	0.30	0.35	0.53	▂▇▅▂▁
Rates.Marijuana.Used.Past.Year.26.	1	0.09	0.04	0.04	0.07	0.08	0.11	0.25	▇▆▂▁▁
Totals.Tobacco.Use.Past.Month.12.17	1	47.51	51.33	1.00	13.00	31.00	62.00	358.00	▇▁▁▁▁
Totals.Tobacco.Use.Past.Month.18.25	1	249.24	253.20	18.00	67.50	181.00	313.00	1488.00	▇▂▁▁▁
Totals.Tobacco.Use.Past.Month.26.	1	1029.91	1001.45	95.00	258.50	828.00	1289.00	5099.00	▇▂▁▁▁
Rates.Tobacco.Use.Past.Month.12.17	1	0.11	0.04	0.02	0.07	0.11	0.14	0.24	▅▇▇▃▁
Rates.Tobacco.Use.Past.Month.18.25	1	0.40	0.08	0.17	0.35	0.42	0.46	0.59	▁▃▆▇▂
Rates.Tobacco.Use.Past.Month.26.	1	0.28	0.05	0.15	0.25	0.28	0.31	0.41	▁▅▇▅▁
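
One possible way to check the cigarette hypothesis is to compare each state's rate in the first and last survey years and count how many states show a drop. The sketch below uses a made-up data frame (`drug_toy`) with the same column names as `drugUse`; the values are invented, and comparing only the two endpoint years is a simplifying assumption rather than the analysis the project will necessarily use.

```r
# Toy stand-in for drugUse; rates are invented for illustration.
drug_toy <- data.frame(
  State = rep(c("Alabama", "Ohio"), each = 3),
  Year = rep(c(2002, 2010, 2018), times = 2),
  Rates.Tobacco.Cigarette.Past.Month.18.25 =
    c(0.40, 0.35, 0.25, 0.30, 0.32, 0.33)
)

first_year <- min(drug_toy$Year)
last_year <- max(drug_toy$Year)

# Keep one row per state for the first year and one for the last year.
keep <- c("State", "Rates.Tobacco.Cigarette.Past.Month.18.25")
start <- drug_toy[drug_toy$Year == first_year, keep]
end <- drug_toy[drug_toy$Year == last_year, keep]

# Put the two years side by side and compute the change per state.
change <- merge(start, end, by = "State", suffixes = c(".start", ".end"))
change$difference <-
  change$Rates.Tobacco.Cigarette.Past.Month.18.25.end -
  change$Rates.Tobacco.Cigarette.Past.Month.18.25.start
change$declined <- change$difference < 0

# Fraction of states whose rate fell between the two years.
share_declined <- mean(change$declined)
print(change[c("State", "difference", "declined")])
print(share_declined)
```

The same pattern should apply to any of the `Rates.` columns by swapping the column name; the `Totals.` columns would also reflect population growth, so the rates are probably the better choice for this comparison.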

legalStatus <- read.csv('data/state_marijuana_laws_2019_2.csv')

skimr::skim(legalStatus)

Data summary
Name	legalStatus
Number of rows	51
Number of columns	5
_______________________
Column type frequency:
character	5
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
State	1	4	20	0	51
Medical	1	0	3	33	2
Recreational	1	0	3	39	2
Illegal	1	0	3	34	2
Decriminalized	1	0	3	47	2
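
The empty-cell counts above (33, 39, 34, and 47 out of 51 rows) leave 18 + 12 + 17 + 4 = 51 non-empty cells, which suggests that each state is marked in exactly one of the four columns. Under that assumption, the four flag columns can be collapsed into a single status variable and joined to the drug use data by state. The sketch below uses made-up data frames with placeholder state names; the marker value `"Yes"` is a guess based on the maximum string length of 3 and should be checked against the actual file.

```r
# Toy stand-in for legalStatus; placeholder states, assumed "Yes" marker.
legal_toy <- data.frame(
  State = c("State A", "State B", "State C", "State D"),
  Medical = c("", "", "Yes", ""),
  Recreational = c("", "Yes", "", ""),
  Illegal = c("Yes", "", "", ""),
  Decriminalized = c("", "", "", "Yes")
)

flag_cols <- c("Medical", "Recreational", "Illegal", "Decriminalized")

# TRUE wherever a cell is non-empty.
is_marked <- legal_toy[flag_cols] != ""

# Stop early if any state is marked in zero or several columns.
stopifnot(all(rowSums(is_marked) == 1))

# Name of the single marked column for each state.
legal_toy$status <- unname(
  apply(is_marked, 1, function(row) flag_cols[which(row)])
)

# Toy stand-in for one year of drugUse; rates are invented.
use_toy <- data.frame(
  State = c("State A", "State B", "State C", "State D"),
  Rates.Marijuana.Used.Past.Month.12.17 = c(0.05, 0.10, 0.07, 0.06)
)

# Join on State, then average the adolescent rate within each status.
joined <- merge(use_toy, legal_toy[c("State", "status")], by = "State")
mean_by_status <- aggregate(
  Rates.Marijuana.Used.Past.Month.12.17 ~ status,
  data = joined,
  FUN = mean
)
print(mean_by_status)
```

Because the legal status table describes 2019 only, a comparison like this cannot by itself show change after legalization; that would require the year in which each state's law changed.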

Data 3

Introduction and data

Monkeypox CSV file from the CORGIS Dataset Project
It was curated by Sam Donald on 9/27/2022, using data from the World Health Organization.
This dataset contains information about the status of monkeypox in a given country. Each observation is one country on one date, and the information includes the number of cases and deaths reported on that day.

Research question

A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- Which countries had the highest number of deaths related to Monkeypox?
- How has the rate of new Monkeypox cases changed over time?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- Topic: Monkeypox around the world.
- Hypothesis: Cases of Monkeypox have decreased over time.
Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical variables: country code, country variable, date
- Quantitative variables: year, month, day, cases (other variables in the dataset)

Glimpse of data

monkey_pox <- read.csv('data/monkeypox.csv')

skimr::skim(monkey_pox)

Data summary
Name	monkey_pox
Number of rows	5874
Number of columns	14
_______________________
Column type frequency:
character	3
numeric	11
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Country.Iso.code	1	3	8	99
Country.Full	1	4	28	99
Date.Full	1	10	10	126

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Date.Year	1	2022.00	0.00	2022	2022.00	2022.00	2022.00	2022.00	▁▁▇▁▁
Date.Month	1	7.13	0.98	5	6.00	7.00	8.00	9.00	▁▅▇▇▁
Date.Day	1	15.91	9.11	1	8.00	16.00	24.00	31.00	▇▆▆▆▆
Data.Cases.New	1	19.42	113.86	0	0.00	0.00	1.00	2063.00	▇▁▁▁▁
Data.Cases.Total	1	717.81	3894.24	1	3.00	15.00	116.00	57039.00	▇▁▁▁▁
Data.Cases.New.per.million	1	0.26	1.36	0	0.00	0.00	0.01	54.52	▇▁▁▁▁
Data.Cases.Total.per.million	1	9.35	17.88	0	0.28	1.46	8.55	142.12	▇▁▁▁▁
Data.Deaths.New	1	0.01	0.10	0	0.00	0.00	0.00	3.00	▇▁▁▁▁
Data.Deaths.Total	1	0.12	0.95	0	0.00	0.00	0.00	19.00	▇▁▁▁▁
Data.Deaths.New.per.million	1	0.00	0.00	0	0.00	0.00	0.00	0.09	▇▁▁▁▁
Data.Deaths.Total.per.million	1	0.00	0.00	0	0.00	0.00	0.00	0.09	▇▁▁▁▁
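
To look at whether cases have decreased over time, one option is to sum `Data.Cases.New` across countries for each month and see where the peak falls. The sketch below uses a made-up data frame (`pox_toy`) with placeholder countries and invented counts, and it has one row per country and month, whereas the real file has one row per country and day.

```r
# Toy stand-in for monkey_pox; countries and counts are invented.
pox_toy <- data.frame(
  Country.Full = c("X", "X", "X", "Y", "Y", "Y"),
  Date.Month = c(7, 8, 9, 7, 8, 9),
  Data.Cases.New = c(10, 30, 5, 20, 40, 10)
)

# Total new cases across all countries in each month.
monthly <- aggregate(Data.Cases.New ~ Date.Month, data = pox_toy, FUN = sum)
monthly <- monthly[order(monthly$Date.Month), ]

# Month with the most new cases.
peak_month <- monthly$Date.Month[which.max(monthly$Data.Cases.New)]

# TRUE if the most recent month is below the peak.
last_below_peak <- tail(monthly$Data.Cases.New, 1) < max(monthly$Data.Cases.New)

print(monthly)
print(peak_month)
print(last_below_peak)
```

Months with fewer reporting days (for example a partial final month) are not directly comparable to full months, so a daily or weekly total may be a fairer basis for the real analysis.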