INFO 2950 Final Project

Proposal

library(tidyverse)
library(skimr)
library(rvest)
library(httr)

Data 1

Introduction and data

  • Identify the source of the data.

    • The source of the data is the Reddit Developers platform, which provides access to Reddit’s API documentation and resources.
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • The data was originally collected by Reddit, which records its users’ activity and interactions on the platform.
  • Write a brief description of the observations.

    • The data available through the Reddit API includes information about posts, comments, users, subreddits, and more. Since this is an API, this data is not a fixed dataset with a fixed number of observations. Instead, it provides dynamic, real-time data from the Reddit platform.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • What is the impact of Reddit’s community moderation practices on user engagement and retention?

    • What is the general response rate/frequency among Reddit posts, and what is the general relationship of respondents to the poster? Do strangers answer more frequently than friends?

    • How do subreddit rules and post removal policies affect user behavior on Reddit?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • The research topic is the impact of Reddit’s community moderation practices on user engagement and retention.

    • Hypothesis: Moderation practices have varying effects on user behavior depending on the type of subreddit or user group. For example, stricter moderation may be more effective in niche subreddits with smaller, more engaged user bases, while looser moderation may be more effective in larger subreddits with more diverse users.

  • Identify the types of variables in your research question. Categorical? Quantitative?
    • Categorical variables include:

      • Subreddit type (e.g., news, entertainment, sports, etc.)

      • User groups (e.g., moderators, regular users, banned users, etc.)

      • Moderation practices (e.g., strict, loose, clear, inconsistent, etc.)

    • Quantitative variables include:

      • User engagement measurements (e.g., number of posts, comments, upvotes, etc.)

      • User retention measurements (e.g., time spent on the platform, frequency of visits, etc.)

      • Subreddit size (e.g., number of subscribers, active users, etc.)

      • Time variables (e.g., length of bans, time since rule changes, etc.)

Glimpse of data

NULL
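
Because the Reddit API returns nested JSON rather than a flat file, a glimpse of this data first requires flattening a listing into one row per post. The following is a minimal sketch under stated assumptions: the field names (`title`, `author`, `score`, `num_comments`, `created_utc`) and the `data$children` nesting are assumed from Reddit's listing format and should be confirmed against a live response, `listing_to_df` is a hypothetical helper, and the example listing is hand-built with made-up values so the code runs without network access.

```r
# Flatten a parsed Reddit "Listing" (a nested list) into one row per post.
# Field names are assumed from Reddit's listing JSON; confirm them before use.
listing_to_df <- function(listing) {
  posts <- lapply(listing$data$children, function(child) child$data)
  data.frame(
    title        = vapply(posts, function(p) as.character(p$title), character(1)),
    author       = vapply(posts, function(p) as.character(p$author), character(1)),
    score        = vapply(posts, function(p) as.numeric(p$score), numeric(1)),
    num_comments = vapply(posts, function(p) as.numeric(p$num_comments), numeric(1)),
    created_utc  = as.POSIXct(
      vapply(posts, function(p) as.numeric(p$created_utc), numeric(1)),
      origin = "1970-01-01", tz = "UTC"
    ),
    stringsAsFactors = FALSE
  )
}

# Hand-built stand-in for a parsed response (made-up values, not real data).
example_listing <- list(
  kind = "Listing",
  data = list(children = list(
    list(kind = "t3", data = list(title = "First post", author = "user_a",
                                  score = 12, num_comments = 3,
                                  created_utc = 1700000000)),
    list(kind = "t3", data = list(title = "Second post", author = "user_b",
                                  score = 5, num_comments = 0,
                                  created_utc = 1700003600))
  ))
)

posts_df <- listing_to_df(example_listing)
str(posts_df)

# With network access, a listing could be fetched with httr (not run here;
# the subreddit and user-agent string are hypothetical choices):
# resp <- httr::GET("https://www.reddit.com/r/AskReddit/new.json?limit=25",
#                   httr::user_agent("info2950-project"))
# posts_df <- listing_to_df(httr::content(resp, as = "parsed"))
```

A table in this shape would support the response-rate question directly, since `num_comments` is already one row per post. Moderation-related variables would need additional endpoints and probably authenticated access.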

Data 2

Introduction and data

  • Identify the source of the data.

The data on teen cyberbullying come from Pew Research Center, a nonpartisan research center that provides qualitative and quantitative statistical analysis of social issues, public opinion, and demographic trends in the U.S. and the world.

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

The information comes from a Pew Research Center survey of parents and teens fielded from March 7 to April 10, 2018 (the dates given in the data file’s name and its CO_DATE column), using the NORC AmeriSpeak panel. AmeriSpeak is a nationally representative panel of U.S. households. Parents and teens were randomly sampled and contacted by U.S. mail, telephone, and field interviews. The final sample was 1,058 parents and 743 teens aged 13 to 17.

  • Write a brief description of the observations.

Each row is one teen respondent. Variables recorded for the teens include gender, race, age group, income, and parent’s level of education. Variables recorded for the parents include parent gender, race, income, education level, teen age, and teen gender. The data set has 743 rows and 176 columns.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

Research Question 1: How is a teen’s parent’s education level associated with the teen spreading false rumors and receiving explicit images, by the teen’s gender?

Research Question 2: How are parents’ concerns about their teen experiencing cyberbullying correlated with the parents’ income?

Research Question 3: How are a teen’s gender and race related to their parent’s concern?

  • A description of the research topic along with a concise statement of your hypotheses on this topic.

Research Question 1: The association between a teen’s parent’s education level, the teen’s gender, and the likelihood of spreading false rumors and receiving explicit content online. We state a null and an alternative hypothesis:

Null hypothesis: The parent’s education level is not associated with spreading false rumors or receiving explicit images, for either male or female teens.

Alternative hypothesis: The parent’s education level is associated with spreading false rumors or receiving explicit images for male and female teens.

Research Question 2: The correlation between a parent’s income and their concern about their teen experiencing cyberbullying. We state a null and an alternative hypothesis:

Null hypothesis: No correlation exists between a parent’s income and their concern about their teen experiencing cyberbullying.

Alternative hypothesis: A correlation exists between a parent’s income and their concern about their teen experiencing cyberbullying.

Research Question 3: The relationship between a teen’s gender and race and their parent’s concern for the teen’s well-being regarding online safety and social media use. We state a null and an alternative hypothesis:

Null hypothesis: A teen’s gender and race are unrelated to their parent’s concerns.

Alternative hypothesis: A teen’s gender and race are related to their parent’s concerns.

  • Identify the types of variables in your research question. Categorical? Quantitative?

Research Question 1: Parent’s education level (high school or less, some college, college graduate+) among U.S. parents of teens, and the teen’s gender. Variable type: categorical.
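
Since the variables named for Research Question 1 are all categorical, one common way to test the null hypothesis of no association is a chi-squared test of independence on a contingency table. The sketch below is only an illustration of that technique: the counts are made up, and the `spread_rumors` outcome label is a hypothetical stand-in for whichever survey item is eventually used.

```r
# Made-up counts for illustration only; they are NOT from the Pew survey.
# Rows: parent's education level; columns: a hypothetical yes/no outcome.
counts <- matrix(
  c(30, 170,
    25, 175,
    20, 180),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    parent_educ   = c("HS or less", "Some college", "College graduate+"),
    spread_rumors = c("Yes", "No")
  )
)

# Chi-squared test of independence: H0 says the two variables are unrelated.
test_result <- chisq.test(counts)
print(test_result)

# Expected counts under H0; the approximation is less reliable if any are
# very small (a common rule of thumb is below 5).
test_result$expected
```

To address the "by gender" part of the question, the same test could be run separately on the male and female subsets, or gender could enter as a third factor in a model such as logistic regression.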

Glimpse of data

# Load the Pew Research Center teen survey responses and summarize every column
teen_cyberbullying_data <- read.csv("march-7-April 10, 2018-Teensand TechSurvey- CSV.csv")
skimr::skim(teen_cyberbullying_data)
Data summary
Name teen_cyberbullying_data
Number of rows 743
Number of columns 176
_______________________
Column type frequency:
character 2
numeric 174
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
SURV_LANG 0 1 2 2 0 2 0
DEVICE 0 1 6 10 0 4 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
CASEID 0 1.00 1170.47 544.01 2.00 727.00 1430.00 1615.50 1801.00 ▂▂▂▂▇
FITIN 0 1.00 1.97 6.36 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
FRIEND1 0 1.00 2.52 1.09 1.00 2.00 2.00 3.00 4.00 ▆▇▁▇▇
FRIEND2 0 1.00 1.79 3.67 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
FRIEND3 0 1.00 2.47 0.65 1.00 2.00 3.00 3.00 3.00 ▁▁▅▁▇
FRIEND4_1 0 1.00 0.18 0.39 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
FRIEND4_2 0 1.00 0.29 0.45 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
FRIEND4_3 0 1.00 0.37 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
FRIEND4_4 0 1.00 0.43 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
FRIEND4_5 0 1.00 0.36 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
FRIEND4_6 0 1.00 0.11 0.32 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
FRIEND5 0 1.00 2.23 0.57 1.00 2.00 2.00 2.00 4.00 ▁▇▁▂▁
FRIEND6_1 19 0.97 0.62 0.48 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
FRIEND6_2 19 0.97 0.61 0.49 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
FRIEND6_3 19 0.97 0.49 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
FRIEND6_4 19 0.97 0.15 0.36 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
FRIEND6_5 19 0.97 0.34 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
FRIEND6_6 19 0.97 0.87 0.33 0.00 1.00 1.00 1.00 1.00 ▁▁▁▁▇
FRIEND6_7 19 0.97 0.03 0.17 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
DEVICEA 0 1.00 1.57 7.10 1.00 1.00 1.00 1.00 98.00 ▇▁▁▁▁
DEVICEB 0 1.00 3.39 12.64 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
DEVICEC 0 1.00 1.22 3.57 1.00 1.00 1.00 1.00 98.00 ▇▁▁▁▁
DEVICED 0 1.00 2.06 9.37 1.00 1.00 1.00 1.00 98.00 ▇▁▁▁▁
HOMEWORKA 0 1.00 3.67 2.81 1.00 3.00 4.00 4.00 77.00 ▇▁▁▁▁
HOMEWORKB 0 1.00 3.30 6.12 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
HOMEWORKC 0 1.00 3.55 3.57 1.00 3.00 4.00 4.00 98.00 ▇▁▁▁▁
INTREQ 0 1.00 1.66 0.83 1.00 1.00 2.00 2.00 5.00 ▇▆▁▁▁
GAMING 0 1.00 1.10 0.29 1.00 1.00 1.00 1.00 2.00 ▇▁▁▁▁
SNS1_1 0 1.00 0.34 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
SNS1_2 0 1.00 0.72 0.45 0.00 0.00 1.00 1.00 1.00 ▃▁▁▁▇
SNS1_3 0 1.00 0.51 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
SNS1_4 0 1.00 0.71 0.46 0.00 0.00 1.00 1.00 1.00 ▃▁▁▁▇
SNS1_5 0 1.00 0.85 0.36 0.00 1.00 1.00 1.00 1.00 ▂▁▁▁▇
SNS1_6 0 1.00 0.08 0.28 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
SNS1_7 0 1.00 0.07 0.26 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
SNS1_8 0 1.00 0.03 0.17 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
SNS2 109 0.85 3.94 3.92 1.00 3.00 4.00 5.00 98.00 ▇▁▁▁▁
SOC1 0 1.00 2.35 4.54 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
SOC1A_GOOD_1 652 0.12 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_2 718 0.03 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_3 734 0.01 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_4 729 0.02 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_5 715 0.04 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_6 726 0.02 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_GOOD_7 726 0.02 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_1 732 0.01 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_2 686 0.08 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_3 713 0.04 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_4 705 0.05 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_5 717 0.03 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_6 718 0.03 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_BAD_7 732 0.01 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁
SOC1A_OTHER 699 0.06 98.00 0.00 98.00 98.00 98.00 98.00 98.00 ▁▁▇▁▁
SOC1A_DK_REF 730 0.02 99.00 0.00 99.00 99.00 99.00 99.00 99.00 ▁▁▇▁▁
POST1A 23 0.97 3.14 7.17 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
POST1B 23 0.97 3.29 7.15 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
POST1C 23 0.97 3.10 6.22 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
POST1D 23 0.97 2.89 5.83 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
POST1E 23 0.97 3.60 6.77 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
POST2_1 23 0.97 0.51 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
POST2_2 23 0.97 0.22 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
POST2_3 23 0.97 0.45 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
POST2_4 23 0.97 0.11 0.31 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
POST2_5 23 0.97 0.15 0.36 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
POST2_6 23 0.97 0.10 0.30 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
POST2_7 23 0.97 0.36 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
POST2_8 23 0.97 0.29 0.46 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
SOC2POSA 23 0.97 2.52 6.82 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
SOC2POSB 23 0.97 2.35 6.23 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
SOC2POSC 23 0.97 2.30 7.19 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
SOC2POSD 23 0.97 2.64 7.70 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
SOC2NEGA 23 0.97 3.09 6.17 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOC2NEGB 23 0.97 2.80 6.20 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOC2NEGC 24 0.97 2.99 6.79 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOC2NEGD 23 0.97 2.69 5.08 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOC4A 23 0.97 1.98 7.73 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
SOC4B 23 0.97 2.33 9.91 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
SOC4C 23 0.97 2.72 9.87 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
SOC4D 23 0.97 1.76 6.85 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
SOC5A 23 0.97 3.37 7.99 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
SOC5B 23 0.97 3.53 7.14 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
SOC5C 23 0.97 3.58 7.14 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
SOC6 23 0.97 2.84 5.10 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOC7_1 424 0.43 0.73 0.44 0.00 0.00 1.00 1.00 1.00 ▃▁▁▁▇
SOC7_2 424 0.43 0.41 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
SOC7_3 424 0.43 0.20 0.40 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
SOC7_4 424 0.43 0.45 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
SOC7_5 424 0.43 0.53 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
SOC7_6 424 0.43 0.07 0.26 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
SOCEXPA 0 1.00 8.32 21.17 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
SOCEXPB 0 1.00 9.25 21.30 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
SOCEXPC 0 1.00 6.76 18.42 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
SOCEXPD 0 1.00 7.65 20.01 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
WORRYA 22 0.97 1.83 0.94 1.00 1.00 1.00 3.00 3.00 ▇▁▁▁▆
WORRYB 23 0.97 2.14 3.70 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
WORRYC 71 0.90 2.47 5.29 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
LIMITA 22 0.97 1.48 0.50 1.00 1.00 1.00 2.00 2.00 ▇▁▁▁▇
LIMITB 23 0.97 1.41 0.49 1.00 1.00 1.00 2.00 2.00 ▇▁▁▁▆
LIMITC 71 0.90 1.51 3.76 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
CELL1_1 22 0.97 0.42 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
CELL1_2 22 0.97 0.15 0.36 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
CELL1_3 22 0.97 0.25 0.43 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
CELL1_4 22 0.97 0.13 0.34 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
CELL1_5 22 0.97 0.25 0.43 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
CELL1_6 22 0.97 0.31 0.46 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
CELL2A 22 0.97 1.81 0.80 1.00 1.00 2.00 2.00 4.00 ▇▇▁▂▁
CELL2B 22 0.97 1.90 0.77 1.00 1.00 2.00 2.00 4.00 ▅▇▁▂▁
CELL2C 22 0.97 1.58 0.75 1.00 1.00 1.00 2.00 4.00 ▇▅▁▁▁
CELL2D 22 0.97 2.78 3.67 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
CELL3A 22 0.97 2.61 5.13 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
CELL3B 22 0.97 2.29 6.27 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
CELL3C 22 0.97 3.13 3.67 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
DISTRACT 0 1.00 2.93 6.13 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
GROUP1 0 1.00 3.02 6.13 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
GROUP2_1 169 0.77 0.36 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
GROUP2_2 169 0.77 0.33 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
GROUP2_3 169 0.77 0.54 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
GROUP2_4 169 0.77 0.14 0.35 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
GROUP2_5 169 0.77 0.12 0.33 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
GROUP2_6 169 0.77 0.52 0.50 0.00 0.00 1.00 1.00 1.00 ▇▁▁▁▇
GROUP2_7 169 0.77 0.20 0.40 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
GROUP2_8 169 0.77 0.10 0.29 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
GROUP2_9 169 0.77 0.34 0.48 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
GROUP2_10 169 0.77 0.24 0.43 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
GROUP2_11 169 0.77 0.10 0.30 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
GROUP3A 169 0.77 3.19 9.46 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
GROUP3B 169 0.77 3.18 9.96 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
GROUP3C 169 0.77 3.10 9.46 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
GROUP3D 169 0.77 2.75 8.61 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
OH1A 0 1.00 2.00 7.10 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
OH1B 0 1.00 2.06 6.75 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
OH1C 0 1.00 1.74 5.05 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
OH1D 0 1.00 1.77 5.05 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
OH2A 0 1.00 3.79 9.24 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
OH2B 0 1.00 4.25 9.95 1.00 3.00 3.00 4.00 98.00 ▇▁▁▁▁
OH2C 0 1.00 3.96 10.59 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
OH2D 0 1.00 3.15 9.04 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
OH2E 0 1.00 3.42 8.31 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
OH2F 0 1.00 3.89 10.23 1.00 2.00 3.00 3.00 98.00 ▇▁▁▁▁
OH3_1 0 1.00 0.40 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
OH3_2 0 1.00 0.16 0.36 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
OH3_3 0 1.00 0.31 0.46 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
OH3_4 0 1.00 0.08 0.27 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▁
OH3_5 0 1.00 0.25 0.43 0.00 0.00 0.00 0.50 1.00 ▇▁▁▁▃
OH3_6 0 1.00 0.21 0.41 0.00 0.00 0.00 0.00 1.00 ▇▁▁▁▂
OH3_7 0 1.00 0.42 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▆
GUN1 0 1.00 2.44 3.65 1.00 2.00 2.00 3.00 98.00 ▇▁▁▁▁
GUN2A 0 1.00 2.01 6.17 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
GUN2B 0 1.00 2.48 6.19 1.00 1.00 2.00 3.00 98.00 ▇▁▁▁▁
GUN2C 0 1.00 3.27 5.72 1.00 2.00 3.00 4.00 98.00 ▇▁▁▁▁
GUN2D 0 1.00 2.36 8.39 1.00 1.00 1.00 2.00 98.00 ▇▁▁▁▁
GUN2E 0 1.00 2.31 6.17 1.00 1.00 2.00 2.00 98.00 ▇▁▁▁▁
GENDER 0 1.00 1.79 5.08 1.00 1.00 2.00 2.00 99.00 ▇▁▁▁▁
AGE 0 1.00 14.95 1.42 13.00 14.00 15.00 16.00 17.00 ▇▇▇▇▇
P_EDUC 0 1.00 11.71 8.67 1.00 10.00 11.00 12.00 98.00 ▇▁▁▁▁
RACETHNICITY 0 1.00 2.78 7.95 1.00 1.00 2.00 4.00 98.00 ▇▁▁▁▁
HOME_TYPE 0 1.00 1.63 0.96 1.00 1.00 1.00 2.00 4.00 ▇▁▁▂▁
HOUSING 0 1.00 1.43 0.53 1.00 1.00 1.00 2.00 3.00 ▇▁▆▁▁
INCOME 0 1.00 9.72 4.43 1.00 6.00 10.00 13.00 18.00 ▃▅▇▆▃
INTERNET 0 1.00 0.89 0.32 0.00 1.00 1.00 1.00 1.00 ▁▁▁▁▇
PHONESERVICE 0 1.00 3.14 1.01 1.00 2.00 4.00 4.00 5.00 ▁▆▂▇▁
METRO 0 1.00 0.90 0.30 0.00 1.00 1.00 1.00 1.00 ▁▁▁▁▇
REGION4 0 1.00 2.66 0.98 1.00 2.00 3.00 3.00 4.00 ▃▆▁▇▅
HHSIZE 0 1.00 4.26 1.27 1.00 3.00 4.00 5.00 6.00 ▂▅▇▅▆
HH01 0 1.00 0.16 0.45 0.00 0.00 0.00 0.00 2.00 ▇▁▁▁▁
HH25 0 1.00 0.21 0.49 0.00 0.00 0.00 0.00 2.00 ▇▁▂▁▁
HH612 0 1.00 0.71 0.89 0.00 0.00 0.00 1.00 4.00 ▇▅▂▁▁
HH1317 0 1.00 0.92 0.74 0.00 0.00 1.00 1.00 3.00 ▅▇▁▂▁
HH18OV 0 1.00 2.36 1.02 1.00 2.00 2.00 3.00 6.00 ▇▂▁▁▁
CO_DATE 0 1.00 20180335.78 36.78 20180307.00 20180310.00 20180322.00 20180330.00 20180410.00 ▇▁▁▁▂
DURATION 0 1.00 16.90 9.95 2.70 10.67 14.20 19.46 82.43 ▇▂▁▁▁
SURV_MODE 0 1.00 1.96 0.19 1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
MODE_END 0 1.00 1.96 0.19 1.00 2.00 2.00 2.00 2.00 ▁▁▁▁▇
WEIGHT 0 1.00 0.99 0.99 0.02 0.37 0.67 1.25 8.24 ▇▂▁▁▁
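
In the summary above, many items have quartiles between 1 and 5 but a maximum of 98 or 99 (for example FITIN, with p75 = 2 and p100 = 98), which suggests that those values are special response codes rather than real magnitudes. Left as numbers, they inflate the means and standard deviations. The sketch below shows one way to set them to missing before analysis. It assumes that 98 and 99 mean "don't know" or "refused", which should be verified in the survey codebook; other items may use different codes (HOMEWORKA has a maximum of 77). `recode_sentinels` is a hypothetical helper and the toy data frame is made up.

```r
# Replace sentinel codes with NA in the chosen columns only.
# Assumption: 98 and 99 mark "don't know"/"refused"; verify in the codebook.
recode_sentinels <- function(df, cols, codes = c(98, 99)) {
  for (col in cols) {
    df[[col]][df[[col]] %in% codes] <- NA
  }
  df
}

# Tiny made-up data frame that mimics the coding pattern (not survey data).
toy <- data.frame(
  CASEID = c(2, 98, 1801),   # an ID of 98 is legitimate and must be kept
  FITIN  = c(1, 2, 98),
  GENDER = c(1, 99, 2)
)

toy_clean <- recode_sentinels(toy, cols = c("FITIN", "GENDER"))
colMeans(toy_clean[c("FITIN", "GENDER")], na.rm = TRUE)
```

The columns are passed explicitly so that identifiers and measured quantities such as CASEID, DURATION, and WEIGHT are never altered by accident.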

Data 3

Introduction and data

  • Identify the source of the data.

    The World Bank - World Development Indicators

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    The WDI database was created in the late 1970s as a branch of the World Development Report database. All of the data is collected through the World Bank by its member countries. (Source: https://datatopics.worldbank.org/world-development-indicators/stories/world-development-indicators-the-story.html)

  • Write a brief description of the observations.

    I have 148 data observations, including annual net income, birth rate, population, and death rate, from 265 countries or regions, plus aggregate data for the whole world in those same categories.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • Does education save lives? Is there a distinct correlation between the rate of education and life expectancy?

    • Are there enough resources for the population? Assessing the relationship between GDP and birth rate/population.

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • Access to education seems to be a determinant of an area’s prosperity. As students, many of us are in school to provide a better life for ourselves and to become prosperous in the future. I want to see whether life expectancy is higher in countries where education is more readily available.

      • There will be a distinct, positive correlation.
    • China’s one-child policy, in place from the late 1900s into the 2000s, inspired this research question. The policy was put in place because many countries have rising populations and not enough resources to support them. I want to see whether there is a correlation between a country’s prosperity and its population, as well as its birth rate: specifically, whether the birth rate has risen as the country became more prosperous.

      • There will be a negative correlation.
  • Identify the types of variables in your research question. Categorical? Quantitative?
    • GDP (quantitative), population (quantitative), deathrate (quantitative), birthrate (quantitative), education rate (quantitative)
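
Because every variable listed here is quantitative, the hypothesized correlations can be checked with a correlation coefficient and its significance test. The sketch below illustrates the idea with `cor.test` on made-up numbers. The values are not World Bank data, and the column names `school_enrollment` and `life_expectancy` are hypothetical placeholders for whichever WDI indicators are chosen.

```r
# Made-up country-level values for illustration; NOT World Bank data.
# Column names are hypothetical placeholders for WDI indicators.
toy_wdi <- data.frame(
  school_enrollment = c(45, 60, 72, 80, 88, 95),
  life_expectancy   = c(58, 63, 70, 71, 76, 81)
)

# Pearson correlation with a test of H0: the true correlation is zero.
edu_life <- cor.test(toy_wdi$school_enrollment, toy_wdi$life_expectancy)
edu_life$estimate   # sample correlation coefficient
edu_life$p.value    # p-value for the two-sided test
```

With the real data, rows for regional and world aggregates would likely need to be dropped first so that each country is counted only once, and `method = "spearman"` is an option if the relationship looks monotonic but not linear.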

Glimpse of data

# Source: https://databank.worldbank.org/source/world-development-indicators

# world_bank <- read.csv("worldbank.csv")
# skimr::skim(world_bank)