library(tidyverse)
library(skimr)
library(rvest)
library(httr)
INFO 2950 Final Project
Proposal
Data 1
Introduction and data
Identify the source of the data.
- The source of the data is the Reddit Developers platform, which provides access to Reddit’s API documentation and resources.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data was originally collected by Reddit from its users and their interactions on the platform.
Write a brief description of the observations.
- The data available through the Reddit API includes information about posts, comments, users, subreddits, and more. Since this is an API, this data is not a fixed dataset with a fixed number of observations. Instead, it provides dynamic, real-time data from the Reddit platform.
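Because the API returns nested JSON rather than a fixed table, one possible first step is to flatten a response into a data frame with one row per post. The sketch below is only an illustration under stated assumptions: `listing_to_df` is a hypothetical helper, the mock listing uses made-up values in place of a live API response, and the field names (`subreddit`, `title`, `score`, `num_comments`) should be checked against the API documentation before use.

```r
# Hypothetical helper: flatten a parsed Reddit "Listing" (a nested list, e.g.
# what a JSON parser returns without simplification) into one row per post.
listing_to_df <- function(listing) {
  posts <- listing$data$children
  data.frame(
    subreddit    = vapply(posts, function(p) p$data$subreddit, character(1)),
    title        = vapply(posts, function(p) p$data$title, character(1)),
    score        = vapply(posts, function(p) as.numeric(p$data$score), numeric(1)),
    num_comments = vapply(posts, function(p) as.numeric(p$data$num_comments), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Mock listing with made-up values, standing in for a live API response.
mock_listing <- list(
  kind = "Listing",
  data = list(children = list(
    list(kind = "t3", data = list(subreddit = "news", title = "Example post A",
                                  score = 120, num_comments = 45)),
    list(kind = "t3", data = list(subreddit = "sports", title = "Example post B",
                                  score = 8, num_comments = 2))
  ))
)

posts_df <- listing_to_df(mock_listing)
str(posts_df)
```

A table in this shape would give the engagement measurements (scores, comment counts) one column each, which is the form the later analysis would need.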
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
What is the impact of Reddit’s community moderation practices on user engagement and retention?
What is the general response rate/frequency among Reddit posts, and what is the general relationship of respondents to the poster? Do strangers answer more frequently than friends?
How do subreddit rules and post removal policies affect user behavior on Reddit?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic is the impact of Reddit’s community moderation practices on user engagement and retention.
Hypothesis: Moderation practices have varying effects on user behavior depending on the type of subreddit or user group. For example, stricter moderation may be more effective in niche subreddits with smaller, more engaged user bases, while looser moderation may be more effective in larger subreddits with more diverse users.
- Identify the types of variables in your research question. Categorical? Quantitative?
Categorical variables include:
Subreddit type (e.g., news, entertainment, sports, etc.)
User groups (e.g., moderators, regular users, banned users, etc.)
Moderation practices (e.g., strict, loose, clear, inconsistent, etc.)
Quantitative variables include:
User engagement measurements (e.g., number of posts, comments, upvotes, etc.)
User retention measurements (e.g., time spent on the platform, frequency of visits, etc.)
Subreddit size (e.g., number of subscribers, active users, etc.)
Time variables (e.g., length of bans, time since rule changes, etc.)
Glimpse of data
NULL
Data 2
Introduction and data
- Identify the source of the data.
The data about teen cyberbullying come from the Pew Research Center, a nonpartisan research center focused on providing qualitative and quantitative statistical analysis of social issues, public opinion, and demographic trends in the US and the world.
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The information comes from their survey data collected from March 7 to April 10, 2018, in which the participants are parents and teens from the NORC AmeriSpeak panel. NORC AmeriSpeak is a nationally representative panel of US households. The parents and teens were randomly sampled and contacted by US mail, telephone, and field interviews. The sample consisted of 1,058 parents and 743 teens aged 13 to 17.
- Write a brief description of the observations.
For the teens, the recorded variables include gender, race, age group, income, and parent’s level of education. For the parents, they include parent gender, race, income, education level, teen age, and teen gender. More specifically, the data set has 743 rows and 176 columns.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Research Question 1: How is a teen’s parent’s education level associated with the teen spreading false rumors and receiving explicit images, broken down by the teen’s gender?
Research Question 2: How is a parent’s concern about their teen experiencing cyberbullying correlated with the parent’s income?
Research Question 3: How are a teen’s gender and race related to their parent’s concern?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
Research Question 1: The association between a teen’s parent’s education level, the teen’s gender, and the likelihood of spreading false rumors and receiving explicit content online. We state a null and an alternative hypothesis:
Null hypothesis: The teen’s parent’s education level is not associated with spreading false rumors and receiving explicit images for male and female teens.
Alternative hypothesis: The teen’s parent’s education level is associated with spreading false rumors and receiving explicit images for male and female teens.
Research Question 2: The correlation between a parent’s income and their concern about their teen experiencing cyberbullying. We state a null and an alternative hypothesis:
Null hypothesis: No correlation exists between a parent’s income and their concern about their teen experiencing cyberbullying.
Alternative hypothesis: A correlation exists between a parent’s income and their concern about their teen experiencing cyberbullying.
Research Question 3: The relationship between a teen’s gender and race and their parent’s concern for the teen’s well-being regarding online safety and social media use. We state a null and an alternative hypothesis:
Null hypothesis: The teen’s gender and race are unrelated to their parent’s concerns.
Alternative hypothesis: The teen’s gender and race are related to their parent’s concerns.
- Identify the types of variables in your research question. Categorical? Quantitative?
Research Question 1: Parent’s educational level (high school or less, some college, and college graduate+), U.S. parents of teens, and gender. Variable types: Categorical
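Since the variables for Research Question 1 are categorical, one common way to test the null hypothesis of no association is a chi-squared test of independence. The sketch below is only an illustration: the data are randomly generated toy values, and the column names `parent_educ` and `false_rumors` are hypothetical stand-ins, not the survey's actual variable names.

```r
set.seed(2950)
n <- 200

# Toy data: made-up categories standing in for the survey responses.
educ_levels <- c("HS or less", "Some college", "College grad+")
toy <- data.frame(
  parent_educ  = factor(sample(educ_levels, n, replace = TRUE), levels = educ_levels),
  false_rumors = factor(sample(c("Yes", "No"), n, replace = TRUE), levels = c("Yes", "No"))
)

# Cross-tabulate the two categorical variables (3 x 2 contingency table).
tab <- table(toy$parent_educ, toy$false_rumors)
print(tab)

# Chi-squared test of independence; a small p-value would be evidence
# against the null hypothesis of no association.
test <- chisq.test(tab)
test$p.value
```

To address the gender part of the question, the same test could be repeated separately for male and female teens. With real survey data, the survey weights would also need to be considered, which this sketch ignores.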
Glimpse of data
# add code here
teen_cyberbullying_data = read.csv("march-7-April 10, 2018-Teensand TechSurvey- CSV.csv")
skimr::skim(teen_cyberbullying_data)
| Name | teen_cyberbullying_data |
| Number of rows | 743 |
| Number of columns | 176 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 174 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| SURV_LANG | 0 | 1 | 2 | 2 | 0 | 2 | 0 |
| DEVICE | 0 | 1 | 6 | 10 | 0 | 4 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| CASEID | 0 | 1.00 | 1170.47 | 544.01 | 2.00 | 727.00 | 1430.00 | 1615.50 | 1801.00 | ▂▂▂▂▇ |
| FITIN | 0 | 1.00 | 1.97 | 6.36 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| FRIEND1 | 0 | 1.00 | 2.52 | 1.09 | 1.00 | 2.00 | 2.00 | 3.00 | 4.00 | ▆▇▁▇▇ |
| FRIEND2 | 0 | 1.00 | 1.79 | 3.67 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| FRIEND3 | 0 | 1.00 | 2.47 | 0.65 | 1.00 | 2.00 | 3.00 | 3.00 | 3.00 | ▁▁▅▁▇ |
| FRIEND4_1 | 0 | 1.00 | 0.18 | 0.39 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| FRIEND4_2 | 0 | 1.00 | 0.29 | 0.45 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| FRIEND4_3 | 0 | 1.00 | 0.37 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| FRIEND4_4 | 0 | 1.00 | 0.43 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| FRIEND4_5 | 0 | 1.00 | 0.36 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| FRIEND4_6 | 0 | 1.00 | 0.11 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| FRIEND5 | 0 | 1.00 | 2.23 | 0.57 | 1.00 | 2.00 | 2.00 | 2.00 | 4.00 | ▁▇▁▂▁ |
| FRIEND6_1 | 19 | 0.97 | 0.62 | 0.48 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| FRIEND6_2 | 19 | 0.97 | 0.61 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| FRIEND6_3 | 19 | 0.97 | 0.49 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| FRIEND6_4 | 19 | 0.97 | 0.15 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| FRIEND6_5 | 19 | 0.97 | 0.34 | 0.47 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| FRIEND6_6 | 19 | 0.97 | 0.87 | 0.33 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
| FRIEND6_7 | 19 | 0.97 | 0.03 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| DEVICEA | 0 | 1.00 | 1.57 | 7.10 | 1.00 | 1.00 | 1.00 | 1.00 | 98.00 | ▇▁▁▁▁ |
| DEVICEB | 0 | 1.00 | 3.39 | 12.64 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| DEVICEC | 0 | 1.00 | 1.22 | 3.57 | 1.00 | 1.00 | 1.00 | 1.00 | 98.00 | ▇▁▁▁▁ |
| DEVICED | 0 | 1.00 | 2.06 | 9.37 | 1.00 | 1.00 | 1.00 | 1.00 | 98.00 | ▇▁▁▁▁ |
| HOMEWORKA | 0 | 1.00 | 3.67 | 2.81 | 1.00 | 3.00 | 4.00 | 4.00 | 77.00 | ▇▁▁▁▁ |
| HOMEWORKB | 0 | 1.00 | 3.30 | 6.12 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| HOMEWORKC | 0 | 1.00 | 3.55 | 3.57 | 1.00 | 3.00 | 4.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| INTREQ | 0 | 1.00 | 1.66 | 0.83 | 1.00 | 1.00 | 2.00 | 2.00 | 5.00 | ▇▆▁▁▁ |
| GAMING | 0 | 1.00 | 1.10 | 0.29 | 1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▇▁▁▁▁ |
| SNS1_1 | 0 | 1.00 | 0.34 | 0.47 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| SNS1_2 | 0 | 1.00 | 0.72 | 0.45 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| SNS1_3 | 0 | 1.00 | 0.51 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| SNS1_4 | 0 | 1.00 | 0.71 | 0.46 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| SNS1_5 | 0 | 1.00 | 0.85 | 0.36 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▂▁▁▁▇ |
| SNS1_6 | 0 | 1.00 | 0.08 | 0.28 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| SNS1_7 | 0 | 1.00 | 0.07 | 0.26 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| SNS1_8 | 0 | 1.00 | 0.03 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| SNS2 | 109 | 0.85 | 3.94 | 3.92 | 1.00 | 3.00 | 4.00 | 5.00 | 98.00 | ▇▁▁▁▁ |
| SOC1 | 0 | 1.00 | 2.35 | 4.54 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC1A_GOOD_1 | 652 | 0.12 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_2 | 718 | 0.03 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_3 | 734 | 0.01 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_4 | 729 | 0.02 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_5 | 715 | 0.04 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_6 | 726 | 0.02 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_GOOD_7 | 726 | 0.02 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_1 | 732 | 0.01 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_2 | 686 | 0.08 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_3 | 713 | 0.04 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_4 | 705 | 0.05 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_5 | 717 | 0.03 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_6 | 718 | 0.03 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_BAD_7 | 732 | 0.01 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
| SOC1A_OTHER | 699 | 0.06 | 98.00 | 0.00 | 98.00 | 98.00 | 98.00 | 98.00 | 98.00 | ▁▁▇▁▁ |
| SOC1A_DK_REF | 730 | 0.02 | 99.00 | 0.00 | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 | ▁▁▇▁▁ |
| POST1A | 23 | 0.97 | 3.14 | 7.17 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| POST1B | 23 | 0.97 | 3.29 | 7.15 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| POST1C | 23 | 0.97 | 3.10 | 6.22 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| POST1D | 23 | 0.97 | 2.89 | 5.83 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| POST1E | 23 | 0.97 | 3.60 | 6.77 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| POST2_1 | 23 | 0.97 | 0.51 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| POST2_2 | 23 | 0.97 | 0.22 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| POST2_3 | 23 | 0.97 | 0.45 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| POST2_4 | 23 | 0.97 | 0.11 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| POST2_5 | 23 | 0.97 | 0.15 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| POST2_6 | 23 | 0.97 | 0.10 | 0.30 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| POST2_7 | 23 | 0.97 | 0.36 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| POST2_8 | 23 | 0.97 | 0.29 | 0.46 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| SOC2POSA | 23 | 0.97 | 2.52 | 6.82 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC2POSB | 23 | 0.97 | 2.35 | 6.23 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC2POSC | 23 | 0.97 | 2.30 | 7.19 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC2POSD | 23 | 0.97 | 2.64 | 7.70 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC2NEGA | 23 | 0.97 | 3.09 | 6.17 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC2NEGB | 23 | 0.97 | 2.80 | 6.20 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC2NEGC | 24 | 0.97 | 2.99 | 6.79 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC2NEGD | 23 | 0.97 | 2.69 | 5.08 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC4A | 23 | 0.97 | 1.98 | 7.73 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC4B | 23 | 0.97 | 2.33 | 9.91 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC4C | 23 | 0.97 | 2.72 | 9.87 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC4D | 23 | 0.97 | 1.76 | 6.85 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| SOC5A | 23 | 0.97 | 3.37 | 7.99 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| SOC5B | 23 | 0.97 | 3.53 | 7.14 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| SOC5C | 23 | 0.97 | 3.58 | 7.14 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| SOC6 | 23 | 0.97 | 2.84 | 5.10 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOC7_1 | 424 | 0.43 | 0.73 | 0.44 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▃▁▁▁▇ |
| SOC7_2 | 424 | 0.43 | 0.41 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| SOC7_3 | 424 | 0.43 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| SOC7_4 | 424 | 0.43 | 0.45 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| SOC7_5 | 424 | 0.43 | 0.53 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| SOC7_6 | 424 | 0.43 | 0.07 | 0.26 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| SOCEXPA | 0 | 1.00 | 8.32 | 21.17 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOCEXPB | 0 | 1.00 | 9.25 | 21.30 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOCEXPC | 0 | 1.00 | 6.76 | 18.42 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| SOCEXPD | 0 | 1.00 | 7.65 | 20.01 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| WORRYA | 22 | 0.97 | 1.83 | 0.94 | 1.00 | 1.00 | 1.00 | 3.00 | 3.00 | ▇▁▁▁▆ |
| WORRYB | 23 | 0.97 | 2.14 | 3.70 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| WORRYC | 71 | 0.90 | 2.47 | 5.29 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| LIMITA | 22 | 0.97 | 1.48 | 0.50 | 1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▇▁▁▁▇ |
| LIMITB | 23 | 0.97 | 1.41 | 0.49 | 1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▇▁▁▁▆ |
| LIMITC | 71 | 0.90 | 1.51 | 3.76 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| CELL1_1 | 22 | 0.97 | 0.42 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| CELL1_2 | 22 | 0.97 | 0.15 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| CELL1_3 | 22 | 0.97 | 0.25 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| CELL1_4 | 22 | 0.97 | 0.13 | 0.34 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| CELL1_5 | 22 | 0.97 | 0.25 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| CELL1_6 | 22 | 0.97 | 0.31 | 0.46 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| CELL2A | 22 | 0.97 | 1.81 | 0.80 | 1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▇▇▁▂▁ |
| CELL2B | 22 | 0.97 | 1.90 | 0.77 | 1.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▅▇▁▂▁ |
| CELL2C | 22 | 0.97 | 1.58 | 0.75 | 1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▇▅▁▁▁ |
| CELL2D | 22 | 0.97 | 2.78 | 3.67 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| CELL3A | 22 | 0.97 | 2.61 | 5.13 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| CELL3B | 22 | 0.97 | 2.29 | 6.27 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| CELL3C | 22 | 0.97 | 3.13 | 3.67 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| DISTRACT | 0 | 1.00 | 2.93 | 6.13 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GROUP1 | 0 | 1.00 | 3.02 | 6.13 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GROUP2_1 | 169 | 0.77 | 0.36 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| GROUP2_2 | 169 | 0.77 | 0.33 | 0.47 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| GROUP2_3 | 169 | 0.77 | 0.54 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| GROUP2_4 | 169 | 0.77 | 0.14 | 0.35 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| GROUP2_5 | 169 | 0.77 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| GROUP2_6 | 169 | 0.77 | 0.52 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| GROUP2_7 | 169 | 0.77 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| GROUP2_8 | 169 | 0.77 | 0.10 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| GROUP2_9 | 169 | 0.77 | 0.34 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| GROUP2_10 | 169 | 0.77 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| GROUP2_11 | 169 | 0.77 | 0.10 | 0.30 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| GROUP3A | 169 | 0.77 | 3.19 | 9.46 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GROUP3B | 169 | 0.77 | 3.18 | 9.96 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GROUP3C | 169 | 0.77 | 3.10 | 9.46 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GROUP3D | 169 | 0.77 | 2.75 | 8.61 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| OH1A | 0 | 1.00 | 2.00 | 7.10 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| OH1B | 0 | 1.00 | 2.06 | 6.75 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| OH1C | 0 | 1.00 | 1.74 | 5.05 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| OH1D | 0 | 1.00 | 1.77 | 5.05 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| OH2A | 0 | 1.00 | 3.79 | 9.24 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| OH2B | 0 | 1.00 | 4.25 | 9.95 | 1.00 | 3.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| OH2C | 0 | 1.00 | 3.96 | 10.59 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| OH2D | 0 | 1.00 | 3.15 | 9.04 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| OH2E | 0 | 1.00 | 3.42 | 8.31 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| OH2F | 0 | 1.00 | 3.89 | 10.23 | 1.00 | 2.00 | 3.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| OH3_1 | 0 | 1.00 | 0.40 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
| OH3_2 | 0 | 1.00 | 0.16 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| OH3_3 | 0 | 1.00 | 0.31 | 0.46 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
| OH3_4 | 0 | 1.00 | 0.08 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
| OH3_5 | 0 | 1.00 | 0.25 | 0.43 | 0.00 | 0.00 | 0.00 | 0.50 | 1.00 | ▇▁▁▁▃ |
| OH3_6 | 0 | 1.00 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| OH3_7 | 0 | 1.00 | 0.42 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| GUN1 | 0 | 1.00 | 2.44 | 3.65 | 1.00 | 2.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GUN2A | 0 | 1.00 | 2.01 | 6.17 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| GUN2B | 0 | 1.00 | 2.48 | 6.19 | 1.00 | 1.00 | 2.00 | 3.00 | 98.00 | ▇▁▁▁▁ |
| GUN2C | 0 | 1.00 | 3.27 | 5.72 | 1.00 | 2.00 | 3.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| GUN2D | 0 | 1.00 | 2.36 | 8.39 | 1.00 | 1.00 | 1.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| GUN2E | 0 | 1.00 | 2.31 | 6.17 | 1.00 | 1.00 | 2.00 | 2.00 | 98.00 | ▇▁▁▁▁ |
| GENDER | 0 | 1.00 | 1.79 | 5.08 | 1.00 | 1.00 | 2.00 | 2.00 | 99.00 | ▇▁▁▁▁ |
| AGE | 0 | 1.00 | 14.95 | 1.42 | 13.00 | 14.00 | 15.00 | 16.00 | 17.00 | ▇▇▇▇▇ |
| P_EDUC | 0 | 1.00 | 11.71 | 8.67 | 1.00 | 10.00 | 11.00 | 12.00 | 98.00 | ▇▁▁▁▁ |
| RACETHNICITY | 0 | 1.00 | 2.78 | 7.95 | 1.00 | 1.00 | 2.00 | 4.00 | 98.00 | ▇▁▁▁▁ |
| HOME_TYPE | 0 | 1.00 | 1.63 | 0.96 | 1.00 | 1.00 | 1.00 | 2.00 | 4.00 | ▇▁▁▂▁ |
| HOUSING | 0 | 1.00 | 1.43 | 0.53 | 1.00 | 1.00 | 1.00 | 2.00 | 3.00 | ▇▁▆▁▁ |
| INCOME | 0 | 1.00 | 9.72 | 4.43 | 1.00 | 6.00 | 10.00 | 13.00 | 18.00 | ▃▅▇▆▃ |
| INTERNET | 0 | 1.00 | 0.89 | 0.32 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
| PHONESERVICE | 0 | 1.00 | 3.14 | 1.01 | 1.00 | 2.00 | 4.00 | 4.00 | 5.00 | ▁▆▂▇▁ |
| METRO | 0 | 1.00 | 0.90 | 0.30 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
| REGION4 | 0 | 1.00 | 2.66 | 0.98 | 1.00 | 2.00 | 3.00 | 3.00 | 4.00 | ▃▆▁▇▅ |
| HHSIZE | 0 | 1.00 | 4.26 | 1.27 | 1.00 | 3.00 | 4.00 | 5.00 | 6.00 | ▂▅▇▅▆ |
| HH01 | 0 | 1.00 | 0.16 | 0.45 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▁ |
| HH25 | 0 | 1.00 | 0.21 | 0.49 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▂▁▁ |
| HH612 | 0 | 1.00 | 0.71 | 0.89 | 0.00 | 0.00 | 0.00 | 1.00 | 4.00 | ▇▅▂▁▁ |
| HH1317 | 0 | 1.00 | 0.92 | 0.74 | 0.00 | 0.00 | 1.00 | 1.00 | 3.00 | ▅▇▁▂▁ |
| HH18OV | 0 | 1.00 | 2.36 | 1.02 | 1.00 | 2.00 | 2.00 | 3.00 | 6.00 | ▇▂▁▁▁ |
| CO_DATE | 0 | 1.00 | 20180335.78 | 36.78 | 20180307.00 | 20180310.00 | 20180322.00 | 20180330.00 | 20180410.00 | ▇▁▁▁▂ |
| DURATION | 0 | 1.00 | 16.90 | 9.95 | 2.70 | 10.67 | 14.20 | 19.46 | 82.43 | ▇▂▁▁▁ |
| SURV_MODE | 0 | 1.00 | 1.96 | 0.19 | 1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
| MODE_END | 0 | 1.00 | 1.96 | 0.19 | 1.00 | 2.00 | 2.00 | 2.00 | 2.00 | ▁▁▁▁▇ |
| WEIGHT | 0 | 1.00 | 0.99 | 0.99 | 0.02 | 0.37 | 0.67 | 1.25 | 8.24 | ▇▂▁▁▁ |
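Many of the survey items above have a maximum (p100) of 98 or 99 even though their quartiles sit between 1 and 5, which suggests these values are non-response codes rather than real answers. Assuming that is the case (the codebook should confirm it), they would need to be recoded as missing before computing means or correlations. A minimal sketch, where `recode_nonresponse` is a hypothetical helper and the toy values are made up:

```r
# Hypothetical helper: replace assumed non-response codes with NA,
# but only in the columns named in `cols`.
recode_nonresponse <- function(df, cols, codes = c(98, 99)) {
  for (col in cols) {
    df[[col]][df[[col]] %in% codes] <- NA
  }
  df
}

# Toy data with made-up values; column names borrowed from the survey file.
toy <- data.frame(
  CASEID = c(98, 101, 102, 103),  # an ID of 98 is a real value, so not recoded
  FITIN  = c(1, 2, 98, 2),
  GENDER = c(1, 2, 2, 99)
)

clean <- recode_nonresponse(toy, cols = c("FITIN", "GENDER"))

mean(toy$FITIN)                  # pulled upward by the 98 code
mean(clean$FITIN, na.rm = TRUE)  # mean over substantive answers only
```

The helper takes an explicit list of columns because identifiers such as CASEID can legitimately equal 98 or 99.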
Data 3
Introduction and data
Identify the source of the data.
The World Bank - World Development Indicators
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The WDI database was created in the late 1970s as a branch of the World Development Report database. All the data is collected through the World Bank by its member countries. (Source: https://datatopics.worldbank.org/world-development-indicators/stories/world-development-indicators-the-story.html)
Write a brief description of the observations.
The data contain 148 observations, covering indicators such as annual net income, birth rate, population, and death rate, for 265 countries or regions, plus world-level aggregates for the same indicators.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Does education save lives? Assessing whether there is a distinct correlation between education rate and life expectancy.
Are there enough resources for the populations? Assessing the relationship between GDP and birth rate/population.
- A description of the research topic along with a concise statement of your hypotheses on this topic.
Access to education seems to be a determinant of an area’s prosperity. As students, many of us are in school to provide a better life for ourselves and to become prosperous in the future. I want to see whether life expectancy is higher in countries where education is more readily available.
- There will be a distinct, positive correlation.
Population-control policies in India and China, such as China’s one-child policy, from the late 1900s into the 2000s inspired this research question. Such policies were put in place because many countries have rising populations and not enough resources to support them. I want to see if there is a correlation between a country’s prosperity and its population, alongside its birth rate, and in particular whether the birth rate has risen as the country became more prosperous.
- Hypothesized negative correlation.
- Identify the types of variables in your research question. Categorical? Quantitative?
- GDP (quantitative), population (quantitative), deathrate (quantitative), birthrate (quantitative), education rate (quantitative)
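Because all of these variables are quantitative, the hypothesized correlations could be checked with a correlation test. The sketch below uses randomly generated toy data with a positive relationship built in, so it only demonstrates the mechanics and says nothing about the real result; `education_rate` and `life_expectancy` are hypothetical variable names, not World Bank indicator codes.

```r
set.seed(2950)
n <- 100

# Toy data: made-up values with a positive relationship built in on purpose.
education_rate  <- runif(n, min = 20, max = 100)
life_expectancy <- 50 + 0.3 * education_rate + rnorm(n, sd = 5)

# Pearson correlation test; the null hypothesis is that the true correlation is 0.
result <- cor.test(education_rate, life_expectancy, method = "pearson")
result$estimate  # sample correlation coefficient
result$p.value   # evidence against the null of no correlation
```

With the real data, a scatterplot should be checked first, since Pearson's correlation only captures linear relationships; `method = "spearman"` is an alternative if the relationship looks monotonic but not linear.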
Glimpse of data
# add code here
#https://databank.worldbank.org/source/world-development-indicators
#world_bank = read.csv("worldbank.csv")
#skimr::skim(world_bank)