library(tidyverse)
library(skimr)
Awesome Evee
Proposal
Data 1
Introduction and data
Identify the source of the data.
- The data come from the United States Census Bureau, available at https://data.census.gov/table?tid=DECENNIALPL2020.P1.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The Census Bureau publishes many datasets; the one I am looking at is the 2020 Decennial Census table P1 | Race. It was collected through the decennial census, in which the Census Bureau conducts a complete count of every person living in the United States every ten years, as mandated by the Constitution.
Write a brief description of the observations.
- The dataset contains the total population of each US state, as well as the number of people in each state who belong to a given race or combination of races.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Do certain parts of the country have higher concentrations of certain races or combinations of races?
Do states with liberal or conservative governors have a higher percentage of a certain race? (This requires adding a column with each state's governor and political standing, but this can be done easily.)
- A description of the research topic along with a concise statement of your hypotheses on this topic.
My target population is people who want to study where different races live in the US and whether certain races are concentrated in certain parts of the country. It could also serve people who want to study how a state's primary political ideology (which could be determined from the political ideology of its statewide leaders) correlates with the number of residents of a certain race in that state.
Hypothesis 1: States closer to the north will have a higher percentage of white population than states closer to the south.
Hypothesis 2: States with liberal governors will have a higher African American population percentage than states with conservative governors.
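One way the governor comparison in Hypothesis 2 could be set up is sketched below. It is a minimal illustration, not the project's actual analysis: the tables `state_pct` and `governors`, the column names `pct_black` and `governor_party`, the placeholder state names, and all numbers are made up for the example. The governor table would have to be entered by hand or taken from another source.

```r
library(dplyr)

# Hypothetical per-state percentages (made-up numbers, placeholder states).
state_pct <- tibble(
  state     = c("State A", "State B", "State C", "State D"),
  pct_black = c(10, 20, 30, 40)
)

# Hypothetical hand-entered lookup table of each governor's party.
governors <- tibble(
  state          = c("State A", "State B", "State C", "State D"),
  governor_party = c("Democratic", "Republican", "Democratic", "Republican")
)

party_summary <- state_pct %>%
  # Attach the governor's party to each state's row.
  left_join(governors, by = "state") %>%
  # Compare the average percentage between the two groups of states.
  group_by(governor_party) %>%
  summarise(mean_pct_black = mean(pct_black), n_states = n())

party_summary
```

Keeping the governor information in its own small table means the census data stays untouched and the party labels can be corrected in one place.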
- Identify the types of variables in your research question. Categorical? Quantitative?
- There are both categorical and quantitative variables in my research questions. The quantitative variables are the population counts, which will be converted into percentages so that states can be compared properly. The categorical variables are the states and races, as well as the governors and their political ideology.
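The conversion from counts to percentages could look roughly like the sketch below. It uses a small toy table rather than the real file, and it assumes that the counts are stored as text with thousands separators and that one row holds each state's total. The row labels (`"Total:"`, `"White alone"`), the column names, and the numbers are illustrative assumptions, so they would need to be checked against the actual census export.

```r
library(dplyr)
library(tidyr)
library(readr)

# Toy stand-in for the wide census table: one row per race category,
# one column per state, counts stored as text (all values are made up).
race_toy <- tibble(
  label   = c("Total:", "White alone", "Black or African American alone"),
  State.A = c("1,000", "600", "250"),
  State.B = c("2,000", "1,500", "100")
)

race_pct <- race_toy %>%
  # Reshape to one row per race category and state.
  pivot_longer(-label, names_to = "state", values_to = "count") %>%
  # parse_number() drops the thousands separators and returns a number.
  mutate(count = parse_number(count)) %>%
  group_by(state) %>%
  # Divide each count by that state's total population.
  mutate(percent = 100 * count / count[label == "Total:"]) %>%
  ungroup() %>%
  # The total row is no longer needed once the percentages exist.
  filter(label != "Total:")

race_pct
```

Working in long format (one row per state and race) also makes it easier to join other state-level information later.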
Glimpse of data
# add code here
race_census_raw <- read.csv("data/DECENNIALPL2020.P1-2023-03-14T231744.csv")
skimr::skim(race_census_raw)
Name | race_census_raw |
Number of rows | 71 |
Number of columns | 53 |
_______________________ | |
Column type frequency: | |
character | 53 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Label..Grouping. | 0 | 1 | 6 | 147 | 0 | 71 | 0 |
Alabama | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
Alaska | 0 | 1 | 1 | 7 | 0 | 63 | 0 |
Arizona | 0 | 1 | 1 | 9 | 0 | 68 | 0 |
Arkansas | 0 | 1 | 1 | 9 | 0 | 61 | 0 |
California | 0 | 1 | 2 | 10 | 0 | 70 | 0 |
Colorado | 0 | 1 | 1 | 9 | 0 | 69 | 0 |
Connecticut | 0 | 1 | 1 | 9 | 0 | 65 | 0 |
Delaware | 0 | 1 | 1 | 7 | 0 | 57 | 0 |
District.of.Columbia | 0 | 1 | 1 | 7 | 0 | 57 | 0 |
Florida | 0 | 1 | 2 | 10 | 0 | 69 | 0 |
Georgia | 0 | 1 | 1 | 10 | 0 | 67 | 0 |
Hawaii | 0 | 1 | 1 | 9 | 0 | 69 | 0 |
Idaho | 0 | 1 | 1 | 9 | 0 | 60 | 0 |
Illinois | 0 | 1 | 1 | 10 | 0 | 66 | 0 |
Indiana | 0 | 1 | 1 | 9 | 0 | 69 | 0 |
Iowa | 0 | 1 | 1 | 9 | 0 | 62 | 0 |
Kansas | 0 | 1 | 1 | 9 | 0 | 61 | 0 |
Kentucky | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
Louisiana | 0 | 1 | 1 | 9 | 0 | 66 | 0 |
Maine | 0 | 1 | 1 | 9 | 0 | 55 | 0 |
Maryland | 0 | 1 | 1 | 9 | 0 | 68 | 0 |
Massachusetts | 0 | 1 | 1 | 9 | 0 | 65 | 0 |
Michigan | 0 | 1 | 1 | 10 | 0 | 65 | 0 |
Minnesota | 0 | 1 | 1 | 9 | 0 | 67 | 0 |
Mississippi | 0 | 1 | 1 | 9 | 0 | 62 | 0 |
Missouri | 0 | 1 | 1 | 9 | 0 | 66 | 0 |
Montana | 0 | 1 | 1 | 9 | 0 | 54 | 0 |
Nebraska | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
Nevada | 0 | 1 | 1 | 9 | 0 | 68 | 0 |
New.Hampshire | 0 | 1 | 1 | 9 | 0 | 58 | 0 |
New.Jersey | 0 | 1 | 2 | 9 | 0 | 70 | 0 |
New.Mexico | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
New.York | 0 | 1 | 2 | 10 | 0 | 69 | 0 |
North.Carolina | 0 | 1 | 1 | 10 | 0 | 68 | 0 |
North.Dakota | 0 | 1 | 1 | 7 | 0 | 56 | 0 |
Ohio | 0 | 1 | 1 | 10 | 0 | 68 | 0 |
Oklahoma | 0 | 1 | 1 | 9 | 0 | 63 | 0 |
Oregon | 0 | 1 | 1 | 9 | 0 | 66 | 0 |
Pennsylvania | 0 | 1 | 1 | 10 | 0 | 64 | 0 |
Rhode.Island | 0 | 1 | 1 | 9 | 0 | 62 | 0 |
South.Carolina | 0 | 1 | 1 | 9 | 0 | 65 | 0 |
South.Dakota | 0 | 1 | 1 | 7 | 0 | 55 | 0 |
Tennessee | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
Texas | 0 | 1 | 1 | 10 | 0 | 69 | 0 |
Utah | 0 | 1 | 1 | 9 | 0 | 67 | 0 |
Vermont | 0 | 1 | 1 | 7 | 0 | 50 | 0 |
Virginia | 0 | 1 | 1 | 9 | 0 | 68 | 0 |
Washington | 0 | 1 | 1 | 9 | 0 | 70 | 0 |
West.Virginia | 0 | 1 | 1 | 9 | 0 | 59 | 0 |
Wisconsin | 0 | 1 | 1 | 9 | 0 | 64 | 0 |
Wyoming | 0 | 1 | 1 | 7 | 0 | 51 | 0 |
Puerto.Rico | 0 | 1 | 1 | 9 | 0 | 62 | 0 |
Data 2
Introduction and data
Identify the source of the data.
- Billionaires CSV File - From the CORGIS Data-set Project
By Ryan Whitcomb - Version 2.0.0, created 5-17-16
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- Researchers have compiled a multi-decade database of the super-rich. Building off the Forbes World’s Billionaires lists from 1996-2014, scholars at Peterson Institute for International Economics have added a couple dozen more variables about each billionaire - including whether they were self-made or inherited their wealth. (Roughly half of European billionaires and one-third of U.S. billionaires got a significant financial boost from family, the authors estimate.)
Write a brief description of the observations.
- The dataset is a compilation of data on billionaires in the US, Europe, and other countries. It records the source of each billionaire's wealth, including whether that wealth is self-made or inherited.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How do billionaires in the US and Europe differ in the sources of their wealth, the industries they work in, and their worth in billions?
A description of the research topic along with a concise statement of your hypotheses on this topic.
This research topic explores different sources of wealth among billionaires, whether inherited or self-made, and how wealth is distributed among different industries, from technology to finance to commodity products. We will also examine how the super wealthy in America and Europe differ: which industries they focus on, how they became billionaires, what is responsible for the wealth in each region, and how their worth differs among regions.
We hypothesize that more American billionaires are self-made and work in a greater diversity of industries, whereas European billionaires more often become wealthy through inheritance and work mainly in the technology or financial sectors.
Identify the types of variables in your research question. Categorical? Quantitative?
- company.sector - categorical
- wealth.how.industry - categorical
- location.region - categorical
- wealth.type - categorical
- wealth.how.inherited - categorical
- wealth.worth in billions - quantitative
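A first look at the hypothesis could summarise these variables by region, as in the sketch below. It runs on a small made-up table, not the real data. The region labels (`"North America"`, `"Europe"`) and the value `"not inherited"` are assumptions about how the file codes these variables and should be checked against the actual columns.

```r
library(dplyr)

# Toy stand-in for the billionaires data (all rows are made up).
billionaires_toy <- tibble(
  location.region = c("North America", "North America", "North America",
                      "Europe", "Europe", "Europe", "Europe"),
  wealth.how.inherited = c("not inherited", "not inherited", "father",
                           "father", "not inherited", "father", "spouse/widow"),
  `wealth.worth in billions` = c(1.5, 3.0, 2.0, 1.0, 4.0, 2.0, 3.0)
)

region_summary <- billionaires_toy %>%
  # Keep only the two regions named in the research question.
  filter(location.region %in% c("North America", "Europe")) %>%
  group_by(location.region) %>%
  summarise(
    n               = n(),
    # Share of billionaires whose wealth is coded as not inherited.
    share_self_made = mean(wealth.how.inherited == "not inherited"),
    # Column names containing spaces need backticks.
    median_worth    = median(`wealth.worth in billions`)
  )

region_summary
```

The median is used for worth because the skim output shows a strongly right-skewed distribution, where a few very large fortunes would pull the mean upward.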
Glimpse of data
# add code here
billionaires <- read_csv("data/billionaires.csv")
Rows: 2614 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): name, company.name, company.relationship, company.sector, company....
dbl (6): rank, year, company.founded, demographics.age, location.gdp, wealt...
lgl (3): wealth.how.from emerging, wealth.how.was founder, wealth.how.was p...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(billionaires)
Name | billionaires |
Number of rows | 2614 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 13 |
logical | 3 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name | 0 | 1.00 | 5 | 45 | 0 | 2077 | 0 |
company.name | 38 | 0.99 | 3 | 59 | 0 | 1576 | 0 |
company.relationship | 46 | 0.98 | 3 | 46 | 0 | 73 | 0 |
company.sector | 23 | 0.99 | 3 | 52 | 0 | 505 | 0 |
company.type | 36 | 0.99 | 3 | 22 | 0 | 15 | 0 |
demographics.gender | 34 | 0.99 | 4 | 14 | 0 | 3 | 0 |
location.citizenship | 0 | 1.00 | 4 | 20 | 0 | 73 | 0 |
location.country code | 0 | 1.00 | 3 | 6 | 0 | 74 | 0 |
location.region | 0 | 1.00 | 1 | 24 | 0 | 8 | 0 |
wealth.type | 22 | 0.99 | 9 | 24 | 0 | 5 | 0 |
wealth.how.category | 1 | 1.00 | 1 | 18 | 0 | 9 | 0 |
wealth.how.industry | 1 | 1.00 | 1 | 31 | 0 | 19 | 0 |
wealth.how.inherited | 0 | 1.00 | 6 | 24 | 0 | 6 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
wealth.how.from emerging | 0 | 1 | 1 | TRU: 2614 |
wealth.how.was founder | 0 | 1 | 1 | TRU: 2614 |
wealth.how.was political | 0 | 1 | 1 | TRU: 2614 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
rank | 0 | 1 | 5.996700e+02 | 4.678900e+02 | 1 | 215.0 | 430 | 9.880e+02 | 1.565e+03 | ▇▅▃▂▃ |
year | 0 | 1 | 2.008410e+03 | 7.480000e+00 | 1996 | 2001.0 | 2014 | 2.014e+03 | 2.014e+03 | ▂▂▁▁▇ |
company.founded | 0 | 1 | 1.924710e+03 | 2.437800e+02 | 0 | 1936.0 | 1963 | 1.985e+03 | 2.012e+03 | ▁▁▁▁▇ |
demographics.age | 0 | 1 | 5.334000e+01 | 2.533000e+01 | -42 | 47.0 | 59 | 7.000e+01 | 9.800e+01 | ▁▂▁▇▃ |
location.gdp | 0 | 1 | 1.769103e+12 | 3.547083e+12 | 0 | 0.0 | 0 | 7.250e+11 | 1.060e+13 | ▇▁▁▁▁ |
wealth.worth in billions | 0 | 1 | 3.530000e+00 | 5.090000e+00 | 1 | 1.4 | 2 | 3.500e+00 | 7.600e+01 | ▇▁▁▁▁ |
Data 3
Introduction and data
- Identify the source of the data.
Sam Donald through CORGIS
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data was published on 10/28/2022. It was collected from professional coffee tasters, who rated different coffees.
- Write a brief description of the observations.
The data includes the location where each coffee was grown, the production year, details on how much coffee was tested, and the ratings given in a variety of categories.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How does altitude of farms affect coffee aroma?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
- This research topic focuses on coffee taste and smell. Coffee taste and smell are affected by many different factors; we will look at one of them, altitude. We believe that coffee grown at a higher altitude will have a stronger aroma.
- Identify the types of variables in your research question. Categorical? Quantitative?
- Quantitative: Aroma (1-10) & Quantitative: Altitude (ft/m)
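Since both variables are quantitative, the relationship could be explored with a correlation and a simple linear regression, as sketched below on made-up values. The filter is an assumption added for illustration: it drops zero scores and implausibly large altitudes, and the 9000 cutoff is an arbitrary placeholder that would need to be chosen after inspecting the real data and its units.

```r
library(dplyr)

# Toy stand-in for the coffee data (values are made up; the last row
# mimics an implausible altitude paired with a zero aroma score).
coffee_toy <- tibble(
  Location.Altitude.Average = c(900, 1200, 1500, 1800, 190164),
  Data.Scores.Aroma         = c(7.25, 7.42, 7.58, 7.83, 0)
)

coffee_clean <- coffee_toy %>%
  # Hypothetical cleaning rule: keep positive altitudes below an
  # illustrative cutoff and drop zero aroma scores.
  filter(
    Location.Altitude.Average > 0,
    Location.Altitude.Average < 9000,
    Data.Scores.Aroma > 0
  )

# Strength of the linear association between altitude and aroma.
altitude_cor <- cor(coffee_clean$Location.Altitude.Average,
                    coffee_clean$Data.Scores.Aroma)

# Simple linear regression of aroma on altitude.
altitude_fit <- lm(Data.Scores.Aroma ~ Location.Altitude.Average,
                   data = coffee_clean)

altitude_cor
coef(altitude_fit)
```

A positive slope on the real data would be consistent with the hypothesis, although it would not by itself show that altitude causes a stronger aroma.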
Glimpse of data
# add code here
library(readr)
coffee <- read_csv("data/coffee.csv")
Rows: 989 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Location.Country, Location.Region, Data.Owner, Data.Type.Species, ...
dbl (16): Location.Altitude.Min, Location.Altitude.Max, Location.Altitude.Av...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(coffee)
Name | coffee |
Number of rows | 989 |
Number of columns | 23 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Location.Country | 0 | 1 | 4 | 28 | 0 | 32 | 0 |
Location.Region | 0 | 1 | 3 | 76 | 0 | 278 | 0 |
Data.Owner | 0 | 1 | 3 | 50 | 0 | 263 | 0 |
Data.Type.Species | 0 | 1 | 7 | 7 | 0 | 2 | 0 |
Data.Type.Variety | 0 | 1 | 3 | 21 | 0 | 28 | 0 |
Data.Type.Processing method | 0 | 1 | 3 | 25 | 0 | 6 | 0 |
Data.Color | 0 | 1 | 4 | 12 | 0 | 5 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Location.Altitude.Min | 0 | 1 | 1640.08 | 9192.52 | 0 | 905.00 | 1300.00 | 1550.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Max | 0 | 1 | 1675.93 | 9191.96 | 0 | 950.00 | 1310.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Location.Altitude.Average | 0 | 1 | 1658.00 | 9192.06 | 0 | 950.00 | 1300.00 | 1600.00 | 190164.00 | ▇▁▁▁▁ |
Year | 0 | 1 | 2013.55 | 1.66 | 2010 | 2012.00 | 2013.00 | 2015.00 | 2018.00 | ▁▇▃▃▁ |
Data.Production.Number of bags | 0 | 1 | 151.76 | 125.67 | 1 | 15.00 | 170.00 | 275.00 | 600.00 | ▇▁▇▁▁ |
Data.Production.Bag weight | 0 | 1 | 210.49 | 1666.71 | 0 | 1.00 | 60.00 | 69.00 | 19200.00 | ▇▁▁▁▁ |
Data.Scores.Aroma | 0 | 1 | 7.57 | 0.40 | 0 | 7.42 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Flavor | 0 | 1 | 7.52 | 0.42 | 0 | 7.33 | 7.50 | 7.75 | 8.83 | ▁▁▁▁▇ |
Data.Scores.Aftertaste | 0 | 1 | 7.39 | 0.43 | 0 | 7.25 | 7.42 | 7.58 | 8.67 | ▁▁▁▁▇ |
Data.Scores.Acidity | 0 | 1 | 7.54 | 0.40 | 0 | 7.33 | 7.58 | 7.75 | 8.75 | ▁▁▁▁▇ |
Data.Scores.Body | 0 | 1 | 7.51 | 0.39 | 0 | 7.33 | 7.50 | 7.67 | 8.50 | ▁▁▁▁▇ |
Data.Scores.Balance | 0 | 1 | 7.50 | 0.43 | 0 | 7.33 | 7.50 | 7.75 | 8.58 | ▁▁▁▁▇ |
Data.Scores.Uniformity | 0 | 1 | 9.82 | 0.59 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Sweetness | 0 | 1 | 9.83 | 0.69 | 0 | 10.00 | 10.00 | 10.00 | 10.00 | ▁▁▁▁▇ |
Data.Scores.Moisture | 0 | 1 | 0.09 | 0.04 | 0 | 0.10 | 0.11 | 0.12 | 0.28 | ▃▇▆▁▁ |
Data.Scores.Total | 0 | 1 | 81.97 | 3.86 | 0 | 81.08 | 82.50 | 83.58 | 90.58 | ▁▁▁▁▇ |