Awesome Evee

Proposal

library(tidyverse)
library(skimr)

Data 1

Introduction and data

  • Identify the source of the data.

    • The source of the data is the United States Census Bureau, from https://data.census.gov/table?tid=DECENNIALPL2020.P1.
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • Many datasets are available there, but the one I am looking at specifically is Decennial Census P1 | Race from 2020. It was collected through the decennial census, in which the Census Bureau conducts a complete count of every person living in the United States every ten years, as mandated by the Constitution.
  • Write a brief description of the observations.

    • The dataset contains the total population of each US state (plus the District of Columbia and Puerto Rico), as well as the number of people in each state who belong to a given race or combination of races.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • Do certain parts of the country have higher concentrations of certain races or combinations of races?

    • Do states with liberal or conservative governors have a higher percentage of residents of a certain race? (This requires adding a column for each state's governor and political affiliation, which can be done easily.)

  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • My target population is people who want to study where different races are located geographically in the US and whether certain races are concentrated in certain parts of the country. Additionally, this could be for people who want to study how the primary political ideology of a state (which could be determined by looking at the political ideology of statewide leaders) correlates with the share of residents of a certain race in that state.

    • Hypothesis 1: States closer to the north will have a higher percentage of white residents than states closer to the south.

    • Hypothesis 2: States with liberal governors will have a higher percentage of African American residents than states with conservative governors.

  • Identify the types of variables in your research question. Categorical? Quantitative?
    • There are both categorical and quantitative variables in my research questions. The quantitative variables are the population counts, which will be converted into percentages so that states of different sizes can be compared. The categorical variables are the states, the races, the governors, and their political ideology.
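
The governor column mentioned in the second research question could be attached with a join on the state name. Below is a minimal sketch in base R, assuming a per-state percentage table already exists. The data frames `state_pct` and `governors`, the column names `pct_black` and `governor_party`, and all values are hypothetical placeholders, not real census or political data.

```r
# Hypothetical per-state percentages (made-up numbers, placeholder state names).
state_pct <- data.frame(
  state = c("State A", "State B", "State C"),
  pct_black = c(10, 25, 5),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table that would be entered by hand.
governors <- data.frame(
  state = c("State A", "State B", "State C"),
  governor_party = c("Democratic", "Republican", "Democratic"),
  stringsAsFactors = FALSE
)

# Attach the party to each state, then average the percentage by party.
joined <- merge(state_pct, governors, by = "state")
mean_by_party <- aggregate(pct_black ~ governor_party, data = joined, FUN = mean)
print(mean_by_party)
```

The raw table also has a column for the District of Columbia, which has no governor, so that column would need to be dropped or handled separately before the join.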

Glimpse of data

# Load the 2020 Decennial Census P1 (Race) table
race_census_raw <- read.csv("data/DECENNIALPL2020.P1-2023-03-14T231744.csv")

skimr::skim(race_census_raw)
Data summary
Name race_census_raw
Number of rows 71
Number of columns 53
_______________________
Column type frequency:
character 53
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Label..Grouping. 0 1 6 147 0 71 0
Alabama 0 1 1 9 0 64 0
Alaska 0 1 1 7 0 63 0
Arizona 0 1 1 9 0 68 0
Arkansas 0 1 1 9 0 61 0
California 0 1 2 10 0 70 0
Colorado 0 1 1 9 0 69 0
Connecticut 0 1 1 9 0 65 0
Delaware 0 1 1 7 0 57 0
District.of.Columbia 0 1 1 7 0 57 0
Florida 0 1 2 10 0 69 0
Georgia 0 1 1 10 0 67 0
Hawaii 0 1 1 9 0 69 0
Idaho 0 1 1 9 0 60 0
Illinois 0 1 1 10 0 66 0
Indiana 0 1 1 9 0 69 0
Iowa 0 1 1 9 0 62 0
Kansas 0 1 1 9 0 61 0
Kentucky 0 1 1 9 0 64 0
Louisiana 0 1 1 9 0 66 0
Maine 0 1 1 9 0 55 0
Maryland 0 1 1 9 0 68 0
Massachusetts 0 1 1 9 0 65 0
Michigan 0 1 1 10 0 65 0
Minnesota 0 1 1 9 0 67 0
Mississippi 0 1 1 9 0 62 0
Missouri 0 1 1 9 0 66 0
Montana 0 1 1 9 0 54 0
Nebraska 0 1 1 9 0 64 0
Nevada 0 1 1 9 0 68 0
New.Hampshire 0 1 1 9 0 58 0
New.Jersey 0 1 2 9 0 70 0
New.Mexico 0 1 1 9 0 64 0
New.York 0 1 2 10 0 69 0
North.Carolina 0 1 1 10 0 68 0
North.Dakota 0 1 1 7 0 56 0
Ohio 0 1 1 10 0 68 0
Oklahoma 0 1 1 9 0 63 0
Oregon 0 1 1 9 0 66 0
Pennsylvania 0 1 1 10 0 64 0
Rhode.Island 0 1 1 9 0 62 0
South.Carolina 0 1 1 9 0 65 0
South.Dakota 0 1 1 7 0 55 0
Tennessee 0 1 1 9 0 64 0
Texas 0 1 1 10 0 69 0
Utah 0 1 1 9 0 67 0
Vermont 0 1 1 7 0 50 0
Virginia 0 1 1 9 0 68 0
Washington 0 1 1 9 0 70 0
West.Virginia 0 1 1 9 0 59 0
Wisconsin 0 1 1 9 0 64 0
Wyoming 0 1 1 7 0 51 0
Puerto.Rico 0 1 1 9 0 62 0
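
The summary shows that all 53 columns were read as character, with one column per state and one row per race category. Before percentages can be computed, the counts need to be converted to numbers and divided by each state's total. The sketch below shows one way to do that in base R on a toy table with made-up numbers. It assumes the counts are stored as text with comma thousands separators and that one row is labelled `Total:`; both assumptions should be checked against the real file.

```r
# Toy stand-in for race_census_raw: made-up numbers, same assumed layout
# (one row per race category, one character column per state).
toy_raw <- data.frame(
  Label..Grouping. = c("Total:", "White alone", "Black or African American alone"),
  Alabama = c("1,000", "600", "300"),
  Alaska = c("500", "300", "20"),
  stringsAsFactors = FALSE
)

# Strip thousands separators and convert every state column to numeric.
state_cols <- setdiff(names(toy_raw), "Label..Grouping.")
counts <- sapply(toy_raw[state_cols], function(x) as.numeric(gsub(",", "", x)))
rownames(counts) <- trimws(toy_raw$Label..Grouping.)

# Divide each state's counts by that state's total to get percentages.
pct <- sweep(counts, 2, counts["Total:", ], "/") * 100
print(round(pct, 1))
```

Working with percentages rather than raw counts keeps large states from dominating any comparison between regions.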

Data 2

Introduction and data

  • Identify the source of the data.

    • Billionaires CSV File - From the CORGIS Data-set Project
      By Ryan Whitcomb - Version 2.0.0, created 5-17-16
  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • Researchers have compiled a multi-decade database of the super-rich. Building off the Forbes World’s Billionaires lists from 1996-2014, scholars at Peterson Institute for International Economics have added a couple dozen more variables about each billionaire - including whether they were self-made or inherited their wealth. (Roughly half of European billionaires and one-third of U.S. billionaires got a significant financial boost from family, the authors estimate.)
  • Write a brief description of the observations.

    • Each observation describes one billionaire from the Forbes lists, in the US, Europe, or elsewhere. The variables record the source of their wealth, including whether it was self-made or inherited.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)

    • How do billionaires in the US and Europe differ in the sources of their wealth, the industries they work in, and their net worth in billions?
  • A description of the research topic along with a concise statement of your hypotheses on this topic.

    • This research topic explores the different sources of wealth among billionaires, whether inherited or self-made, and how wealth is distributed across industries, from technology to finance to commodity products. We will also compare the super-wealthy in America and Europe: whether they concentrate in different industries, whether they became billionaires in different ways, what drives wealth in each region, and how net worth differs between regions.

    • We hypothesize that American billionaires are more often self-made and work in a greater diversity of industries, whereas European billionaires more often become wealthy through inheritance and work mainly in the technology or financial sectors.

  • Identify the types of variables in your research question. Categorical? Quantitative?

    • company.sector - categorical
    • wealth.how.industry - categorical
    • location.region - categorical
    • wealth.type - categorical
    • wealth.how.inherited - categorical
    • wealth.worth in billions - quantitative

Glimpse of data

# Load the CORGIS billionaires data
billionaires <- read_csv("data/billionaires.csv")
Rows: 2614 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): name, company.name, company.relationship, company.sector, company....
dbl  (6): rank, year, company.founded, demographics.age, location.gdp, wealt...
lgl  (3): wealth.how.from emerging, wealth.how.was founder, wealth.how.was p...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(billionaires)
Data summary
Name billionaires
Number of rows 2614
Number of columns 22
_______________________
Column type frequency:
character 13
logical 3
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
name 0 1.00 5 45 0 2077 0
company.name 38 0.99 3 59 0 1576 0
company.relationship 46 0.98 3 46 0 73 0
company.sector 23 0.99 3 52 0 505 0
company.type 36 0.99 3 22 0 15 0
demographics.gender 34 0.99 4 14 0 3 0
location.citizenship 0 1.00 4 20 0 73 0
location.country code 0 1.00 3 6 0 74 0
location.region 0 1.00 1 24 0 8 0
wealth.type 22 0.99 9 24 0 5 0
wealth.how.category 1 1.00 1 18 0 9 0
wealth.how.industry 1 1.00 1 31 0 19 0
wealth.how.inherited 0 1.00 6 24 0 6 0

Variable type: logical

skim_variable n_missing complete_rate mean count
wealth.how.from emerging 0 1 1 TRU: 2614
wealth.how.was founder 0 1 1 TRU: 2614
wealth.how.was political 0 1 1 TRU: 2614

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
rank 0 1 5.996700e+02 4.678900e+02 1 215.0 430 9.880e+02 1.565e+03 ▇▅▃▂▃
year 0 1 2.008410e+03 7.480000e+00 1996 2001.0 2014 2.014e+03 2.014e+03 ▂▂▁▁▇
company.founded 0 1 1.924710e+03 2.437800e+02 0 1936.0 1963 1.985e+03 2.012e+03 ▁▁▁▁▇
demographics.age 0 1 5.334000e+01 2.533000e+01 -42 47.0 59 7.000e+01 9.800e+01 ▁▂▁▇▃
location.gdp 0 1 1.769103e+12 3.547083e+12 0 0.0 0 7.250e+11 1.060e+13 ▇▁▁▁▁
wealth.worth in billions 0 1 3.530000e+00 5.090000e+00 1 1.4 2 3.500e+00 7.600e+01 ▇▁▁▁▁
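
The hypothesis about self-made versus inherited wealth comes down to comparing a proportion across regions. The sketch below shows that comparison in base R on a toy table with made-up rows. The region labels and the value `"not inherited"` are assumptions about how the real columns are coded and should be confirmed with `unique()` on the actual data.

```r
# Toy stand-in for billionaires: made-up rows, real column names.
toy_b <- data.frame(
  location.region = c("North America", "North America", "North America",
                      "Europe", "Europe"),
  wealth.how.inherited = c("not inherited", "not inherited", "father",
                           "father", "not inherited"),
  stringsAsFactors = FALSE
)

# Flag self-made billionaires (assumed coding: "not inherited").
toy_b$self_made <- toy_b$wealth.how.inherited == "not inherited"

# Share of self-made billionaires within each region.
share_self_made <- tapply(toy_b$self_made, toy_b$location.region, mean)
print(round(share_self_made, 2))
```

The numeric summary above also shows values that look like placeholders (a minimum `company.founded` of 0, a negative minimum `demographics.age`), so those columns would need cleaning before they are used in any comparison.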

Data 3

Introduction and data

  • Identify the source of the data.

    • Sam Donald, through CORGIS

  • State when and how it was originally collected (by the original data curator, not necessarily how you found the data).

    • The data was published on 10/28/2022. It was collected from professional coffee tasters, who scored different coffees.

  • Write a brief description of the observations.

    • Each observation records where the coffee was grown, the production year, how much coffee was tested, and the scores given in a variety of categories.

Research question

  • A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
    • How does the altitude of a farm affect coffee aroma?
  • A description of the research topic along with a concise statement of your hypotheses on this topic.
    • This research topic focuses on coffee taste and smell. Coffee taste and smell are affected by many different factors; we will look at one of them, altitude. We hypothesize that coffee grown at a higher altitude will have a stronger aroma.
  • Identify the types of variables in your research question. Categorical? Quantitative?
    • Aroma (1-10) - quantitative
    • Altitude (ft/m) - quantitative

Glimpse of data

# Load the CORGIS coffee data
library(readr)
coffee <- read_csv("data/coffee.csv")
Rows: 989 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): Location.Country, Location.Region, Data.Owner, Data.Type.Species, ...
dbl (16): Location.Altitude.Min, Location.Altitude.Max, Location.Altitude.Av...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(coffee)
Data summary
Name coffee
Number of rows 989
Number of columns 23
_______________________
Column type frequency:
character 7
numeric 16
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Location.Country 0 1 4 28 0 32 0
Location.Region 0 1 3 76 0 278 0
Data.Owner 0 1 3 50 0 263 0
Data.Type.Species 0 1 7 7 0 2 0
Data.Type.Variety 0 1 3 21 0 28 0
Data.Type.Processing method 0 1 3 25 0 6 0
Data.Color 0 1 4 12 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Location.Altitude.Min 0 1 1640.08 9192.52 0 905.00 1300.00 1550.00 190164.00 ▇▁▁▁▁
Location.Altitude.Max 0 1 1675.93 9191.96 0 950.00 1310.00 1600.00 190164.00 ▇▁▁▁▁
Location.Altitude.Average 0 1 1658.00 9192.06 0 950.00 1300.00 1600.00 190164.00 ▇▁▁▁▁
Year 0 1 2013.55 1.66 2010 2012.00 2013.00 2015.00 2018.00 ▁▇▃▃▁
Data.Production.Number of bags 0 1 151.76 125.67 1 15.00 170.00 275.00 600.00 ▇▁▇▁▁
Data.Production.Bag weight 0 1 210.49 1666.71 0 1.00 60.00 69.00 19200.00 ▇▁▁▁▁
Data.Scores.Aroma 0 1 7.57 0.40 0 7.42 7.58 7.75 8.75 ▁▁▁▁▇
Data.Scores.Flavor 0 1 7.52 0.42 0 7.33 7.50 7.75 8.83 ▁▁▁▁▇
Data.Scores.Aftertaste 0 1 7.39 0.43 0 7.25 7.42 7.58 8.67 ▁▁▁▁▇
Data.Scores.Acidity 0 1 7.54 0.40 0 7.33 7.58 7.75 8.75 ▁▁▁▁▇
Data.Scores.Body 0 1 7.51 0.39 0 7.33 7.50 7.67 8.50 ▁▁▁▁▇
Data.Scores.Balance 0 1 7.50 0.43 0 7.33 7.50 7.75 8.58 ▁▁▁▁▇
Data.Scores.Uniformity 0 1 9.82 0.59 0 10.00 10.00 10.00 10.00 ▁▁▁▁▇
Data.Scores.Sweetness 0 1 9.83 0.69 0 10.00 10.00 10.00 10.00 ▁▁▁▁▇
Data.Scores.Moisture 0 1 0.09 0.04 0 0.10 0.11 0.12 0.28 ▃▇▆▁▁
Data.Scores.Total 0 1 81.97 3.86 0 81.08 82.50 83.58 90.58 ▁▁▁▁▇
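
The summary shows a maximum altitude of 190164 and minimum scores of 0, which look like data-entry errors or placeholders rather than real measurements. One way to examine the altitude and aroma relationship is to filter out such rows first and then compute a correlation. The sketch below does this in base R on a toy table with made-up values. The 5000 cutoff is an arbitrary assumption chosen for illustration, and it presumes altitudes are in a single unit, which the real data may not guarantee.

```r
# Toy stand-in for coffee: made-up values, real column names.
toy_coffee <- data.frame(
  Location.Altitude.Average = c(900, 1200, 1500, 1800, 190164, 1300),
  Data.Scores.Aroma = c(7.2, 7.4, 7.6, 7.8, 7.5, 0)
)

# Keep rows with a plausible altitude and a non-zero aroma score.
# The upper bound of 5000 is a hypothetical threshold, not from the source.
plausible <- toy_coffee$Location.Altitude.Average > 0 &
  toy_coffee$Location.Altitude.Average < 5000 &
  toy_coffee$Data.Scores.Aroma > 0
clean <- toy_coffee[plausible, ]

# Pearson correlation between altitude and aroma on the cleaned rows.
altitude_aroma_cor <- cor(clean$Location.Altitude.Average, clean$Data.Scores.Aroma)
print(nrow(clean))
print(altitude_aroma_cor)
```

A correlation only describes a linear association, so a scatter plot of the cleaned data would be a sensible companion before drawing any conclusion about the hypothesis.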