Accident severity in New York State

Proposal

library(tidyverse)
library(skimr)
library(readr)

Data 1 - Hurricanes

Introduction and data

  • This data set comes from the National Oceanic and Atmospheric Administration (NOAA)’s National Hurricane Center as part of the International Best Track Archive for Climate Stewardship (IBTrACS) project, which is “the most complete global collection of tropical cyclones available”.

  • This particular data set was formed over the last 3 years (2020 - 2022) as IBTrACS continuously merged data on tropical cyclones (hurricanes) from multiple weather agencies around the world, including all the World Meteorological Organization (WMO) Regional Specialized Meteorological Centres.

  • The observations in this data set are hurricane tracking data, with information for each hurricane reported at 3-hour intervals over the duration of the hurricane’s life cycle.

Research question

  • Question: What has been the relationship between the geographic region and strength of hurricanes between 2020 and 2022?
  • Importance: In an era of global warming, natural disasters, especially hurricanes, have become more powerful. It is therefore important to see which geographic areas have recently experienced the highest-intensity storms in order to understand how best to mitigate their effects.
  • Description and Hypothesis: The research topic aims to investigate the relationship between the geographic region and strength of hurricanes that occurred between 2020 and 2022. The focus of the study will be to examine whether there is a correlation between the location of hurricane formation and the intensity of the hurricane that forms in that region during the specified time period. To conduct this research, data on the strength and geographic location of hurricanes that occurred between 2020 and 2022 will be collected and analyzed. The study will utilize statistical methods to explore any potential patterns or trends in the data, and to determine whether there is a significant relationship between hurricane strength and location. Our hypothesis is that the North Atlantic region experiences the strongest hurricanes, based on data from past deadly hurricanes.
  • Variables:
    • BASIN: 7 general regions in which the hurricane can be located (categorical)

    • SUBBASIN: Sub-regions, each falling within one of the 7 basins (categorical)

    • USA_WIND: Maximum sustained wind speed in knots: 0 - 300 kts (quantitative)

    • USA_PRES: Minimum sea level pressure, 850 - 1050 mb. (quantitative)
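One way the comparison described above could be sketched is shown here. Because each storm is reported many times along its track, this sketch first reduces every storm to its peak wind speed and then compares peaks across basins. The toy values, the use of `SID` as the storm identifier, and the choice of a Kruskal-Wallis test are all assumptions made for illustration, not the method the proposal has committed to.

```r
library(dplyr)

# Hypothetical toy observations: two track points for each of six storms.
# Real values would come from the IBTrACS file.
toy <- tibble(
  SID      = c("s1", "s1", "s2", "s2", "s3", "s3",
               "s4", "s4", "s5", "s5", "s6", "s6"),
  BASIN    = c("NA", "NA", "NA", "NA", "EP", "EP",
               "EP", "EP", "WP", "WP", "WP", "WP"),
  USA_WIND = c(40, 65, 90, 120, 30, 35, 50, 70, 45, 100, 110, 140)
)

# One row per storm: its peak sustained wind speed.
peaks <- toy %>%
  group_by(SID, BASIN) %>%
  summarise(peak_wind = max(USA_WIND, na.rm = TRUE), .groups = "drop")

# Descriptive comparison of peak strength by basin.
by_basin <- peaks %>%
  group_by(BASIN) %>%
  summarise(storms = n(), median_peak = median(peak_wind))

# Rank-based test of whether peak wind differs across basins (assumed choice).
kw <- kruskal.test(peak_wind ~ factor(BASIN), data = peaks)

by_basin
kw$p.value
```

Summarising per storm rather than per track point avoids treating repeated reports of the same hurricane as independent observations.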

Glimpse of data

hurricanes <- read_csv("data/ibtracs.last3years.csv")
Rows: 18106 Columns: 163
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (141): SID, SEASON, BASIN, SUBBASIN, NAME, NATURE, LAT, LON, WMO_WIND, ...
dbl   (17): NUMBER, USA_SSHS, TOKYO_GRADE, TOKYO_R50_DIR, TOKYO_R30_DIR, TOK...
lgl    (4): DS824_STAGE, TD9636_STAGE, NEUMANN_CLASS, MLC_CLASS
dttm   (1): ISO_TIME

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(hurricanes)
Data summary
Name hurricanes
Number of rows 18106
Number of columns 163
_______________________
Column type frequency:
character 141
logical 4
numeric 17
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
SID 1 1.00 13 13 0 344 0
SEASON 0 1.00 4 4 0 5 0
BASIN 3902 0.78 2 2 0 5 0
SUBBASIN 2954 0.84 2 2 0 8 0
NAME 1 1.00 3 16 0 270 0
NATURE 1 1.00 2 2 0 6 0
LAT 0 1.00 7 13 0 10575 0
LON 0 1.00 7 12 0 12828 0
WMO_WIND 12462 0.31 2 3 0 46 0
WMO_PRES 11783 0.35 2 4 0 106 0
WMO_AGENCY 11686 0.35 3 10 0 8 0
TRACK_TYPE 1 1.00 4 16 0 4 0
DIST2LAND 0 1.00 1 4 0 2495 0
LANDFALL 344 0.98 1 4 0 2446 0
IFLAG 1 1.00 14 14 0 93 0
USA_AGENCY 10021 0.45 4 14 0 9 0
USA_ATCF_ID 2243 0.88 8 8 0 311 0
USA_LAT 2313 0.87 7 13 0 7759 0
USA_LON 2313 0.87 7 12 0 9903 0
USA_RECORD 18043 0.00 1 1 0 2 0
USA_STATUS 4559 0.75 2 2 0 14 0
USA_WIND 2410 0.87 2 3 0 127 0
USA_PRES 2556 0.86 2 4 0 127 0
USA_R34_NE 9545 0.47 1 5 0 185 0
USA_R34_SE 9818 0.46 1 5 0 160 0
USA_R34_SW 10797 0.40 1 5 0 147 0
USA_R34_NW 10336 0.43 1 5 0 165 0
USA_R50_NE 14478 0.20 1 5 0 157 0
USA_R50_SE 14809 0.18 1 5 0 149 0
USA_R50_SW 15247 0.16 1 5 0 131 0
USA_R50_NW 14884 0.18 1 5 0 145 0
USA_R64_NE 15935 0.12 1 5 0 87 0
USA_R64_SE 16072 0.11 1 5 0 88 0
USA_R64_SW 16364 0.10 1 5 0 71 0
USA_R64_NW 16145 0.11 1 5 0 76 0
USA_POCI 2819 0.84 2 4 0 33 0
USA_ROCI 2821 0.84 2 5 0 187 0
USA_RMW 2848 0.84 1 5 0 104 0
USA_EYE 17754 0.02 2 5 0 13 0
TOKYO_LAT 15059 0.17 7 13 0 1815 0
TOKYO_LON 15059 0.17 7 12 0 1930 0
TOKYO_WIND 16467 0.09 2 3 0 47 0
TOKYO_PRES 15059 0.17 2 4 0 86 0
TOKYO_R50_LONG 17440 0.04 2 5 0 42 0
TOKYO_R50_SHORT 17440 0.04 2 5 0 38 0
TOKYO_R30_LONG 16467 0.09 2 5 0 70 0
TOKYO_R30_SHORT 16467 0.09 2 5 0 61 0
CMA_LAT 15046 0.17 7 13 0 1802 0
CMA_LON 15046 0.17 7 12 0 1920 0
CMA_WIND 15046 0.17 2 3 0 80 0
CMA_PRES 15046 0.17 2 4 0 79 0
HKO_LAT 15967 0.12 7 13 0 960 0
HKO_LON 15967 0.12 7 12 0 1104 0
HKO_CAT 15920 0.12 1 6 0 7 0
HKO_WIND 15967 0.12 2 3 0 51 0
HKO_PRES 15967 0.12 2 4 0 72 0
NEWDELHI_LAT 17665 0.02 7 13 0 223 0
NEWDELHI_LON 17665 0.02 7 12 0 250 0
NEWDELHI_GRADE 17664 0.02 1 4 0 7 0
NEWDELHI_WIND 17665 0.02 2 3 0 27 0
NEWDELHI_PRES 17665 0.02 2 4 0 47 0
NEWDELHI_DP 17665 0.02 1 2 0 37 0
NEWDELHI_POCI 18105 0.00 2 2 0 1 0
REUNION_LAT 16725 0.08 8 13 0 867 0
REUNION_LON 16725 0.08 7 12 0 1085 0
REUNION_WIND 16784 0.07 2 3 0 74 0
REUNION_PRES 16820 0.07 2 4 0 69 0
REUNION_RMW 17701 0.02 1 5 0 40 0
REUNION_R34_NE 17480 0.03 1 5 0 78 0
REUNION_R34_SE 17480 0.03 1 5 0 102 0
REUNION_R34_SW 17480 0.03 1 5 0 89 0
REUNION_R34_NW 17480 0.03 1 5 0 76 0
REUNION_R50_NE 17812 0.02 1 5 0 42 0
REUNION_R50_SE 17812 0.02 1 5 0 29 0
REUNION_R50_SW 17812 0.02 1 5 0 33 0
REUNION_R50_NW 17812 0.02 1 5 0 30 0
REUNION_R64_NE 18105 0.00 5 5 0 1 0
REUNION_R64_SE 18105 0.00 5 5 0 1 0
REUNION_R64_SW 18105 0.00 5 5 0 1 0
REUNION_R64_NW 18105 0.00 5 5 0 1 0
BOM_LAT 15794 0.13 8 13 0 1385 0
BOM_LON 15794 0.13 7 12 0 1754 0
BOM_WIND 15838 0.13 2 3 0 51 0
BOM_PRES 15852 0.12 2 4 0 79 0
BOM_RMW 17430 0.04 1 5 0 34 0
BOM_R34_NE 17495 0.03 2 5 0 66 0
BOM_R34_SE 17440 0.04 2 5 0 71 0
BOM_R34_SW 17370 0.04 2 5 0 69 0
BOM_R34_NW 17461 0.04 2 5 0 62 0
BOM_R50_NE 17875 0.01 2 5 0 41 0
BOM_R50_SE 17874 0.01 2 5 0 42 0
BOM_R50_SW 17870 0.01 2 5 0 39 0
BOM_R50_NW 17889 0.01 2 5 0 37 0
BOM_R64_NE 18032 0.00 2 5 0 17 0
BOM_R64_SE 18031 0.00 2 5 0 14 0
BOM_R64_SW 18041 0.00 2 5 0 14 0
BOM_R64_NW 18043 0.00 2 5 0 12 0
BOM_ROCI 16147 0.11 2 5 0 124 0
BOM_POCI 16114 0.11 2 4 0 16 0
BOM_EYE 18075 0.00 1 5 0 11 0
NADI_LAT 17795 0.02 8 13 0 242 0
NADI_LON 17795 0.02 7 12 0 276 0
NADI_WIND 17795 0.02 2 3 0 40 0
NADI_PRES 17795 0.02 2 3 0 75 0
WELLINGTON_LAT 17957 0.01 8 13 0 134 0
WELLINGTON_LON 17957 0.01 7 12 0 145 0
WELLINGTON_WIND 17982 0.01 2 3 0 22 0
WELLINGTON_PRES 18072 0.00 2 3 0 17 0
DS824_LAT 18105 0.00 13 13 0 1 0
DS824_LON 18105 0.00 12 12 0 1 0
DS824_WIND 18105 0.00 3 3 0 1 0
DS824_PRES 18105 0.00 2 2 0 1 0
TD9636_LAT 18105 0.00 13 13 0 1 0
TD9636_LON 18105 0.00 12 12 0 1 0
TD9636_WIND 18105 0.00 3 3 0 1 0
TD9636_PRES 18105 0.00 2 2 0 1 0
TD9635_LAT 18105 0.00 13 13 0 1 0
TD9635_LON 18105 0.00 12 12 0 1 0
TD9635_WIND 18105 0.00 3 3 0 1 0
TD9635_PRES 18105 0.00 2 2 0 1 0
TD9635_ROCI 18105 0.00 5 5 0 1 0
NEUMANN_LAT 18105 0.00 13 13 0 1 0
NEUMANN_LON 18105 0.00 12 12 0 1 0
NEUMANN_WIND 18105 0.00 3 3 0 1 0
NEUMANN_PRES 18105 0.00 2 2 0 1 0
MLC_LAT 18105 0.00 13 13 0 1 0
MLC_LON 18105 0.00 12 12 0 1 0
MLC_WIND 18105 0.00 3 3 0 1 0
MLC_PRES 18105 0.00 2 2 0 1 0
USA_GUST 15462 0.15 2 3 0 31 0
BOM_GUST 16932 0.06 2 3 0 28 0
BOM_GUST_PER 16932 0.06 2 6 0 28 0
REUNION_GUST 18105 0.00 3 3 0 1 0
REUNION_GUST_PER 17406 0.04 1 6 0 2 0
USA_SEAHGT 15266 0.16 2 2 0 2 0
USA_SEARAD_NE 15427 0.15 2 5 0 118 0
USA_SEARAD_SE 15746 0.13 2 5 0 121 0
USA_SEARAD_SW 16075 0.11 2 5 0 110 0
USA_SEARAD_NW 15805 0.13 2 5 0 106 0
STORM_SPEED 0 1.00 1 3 0 60 0
STORM_DIR 0 1.00 1 7 0 362 0

Variable type: logical

skim_variable n_missing complete_rate mean count
DS824_STAGE 18106 0 NaN :
TD9636_STAGE 18106 0 NaN :
NEUMANN_CLASS 18106 0 NaN :
MLC_CLASS 18106 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
NUMBER 1 1.00 52.10 33.75 1.0 21.0 52.0 80.00 121.0 ▇▅▆▅▃
USA_SSHS 0 1.00 -1.16 2.28 -5.0 -3.0 -1.0 0.00 5.0 ▇▃▇▁▁
TOKYO_GRADE 15045 0.17 3.71 1.47 1.0 2.0 3.0 5.00 6.0 ▇▇▃▅▅
TOKYO_R50_DIR 15046 0.17 1.76 3.49 0.0 0.0 0.0 0.00 9.0 ▇▁▁▁▂
TOKYO_R30_DIR 15046 0.17 2.61 3.29 0.0 0.0 1.0 4.00 9.0 ▇▃▁▁▂
TOKYO_LAND 16470 0.09 0.00 0.07 0.0 0.0 0.0 0.00 1.0 ▇▁▁▁▁
CMA_CAT 15024 0.17 3.25 2.71 0.0 1.0 2.0 4.00 9.0 ▇▇▃▁▃
NEWDELHI_CI 17751 0.02 1.95 1.66 0.0 0.0 1.5 2.75 6.5 ▅▇▂▂▁
REUNION_TYPE 16722 0.08 4.03 1.90 1.0 3.0 3.0 6.00 8.0 ▆▇▃▆▂
REUNION_TNUM 17406 0.04 5.50 3.50 1.0 2.5 3.5 9.90 9.9 ▇▅▁▁▇
REUNION_CI 17663 0.02 3.15 1.34 1.0 2.5 3.0 3.50 7.0 ▅▇▅▂▁
BOM_TYPE 16205 0.10 27.81 14.72 10.0 20.0 20.0 30.00 81.0 ▇▂▂▁▁
BOM_TNUM 17362 0.04 2.61 1.17 1.0 2.0 2.5 3.00 7.0 ▇▆▃▁▁
BOM_CI 17358 0.04 2.70 1.20 0.5 2.0 2.5 3.50 7.0 ▃▇▂▂▁
BOM_POS_METHOD 17606 0.03 3.68 1.37 1.0 3.0 3.0 3.00 7.0 ▁▇▁▁▂
BOM_PRES_METHOD 17368 0.04 5.83 1.30 2.0 5.0 5.0 7.00 9.0 ▁▁▇▂▁
NADI_CAT 17785 0.02 2.20 1.54 -1.0 1.0 2.0 3.00 5.0 ▃▆▇▅▆

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
ISO_TIME 1 1 2020-01-04 2023-03-14 2021-07-07 06:00:00 7872
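
The summary above hints at possible import issues: columns described as quantitative (such as USA_WIND and USA_PRES) are parsed as character, and BASIN has 3902 missing values and only 5 unique values even though 7 basins are described. One possible explanation, which is an assumption that has not been checked against the file, is that the CSV carries a units row beneath the header and that the North Atlantic basin code `NA` is being read as a missing value. The sketch below reproduces that situation on a hypothetical three-row file and shows one way to handle it with `read_csv()`.

```r
library(readr)
library(dplyr)

# Hypothetical miniature file: header row, units row, then three observations.
toy_csv <- I("SID,BASIN,USA_WIND\n,,kts\nA1,NA,65\nA2,EP,40\nA3,NA,\n")

# Read the header on its own so the column names can be reused.
col_names <- names(read_csv(toy_csv, n_max = 0, show_col_types = FALSE))

# Skip the header and the assumed units row, and treat only empty
# strings as missing so the literal code "NA" survives.
toy <- read_csv(
  toy_csv,
  skip = 2,
  col_names = col_names,
  na = "",
  show_col_types = FALSE
)

# BASIN keeps "NA" as a real category; USA_WIND is now numeric.
toy %>% count(BASIN)
```

If the real file behaves the same way, the same `skip`, `col_names`, and `na` arguments could be applied to `data/ibtracs.last3years.csv`.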

Data 2 - US Vehicle Accidents 2016-2021

Introduction and data

  • This data comes from the paper by Sobhan Moosavi, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath, “A Countrywide Traffic Accident Dataset” (2019). The dataset was posted to Kaggle.

  • According to the data publisher, this data was collected using multiple APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road-networks.

  • Each observation is a traffic accident. Each accident has recorded details such as the date/time, location, severity, weather conditions, cause of incident, and other observed details.

  • NOTE: The original dataset has over 2.5 million observations and will not upload as-is to Posit RStudio. On my local copy of RStudio, I trimmed the dataset down to the columns of interest named above and filtered for accidents in New York State. This is a starting point, and we can modify the subset based on our questions.
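
The trimming step described in the note could look roughly like the sketch below. The `accidents_all` tibble is a hypothetical stand-in for the full countrywide file, and the column list is shortened for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the full countrywide dataset.
accidents_all <- tibble(
  ID          = c("A-1", "A-2", "A-3", "A-4"),
  State       = c("NY", "CA", "NY", "TX"),
  Severity    = c(2, 3, 4, 2),
  Description = c("w", "x", "y", "z"),
  Start_Time  = as.POSIXct(
    c("2021-01-20 15:24:13", "2020-06-01 08:00:00",
      "2019-11-05 17:45:00", "2018-03-12 12:30:00"),
    tz = "UTC"
  )
)

# Columns to keep (shortened, illustrative list).
cols_of_interest <- c("State", "Severity", "Start_Time")

# Keep only New York accidents and only the columns of interest.
accidents_NY_toy <- accidents_all %>%
  filter(State == "NY") %>%
  select(all_of(cols_of_interest))

accidents_NY_toy
```

Using `all_of()` makes the selection fail loudly if one of the expected columns is missing, which is helpful when the subset is rebuilt later for a different state or set of questions.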

Research question

  • Questions:
    • What factors (weather, cause of accident, nearby road infrastructure, time) contribute to the most severe accidents and can be used as predictors for potential future accidents?
  • Importance:
    • By utilizing the information in the dataset, future predictions can be made about when and where accidents, including the most severe accidents, will occur. This can be helpful for emergency services in determining road closures, how to position EMS teams, and how to better build infrastructure in the future to prevent accidents.
  • Description/Hypothesis:
    • The research topic aims to investigate how outside factors such as road structure, weather, time, and cause relate to when accidents are most common. The focus of the study will be to determine key variables that predict when future accidents may occur, using a large dataset of previous accidents in New York State. Our hypothesis is that the most severe accidents are most likely to occur in poor weather and near certain road infrastructure hotspots, such as roundabouts or stop signs.
  • Variables:
    • City: location data (categorical)

    • County: location data (categorical)

    • State: location data (categorical)

    • Zipcode: location data (categorical)

    • Airport_Code: location data (categorical)

    • Wind_Direction: weather data (categorical)

    • Weather_Condition: weather data (categorical)

    • Severity: severity of accident (categorical)

    • Distance.mi: length of road affected by accident (quantitative)

    • Temperature: weather data (quantitative)

    • Wind_Chill: weather data (quantitative)

    • Humidity: weather data (quantitative)

    • Pressure.in: weather data (quantitative)

    • Visibility.mi: weather data (quantitative)

    • Wind_Speed: weather data (quantitative)

    • Crossing: presence of a crossing in nearby area (categorical)

    • Junction: presence of a junction in nearby area (categorical)

    • Stop: presence of a stop in nearby area (categorical)

    • Traffic_Signal: presence of a traffic signal in nearby area (categorical)

    • Start_Time: when the accident occurred (quantitative)

    • End_Time: when the local roadway was clear of the accident (quantitative)

    • Weather_Timestamp: when the weather data was recorded (quantitative)
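
As a first look at the hypothesis, the share of severe accidents could be compared between locations with and without a given infrastructure feature. The sketch below uses hypothetical toy rows, and the threshold of Severity 3 or higher for "severe" is an assumption chosen for illustration.

```r
library(dplyr)

# Hypothetical toy rows; real values would come from accidents_NY.
toy <- tibble(
  Severity = c(2, 2, 3, 4, 2, 1, 3, 2),
  Junction = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

toy <- toy %>%
  mutate(
    # Assumed cut-off: treat Severity 3 and 4 as "severe".
    severe = Severity >= 3,
    # Severity is listed as categorical, so also store it as an ordered factor.
    Severity_f = factor(Severity, levels = 1:4, ordered = TRUE)
  )

# Share of severe accidents with and without a nearby junction.
severe_by_junction <- toy %>%
  group_by(Junction) %>%
  summarise(n = n(), share_severe = mean(severe))

severe_by_junction
```

The same grouping could be repeated for Crossing, Stop, and Traffic_Signal, or for categories of Weather_Condition.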

Glimpse of data

accidents_NY <- read_csv("data/Accidents_NY.csv")
Rows: 108049 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (8): City, County, State, Zipcode, Airport_Code, Wind_Direction, Weathe...
dbl  (9): Severity, Distance.mi., Temperature.F., Wind_Chill.F., Humidity......
lgl  (4): Crossing, Junction, Stop, Traffic_Signal
dttm (3): Start_Time, End_Time, Weather_Timestamp

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(accidents_NY)
Data summary
Name accidents_NY
Number of rows 108049
Number of columns 24
_______________________
Column type frequency:
character 8
logical 4
numeric 9
POSIXct 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
City 25 1.00 3 22 0 1053 0
County 0 1.00 4 14 0 63 0
State 0 1.00 2 2 0 1 0
Zipcode 2 1.00 5 10 0 13207 0
Airport_Code 195 1.00 4 4 0 55 0
Wind_Direction 1794 0.98 1 8 0 24 0
Weather_Condition 925 0.99 3 28 0 72 0
Sunrise_Sunset 119 1.00 3 5 0 2 0

Variable type: logical

skim_variable n_missing complete_rate mean count
Crossing 0 1 0.07 FAL: 100844, TRU: 7205
Junction 0 1 0.16 FAL: 91022, TRU: 17027
Stop 0 1 0.02 FAL: 105930, TRU: 2119
Traffic_Signal 0 1 0.11 FAL: 95733, TRU: 12316

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Severity 0 1.00 2.22 0.57 1.00 2.0 2.00 2.00 4.00 ▁▇▁▁▁
Distance.mi. 0 1.00 0.83 1.56 0.00 0.1 0.35 0.93 53.31 ▇▁▁▁▁
Temperature.F. 807 0.99 54.01 18.43 -77.80 39.0 54.00 70.00 144.00 ▁▁▇▇▁
Wind_Chill.F. 17035 0.84 49.28 21.51 -30.40 32.0 51.00 68.00 144.00 ▁▆▇▂▁
Humidity… 863 0.99 65.56 20.18 8.00 50.0 66.00 83.00 100.00 ▁▅▇▇▇
Pressure.in. 1072 0.99 29.75 0.40 19.75 29.5 29.80 30.03 30.87 ▁▁▁▁▇
Visibility.mi. 1093 0.99 9.04 2.72 0.00 10.0 10.00 10.00 30.00 ▁▇▁▁▁
Wind_Speed.mph. 4392 0.96 8.89 5.62 0.00 5.0 8.00 12.00 141.50 ▇▁▁▁▁
Precipitation.in. 19660 0.82 0.02 0.34 0.00 0.0 0.00 0.00 10.05 ▇▁▁▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
Start_Time 0 1.00 2016-03-23 02:35:03 2021-12-31 23:22:00 2021-01-20 15:24:13 78321
End_Time 0 1.00 2016-03-23 08:35:03 2021-12-31 23:49:05 2021-01-20 20:55:08 93167
Weather_Timestamp 615 0.99 2016-03-23 02:51:00 2021-12-31 23:32:00 2021-01-21 00:53:00 50268

Data 3 - Billionaires

Introduction and data

  • This data comes from the Peterson Institute for International Economics, and it is part of a working paper by Caroline Freund and Sarah Oliver. We found this data through CORGIS (The Collection of Really Great, Interesting, Situated Datasets).

  • Freund and Oliver originally collected this data from the information published in Forbes’ list of billionaires, beginning with the 1996 list. The data they gathered spans 20 years, from 1996 to 2015.

  • Each observation in this dataset contains information about a billionaire (somebody with a net worth of at least $1 billion in a given year), such as how they made their money, how much they are worth, whether they inherited money, gender, nationality, etc. The observations provide critical information about how these people made their fortunes and help to identify similarities and differences between the billionaires. There are potential ethical concerns with this data because Forbes included 8 monarchs and 4 dictators who made their money through their positions of power. Such individuals are usually not allowed on the list, but Forbes made an exception in 1997-1998.

Research question

  • Question:

    What are the most significant predictors of net worth for billionaires that inherit wealth and those that do not?

  • Importance: 

    Identifying the most significant predictors of net worth for billionaires who inherit wealth and those who do not can show how billionaires accumulate wealth and how different paths to wealth affect the final outcome. The findings could help policymakers design policies aimed at reducing wealth inequality or supporting wealth creation among those who do not inherit wealth. They could also help investors decide which companies or industries to invest in and which factors to consider when evaluating the potential for wealth creation.

  • Description and Hypothesis: 

    The research topic is focused on identifying the most significant predictors of net worth for billionaires who inherit wealth and those who do not. The study seeks to understand how billionaires accumulate wealth and to determine whether the factors that contribute to wealth accumulation differ between those who inherit wealth and those who do not. The research may involve data analysis of a sample of billionaires, with variables such as family connections, access to capital, educational attainment, entrepreneurial experience, and risk-taking propensity being measured and analyzed for their correlation with net worth. 

  • Variables:

    • wealth.worth in billions: How much the person is worth (quantitative)

    • wealth.how.inherited: How the person inherited their money, including a category for self-made wealth (categorical)

    • wealth.how.was founder: Did that individual found the company (categorical)

    • wealth.how.was political: Was the person involved in politics (categorical)

    • year: When did they become a billionaire (quantitative)

    • demographics.age: How old were they when they became a billionaire (quantitative)

    • demographics.gender: What is their gender (categorical)

    • company.relationship: What is the person's relationship with the company (categorical)

    • company.sector: What sector is the company a part of (categorical)

    • company.type: What type of company it is, e.g., public or private (categorical)

    • location.region: Where is the billionaire based (categorical)

    • wealth.how.category: What kind of sector is the business in for trading purposes (categorical)

    • wealth.how.industry: What is the industry (categorical)
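
One way to compare predictors between the two groups is to fit the same model separately for billionaires who inherited wealth and those who did not. The sketch below uses hypothetical toy rows, a single predictor, and a log-linear model. The level name "not inherited" and the modelling choice are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy rows; real values would come from data/billionaires.csv.
toy <- tibble(
  `wealth.worth in billions` = c(1.2, 3.5, 2.0, 8.1, 1.0, 4.4, 2.7, 6.3),
  wealth.how.inherited = c("not inherited", "father", "not inherited", "father",
                           "not inherited", "spouse/widow", "not inherited", "father"),
  demographics.age = c(45, 60, 52, 71, 38, 66, 49, 58)
)

# Flag inherited wealth (the level name "not inherited" is assumed).
toy <- toy %>%
  mutate(inherited = wealth.how.inherited != "not inherited")

# Fit the same model within each group; net worth is log-transformed
# because it is strongly right-skewed.
fits <- lapply(
  split(toy, toy$inherited),
  function(d) lm(log(`wealth.worth in billions`) ~ demographics.age, data = d)
)

# Coefficients for the non-inherited ("FALSE") and inherited ("TRUE") groups.
lapply(fits, coef)
```

Column names that contain spaces, such as `wealth.worth in billions`, need backticks inside formulas and `dplyr` verbs.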

Glimpse of data

billions <- read_csv("data/billionaires.csv")
Rows: 2614 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (13): name, company.name, company.relationship, company.sector, company....
dbl  (6): rank, year, company.founded, demographics.age, location.gdp, wealt...
lgl  (3): wealth.how.from emerging, wealth.how.was founder, wealth.how.was p...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(billions)
Data summary
Name billions
Number of rows 2614
Number of columns 22
_______________________
Column type frequency:
character 13
logical 3
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
name 0 1.00 5 45 0 2077 0
company.name 38 0.99 3 59 0 1576 0
company.relationship 46 0.98 3 46 0 73 0
company.sector 23 0.99 3 52 0 505 0
company.type 36 0.99 3 22 0 15 0
demographics.gender 34 0.99 4 14 0 3 0
location.citizenship 0 1.00 4 20 0 73 0
location.country code 0 1.00 3 6 0 74 0
location.region 0 1.00 1 24 0 8 0
wealth.type 22 0.99 9 24 0 5 0
wealth.how.category 1 1.00 1 18 0 9 0
wealth.how.industry 1 1.00 1 31 0 19 0
wealth.how.inherited 0 1.00 6 24 0 6 0

Variable type: logical

skim_variable n_missing complete_rate mean count
wealth.how.from emerging 0 1 1 TRU: 2614
wealth.how.was founder 0 1 1 TRU: 2614
wealth.how.was political 0 1 1 TRU: 2614

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
rank 0 1 5.996700e+02 4.678900e+02 1 215.0 430 9.880e+02 1.565e+03 ▇▅▃▂▃
year 0 1 2.008410e+03 7.480000e+00 1996 2001.0 2014 2.014e+03 2.014e+03 ▂▂▁▁▇
company.founded 0 1 1.924710e+03 2.437800e+02 0 1936.0 1963 1.985e+03 2.012e+03 ▁▁▁▁▇
demographics.age 0 1 5.334000e+01 2.533000e+01 -42 47.0 59 7.000e+01 9.800e+01 ▁▂▁▇▃
location.gdp 0 1 1.769103e+12 3.547083e+12 0 0.0 0 7.250e+11 1.060e+13 ▇▁▁▁▁
wealth.worth in billions 0 1 3.530000e+00 5.090000e+00 1 1.4 2 3.500e+00 7.600e+01 ▇▁▁▁▁
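
The numeric summary above shows a minimum demographics.age of -42 and a minimum company.founded of 0, which are presumably placeholders or entry errors rather than real values. A minimal sketch of recoding them to missing before analysis is shown here. The toy rows and the cut-offs are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy rows reproducing the implausible values seen in the summary.
toy <- tibble(
  name             = c("a", "b", "c", "d"),
  demographics.age = c(59, -42, 0, 70),
  company.founded  = c(1963, 0, 1985, 2012)
)

cleaned <- toy %>%
  mutate(
    # Assumed rule: ages of zero or below cannot be real.
    demographics.age = if_else(demographics.age <= 0, NA_real_, demographics.age),
    # Assumed rule: a founding year of 0 marks an unknown value.
    company.founded  = if_else(company.founded == 0, NA_real_, company.founded)
  )

cleaned
```

Recoding to `NA` rather than dropping rows keeps the other variables for those billionaires available for analysis.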