library(tidyverse)
library(skimr)
Where does it pay to attend college?
Proposal
Data 1
Introduction and data
Identify the source of the data.
The data were downloaded from FBref, which collected and organized them.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data were compiled during the 2022 FIFA World Cup in Qatar by FBref, a website devoted to tracking statistics for football teams and players from around the world.
Write a brief description of the observations.
Each observation is a player who attended the World Cup, together with that player's tournament statistics. The statistics cover goalkeepers, defensive players, and attacking players, and they measure performance through metrics such as number of shots, tackles, passes, and crosses. The data are detailed: every player who attended appears, along with a wide range of statistics.
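Because the statistics are spread across several per-player tables, they will probably need to be combined before analysis. The following is a minimal sketch of one way to do that, assuming `player` and `team` together identify a player in every table. The two toy tibbles hold only a couple of values copied from the printed previews of the defense and passing tables; they are not the full data.

```r
library(dplyr)

# Toy rows taken from the printed previews (not the full tables).
defense_toy <- tibble(
  player  = c("Aaron Mooy", "Aaron Ramsey"),
  team    = c("Australia", "Wales"),
  tackles = c(9, 2)
)

passing_toy <- tibble(
  player = c("Aaron Mooy", "Aaron Ramsey"),
  team   = c("Australia", "Wales"),
  passes = c(217, 112)
)

# Assumption: player + team uniquely identify a row in each table.
combined <- inner_join(defense_toy, passing_toy, by = c("player", "team"))

combined
```

An `inner_join()` keeps only players present in both tables. A `left_join()` starting from the playing-time table would instead keep squad members who never played, with `NA` in their statistics.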
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How do a player’s age and the league they play in relate to their performance in the FIFA World Cup?
A description of the research topic along with a concise statement of your hypotheses on this topic.
We will evaluate players based on their age at the time of the 2022 World Cup and the soccer league in which they play. These two variables will be paired with a performance variable built from several measures of performance (such as minutes played, expected goals, expected assisted goals, and number of progressive actions, among others). Our hypothesis is that a player performs better if he plays in one of Europe’s top five leagues (England, Germany, Spain, Italy, or France) and is in the second half of his twenties. This question is important because the answer can inform squad selection and recruiting decisions for each country’s team.
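The sketch below illustrates how the three quantities in this hypothesis might be constructed. It is not a settled method. The players, clubs, and numbers are invented, and `club_league` is a hypothetical hand-built lookup, because the data records a player's `club` but not the league. The performance score here is simply the average of two standardized per-90 columns (`xg_per90` and `xg_assist_per90`, which do exist in `player_stats`); the final composite would likely use more measures.

```r
library(dplyr)

# Invented toy data for illustration only.
players <- tibble(
  player          = c("P1", "P2", "P3", "P4"),
  club            = c("Club A", "Club B", "Club C", "Club D"),
  age_years       = c(23, 26, 28, 33),
  xg_per90        = c(0.10, 0.45, 0.30, 0.05),
  xg_assist_per90 = c(0.05, 0.20, 0.25, 0.10)
)

# Hypothetical lookup: league is not in the data and must be added by hand.
club_league <- tibble(
  club   = c("Club A", "Club B", "Club C", "Club D"),
  league = c("Other", "Premier League", "La Liga", "Other")
)

top5 <- c("Premier League", "Bundesliga", "La Liga", "Serie A", "Ligue 1")

players_scored <- players %>%
  left_join(club_league, by = "club") %>%
  mutate(
    top5_league   = league %in% top5,                    # categorical predictor
    late_twenties = age_years >= 25 & age_years <= 29,   # second half of twenties
    # Simple composite: mean of z-scores of the two per-90 measures.
    performance   = (as.numeric(scale(xg_per90)) +
                     as.numeric(scale(xg_assist_per90))) / 2
  )

players_scored
```

Standardizing each measure before averaging keeps a statistic with a large scale from dominating the composite.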
Identify the types of variables in your research question. Categorical? Quantitative?
The variables in our research question are age (quantitative), the league that the player plays in (categorical), and performance (quantitative).
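One practical note: in the files, `age` is read as a character column with values such as `32-094`, so it has to be converted before it can be treated as quantitative. The sketch below assumes the format is years, a hyphen, then days since the last birthday; that reading is an assumption about FBref's convention and should be checked. The three rows are copied from the printed preview of the possession table.

```r
library(dplyr)

# Toy rows copied from the printed preview; not the full data set.
ages <- tibble(
  player = c("Aaron Mooy", "Aaron Ramsey", "Abdessamad Ezzalzouli"),
  age    = c("32-094", "31-357", "21-001")
)

ages_numeric <- ages %>%
  mutate(
    # Assumption: text before the hyphen is whole years, text after is days.
    age_years   = as.integer(sub("-.*", "", age)),
    age_days    = as.integer(sub(".*-", "", age)),
    age_decimal = age_years + age_days / 365
  )

ages_numeric
```

Using `age_years` alone is enough for grouping players into age bands, while `age_decimal` keeps the finer detail for plots or regression.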
Glimpse of data
# add code here
player_defense <- read_csv("data/player_defense.csv")
Rows: 680 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (18): birth_year, minutes_90s, tackles, tackles_won, tackles_def_3rd, ta...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_defense)
# A tibble: 6 × 22
player position team age birth_year minutes_90s tackles tackles_won
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… 1990 4 9 6
2 Aaron Ramsey MF Wales 31-3… 1990 3 2 0
3 Abdelhamid Sa… MF Moro… 26-0… 1996 2 3 1
4 Abdelkarim Ha… DF Qatar 29-1… 1993 3 7 3
5 Abderrazak Ha… FW Moro… 32-0… 1990 0.8 0 0
6 Abdessamad Ez… FW Moro… 21-0… 2001 1 3 2
# ℹ 14 more variables: tackles_def_3rd <dbl>, tackles_mid_3rd <dbl>,
# tackles_att_3rd <dbl>, dribble_tackles <dbl>, dribbles_vs <dbl>,
# dribble_tackles_pct <dbl>, dribbled_past <dbl>, blocks <dbl>,
# blocked_shots <dbl>, blocked_passes <dbl>, interceptions <dbl>,
# tackles_interceptions <dbl>, clearances <dbl>, errors <dbl>
player_gca <- read_csv("data/player_gca.csv")
Rows: 680 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (18): birth_year, minutes_90s, sca, sca_per90, sca_passes_live, sca_pass...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_gca)
# A tibble: 6 × 22
player position team age birth_year minutes_90s sca sca_per90
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… 1990 4 5 1.25
2 Aaron Ramsey MF Wales 31-3… 1990 3 3 1.02
3 Abdelhamid Sabiri MF Moro… 26-0… 1996 2 4 2.4
4 Abdelkarim Hassan DF Qatar 29-1… 1993 3 4 1.33
5 Abderrazak Hamdal… FW Moro… 32-0… 1990 0.8 0 0
6 Abdessamad Ezzalz… FW Moro… 21-0… 2001 1 4 5.63
# ℹ 14 more variables: sca_passes_live <dbl>, sca_passes_dead <dbl>,
# sca_dribbles <dbl>, sca_shots <dbl>, sca_fouled <dbl>, sca_defense <dbl>,
# gca <dbl>, gca_per90 <dbl>, gca_passes_live <dbl>, gca_passes_dead <dbl>,
# gca_dribbles <dbl>, gca_shots <dbl>, gca_fouled <dbl>, gca_defense <dbl>
player_keepers <- read_csv("data/player_keepers.csv")
Rows: 41 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): player, position, team, age, club
dbl (20): birth_year, gk_games, gk_games_starts, gk_minutes, minutes_90s, gk...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_keepers)
# A tibble: 6 × 25
player position team age club birth_year gk_games gk_games_starts
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Aimen Dahmen GK Tuni… 25-3… CS S… 1997 3 3
2 Alireza Beiran… GK IR I… 30-0… Pers… 1992 2 2
3 Alisson GK Braz… 30-0… Live… 1992 4 4
4 Andries Noppert GK Neth… 28-2… Heer… 1994 5 5
5 André Onana GK Came… 26-2… Inter 1996 1 1
6 Danny Ward GK Wales 29-1… Leic… 1993 2 1
# ℹ 17 more variables: gk_minutes <dbl>, minutes_90s <dbl>,
# gk_goals_against <dbl>, gk_goals_against_per90 <dbl>,
# gk_shots_on_target_against <dbl>, gk_saves <dbl>, gk_save_pct <dbl>,
# gk_wins <dbl>, gk_ties <dbl>, gk_losses <dbl>, gk_clean_sheets <dbl>,
# gk_clean_sheets_pct <dbl>, gk_pens_att <dbl>, gk_pens_allowed <dbl>,
# gk_pens_saved <dbl>, gk_pens_missed <dbl>, gk_pens_save_pct <dbl>
player_keepersadv <- read_csv("data/player_keepersadv.csv")
Rows: 41 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (27): birth_year, minutes_90s, gk_goals_against, gk_pens_allowed, gk_fre...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_keepersadv)
# A tibble: 6 × 31
player position team age birth_year minutes_90s gk_goals_against
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Aimen Dahmen GK Tuni… 25-3… 1997 3 1
2 Alireza Beiranva… GK IR I… 30-0… 1992 1.2 1
3 Alisson GK Braz… 30-0… 1992 4.2 2
4 Andries Noppert GK Neth… 28-2… 1994 5.3 4
5 André Onana GK Came… 26-2… 1996 0.9 1
6 Danny Ward GK Wales 29-1… 1993 1 5
# ℹ 24 more variables: gk_pens_allowed <dbl>, gk_free_kick_goals_against <dbl>,
# gk_corner_kick_goals_against <dbl>, gk_own_goals_against <dbl>,
# gk_psxg <dbl>, gk_psnpxg_per_shot_on_target_against <dbl>,
# gk_psxg_net <dbl>, gk_psxg_net_per90 <dbl>,
# gk_passes_completed_launched <dbl>, gk_passes_launched <dbl>,
# gk_passes_pct_launched <dbl>, gk_passes <dbl>, gk_passes_throws <dbl>,
# gk_pct_passes_launched <dbl>, gk_passes_length_avg <dbl>, …
player_misc <- read_csv("data/player_misc.csv")
Rows: 680 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (18): birth_year, minutes_90s, cards_yellow, cards_red, cards_yellow_red...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_misc)
# A tibble: 6 × 22
player position team age birth_year minutes_90s cards_yellow cards_red
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… 1990 4 1 0
2 Aaron Rams… MF Wales 31-3… 1990 3 1 0
3 Abdelhamid… MF Moro… 26-0… 1996 2 1 0
4 Abdelkarim… DF Qatar 29-1… 1993 3 0 0
5 Abderrazak… FW Moro… 32-0… 1990 0.8 0 0
6 Abdessamad… FW Moro… 21-0… 2001 1 0 0
# ℹ 14 more variables: cards_yellow_red <dbl>, fouls <dbl>, fouled <dbl>,
# offsides <dbl>, crosses <dbl>, interceptions <dbl>, tackles_won <dbl>,
# pens_won <dbl>, pens_conceded <dbl>, own_goals <dbl>,
# ball_recoveries <dbl>, aerials_won <dbl>, aerials_lost <dbl>,
# aerials_won_pct <dbl>
player_passing <- read_csv("data/player_passing.csv")
Rows: 680 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (25): birth_year, minutes_90s, passes_completed, passes, passes_pct, pas...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_passing)
# A tibble: 6 × 29
player position team age birth_year minutes_90s passes_completed passes
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… 1990 4 170 217
2 Aaron Ram… MF Wales 31-3… 1990 3 88 112
3 Abdelhami… MF Moro… 26-0… 1996 2 45 58
4 Abdelkari… DF Qatar 29-1… 1993 3 122 161
5 Abderraza… FW Moro… 32-0… 1990 0.8 8 15
6 Abdessama… FW Moro… 21-0… 2001 1 10 13
# ℹ 21 more variables: passes_pct <dbl>, passes_total_distance <dbl>,
# passes_progressive_distance <dbl>, passes_completed_short <dbl>,
# passes_short <dbl>, passes_pct_short <dbl>, passes_completed_medium <dbl>,
# passes_medium <dbl>, passes_pct_medium <dbl>, passes_completed_long <dbl>,
# passes_long <dbl>, passes_pct_long <dbl>, assists <dbl>, xg_assist <dbl>,
# pass_xa <dbl>, xg_assist_net <dbl>, assisted_shots <dbl>,
# passes_into_final_third <dbl>, passes_into_penalty_area <dbl>, …
player_passing_types <- read_csv("data/player_passing_types.csv")
Rows: 680 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (17): birth_year, minutes_90s, passes, passes_live, passes_dead, passes_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_passing_types)
# A tibble: 6 × 21
player position team age birth_year minutes_90s passes passes_live
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… 1990 4 217 206
2 Aaron Ramsey MF Wales 31-3… 1990 3 112 101
3 Abdelhamid Sab… MF Moro… 26-0… 1996 2 58 55
4 Abdelkarim Has… DF Qatar 29-1… 1993 3 161 148
5 Abderrazak Ham… FW Moro… 32-0… 1990 0.8 15 14
6 Abdessamad Ezz… FW Moro… 21-0… 2001 1 13 12
# ℹ 13 more variables: passes_dead <dbl>, passes_free_kicks <dbl>,
# through_balls <dbl>, passes_switches <dbl>, crosses <dbl>, throw_ins <dbl>,
# corner_kicks <dbl>, corner_kicks_in <dbl>, corner_kicks_out <dbl>,
# corner_kicks_straight <dbl>, passes_completed <dbl>, passes_offsides <dbl>,
# passes_blocked <dbl>
player_playingtime <- read_csv("data/player_playingtime.csv")
Rows: 829 Columns: 27
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (23): birth_year, games, minutes, minutes_per_game, minutes_pct, minutes...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_playingtime)
# A tibble: 6 × 27
player position team age birth_year games minutes minutes_per_game
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Long DF Unit… 30-0… 1992 0 NA NA
2 Aaron Mooy MF Aust… 32-0… 1990 4 360 90
3 Aaron Ramsdale GK Engl… 24-2… 1998 0 NA NA
4 Aaron Ramsey MF Wales 31-3… 1990 3 266 89
5 Abdelhamid Sab… MF Moro… 26-0… 1996 5 181 36
6 Abdelkarim Has… DF Qatar 29-1… 1993 3 270 90
# ℹ 19 more variables: minutes_pct <dbl>, minutes_90s <dbl>,
# games_starts <dbl>, minutes_per_start <dbl>, games_complete <dbl>,
# games_subs <dbl>, minutes_per_sub <dbl>, unused_subs <dbl>,
# points_per_game <dbl>, on_goals_for <dbl>, on_goals_against <dbl>,
# plus_minus <dbl>, plus_minus_per90 <dbl>, plus_minus_wowy <dbl>,
# on_xg_for <dbl>, on_xg_against <dbl>, xg_plus_minus <dbl>,
# xg_plus_minus_per90 <dbl>, xg_plus_minus_wowy <dbl>
player_possession <- read_csv("data/player_possession.csv")
Rows: 680 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): player, position, team, age
dbl (16): birth_year, minutes_90s, touches, touches_def_pen_area, touches_de...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_possession)
# A tibble: 6 × 20
player position team age birth_year minutes_90s touches
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Australia 32-094 1990 4 255
2 Aaron Ramsey MF Wales 31-357 1990 3 147
3 Abdelhamid Sabiri MF Morocco 26-020 1996 2 86
4 Abdelkarim Hassan DF Qatar 29-112 1993 3 193
5 Abderrazak Hamdallah FW Morocco 32-001 1990 0.8 28
6 Abdessamad Ezzalzouli FW Morocco 21-001 2001 1 40
# ℹ 13 more variables: touches_def_pen_area <dbl>, touches_def_3rd <dbl>,
# touches_mid_3rd <dbl>, touches_att_3rd <dbl>, touches_att_pen_area <dbl>,
# touches_live_ball <dbl>, dribbles_completed <dbl>, dribbles <dbl>,
# dribbles_completed_pct <dbl>, miscontrols <dbl>, dispossessed <dbl>,
# passes_received <dbl>, progressive_passes_received <dbl>
player_stats <- read_csv("data/player_stats.csv")
Rows: 680 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): player, position, team, age, club
dbl (26): birth_year, games, games_starts, minutes, minutes_90s, goals, assi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(player_stats)
# A tibble: 6 × 31
player position team age club birth_year games games_starts minutes
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aaron Mooy MF Aust… 32-0… Celt… 1990 4 4 360
2 Aaron Ramsey MF Wales 31-3… Nice 1990 3 3 266
3 Abdelhamid S… MF Moro… 26-0… Samp… 1996 5 2 181
4 Abdelkarim H… DF Qatar 29-1… Al S… 1993 3 3 270
5 Abderrazak H… FW Moro… 32-0… Al-I… 1990 4 0 68
6 Abdessamad E… FW Moro… 21-0… Osas… 2001 3 0 93
# ℹ 22 more variables: minutes_90s <dbl>, goals <dbl>, assists <dbl>,
# goals_pens <dbl>, pens_made <dbl>, pens_att <dbl>, cards_yellow <dbl>,
# cards_red <dbl>, goals_per90 <dbl>, assists_per90 <dbl>,
# goals_assists_per90 <dbl>, goals_pens_per90 <dbl>,
# goals_assists_pens_per90 <dbl>, xg <dbl>, npxg <dbl>, xg_assist <dbl>,
# npxg_xg_assist <dbl>, xg_per90 <dbl>, xg_assist_per90 <dbl>,
# xg_xg_assist_per90 <dbl>, npxg_per90 <dbl>, npxg_xg_assist_per90 <dbl>
skim(player_defense)
Name | player_defense |
Number of rows | 680 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 18 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983 | 1992.0 | 1995.0 | 1998.00 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0 | 0.8 | 1.9 | 3.00 | 7.7 | ▇▆▂▁▁ |
tackles | 3 | 1.00 | 3.05 | 3.55 | 0 | 0.0 | 2.0 | 4.00 | 26.0 | ▇▂▁▁▁ |
tackles_won | 0 | 1.00 | 1.76 | 2.33 | 0 | 0.0 | 1.0 | 3.00 | 17.0 | ▇▁▁▁▁ |
tackles_def_3rd | 3 | 1.00 | 1.55 | 2.23 | 0 | 0.0 | 1.0 | 2.00 | 15.0 | ▇▁▁▁▁ |
tackles_mid_3rd | 3 | 1.00 | 1.17 | 1.57 | 0 | 0.0 | 1.0 | 2.00 | 11.0 | ▇▁▁▁▁ |
tackles_att_3rd | 3 | 1.00 | 0.34 | 0.72 | 0 | 0.0 | 0.0 | 0.00 | 5.0 | ▇▁▁▁▁ |
dribble_tackles | 3 | 1.00 | 1.31 | 1.88 | 0 | 0.0 | 1.0 | 2.00 | 17.0 | ▇▁▁▁▁ |
dribbles_vs | 3 | 1.00 | 2.40 | 2.89 | 0 | 0.0 | 2.0 | 3.00 | 28.0 | ▇▁▁▁▁ |
dribble_tackles_pct | 197 | 0.71 | 51.44 | 37.60 | 0 | 0.0 | 50.0 | 92.85 | 100.0 | ▇▃▆▅▇ |
dribbled_past | 3 | 1.00 | 1.09 | 1.48 | 0 | 0.0 | 1.0 | 2.00 | 11.0 | ▇▁▁▁▁ |
blocks | 3 | 1.00 | 2.12 | 2.37 | 0 | 0.0 | 1.0 | 3.00 | 13.0 | ▇▃▁▁▁ |
blocked_shots | 3 | 1.00 | 0.54 | 1.05 | 0 | 0.0 | 0.0 | 1.00 | 7.0 | ▇▁▁▁▁ |
blocked_passes | 3 | 1.00 | 1.58 | 1.93 | 0 | 0.0 | 1.0 | 2.00 | 12.0 | ▇▂▁▁▁ |
interceptions | 0 | 1.00 | 1.57 | 2.08 | 0 | 0.0 | 1.0 | 2.00 | 14.0 | ▇▂▁▁▁ |
tackles_interceptions | 3 | 1.00 | 4.63 | 5.08 | 0 | 1.0 | 3.0 | 7.00 | 35.0 | ▇▂▁▁▁ |
clearances | 3 | 1.00 | 3.55 | 5.02 | 0 | 0.0 | 2.0 | 5.00 | 37.0 | ▇▁▁▁▁ |
errors | 3 | 1.00 | 0.06 | 0.27 | 0 | 0.0 | 0.0 | 0.00 | 3.0 | ▇▁▁▁▁ |
skim(player_gca)
Name | player_gca |
Number of rows | 680 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 18 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983 | 1992.00 | 1995.00 | 1998.00 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0 | 0.80 | 1.90 | 3.00 | 7.7 | ▇▆▂▁▁ |
sca | 3 | 1.00 | 3.76 | 4.89 | 0 | 1.00 | 2.00 | 5.00 | 46.0 | ▇▁▁▁▁ |
sca_per90 | 5 | 0.99 | 2.09 | 3.09 | 0 | 0.33 | 1.46 | 2.81 | 45.0 | ▇▁▁▁▁ |
sca_passes_live | 3 | 1.00 | 2.79 | 3.53 | 0 | 0.00 | 2.00 | 4.00 | 28.0 | ▇▁▁▁▁ |
sca_passes_dead | 3 | 1.00 | 0.32 | 0.98 | 0 | 0.00 | 0.00 | 0.00 | 12.0 | ▇▁▁▁▁ |
sca_dribbles | 3 | 1.00 | 0.17 | 0.57 | 0 | 0.00 | 0.00 | 0.00 | 7.0 | ▇▁▁▁▁ |
sca_shots | 3 | 1.00 | 0.24 | 0.64 | 0 | 0.00 | 0.00 | 0.00 | 8.0 | ▇▁▁▁▁ |
sca_fouled | 3 | 1.00 | 0.19 | 0.52 | 0 | 0.00 | 0.00 | 0.00 | 5.0 | ▇▁▁▁▁ |
sca_defense | 3 | 1.00 | 0.05 | 0.26 | 0 | 0.00 | 0.00 | 0.00 | 3.0 | ▇▁▁▁▁ |
gca | 3 | 1.00 | 0.43 | 0.94 | 0 | 0.00 | 0.00 | 1.00 | 11.0 | ▇▁▁▁▁ |
gca_per90 | 5 | 0.99 | 0.24 | 1.18 | 0 | 0.00 | 0.00 | 0.19 | 22.5 | ▇▁▁▁▁ |
gca_passes_live | 3 | 1.00 | 0.32 | 0.74 | 0 | 0.00 | 0.00 | 0.00 | 7.0 | ▇▁▁▁▁ |
gca_passes_dead | 3 | 1.00 | 0.02 | 0.13 | 0 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
gca_dribbles | 3 | 1.00 | 0.02 | 0.16 | 0 | 0.00 | 0.00 | 0.00 | 2.0 | ▇▁▁▁▁ |
gca_shots | 3 | 1.00 | 0.03 | 0.22 | 0 | 0.00 | 0.00 | 0.00 | 4.0 | ▇▁▁▁▁ |
gca_fouled | 3 | 1.00 | 0.04 | 0.19 | 0 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
gca_defense | 3 | 1.00 | 0.00 | 0.04 | 0 | 0.00 | 0.00 | 0.00 | 1.0 | ▇▁▁▁▁ |
skim(player_keepers)
Name | player_keepers |
Number of rows | 41 |
Number of columns | 25 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 20 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 5 | 22 | 0 | 41 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 1 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 41 | 0 |
club | 0 | 1 | 4 | 15 | 0 | 40 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1991.46 | 3.94 | 1985.0 | 1988.0 | 1992.00 | 1994 | 1999.00 | ▆▃▇▃▃ |
gk_games | 0 | 1.00 | 3.20 | 1.60 | 1.0 | 2.0 | 3.00 | 4 | 7.00 | ▇▇▅▂▂ |
gk_games_starts | 0 | 1.00 | 3.12 | 1.69 | 0.0 | 2.0 | 3.00 | 4 | 7.00 | ▃▂▇▁▂ |
gk_minutes | 0 | 1.00 | 288.02 | 163.46 | 11.0 | 175.0 | 270.00 | 360 | 690.00 | ▅▇▃▂▂ |
minutes_90s | 0 | 1.00 | 3.20 | 1.82 | 0.1 | 1.9 | 3.00 | 4 | 7.70 | ▅▇▃▂▂ |
gk_goals_against | 0 | 1.00 | 4.20 | 2.64 | 0.0 | 2.0 | 4.00 | 6 | 11.00 | ▇▆▆▆▁ |
gk_goals_against_per90 | 0 | 1.00 | 1.42 | 0.98 | 0.0 | 0.8 | 1.04 | 2 | 4.79 | ▇▇▃▁▁ |
gk_shots_on_target_against | 0 | 1.00 | 12.24 | 7.58 | 0.0 | 6.0 | 11.00 | 18 | 31.00 | ▇▇▅▅▁ |
gk_saves | 0 | 1.00 | 7.98 | 5.71 | 0.0 | 4.0 | 7.00 | 11 | 24.00 | ▇▇▃▂▁ |
gk_save_pct | 1 | 0.98 | 67.72 | 14.20 | 40.0 | 54.5 | 66.70 | 80 | 100.00 | ▃▃▇▆▁ |
gk_wins | 0 | 1.00 | 1.17 | 1.16 | 0.0 | 0.0 | 1.00 | 2 | 5.00 | ▇▂▁▁▁ |
gk_ties | 0 | 1.00 | 0.73 | 0.87 | 0.0 | 0.0 | 1.00 | 1 | 4.00 | ▇▇▂▁▁ |
gk_losses | 0 | 1.00 | 1.15 | 0.79 | 0.0 | 1.0 | 1.00 | 2 | 3.00 | ▃▇▁▆▁ |
gk_clean_sheets | 0 | 1.00 | 0.98 | 0.99 | 0.0 | 0.0 | 1.00 | 2 | 3.00 | ▇▅▁▅▂ |
gk_clean_sheets_pct | 1 | 0.98 | 28.75 | 28.87 | 0.0 | 0.0 | 30.95 | 50 | 100.00 | ▇▃▃▁▁ |
gk_pens_att | 0 | 1.00 | 0.56 | 0.84 | 0.0 | 0.0 | 0.00 | 1 | 4.00 | ▇▅▁▁▁ |
gk_pens_allowed | 0 | 1.00 | 0.41 | 0.67 | 0.0 | 0.0 | 0.00 | 1 | 3.00 | ▇▃▁▁▁ |
gk_pens_saved | 0 | 1.00 | 0.12 | 0.40 | 0.0 | 0.0 | 0.00 | 0 | 2.00 | ▇▁▁▁▁ |
gk_pens_missed | 0 | 1.00 | 0.02 | 0.16 | 0.0 | 0.0 | 0.00 | 0 | 1.00 | ▇▁▁▁▁ |
gk_pens_save_pct | 24 | 0.41 | 20.59 | 39.76 | 0.0 | 0.0 | 0.00 | 0 | 100.00 | ▇▁▁▁▂ |
skim(player_keepersadv)
Name | player_keepersadv |
Number of rows | 41 |
Number of columns | 31 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 27 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 5 | 22 | 0 | 41 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 1 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 41 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1991.46 | 3.94 | 1985.00 | 1988.00 | 1992.00 | 1994.00 | 1999.00 | ▆▃▇▃▃ |
minutes_90s | 0 | 1.00 | 3.20 | 1.82 | 0.10 | 1.90 | 3.00 | 4.00 | 7.70 | ▅▇▃▂▂ |
gk_goals_against | 0 | 1.00 | 4.20 | 2.64 | 0.00 | 2.00 | 4.00 | 6.00 | 11.00 | ▇▆▆▆▁ |
gk_pens_allowed | 0 | 1.00 | 0.41 | 0.67 | 0.00 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▃▁▁▁ |
gk_free_kick_goals_against | 0 | 1.00 | 0.05 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
gk_corner_kick_goals_against | 0 | 1.00 | 0.32 | 0.52 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 | ▇▁▃▁▁ |
gk_own_goals_against | 0 | 1.00 | 0.05 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
gk_psxg | 0 | 1.00 | 4.21 | 2.61 | 0.00 | 2.10 | 3.70 | 6.00 | 10.50 | ▇▇▅▅▂ |
gk_psnpxg_per_shot_on_target_against | 1 | 0.98 | 0.30 | 0.08 | 0.10 | 0.25 | 0.31 | 0.37 | 0.44 | ▂▂▆▇▅ |
gk_psxg_net | 0 | 1.00 | 0.06 | 1.35 | -2.30 | -0.60 | 0.00 | 0.80 | 3.50 | ▅▇▆▃▂ |
gk_psxg_net_per90 | 0 | 1.00 | -0.04 | 0.50 | -1.49 | -0.31 | -0.01 | 0.29 | 0.70 | ▂▁▆▇▆ |
gk_passes_completed_launched | 0 | 1.00 | 13.24 | 9.97 | 1.00 | 6.00 | 11.00 | 16.00 | 40.00 | ▇▇▂▁▂ |
gk_passes_launched | 0 | 1.00 | 37.15 | 25.26 | 3.00 | 16.00 | 30.00 | 52.00 | 100.00 | ▇▇▆▂▂ |
gk_passes_pct_launched | 0 | 1.00 | 37.19 | 13.02 | 8.30 | 30.20 | 36.60 | 42.30 | 75.00 | ▂▇▇▂▂ |
gk_passes | 0 | 1.00 | 78.32 | 44.41 | 3.00 | 47.00 | 74.00 | 109.00 | 178.00 | ▆▇▇▅▃ |
gk_passes_throws | 0 | 1.00 | 13.93 | 9.17 | 0.00 | 7.00 | 13.00 | 17.00 | 46.00 | ▆▇▂▁▁ |
gk_pct_passes_launched | 0 | 1.00 | 35.55 | 14.91 | 7.20 | 25.70 | 36.00 | 47.90 | 66.70 | ▅▆▇▆▂ |
gk_passes_length_avg | 0 | 1.00 | 33.44 | 6.09 | 22.00 | 29.20 | 33.20 | 38.30 | 48.30 | ▆▇▇▇▂ |
gk_goal_kicks | 0 | 1.00 | 23.61 | 13.37 | 2.00 | 14.00 | 21.00 | 33.00 | 56.00 | ▆▇▆▃▂ |
gk_pct_goal_kicks_launched | 0 | 1.00 | 46.55 | 27.43 | 0.00 | 26.90 | 47.20 | 66.70 | 100.00 | ▅▇▅▆▂ |
gk_goal_kick_length_avg | 0 | 1.00 | 40.89 | 12.67 | 16.90 | 30.10 | 41.00 | 50.00 | 67.50 | ▃▇▅▇▂ |
gk_crosses | 0 | 1.00 | 40.44 | 23.14 | 0.00 | 22.00 | 38.00 | 55.00 | 99.00 | ▆▇▆▃▂ |
gk_crosses_stopped | 0 | 1.00 | 2.37 | 2.67 | 0.00 | 0.00 | 2.00 | 3.00 | 12.00 | ▇▃▁▁▁ |
gk_crosses_stopped_pct | 1 | 0.98 | 5.11 | 4.86 | 0.00 | 0.00 | 4.75 | 7.15 | 18.20 | ▇▇▃▁▂ |
gk_def_actions_outside_pen_area | 0 | 1.00 | 2.98 | 3.06 | 0.00 | 0.00 | 2.00 | 5.00 | 14.00 | ▇▃▂▁▁ |
gk_def_actions_outside_pen_area_per90 | 0 | 1.00 | 0.87 | 0.91 | 0.00 | 0.00 | 0.67 | 1.25 | 4.67 | ▇▅▁▁▁ |
gk_avg_distance_def_actions | 1 | 0.98 | 13.11 | 3.75 | 4.70 | 11.23 | 12.95 | 15.27 | 22.00 | ▂▅▇▅▂ |
skim(player_misc)
Name | player_misc |
Number of rows | 680 |
Number of columns | 22 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 18 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983 | 1992.0 | 1995.0 | 1998.0 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0 | 0.8 | 1.9 | 3.0 | 7.7 | ▇▆▂▁▁ |
cards_yellow | 0 | 1.00 | 0.33 | 0.57 | 0 | 0.0 | 0.0 | 1.0 | 3.0 | ▇▃▁▁▁ |
cards_red | 0 | 1.00 | 0.01 | 0.08 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
cards_yellow_red | 0 | 1.00 | 0.00 | 0.07 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
fouls | 0 | 1.00 | 2.35 | 2.61 | 0 | 0.0 | 2.0 | 3.0 | 17.0 | ▇▂▁▁▁ |
fouled | 0 | 1.00 | 2.24 | 2.89 | 0 | 0.0 | 1.0 | 3.0 | 22.0 | ▇▁▁▁▁ |
offsides | 0 | 1.00 | 0.37 | 0.86 | 0 | 0.0 | 0.0 | 0.0 | 7.0 | ▇▁▁▁▁ |
crosses | 0 | 1.00 | 3.20 | 5.56 | 0 | 0.0 | 1.0 | 4.0 | 40.0 | ▇▁▁▁▁ |
interceptions | 0 | 1.00 | 1.57 | 2.08 | 0 | 0.0 | 1.0 | 2.0 | 14.0 | ▇▂▁▁▁ |
tackles_won | 0 | 1.00 | 1.76 | 2.33 | 0 | 0.0 | 1.0 | 3.0 | 17.0 | ▇▁▁▁▁ |
pens_won | 3 | 1.00 | 0.03 | 0.17 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
pens_conceded | 3 | 1.00 | 0.03 | 0.18 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
own_goals | 0 | 1.00 | 0.00 | 0.05 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ▇▁▁▁▁ |
ball_recoveries | 3 | 1.00 | 9.58 | 9.21 | 0 | 3.0 | 7.0 | 14.0 | 57.0 | ▇▃▁▁▁ |
aerials_won | 3 | 1.00 | 2.55 | 3.34 | 0 | 0.0 | 1.0 | 4.0 | 21.0 | ▇▂▁▁▁ |
aerials_lost | 3 | 1.00 | 2.55 | 2.90 | 0 | 0.0 | 2.0 | 4.0 | 17.0 | ▇▂▁▁▁ |
aerials_won_pct | 99 | 0.85 | 47.62 | 33.03 | 0 | 20.0 | 50.0 | 66.7 | 100.0 | ▇▅▇▅▆ |
skim(player_passing)
Name | player_passing |
Number of rows | 680 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 25 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983.0 | 1992.00 | 1995.00 | 1998.00 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0.0 | 0.80 | 1.90 | 3.00 | 7.7 | ▇▆▂▁▁ |
passes_completed | 3 | 1.00 | 82.47 | 90.05 | 0.0 | 20.00 | 52.00 | 110.00 | 642.0 | ▇▂▁▁▁ |
passes | 3 | 1.00 | 101.63 | 102.48 | 0.0 | 28.00 | 69.00 | 141.00 | 689.0 | ▇▂▁▁▁ |
passes_pct | 6 | 0.99 | 76.86 | 12.69 | 0.0 | 69.20 | 78.55 | 86.00 | 100.0 | ▁▁▂▇▇ |
passes_total_distance | 3 | 1.00 | 1444.41 | 1625.59 | 0.0 | 301.00 | 841.00 | 2017.00 | 12636.0 | ▇▂▁▁▁ |
passes_progressive_distance | 3 | 1.00 | 484.30 | 597.90 | 0.0 | 70.00 | 255.00 | 676.00 | 3349.0 | ▇▂▁▁▁ |
passes_completed_short | 3 | 1.00 | 37.77 | 41.55 | 0.0 | 9.00 | 24.00 | 51.00 | 281.0 | ▇▂▁▁▁ |
passes_short | 3 | 1.00 | 42.24 | 44.89 | 0.0 | 11.00 | 28.00 | 57.00 | 306.0 | ▇▂▁▁▁ |
passes_pct_short | 15 | 0.98 | 86.95 | 12.26 | 0.0 | 81.80 | 89.10 | 95.30 | 100.0 | ▁▁▁▂▇ |
passes_completed_medium | 3 | 1.00 | 34.68 | 43.76 | 0.0 | 7.00 | 19.00 | 47.00 | 416.0 | ▇▁▁▁▁ |
passes_medium | 3 | 1.00 | 39.75 | 46.84 | 0.0 | 9.00 | 24.00 | 56.00 | 435.0 | ▇▁▁▁▁ |
passes_pct_medium | 16 | 0.98 | 81.54 | 18.58 | 0.0 | 73.88 | 85.65 | 94.70 | 100.0 | ▁▁▁▃▇ |
passes_completed_long | 3 | 1.00 | 7.99 | 10.05 | 0.0 | 1.00 | 4.00 | 11.00 | 55.0 | ▇▂▁▁▁ |
passes_long | 3 | 1.00 | 14.29 | 17.29 | 0.0 | 2.00 | 8.00 | 20.00 | 115.0 | ▇▂▁▁▁ |
passes_pct_long | 82 | 0.88 | 55.55 | 27.04 | 0.0 | 40.08 | 55.60 | 71.92 | 100.0 | ▃▃▇▇▅ |
assists | 0 | 1.00 | 0.18 | 0.49 | 0.0 | 0.00 | 0.00 | 0.00 | 3.0 | ▇▁▁▁▁ |
xg_assist | 3 | 1.00 | 0.17 | 0.32 | 0.0 | 0.00 | 0.00 | 0.20 | 3.1 | ▇▁▁▁▁ |
pass_xa | 3 | 1.00 | 0.15 | 0.29 | 0.0 | 0.00 | 0.10 | 0.20 | 3.6 | ▇▁▁▁▁ |
xg_assist_net | 3 | 1.00 | 0.01 | 0.36 | -1.3 | -0.10 | 0.00 | 0.00 | 2.1 | ▁▇▁▁▁ |
assisted_shots | 3 | 1.00 | 1.59 | 2.39 | 0.0 | 0.00 | 1.00 | 2.00 | 21.0 | ▇▁▁▁▁ |
passes_into_final_third | 3 | 1.00 | 5.93 | 8.25 | 0.0 | 1.00 | 3.00 | 8.00 | 71.0 | ▇▁▁▁▁ |
passes_into_penalty_area | 3 | 1.00 | 1.33 | 2.09 | 0.0 | 0.00 | 1.00 | 2.00 | 18.0 | ▇▁▁▁▁ |
crosses_into_penalty_area | 3 | 1.00 | 0.37 | 0.83 | 0.0 | 0.00 | 0.00 | 0.00 | 6.0 | ▇▁▁▁▁ |
progressive_passes | 3 | 1.00 | 5.14 | 6.51 | 0.0 | 1.00 | 3.00 | 7.00 | 61.0 | ▇▁▁▁▁ |
skim(player_passing_types)
Name | player_passing_types |
Number of rows | 680 |
Number of columns | 21 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 17 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1 | 1994.91 | 4.16 | 1983 | 1992.0 | 1995.0 | 1998 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1 | 2.12 | 1.64 | 0 | 0.8 | 1.9 | 3 | 7.7 | ▇▆▂▁▁ |
passes | 3 | 1 | 101.63 | 102.48 | 0 | 28.0 | 69.0 | 141 | 689.0 | ▇▂▁▁▁ |
passes_live | 3 | 1 | 92.04 | 96.25 | 0 | 24.0 | 61.0 | 121 | 678.0 | ▇▂▁▁▁ |
passes_dead | 3 | 1 | 9.22 | 12.65 | 0 | 1.0 | 4.0 | 12 | 73.0 | ▇▁▁▁▁ |
passes_free_kicks | 3 | 1 | 2.60 | 3.95 | 0 | 0.0 | 1.0 | 4 | 33.0 | ▇▁▁▁▁ |
through_balls | 3 | 1 | 0.22 | 0.59 | 0 | 0.0 | 0.0 | 0 | 4.0 | ▇▁▁▁▁ |
passes_switches | 3 | 1 | 0.88 | 1.70 | 0 | 0.0 | 0.0 | 1 | 17.0 | ▇▁▁▁▁ |
crosses | 0 | 1 | 3.20 | 5.56 | 0 | 0.0 | 1.0 | 4 | 40.0 | ▇▁▁▁▁ |
throw_ins | 3 | 1 | 3.84 | 8.86 | 0 | 0.0 | 0.0 | 2 | 67.0 | ▇▁▁▁▁ |
corner_kicks | 3 | 1 | 0.84 | 2.85 | 0 | 0.0 | 0.0 | 0 | 28.0 | ▇▁▁▁▁ |
corner_kicks_in | 3 | 1 | 0.35 | 1.31 | 0 | 0.0 | 0.0 | 0 | 13.0 | ▇▁▁▁▁ |
corner_kicks_out | 3 | 1 | 0.30 | 1.23 | 0 | 0.0 | 0.0 | 0 | 11.0 | ▇▁▁▁▁ |
corner_kicks_straight | 3 | 1 | 0.02 | 0.32 | 0 | 0.0 | 0.0 | 0 | 8.0 | ▇▁▁▁▁ |
passes_completed | 3 | 1 | 82.47 | 90.05 | 0 | 20.0 | 52.0 | 110 | 642.0 | ▇▂▁▁▁ |
passes_offsides | 3 | 1 | 0.38 | 0.74 | 0 | 0.0 | 0.0 | 1 | 5.0 | ▇▁▁▁▁ |
passes_blocked | 3 | 1 | 1.79 | 2.23 | 0 | 0.0 | 1.0 | 3 | 16.0 | ▇▁▁▁▁ |
skim(player_playingtime)
Name | player_playingtime |
Number of rows | 829 |
Number of columns | 27 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 23 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 829 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 762 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.99 | 4.27 | 1982.00 | 1992.00 | 1995.00 | 1998.00 | 2004.00 | ▁▃▇▇▃ |
games | 0 | 1.00 | 2.41 | 1.78 | 0.00 | 1.00 | 2.00 | 3.00 | 7.00 | ▇▃▇▁▁ |
minutes | 149 | 0.82 | 191.19 | 147.77 | 1.00 | 68.00 | 173.00 | 270.00 | 690.00 | ▇▆▂▁▁ |
minutes_per_game | 149 | 0.82 | 59.62 | 28.91 | 1.00 | 35.00 | 65.00 | 89.00 | 120.00 | ▃▅▃▇▁ |
minutes_pct | 148 | 0.82 | 51.68 | 34.47 | 0.00 | 18.80 | 47.40 | 86.40 | 100.00 | ▇▅▃▃▇ |
minutes_90s | 148 | 0.82 | 2.12 | 1.64 | 0.00 | 0.80 | 1.90 | 3.00 | 7.70 | ▇▆▂▁▁ |
games_starts | 0 | 1.00 | 1.70 | 1.78 | 0.00 | 0.00 | 1.00 | 3.00 | 7.00 | ▇▂▅▁▁ |
minutes_per_start | 309 | 0.63 | 80.05 | 14.29 | 12.00 | 72.00 | 86.00 | 90.00 | 120.00 | ▁▁▃▇▁ |
games_complete | 0 | 1.00 | 0.99 | 1.48 | 0.00 | 0.00 | 0.00 | 2.00 | 7.00 | ▇▁▂▁▁ |
games_subs | 0 | 1.00 | 0.71 | 1.06 | 0.00 | 0.00 | 0.00 | 1.00 | 6.00 | ▇▁▁▁▁ |
minutes_per_sub | 486 | 0.41 | 21.96 | 13.19 | 1.00 | 12.00 | 21.00 | 30.00 | 78.00 | ▇▇▃▁▁ |
unused_subs | 0 | 1.00 | 1.49 | 1.65 | 0.00 | 0.00 | 1.00 | 3.00 | 7.00 | ▇▂▃▁▁ |
points_per_game | 148 | 0.82 | 1.24 | 0.86 | 0.00 | 0.50 | 1.33 | 1.80 | 3.00 | ▇▃▇▃▂ |
on_goals_for | 148 | 0.82 | 2.79 | 3.19 | 0.00 | 0.00 | 2.00 | 4.00 | 16.00 | ▇▂▁▁▁ |
on_goals_against | 148 | 0.82 | 2.79 | 2.41 | 0.00 | 1.00 | 2.00 | 4.00 | 11.00 | ▇▃▂▂▁ |
plus_minus | 148 | 0.82 | 0.00 | 2.92 | -8.00 | -2.00 | 0.00 | 1.00 | 11.00 | ▁▇▇▂▁ |
plus_minus_per90 | 149 | 0.82 | -0.40 | 2.88 | -45.00 | -1.00 | 0.00 | 0.72 | 22.50 | ▁▁▁▇▁ |
plus_minus_wowy | 240 | 0.71 | -0.07 | 4.07 | -44.25 | -1.38 | 0.00 | 1.34 | 43.98 | ▁▁▇▁▁ |
on_xg_for | 152 | 0.82 | 2.76 | 2.67 | 0.00 | 0.80 | 2.30 | 3.80 | 15.20 | ▇▃▁▁▁ |
on_xg_against | 152 | 0.82 | 2.75 | 2.30 | 0.00 | 1.00 | 2.30 | 3.90 | 11.00 | ▇▅▂▁▁ |
xg_plus_minus | 152 | 0.82 | 0.00 | 2.49 | -8.70 | -1.10 | -0.10 | 0.80 | 10.60 | ▁▃▇▁▁ |
xg_plus_minus_per90 | 154 | 0.81 | -0.11 | 3.39 | -46.61 | -0.76 | -0.14 | 0.48 | 46.61 | ▁▁▇▁▁ |
xg_plus_minus_wowy | 233 | 0.72 | -0.05 | 4.57 | -44.55 | -0.90 | -0.05 | 0.83 | 44.55 | ▁▁▇▁▁ |
skim(player_possession)
Name | player_possession |
Number of rows | 680 |
Number of columns | 20 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983 | 1992.0 | 1995.0 | 1998.00 | 2004.0 | ▁▅▇▇▃ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0 | 0.8 | 1.9 | 3.00 | 7.7 | ▇▆▂▁▁ |
touches | 3 | 1.00 | 121.41 | 114.23 | 0 | 34.0 | 88.0 | 171.00 | 710.0 | ▇▃▁▁▁ |
touches_def_pen_area | 3 | 1.00 | 11.63 | 25.95 | 0 | 1.0 | 3.0 | 10.00 | 224.0 | ▇▁▁▁▁ |
touches_def_3rd | 3 | 1.00 | 37.68 | 47.73 | 0 | 5.0 | 17.0 | 53.00 | 297.0 | ▇▂▁▁▁ |
touches_mid_3rd | 3 | 1.00 | 58.28 | 67.50 | 0 | 12.0 | 38.0 | 79.00 | 518.0 | ▇▁▁▁▁ |
touches_att_3rd | 3 | 1.00 | 26.54 | 31.80 | 0 | 5.0 | 15.0 | 37.00 | 239.0 | ▇▁▁▁▁ |
touches_att_pen_area | 3 | 1.00 | 3.56 | 5.25 | 0 | 0.0 | 2.0 | 5.00 | 61.0 | ▇▁▁▁▁ |
touches_live_ball | 3 | 1.00 | 121.38 | 114.20 | 0 | 34.0 | 88.0 | 171.00 | 710.0 | ▇▃▁▁▁ |
dribbles_completed | 3 | 1.00 | 1.09 | 2.20 | 0 | 0.0 | 0.0 | 1.00 | 25.0 | ▇▁▁▁▁ |
dribbles | 3 | 1.00 | 2.90 | 4.87 | 0 | 0.0 | 1.0 | 4.00 | 52.0 | ▇▁▁▁▁ |
dribbles_completed_pct | 245 | 0.64 | 38.57 | 35.22 | 0 | 0.0 | 33.3 | 56.35 | 100.0 | ▇▃▅▂▃ |
miscontrols | 3 | 1.00 | 2.92 | 3.38 | 0 | 0.0 | 2.0 | 4.00 | 21.0 | ▇▂▁▁▁ |
dispossessed | 3 | 1.00 | 1.75 | 2.44 | 0 | 0.0 | 1.0 | 3.00 | 24.0 | ▇▁▁▁▁ |
passes_received | 3 | 1.00 | 81.34 | 84.41 | 0 | 22.0 | 58.0 | 111.00 | 598.0 | ▇▂▁▁▁ |
progressive_passes_received | 3 | 1.00 | 4.99 | 6.71 | 0 | 0.0 | 2.0 | 7.00 | 58.0 | ▇▁▁▁▁ |
skim(player_stats)
Name | player_stats |
Number of rows | 680 |
Number of columns | 31 |
_______________________ | |
Column type frequency: | |
character | 5 |
numeric | 26 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 4 | 26 | 0 | 680 | 0 |
position | 0 | 1 | 2 | 2 | 0 | 4 | 0 |
team | 0 | 1 | 5 | 14 | 0 | 32 | 0 |
age | 0 | 1 | 6 | 6 | 0 | 634 | 0 |
club | 1 | 1 | 3 | 33 | 0 | 254 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
birth_year | 0 | 1.00 | 1994.91 | 4.16 | 1983 | 1992.0 | 1995.00 | 1998.00 | 2004.00 | ▁▅▇▇▃ |
games | 0 | 1.00 | 2.93 | 1.52 | 1 | 2.0 | 3.00 | 4.00 | 7.00 | ▇▆▃▁▂ |
games_starts | 0 | 1.00 | 2.07 | 1.75 | 0 | 1.0 | 2.00 | 3.00 | 7.00 | ▇▂▆▁▁ |
minutes | 0 | 1.00 | 191.19 | 147.77 | 1 | 68.0 | 173.00 | 270.00 | 690.00 | ▇▆▂▁▁ |
minutes_90s | 0 | 1.00 | 2.12 | 1.64 | 0 | 0.8 | 1.90 | 3.00 | 7.70 | ▇▆▂▁▁ |
goals | 0 | 1.00 | 0.25 | 0.70 | 0 | 0.0 | 0.00 | 0.00 | 8.00 | ▇▁▁▁▁ |
assists | 0 | 1.00 | 0.18 | 0.49 | 0 | 0.0 | 0.00 | 0.00 | 3.00 | ▇▁▁▁▁ |
goals_pens | 0 | 1.00 | 0.22 | 0.60 | 0 | 0.0 | 0.00 | 0.00 | 6.00 | ▇▁▁▁▁ |
pens_made | 0 | 1.00 | 0.03 | 0.21 | 0 | 0.0 | 0.00 | 0.00 | 4.00 | ▇▁▁▁▁ |
pens_att | 0 | 1.00 | 0.03 | 0.27 | 0 | 0.0 | 0.00 | 0.00 | 5.00 | ▇▁▁▁▁ |
cards_yellow | 0 | 1.00 | 0.33 | 0.57 | 0 | 0.0 | 0.00 | 1.00 | 3.00 | ▇▃▁▁▁ |
cards_red | 0 | 1.00 | 0.01 | 0.08 | 0 | 0.0 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
goals_per90 | 0 | 1.00 | 0.11 | 0.34 | 0 | 0.0 | 0.00 | 0.00 | 3.46 | ▇▁▁▁▁ |
assists_per90 | 0 | 1.00 | 0.10 | 0.88 | 0 | 0.0 | 0.00 | 0.00 | 22.50 | ▇▁▁▁▁ |
goals_assists_per90 | 0 | 1.00 | 0.21 | 0.95 | 0 | 0.0 | 0.00 | 0.14 | 22.50 | ▇▁▁▁▁ |
goals_pens_per90 | 0 | 1.00 | 0.10 | 0.33 | 0 | 0.0 | 0.00 | 0.00 | 3.46 | ▇▁▁▁▁ |
goals_assists_pens_per90 | 0 | 1.00 | 0.20 | 0.95 | 0 | 0.0 | 0.00 | 0.00 | 22.50 | ▇▁▁▁▁ |
xg | 3 | 1.00 | 0.26 | 0.54 | 0 | 0.0 | 0.10 | 0.30 | 6.60 | ▇▁▁▁▁ |
npxg | 3 | 1.00 | 0.23 | 0.42 | 0 | 0.0 | 0.10 | 0.30 | 3.60 | ▇▁▁▁▁ |
xg_assist | 3 | 1.00 | 0.17 | 0.32 | 0 | 0.0 | 0.00 | 0.20 | 3.10 | ▇▁▁▁▁ |
npxg_xg_assist | 3 | 1.00 | 0.40 | 0.61 | 0 | 0.0 | 0.20 | 0.50 | 5.10 | ▇▁▁▁▁ |
xg_per90 | 5 | 0.99 | 0.13 | 0.28 | 0 | 0.0 | 0.04 | 0.13 | 2.86 | ▇▁▁▁▁ |
xg_assist_per90 | 5 | 0.99 | 0.11 | 0.59 | 0 | 0.0 | 0.01 | 0.10 | 14.37 | ▇▁▁▁▁ |
xg_xg_assist_per90 | 5 | 0.99 | 0.24 | 0.66 | 0 | 0.0 | 0.09 | 0.27 | 14.37 | ▇▁▁▁▁ |
npxg_per90 | 5 | 0.99 | 0.12 | 0.27 | 0 | 0.0 | 0.04 | 0.13 | 2.86 | ▇▁▁▁▁ |
npxg_xg_assist_per90 | 5 | 0.99 | 0.23 | 0.65 | 0 | 0.0 | 0.09 | 0.26 | 14.37 | ▇▁▁▁▁ |
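The research question depends on age, but the output above shows `age` stored as a 6-character string, while `birth_year` is numeric. Below is a minimal sketch of two ways to get a numeric age. It assumes the `age` strings follow a years-days pattern such as "25-123", which the skim output does not confirm, and the sample rows are hypothetical.

```r
# Hypothetical sample rows; the "years-days" age format is an assumption
players_demo <- data.frame(
  age = c("25-123", "31-004"),
  birth_year = c(1997, 1991),
  stringsAsFactors = FALSE
)

# Option 1: split the assumed "years-days" string into a fractional age
parts <- strsplit(players_demo$age, "-", fixed = TRUE)
years <- as.numeric(vapply(parts, `[`, character(1), 1))
days <- as.numeric(vapply(parts, `[`, character(1), 2))
players_demo$age_years <- years + days / 365.25

# Option 2: approximate age from birth_year and the tournament year (2022)
players_demo$age_approx <- 2022 - players_demo$birth_year

players_demo
```

Option 2 relies only on columns whose type is confirmed by the skim output, so it may be the safer starting point if the `age` format turns out to differ.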
Data 2
Introduction and data
- Identify the source of the data.
The data come from the UNdata website, which provides official statistics compiled from the UN data system.
- State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data were first compiled in 2010 from each country’s official records, which national statistical offices submit in response to questionnaires sent out annually.
- Write a brief description of the observations.
Each observation includes a country, its population growth rate, fertility rate, mortality rate, and life expectancy at birth.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Which social indicators are most closely correlated with a decrease in a country’s population growth rate?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
We will examine indicators commonly associated with a country’s development (for example, GDP, female education rates, labor market indicators, and crime rates) and how they relate to its population growth rate. We predict that an increase in crime rates and an increase in female education rates will each be associated with a decrease in population growth rate. This is important to study because it could provide insight into how to bolster or hinder a country’s growth rate, which can be valuable for governments and NGOs.
- Identify the types of variables in your research question. Categorical? Quantitative?
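The growth rate is quantitative. Most of the indicators (for example, unemployment rates) are also quantitative, while identifiers such as country and sex are categorical.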
Glimpse of data
pop_growth <- read.csv("data/UNpopulation_growth.csv")
labor_market <- read.csv("data/UNLabor_Market.csv")
skim(pop_growth)
Name | pop_growth |
Number of rows | 6655 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 7 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
T04 | 0 | 1 | 1 | 19 | 0 | 265 | 0 |
Population.growth.and.indicators.of.fertility.and.mortality | 0 | 1 | 0 | 29 | 1 | 265 | 0 |
X | 0 | 1 | 4 | 4 | 0 | 5 | 0 |
X.1 | 0 | 1 | 6 | 56 | 0 | 8 | 0 |
X.2 | 0 | 1 | 1 | 5 | 0 | 1067 | 0 |
X.3 | 0 | 1 | 0 | 364 | 4110 | 83 | 0 |
X.4 | 0 | 1 | 6 | 297 | 0 | 5 | 0 |
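The skim output above shows every column of `pop_growth` read as character, with names such as `T04` and `X.1`. That pattern suggests the CSV has a title row above the real header. Below is a minimal sketch of one possible fix, assuming the true header sits on the second line. The inline CSV text, its column names, and its values are made up for illustration and are not taken from the UNdata file.

```r
# Hypothetical CSV with a title row above the real header (illustrative values only)
csv_text <- paste(
  "T04,Population growth and indicators of fertility and mortality,,,",
  "Code,Area,Year,Series,Value",
  "1,Country A,2010,Population annual rate of increase (percent),1.5",
  "1,Country A,2010,Total fertility rate,2.0",
  sep = "\n"
)

# skip = 1 drops the title row, so the second line is used as the header
pop_demo <- read.csv(text = csv_text, skip = 1)

# Year and Value are now parsed as numbers instead of character
str(pop_demo)
```

With the real file, the same `skip = 1` argument could be passed to the existing `read.csv()` call. It is worth checking the result with `glimpse()` afterwards, since footnote or source columns may still need cleaning.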
skim(labor_market)
Name | labor_market |
Number of rows | 2194 |
Number of columns | 20 |
_______________________ | |
Column type frequency: | |
character | 15 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Country..code. | 5 | 1 | 0 | 3 | 20 | 174 | 0 |
Country | 0 | 1 | 0 | 42 | 1 | 194 | 0 |
Year | 0 | 1 | 0 | 106 | 2 | 78 | 0 |
Sex | 0 | 1 | 0 | 2 | 20 | 2 | 0 |
Youth.unemployed...000. | 0 | 1 | 0 | 10 | 30 | 1836 | 0 |
Youth.labour.force...000. | 0 | 1 | 0 | 11 | 354 | 1783 | 0 |
Adult.unemployed...000. | 0 | 1 | 0 | 10 | 44 | 1939 | 0 |
Adult.labour.force...000. | 0 | 1 | 0 | 12 | 357 | 1821 | 0 |
Total.unemployed...000. | 0 | 1 | 0 | 10 | 42 | 2014 | 0 |
Youth.population...000. | 0 | 1 | 0 | 12 | 485 | 1673 | 0 |
Repository..code. | 0 | 1 | 0 | 9 | 20 | 8 | 0 |
Type.of.source..code. | 0 | 1 | 0 | 4 | 20 | 9 | 0 |
Coverage..code. | 0 | 1 | 0 | 1 | 20 | 3 | 0 |
Age | 0 | 1 | 0 | 26 | 20 | 21 | 0 |
Notes | 0 | 1 | 0 | 298 | 702 | 168 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Youth.unemployment.rate.... | 345 | 0.84 | 18.05 | 10.97 | 0.7 | 9.8 | 15.9 | 23.70 | 70.9 | ▇▇▂▁▁ |
Ratio.of.youth.unemployment.rate.to.adult.unemployment.rate | 363 | 0.83 | 2.96 | 1.28 | 0.6 | 2.3 | 2.7 | 3.30 | 16.4 | ▇▁▁▁▁ |
Share.of.youth.unemployed.in.total.unemployed.... | 42 | 0.98 | 38.37 | 13.47 | 7.0 | 27.7 | 37.5 | 47.73 | 83.4 | ▂▇▇▃▁ |
Share.of.youth.unemployed.in.youth.population.... | 486 | 0.78 | 8.19 | 4.60 | 0.5 | 4.8 | 7.3 | 10.50 | 30.8 | ▇▇▂▁▁ |
Adult.unemployment.rate.... | 352 | 0.84 | 6.80 | 4.90 | 0.3 | 3.6 | 5.7 | 8.60 | 37.8 | ▇▃▁▁▁ |
Data 3
Introduction and data
Identify the source of the data.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
Write a brief description of the observations.
The first group of data was published by the Wall Street Journal, based on data from Payscale, Inc., and was last updated in 2017. It consists of three datasets. The first has one observation per undergraduate major, with salary information for each major. The other two have one observation per college, with salary information alongside either the school type or the region. (https://www.kaggle.com/datasets/wsj/college-salaries?resource=download)
The second group of data is from the Department of Education Statistics and was collected from 1970-2011. Each observation pertains to one year and contains the percentage of bachelor’s degrees awarded to women in each major for that year.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How has the student gender ratio within different fields of study/college majors changed over the years, and is this related to the median earnings for each field of study?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The demographics of college students are a good indication of the interests of young people, and variables such as gender ratio and median earnings within different fields of study can give us insight into the young workforce. I hypothesize that, over recent years, fields of study that have traditionally been dominated by men or by women have become less skewed in gender proportion.
- Identify the types of variables in your research question. Categorical? Quantitative?
Field of study - categorical
Gender ratio - quantitative
Median earnings - quantitative
Glimpse of data
degrees <- read.csv("data/college/degrees-that-pay-back.csv")
degrees |> glimpse()
Rows: 50
Columns: 8
$ Undergraduate.Major <chr> "Accounting", "Aeros…
$ Starting.Median.Salary <chr> "$46,000.00", "$57,7…
$ Mid.Career.Median.Salary <chr> "$77,100.00", "$101,…
$ Percent.change.from.Starting.to.Mid.Career.Salary <dbl> 67.6, 75.0, 68.8, 67…
$ Mid.Career.10th.Percentile.Salary <chr> "$42,200.00", "$64,3…
$ Mid.Career.25th.Percentile.Salary <chr> "$56,100.00", "$82,1…
$ Mid.Career.75th.Percentile.Salary <chr> "$108,000.00", "$127…
$ Mid.Career.90th.Percentile.Salary <chr> "$152,000.00", "$161…
salaries_type <- read.csv("data/college/salaries-by-college-type.csv")
salaries_type |> glimpse()
Rows: 269
Columns: 8
$ School.Name <chr> "Massachusetts Institute of Technolo…
$ School.Type <chr> "Engineering", "Engineering", "Engin…
$ Starting.Median.Salary <chr> "$72,200.00", "$75,500.00", "$71,800…
$ Mid.Career.Median.Salary <chr> "$126,000.00", "$123,000.00", "$122,…
$ Mid.Career.10th.Percentile.Salary <chr> "$76,800.00", "N/A", "N/A", "$66,800…
$ Mid.Career.25th.Percentile.Salary <chr> "$99,200.00", "$104,000.00", "$96,00…
$ Mid.Career.75th.Percentile.Salary <chr> "$168,000.00", "$161,000.00", "$180,…
$ Mid.Career.90th.Percentile.Salary <chr> "$220,000.00", "N/A", "N/A", "$190,0…
salaries_region <- read.csv("data/college/salaries-by-region.csv")
salaries_region |> glimpse()
Rows: 320
Columns: 8
$ School.Name <chr> "Stanford University", "California I…
$ Region <chr> "California", "California", "Califor…
$ Starting.Median.Salary <chr> "$70,400.00", "$75,500.00", "$71,800…
$ Mid.Career.Median.Salary <chr> "$129,000.00", "$123,000.00", "$122,…
$ Mid.Career.10th.Percentile.Salary <chr> "$68,400.00", "N/A", "N/A", "$59,500…
$ Mid.Career.25th.Percentile.Salary <chr> "$93,100.00", "$104,000.00", "$96,00…
$ Mid.Career.75th.Percentile.Salary <chr> "$184,000.00", "$161,000.00", "$180,…
$ Mid.Career.90th.Percentile.Salary <chr> "$257,000.00", "N/A", "N/A", "$201,0…
college_women <- read.csv("data/college/percent-bachelors-degrees-women-usa.csv")
college_women |> glimpse()
Rows: 42
Columns: 18
$ Year <int> 1970, 1971, 1972, 1973, 1974, 1975, 1976…
$ Agriculture <dbl> 4.229798, 5.452797, 7.420710, 9.653602, …
$ Architecture <dbl> 11.92101, 12.00311, 13.21459, 14.79161, …
$ Art.and.Performance <dbl> 59.7, 59.9, 60.4, 60.2, 61.9, 60.9, 61.3…
$ Biology <dbl> 29.08836, 29.39440, 29.81022, 31.14791, …
$ Business <dbl> 9.064439, 9.503187, 10.558962, 12.804602…
$ Communications.and.Journalism <dbl> 35.3, 35.5, 36.6, 38.4, 40.5, 41.5, 44.3…
$ Computer.Science <dbl> 13.6, 13.6, 14.9, 16.4, 18.9, 19.8, 23.9…
$ Education <dbl> 74.53533, 74.14920, 73.55452, 73.50181, …
$ Engineering <dbl> 0.8, 1.0, 1.2, 1.6, 2.2, 3.2, 4.5, 6.8, …
$ English <dbl> 65.57092, 64.55649, 63.66426, 62.94150, …
$ Foreign.Languages <dbl> 73.8, 73.9, 74.6, 74.9, 75.3, 75.0, 74.4…
$ Health.Professions <dbl> 77.1, 75.5, 76.9, 77.4, 77.9, 78.9, 79.2…
$ Math.and.Statistics <dbl> 38.0, 39.0, 40.2, 40.9, 41.8, 40.7, 41.5…
$ Physical.Sciences <dbl> 13.8, 14.9, 14.8, 16.5, 18.2, 19.1, 20.0…
$ Psychology <dbl> 44.4, 46.2, 47.6, 50.4, 52.6, 54.5, 56.9…
$ Public.Administration <dbl> 68.4, 65.5, 62.6, 64.3, 66.1, 63.0, 65.6…
$ Social.Sciences.and.History <dbl> 36.8, 36.2, 36.1, 36.4, 37.3, 37.7, 39.2…
skim(degrees)
Name | degrees |
Number of rows | 50 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Undergraduate.Major | 0 | 1 | 4 | 36 | 0 | 50 | 0 |
Starting.Median.Salary | 0 | 1 | 10 | 10 | 0 | 43 | 0 |
Mid.Career.Median.Salary | 0 | 1 | 10 | 11 | 0 | 49 | 0 |
Mid.Career.10th.Percentile.Salary | 0 | 1 | 10 | 10 | 0 | 45 | 0 |
Mid.Career.25th.Percentile.Salary | 0 | 1 | 10 | 10 | 0 | 48 | 0 |
Mid.Career.75th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 44 | 0 |
Mid.Career.90th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 43 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Percent.change.from.Starting.to.Mid.Career.Salary | 0 | 1 | 69.27 | 17.91 | 23.4 | 59.12 | 67.8 | 82.42 | 103.5 | ▁▂▇▃▂ |
skim(salaries_type)
Name | salaries_type |
Number of rows | 269 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
School.Name | 0 | 1 | 12 | 67 | 0 | 249 | 0 |
School.Type | 0 | 1 | 5 | 12 | 0 | 5 | 0 |
Starting.Median.Salary | 0 | 1 | 10 | 10 | 0 | 145 | 0 |
Mid.Career.Median.Salary | 0 | 1 | 10 | 11 | 0 | 168 | 0 |
Mid.Career.10th.Percentile.Salary | 0 | 1 | 3 | 10 | 0 | 142 | 0 |
Mid.Career.25th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 178 | 0 |
Mid.Career.75th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 110 | 0 |
Mid.Career.90th.Percentile.Salary | 0 | 1 | 3 | 11 | 0 | 99 | 0 |
skim(salaries_region)
Name | salaries_region |
Number of rows | 320 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
School.Name | 0 | 1 | 12 | 67 | 0 | 320 | 0 |
Region | 0 | 1 | 7 | 12 | 0 | 5 | 0 |
Starting.Median.Salary | 0 | 1 | 10 | 10 | 0 | 168 | 0 |
Mid.Career.Median.Salary | 0 | 1 | 10 | 11 | 0 | 204 | 0 |
Mid.Career.10th.Percentile.Salary | 0 | 1 | 3 | 10 | 0 | 167 | 0 |
Mid.Career.25th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 217 | 0 |
Mid.Career.75th.Percentile.Salary | 0 | 1 | 10 | 11 | 0 | 130 | 0 |
Mid.Career.90th.Percentile.Salary | 0 | 1 | 3 | 11 | 0 | 116 | 0 |
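The salary columns in all three salary datasets are read as character strings such as "$46,000.00", and some entries are "N/A", so they cannot be summarized numerically as they stand. Below is a minimal sketch of one way to convert them. `parse_salary()` is a hypothetical helper, not part of the original analysis, and the sample values are copied from the glimpse output.

```r
# Hypothetical helper: turn strings like "$46,000.00" into numbers, "N/A" into NA
parse_salary <- function(x) {
  x[x %in% "N/A"] <- NA_character_   # mark missing values explicitly
  as.numeric(gsub("[$,]", "", x))    # drop "$" and thousands separators
}

# Sample values taken from the glimpse output
salary_chr <- c("$46,000.00", "$77,100.00", "N/A")
parse_salary(salary_chr)
```

The same helper could be applied to each salary column, for example with `dplyr::mutate(across(ends_with("Salary"), parse_salary))`, assuming all columns whose names end in `Salary` share this format.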
skim(college_women)
Name | college_women |
Number of rows | 42 |
Number of columns | 18 |
_______________________ | |
Column type frequency: | |
numeric | 18 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Year | 0 | 1 | 1990.50 | 12.27 | 1970.00 | 1980.25 | 1990.50 | 2000.75 | 2011.00 | ▇▇▇▇▇ |
Agriculture | 0 | 1 | 33.85 | 12.55 | 4.23 | 30.84 | 33.32 | 45.66 | 50.04 | ▂▂▇▅▇ |
Architecture | 0 | 1 | 33.69 | 9.57 | 11.92 | 28.52 | 35.99 | 40.79 | 44.50 | ▂▂▂▆▇ |
Art.and.Performance | 0 | 1 | 61.10 | 1.31 | 58.60 | 60.20 | 61.30 | 62.00 | 63.40 | ▅▃▇▇▃ |
Biology | 0 | 1 | 49.43 | 10.09 | 29.09 | 44.31 | 50.97 | 58.68 | 62.17 | ▃▂▃▆▇ |
Business | 0 | 1 | 40.65 | 13.12 | 9.06 | 37.39 | 47.21 | 48.88 | 50.55 | ▂▁▁▁▇ |
Communications.and.Journalism | 0 | 1 | 56.22 | 8.70 | 35.30 | 55.13 | 59.85 | 62.13 | 64.60 | ▂▁▁▂▇ |
Computer.Science | 0 | 1 | 25.81 | 6.69 | 13.60 | 19.12 | 27.30 | 29.77 | 37.10 | ▆▃▇▇▅ |
Education | 0 | 1 | 76.36 | 2.21 | 72.17 | 74.99 | 75.94 | 78.62 | 79.62 | ▅▃▆▃▇ |
Engineering | 0 | 1 | 12.89 | 5.67 | 0.80 | 10.62 | 14.10 | 16.95 | 19.00 | ▂▁▂▅▇ |
English | 0 | 1 | 66.19 | 1.95 | 61.65 | 65.58 | 66.11 | 67.86 | 68.89 | ▃▁▇▅▇ |
Foreign.Languages | 0 | 1 | 71.72 | 1.93 | 69.00 | 70.12 | 71.15 | 73.88 | 75.30 | ▇▇▃▂▆ |
Health.Professions | 0 | 1 | 82.98 | 2.91 | 75.50 | 81.82 | 83.70 | 85.18 | 86.50 | ▂▁▃▃▇ |
Math.and.Statistics | 0 | 1 | 44.48 | 2.65 | 38.00 | 42.87 | 44.90 | 46.50 | 48.30 | ▁▃▆▅▇ |
Physical.Sciences | 0 | 1 | 31.30 | 9.00 | 13.80 | 24.88 | 32.10 | 40.20 | 42.20 | ▃▂▃▃▇ |
Psychology | 0 | 1 | 68.78 | 9.71 | 44.40 | 65.55 | 72.75 | 76.92 | 77.80 | ▂▁▁▃▇ |
Public.Administration | 0 | 1 | 76.09 | 5.88 | 62.60 | 74.62 | 77.45 | 81.10 | 82.10 | ▂▁▁▆▇ |
Social.Sciences.and.History | 0 | 1 | 45.41 | 4.76 | 36.10 | 43.82 | 45.30 | 49.38 | 51.80 | ▃▁▆▂▇ |
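`college_women` stores one column per field of study, whereas the research question treats field of study as a single categorical variable. Below is a minimal sketch of reshaping to one row per year and field. It uses only the first three rows of two columns, copied from the glimpse output, and the names `field` and `pct_women` are illustrative choices.

```r
library(tidyr)

# First three rows of two fields, copied from the glimpse output
college_women_demo <- data.frame(
  Year = c(1970L, 1971L, 1972L),
  Engineering = c(0.8, 1.0, 1.2),
  Psychology = c(44.4, 46.2, 47.6)
)

# Reshape so each row is one (Year, field) pair
women_long <- pivot_longer(
  college_women_demo,
  cols = -Year,
  names_to = "field",
  values_to = "pct_women"
)

women_long
```

In this long form, `field` can be mapped to color or facets in a plot of `pct_women` over `Year`, and it could serve as a key for matching against the majors in `degrees`, although the field names in the two sources would first need to be reconciled.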