library(tidyverse)
library(skimr)
Project Strength
Proposal
Data 1
Introduction and data
Identify the source of the data.
- source: https://www.ncaa.org/sports/2016/12/14/shared-ncaa-research-data.aspx
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- In 2008, the late NCAA President Myles Brand charged the NCAA staff with developing a program to gather the data and provide it to interested scholars.
Write a brief description of the observations.
- The CSV file covers all of the NCAA teams in America, with each team's rates of academic progression and success over the past several years.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- How has the academic progression of students advanced over the past decade and which NCAA D1 schools stand out in this measurement?
A description of the research topic along with a concise statement of your hypotheses on this topic.
To measure this, we can identify trends in the Academic Progress Rate (APR) per team and per school. The APR system includes rewards for superior academic performance and penalties for teams that do not achieve certain academic benchmarks. Data are collected annually, and results are announced in the spring. APR will likely correlate strongly with graduation rates, which we can double-check with overlaid graphs. By graphing APR over the years for each school, we can answer our research question.
I believe that Cornell University will have the highest rates of student-athlete academic success because it is the best NCAA D1 school.
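One way to prepare the yearly APR columns for graphing is sketched below. It assumes the annual rates sit in columns named `APR_RATE_<year>_1000`, as listed in the data summary. The `toy_ncaa` table and its values are invented stand-ins for `ncaa_data`, and `apr_long`, `apr_by_school`, and `apr_plot` are illustrative names only.

```r
library(tidyverse)

# Toy stand-in for ncaa_data: the values are invented for illustration only
toy_ncaa <- tibble(
  SCL_NAME = c("School A", "School A", "School B"),
  SPORT_NAME = c("Baseball", "Football", "Baseball"),
  APR_RATE_2018_1000 = c(980, 950, 1000),
  APR_RATE_2019_1000 = c(990, 960, NA)
)

# Reshape the wide APR_RATE_<year>_1000 columns into one row per team and year,
# dropping team-years that have no reported rate
apr_long <- toy_ncaa %>%
  pivot_longer(
    cols = matches("^APR_RATE_[0-9]{4}_1000$"),
    names_to = "year",
    names_pattern = "APR_RATE_([0-9]{4})_1000",
    names_transform = list(year = as.integer),
    values_to = "apr",
    values_drop_na = TRUE
  )

# Average APR across a school's teams for each year
apr_by_school <- apr_long %>%
  group_by(SCL_NAME, year) %>%
  summarise(mean_apr = mean(apr), .groups = "drop")

# One line per school; the plot object is built here but not printed
apr_plot <- ggplot(apr_by_school, aes(x = year, y = mean_apr, colour = SCL_NAME)) +
  geom_line() +
  geom_point() +
  labs(x = "Academic year", y = "Mean APR", colour = "School")
```

With the real data, the same pipeline could start from `ncaa_data` instead of `toy_ncaa`. Averaging across teams is only one possible school-level summary, and it weights every team equally regardless of squad size.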
Identify the types of variables in your research question. Categorical? Quantitative?
“DATA_TAB_GENERALINFO”, Categorical
“SCL_UNITID”, Categorical
“SCL_NAME”, Categorical
“SPORT_CODE”, Categorical
“SPORT_NAME”, Categorical
“ACADEMIC_YEAR”, Categorical
“SCL_DIV_19”, Categorical
“SCL_SUB_19”, Categorical
“D1_FB_CONF_19”, Categorical
“CONFNAME_19”, Categorical
“SCL_HBCU”, Categorical
“SCL_PRIVATE”, Categorical
“DATA_TAB_MULTIYRRATE”, Quantitative
“MULTIYR_APR_RATE_1000_RAW”, Quantitative
“MULTIYR_APR_RATE_1000_CI”, Quantitative
“MULTIYR_APR_RATE_1000_OFFICIAL”, Quantitative
“RAW_OR_CI”, Categorical
“MULTIYR_SQUAD_SIZE”, Quantitative
“MULTIYR_ELIG_RATE”, Quantitative
“MULTIYR_RET_RATE”, Quantitative
“DATA_TAB_ANNUALRATE”, Quantitative
“APR_RATE_2019_1000”, Quantitative
“ELIG_RATE_2019”, Quantitative
“RET_RATE_2019”, Quantitative
“NUM_OF_ATHLETES_2019”, Quantitative
All award data is Quantitative
Glimpse of data
ncaa_data <- read_csv("data/2020RES_APR2019PubDataShare.csv")
Rows: 6017 Columns: 101
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): SCL_NAME, SPORT_NAME, D1_FB_CONF_19, CONFNAME_19, SCL_HBCU, SCL_PR...
dbl (90): SCL_UNITID, SPORT_CODE, ACADEMIC_YEAR, SCL_DIV_19, SCL_SUB_19, MUL...
lgl (4): DATA_TAB_GENERALINFO, DATA_TAB_MULTIYRRATE, DATA_TAB_ANNUALRATE, D...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(ncaa_data)
Name | ncaa_data |
Number of rows | 6017 |
Number of columns | 101 |
_______________________ | |
Column type frequency: | |
character | 7 |
logical | 4 |
numeric | 90 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
SCL_NAME | 0 | 1.00 | 11 | 58 | 0 | 385 | 0 |
SPORT_NAME | 0 | 1.00 | 8 | 28 | 0 | 37 | 0 |
D1_FB_CONF_19 | 1578 | 0.74 | 11 | 35 | 0 | 24 | 0 |
CONFNAME_19 | 0 | 1.00 | 14 | 49 | 0 | 45 | 0 |
SCL_HBCU | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
SCL_PRIVATE | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
RAW_OR_CI | 0 | 1.00 | 2 | 3 | 0 | 2 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
DATA_TAB_GENERALINFO | 6017 | 0 | NaN | : |
DATA_TAB_MULTIYRRATE | 6017 | 0 | NaN | : |
DATA_TAB_ANNUALRATE | 6017 | 0 | NaN | : |
DATATAB_PUBLICAWARD | 6017 | 0 | NaN | : |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
SCL_UNITID | 0 | 1.00 | 179749.28 | 45347.46 | 100654.00 | 145637.00 | 185572.00 | 215770.00 | 486840 | ▇▇▁▁▁ |
SPORT_CODE | 0 | 1.00 | 18.48 | 11.59 | 1.00 | 6.00 | 18.00 | 30.00 | 37 | ▇▅▅▅▇ |
ACADEMIC_YEAR | 0 | 1.00 | 2019.00 | 0.00 | 2019.00 | 2019.00 | 2019.00 | 2019.00 | 2019 | ▁▁▇▁▁ |
SCL_DIV_19 | 0 | 1.00 | 1.01 | 0.13 | 1.00 | 1.00 | 1.00 | 1.00 | 3 | ▇▁▁▁▁ |
SCL_SUB_19 | 0 | 1.00 | 1.85 | 0.81 | 0.00 | 1.00 | 2.00 | 3.00 | 3 | ▁▇▁▇▅ |
MULTIYR_APR_RATE_1000_RAW | 15 | 1.00 | 983.44 | 16.84 | 810.00 | 975.00 | 988.00 | 996.00 | 1000 | ▁▁▁▁▇ |
MULTIYR_APR_RATE_1000_CI | 15 | 1.00 | 991.95 | 10.65 | 858.00 | 989.00 | 996.00 | 999.00 | 1000 | ▁▁▁▁▇ |
MULTIYR_APR_RATE_1000_OFFICIAL | 15 | 1.00 | 984.48 | 16.17 | 810.00 | 977.00 | 989.00 | 997.00 | 1000 | ▁▁▁▁▇ |
MULTIYR_SQUAD_SIZE | 15 | 1.00 | 78.29 | 66.83 | 4.00 | 38.00 | 57.00 | 97.00 | 445 | ▇▂▁▁▁ |
MULTIYR_ELIG_RATE | 15 | 1.00 | 0.99 | 0.02 | 0.74 | 0.98 | 0.99 | 1.00 | 1 | ▁▁▁▁▇ |
MULTIYR_RET_RATE | 15 | 1.00 | 0.98 | 0.02 | 0.84 | 0.97 | 0.98 | 0.99 | 1 | ▁▁▁▂▇ |
APR_RATE_2019_1000 | 84 | 0.99 | 982.98 | 25.35 | 714.00 | 972.00 | 1000.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2019 | 84 | 0.99 | 0.99 | 0.03 | 0.75 | 0.98 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2019 | 84 | 0.99 | 0.98 | 0.03 | 0.57 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2019 | 84 | 0.99 | 20.21 | 17.07 | 4.00 | 10.00 | 15.00 | 25.00 | 116 | ▇▂▁▁▁ |
APR_RATE_2018_1000 | 129 | 0.98 | 982.74 | 25.34 | 684.00 | 972.00 | 1000.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2018 | 129 | 0.98 | 0.99 | 0.03 | 0.71 | 0.98 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2018 | 129 | 0.98 | 0.98 | 0.03 | 0.56 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2018 | 129 | 0.98 | 20.07 | 16.98 | 4.00 | 10.00 | 15.00 | 24.00 | 163 | ▇▁▁▁▁ |
APR_RATE_2017_1000 | 159 | 0.97 | 983.06 | 25.30 | 773.00 | 973.00 | 1000.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2017 | 159 | 0.97 | 0.99 | 0.03 | 0.73 | 0.98 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2017 | 159 | 0.97 | 0.98 | 0.03 | 0.73 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2017 | 159 | 0.97 | 19.84 | 16.68 | 4.00 | 10.00 | 15.00 | 24.00 | 113 | ▇▂▁▁▁ |
APR_RATE_2016_1000 | 194 | 0.97 | 982.67 | 25.60 | 667.00 | 972.00 | 1000.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2016 | 194 | 0.97 | 0.98 | 0.03 | 0.60 | 0.98 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2016 | 194 | 0.97 | 0.98 | 0.03 | 0.44 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2016 | 194 | 0.97 | 19.77 | 16.70 | 4.00 | 10.00 | 15.00 | 24.00 | 115 | ▇▂▁▁▁ |
APR_RATE_2015_1000 | 238 | 0.96 | 981.36 | 27.76 | 682.00 | 971.00 | 993.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2015 | 238 | 0.96 | 0.98 | 0.03 | 0.56 | 0.98 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2015 | 238 | 0.96 | 0.98 | 0.03 | 0.67 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2015 | 238 | 0.96 | 19.66 | 16.57 | 4.00 | 10.00 | 15.00 | 23.50 | 113 | ▇▂▁▁▁ |
APR_RATE_2014_1000 | 253 | 0.96 | 980.63 | 28.76 | 667.00 | 969.00 | 992.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2014 | 253 | 0.96 | 0.98 | 0.04 | 0.50 | 0.97 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2014 | 253 | 0.96 | 0.98 | 0.04 | 0.64 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2014 | 253 | 0.96 | 19.49 | 16.49 | 4.00 | 10.00 | 14.00 | 23.00 | 139 | ▇▁▁▁▁ |
APR_RATE_2013_1000 | 348 | 0.94 | 978.17 | 31.86 | 530.00 | 967.00 | 989.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2013 | 348 | 0.94 | 0.98 | 0.04 | 0.24 | 0.97 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2013 | 348 | 0.94 | 0.97 | 0.04 | 0.71 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2013 | 348 | 0.94 | 19.35 | 16.34 | 4.00 | 10.00 | 14.00 | 23.00 | 112 | ▇▂▁▁▁ |
APR_RATE_2012_1000 | 369 | 0.94 | 975.61 | 35.97 | 472.00 | 962.00 | 986.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2012 | 369 | 0.94 | 0.98 | 0.05 | 0.00 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2012 | 369 | 0.94 | 0.97 | 0.04 | 0.67 | 0.95 | 0.98 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2012 | 369 | 0.94 | 19.15 | 16.30 | 4.00 | 10.00 | 14.00 | 23.00 | 113 | ▇▂▁▁▁ |
APR_RATE_2011_1000 | 404 | 0.93 | 973.03 | 39.45 | 442.00 | 958.00 | 984.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2011 | 404 | 0.93 | 0.97 | 0.06 | 0.00 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2011 | 404 | 0.93 | 0.97 | 0.04 | 0.68 | 0.95 | 0.98 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2011 | 404 | 0.93 | 19.02 | 16.13 | 4.00 | 10.00 | 14.00 | 23.00 | 114 | ▇▁▁▁▁ |
APR_RATE_2010_1000 | 417 | 0.93 | 972.14 | 42.12 | 380.00 | 959.00 | 984.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2010 | 417 | 0.93 | 0.97 | 0.06 | 0.00 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2010 | 417 | 0.93 | 0.97 | 0.04 | 0.67 | 0.95 | 0.98 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2010 | 417 | 0.93 | 18.90 | 16.09 | 4.00 | 10.00 | 14.00 | 23.00 | 120 | ▇▁▁▁▁ |
APR_RATE_2009_1000 | 470 | 0.92 | 972.33 | 36.58 | 667.00 | 958.00 | 983.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2009 | 470 | 0.92 | 0.97 | 0.05 | 0.56 | 0.96 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2009 | 470 | 0.92 | 0.97 | 0.04 | 0.56 | 0.95 | 0.98 | 1.00 | 1 | ▁▁▁▁▇ |
NUM_OF_ATHLETES_2009 | 470 | 0.92 | 18.79 | 15.96 | 4.00 | 10.00 | 14.00 | 22.00 | 120 | ▇▁▁▁▁ |
APR_RATE_2008_1000 | 576 | 0.90 | 970.43 | 36.92 | 643.00 | 955.00 | 980.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2008 | 576 | 0.90 | 0.97 | 0.05 | 0.46 | 0.95 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2008 | 576 | 0.90 | 0.97 | 0.04 | 0.68 | 0.94 | 0.98 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2008 | 576 | 0.90 | 18.82 | 16.04 | 4.00 | 10.00 | 14.00 | 22.00 | 125 | ▇▁▁▁▁ |
APR_RATE_2007_1000 | 644 | 0.89 | 963.51 | 40.90 | 615.00 | 944.00 | 974.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2007 | 644 | 0.89 | 0.97 | 0.05 | 0.33 | 0.95 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2007 | 644 | 0.89 | 0.96 | 0.05 | 0.57 | 0.93 | 0.97 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2007 | 644 | 0.89 | 18.65 | 15.98 | 4.00 | 10.00 | 14.00 | 22.00 | 128 | ▇▁▁▁▁ |
APR_RATE_2006_1000 | 711 | 0.88 | 961.10 | 42.40 | 643.00 | 940.00 | 971.00 | 1000.00 | 1000 | ▁▁▁▂▇ |
ELIG_RATE_2006 | 711 | 0.88 | 0.96 | 0.05 | 0.50 | 0.94 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2006 | 711 | 0.88 | 0.95 | 0.05 | 0.63 | 0.93 | 0.96 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2006 | 711 | 0.88 | 18.66 | 16.38 | 4.00 | 10.00 | 14.00 | 21.00 | 209 | ▇▁▁▁▁ |
APR_RATE_2005_1000 | 867 | 0.86 | 960.51 | 42.80 | 600.00 | 940.00 | 971.00 | 1000.00 | 1000 | ▁▁▁▁▇ |
ELIG_RATE_2005 | 867 | 0.86 | 0.96 | 0.05 | 0.60 | 0.94 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2005 | 867 | 0.86 | 0.95 | 0.05 | 0.57 | 0.93 | 0.96 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2005 | 867 | 0.86 | 18.55 | 16.38 | 4.00 | 10.00 | 14.00 | 21.00 | 179 | ▇▁▁▁▁ |
APR_RATE_2004_1000 | 900 | 0.85 | 960.59 | 43.28 | 611.00 | 939.00 | 971.00 | 1000.00 | 1000 | ▁▁▁▂▇ |
ELIG_RATE_2004 | 900 | 0.85 | 0.97 | 0.05 | 0.56 | 0.95 | 1.00 | 1.00 | 1 | ▁▁▁▁▇ |
RET_RATE_2004 | 900 | 0.85 | 0.95 | 0.05 | 0.61 | 0.93 | 0.96 | 1.00 | 1 | ▁▁▁▂▇ |
NUM_OF_ATHLETES_2004 | 900 | 0.85 | 18.27 | 16.12 | 4.00 | 9.00 | 14.00 | 21.00 | 123 | ▇▁▁▁▁ |
PUB_AWARD_20 | 41 | 0.99 | 0.23 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | ▇▁▁▁▂ |
PUB_AWARD_19 | 58 | 0.99 | 0.22 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | ▇▁▁▁▂ |
PUB_AWARD_18 | 125 | 0.98 | 0.22 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | ▇▁▁▁▂ |
PUB_AWARD_17 | 152 | 0.97 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | ▇▁▁▁▂ |
PUB_AWARD_16 | 4959 | 0.18 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_15 | 4957 | 0.18 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_14 | 5033 | 0.16 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_13 | 5111 | 0.15 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_12 | 5134 | 0.15 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_11 | 5183 | 0.14 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_10 | 5244 | 0.13 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_09 | 5315 | 0.12 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_08 | 5364 | 0.11 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_07 | 5250 | 0.13 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
PUB_AWARD_06 | 5065 | 0.16 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | ▁▁▇▁▁ |
Data 2
Introduction and data
Identify the source of the data.
- The data was sourced from https://www.kaggle.com/datasets/open-powerlifting/powerlifting-database , which was in turn sourced from https://www.openpowerlifting.org/.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data is a snapshot of openpowerlifting.org as of April 2019. Contestants would send in their results from around the world based on full competitions.
Write a brief description of the observations.
- The observations are individual entrants and their various stats, which include their sex, age, weight, class, squat, bench, deadlift and the meet at which they achieved these records. The observations are validated by lifting federation results, which are official.
Research question
- A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
- Does bodyweight correlate with increased capacity for bench, deadlift, and squat exercises?
- Other research questions might include: Do some powerlifting federations perform better overall than others?
- Have overall weightlifting records gone up over time since the first recorded one?
- Does sex have an impact on the amount of weight one can lift overall?
- Does location have any bearing on how much weight one can lift?
- Does using wraps or no wraps (wraps are knee wraps worn for support during the squat) affect the amount of weight lifted overall?
- A description of the research topic along with a concise statement of your hypotheses on this topic.
The research topic encompasses correlations between various attributes of an individual powerlifter and their records for bench, deadlift, and squat. Using the collected data, I’ll try to find patterns and relationships between lifters' attributes and their records.
I hypothesize that lifters with a higher body weight tend to lift heavier weights. There are other hypotheses I wish to explore. For example, do some powerlifting federations perform better overall than others? This question is meant to determine whether region plays a part in overall performance.
Finally, I want to explore whether the total weight lifted has trended upwards since the 1960s, because the data goes back to that time period.
Overall, with regard to my research question, I hypothesize that a higher bodyweight is correlated with a higher bench, squat, and deadlift.
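The bodyweight hypothesis could be checked with a simple correlation, sketched here under stated assumptions. The `toy_lifts` table and its values are invented stand-ins for `openpowerlifting`. The sketch also assumes that non-positive lift values (the data summary shows negative minimums) mark failed attempts and should be treated as missing.

```r
library(dplyr)

# Toy stand-in for openpowerlifting; values are invented for illustration
toy_lifts <- tibble(
  BodyweightKg = c(60, 75, 90, 105, 120, NA),
  Best3SquatKg = c(120, 150, 180, 210, -200, 160),
  Best3BenchKg = c(70, 90, 110, 120, 140, 100),
  Best3DeadliftKg = c(150, 180, 200, 230, 250, 190)
)

lift_cols <- c("Best3SquatKg", "Best3BenchKg", "Best3DeadliftKg")

bodyweight_cor <- toy_lifts %>%
  # Assumption: non-positive values are failed attempts, so recode them to NA
  mutate(across(all_of(lift_cols), ~ if_else(.x > 0, .x, NA_real_))) %>%
  # Pearson correlation of each lift with bodyweight, using complete pairs only
  summarise(across(
    all_of(lift_cols),
    ~ cor(BodyweightKg, .x, use = "pairwise.complete.obs")
  ))

bodyweight_cor
```

A correlation coefficient only summarizes a linear association. On the full data it would likely be worth computing it separately by Sex and Equipment, since both may influence the weight lifted.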
- Identify the types of variables in your research question. Categorical? Quantitative?
- Categorical Variables
Lifter
Federation (lifting federation)
Place
Sex
Equipment (refers to whether the lifter used wraps or no wraps during the lift)
AgeClass
Division
WeightClassKg
MeetCountry
MeetState
MeetName
Date
Tested
- Quantitative Variables
Age
BodyweightKg
Squat1Kg
Squat2Kg
Squat3Kg
Squat4Kg
Best3SquatKg
Bench1Kg
Bench2Kg
Bench3Kg
Bench4Kg
Best3BenchKg
Deadlift1Kg
Deadlift2Kg
Deadlift3Kg
Deadlift4Kg
Best3DeadliftKg
TotalKg
Wilks
McCulloch
Glossbrenner
IPFPoints
Glimpse of data
# Load the OpenPowerlifting snapshot
openpowerlifting <- read.csv("data/openpowerlifting.csv")
skimr::skim(openpowerlifting)
Name | openpowerlifting |
Number of rows | 1423354 |
Number of columns | 37 |
_______________________ | |
Column type frequency: | |
character | 15 |
numeric | 22 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Name | 0 | 1 | 1 | 45 | 0 | 412574 | 0 |
Sex | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Event | 0 | 1 | 1 | 3 | 0 | 7 | 0 |
Equipment | 0 | 1 | 3 | 10 | 0 | 5 | 0 |
AgeClass | 0 | 1 | 0 | 6 | 636554 | 17 | 0 |
Division | 0 | 1 | 0 | 48 | 8178 | 4843 | 0 |
WeightClassKg | 0 | 1 | 0 | 6 | 13312 | 225 | 0 |
Place | 0 | 1 | 1 | 3 | 0 | 124 | 0 |
Tested | 0 | 1 | 0 | 3 | 329462 | 2 | 0 |
Country | 0 | 1 | 0 | 24 | 1034470 | 177 | 0 |
Federation | 0 | 1 | 2 | 14 | 0 | 222 | 0 |
Date | 0 | 1 | 10 | 10 | 0 | 5367 | 0 |
MeetCountry | 0 | 1 | 2 | 22 | 0 | 96 | 0 |
MeetState | 0 | 1 | 0 | 3 | 481809 | 112 | 0 |
MeetName | 0 | 1 | 1 | 155 | 0 | 11599 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Age | 665827 | 0.53 | 31.50 | 13.37 | 0.00 | 21.00 | 28.00 | 40.00 | 97.00 | ▂▇▃▁▁ |
BodyweightKg | 16732 | 0.99 | 84.23 | 23.22 | 15.10 | 66.70 | 81.80 | 99.15 | 258.00 | ▂▇▁▁▁ |
Squat1Kg | 1085774 | 0.24 | 114.10 | 147.14 | -555.00 | 90.00 | 147.50 | 200.00 | 555.00 | ▁▂▃▇▁ |
Squat2Kg | 1090005 | 0.23 | 92.16 | 173.70 | -580.00 | 68.00 | 145.00 | 205.00 | 566.99 | ▁▂▂▇▁ |
Squat3Kg | 1099512 | 0.23 | 30.06 | 200.41 | -600.50 | -167.50 | 110.00 | 192.50 | 560.00 | ▁▅▂▇▁ |
Squat4Kg | 1419658 | 0.00 | 71.36 | 194.52 | -550.00 | -107.84 | 135.00 | 205.00 | 505.50 | ▁▂▁▇▁ |
Best3SquatKg | 391904 | 0.72 | 174.00 | 69.24 | -477.50 | 122.47 | 167.83 | 217.50 | 575.00 | ▁▁▆▇▁ |
Bench1Kg | 923575 | 0.35 | 83.89 | 105.20 | -480.00 | 57.50 | 105.00 | 145.00 | 467.50 | ▁▁▅▇▁ |
Bench2Kg | 929868 | 0.35 | 55.07 | 130.30 | -507.50 | -52.50 | 95.00 | 145.00 | 487.50 | ▁▂▅▇▁ |
Bench3Kg | 944869 | 0.34 | -18.52 | 144.23 | -575.00 | -140.00 | -60.00 | 117.50 | 478.54 | ▁▃▇▇▁ |
Bench4Kg | 1413849 | 0.01 | 24.85 | 165.63 | -500.00 | -127.50 | 77.50 | 157.50 | 487.61 | ▁▅▅▇▁ |
Best3BenchKg | 147173 | 0.90 | 116.54 | 54.84 | -522.50 | 74.84 | 111.13 | 150.00 | 488.50 | ▁▁▃▇▁ |
Deadlift1Kg | 1059810 | 0.26 | 162.70 | 108.68 | -461.00 | 125.00 | 180.00 | 226.80 | 450.00 | ▁▁▁▇▁ |
Deadlift2Kg | 1067331 | 0.25 | 130.23 | 162.68 | -470.00 | 115.00 | 177.50 | 230.00 | 460.40 | ▁▂▁▇▁ |
Deadlift3Kg | 1083407 | 0.24 | 13.00 | 215.05 | -587.50 | -210.00 | 117.50 | 205.00 | 457.50 | ▁▆▂▇▂ |
Deadlift4Kg | 1414108 | 0.01 | 78.91 | 192.61 | -461.00 | -110.00 | 145.15 | 210.00 | 418.00 | ▁▃▁▇▂ |
Best3DeadliftKg | 341546 | 0.76 | 187.26 | 62.33 | -410.00 | 138.35 | 185.00 | 230.00 | 585.00 | ▁▁▇▇▁ |
TotalKg | 110170 | 0.92 | 395.61 | 201.14 | 2.50 | 232.50 | 378.75 | 540.00 | 1367.50 | ▆▇▃▁▁ |
Wilks | 118947 | 0.92 | 288.22 | 123.18 | 1.47 | 197.90 | 305.20 | 374.56 | 779.38 | ▅▆▇▁▁ |
McCulloch | 119100 | 0.92 | 296.07 | 124.97 | 1.47 | 204.82 | 312.03 | 383.76 | 804.40 | ▅▆▇▁▁ |
Glossbrenner | 118947 | 0.92 | 271.85 | 117.56 | 1.41 | 182.81 | 285.94 | 355.28 | 742.96 | ▅▇▇▁▁ |
IPFPoints | 150068 | 0.89 | 485.43 | 113.35 | 2.16 | 402.86 | 478.05 | 559.70 | 1245.93 | ▁▇▆▁▁ |
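The summary above reports large `empty` counts for several character columns (for example AgeClass and Country) while `n_missing` is 0, which suggests that `read.csv()` kept blank fields as empty strings rather than `NA`. A minimal sketch of one way to recode them follows. The `toy_entries` data frame and its values are invented stand-ins for `openpowerlifting`.

```r
library(dplyr)

# Toy stand-in for openpowerlifting; values are invented for illustration
toy_entries <- data.frame(
  Name = c("Lifter 1", "Lifter 2", "Lifter 3"),
  AgeClass = c("24-34", "", "40-44"),
  Country = c("", "", "USA"),
  TotalKg = c(400, 512.5, 380),
  stringsAsFactors = FALSE
)

# Recode empty strings to NA in every character column
toy_entries_clean <- toy_entries %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count missing values per column after recoding
colSums(is.na(toy_entries_clean))
```

Passing `na.strings = c("", "NA")` to `read.csv()` at load time should achieve a similar result without a separate cleaning step.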
Data 3
Introduction and data
Identify the source of the data.
Source: https://www.kaggle.com/datasets/jayrav13/olympic-track-field-results?resource=download
Author: Jay Ravaliya
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
- The data was originally collected from https://olympic.org/athletics using a basic Python web scraper around six years ago.
Write a brief description of the observations.
- Each row represents one medal-winning result, described by eight columns: gender (athlete gender), event (type of track and field event), location (Olympics host nation), year (Olympics year), medal (Gold, Silver, or Bronze medal), name (athlete name), nationality (athlete nationality), and result (the athlete’s recorded mark, such as a race time).
Address ethical concerns about the data, if any.
- No ethical concerns exist with this data.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Is athlete (male and female) performance affected when competing at events hosted in their home country versus abroad?
This question is important because it looks into factors other than raw athletic ability and performance that may influence an Olympic athlete’s results.
Other potential research question ideas:
How has the average performance of athletes in a particular event changed over time?
What is the correlation between an athlete’s nationality and their chances of winning a gold medal?
A description of the research topic along with a concise statement of your hypotheses on this topic.
- This research topic looks at all Olympic Track and Field results from 1896-2016 to discover the effects of event location on the results of male and female athletes. Many factors, such as altitude, climate, and cultural differences, can potentially affect athlete performance, and these factors change across locations. Locations with conditions that an athlete is more familiar with, or places that they feel a cultural or historical connection with, may positively impact athletic performance. We thus hypothesize that athletes participating in events within their home nation perform better compared to those competing abroad.
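Testing this hypothesis first requires a flag for whether an athlete competed at home. The sketch below is one possible approach, not a definitive method. It assumes Nationality holds three-letter country codes while Location needs a lookup to be matched to them. The `toy_olympics` rows and the `host_codes` lookup table are hypothetical and invented for illustration.

```r
library(dplyr)

# Toy stand-in for olympics; rows are invented for illustration
toy_olympics <- tibble(
  Location = c("Rio", "Rio", "London", "London"),
  Year = c(2016, 2016, 2012, 2012),
  Name = c("Athlete 1", "Athlete 2", "Athlete 3", "Athlete 4"),
  Nationality = c("BRA", "USA", "GBR", "JAM")
)

# Hypothetical lookup from host location to a three-letter country code;
# a real version would need one row per host in the data
host_codes <- tibble(
  Location = c("Rio", "London"),
  host_code = c("BRA", "GBR")
)

# Attach the host code and flag results achieved in the athlete's home country
home_flagged <- toy_olympics %>%
  left_join(host_codes, by = "Location") %>%
  mutate(competed_at_home = Nationality == host_code)

# Share of listed results per Games that went to host-nation athletes
home_share <- home_flagged %>%
  group_by(Location, Year) %>%
  summarise(share_home = mean(competed_at_home), .groups = "drop")
```

Because every row in the data carries a medal, a share like this describes medal-winning results only. Comparing it with the same nation's share at Games held abroad would be one way to approximate a home advantage.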
Identify the types of variables in your research question. Categorical? Quantitative?
Variables to be utilized by main research question:
gender (categorical)
location (categorical)
nationality (categorical)
results (quantitative)
medal (categorical)
name (categorical)
Variables to be utilized by second research question:
results (quantitative)
medal (categorical)
name (categorical)
event (categorical)
year (quantitative)
Variables to be utilized by third research question:
name (categorical)
nationality (categorical)
medal (categorical)
Glimpse of data
# Load the Olympic track and field results
olympics <- read_csv("data/results.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 2394 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Gender, Event, Location, Medal, Name, Nationality, Result
dbl (1): Year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
skimr::skim(olympics)
Name | olympics |
Number of rows | 2394 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Gender | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Event | 0 | 1 | 8 | 24 | 0 | 47 | 0 |
Location | 0 | 1 | 3 | 21 | 0 | 23 | 0 |
Medal | 0 | 1 | 1 | 1 | 0 | 3 | 0 |
Name | 0 | 1 | 4 | 30 | 0 | 1682 | 0 |
Nationality | 0 | 1 | 3 | 3 | 0 | 97 | 0 |
Result | 0 | 1 | 3 | 11 | 0 | 1951 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Year | 0 | 1 | 1970.38 | 34.71 | 1896 | 1948 | 1976 | 2000 | 2016 | ▃▃▅▅▇ |
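The summary shows Result stored as a character column, so it has to be converted before it can be analyzed as a quantitative variable. The helper below is a hedged sketch for time-like strings only. It assumes fields are separated by ":" and ordered hours, minutes, seconds; the actual formats in the data have not been checked here, and `time_to_seconds` is an illustrative name.

```r
# Convert time strings such as "9.81", "1:45.10" or "2:08:44" to seconds.
# Assumption: fields are separated by ":" and ordered hours, minutes, seconds.
time_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    values <- suppressWarnings(as.numeric(p))
    # Anything that is empty or not fully numeric becomes NA
    if (length(values) == 0 || anyNA(values)) return(NA_real_)
    # Weight the fields from the right: seconds, minutes, hours
    sum(rev(values) * 60^(seq_along(values) - 1))
  }, numeric(1))
}

time_to_seconds(c("9.81", "1:45.10", "2:08:44", "n/a"))
```

Field events and points-based events would need their own handling, since their results are not times.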