Project Strength
Appendix to report
Data cleaning
The original powerlifting data was sourced from https://www.openpowerlifting.org/ (a website that aggregates data from federation websites), and the original data set was downloaded as a CSV file from https://www.kaggle.com/datasets/open-powerlifting/powerlifting-database.
This data set was imported using base R's read.csv() function:
openpowerlifting <- read.csv("data/openpowerlifting.csv")
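One detail of read.csv() affects the later na.omit() step. With the default na.strings = "NA", a blank field is read as NA only in numeric, integer, logical or complex columns. A blank field in a character column (such as Federation or Sex) is kept as an empty string "", and na.omit() does not treat that as missing. The two-row CSV below is hypothetical, and whether the downloaded file actually contains blank character fields has not been checked here; this is a sketch of the read.csv() behaviour only.

```r
# Hypothetical two-row CSV: row 1 has a blank Federation, row 2 a blank Age
toy_csv <- "Age,Federation\n23.5,\n,FedA\n"

# Default na.strings = "NA": blank numeric field -> NA, blank character field -> ""
toy <- read.csv(text = toy_csv)

# Also listing "" in na.strings makes blank character fields NA as well
toy_na <- read.csv(text = toy_csv, na.strings = c("", "NA"))

nrow(na.omit(toy))     # 1: only the row with the blank Age is dropped
nrow(na.omit(toy_na))  # 0: both rows now contain an NA
```

If blank character fields should count as missing, passing na.strings = c("", "NA") at import time is one option.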
This data set was then cleaned, creating a new data set called “clean_lift_data”.
The data was already in a relatively intuitive format, so it was cleaned with a short pipeline of base R and dplyr functions:
- select() - to keep only the variables:
  - Age - the age of the powerlifter the record belongs to
  - BodyweightKg - the bodyweight of the powerlifter, in kg
  - TotalKg - the total weight, in kg, lifted by the powerlifter
  - Equipment - the type of equipment the powerlifter used
  - Event - the type of powerlifting event
  - Federation - the powerlifter’s chosen federation
  - Sex - the powerlifter’s sex
- mutate() - to create a new variable:
  - age_group - ages grouped into 10-year intervals from 10 to 80, i.e. (10,20], (20,30], …, (70,80]; ages of 10 or below, or above 80, are given NA
- filter() - to keep only "Wraps" and "Raw" for the Equipment variable and only "SBD" (squat, bench press and deadlift) for the Event variable
- na.omit() - to remove rows with missing values
The full cleaning code was:
library(dplyr)
clean_lift_data <- openpowerlifting |>
  select(Age, BodyweightKg, TotalKg, Equipment, Event, Federation, Sex) |>
  mutate(age_group = cut(Age, breaks = seq(10, 80, by = 10))) |>
  filter(Equipment == "Wraps" | Equipment == "Raw") |>
  filter(Event == "SBD")
clean_lift_data <- na.omit(clean_lift_data)
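The age_group boundaries follow from cut()'s defaults. Intervals are right-closed (right = TRUE) and the lowest break is not included (include.lowest = FALSE), so an age of exactly 10, or any age below 10 or above 80, falls outside every interval and becomes NA. Because na.omit() runs after mutate(), those lifters are then dropped from clean_lift_data. The ages below are hypothetical values chosen to hit each boundary.

```r
# Hypothetical ages chosen to test each boundary of the age bands
ages <- c(9, 10, 10.5, 23.5, 80, 80.5)

# Same call as in the cleaning pipeline: seven bands, (10,20] ... (70,80]
age_group <- cut(ages, breaks = seq(10, 80, by = 10))

print(as.character(age_group))
# 9 and 10 -> NA, 10.5 -> "(10,20]", 23.5 -> "(20,30]", 80 -> "(70,80]", 80.5 -> NA
```

If lifters aged exactly 10 should be kept, cut() also accepts include.lowest = TRUE, which closes the first interval as [10,20].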
Thus, the clean, analysis-ready data set contained 283,018 observations (rows) and eight attributes (columns): Age, BodyweightKg, TotalKg, Equipment, Event, Federation, Sex, and age_group.
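To make the effect of each cleaning step visible, the same steps can be re-run on a handful of hypothetical rows. This sketch swaps the dplyr verbs for base R equivalents (column subsetting, cut(), which() for filtering, na.omit()) so that it runs without dplyr installed. The rows and the federation names "FedA"/"FedB" are invented for illustration and are not taken from openpowerlifting.csv.

```r
# Hypothetical rows (invented for illustration, NOT from openpowerlifting.csv)
toy <- data.frame(
  Age          = c(23.5, NA, 45, 17, 9, 31),
  BodyweightKg = c(82.1, 90.0, NA, 60.2, 40.0, 75.3),
  TotalKg      = c(600, 650, 700, 400, 200, 520),
  Equipment    = c("Raw", "Wraps", "Raw", "Single-ply", "Raw", "Wraps"),
  Event        = c("SBD", "SBD", "SBD", "SBD", "B", "SBD"),
  Federation   = c("FedA", "FedB", "FedA", "FedB", "FedA", "FedB"),
  Sex          = c("M", "F", "M", "F", "M", "F")
)

# select(): keep the seven analysis variables
cleaned <- toy[, c("Age", "BodyweightKg", "TotalKg", "Equipment",
                   "Event", "Federation", "Sex")]

# mutate(): right-closed 10-year age bands, (10,20] ... (70,80]
cleaned$age_group <- cut(cleaned$Age, breaks = seq(10, 80, by = 10))

# filter(): which() drops rows whose condition is NA, as dplyr::filter() does
keep <- (cleaned$Equipment == "Wraps" | cleaned$Equipment == "Raw") &
  cleaned$Event == "SBD"
cleaned <- cleaned[which(keep), ]

# na.omit(): drop any row with a missing value in any column
cleaned <- na.omit(cleaned)

dim(cleaned)  # 2 rows, 8 columns
```

On these toy rows only the first and last survive: the fourth and fifth are removed by the Equipment and Event filters, and the second and third by na.omit() (missing Age and BodyweightKg respectively).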