Project Strength

Appendix to report

Data cleaning

The original powerlifting data comes from https://www.openpowerlifting.org/, a website that aggregates data from federation websites. The data set was downloaded as a CSV file from https://www.kaggle.com/datasets/open-powerlifting/powerlifting-database.

This data set was imported using the base R read.csv() function.

openpowerlifting <- read.csv("data/openpowerlifting.csv")
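As a rough illustration of what read.csv() does with blank fields, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back. The rows and the names `tmp` and `toy` are made up for this example and are not taken from the real data.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical rows, not real data)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Age,BodyweightKg,Equipment",
  "25,82.5,Raw",
  ",90,Wraps",
  "31,75,"
), tmp)

# Same base R function as used for the real file
toy <- read.csv(tmp)

str(toy)             # column types inferred on import
is.na(toy$Age)       # a blank numeric field is read as NA
toy$Equipment == ""  # a blank text field is read as an empty string, not NA
```

With the default settings, only blank numeric fields become NA, so a later na.omit() call would not drop rows whose text columns are merely empty.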

This data set was then cleaned, creating a new data set called “clean_lift_data”.

The data was already in a relatively intuitive format, so cleaning required only a few base R and dplyr functions, applied in the following pipeline:

clean_lift_data <- openpowerlifting |>

select() - to keep only the variables needed for the analysis:

select(Age, BodyweightKg, TotalKg, Equipment, Event, Federation, Sex) |>

Age - the age of the powerlifter to whom the record belongs
BodyweightKg - the bodyweight of the powerlifter, in kg
TotalKg - the total weight lifted by the powerlifter, in kg
Equipment - the type of equipment the powerlifter used
Event - the type of powerlifting event
Federation - the powerlifter’s chosen federation
Sex - the powerlifter’s sex

mutate() - to create a new variable:

mutate(age_group = cut(Age, breaks = seq(10, 80, by = 10))) |>

age_group - groups ages into 10-year bins from 10 to 80

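To see how cut() assigns the bins, here is a small sketch using made-up ages (the vector `ages` is hypothetical and chosen only to probe the bin edges). With the default `right = TRUE`, each interval is open on the left and closed on the right, so an age of exactly 10, or any age above 80, falls outside every bin and becomes NA.

```r
# Hypothetical ages chosen to probe the bin edges; not from the real data
ages <- c(10, 10.5, 20, 20.5, 45, 80, 81)

# Same call as in the cleaning step
age_group <- cut(ages, breaks = seq(10, 80, by = 10))

# Pair each age with its assigned bin; 10 and 81 get NA, 20 lands in (10,20]
data.frame(ages, age_group)

# The factor levels are the seven 10-year intervals, (10,20] through (70,80]
levels(age_group)
```

Because age_group is NA for such rows, they would be removed along with other missing values by the later na.omit() step.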
filter() - to keep only rows where Equipment is “Wraps” or “Raw” and Event is “SBD” (squat, bench press, and deadlift):

filter(Equipment == "Wraps" | Equipment == "Raw") |>

filter(Event == "SBD")
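The two filter() calls above could also be written as a single call. The sketch below compares both forms on a made-up data frame (`toy` and its rows are hypothetical); it is an alternative spelling, not the form used in the report.

```r
library(dplyr)

# Hypothetical rows for illustration only
toy <- data.frame(
  Equipment = c("Raw", "Wraps", "Single-ply", "Raw", "Multi-ply"),
  Event = c("SBD", "SBD", "SBD", "B", "SBD")
)

# Form used in the cleaning step: two chained filter() calls
two_step <- toy |>
  filter(Equipment == "Wraps" | Equipment == "Raw") |>
  filter(Event == "SBD")

# Single call: comma-separated conditions are combined with AND
one_step <- toy |>
  filter(Equipment %in% c("Raw", "Wraps"), Event == "SBD")

# Check that both forms keep the same rows
all.equal(two_step, one_step)
```

The `%in%` form scales more easily if further equipment types need to be kept.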

na.omit() - to remove rows with missing values:

clean_lift_data <- na.omit(clean_lift_data)
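na.omit() drops a row if any of its columns is NA. A minimal sketch on a made-up data frame (`toy` and `cleaned` are hypothetical names) shows the effect, together with a per-column count of missing values that can help explain how many rows are lost:

```r
# Hypothetical rows with missing values in two different columns
toy <- data.frame(
  Age = c(25, NA, 31, 40),
  TotalKg = c(500, 420, NA, 610),
  Sex = c("M", "F", "M", "F")
)

# Missing values per column, checked before dropping anything
colSums(is.na(toy))

# Keep only the complete rows
cleaned <- na.omit(toy)

# Number of rows removed
nrow(toy) - nrow(cleaned)
```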

Thus, the clean, analysis-ready data set contained 283,018 observations (rows) and eight attributes (columns): Age, BodyweightKg, TotalKg, Equipment, Event, Federation, Sex, and age_group.
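The steps above can be read as one chained pipeline. The sketch below assembles them and runs them on a small made-up stand-in for the imported data, then checks the dimensions with dim(). The rows, the placeholder federation names, and the extra Name column are assumptions for illustration; the native pipe `|>` requires R 4.1 or later.

```r
library(dplyr)

# Hypothetical stand-in for the imported data; the real file has more columns
openpowerlifting <- data.frame(
  Name = c("A", "B", "C", "D", "E"),
  Sex = c("M", "F", "M", "F", "M"),
  Event = c("SBD", "SBD", "B", "SBD", "SBD"),
  Equipment = c("Raw", "Wraps", "Raw", "Single-ply", "Raw"),
  Age = c(25, 34.5, 41, 29, NA),
  BodyweightKg = c(82.5, 60, 100, 70, 90),
  TotalKg = c(500, 320, 180, 400, 550),
  Federation = c("X", "Y", "X", "Z", "Y")
)

clean_lift_data <- openpowerlifting |>
  # keep only the seven variables of interest
  select(Age, BodyweightKg, TotalKg, Equipment, Event, Federation, Sex) |>
  # add the 10-year age bins
  mutate(age_group = cut(Age, breaks = seq(10, 80, by = 10))) |>
  # keep raw and wraps lifters in full SBD events
  filter(Equipment == "Wraps" | Equipment == "Raw") |>
  filter(Event == "SBD") |>
  # drop rows with any missing value
  na.omit()

# Rows and columns of the cleaned data
dim(clean_lift_data)
```

On the real data, the same dim() call is a quick way to confirm the row and column counts reported above.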