Does body weight have a significant effect on the amount lifted in the bench press, deadlift, and squat exercises?
How does age affect total weight lifted in powerlifting competitions?
Hypothesis: Younger lifters will generally lift more weight than older lifters due to better physical condition and lower risk of injury.
Does the use of equipment (wraps vs. no wraps) have an impact on lifting performance in powerlifting competitions?
Hypothesis: Lifters who use wraps will generally be able to lift more weight due to increased stability and support, but this may also depend on the individual’s gender and chosen event.
Does the type of federation (lifting organization) have an impact on lifting performance in powerlifting competitions?
Hypothesis: Different lifting federations may have different rules, standards, and competition formats that could potentially impact lifting performance, but this may also depend on the individual lifter’s familiarity and comfort with the particular federation.
Data collection and cleaning
Have an initial draft of your data cleaning appendix. Document every step that takes your raw data file(s) and turns it into the analysis-ready data set that you would submit with your final project. Include text narrative describing your data collection (downloading, scraping, surveys, etc) and any additional data curation/cleaning (merging data frames, filtering, transformations of variables, etc). Include code for data curation/cleaning, but not collection.
First, import the tidyverse package. Then, import the powerlifting dataset using read_csv. Then, using the select function, pick out variables Age, BodyweightKg, TotalKg, Equipment, Event, Federation.
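The steps just described can be sketched roughly as follows. This is a minimal illustration, not the code from the appendix: the temporary file, its toy rows, and the name raw_lift_data are hypothetical stand-ins for the real OpenPowerlifting download, which is assumed to contain at least the six selected columns.

```r
library(tidyverse)

# Hypothetical stand-in for the raw CSV: a tiny file written to a temporary
# path so the sketch runs on its own. The real file and its path will differ.
raw_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Sex,Event,Equipment,Age,BodyweightKg,TotalKg,Federation",
  "Lifter A,F,SBD,Raw,29,59.8,290,USAPL",
  "Lifter B,M,SBD,Wraps,41,92.5,612.5,RPS",
  "Lifter C,M,B,Raw,NA,83.0,140,USPA"
), raw_path)

# Import the data with read_csv
raw_lift_data <- read_csv(raw_path, show_col_types = FALSE)

# Keep only the six variables used in the analysis
clean_lift_data <- raw_lift_data |>
  select(Age, BodyweightKg, TotalKg, Equipment, Event, Federation)
```

Selecting columns by name keeps the cleaning step explicit about which variables reach the analysis, so a renamed or missing column in a future download fails loudly instead of silently.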
Steps for data collection and cleaning are documented in appendicies.qmd.
The analysis-ready data frame, clean_lift_data, has 1,432,354 observations (rows) and six attributes (columns): Age, BodyweightKg, TotalKg, Equipment, Event, and Federation. BodyweightKg and TotalKg are both measured in kilograms because nearly every powerlifting competition, even in the United States, uses kilograms. TotalKg is the total number of kilograms each powerlifter lifted across the squat, bench, and deadlift events. Equipment indicates whether the powerlifter used wraps when competing and, if so, what kind. Event indicates whether the powerlifter competed in squat, bench, and deadlift together (denoted ‘SBD’) or in a single event instead.
This dataset comes from the Open Powerlifting Project, a crowdfunded effort to create a centralized and reliable dataset of competitive powerlifting events and competitors. All of the data was taken from registered competitions with multiple judges; however, there is always the chance that some data is corrupted or misleading. This is likely to be minimal, since each of these events is heavily scrutinized and there is very little money to be made unless a world record or something similar is on the line. The Open Powerlifting Project compiles all of the data and constantly updates the dataset. They publish their data in a very legible format, with rows and columns that include all the necessary variables and athletes.
Data limitations
Identify any potential problems with your dataset.
Limited Information on Participants: The data set does not include detailed information on participants, such as their training regimen, nutrition, or injury history, which may limit the ability to draw meaningful conclusions from the data. Many factors that could influence a powerlifter's abilities and results are not captured in this data set.
Selection Bias: The data set may be biased towards individuals who compete in powerlifting meets, and may not accurately represent the overall population of individuals who engage in powerlifting. The data is collected through voluntary participation: contestants send in their results. Powerlifters who do not share their results are not represented.
Incomplete Data: Some variables in the data set have missing data points; there are many *NA* values in the quantitative variables. This may make certain analyses difficult or impossible.
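One way to gauge how much the missing values matter is to count them per column before deciding how to handle them. The sketch below uses a hypothetical four-row toy tibble (toy_lift_data) in place of clean_lift_data; the values are made up for illustration and say nothing about the real data.

```r
library(tidyverse)

# Hypothetical toy rows standing in for clean_lift_data
toy_lift_data <- tibble(
  Age = c(29, NA, 41, NA),
  BodyweightKg = c(59.8, 83.0, NA, 92.5),
  TotalKg = c(290, 140, 612.5, NA),
  Equipment = c("Raw", "Raw", "Wraps", "Raw")
)

# Count the missing values in every column
na_counts <- toy_lift_data |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Keep only rows that are complete in the variables a given analysis needs,
# here Age and TotalKg, rather than dropping every row with any NA
complete_age_total <- toy_lift_data |>
  drop_na(Age, TotalKg)
```

Dropping rows per analysis, rather than once for the whole data frame, avoids discarding observations that are missing a variable the analysis never uses.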
Exploratory data analysis
# Create scatter plot of Age vs TotalKg, with regression line and shaded confidence interval
ggplot(clean_lift_data, aes(x = Age, y = TotalKg)) +
  geom_point(alpha = 0.3, color = "blue") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Age vs Total Weight Lifted",
    x = "Age",
    y = "Total Weight Lifted (kg)"
  )
`geom_smooth()` using formula = 'y ~ x'
# Create box plot of TotalKg by Age group
clean_lift_data |>
  mutate(age_group = cut(Age, breaks = seq(10, 80, by = 10))) |>
  ggplot(aes(x = age_group, y = TotalKg)) +
  geom_boxplot() +
  labs(
    title = "Total Weight Lifted by Age Group",
    x = "Age Group",
    y = "Total Weight Lifted (kg)"
  )

# Create histogram of Age distribution
ggplot(clean_lift_data, aes(x = Age)) +
  geom_histogram(binwidth = 5, color = "white", fill = "blue") +
  labs(
    title = "Age Distribution",
    x = "Age",
    y = "Count of Individuals"
  )

# Create scatter plot of Age vs TotalKg, with different point colors for equipment type
ggplot(clean_lift_data, aes(x = Age, y = TotalKg, color = Equipment)) +
  geom_point(alpha = 0.3) +
  labs(
    title = "Age vs Total Weight Lifted by Equipment Type",
    x = "Age",
    y = "Total Weight Lifted (kg)",
    color = "Equipment Type"
  )
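# Create box plot of TotalKg by Sex, faceted by Equipment type
# Note: Sex is not among the six columns listed for the analysis-ready data;
# this plot assumes Sex is also kept during cleaning
ggplot(data = clean_lift_data, mapping = aes(x = Sex, y = TotalKg)) +
  geom_boxplot() +
  facet_wrap(facets = vars(Equipment)) +
  labs(
    title = "Total Weight Lifted by Sex and Equipment Type",
    x = "Sex",
    y = "Total Weight Lifted (kg)"
  )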
# Create scatter plot of Equipment vs TotalKg, with different point colors for Event type
ggplot(data = clean_lift_data, mapping = aes(x = TotalKg, y = Equipment, color = Event)) +
  geom_jitter(alpha = 0.5) +
  labs(
    title = "Equipment vs Total Weight Lifted by Event Type",
    x = "Total Weight Lifted (kg)",
    y = "Equipment"
  )