Factors that appear to influence card issuance and final result in EPL matches
Appendix to report
Data cleaning
# A tibble: 8,020 × 12
Div Date HomeTeam AwayTeam FTHG FTAG FTR Referee HY AY HR
<chr> <chr> <chr> <chr> <int> <int> <chr> <chr> <int> <int> <int>
1 E0 19/08/00 Charlton Man City 4 0 H Rob Ha… 1 2 0
2 E0 19/08/00 Chelsea West Ham 4 2 H Graham… 1 2 0
3 E0 19/08/00 Coventry Middles… 1 3 A Barry … 5 3 1
4 E0 19/08/00 Derby Southam… 2 2 D Andy D… 1 1 0
5 E0 19/08/00 Leeds Everton 2 0 H Dermot… 1 3 0
6 E0 19/08/00 Leicester Aston V… 0 0 D Mike R… 2 3 0
7 E0 19/08/00 Liverpool Bradford 1 0 H Paul D… 1 1 0
8 E0 19/08/00 Sunderla… Arsenal 1 0 H Steve … 3 1 0
9 E0 19/08/00 Tottenham Ipswich 3 1 H Alan W… 0 0 0
10 E0 20/08/00 Man Unit… Newcast… 2 0 H Steve … 0 1 0
# ℹ 8,010 more rows
# ℹ 1 more variable: AR <int>
We cleaned the raw data to produce a single dataset of selected fields for English Premier League (EPL) matches from the 2000-01 to the 2021-22 seasons, for use in further analysis.
First, we imported 22 CSV files that contain data for the EPL from the 2000-01 to the 2021-22 seasons.
Second, we kept only the 12 columns relevant to our analysis: Div (division), Date, HomeTeam, AwayTeam, FTHG (Full Time Home Goals), FTAG (Full Time Away Goals), FTR (Full Time Result), Referee, HY (Home Yellow Cards), AY (Away Yellow Cards), HR (Home Red Cards), and AR (Away Red Cards).
Third, we stacked the selected columns from every file into one dataset using the rbind() function.
Fourth, we saved the resulting combined dataset to a new CSV file named “epl.csv”.
Finally, we displayed the combined dataset as a tibble using the tibble() function for reference; the output is shown above.
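The steps above can be sketched in R roughly as follows. This is a minimal illustration and not the exact script used for the report: the folder name `data`, the helper `combine_seasons()`, and the use of `read.csv()` and `as_tibble()` are assumptions made for the example, and only the column names, `rbind()`, and the output file name come from the description above.

```r
# The 12 columns named in the text.
keep_cols <- c("Div", "Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG",
               "FTR", "Referee", "HY", "AY", "HR", "AR")

# Hypothetical helper: read every CSV file in `data_dir`, keep only the
# selected columns, and stack the files into one data frame with rbind().
combine_seasons <- function(data_dir, cols = keep_cols) {
  files <- sort(list.files(data_dir, pattern = "\\.csv$", full.names = TRUE))
  seasons <- lapply(files, function(path) {
    season <- read.csv(path, stringsAsFactors = FALSE)
    season[, cols]  # stops with an error if a file lacks one of the columns
  })
  do.call(rbind, seasons)
}

# Assumed layout: one CSV per season in a folder called "data".
data_dir <- "data"
if (dir.exists(data_dir)) {
  epl <- combine_seasons(data_dir)
  write.csv(epl, "epl.csv", row.names = FALSE)  # save the combined dataset
  print(tibble::as_tibble(epl))                 # display it as a tibble
}
```

Selecting the columns before combining matters because `rbind()` requires every data frame to have the same column names, and the raw season files may not all share the same set of extra columns.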