U.S. Airline Delays Analysis

Appendix to report

Data cleaning

We began by loading the dataset from “airlines.csv” and used the skimr package to get a quick overview of its structure and content. We then converted the column names to snake_case for consistency and removed unnecessary prefixes. Next, we separated the entries in the ‘carriers_names’ column, expanding the rows so that each row holds a single carrier name, and split the ‘airport_name’ column into ‘state’, ‘country’, and ‘airport_name’ columns. We also extracted the columns describing delay reasons and renamed them appropriately. Finally, we saved the cleaned dataset, ‘airlines_clean’, to a new file, “airlines_cleaned.csv”, for further analysis.
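
The cleaning steps described above could be sketched in R roughly as follows. This is a minimal illustration on a two-row stand-in table, not the code used for the report. The raw column names (Airport.Name, Statistics.Carriers.Names), the “Statistics.” prefix, the comma separator between carriers, and the “state, country: name” layout of the airport field are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the raw data; the real column names,
# prefixes, and delimiters may differ.
airlines_raw <- tibble(
  Airport.Name = c("GA, USA: Hartsfield-Jackson", "MA, USA: Logan"),
  Statistics.Carriers.Names = c("Carrier A,Carrier B", "Carrier C")
)

airlines_clean <- airlines_raw %>%
  # Drop the assumed "Statistics." prefix from column names.
  rename_with(function(x) sub("^Statistics\\.", "", x)) %>%
  # Convert the dotted names to snake_case.
  rename_with(function(x) tolower(gsub(".", "_", x, fixed = TRUE))) %>%
  # Expand rows so that each row holds a single carrier name.
  separate_rows(carriers_names, sep = ",") %>%
  # Split the airport field on ", " and ": " into three parts.
  separate(airport_name,
           into = c("state", "country", "airport"),
           sep = ",\\s*|:\\s*") %>%
  rename(airport_name = airport)

print(airlines_clean)
```

With the real data, the stand-in table would be replaced by something like readr::read_csv("airlines.csv"), and the result could be written out with readr::write_csv(airlines_clean, "airlines_cleaned.csv").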

Acknowledgment

This project was developed by a group of two, so its scope reflects the constraints of a small team. While we have endeavored to provide a comprehensive analysis, our group size may limit the depth and breadth of some parts of the project. We appreciate your understanding in this regard.