U.S. Airline Delays Analysis

Report

Introduction

Our motivation is to analyze flight delays across airlines in the United States. Our primary focus was on measuring how often each airline’s flights are delayed. We also analyzed the reasons for those delays and how frequently each reason occurs. To do this, we developed an interactive web application with graphs showing the frequency of delays for each airline and the reasons behind them. Our goal is to understand which airline has the highest frequency of delays and what causes those delays.

Justification of approach

Data description

The dataset used for this analysis is a comprehensive collection of statistics on U.S. flights, delays, carriers, and airports. It was originally collected by the U.S. Department of Transportation’s Bureau of Transportation Statistics (BTS) and is distributed as part of the CORGIS (Collection of Really Great, Interesting, and Situated Datasets) project. The dataset provides valuable insights into the aviation industry and its impact on passengers’ travel experiences.

We applied a series of cleaning and preprocessing steps to make the dataset analysis-ready. We standardized the column names to snake_case and reshaped the data into a longer format with the pivot_longer() function to make it easier to analyze. We also separated the airport name column into three columns; for example, “Atlanta, GA: Hartsfield-Jackson Atlanta International” becomes “Atlanta”, “GA”, and “Hartsfield-Jackson Atlanta International”. There were no NA values in the dataset, so we did not need to remove any rows or columns. After these steps, the dataset was ready for analysis.
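The cleaning steps described above can be illustrated with a minimal sketch. The project itself was written in R, so this Python version is only an illustration of the logic, not the code we ran. The function names and the sample column labels are assumptions made for the example, and the small pivot_longer helper below is a hand-written stand-in for the tidyr function of the same name.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column label to snake_case (illustrative helper)."""
    # Insert an underscore at lower-to-upper boundaries, e.g. 'airportName'.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def split_airport_name(full_name: str) -> tuple:
    """Split 'City, ST: Airport Name' into (city, state, airport)."""
    city_state, airport = full_name.split(": ", 1)
    city, state = city_state.rsplit(", ", 1)
    return city, state, airport


def pivot_longer(rows, id_cols, names_to, values_to):
    """Reshape wide rows (one column per delay type) into long rows."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col in id_cols:
                continue
            long_row = {key: row[key] for key in id_cols}
            long_row[names_to] = col      # former column name
            long_row[values_to] = value   # former cell value
            long_rows.append(long_row)
    return long_rows


if __name__ == "__main__":
    print(to_snake_case("Statistics.# of Delays.Carrier"))
    print(split_airport_name(
        "Atlanta, GA: Hartsfield-Jackson Atlanta International"))
    # The counts below are made up for the example.
    wide = [{"carrier": "AA", "carrier_delays": 10, "weather_delays": 2}]
    print(pivot_longer(wide, ["carrier"], "delay_type", "count"))
```

In the long format, each row holds one delay reason and its count, which is the shape most plotting functions expect when drawing one series per reason.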

Design process

The project is an interactive user interface designed to enhance user experience and engagement. It is built as a Shiny application that lets users select the specific airline they want to focus on. Restricting each view to a single selected airline keeps the visualizations focused and tailored to the user’s needs. When users open the interface, they see a line chart of the percentage of flights delayed for one airline over the years. They can choose the airline from a drop-down menu; by default, American Airlines is selected. After a selection is made, the interface updates dynamically to display visualizations for the chosen airline. Other tabs show further visualizations for the same airline, ranging from line graphs and scatter plots to pie charts, all aimed at giving the user insightful information.
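The calculation behind the default line chart can be sketched as follows. This is an illustrative Python version, not the R code in the Shiny app; the record field names (airline, year, flights_delayed, flights_total) and the sample numbers are assumptions made for the example.

```python
from collections import defaultdict

# Default selection described above; the exact label in the data may differ.
DEFAULT_AIRLINE = "American Airlines"


def percent_delayed_by_year(records, airline=DEFAULT_AIRLINE):
    """Return {year: percent of flights delayed} for one airline."""
    delayed = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        if rec["airline"] != airline:
            continue  # keep only the airline chosen in the drop-down
        delayed[rec["year"]] += rec["flights_delayed"]
        total[rec["year"]] += rec["flights_total"]
    # Skip years with no flights to avoid dividing by zero.
    return {
        year: 100.0 * delayed[year] / total[year]
        for year in sorted(total)
        if total[year] > 0
    }


if __name__ == "__main__":
    # Made-up monthly records, only to show the shape of the input.
    sample = [
        {"airline": "American Airlines", "year": 2003,
         "flights_delayed": 20, "flights_total": 100},
        {"airline": "American Airlines", "year": 2003,
         "flights_delayed": 30, "flights_total": 100},
        {"airline": "American Airlines", "year": 2004,
         "flights_delayed": 10, "flights_total": 40},
    ]
    print(percent_delayed_by_year(sample))
```

Summing delayed and total flights per year before dividing weights each month by its traffic, which is usually preferable to averaging monthly percentages.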

One visualization also has a slider widget with a play button that steps through the plots for each year from 2003 to 2016. Users can also drag the slider to select a specific year to view.

The interface ensures that users can navigate through different pages and interact with the data easily with minimal technical knowledge. The entire process is seamless and is built to ensure effective interactivity for the user.

Design challenge:

The main design challenge was that the dataset was large enough to make the application run slowly. The visualizations took more than 5 seconds to load because of the volume of data involved. We also had to learn the Shiny package in R, since half of our team members were not familiar with it. To overcome this, we used the Shiny documentation to understand how the application is rendered, and we followed YouTube tutorials to understand how the data is displayed on screen.

We also had difficulty creating some visualizations, such as a bubble chart. We consulted online resources to build one, but the result did not turn out as we expected. We therefore came up with other ideas to make the interface both interactive and visually appealing.

Insights:

  • The pie chart shows the percentage of delayed flights by airline. Delays are spread fairly evenly across the airlines.

  • However, the airline with the highest percentage of delays is Hawaiian Airlines Inc. The airline with the lowest percentage is Southwest Airlines Co., which accounts for only 4.78% of the total delays.

  • The scatter plot shows a recurring pattern: flight delays are elevated in the summer months of June and July, as well as in December, regardless of the airline.

  • The line graph shows a notable surge in flight delays in 2007, followed by a gradual decline in subsequent years.

  • These visualizations are only a sample of the insights available in our Shiny app.
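The pie chart shares mentioned in the insights above are each airline’s delay count divided by the total across all airlines. A minimal Python sketch of that arithmetic follows; it is an illustration rather than the app’s R code, and the airline labels and counts are made up for the example.

```python
def delay_share_by_airline(delays):
    """Return each airline's share of all delays, in percent."""
    total = sum(delays.values())
    if total == 0:
        return {}  # nothing to plot when there are no delays
    return {
        airline: round(100.0 * count / total, 2)
        for airline, count in delays.items()
    }


if __name__ == "__main__":
    # Hypothetical delay counts, not values from the dataset.
    counts = {"Airline A": 50, "Airline B": 30, "Airline C": 20}
    shares = delay_share_by_airline(counts)
    print(shares)
    print("highest:", max(shares, key=shares.get))
    print("lowest:", min(shares, key=shares.get))
```

Because each share is rounded to two decimals, the slices may not add up to exactly 100 on real data.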

Limitations

  • The dataset could be inaccurate. Our conclusions depend on the accuracy of the data, so any errors in it, such as incorrect flight statistics or airport information, could lead to incorrect conclusions.
  • There are temporal limitations. The dataset only covers June 2003 to January 2016, which may limit how well the findings generalize, especially to contemporary trends.
  • There are visual limitations. Since most of the columns in the dataset are numeric, only some kinds of visualizations work well, while others are less visually appealing.

Conclusion and Future work

In conclusion, our exploration into the delays in different airlines across the United States has provided valuable insights into the aviation industry. Through the development of an interactive Shiny web application, we aimed to empower users with a user-friendly interface that facilitates the analysis of delay frequencies and reasons for various airlines. The visualizations presented within the application offer a comprehensive view of the patterns and trends in flight delays, allowing users to make informed decisions or draw meaningful conclusions.

However, there are avenues for future enhancements and expansions. First, addressing the dataset’s temporal limitations by incorporating more recent data would provide a more up-to-date understanding of flight delays in the United States. This could involve regular updates to the dataset to capture evolving trends and patterns in the aviation industry.

Furthermore, there is room for refinement in the visualizations to accommodate a broader range of data types. Exploring different visualization techniques and incorporating diverse chart types could offer a more comprehensive and visually engaging representation of the data.

Collaboration with industry experts and stakeholders could also contribute to a deeper understanding of the factors influencing flight delays. Incorporating real-time data streams and predictive modeling could extend the application’s capabilities, giving users advance warning of potential delays. We could also add more complex visualizations showing the relationship between each airline’s delays and the reasons behind them.

Overall, our Shiny web application serves as a foundation for future work, encouraging continued exploration, improvement, and collaboration to better understand airline delays in the United States.

Acknowledgments

We extend our gratitude to the U.S. Department of Transportation’s Bureau of Transportation Statistics for providing the dataset that forms the backbone of our analysis. Additionally, we acknowledge the invaluable contributions of the plotly and ggplot2 packages, instrumental in crafting visually appealing and informative representations of our data.

  1. Data Source:

    https://think.cs.vt.edu/corgis/csv/airlines/

  2. plotly package:

    https://plotly.com/r/

  3. ggplot2 package:

    https://ggplot2.tidyverse.org/