Storm Event Data Analysis

Report

Click Here for Link to Shiny.

Introduction

With Storm Data sourced from the National Weather Service (NWS), we seek to explore the relationship between storm types, geographical locations, and storm-related fatalities in the US to identify patterns, potential mitigation strategies, and historical trends in storm behavior. Specific questions include:

  • What storm types exhibit the highest fatality rates?

  • Which specific geographical locations (state) experience the most fatalities due to storms?

  • Is there a discernible pattern in the demographics (age) of fatalities and damages based on storm type or location (zone type)?

  • How have the characteristics of storms, including their frequencies, evolved throughout history?

  • How have the begin time, end time, and duration time of storms influenced the number of fatalities?

  • How have the months of storms influenced the number of fatalities?

  • What are the potential strategies for minimizing storm-related fatalities and damages?
    (After all questions are answered)

With this project, we will visualize storm characteristics and attribute correlations, help uncover patterns/trends through historical data, offer data to support mitigation/awareness recommendations for future planning practices, and present findings through data visualizations and reports.

Justification of approach

Data description

Storm Data, from the National Weather Service, offers comprehensive statistics on injuries and damage estimates for U.S. weather incidents from 1950 to the present. The NCDC Storm Event database categorizes storms by type, state, and date. The NWS uses various data sources to enhance weather monitoring and forecasting. We selected representative years (2003 to 2023) and merged 42 datasets (21 Storm Events datasets and 21 Fatalities datasets), excluding prior years with missing entries and CSV errors in the Storm Event Location data.

The full description of the storm data can be accessed in the PDF at this link (codebook also in the data folder): (https://www.ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Bulk-csv-Format.pdf).

The cleaned data includes 21 columns; each row represents a single fatality and includes personal details, relevant storm events, and episode information. This information is linked through the following ID numbers:

  • Fatality_ID, Event_ID, Episode_ID: These ID numbers merge the datasets. An episode may contain multiple storm events and is defined as the occurrence of storms and significant weather phenomena that can lead to loss of life, injuries, property damage, or disruption to commerce.
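As a minimal sketch of how these ID columns can link the two kinds of files, the Python snippet below attaches event details to each fatality row by EVENT_ID. The sample rows and the function name join_fatalities_to_events are illustrative assumptions, not the project's actual cleaning code.

```python
# Hypothetical sample rows; real rows would be read from the yearly CSV files.
events = [
    {"EVENT_ID": "101", "EPISODE_ID": "9", "STATE": "TEXAS", "EVENT_TYPE": "Tornado"},
    {"EVENT_ID": "102", "EPISODE_ID": "9", "STATE": "TEXAS", "EVENT_TYPE": "Hail"},
]
fatalities = [
    {"FATALITY_ID": "1", "EVENT_ID": "101", "FATALITY_AGE": "54"},
    {"FATALITY_ID": "2", "EVENT_ID": "101", "FATALITY_AGE": "31"},
    {"FATALITY_ID": "3", "EVENT_ID": "999", "FATALITY_AGE": "70"},  # no matching event
]


def join_fatalities_to_events(fatalities, events):
    """Return one merged row per fatality that has a matching event (inner join)."""
    events_by_id = {event["EVENT_ID"]: event for event in events}
    merged = []
    for fatality in fatalities:
        event = events_by_id.get(fatality["EVENT_ID"])
        if event is not None:
            # Event columns first, then fatality columns, so each row describes one fatality.
            merged.append({**event, **fatality})
    return merged


merged_rows = join_fatalities_to_events(fatalities, events)
for row in merged_rows:
    print(row["FATALITY_ID"], row["EVENT_ID"], row["EPISODE_ID"], row["EVENT_TYPE"])
```

Because one episode can contain several events, EPISODE_ID repeats across the merged rows; in this sketch, fatalities without a matching event are dropped.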

The time-related data includes:

  • Begin & end date time, CZ timezone: Specifies the storm event’s start and end date and time in the corresponding time zones.

  • Year, month: Indicates the year and month of the event.

The fatality-related data includes:

  • FATALITY_TYPE: Indicates whether the fatality is direct or indirect (D or I).

  • FATALITY_AGE: Numeric age of the person who died.

  • FATALITY_SEX: Biological sex of the person who died.

  • FATALITY_LOCATION: Specifies the location where the fatality occurred, including various categories such as Ball Field, Boating, Business, Camping, Church, Heavy Equipment/Construction, Golfing, In Water, Long Span Roof, Mobile/Trailer Home, Other/Unknown, Outside/Open Areas, Permanent Home, Permanent Structure, School, Telephone, Under Tree.

The event-related data includes:

  • STATE: The state in which the storm occurred.

  • Event type & CZ name & type: Describes the storm event that led to fatalities, injuries, damage, etc. It provides a detailed breakdown with corresponding designators, such as county or zone.

  • Injuries direct & indirect, deaths direct & indirect: Records the number of direct and indirect injuries and deaths associated with the storm event.

The mixture of fatality and event-related variables in the dataset enables researchers to make correlations and draw inferences, especially when analyzing data over time, to answer specific questions related to storm events.

Design process

We aim to analyze our extensive dataset by employing various visualizations to establish connections between different data sets. To achieve this, we have integrated interactive features into the dashboard, facilitating diverse graph visualizations.

In the final dashboard design, we aim to present graphs addressing six key questions identified during the project’s exploration phase. Each question will be accompanied by a set of data and a graph providing additional insights. A dedicated map tab will also showcase fatality-related visualizations based on US states. The dashboard features a sidebar, allowing users to dynamically modify parameters, such as the year, storm-related variables, fatality parameters, and specific time and date ranges. These adjustments will instantly reflect on the displayed graphs.

For question 4 specifically, a static plot made the information hard to read, so we used plotly to make the plot interactive: users can hover over data points, hide or show storm types, and zoom in and out.

To enhance the overall aesthetics of the final prototype, we will implement a simple design system encompassing a suitable color scheme and font family for the dashboard. The custom color used is #19bc9c. In addition, the website includes a toggle between light and dark themes and an interactive Pikachu gif on the welcome page where users can toggle between 5 gifs.

We considered using other themes, such as Bootstrap themes, for the UI, but in the end we decided that simple themes with a light/dark toggle would be aesthetically suitable and would not distract from the visualizations themselves.

Project Finding Synthesis

Our exploration of extensive storm data from the National Weather Service (NWS) has revealed crucial insights into storm types, geographical impacts, demographic patterns, historical trends, and the influence of storm characteristics on fatalities. By synthesizing these findings, we draw meaningful deductions contributing to a comprehensive understanding of storm-related dynamics.

Question-Based Findings

1. Fatality Rates Across Storm Types

Analysis shows distinct variations in fatality rates among different storm types. Notably, six storm types (Coastal Flood, Marine Strong Wind, Hurricane (Typhoon), Heavy Snow, Dust Devil, and Cold/Wind Chill) exhibit a 100% fatality rate. Conversely, Hail records a 0% fatality rate. These rates should be interpreted cautiously, because some storm types have very limited data.
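The report does not spell out how the fatality rate is computed. As a sketch under one assumed definition (deaths divided by deaths plus injuries for each storm type), the Python snippet below shows how a type with a single recorded casualty can reach 100%. The sample counts and the function name fatality_rates are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-event casualty counts: (event type, deaths, injuries).
events = [
    ("Dust Devil", 1, 0),
    ("Hail", 0, 12),
    ("Tornado", 5, 45),
    ("Tornado", 0, 10),
]


def fatality_rates(events):
    """Map each event type to (rate, casualties), with rate = deaths / (deaths + injuries).

    This definition is an assumption made for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # event type -> [deaths, injuries]
    for event_type, deaths, injuries in events:
        totals[event_type][0] += deaths
        totals[event_type][1] += injuries
    rates = {}
    for event_type, (deaths, injuries) in totals.items():
        casualties = deaths + injuries
        if casualties > 0:  # skip types with no casualties to avoid dividing by zero
            rates[event_type] = (deaths / casualties, casualties)
    return rates


rates = fatality_rates(events)
for event_type, (rate, casualties) in sorted(rates.items()):
    print(f"{event_type}: {rate:.0%} of {casualties} casualties")
```

Reporting the casualty count next to each rate makes it easier to spot rates that rest on only a handful of records.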

2. Geographical Impact on Fatalities

Geographical location emerges as a pivotal factor influencing storm-related fatalities. Nevada, Oklahoma, and California stand out as the states with the highest recorded fatalities. This pattern is closely tied to the prevalence of wildfires, particularly in the Southwest, emphasizing the regional nuances of storm impacts. Conversely, states and territories such as Alaska, Vermont, and the Virgin Islands report minimal fatalities, aligning with their unique environmental conditions.

3. Demographic Dynamics

Examining the demographics of storm-related fatalities unveils compelling patterns. Age-wise, individuals aged 36-53 and 72-90 experience the highest fatality rates. Males consistently record higher fatalities across age groups, except for individuals over 90, where females have more recorded fatalities. This intersection of age and gender dynamics underscores the multifaceted nature of storm-related risks.
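A grouping like this can be sketched in Python as follows. The 18-year bin width is an assumption inferred from the 36-53 range reported above, and the sample records are hypothetical.

```python
from collections import Counter

BIN_WIDTH = 18  # assumed width, chosen so that ages 36-53 form one group


def age_group(age):
    """Label an age with its bin, e.g. 40 -> '36-53'."""
    lower = (age // BIN_WIDTH) * BIN_WIDTH
    return f"{lower}-{lower + BIN_WIDTH - 1}"


# Hypothetical (FATALITY_AGE, FATALITY_SEX) pairs, one per fatality row.
records = [(40, "M"), (52, "M"), (36, "F"), (75, "F"), (80, "M"), (12, "M")]

# Count fatalities for every (age group, sex) combination.
counts = Counter((age_group(age), sex) for age, sex in records)
for (group, sex), n in sorted(counts.items()):
    print(group, sex, n)
```

Counting by both age group and sex in one pass keeps the two dimensions aligned, which is what a grouped bar chart of these findings would need.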

4. Historical Trends in Storm Frequencies

An overview of storm frequencies throughout the years reveals notable spikes in specific periods. The year 2005 saw a surge in Hurricane (Typhoon) instances, reaching a record 1,001 cases. Similarly, 2011 saw a significant increase in Tornado occurrences, totaling 559 instances. These historical trends offer valuable insights into the dynamic nature of storm occurrences over time.
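Spikes of this kind can be located by counting rows per year and storm type. The Python sketch below uses hypothetical sample rows and a hypothetical helper named peak_year; it is not the code behind the figures above.

```python
from collections import Counter

# Hypothetical (YEAR, EVENT_TYPE) pairs, one per recorded row.
events = [
    (2010, "Tornado"), (2011, "Tornado"), (2011, "Tornado"),
    (2005, "Hurricane (Typhoon)"), (2005, "Hurricane (Typhoon)"),
    (2008, "Hurricane (Typhoon)"),
]

counts = Counter(events)  # (year, event type) -> number of rows


def peak_year(counts, event_type):
    """Return (year, count) for the year in which event_type appears most often."""
    per_year = {year: n for (year, kind), n in counts.items() if kind == event_type}
    year = max(per_year, key=per_year.get)
    return year, per_year[year]


print(peak_year(counts, "Tornado"))
print(peak_year(counts, "Hurricane (Typhoon)"))
```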

5. Temporal Factors in Fatality Rates

Temporal factors, such as the begin time, end time, and duration of storms, play a pivotal role in influencing fatality rates. Storms lasting 0-10,000 seconds (up to about 2.8 hours) and those persisting beyond 100,000 seconds (about 27.8 hours) exhibit the highest fatalities, suggesting a relationship between storm duration and fatality rates. However, anomalies, such as fewer fatalities in storms lasting 50,000-100,000 seconds (about 13.9 to 27.8 hours), prompt further exploration into the characteristics of specific storm types.
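The duration grouping can be sketched in Python as below. The timestamp format, the bin edges (in particular the 10,000-50,000 group, which the text does not name), and the sample rows are all assumptions; the raw files may need a different format string.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format; adapt to the raw files
BIN_EDGES = [10_000, 50_000, 100_000]  # seconds; assumed cut points
BIN_LABELS = ["0-10,000", "10,000-50,000", "50,000-100,000", "over 100,000"]


def duration_seconds(begin, end):
    """Seconds between two timestamps recorded in the same time zone."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(begin, TIME_FORMAT)
    return delta.total_seconds()


def duration_bin(seconds):
    """Return the label of the first bin whose upper edge exceeds the duration."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if seconds < edge:
            return label
    return BIN_LABELS[-1]


# Hypothetical (begin, end) pairs, one per fatality row.
rows = [
    ("2011-04-27 15:00:00", "2011-04-27 15:30:00"),  # 1,800 seconds
    ("2011-04-27 15:00:00", "2011-04-27 15:30:00"),
    ("2021-02-14 00:00:00", "2021-02-16 00:00:00"),  # 172,800 seconds
]

fatalities_per_bin = Counter(duration_bin(duration_seconds(b, e)) for b, e in rows)
for label in BIN_LABELS:
    print(label, fatalities_per_bin[label])
```

The sketch subtracts begin from end directly, which is only valid when both timestamps use the same time zone and no daylight-saving change falls between them.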

6. Seasonal Variations in Fatalities

A closer look at the months of storm occurrences highlights distinct seasonal patterns. Late spring and summer months, particularly July, May, and June, record the highest number of fatalities, while colder months experience lower fatality rates. This seasonality underscores the significance of considering weather conditions and patterns in assessing and preparing for storm-related risks.
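A monthly tally of this kind can be sketched in Python as follows. The sample month values are hypothetical and do not reproduce the counts behind the finding above.

```python
from collections import Counter

MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical month names, one entry per fatality row.
fatality_months = ["July", "July", "May", "June", "July", "January", "May"]

monthly_counts = Counter(fatality_months)

# Months ranked from most to fewest fatalities.
ranked = monthly_counts.most_common()
print(ranked[:3])

# The same counts in calendar order, including months with zero fatalities.
calendar_counts = [(month, monthly_counts[month]) for month in MONTH_ORDER]
print(calendar_counts)
```

Keeping a calendar-ordered view alongside the ranking makes seasonal peaks easier to read on a bar chart, since months without fatalities still appear.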

Overall Insights and Recommendations

Our findings contribute to a nuanced understanding of storm-related dynamics and carry implications for mitigation and preparedness strategies. Recognizing the varying fatality rates across storm types and demographic groups allows for targeted interventions and awareness campaigns. The geographical concentration of fatalities emphasizes the need for region-specific preparedness measures, particularly in states with high storm-related impacts.

The historical trends and temporal factors underscore the dynamic nature of storms, necessitating adaptive strategies. Additionally, the seasonal variations in fatalities emphasize the importance of tailored preparedness efforts during peak months.

Implementing interactive visualizations, such as plotly graphs and state filters, enhances the accessibility of data exploration, enabling stakeholders to derive meaningful insights from the vast storm dataset. As we conclude this project, our synthesized findings provide a foundation for informed decision-making, risk assessment, and strategies to minimize storms’ impact on lives and property.

Limitations

Our primary constraint was the limited time and computing power available for dataset preparation. Given the volume of information and the size of the original files, we could not incorporate additional timeframes into a single dataset. Consequently, our ability to draw historical inferences or unveil time-dependent patterns may be restricted. To address this, we spread our data over roughly two decades, 2003 to 2023, to include data from a substantial period.

Furthermore, the original dataset exhibits significant missing data for earlier years (before the 2010s), with numerous columns containing missing information. Regrettably, some of this missing data would have been helpful for meaningful analysis. Thus, we had to selectively retain only the most relevant columns, discarding many others with missing information. However, we are confident that the cleaned dataset supports valuable and insightful visualizations.

Acknowledgments

To initiate the development of the Shiny template, we sought inspiration from existing references aligning with our dashboard vision. After reviewing several examples, we drew inspiration from a Shiny dashboard designed to explore extensive hospital data to optimize antimicrobial use. This reference shares similarities with our project, as it deals with a substantial dataset featuring adjustable parameters. Moreover, it incorporates a range of graphs catering to various user interests.

Given that this project provides accessible code for reference, we express our acknowledgment for its valuable contribution to the advancement of our project.

Reference link: https://shiny.posit.co/r/gallery/life-sciences/hospital-data-antimicrobial/