Earthquake Damages

Project Squirtle
Yunjie Liu, Fiona Zheng, Yuhan Tan

2023-12-09

Introduce the topic and motivation

  • Topic: Explore the occurrence of earthquakes in the US and globally, and examine the relationship between the time, location, and magnitude of the earthquakes that took place worldwide in 2022.

  • Motivation: Earthquakes are among the most devastating natural disasters on this planet, destroying property and ending lives. It is important to understand how many earthquakes happen each month and how powerful they are. Investigating the proposed question will give us a better understanding of how earthquakes influence our lives.

Introduce the data

This dataset contains the earthquakes of magnitude greater than 2.5 recorded in 2022. It has 21,740 rows, each representing one earthquake, and 17 columns that describe each event. For example, the date/time, location, and magnitude, as well as other factors such as depth, are all essential to answering our proposed question.

  • Types of variables used: categorical, numerical, and date variables.

  • Data source: https://earthquake.usgs.gov/earthquakes/search

Highlights from EDA

  • The United States had the most earthquakes in 2022, followed by Indonesia, Puerto Rico (counted separately from the United States in the data), and Japan.
# A tibble: 5 × 2
  country       count
  <chr>         <int>
1 United States  7972
2 Indonesia      1569
3 Puerto Rico    1412
4 Japan           974
5 Philippines     753
  • Most earthquakes have a magnitude between 2.5 (the dataset's lower cutoff) and 4.5.

  • January had the most earthquakes, followed by April and August.
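
Counts like the ones above could be tallied roughly as follows. This is a minimal sketch in Python for illustration only (the printed tibbles suggest the analysis itself was done in R). The column names `time`, `mag`, and `country` and the three sample rows are placeholders, not the dataset's actual schema or values.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Placeholder rows; column names are assumptions, not the dataset's schema.
SAMPLE_CSV = """time,mag,country
2022-01-03T04:10:00,3.1,United States
2022-01-15T22:45:00,4.7,Indonesia
2022-04-02T09:30:00,2.8,United States
"""


def tally(csv_text, min_mag=2.5):
    """Count earthquakes with magnitude above min_mag by country and by month."""
    by_country = Counter()
    by_month = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["mag"]) <= min_mag:
            continue  # keep only magnitudes greater than the cutoff
        by_country[row["country"]] += 1
        # Month number (1-12) taken from the ISO 8601 timestamp.
        by_month[datetime.fromisoformat(row["time"]).month] += 1
    return by_country, by_month


by_country, by_month = tally(SAMPLE_CSV)
print(by_country.most_common(5))  # top countries by count
print(by_month.most_common(3))    # busiest months by count
```

`Counter.most_common` returns the entries sorted by count, which mirrors the top-5 country table and the month ranking.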

Inference/modeling/other analysis

# A tibble: 1 × 8
   mtry min_n .metric .estimator  mean     n std_err .config              
  <int> <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
1     3    20 roc_auc hand_till  0.855    10 0.00355 Preprocessor1_Model06
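  • Compared with the null model (roc_auc = 0.5) and naive Bayes (roc_auc = 0.7819), the random forest (mean roc_auc = 0.855 across 10 resamples) outperforms the other models.

  • We therefore selected it as the classification model used to predict earthquake damage in our app.

  • Overfitting: train_auc (0.9868) exceeds test_auc (0.8592), a gap of about 0.13, so the model fits the training data better than it generalizes.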
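
The `hand_till` estimator in the tuning output refers to the Hand and Till multiclass generalization of AUC, which averages pairwise AUCs over all pairs of classes. The sketch below illustrates that idea in plain Python. The damage labels (`low`, `medium`, `high`) and the probabilities are made up for illustration; the values reported on this slide come from the fitted models, not from this code.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_auc(y_true, proba, classes):
    """Average pairwise AUC; proba[k] maps each class label to a probability."""
    pair_aucs = []
    for a, b in combinations(classes, 2):
        rows_a = [p for y, p in zip(y_true, proba) if y == a]
        rows_b = [p for y, p in zip(y_true, proba) if y == b]
        # Separate a from b using P(a), then b from a using P(b).
        auc_a = binary_auc([p[a] for p in rows_a], [p[a] for p in rows_b])
        auc_b = binary_auc([p[b] for p in rows_b], [p[b] for p in rows_a])
        pair_aucs.append((auc_a + auc_b) / 2)
    return sum(pair_aucs) / len(pair_aucs)


classes = ["low", "medium", "high"]  # hypothetical damage levels
y_true = ["low", "low", "medium", "medium", "high", "high"]

# A model that always puts the highest probability on the true class.
perfect = [{c: (0.8 if c == y else 0.1) for c in classes} for y in y_true]
# A null model that gives every class the same probability.
null = [{c: 1 / 3 for c in classes} for _ in y_true]

print(hand_till_auc(y_true, perfect, classes))  # 1.0
print(hand_till_auc(y_true, null, classes))     # 0.5

# Train/test gap reported on this slide.
gap = round(0.9868 - 0.8592, 4)
print(gap)  # 0.1276
```

The null model scores 0.5 because every pairwise comparison is a tie, which is why 0.5 serves as the baseline that the naive Bayes and random forest models are compared against.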

Random Forest Confusion Matrix

Inference/modeling/other analysis

  • Earthquake occurrence varied little from month to month in 2022.
  • This explains why the model's prediction changes little or not at all when we input a different month (Access App)

Conclusions + future work

  • Conclusions

    • We found no strong correlation among the location, time, magnitude, and frequency of earthquakes around the world.
  • Future Work

    • More data

    • Visualizations

      • More user-controlled interactions
    • Modeling

      • Experiment with more models

      • Use more input variables