2023-12-09
Topic: Explore the occurrence of earthquakes in the US and globally, and examine the relationship between the time, location, and magnitude of earthquakes that took place worldwide in 2022.
Motivation: Earthquakes are among the most devastating natural disasters on the planet, destroying property and ending lives. It is important to understand how many earthquakes happen each month and how powerful they are. Investigating the proposed question will give us a better understanding of how earthquakes affect our lives.
This dataset contains earthquakes with magnitude greater than 2.5 that occurred in 2022. It has 21,740 rows, each representing one earthquake, and 17 columns describing each event. (For example, the date/time, location, and magnitude, as well as other factors such as depth, are all essential to answering our proposed question.)
Types of variables used: categorical, numerical, and date variables.
Data source: https://earthquake.usgs.gov/earthquakes/search
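The data came from the USGS search page above. As an alternative for reproducing the download, here is a minimal sketch that assumes the USGS FDSN event web service (`https://earthquake.usgs.gov/fdsnws/event/1/query`) and its `format`, `starttime`, `endtime`, and `minmagnitude` query parameters. It only builds one CSV request URL per month of 2022, so each download stays small; the helper name `monthly_query_urls` is hypothetical and no network call is made.

```python
from urllib.parse import urlencode

# USGS FDSN event web service endpoint (an assumption; the project used the search page).
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def monthly_query_urls(year=2022, min_magnitude=2.5):
    """Build one CSV query URL per calendar month of `year` (no network access here)."""
    urls = []
    for month in range(1, 13):
        start = f"{year}-{month:02d}-01"
        # The December window ends on January 1 of the following year.
        end = f"{year + 1}-01-01" if month == 12 else f"{year}-{month + 1:02d}-01"
        params = {"format": "csv", "starttime": start, "endtime": end,
                  "minmagnitude": min_magnitude}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in monthly_query_urls()[:2]:
        print(url)
```

After fetching and concatenating the monthly files, dropping duplicate `id` values guards against an event landing exactly on a month boundary, and filtering `mag > 2.5` enforces a strict cutoff if the service treats the minimum as inclusive.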
# A tibble: 5 × 2
  country       count
  <chr>         <int>
1 United States  7972
2 Indonesia      1569
3 Puerto Rico    1412
4 Japan           974
5 Philippines     753
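The table above ranks countries by number of earthquakes. A minimal sketch of that tally, assuming the country is the text after the last comma of a USGS-style `place` string (e.g. "45 km NNW of Tobelo, Indonesia"), and that US events ending in a state name are folded into "United States" through a hypothetical, incomplete `US_REGIONS` set. The helper names and sample strings are invented for illustration; this is not the project's own code.

```python
from collections import Counter

# Hypothetical, incomplete set: USGS place strings for US events usually end in a
# state name or abbreviation (e.g. "CA", "Alaska") rather than "United States".
US_REGIONS = {"CA", "California", "Alaska", "Hawaii", "Nevada", "Oklahoma", "Texas"}


def country_from_place(place: str) -> str:
    """Take the text after the last comma and fold US states into "United States"."""
    region = place.rsplit(",", 1)[-1].strip()
    return "United States" if region in US_REGIONS else region


def top_countries(places, n=5):
    """Count events per country and return the n most frequent (country, count) pairs."""
    counts = Counter(country_from_place(p) for p in places)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [  # made-up place strings in USGS style
        "10 km SW of Tres Pinos, CA",
        "120 km S of Sand Point, Alaska",
        "45 km NNW of Tobelo, Indonesia",
        "5 km S of Indios, Puerto Rico",
        "Fox Islands, Aleutian Islands, Alaska",
    ]
    print(top_countries(sample))
```

Puerto Rico stays a separate entry here, matching how the table lists it apart from the United States.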
Most earthquakes have magnitudes between 2.5 (the dataset's lower cutoff) and 4.5.
January has the most earthquakes, followed by April and August.
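Both observations above come from simple tabulations: counts per calendar month and counts per magnitude band. A minimal sketch of each, assuming USGS-style ISO 8601 `time` strings and numeric `mag` values; the function names are hypothetical and the 0.5-unit bin width is an arbitrary choice, not necessarily the one used for the project's plots.

```python
import calendar
import math
from collections import Counter


def events_per_month(times):
    """Count events per calendar month from ISO 8601 timestamps such as
    "2022-01-15T08:30:00.000Z" (the month is read from characters 5-6)."""
    counts = Counter(int(t[5:7]) for t in times)
    return {calendar.month_abbr[m]: counts[m] for m in range(1, 13)}


def magnitude_bins(mags, width=0.5):
    """Count magnitudes per half-open bin [lo, lo + width), keyed by lo."""
    return Counter(math.floor(m / width) * width for m in mags)


if __name__ == "__main__":
    times = ["2022-01-03T04:10:00.000Z", "2022-01-21T19:02:00.000Z",
             "2022-04-09T11:45:00.000Z"]  # made-up timestamps
    monthly = events_per_month(times)
    print(sorted(monthly.items(), key=lambda kv: kv[1], reverse=True)[:3])
    print(sorted(magnitude_bins([2.6, 2.9, 3.1, 4.6]).items()))
```

A width of 0.5 is exactly representable in binary floating point, so bin edges such as 3.0 and 4.5 fall where expected.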
# A tibble: 1 × 8
   mtry min_n .metric .estimator  mean     n std_err .config
  <int> <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>
1     3    20 roc_auc hand_till  0.855    10 0.00355 Preprocessor1_Model06
Compared with the null model (roc_auc = 0.5) and naive Bayes (roc_auc = 0.7819), the random forest (roc_auc = 0.855) outperforms the other models.
Therefore, it is selected as the classification model used to predict earthquake damage in our app.
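The `.estimator` column in the tuning result reads `hand_till`, which refers to the multiclass AUC of Hand and Till (2001), yardstick's default for multiclass `roc_auc`. It averages a pairwise AUC over every pair of classes, so uninformative scores give 0.5, matching the null model above. The following is a plain-Python sketch of that definition for illustration only, not yardstick's implementation; it uses an O(n²) comparison loop for clarity, and the damage class labels in the demo are hypothetical.

```python
from itertools import combinations


def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_auc(y_true, probs, classes):
    """Hand & Till (2001) multiclass AUC: the mean of A(i, j) over all class pairs.

    y_true  -- list of true class labels
    probs   -- list of dicts mapping each class label to its predicted probability
    classes -- the c class labels; every class must occur in y_true
    """
    pairs = list(combinations(classes, 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for y, p in zip(y_true, probs) if y == i]
        in_j = [p for y, p in zip(y_true, probs) if y == j]
        # A(i|j): separate class i from class j using the predicted score for class i.
        a_i_given_j = pairwise_auc([p[i] for p in in_i], [p[i] for p in in_j])
        # A(j|i): the same comparison using the predicted score for class j.
        a_j_given_i = pairwise_auc([p[j] for p in in_j], [p[j] for p in in_i])
        total += (a_i_given_j + a_j_given_i) / 2
    # Dividing by the c(c-1)/2 pairs equals the 2 / (c(c-1)) factor in the paper.
    return total / len(pairs)


if __name__ == "__main__":
    classes = ["low", "medium", "high"]  # hypothetical damage classes
    y_true = ["low", "medium", "high", "low"]
    probs = [
        {"low": 0.7, "medium": 0.2, "high": 0.1},
        {"low": 0.2, "medium": 0.5, "high": 0.3},
        {"low": 0.1, "medium": 0.3, "high": 0.6},
        {"low": 0.4, "medium": 0.4, "high": 0.2},
    ]
    print(round(hand_till_auc(y_true, probs, classes), 3))
```

With two classes whose probabilities sum to 1, A(i|j) equals A(j|i), so the measure reduces to the ordinary binary AUC.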
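Overfitting: train_auc (0.9868) > test_auc (0.8592). The gap of about 0.13 suggests the random forest fits the training data more closely than it generalizes to unseen data.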
Random Forest Confusion Matrix 
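The confusion matrix figure is not included here. As a reminder of what it tabulates, here is a minimal sketch that counts (predicted, true) pairs into a matrix and reads accuracy off its diagonal. It assumes rows are predicted classes and columns are true classes; transpose if the figure uses the opposite layout. The class labels and predictions in the demo are hypothetical, not the project's results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are predicted classes, columns are true classes, both in `classes` order."""
    counts = Counter(zip(y_pred, y_true))
    return [[counts[(pred, true)] for true in classes] for pred in classes]


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal (predicted == true)."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    classes = ["low", "medium", "high"]  # hypothetical damage classes
    y_true = ["low", "low", "medium", "high", "high", "medium"]
    y_pred = ["low", "medium", "medium", "high", "medium", "medium"]
    matrix = confusion_matrix(y_true, y_pred, classes)
    for label, row in zip(classes, matrix):
        print(f"{label:>6}", row)
    print("accuracy:", round(accuracy(matrix), 3))
```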

Conclusions
Future Work
- More data
- Visualizations
- Modeling
  - Experiment with more models
  - Use more input variables