INFO 2950 Final Project

Appendix to report

Data cleaning

Our data came from CSV files downloaded from the World Bank. To clean the data so that it better suited our research questions, we first obtained the names of all CSV files in the “world_bank_data” directory and read them into a single data frame named world_bank_data using the read_csv() function from the readr package. To make the column names more meaningful, we used the colnames() function to rename the columns to ‘Country’, ‘Country Code’, ‘Observation’, and ‘Obs Code’, followed by the years 2000 to 2020.
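The import and renaming steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact project script: the toy files, their raw header names (“Country Name”, “Series Name”, and so on), and the choice to read every column as character before stacking are assumptions made so the example runs on its own.

```r
library(readr)
library(dplyr)

years <- 2000:2020

# Toy stand-in for the "world_bank_data" directory so the sketch runs anywhere.
# The files, raw header names, and values below are made up for illustration.
data_dir <- file.path(tempdir(), "world_bank_data")
dir.create(data_dir, showWarnings = FALSE)
raw_header <- c("Country Name", "Country Code", "Series Name", "Series Code",
                paste0(years, " [YR", years, "]"))
for (i in 1:2) {
  toy <- as.data.frame(matrix(
    c("Angola", "AGO", "Toy series", "TOY.CODE", rep("1.5", length(years))),
    nrow = 1
  ))
  colnames(toy) <- raw_header
  write_csv(toy, file.path(data_dir, paste0("file", i, ".csv")))
}

# Collect the names of all CSV files in the directory.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file with every column as character (an assumption that avoids
# type clashes between files), then stack the pieces into one data frame.
world_bank_data <- csv_files |>
  lapply(read_csv, col_types = cols(.default = col_character())) |>
  bind_rows()

# Rename the columns: four identifier columns followed by the years 2000 to 2020.
colnames(world_bank_data) <- c(
  "Country", "Country Code", "Observation", "Obs Code",
  as.character(years)
)
```

Replacing data_dir with the real directory path would apply the same steps to the actual files, provided every file has the same 25 columns in the same order.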

To keep our calculations and analyses accurate and meaningful, we converted the columns containing numeric data to a numeric data type using the mutate() and as.numeric() functions. When data is imported from a CSV file, some columns may be read in as character strings even if they contain numbers. For example, in our original dataset the year columns, which had names like “2018 [YR2018]”, were read in as <chr>, the character type. Performing calculations on character strings leads to errors or incorrect results.
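A minimal sketch of this conversion is shown below, assuming the columns have already been renamed so that each year column is a four-digit name. The toy rows, the series labels, and the “..” entry standing in for a non-numeric value are hypothetical. The sketch also assumes that world_bank_data holds the imported data and world_data the cleaned result.

```r
library(dplyr)
library(tibble)

# Toy data in the renamed layout. Year values are stored as character, as they
# would be after import; ".." is a hypothetical non-numeric placeholder.
world_bank_data <- tibble(
  Country        = c("Angola", "Angola"),
  `Country Code` = c("AGO", "AGO"),
  Observation    = c("Series A", "Series B"),
  `Obs Code`     = c("A.CODE", "B.CODE"),
  `2000`         = c("827", "8.7"),
  `2001`         = c("766", "..")
)

# Convert every year column (a name made of exactly four digits) to numeric.
# Entries that are not numbers become NA; as.numeric() normally warns about
# this, so the warning is silenced here on purpose.
world_data <- world_bank_data |>
  mutate(across(matches("^[0-9]{4}$"), ~ suppressWarnings(as.numeric(.x))))

# The year columns are now <dbl>, so arithmetic on them works.
mean(world_data$`2000`)
```

Selecting the year columns by name pattern leaves the four identifier columns as character, so they are not accidentally coerced to NA.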

By converting these columns with the as.numeric() function, we instruct R to treat their values as numbers, and the year columns now have the type <dbl>. This ensures that calculations on them behave as expected. The resulting cleaned and formatted data frame, world_data, has meaningful column names and numeric data types, and it is the analysis-ready dataset we used to reach our conclusions.

# A tibble: 6,370 × 25
   Country `Country Code` Observation     `Obs Code` `2000` `2001` `2002` `2003`
   <chr>   <chr>          <chr>           <chr>       <dbl>  <dbl>  <dbl>  <dbl>
 1 Angola  AGO            Maternal morta… SH.STA.MM… 827    766    690    628   
 2 Angola  AGO            Mortality rate… SP.DYN.AM… 360.   350.   348.   330.  
 3 Angola  AGO            Mortality rate… SP.DYN.AM… 469.   481.   455.   410.  
 4 Angola  AGO            Mortality rate… SP.DYN.IM… 122.   118.   115.   110.  
 5 Angola  AGO            Mortality rate… SP.DYN.IM… 112.   108.   105.   101.  
 6 Angola  AGO            Mortality rate… SP.DYN.IM… 132.   128.   124.   120.  
 7 Angola  AGO            Suicide mortal… SH.STA.SU…   8.7    8.6    8.6    8.8 
 8 Angola  AGO            Suicide mortal… SH.STA.SU…   3.3    3.2    3.4    3.6 
 9 Angola  AGO            Suicide mortal… SH.STA.SU…  14.2   14.1   13.8   13.9 
10 Angola  AGO            Military expen… MS.MIL.XP…   6.39   4.52   2.87   3.76
# ℹ 6,360 more rows
# ℹ 17 more variables: `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
#   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>,
#   `2018` <dbl>, `2019` <dbl>, `2020` <dbl>