library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2 ✔ purrr 1.0.0
✔ tibble 3.2.1 ✔ dplyr 1.1.2
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
✔ broom 1.0.2 ✔ rsample 1.1.1
✔ dials 1.1.0 ✔ tune 1.1.1
✔ infer 1.0.4 ✔ workflows 1.1.2
✔ modeldata 1.0.1 ✔ workflowsets 1.0.0
✔ parsnip 1.0.3 ✔ yardstick 1.1.0
✔ recipes 1.0.6
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Learn how to get started at https://www.tidymodels.org/start/
library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata
Attaching package: 'openintro'
The following object is masked from 'package:modeldata':
ames
library(skimr)
library(scales)
coffee <- read.csv("data/coffee.csv")
# Keep the location, year, species, and score columns, then give them shorter names
coffee <- coffee |>
  select(
    Location.Country, Location.Region, Year, Data.Type.Species,
    Data.Scores.Aroma, Data.Scores.Flavor, Data.Scores.Aftertaste,
    Data.Scores.Acidity, Data.Scores.Balance, Data.Scores.Sweetness,
    Data.Scores.Moisture, Data.Scores.Total
  ) |>
  rename(
    country = Location.Country,
    region = Location.Region,
    year = Year,
    species = Data.Type.Species,
    aroma_score = Data.Scores.Aroma,
    flavor_score = Data.Scores.Flavor,
    aftertaste_score = Data.Scores.Aftertaste,
    acidity_score = Data.Scores.Acidity,
    balance_score = Data.Scores.Balance,
    sweetness_score = Data.Scores.Sweetness,
    moisture_score = Data.Scores.Moisture,
    total_score = Data.Scores.Total
  )
# Drop rows whose aroma score is 0
coffee <- coffee |>
  filter(aroma_score != 0)
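The select(), rename(), and filter() calls above can also be collapsed into one pipeline, because select() accepts `new_name = old_name` pairs and so keeps and renames columns in a single step. The following is only a minimal sketch of that idea: the `raw` data frame is a made-up two-column stand-in for the coffee data, not the real file.

```r
library(dplyr)

# Hypothetical stand-in for the raw coffee data (values invented for illustration)
raw <- data.frame(
  Location.Country = c("Brazil", "Kenya", "Peru"),
  Data.Scores.Aroma = c(8.17, 0, 7.5)
)

cleaned <- raw |>
  # select() with new_name = old_name keeps and renames in one step
  select(country = Location.Country, aroma_score = Data.Scores.Aroma) |>
  # drop rows whose aroma score is recorded as 0
  filter(aroma_score != 0)

cleaned
```

The result has the two renamed columns and only the rows with a non-zero aroma score.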
# Possible research question: How do a coffee's aroma, acidity, and total scores depend on its country and region (continent) of origin?
glimpse(coffee)
Rows: 988
Columns: 12
$ country <chr> "United States", "Brazil", "Brazil", "Ethiopia", "Eth…
$ region <chr> "kona", "sul de minas - carmo de minas", "sul de mina…
$ year <int> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,…
$ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Arabica"…
$ aroma_score <dbl> 8.25, 8.17, 8.42, 7.67, 7.58, 7.50, 7.67, 7.25, 7.42,…
$ flavor_score <dbl> 8.42, 7.92, 7.92, 8.00, 7.83, 7.92, 7.58, 7.25, 7.42,…
$ aftertaste_score <dbl> 8.08, 7.92, 8.00, 7.83, 7.58, 7.42, 7.50, 7.25, 7.50,…
$ acidity_score <dbl> 7.75, 7.75, 7.75, 8.00, 8.00, 7.67, 7.58, 7.33, 7.92,…
$ balance_score <dbl> 7.83, 8.00, 8.00, 7.83, 7.50, 7.58, 7.58, 8.00, 7.58,…
$ sweetness_score <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.0…
$ moisture_score <dbl> 0.00, 0.08, 0.01, 0.00, 0.10, 0.01, 0.00, 0.10, 0.05,…
$ total_score <dbl> 86.25, 86.17, 86.17, 85.08, 83.83, 83.42, 83.08, 80.3…
# Lookup vectors that assign each country to a continent
Asia <- c("China", "India", "Indonesia", "Laos", "Philippines", "Taiwan",
          "Thailand", "Vietnam", "Papua New Guinea", "Myanmar")
North_America <- c("United States", "Mexico")
Central_America <- c("Haiti", "Honduras", "Nicaragua", "Panama", "Costa Rica",
                     "El Salvador", "Guatemala")
South_America <- c("Brazil", "Colombia", "Ecuador", "Peru")
# "Cote d?Ivoire" is left exactly as written, since it has to match the spelling stored in the data
Africa <- c("Burundi", "Cote d?Ivoire", "Ethiopia", "Kenya", "Malawi", "Rwanda",
            "Tanzania, United Republic Of", "Uganda", "Zambia")
coffee_country <- coffee |>
  group_by(country) |>
  mutate(continent = case_when(
    country %in% Asia ~ "Asia",
    country %in% Africa ~ "Africa",
    country %in% North_America ~ "North America",
    country %in% Central_America ~ "Central America",
    country %in% South_America ~ "South America"
  ))
coffee_country
# A tibble: 988 × 13
# Groups: country [32]
country region year species aroma_score flavor_score aftertaste_score
<chr> <chr> <int> <chr> <dbl> <dbl> <dbl>
1 United States kona 2010 Arabica 8.25 8.42 8.08
2 Brazil sul de… 2010 Arabica 8.17 7.92 7.92
3 Brazil sul de… 2010 Arabica 8.42 7.92 8
4 Ethiopia sidamo 2010 Arabica 7.67 8 7.83
5 Ethiopia sidamo 2010 Arabica 7.58 7.83 7.58
6 United States kona 2010 Arabica 7.5 7.92 7.42
7 Indonesia dolok … 2010 Arabica 7.67 7.58 7.5
8 Ethiopia kelem … 2010 Arabica 7.25 7.25 7.25
9 Ethiopia limu 2010 Arabica 7.42 7.42 7.5
10 Haiti marmel… 2010 Arabica 6.92 6.75 7.08
# ℹ 978 more rows
# ℹ 6 more variables: acidity_score <dbl>, balance_score <dbl>,
# sweetness_score <dbl>, moisture_score <dbl>, total_score <dbl>,
# continent <chr>
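Because case_when() returns NA for any row that matches none of its conditions, a country missing from the lookup vectors would silently end up without a continent. One way to check for this is sketched below. It is a minimal illustration only: the `toy` data frame, the shortened lookup vectors, and the country "Atlantis" are all hypothetical and are not taken from the coffee data.

```r
library(dplyr)

# Hypothetical toy data; "Atlantis" is deliberately absent from every lookup vector
toy <- data.frame(country = c("Brazil", "Kenya", "Atlantis"))

# Shortened, illustrative lookup vectors
South_America <- c("Brazil", "Colombia", "Ecuador", "Peru")
Africa <- c("Ethiopia", "Kenya", "Uganda")

toy_continent <- toy |>
  mutate(continent = case_when(
    country %in% South_America ~ "South America",
    country %in% Africa ~ "Africa"
  ))

# Countries that matched no condition and were therefore assigned NA
unmatched <- toy_continent |>
  filter(is.na(continent)) |>
  distinct(country) |>
  pull(country)

unmatched
```

Here `unmatched` contains only "Atlantis". Running the same filter on `coffee_country` and getting an empty result would indicate that every country was assigned a continent.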
# Note: this writes `coffee`, so the continent column added to coffee_country is not saved
write.csv(coffee, "data/new_coffee.csv")
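Finally, we saved the cleaned data as a new CSV file, new_coffee.csv. Compared with the original coffee.csv, it keeps only the country, region, year, species, and score columns, uses shorter column names, and drops rows with an aroma score of 0.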
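One detail matters when a file written with write.csv() is read back: row names are included by default, so the saved file gains an extra leading index column. The sketch below shows the difference that `row.names = FALSE` makes. It uses a made-up two-row data frame and temporary files, both hypothetical and chosen only for illustration.

```r
# Hypothetical two-row stand-in for the coffee data
scores <- data.frame(
  country = c("Brazil", "Kenya"),
  aroma_score = c(8.17, 7.67)
)

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(scores, with_rownames)
# row.names = FALSE leaves that column out
write.csv(scores, without_rownames, row.names = FALSE)

cols_default <- names(read.csv(with_rownames))
cols_clean <- names(read.csv(without_rownames))

cols_default  # "X" "country" "aroma_score"
cols_clean    # "country" "aroma_score"
```

Whether the extra `X` column is acceptable depends on how the file will be used later; passing `row.names = FALSE` is a common way to avoid it.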