An Exploration of Arabica Coffee and its Attributes

Preregistration of analyses

Analysis #1

We want to investigate the relationship between flavor scores and aroma scores. Specifically, we are interested in how well the flavor score explains or predicts the aroma score of Arabica beans. Based on our initial exploratory data analysis, we expect aroma scores to increase linearly as flavor scores increase. We will therefore use the variables `data_scores_flavor` and `data_scores_aroma` to fit a linear model predicting aroma from flavor. We can then use the model to estimate the aroma score of coffee beans whose flavor score is 8. Since the slope we estimate from our sample may not exactly equal the true slope of the relationship between flavor and aroma scores, we will use bootstrapping to construct a confidence interval that quantifies the uncertainty around the slope estimate. We can also fit a model predicting the log of the aroma score from the flavor score.
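The plan above could be sketched roughly as follows. This is a minimal illustration rather than our final analysis code: it uses NumPy, the helper names `fit_line` and `bootstrap_slope_ci` are hypothetical, and the `flavor` and `aroma` arrays are synthetic placeholders standing in for `data_scores_flavor` and `data_scores_aroma`. The 95% level, the percentile method, and the 1,000 resamples are also assumptions.

```python
import numpy as np


def fit_line(flavor, aroma):
    """Least-squares fit of aroma = intercept + slope * flavor."""
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(flavor, aroma, deg=1)
    return slope, intercept


def bootstrap_slope_ci(flavor, aroma, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the slope (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(flavor)
    slopes = np.empty(reps)
    for i in range(reps):
        # Resample (flavor, aroma) pairs with replacement, then refit.
        idx = rng.integers(0, n, size=n)
        slopes[i], _ = fit_line(flavor[idx], aroma[idx])
    alpha = 1 - level
    lower, upper = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic placeholder data, NOT the coffee dataset.
rng = np.random.default_rng(1)
flavor = rng.uniform(6.5, 8.5, size=200)
aroma = 1.5 + 0.8 * flavor + rng.normal(0, 0.2, size=200)

slope, intercept = fit_line(flavor, aroma)
predicted_aroma_at_8 = intercept + slope * 8  # estimate for a flavor score of 8
ci_lower, ci_upper = bootstrap_slope_ci(flavor, aroma)

# Alternative model: log of the aroma score predicted from the flavor score.
log_slope, log_intercept = fit_line(flavor, np.log(aroma))
```

With the real data, `flavor` and `aroma` would be replaced by the two score columns; the numbers produced here only reflect the synthetic inputs.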

Analysis #2

We wish to test how the mean acidity score in the African region compares to the mean acidity score in the Asian region. In our exploratory data analysis, we found that the mean acidity score is 7.71 for the African region and 7.54 for the Asian region. This means that the sample mean acidity score for the African region is greater by 0.17.

Null Hypothesis: There is no difference in the true mean acidity score between the African region (\(\mu_1\)) and the Asian region (\(\mu_2\)).

\[ H_0: \mu_1 - \mu_2 = 0 \]

Alternative Hypothesis: There is a difference in the mean acidity score for the African and Asian regions.

\[ H_A: \mu_1 - \mu_2 \ne 0 \]

We can use simulation-based methods to conduct the hypothesis test above. We will first generate a null distribution of the difference in mean acidity scores, and then calculate the p-value from it. If the p-value is less than the significance level of 0.05, we will reject the null hypothesis and conclude that there is a statistically significant difference in the mean acidity scores of the two regions. However, if the p-value is greater than or equal to the significance level, we will fail to reject the null hypothesis and conclude that there is not enough evidence to support such a difference.
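One way to carry out the simulation-based test described above is a permutation test, sketched below under stated assumptions. The helper name `permutation_test`, the choice of shuffling region labels to build the null distribution, the 5,000 repetitions, and the synthetic `africa` and `asia` arrays are all illustrative and not taken from our data.

```python
import numpy as np


def permutation_test(group_a, group_b, reps=5000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    null_diffs = np.empty(reps)
    for i in range(reps):
        # Shuffling the pooled scores breaks any link between region and score,
        # which is what the null hypothesis assumes.
        shuffled = rng.permutation(pooled)
        null_diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    # Share of simulated differences at least as extreme as the observed one.
    p_value = np.mean(np.abs(null_diffs) >= abs(observed))
    return observed, p_value


# Synthetic placeholder acidity scores, NOT the coffee dataset.
rng = np.random.default_rng(2)
africa = rng.normal(7.7, 0.3, size=100)
asia = rng.normal(7.5, 0.3, size=100)

observed_diff, p_value = permutation_test(africa, asia)
reject_null = p_value < 0.05  # decision at the 0.05 significance level
```

The two-sided p-value matches the alternative hypothesis of a difference in either direction; a one-sided version would compare the signed differences instead of their absolute values.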