Anemia in Women
Preregistration of analyses
Analysis #1
Our first analysis addresses our research question of whether women from developing countries have higher rates of anemia than women from developed countries. The variables we will use to analyze this relationship are country classification (`country_classification`, i.e. whether the country is developing or developed) and mean hemoglobin levels (`mean`).
Given that one variable is categorical (`country_classification`) and the other is numerical (`mean`), we intend to visualize the data using a bar plot that displays all relevant countries, in order to compare mean hemoglobin levels across countries. Additional considerations for the visualization may include ordering countries by mean hemoglobin level, distinguishing countries visually by classification, or faceting countries by classification and distinguishing individual countries visually.
- Additional tidying of data may include summarizing different age/pregnancy status observations (descriptive statistics, which we will present in a table) and the use of `select()`, `filter()`, and `na.omit()` so that our data contains clean and useful values.
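
As a rough illustration of this tidying step, the sketch below applies `select()`, `filter()`, and `na.omit()` to a small made-up data frame and then builds a descriptive-statistics table by classification. It assumes the dplyr package is installed; the data values and the `country` and `notes` columns are hypothetical placeholders, not the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for the anemia dataset (values are made up).
anemia <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  country_classification = c("developing", "developing", "developed",
                             "developed", "developing", "developing"),
  pregnancy = rep(c("pregnant", "nonpregnant"), times = 3),
  mean = c(118, 124, 131, NA, 115, 121),
  notes = "placeholder"  # an extra column we do not need
)

# Keep only the relevant columns and classifications, then drop missing rows.
anemia_clean <- anemia %>%
  select(country, country_classification, pregnancy, mean) %>%
  filter(country_classification %in% c("developing", "developed")) %>%
  na.omit()

# Descriptive statistics per classification, ready to present as a table.
# Note: mean(mean) calls the function mean() on the column named `mean`.
summary_table <- anemia_clean %>%
  group_by(country_classification) %>%
  summarise(
    n = n(),
    mean_hb = mean(mean),
    min_hb = min(mean),
    max_hb = max(mean)
  )

print(summary_table)
```

Which columns and filter conditions are actually needed will depend on the structure of the real data.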
We will also use a hypothesis test to assess whether the difference we observe in the descriptive statistics is statistically significant.
Null Hypothesis: There is no difference between the mean hemoglobin levels of women in developing countries and those of women in developed countries (mean hemoglobin levels of women in developing countries = mean hemoglobin levels of women in developed countries)
\[ H_0: m_{\text{developing}} - m_{\text{developed}} = 0 \]
Alternative Hypothesis: There is a difference between the mean hemoglobin levels of women in developing countries and those of women in developed countries (mean hemoglobin levels of women in developing countries ≠ mean hemoglobin levels of women in developed countries)
\[ H_A: m_{\text{developing}} - m_{\text{developed}} \neq 0 \]
Or, as a one-sided alternative: mean hemoglobin levels of women in developing countries are lower (<) than mean hemoglobin levels of women in developed countries
\[ H_{or}: m_{\text{developing}} < m_{\text{developed}} \]
We will use a significance level of 0.05, which corresponds to a 95% confidence level.
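In terms of analysis, we will be using an ANOVA test because our explanatory variable is categorical (`country_classification`, with each group made up of different countries) and we want to determine whether mean hemoglobin levels differ between groups. With only two groups, a one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances.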
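
A minimal sketch of how this test could be run in base R is shown below. The data are simulated purely for illustration (the group sizes, means, and standard deviations are assumptions, not study values), and the column names follow the variables named above.

```r
set.seed(1)

# Simulated, illustrative data only: 20 countries per classification.
country_means <- data.frame(
  country_classification = factor(rep(c("developed", "developing"), each = 20)),
  mean = c(rnorm(20, mean = 130, sd = 5), rnorm(20, mean = 122, sd = 5))
)

# One-way ANOVA of mean hemoglobin on country classification.
anova_fit <- aov(mean ~ country_classification, data = country_means)
anova_table <- summary(anova_fit)[[1]]
f_stat <- anova_table[["F value"]][1]
p_anova <- anova_table[["Pr(>F)"]][1]

# With two groups, the equal-variance two-sample t-test is equivalent:
# its squared t statistic equals the ANOVA F statistic.
t_fit <- t.test(mean ~ country_classification, data = country_means,
                var.equal = TRUE)
t_stat <- unname(t_fit$statistic)

# Decision at the preregistered significance level.
alpha <- 0.05
reject_null <- p_anova < alpha

print(anova_table)
print(reject_null)
```

The ANOVA tests the two-sided alternative; the one-sided alternative would need a directional t-test instead.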
Data and results will be reported and any specific limitations will be mentioned.
Analysis #2
We also want to investigate how pregnancy status affects anemia rates in women. We will be looking at the relationship between pregnancy status and mean hemoglobin levels, using the variables `mean` and `pregnancy` within the dataset.
Given that one variable is a binary categorical variable (`pregnancy`) and the other is a continuous variable (`mean`), one visualization that we will implement is the boxplot, which we will present side by side with a graph of the means and the 95% confidence interval for the mean. This helps the audience to better view the skewness of the data. Furthermore, we will decide on ranges and separate the individual means into those ranges. Afterwards, we could use another visualization, such as a bar chart, to show the distribution of the data for both the `pregnant` and `nonpregnant` status, and either facet wrap it or place the plots side by side so they are easy to compare.
Additional tidying of data may include summarizing pregnancy status and the mean values. In other words, we will be exploring the use of descriptive statistics (mean, maximum, minimum, etc.) to help summarize our data, which we will later present in a table. We will also further clean and tidy our data through the use of `select()`, `filter()`, and `na.omit()` so that our data contains clean and useful values.
We will also use a hypothesis test to assess whether the difference we observe in the descriptive statistics is statistically significant.
Null Hypothesis: There is no difference in mean hemoglobin levels between pregnant women and non-pregnant women.
\[ H_0: m_{\text{pregnant}} - m_{\text{non-pregnant}} = 0 \]
Alternative Hypothesis: There is a difference in mean hemoglobin levels between pregnant women and non-pregnant women.
\[ H_A: m_{\text{pregnant}} - m_{\text{non-pregnant}} \neq 0 \]
We will use a significance level of 0.05, which corresponds to a 95% confidence level.
In terms of analysis, we will be using a linear regression model, with `pregnancy` as the explanatory variable, since the dependent variable (`mean`) is continuous.
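
The sketch below shows one way such a model could be fit in base R. The data are simulated for illustration only (the sample sizes and distribution parameters are assumptions), and the `pregnant` and `nonpregnant` labels follow the statuses named above. With a binary predictor, the fitted slope is simply the difference between the two group means, so testing whether the slope is zero matches the null hypothesis stated above.

```r
set.seed(2)

# Simulated, illustrative data only: 25 observations per pregnancy status.
hb <- data.frame(
  pregnancy = factor(rep(c("nonpregnant", "pregnant"), each = 25)),
  mean = c(rnorm(25, mean = 128, sd = 6), rnorm(25, mean = 118, sd = 6))
)

# Regress mean hemoglobin on pregnancy status.
# "nonpregnant" is the reference level (alphabetical factor ordering).
fit <- lm(mean ~ pregnancy, data = hb)

# The slope is the estimated difference: pregnant minus nonpregnant.
slope <- coef(fit)[["pregnancypregnant"]]
diff_means <- mean(hb$mean[hb$pregnancy == "pregnant"]) -
  mean(hb$mean[hb$pregnancy == "nonpregnant"])

# 95% confidence interval and two-sided p-value for the slope.
ci <- confint(fit, "pregnancypregnant", level = 0.95)
p_value <- summary(fit)$coefficients["pregnancypregnant", "Pr(>|t|)"]

# Decision at the preregistered significance level.
alpha <- 0.05
reject_null <- p_value < alpha

print(summary(fit)$coefficients)
print(ci)
```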
Data and results will be reported and any specific limitations will be mentioned.