Project Brilliant Togepi
Report
Introduction
We examine a data set of divorces filed between 2000 and 2015. The data set is provided by the Mexican government and covers divorces in the city of Xalapa, Mexico. It contains many variables that characterize each divorce and the two partners involved. We plan to explore how the number of divorces relates to the employment status of the two partners, the difference in education between them, and the age gap between them.
Data description
The data set has 41 columns and 4,900 rows, with each row representing a divorce filed in Xalapa, Mexico between 2000 and 2015. Some columns characterize the divorce itself, such as when it was registered and the reason given for it. Others characterize the partners, including their ethnicity, age, and educational background. The columns we are interested in are each partner's age, income, and education level.
This data set was created by the Mexican marriage bureau, which keeps records of all divorce cases. The Mexican government does not explicitly state why it created and funded the data collection, but we can infer that it is interested in potential trends.
The bureau created the data by scanning the divorce documents and recording the attributes of interest. The information listed is therefore only what was provided by those getting the divorce, so there are many missing values, and some values may not be accurate. Moreover, we only have data from divorces that went through the entire legal divorce process in Mexico, so we have no data on any undocumented divorces that may have occurred at the time.
Once the collection was done, the Mexican government compiled the data into a spreadsheet, which we are using now.
It is not noted whether or not the people whose information is in the data set are aware of its usage.
Data analysis
Education Analysis
Income Analysis
Age Gap Analysis
Evaluation of significance
Education Evaluation of Significance
Mean difference: 0.4815653
p-value: 0
Income Evaluation of Significance
Mean difference: 1707.504 pesos
p-value: 0.12
Age Gap Evaluation of Significance
Mean difference: 4.065904 years
p-value: 1
Interpretation and conclusions
Data Analysis for Education Level and Divorce Rates:
\[ H_0: \mu = 0 \]
\[ H_A: \mu > 0 \]
The hypotheses concern the mean difference in education level between divorcing partners. The estimated mean gap is about 0.4816, and the reported p-value is 0, which is below any conventional significance level. We therefore reject the null hypothesis: the data suggest that, on average, there is a difference in education level between divorcing partners. The test concerns the mean gap, so it does not by itself show that the majority of divorces involve a gap.
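To make the logic of a one-sided test on a mean concrete, here is a minimal sketch in Python using only the standard library. The report's own output appears to come from R, so this is an illustration rather than the code that produced the numbers above. The function name `bootstrap_p_value_greater`, the shifted-bootstrap approach, and the toy values are all assumptions introduced here.

```python
import random
import statistics


def bootstrap_p_value_greater(sample, null_mean, reps=2000, seed=0):
    """One-sided p-value for H_A: mean > null_mean, via a shifted bootstrap."""
    rng = random.Random(seed)
    observed = statistics.fmean(sample)
    # Shift the sample so that its mean equals the null value while
    # keeping its spread; this simulates a world where H_0 is true.
    shifted = [x - observed + null_mean for x in sample]
    n = len(shifted)
    at_least_as_large = 0
    for _ in range(reps):
        resample = rng.choices(shifted, k=n)  # draw with replacement
        if statistics.fmean(resample) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / reps


# Hypothetical education-level gaps, NOT the Xalapa data.
toy_gaps = [0, 1, 0, 2, 1, 0, 0, 1, 1, 0, 2, 0, 1, 0, 1, 0, 0, 1, 2, 0] * 10

observed_mean, p_value = bootstrap_p_value_greater(toy_gaps, null_mean=0.0)
print(f"observed mean gap = {observed_mean:.3f}, p-value = {p_value}")
```

In a simulation of this kind, a p-value of exactly 0 only means that none of the `reps` resamples reached the observed mean, so it is better read as "smaller than 1 / reps" than as a literal zero.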
Data Analysis for Incomes and Divorce Rates:
\[ H_0: \mu = 0 \]
\[ H_A: \mu > 0 \]
In our draft, we obtained a very low p-value. We noticed that this was largely due to a few extreme outliers that pulled the sample mean to the right. For this analysis, we therefore excluded the data points with an income difference of more than 10,000 pesos. With this restriction, the p-value is 0.12 and the mean difference is about 1,707.5 pesos. Since the p-value is greater than 0.05, we fail to reject the null hypothesis: there is no significant evidence that the mean difference in income between divorced partners is greater than 0.
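The effect of a few extreme values on a mean can be seen in a small Python sketch. The numbers are hypothetical and are not the Xalapa data. Applying the cutoff to the absolute difference is also an assumption, since the text does not say whether the 10,000 peso threshold was signed.

```python
import statistics

CUTOFF_PESOS = 10_000  # threshold named in the text

# Hypothetical income differences in pesos, NOT the Xalapa data.
income_diffs = [500, 1200, -300, 2500, 800, 150_000, 3000, -1000, 90_000, 1800]

# Assumption: the cutoff applies to the absolute size of the difference.
kept = [d for d in income_diffs if abs(d) <= CUTOFF_PESOS]

mean_all = statistics.fmean(income_diffs)
mean_kept = statistics.fmean(kept)

print(f"rows kept: {len(kept)} of {len(income_diffs)}")
print(f"mean with outliers:    {mean_all:.1f} pesos")
print(f"mean without outliers: {mean_kept:.1f} pesos")
```

Because a fixed cutoff changes which divorces the conclusion applies to, it is worth reporting how many rows were dropped alongside the new mean.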
Data Analysis for Age Gaps and Divorce Rates:
\[ H_0: \mu = 5 \]
\[ H_A: \mu > 5 \]
The average difference in age between partners is 4.066 years, which is below the hypothesized value of 5, and the calculated p-value is 1. The p-value is the probability of observing a sample mean at least this large if the true mean were 5; because the observed mean falls below 5, that probability is essentially 1. We therefore fail to reject the null hypothesis: there is not enough evidence to suggest that the mean difference in age between partners is greater than 5 years.
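Assuming the p-value is the right-tail probability implied by the alternative hypothesis above, and writing \bar{X} for the sample mean age gap (a symbol introduced here for illustration), the quantity being reported can be written as:

\[ p = P(\bar{X} \ge 4.066 \mid \mu = 5) \]

Since 4.066 lies to the left of the null value 5, the right tail starting at 4.066 covers most of the null distribution, which is why the p-value comes out near 1 rather than near 0.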
Limitations
One limitation of the data is that it does not include marriage counts. We cannot compare divorces to lasting marriages, so when divorces are concentrated under certain circumstances, we cannot tell whether that is because many marriages occur under those circumstances with a proportionally normal divorce rate, or because the divorce rate is genuinely higher there. It is therefore difficult to establish a correlation between those factors and divorce rates.
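A small worked example shows why divorce counts alone cannot separate these two explanations. The counts below are invented for illustration and do not come from the data set; the group labels are also hypothetical.

```python
# Hypothetical counts, NOT from the Xalapa data.
marriages = {"same education": 8000, "different education": 2000}
divorces = {"same education": 400, "different education": 100}

total_divorces = sum(divorces.values())

# Share of all divorces that fall in each group (what our data can show).
shares = {group: divorces[group] / total_divorces for group in divorces}

# Divorce rate within each group (needs marriage counts, which we lack).
rates = {group: divorces[group] / marriages[group] for group in divorces}

for group in divorces:
    print(f"{group}: share of divorces = {shares[group]:.0%}, "
          f"divorce rate = {rates[group]:.1%}")
```

In this made-up case one group accounts for four times as many divorces as the other, yet both groups divorce at the same rate, because the first group simply contains four times as many marriages.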
Another limitation is that the data is limited to Xalapa, Mexico. This makes it difficult to generalize our findings to other areas, even to other cities within Mexico. Because of this, our conclusions are mostly restricted to Xalapa.
Lastly, the data set only includes official divorces, so separations are not in the data. This may make divorce rates appear lower than they actually are for certain factors.
Acknowledgments
The data set was downloaded from Kaggle.com, where it was posted by a user named Anton, who obtained it from the Mexican government. He translated the column headers, making the data set much more usable for us. Otherwise, the analyses are our own work.