How does the classification of a U.S. college affect the starting and mid-career earnings of graduates from that college? Does this vary across different regions of the U.S.?
For salaries_type_summary and salaries_region_summary, we first converted every non-N/A median salary to a numeric value. Next, we grouped the schools by school type or by U.S. region, respectively, and created a summary table with one row per school type or region and one column per career stage, each holding the mean of the median salaries at that stage. We then pivoted each summary table: in the school type summary, each row gives the mean of the median salaries for one combination of school type and career stage, and in the region summary, each row gives the same quantity for one combination of region and career stage.
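The summary steps described above can be sketched roughly as follows. This is a minimal illustration on made-up data, not our exact cleaning script: the toy tibble, its column names, the to_dollars helper, and the choice to drop missing values with na.rm = TRUE are all assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for salaries_type; names and values are illustrative only
salaries_type_toy <- tibble(
  school_name = c("School A", "School B", "School C", "School D"),
  school_type = c("Engineering", "Engineering", "State", "State"),
  starting_median_salary = c("$60,000.00", "$70,000.00", "$40,000.00", "N/A"),
  mid_career_median_salary = c("$100,000.00", "$120,000.00", "$80,000.00", "$90,000.00")
)

# Hypothetical helper: "$60,000.00" becomes 60000 and "N/A" becomes NA
to_dollars <- function(x) as.numeric(gsub("[$,]", "", na_if(x, "N/A")))

salaries_type_summary_toy <- salaries_type_toy %>%
  # Convert the salary strings to numeric values
  mutate(across(ends_with("median_salary"), to_dollars)) %>%
  # One row per school type, one column per career stage
  group_by(school_type) %>%
  summarise(
    starting = mean(starting_median_salary, na.rm = TRUE),
    mid_career = mean(mid_career_median_salary, na.rm = TRUE)
  ) %>%
  # Pivot so each row is one school type at one career stage
  pivot_longer(
    cols = c(starting, mid_career),
    names_to = "career_stage",
    values_to = "mean_salary"
  )

print(salaries_type_summary_toy)
```

The region summary would follow the same pattern with the region column in place of school_type.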
Finally, we created a merged dataset called salaries_joined using a full_join of salaries_type and salaries_region. It therefore has a row for every school that appears in either dataset, with NA values in the columns for which a school has no data.
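As a small sketch of how a full join behaves, the example below joins two made-up tables. The toy tibbles and the use of school_name as the join key are assumptions for illustration; the real datasets have more columns.

```r
library(dplyr)

# Toy stand-ins for salaries_type and salaries_region (illustrative values only)
type_toy <- tibble(
  school_name = c("School A", "School B"),
  school_type = c("Engineering", "State")
)
region_toy <- tibble(
  school_name = c("School B", "School C"),
  school_region = c("Southern", "Western")
)

# full_join keeps every school from both tables and fills the gaps with NA
salaries_joined_toy <- full_join(type_toy, region_toy, by = "school_name")

print(salaries_joined_toy)
```

Here School A has no region and School C has no school type, so those cells are NA in the joined table.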
Each row in the dataset represents a different year, from 1970 to 2011, and each column represents a college major. For each year, every column shows the percentage of women enrolled in a particular major.
Attributes:
year: year of observation
major: college major of observation
pct_female: percentage of students that are female in the observation
The dataset was created by the Department of Education Statistics, which annually releases a dataset containing the percentage of bachelor’s degrees granted to women across a variety of degree categories.
Funding came from the Department of Education.
We are unsure which processes may have affected data observation and recording at this time.
The data was cleaned by Kaggle users and further cleaned by our team.
Each row represents a university and contains its classification and summary statistics (median and percentiles) of its graduates' earnings.
Attributes
School name
School type (classification, i.e., Party, Engineering, Liberal Arts, Ivy League, or State)
Starting Median Salary
Mid-Career Median Salary
Mid-Career 10th Percentile Salary
Mid-Career 25th Percentile Salary
Mid-Career 75th Percentile Salary
Mid-Career 90th Percentile Salary
The dataset was created to track general earnings for early- and mid-career professionals based on their undergraduate college and the classification of that college.
Data was obtained by the Wall Street Journal based on data from Payscale, Inc.
We are unsure which processes may have affected data observation and recording at this time.
No users preprocessed the data; we obtained it directly from Kaggle.
Each row represents a university and contains its geographical region and summary statistics (median and percentiles) of its graduates' earnings.
Attributes
School name
School region (geographical region within United States)
Starting Median Salary
Mid-Career Median Salary
Mid-Career 10th Percentile Salary
Mid-Career 25th Percentile Salary
Mid-Career 75th Percentile Salary
Mid-Career 90th Percentile Salary
The dataset was created to track general earnings for early- and mid-career professionals based on their undergraduate college and the geographical region of that college.
Data was obtained by the Wall Street Journal based on data from Payscale, Inc.
We are unsure which processes may have affected data observation and recording at this time.
No users preprocessed the data; we obtained it directly from Kaggle.
Data limitations
The schools in the salaries_type and salaries_region datasets don’t match up exactly, so the full join of the two contains NA values for schools that appear in only one of them.
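One way to see how large this mismatch is would be to list the schools that appear in only one dataset. The sketch below does this with anti_join on made-up tables; the tibbles and the school_name key are assumptions for illustration, and we have not yet run this check on the real data.

```r
library(dplyr)

# Toy stand-ins for the two salary datasets (illustrative values only)
by_type <- tibble(
  school_name = c("School A", "School B"),
  school_type = c("Engineering", "State")
)
by_region <- tibble(
  school_name = c("School B", "School C"),
  school_region = c("Southern", "Western")
)

# Schools with a type but no region, and schools with a region but no type
only_in_type <- anti_join(by_type, by_region, by = "school_name")
only_in_region <- anti_join(by_region, by_type, by = "school_name")

print(only_in_type$school_name)
print(only_in_region$school_name)
```

The row counts of these two tables give the number of schools that will have NA values after the full join.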
Exploratory data analysis
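library(ggplot2)

# Percentage of women in each major over time
ggplot(data = college_women, mapping = aes(x = year, y = pct_female, color = major)) +
  geom_line()

# Mean of median salaries by school type and career stage
ggplot(data = salaries_type_summary, mapping = aes(x = school_type, y = mean_salary, color = career_stage)) +
  geom_point()

# Mean of median salaries by region and career stage
ggplot(data = salaries_region_summary, mapping = aes(x = school_region, y = mean_salary, color = career_stage)) +
  geom_point()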
Questions for reviewers
Is our research question refined and sophisticated enough to yield good-quality results?