Where does it pay to attend college?

How school region and school type impact salaries of graduates

Author

Team Impressive Hitmontop
Miles Ostroff, Mateo Rodriguez, Tina Li, Andrea Aranda, Pablo Herrera

Published

May 5, 2023

Topic and motivation

Project motivation: Investigating the earning levels of graduates from different college types and regions.

  • Salaries at career stages: starting salary, mid-career salary.

  • Regions: California, Midwestern, Northeastern, Southern, Western.

  • College Types: Engineering, Ivy League, Liberal Arts, Party, State.

Introducing the data

Data source: Wall Street Journal (data collected by Payscale, Inc.)

Dataset 1: Salary information for colleges with college region

Rows: 320
Columns: 8
$ School.Name                       <chr> "Stanford University", "California I…
$ Region                            <chr> "California", "California", "Califor…
$ Starting.Median.Salary            <chr> "$70,400.00", "$75,500.00", "$71,800…
$ Mid.Career.Median.Salary          <chr> "$129,000.00", "$123,000.00", "$122,…
$ Mid.Career.10th.Percentile.Salary <chr> "$68,400.00", "N/A", "N/A", "$59,500…
$ Mid.Career.25th.Percentile.Salary <chr> "$93,100.00", "$104,000.00", "$96,00…
$ Mid.Career.75th.Percentile.Salary <chr> "$184,000.00", "$161,000.00", "$180,…
$ Mid.Career.90th.Percentile.Salary <chr> "$257,000.00", "N/A", "N/A", "$201,0…

Dataset 2: Salary information for colleges with college type

Rows: 269
Columns: 8
$ School.Name                       <chr> "Massachusetts Institute of Technolo…
$ School.Type                       <chr> "Engineering", "Engineering", "Engin…
$ Starting.Median.Salary            <chr> "$72,200.00", "$75,500.00", "$71,800…
$ Mid.Career.Median.Salary          <chr> "$126,000.00", "$123,000.00", "$122,…
$ Mid.Career.10th.Percentile.Salary <chr> "$76,800.00", "N/A", "N/A", "$66,800…
$ Mid.Career.25th.Percentile.Salary <chr> "$99,200.00", "$104,000.00", "$96,00…
$ Mid.Career.75th.Percentile.Salary <chr> "$168,000.00", "$161,000.00", "$180,…
$ Mid.Career.90th.Percentile.Salary <chr> "$220,000.00", "N/A", "N/A", "$190,0…
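
Both glimpses show every salary column stored as text (`<chr>`), with values such as "$70,400.00" and "N/A", so the columns need converting to numbers before any summary or test. The following is a minimal sketch of one way to do that, assuming readr's `parse_number()` and dplyr's `across()`; the data frame name `salaries` is hypothetical, and the four values are copied from the Dataset 1 glimpse.

```r
library(dplyr)
library(readr)

# `salaries` is a hypothetical name; the values are the first two rows
# shown in the Dataset 1 glimpse.
salaries <- tibble(
  Starting.Median.Salary            = c("$70,400.00", "$75,500.00"),
  Mid.Career.10th.Percentile.Salary = c("$68,400.00", "N/A")
)

# Strip "$" and "," from every column whose name ends in "Salary",
# and treat the literal string "N/A" as a missing value.
clean <- salaries |>
  mutate(across(ends_with("Salary"), function(x) parse_number(x, na = "N/A")))
```

After this step the salary columns are numeric, and missing percentiles are `NA` rather than the string "N/A".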

Highlights from EDA

The two datasets were joined on School.Name and the six salary columns they share: Starting.Median.Salary, Mid.Career.Median.Salary, and the mid-career 10th, 25th, 75th, and 90th percentile salaries.
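
Combining the region and type datasets on their shared columns could look like the sketch below. It assumes dplyr 1.1 or later for `join_by()`. The kind of join used in the project is not stated, so `inner_join()` is an assumption here, and the school names and salaries are made-up placeholders rather than rows from the real data.

```r
library(dplyr)

# Placeholder rows for illustration only; not the real data.
by_region <- tibble(
  School.Name            = c("School A", "School B", "School C"),
  Region                 = c("California", "Southern", "Western"),
  Starting.Median.Salary = c("$50,000.00", "$45,000.00", "$48,000.00")
)
by_type <- tibble(
  School.Name            = c("School A", "School B", "School D"),
  School.Type            = c("Engineering", "State", "Party"),
  Starting.Median.Salary = c("$50,000.00", "$45,000.00", "$47,000.00")
)

# Name the keys explicitly instead of relying on automatic key detection.
# inner_join() (an assumption) keeps only schools present in both datasets.
joined <- inner_join(
  by_region, by_type,
  by = join_by(School.Name, Starting.Median.Salary)
)
```

With an inner join, schools that appear in only one dataset (School C and School D here) drop out, which matters when the two sources cover different sets of schools.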

Analysis: ANOVA significance tests for school regions

  • At \(\alpha = 0.05\), we can reject the null hypothesis (there is no association between median starting salary and region). There is enough statistical evidence to conclude that there is an association between median starting salary and school region.

  • At \(\alpha = 0.05\), we can reject the null hypothesis (there is no association between median mid-career salary and region). There is enough statistical evidence to conclude that there is an association between median mid-career salary and school region.
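
A one-way ANOVA of this kind can be sketched in base R as follows. The data are simulated purely for illustration (the group means, spread, and sample sizes are invented), and the names `toy`, `region`, and `starting_salary` are hypothetical; this is not the project's actual code or data.

```r
# Simulated salaries for three regions; illustration only.
set.seed(1)
toy <- data.frame(
  region = rep(c("California", "Midwestern", "Northeastern"), each = 20),
  starting_salary = c(
    rnorm(20, mean = 51000, sd = 5000),
    rnorm(20, mean = 44000, sd = 5000),
    rnorm(20, mean = 48000, sd = 5000)
  )
)

# One-way ANOVA: does mean starting salary differ across regions?
fit <- aov(starting_salary ~ region, data = toy)

# The first row of the summary table holds the test for the region term.
anova_table <- summary(fit)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]

# Decision rule at alpha = 0.05.
reject_null <- p_value < 0.05
```

ANOVA compares group means, so here the school-level median salaries play the role of the response variable observed for each school.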

Analysis: hypothesis tests for school types

  • At \(\alpha = 0.05\), we can reject the null hypothesis (there is no difference between the median starting salaries of Ivy League and State schools). We draw the same conclusion for the median mid-career salary of Ivy League and State schools.

  • At \(\alpha = 0.05\), we fail to reject the null hypothesis (there is no difference between the median starting salaries of graduates from Ivy League and Engineering schools). However, for median mid-career salaries we reject the null hypothesis (there is no difference between the median mid-career salaries of graduates from Ivy League and Engineering schools).
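
The test procedure behind these comparisons is not stated. One common choice for comparing medians between two groups is a permutation test, sketched below in base R under that assumption. The data are simulated for illustration only, and the names `toy`, `median_diff`, and `null_diffs` are hypothetical.

```r
# Simulated salaries for two school types; illustration only.
set.seed(2023)
toy <- data.frame(
  type = rep(c("Ivy League", "State"), times = c(8, 40)),
  starting_salary = c(
    rnorm(8,  mean = 60000, sd = 4000),
    rnorm(40, mean = 44000, sd = 5000)
  )
)

# Test statistic: difference in group medians (Ivy League minus State).
median_diff <- function(salary, type) {
  median(salary[type == "Ivy League"]) - median(salary[type == "State"])
}

observed <- median_diff(toy$starting_salary, toy$type)

# Build the null distribution by shuffling the type labels, which breaks
# any real link between school type and salary.
null_diffs <- replicate(
  2000,
  median_diff(toy$starting_salary, sample(toy$type))
)

# Two-sided p-value: share of shuffled differences at least as extreme
# as the observed one.
p_value <- mean(abs(null_diffs) >= abs(observed))
reject_null <- p_value < 0.05
```

The same function works for the Ivy League versus Engineering comparison by changing the two group labels.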

Conclusions

  • School region is associated with both median starting and median mid-career salaries.

  • Median starting and mid-career salaries of Ivy League graduates are significantly higher than those of state school graduates.

  • Median starting salaries of Ivy League graduates are not significantly different from those of engineering school graduates, but their median mid-career salaries are significantly higher.