Project proposal
```
library(tidyverse)

# Load the three TidyTuesday (2023-09-05) data files.
demographics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/demographics.csv')
wages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/wages.csv')
states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/states.csv')

# Print each tibble to inspect its dimensions and columns.
demographics
# A tibble: 1,327 × 8
    year sample_size employment members covered p_members p_covered facet
   <dbl>       <dbl>      <dbl>   <dbl>   <dbl>     <dbl>     <dbl> <chr>
 1  1973       49095     75519.  18089.      NA     0.240        NA all wage an…
 2  1974       48245     77101.  18176.      NA     0.236        NA all wage an…
 3  1975       46488     75704.  16778.      NA     0.222        NA all wage an…
 4  1976       47648     78777.  17403.      NA     0.221        NA all wage an…
 5  1977       57191     81334.  19335.  21535.     0.238     0.265 all wage an…
 6  1978       57321     84966.  19548.  21898.     0.230     0.258 all wage an…
 7  1979       58080     87117.  20986.  23540.     0.241     0.270 all wage an…
 8  1980       68594     87480.  20095.  22493.     0.230     0.257 all wage an…
 9  1981       15433     89538.  19137.  21453.     0.214     0.240 all wage an…
10  1983      173932     88290.  17717.  20532.     0.201     0.233 all wage an…
# ℹ 1,317 more rows
wages
# A tibble: 1,247 × 9
    year sample_size  wage  at_cap union_wage nonunion_wage
   <dbl>       <dbl> <dbl>   <dbl>      <dbl>         <dbl>
 1  1973       39774  3.96 0.00111       4.61          3.75
 2  1974       37966  4.26 0.00157       5.02          4.02
 3  1975       37812  4.62 0.00227       5.43          4.39
 4  1976       37888  4.91 0.00290       5.84          4.65
 5  1977       46591  5.24 0.00356       6.38          4.87
 6  1978       44577  5.58 0.00389       6.77          5.21
 7  1979       44234  6.36 0.00520       7.31          6.03
 8  1980       55795  6.89 0.00698       7.88          6.57
 9  1981       12543  6.97 0.00768       8.57          6.51
10  1983      149169  8.06  0.0172       9.89          7.59
# ℹ 1,237 more rows
# ℹ 3 more variables: union_wage_premium_raw <dbl>,
# union_wage_premium_adjusted <dbl>, facet <chr>
states
# A tibble: 10,200 × 11
   state_census_code state sector        observations employment members covered
               <dbl> <chr> <chr>                <dbl>      <dbl>   <dbl>   <dbl>
 1                11 Maine Priv. Constr…           85     16918.   2207.   2420.
 2                11 Maine Priv. Constr…           93     20170.   2208.   2208.
 3                11 Maine Priv. Constr…           95     23412.   2491.   2788.
 4                11 Maine Priv. Constr…           89     22873.   1917.   1917.
 5                11 Maine Priv. Constr…          114     28033.   3377.   3377.
 6                11 Maine Priv. Constr…          117     30535.   2734.   2976.
 7                11 Maine Priv. Constr…          119     30998.   1532.   1532.
 8                11 Maine Priv. Constr…          109     29472.   1960.   1960.
 9                11 Maine Priv. Constr…           78     22491.   2025.   2279.
10                11 Maine Priv. Constr…           62     19440.    262.    904.
# ℹ 10,190 more rows
# ℹ 4 more variables: p_members <dbl>, p_covered <dbl>,
# state_abbreviation <chr>, year <dbl>
```
Dataset
A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.
Make sure to load the data and use inline code for some of this information.
This dataset provides annual measures of union, nonunion, and overall wages from 1973 to 2022, compiled from the U.S. Current Population Survey. It consists of three CSV files: demographics (1,327 rows and 8 columns), wages (1,247 rows and 9 columns), and states (10,200 rows and 11 columns). Example variables include wage, union_wage, nonunion_wage, state, and sector. We chose this dataset because exploring it should give us a better understanding of union workers and may point to broader societal impacts.
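The row and column counts quoted above can be computed from the loaded tibbles rather than typed by hand. The following is a minimal sketch; `describe_dims()` is a hypothetical helper introduced here for illustration and is not part of the original proposal.

```r
# Hypothetical helper (an assumption, not from the original proposal):
# formats a data frame's dimensions as text for use in inline R code.
describe_dims <- function(df) {
  paste(format(nrow(df), big.mark = ","), "rows and", ncol(df), "columns")
}

# Intended use in the report text, assuming the three tibbles are loaded:
#   demographics has `r describe_dims(demographics)`
#   wages has `r describe_dims(wages)`
#   states has `r describe_dims(states)`
```

Computing the counts this way keeps the description in sync with the data if the source files are ever updated.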
Questions
How has union membership changed over time in states with significant manufacturing industries?
How have union and non-union wages changed over time?
Analysis plan
A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).
Question 1:
To answer this question, we will need the following variables: ‘year’, ‘state’, and ‘p_members’, which records the proportion of employed workers who are union members (as distinct from ‘p_covered’, the proportion covered by a union contract). Because each row of states describes one state, sector, and year, we will also use ‘sector’ to choose which rows to compare. With these variables, we can compare union membership in each state over time. Given the large number of states, we may visualize trends for only a subset in order to reduce visual clutter and make the overall argument clearer. We want to focus on states with significant manufacturing industries, as these are traditionally more unionized. We will first conduct exploratory data analysis to see the underlying trends and then finalize our selection of states. We should not need to create new variables or merge in any additional data.
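The plan for Question 1 could be sketched roughly as follows. Both function names are hypothetical helpers, and the `sector_pattern` argument is an assumption: the printed `states` output truncates the sector labels, so the exact wording should be checked with `distinct(states, sector)` before choosing a pattern. The state names in the usage comment are placeholders, not a final selection.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Hypothetical helper: keep the chosen states and the rows whose sector
# label matches `sector_pattern` (a regular expression; an assumption,
# since the full sector labels are not visible in the printed output).
membership_by_state <- function(states_df, state_names, sector_pattern) {
  states_df %>%
    filter(state %in% state_names, str_detect(sector, sector_pattern)) %>%
    select(year, state, sector, p_members) %>%
    arrange(state, year)
}

# Hypothetical helper: one line per state, membership share over time.
plot_membership <- function(membership_df) {
  ggplot(membership_df, aes(x = year, y = p_members, color = state)) +
    geom_line() +
    scale_y_continuous(labels = scales::label_percent()) +
    labs(x = "Year", y = "Share of workers who are union members",
         color = "State")
}

# Example call (placeholder states and pattern):
# membership <- membership_by_state(states, c("Michigan", "Ohio"), "Manuf")
# plot_membership(membership)
```

Keeping the filtering step separate from the plotting step makes it easy to rerun the plot once exploratory analysis has settled which states to include.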
Question 2:
To answer this question, we need the ‘year’, ‘union_wage’, and ‘nonunion_wage’ variables. With the wage variables, we can plot a time series with two lines to compare how the wages of union and non-union workers have changed over the years. We will do this nationally rather than by state, and we do not need to create new variables or merge in any additional data.
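A minimal sketch of the two-line time series is shown below. To stay self-contained it uses only the first four rows of `wages` as printed above; the names `wages_head`, `wages_long`, and `wage_plot` are introduced here for illustration. With the full `wages` table, the rows would likely need to be filtered to a single value of the `facet` column first, and which value to use is an assumption to confirm during exploratory analysis.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# First four rows of `wages` as printed earlier (illustrative subset only).
wages_head <- tribble(
  ~year, ~union_wage, ~nonunion_wage,
  1973,  4.61,        3.75,
  1974,  5.02,        4.02,
  1975,  5.43,        4.39,
  1976,  5.84,        4.65
)

# Reshape to long format so each wage series becomes one line in the plot.
wages_long <- pivot_longer(
  wages_head,
  cols = c(union_wage, nonunion_wage),
  names_to = "group",
  values_to = "wage_value"
)

# Two lines over time: union vs. non-union wages.
wage_plot <- ggplot(wages_long, aes(x = year, y = wage_value, color = group)) +
  geom_line() +
  labs(x = "Year", y = "Wage", color = "Group")
```

Reshaping to long format first lets ggplot2 draw both series from a single `color` mapping instead of two separate `geom_line()` layers.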