Project proposal

Author

gold-koala

library(tidyverse)

demographics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/demographics.csv')
wages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/wages.csv')
states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-09-05/states.csv')

demographics

# A tibble: 1,327 × 8
    year sample_size employment members covered p_members p_covered facet       
   <dbl>       <dbl>      <dbl>   <dbl>   <dbl>     <dbl>     <dbl> <chr>       
 1  1973       49095     75519.  18089.     NA      0.240    NA     all wage an…
 2  1974       48245     77101.  18176.     NA      0.236    NA     all wage an…
 3  1975       46488     75704.  16778.     NA      0.222    NA     all wage an…
 4  1976       47648     78777.  17403.     NA      0.221    NA     all wage an…
 5  1977       57191     81334.  19335.  21535.     0.238     0.265 all wage an…
 6  1978       57321     84966.  19548.  21898.     0.230     0.258 all wage an…
 7  1979       58080     87117.  20986.  23540.     0.241     0.270 all wage an…
 8  1980       68594     87480.  20095.  22493.     0.230     0.257 all wage an…
 9  1981       15433     89538.  19137.  21453.     0.214     0.240 all wage an…
10  1983      173932     88290.  17717.  20532.     0.201     0.233 all wage an…
# ℹ 1,317 more rows

wages

# A tibble: 1,247 × 9
    year sample_size  wage  at_cap union_wage nonunion_wage
   <dbl>       <dbl> <dbl>   <dbl>      <dbl>         <dbl>
 1  1973       39774  3.96 0.00111       4.61          3.75
 2  1974       37966  4.26 0.00157       5.02          4.02
 3  1975       37812  4.62 0.00227       5.43          4.39
 4  1976       37888  4.91 0.00290       5.84          4.65
 5  1977       46591  5.24 0.00356       6.38          4.87
 6  1978       44577  5.58 0.00389       6.77          5.21
 7  1979       44234  6.36 0.00520       7.31          6.03
 8  1980       55795  6.89 0.00698       7.88          6.57
 9  1981       12543  6.97 0.00768       8.57          6.51
10  1983      149169  8.06 0.0172        9.89          7.59
# ℹ 1,237 more rows
# ℹ 3 more variables: union_wage_premium_raw <dbl>,
#   union_wage_premium_adjusted <dbl>, facet <chr>

states

# A tibble: 10,200 × 11
   state_census_code state sector        observations employment members covered
               <dbl> <chr> <chr>                <dbl>      <dbl>   <dbl>   <dbl>
 1                11 Maine Priv. Constr…           85     16918.   2207.   2420.
 2                11 Maine Priv. Constr…           93     20170.   2208.   2208.
 3                11 Maine Priv. Constr…           95     23412.   2491.   2788.
 4                11 Maine Priv. Constr…           89     22873.   1917.   1917.
 5                11 Maine Priv. Constr…          114     28033.   3377.   3377.
 6                11 Maine Priv. Constr…          117     30535.   2734.   2976.
 7                11 Maine Priv. Constr…          119     30998.   1532.   1532.
 8                11 Maine Priv. Constr…          109     29472.   1960.   1960.
 9                11 Maine Priv. Constr…           78     22491.   2025.   2279.
10                11 Maine Priv. Constr…           62     19440.    262.    904.
# ℹ 10,190 more rows
# ℹ 4 more variables: p_members <dbl>, p_covered <dbl>,
#   state_abbreviation <chr>, year <dbl>


Dataset

A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.

Make sure to load the data and use inline code for some of this information.

This dataset provides annual measures of union membership and of union, nonunion, and overall wages from 1973 to 2022, compiled from the U.S. Current Population Survey. It consists of three CSV files: demographics (1,327 rows and 8 columns), wages (1,247 rows and 9 columns), and states (10,200 rows and 11 columns). Example variables include wage, union_wage, nonunion_wage, state, and sector. We chose this dataset because exploring it should give us a better understanding of union workers and point to possible societal impacts.
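The dimensions quoted above can be computed rather than typed by hand. The following is a minimal sketch, assuming the three tibbles loaded at the top of the document; `describe_dims()` is a hypothetical helper name introduced here for illustration, not part of any package.

```r
# Hypothetical helper: describe a data frame's dimensions as text,
# so the numbers in the prose always match the loaded data.
describe_dims <- function(df) {
  paste0(
    format(nrow(df), big.mark = ","), " rows and ",
    ncol(df), " columns"
  )
}
```

In a Quarto or R Markdown document this could be called inline, for example `r describe_dims(demographics)`, so the text updates automatically if the source files change.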

Questions

How has union membership changed over time in states with significant manufacturing industries?
How have union and non-union wages changed over time?

Analysis plan

A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

Question 1:

To answer this question, we will use the following variables from the states data: ‘year’, ‘state’, ‘sector’, and ‘p_members’, which encodes the share of workers who are union members (the related ‘p_covered’ gives the share covered by a union). With these variables, we can compare union membership in each state over time. Because the states data is broken out by sector, we will filter to the relevant sector rows before comparing states. Given the large number of states, we plan to visualize trends for only a subset in order to reduce visual clutter and make the overall argument easier to see. We want to focus on states with significant manufacturing industries, as these are traditionally more unionized. We will first conduct some exploratory analysis to see the underlying trends and then finalize which states to include. We should not need to create new variables or merge in any additional data.
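The plan for Question 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the final analysis: the helper names are hypothetical, ranking states by mean sector employment is only one possible way to define "significant manufacturing", and the sector labels passed in are assumptions that should be checked against `distinct(states, sector)` first.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: rank states by mean employment in one sector
# and return the names of the top `n` states.
top_states_by_sector <- function(states, sector_value, n = 5) {
  states %>%
    filter(sector == sector_value) %>%
    group_by(state) %>%
    summarise(mean_employment = mean(employment, na.rm = TRUE),
              .groups = "drop") %>%
    slice_max(mean_employment, n = n, with_ties = FALSE) %>%
    pull(state)
}

# Hypothetical helper: keep one sector's rows for the chosen states.
membership_trend <- function(states, sector_value, state_names) {
  states %>%
    filter(sector == sector_value, state %in% state_names) %>%
    select(year, state, p_members) %>%
    arrange(state, year)
}

# Hypothetical helper: one line per state, membership share over time.
plot_membership <- function(trend) {
  ggplot(trend, aes(x = year, y = p_members, color = state)) +
    geom_line() +
    scale_y_continuous(labels = scales::label_percent()) +
    labs(x = "Year", y = "Union membership rate", color = "State")
}

# Example usage (the sector labels below are assumed, not verified):
# top <- top_states_by_sector(states, "Priv. Manufacturing", n = 5)
# plot_membership(membership_trend(states, "Total", top))
```

Ranking by raw employment favors large states; ranking by manufacturing's share of total employment would be a reasonable alternative if the goal is manufacturing intensity rather than size.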

Question 2:

To answer this question, we need the ‘year’, ‘union_wage’, and ‘nonunion_wage’ variables from the wages data. With the wage variables, we can plot a time series with two lines to compare how the wages of union and non-union workers have changed over the years. We will do this nationally rather than by state, and we do not need to create new variables or merge in any additional data.
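A rough sketch of the two-line time series for Question 2 is shown below. It assumes the wages tibble loaded earlier; the helper names are hypothetical, and the facet value identifying the national, all-workers rows is left as a parameter because the printed output truncates it. Reshaping to long format is a plotting convenience rather than a new analysis variable.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: keep one facet and reshape the two wage columns
# into long format, one row per year and worker group.
wage_trend_long <- function(wages, facet_value) {
  wages %>%
    filter(facet == facet_value) %>%
    select(year, union_wage, nonunion_wage) %>%
    pivot_longer(
      cols = c(union_wage, nonunion_wage),
      names_to = "group",
      values_to = "wage"
    )
}

# Hypothetical helper: one line per worker group.
plot_wage_trend <- function(long) {
  ggplot(long, aes(x = year, y = wage, color = group)) +
    geom_line() +
    labs(x = "Year", y = "Wage", color = NULL)
}

# Example usage (look up the exact facet label first):
# distinct(wages, facet)
# plot_wage_trend(wage_trend_long(wages, "<national facet label>"))
```

The wage units and whether values are inflation-adjusted should be confirmed from the data dictionary before labeling the y-axis more specifically.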