Highlights from EDA
We aggregate key buzzwords into a mean bias score for each tweet. However, our buzzword analysis works only at the word level, so it may be hard to judge a tweet's intent from individual words without considering how they connect. We also determined tweet sentiment by aggregating word connotation scores. Our most interesting exploratory analyses showed that mean sentiment is higher among Democrats and that Democrats' tweets generally have higher view counts than Republicans'.
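The word-level scoring described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions. The lexicons `bias_lexicon` and `sentiment_lexicon`, their words and scores, and the helpers `tokenize()` and `mean_lexicon_score()` are all hypothetical. The tokenizer is a simple lowercase split, not necessarily what the project used. A tweet's score is the mean of the scores of its words that appear in the lexicon.

```r
# Hypothetical lexicons: these words and scores are illustrative placeholders,
# not the project's actual buzzword bias or connotation scores.
bias_lexicon <- c(freedom = 0.6, patriot = 0.8, equity = -0.7, climate = -0.5)
sentiment_lexicon <- c(great = 1, proud = 0.8, crisis = -0.9, failed = -1)

# Split a tweet into lowercase word tokens (runs of letters and apostrophes).
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]  # drop empty strings left by leading separators
}

# Mean score of the tweet's words that appear in the lexicon;
# NA when the tweet contains none of the lexicon's words.
mean_lexicon_score <- function(text, lexicon) {
  scores <- lexicon[tokenize(text)]  # words not in the lexicon index to NA
  scores <- scores[!is.na(scores)]
  if (length(scores) == 0) NA_real_ else mean(scores)
}

tweets <- c("Proud of our patriot heroes!",
            "The climate crisis is not great",
            "Hello everyone")
bias_scores <- sapply(tweets, mean_lexicon_score, lexicon = bias_lexicon,
                      USE.NAMES = FALSE)
sentiment_scores <- sapply(tweets, mean_lexicon_score,
                           lexicon = sentiment_lexicon, USE.NAMES = FALSE)
bias_scores       # values: 0.8, -0.5, NA
sentiment_scores  # values: 0.8, 0.05, NA
```

The second tweet shows the word-level limitation noted above. In "not great", the word "great" contributes +1, so the tweet scores as mildly positive even though its intent is negative.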
Mean tweet sentiment and mean view count by party affiliation (D = Democrat, R = Republican):
affiliation   MeanSentiment   MeanViews
D             0.135           25851
R             0.0689          19790
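A per-affiliation summary like the table above can be produced with a grouped mean. The sketch below uses base R `aggregate()` so it runs without extra packages. The project itself appears to use the tidyverse, where `group_by()` followed by `summarise()` would be the equivalent. The data frame `tweets` and its columns `affiliation`, `sentiment`, and `views` are hypothetical. The toy values are made up, so the output does not reproduce the numbers above.

```r
# Hypothetical tweet-level data: one row per tweet, made-up values.
tweets <- data.frame(
  affiliation = c("D", "D", "R", "R"),
  sentiment   = c(0.20, 0.10, 0.05, 0.00),
  views       = c(30000, 20000, 15000, 25000)
)

# Mean sentiment and mean views per party. The formula method's default
# na.action (na.omit) drops tweets with NA in either column, e.g. tweets
# that matched no lexicon words.
summary_by_party <- aggregate(cbind(sentiment, views) ~ affiliation,
                              data = tweets, FUN = mean)
names(summary_by_party) <- c("affiliation", "MeanSentiment", "MeanViews")
summary_by_party  # D: 0.150, 25000; R: 0.025, 20000
```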
Conclusions + future work
Conclusions:
- Based on our analysis, Democrats on average have more followers on Twitter than Republicans. Republicans, however, have a higher user engagement rate on Twitter than Democrats despite having fewer followers.
- Republicans tend to have a higher average buzzword bias score and lower sentiment. The opposite holds for Democrats: their tweets tend to have more positive sentiment and a lower average buzzword bias score.
- There appears to be a negative relationship between a tweet's sentiment and its engagement rate, and it is more pronounced for Republican representatives (i.e., as sentiment decreases, engagement rate increases).
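One way to make "more pronounced" concrete is to compare the slope of engagement rate on sentiment within each party. A more negative slope means engagement rises faster as sentiment falls. The sketch below assumes a per-tweet `engagement_rate` column already exists; the text does not specify how engagement rate was defined. The data frame, column names, and values are hypothetical. The toy data are built with known slopes of -0.01 for D and -0.04 for R so that the fit can be checked.

```r
# Synthetic data with known per-party slopes (illustrative only):
# D: engagement_rate = 0.05 - 0.01 * sentiment
# R: engagement_rate = 0.06 - 0.04 * sentiment
tweets <- data.frame(
  affiliation = rep(c("D", "R"), each = 4),
  sentiment   = rep(c(-0.5, 0, 0.5, 1), times = 2)
)
tweets$engagement_rate <- ifelse(tweets$affiliation == "D",
                                 0.05 - 0.01 * tweets$sentiment,
                                 0.06 - 0.04 * tweets$sentiment)

# Fit a simple linear regression within each party and keep the slope.
slope_by_party <- sapply(split(tweets, tweets$affiliation), function(d) {
  unname(coef(lm(engagement_rate ~ sentiment, data = d))["sentiment"])
})
slope_by_party  # D: -0.01, R: -0.04

# A single model with an interaction term estimates the difference in
# slopes directly: the sentiment:affiliationR coefficient is R's slope
# minus D's slope.
interaction_fit <- lm(engagement_rate ~ sentiment * affiliation, data = tweets)
coef(interaction_fit)[["sentiment:affiliationR"]]  # -0.03
```

On real data, check the interaction term's standard error (for example with `summary(interaction_fit)`) before concluding that the slopes differ.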
In the future, we would like to refine our political affiliation prediction model. Although the sentiment and avg_buzzword_bias variables have some predictive power (ROC AUC of 0.58), they may not be strong enough predictors on their own. By using more advanced natural language processing techniques and tuning the prediction model, we hope to reach an accuracy of at least 85%. That would make the model useful for real-world applications such as market targeting.
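An ROC AUC like the one reported above can be computed from predicted probabilities with the rank-based (Mann-Whitney) formula. That formula gives the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below assumes a logistic regression classifier, which may not be the model the project used. It runs on simulated data with a deliberately weak signal. `roc_auc()`, `is_republican`, and the simulated columns are hypothetical names, and the resulting numbers are not the project's results.

```r
# Rank-based ROC AUC: the probability that a random positive outscores a
# random negative. Average ranks give tied scores half credit.
roc_auc <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated data (illustrative only): a weak relationship between the two
# predictors and party, so AUC is expected to be only modestly above 0.5.
set.seed(1)
n <- 400
d <- data.frame(sentiment = rnorm(n), avg_buzzword_bias = rnorm(n))
d$is_republican <- rbinom(n, 1, plogis(-0.3 * d$sentiment +
                                          0.3 * d$avg_buzzword_bias))

# Hold out the last quarter of rows for evaluation.
train <- d[1:300, ]
test  <- d[301:400, ]

# Logistic regression on the two predictors named in the text.
model <- glm(is_republican ~ sentiment + avg_buzzword_bias,
             family = binomial, data = train)
p <- predict(model, newdata = test, type = "response")

auc <- roc_auc(test$is_republican, p)
accuracy <- mean((p > 0.5) == (test$is_republican == 1))
c(auc = auc, accuracy = accuracy)
```

AUC and accuracy measure different things. AUC summarizes how well the model ranks examples across all thresholds. Accuracy depends on a chosen cutoff (0.5 here) and on class balance. An 85% accuracy target should therefore be evaluated separately from the 0.58 AUC.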