US Representatives on Twitter

A Study of Sentiment and Bias

Author

Marvelous Charmander
Kevin Pei, Max Bohun, Cody Tanis, Gabe Hartmann, Luna Lu

Published

May 8, 2023

Topic and motivation

Motivation:

We wanted to study a topic that we were genuinely interested in analyzing.
We scraped some Twitter posts of members of the United States House of Representatives and performed analysis on them.
Our goal: glean information about engagement with political Twitter posts based on the language and content included, and how those correlate with political views.

Some of the questions we’re going to answer include:

What effects do the sentiment and use of buzzwords in Tweets have on user engagement for political figures?
Are Republicans or Democrats more likely to tweet positively?

Introduce the data

We scraped specific information from the Twitter pages of the 440 Congress members whose pages we were able to find.

We fed the Twitter handles into Apify to scrape the first 50 tweets from each member’s page.
We merged the two datasets (the one with political affiliations and the one with tweets) by their Twitter handles.
We used natural language processing to identify the sentiment of each tweet by analyzing key words and returning a rating (between -1 and 1).
After doing this, we found another dataset that included over 8,000 politically-charged buzzwords and their bias ratings.
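The merge step above can be sketched as follows, in Python for illustration. This is a minimal sketch under stated assumptions, not the code we ran: the `handle` field name, the sample records, and the case-insensitive matching are all hypothetical.

```python
# Hypothetical sample records. Only `affiliation` and `full_text` are
# variable names from our dataset; `handle` is an assumed column name.
affiliations = [
    {"handle": "RepExampleA", "affiliation": "D"},
    {"handle": "repexampleb", "affiliation": "R"},
]
tweets = [
    {"handle": "repexamplea", "full_text": "Proud to support this bill."},
    {"handle": "RepExampleB", "full_text": "This policy is a disaster."},
    {"handle": "unknown_user", "full_text": "No affiliation on file."},
]


def merge_by_handle(tweets, affiliations):
    """Inner-join tweets to affiliations on a case-insensitive handle."""
    # Lower-casing is an assumption about how the handles were recorded.
    party_by_handle = {
        row["handle"].lower(): row["affiliation"] for row in affiliations
    }
    merged = []
    for tweet in tweets:
        party = party_by_handle.get(tweet["handle"].lower())
        if party is not None:  # drop tweets with no matching affiliation
            merged.append({**tweet, "affiliation": party})
    return merged


merged = merge_by_handle(tweets, affiliations)
```

An inner join like this silently drops tweets whose handle has no affiliation record, so it is worth comparing row counts before and after the merge.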

We have 20,353 observations (tweets), each of which includes tweet views, likes, comments, shares, sentiment, political affiliation, etc.

The variables we used in our analysis are:

sentiment - Sentiment rating.
num_buzzwords - Number of buzzwords in a tweet.
avg_buzzword_bias_score - Average buzzword bias score.
created_at - Time the tweet was posted.
full_text - The entire tweet.
reply_count - Number of replies to the tweet.
retweet_count - Number of times the tweet was retweeted.
user_description - User description.
user_favourites_count - Number of user’s favourites.
user_friends_count - Number of user’s friends.
user_name - User’s name.
affiliation - User’s affiliation. “D” or “R”.

Highlights from EDA

Key buzzwords are aggregated into a mean bias score for each tweet. Because our buzzword analysis works only at the word level, it can be hard to judge the intention of a tweet from individual words without connecting them. We also determined tweet sentiment by aggregating word connotation scores. Our most interesting exploratory analyses showed that sentiment is higher among Democrats and that their tweets generally have higher view counts than Republicans’.
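To make the word-level aggregation concrete, here is a minimal sketch in Python. It assumes that both scores are simple means over matched words; the two toy lexicons, the tokenizer, and the choice of the mean are hypothetical stand-ins, not the actual lexicons or aggregation we used.

```python
import re

# Hypothetical toy lexicons. The real connotation scores and the 8,000+
# buzzword bias ratings are not reproduced here.
WORD_SENTIMENT = {"proud": 0.8, "support": 0.5, "disaster": -0.9}
BUZZWORD_BIAS = {"freedom": 0.6, "radical": 0.9, "bipartisan": 0.1}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(full_text):
    """Aggregate word-level scores into tweet-level features."""
    words = tokenize(full_text)
    sentiments = [WORD_SENTIMENT[w] for w in words if w in WORD_SENTIMENT]
    biases = [BUZZWORD_BIAS[w] for w in words if w in BUZZWORD_BIAS]
    return {
        # Mean of scores in [-1, 1] stays in [-1, 1]; 0.0 if nothing matched.
        "sentiment": sum(sentiments) / len(sentiments) if sentiments else 0.0,
        "num_buzzwords": len(biases),
        # Undefined (None) when the tweet contains no buzzwords.
        "avg_buzzword_bias_score": sum(biases) / len(biases) if biases else None,
    }


plain = score_tweet("Proud to support bipartisan freedom.")
negated = score_tweet("Not proud. Not support.")
```

Here `plain` and `negated` receive the same sentiment, because negation is invisible when words are scored one at a time. That is the word-level limitation described above.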


# A tibble: 2 × 3
  affiliation MeanSentiment MeanViews
  <chr>               <dbl>     <dbl>
1 D                  0.135     25851.
2 R                  0.0689    19790.

Analysis 1: Engagement Rate

Republicans have higher engagement rates than Democrats (found via confidence intervals and significance tests)
We hypothesize that this is because Republicans create content with more shock, entertainment, or relevance value
Right-leaning representatives might also have more loyal and engaged follower bases
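A comparison of this kind can be sketched as below, in Python for illustration. Everything here is an assumption: the engagement-rate formula (interactions per view), the made-up per-account rates, and the use of a Welch-style statistic with a normal-approximation interval. It is not the exact test or data behind the result above.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def engagement_rate(likes, replies, retweets, views):
    """Assumed definition: interactions per view (not our stated formula)."""
    return (likes + replies + retweets) / views


def welch_interval(group_a, group_b, confidence=0.95):
    """Return (difference in means a - b, approximate CI, Welch statistic)."""
    diff = mean(group_a) - mean(group_b)
    # Unequal-variance (Welch) standard error of the difference in means.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Normal critical value; a t quantile would be more exact for small samples.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se), diff / se


# Made-up per-account engagement rates, for illustration only.
rates_r = [0.031, 0.044, 0.038, 0.052, 0.047]
rates_d = [0.022, 0.027, 0.019, 0.030, 0.025]

diff, (low, high), t_stat = welch_interval(rates_r, rates_d)
# The difference is called significant if the interval excludes zero.
significant = low > 0 or high < 0
```

Averaging per account before comparing parties keeps a handful of very active accounts from dominating the result.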

Analysis 2: Bias and Sentiment

For the same number of buzzwords, Republicans typically see higher engagement
Republicans’ tweets generally score higher on buzzword bias and lower on sentiment
Inverse relationship between sentiment and engagement rate (more dramatic for Republicans): as the sentiment goes down, the engagement rate increases
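One way to quantify "more dramatic" is to compare the slope of engagement rate on sentiment within each party. The sketch below, in Python for illustration, assumes an ordinary least-squares line per party and uses made-up rows; it does not reproduce the smoothing used in our plots.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up (sentiment, engagement_rate, affiliation) rows, for illustration only.
rows = [
    (-0.6, 0.0500, "R"), (-0.2, 0.0420, "R"), (0.1, 0.0360, "R"), (0.5, 0.0280, "R"),
    (-0.6, 0.0300, "D"), (-0.2, 0.0280, "D"), (0.1, 0.0265, "D"), (0.5, 0.0245, "D"),
]

slopes = {}
for party in ("D", "R"):
    xs = [sentiment for sentiment, _, p in rows if p == party]
    ys = [rate for _, rate, p in rows if p == party]
    slopes[party] = ols_slope(xs, ys)
```

With these toy rows both slopes are negative and the "R" slope is steeper, which is the pattern described in the bullets above.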


Conclusions + future work

Conclusions:

Based on our analysis, Democrats, on average, have a larger following on Twitter than Republicans. Republicans, however, have a higher user engagement rate on Twitter than Democrats, despite having fewer followers.

Republicans seem to land higher on the average buzzword bias score and lower on sentiment. The opposite goes for Democrats; they seem to have more positive sentiment and a lower average buzzword bias score.
There seems to be a negative relationship between the sentiment and the engagement rate of a Tweet, and it’s more dramatic for Republican Representatives (i.e. as the sentiment goes down, the engagement rate increases).

In the future, we would like to refine our political affiliation prediction model. While the sentiment and avg_buzzword_bias_score variables have some predictive power (ROC AUC at 58%), they might not be powerful enough as predictors on their own. Therefore, by using advanced language processing techniques and tuning the prediction model, we hope to reach an accuracy of at least 85%, making this model useful for real-life applications and market targeting.
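For readers unfamiliar with the metric, ROC AUC can be computed directly from its ranking definition. The sketch below is in Python for illustration, with made-up labels and scores (1 standing in for "R", 0 for "D"); it is not our model or its output.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Equivalent to the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for six accounts.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to ranking accounts at random, so a value of 0.58 is only modestly better than chance. AUC and accuracy are also different metrics, so the two figures above are not directly comparable.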

Thank you for your attention!