Elegant Starmie

Preregistration of analyses

Analysis #1

Question: Is there a relationship between the number of a pollster's polls that were analyzed and the accuracy of those polls?

We will study the relationship between the variables bias and polls_analyzed.

bias is a quantitative variable that measures the statistical bias of a poll by subtracting the actual margin of victory from the predicted margin of victory. Positive values indicate a bias towards Democrats while negative values indicate a bias towards Republicans. In this analysis, bias represents the accuracy of the polls taken by each pollster. polls_analyzed is also a quantitative variable that simply represents the number of polls each pollster conducted that were analyzed by FiveThirtyEight.

Because we are not studying how political party affects the relationship between bias and polls_analyzed, we will replace each value of bias with its absolute value, so that it measures the size of the error regardless of its partisan direction.

Then, we will visualize the data using a scatterplot, transforming the x-axis with the scale_x_log10() function to linearize the relationship. We will also fit a linear model predicting bias from the log of polls_analyzed. From that model, we will calculate summary statistics such as the correlation, intercept, and slope, and use those to answer the research question.

Analysis #2

In our second analysis, we will examine the relationship between the variables polls_analyzed and grade.

As described in Analysis #1, polls_analyzed is a quantitative variable that represents the number of polls each pollster conducted that were analyzed by FiveThirtyEight. The grade variable is a categorical letter grade that runs from A, A/B, … , to F. According to FiveThirtyEight, it reflects both the accuracy of a polling organization’s past polls and how well that pollster’s polls are expected to do in the future.

To assess whether the relationship between these variables is significant, we will again visualize the data using a scatterplot. The x-axis will show the number of polls (polls_analyzed) and will be transformed with scale_x_log10() so that a linear model can be fit. Since grade is a categorical variable, we will convert its values to numbers: A will be assigned 11, A/B 10, … and F 0. This will allow us to see whether a linear model is the best fit or whether another model fits better. From this, we will find the best-fitting model, conduct a hypothesis test to determine whether the relationship is significant, and interpret the correlation.
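The grade-to-number conversion could look like the sketch below. It is a plain-Python illustration only. The names GRADE_VALUES and encode_grades are hypothetical, and only the three values stated above (A, A/B, and F) are filled in, because the intermediate grades are not listed in this document. Any grade without a stated value is returned as None rather than guessed.

```python
# Only the values stated in the text are included; the grades between
# A/B and F are elided there and are deliberately left out here.
GRADE_VALUES = {"A": 11, "A/B": 10, "F": 0}

def encode_grades(grades, grade_values=GRADE_VALUES):
    """Map letter grades to numbers; grades without a stated value become None."""
    return [grade_values.get(g) for g in grades]

# Hypothetical input: "C" stands in for any grade whose value is not stated above.
example_grades = ["A", "F", "A/B", "C"]
encoded = encode_grades(example_grades)
print(encoded)  # [11, 0, 10, None]
```

Once the full mapping is written out, the encoded grades can serve as the response variable in the same kind of fit used in Analysis #1.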