ANALYSIS OF THE CORRELATION BETWEEN VACCINE STANCES ON SOCIAL MEDIA AND FACTORS OF THE PANDEMIC SITUATION

University essay from Umeå universitet/Institutionen för datavetenskap

Abstract: Vaccine hesitancy is considered a major threat to global health, and social media is playing an increasingly significant role in spreading anti-vaccine sentiments. This study aims to track changes in the proportion of anti-vaccine tweets during a period of the Covid-19 pandemic and to explore potential correlations between anti-vaccine stances and various factors of the pandemic situation. In particular, it aims to find which factors best predict changes in vaccine stances and whether factors in the United States give better predictions than the global average. To gather data on vaccine stances on Twitter, stance detection is performed with a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, fine-tuned on tweets labeled by vaccine stance. This data is used in a correlation analysis between changes in the proportion of anti-vaccine tweets over time and various factors of the pandemic situation, including confirmed Covid-19 cases and government policies. The analysis is performed by training linear regression models with variables from the pandemic data as independent variables. The best-performing model, as measured by the lowest Root Mean Square Error (RMSE), was the one trained with the independent variables Vaccination policy (in the United States) and Containment health index (global average). Among the 20 best-performing models, the most common variables were index variables consisting of aggregations of individual indicators, suggesting that aggregated indicators may give the model more information on which to base its predictions. Vaccination policy and Stay at home requirements for the United States were collectively included in the four best-performing models, which supports the view that factors about the United States are better predictors than the global average.
On the other hand, among the 20 best-performing models, the majority of the independent variables were global averages, which gives the opposite impression. In the end, the results are insufficient to draw any strong conclusion on this issue.
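The model comparison described in the abstract (fit linear regression models on different sets of independent variables, then rank them by RMSE) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the variable names, the synthetic data, the choice of two variables per model, and the use of in-sample RMSE are hypothetical and are not taken from the essay.

```python
from itertools import combinations

import numpy as np


def fit_rmse(X, y):
    """Fit ordinary least squares with an intercept; return the in-sample RMSE."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


def rank_models(features, y, size=2):
    """Fit one model per combination of `size` variables; lowest RMSE first."""
    results = []
    for names in combinations(sorted(features), size):
        X = np.column_stack([features[name] for name in names])
        results.append((names, fit_rmse(X, y)))
    return sorted(results, key=lambda item: item[1])


# Synthetic placeholder data (hypothetical, NOT the essay's data).
rng = np.random.default_rng(0)
n_periods = 60
features = {
    "vaccination_policy_us": rng.normal(size=n_periods),
    "containment_health_index_global": rng.normal(size=n_periods),
    "stay_at_home_us": rng.normal(size=n_periods),
    "confirmed_cases_global": rng.normal(size=n_periods),
}
# Hypothetical target: share of anti-vaccine tweets, driven by two variables.
anti_vaccine_share = (
    0.5 * features["vaccination_policy_us"]
    - 0.3 * features["containment_health_index_global"]
    + rng.normal(scale=0.05, size=n_periods)
)

ranking = rank_models(features, anti_vaccine_share)
for names, rmse in ranking[:3]:
    print(names, round(rmse, 4))
```

On this synthetic data the pair of variables that actually generates the target ranks first. A real analysis would more likely evaluate RMSE on held-out data rather than in-sample, which this sketch does not do.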
