Classification of COVID-19 Tweets using Bidirectional Encoder Representations from Transformers (BERT)
Master thesis
Permanent link
https://hdl.handle.net/11250/2786165
Publication date
2021
Collections
- Studentoppgaver (TN-IDE)
Abstract
The outbreak of COVID-19 in the later part of 2019 caused a lot of panic and led to the loss of millions of lives. Much of the chaos could have been avoided if the spread had been detected in the early stages of the outbreak. Likewise, had there been adequate information about its mode of transmission, prevention measures and symptoms, the outbreak could have been controlled. In this thesis we perform a classification of COVID-19 tweets using BERT.

BERT is a deep learning algorithm that is designed using transformers. It is broadly used on text data in natural language processing. We modify the BERT architecture for COVID-19 tweet classification. We also show how to train the algorithm to identify our tweets using biomedical literature abstracts as an alternate data source.

We discovered that the BERT model performs unsatisfactorily in our tests in comparison to our baseline model, logistic regression. We also learned that the BERT model requires a large amount of data for training, despite the fact that it has been pre-trained. We also discovered that training the model with a combination of tweets and literature abstracts improves its performance as opposed to training it with only literature abstracts.
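The abstract names logistic regression as the baseline against which BERT was compared. The following is a minimal sketch of what such a baseline could look like: TF-IDF features fed into scikit-learn's logistic regression. The toy tweets and the binary labels (1 = COVID-related, 0 = unrelated) are illustrative assumptions, not data or code from the thesis.

```python
# Hedged sketch of a logistic-regression baseline for tweet classification.
# The tweets, labels, and pipeline choices below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "COVID-19 spreads through respiratory droplets",
    "Wash your hands and wear a mask to prevent infection",
    "Fever and cough are common COVID-19 symptoms",
    "Just watched a great movie tonight",
    "Looking forward to the weekend",
    "My cat knocked over the plant again",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = COVID-related, 0 = unrelated (assumed labels)

# TF-IDF turns each tweet into a sparse word-weight vector;
# logistic regression then learns a linear decision boundary over it.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(tweets, labels)

pred = baseline.predict(["Mask wearing reduces COVID-19 transmission"])
print(pred[0])
```

A BERT-based classifier would replace the TF-IDF vectorizer with a pre-trained transformer encoder and a fine-tuned classification head; the thesis finds that on small tweet datasets this simple linear baseline can still be the stronger model.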