dc.description.abstract | The outbreak of COVID-19 in late 2019 caused widespread panic and led to the
loss of millions of lives. Much of the chaos could have been avoided had the spread been
detected in the early stages of the outbreak. Likewise, had adequate information been available
about its mode of transmission, prevention measures, and symptoms, the outbreak could
have been controlled. In this thesis, we classify COVID-19 tweets using
BERT.
BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model built
on the Transformer architecture and is widely used for natural language processing tasks on
text data. We modify the BERT architecture for COVID-19 tweet classification. We also show how
the model can be trained to classify our tweets using biomedical literature abstracts as an
alternative source of training data.
In our experiments, the BERT model performed worse than our baseline model, logistic
regression. We also found that, despite being pre-trained, BERT requires a large amount of
training data. Finally, training the model on a combination of tweets and literature abstracts
improved its performance compared with training it on literature abstracts alone. | |