Classification of COVID-19 Tweets using Bidirectional Encoder Representations from Transformers (BERT)
Master thesis
Permanent link
https://hdl.handle.net/11250/2786165
Publication date
2021
Collections
- Studentoppgaver (TN-IDE)
Abstract
The outbreak of COVID-19 in the later part of 2019 caused a lot of panic and led to the loss of millions of lives. Much of the chaos could have been avoided if the spread had been detected in the early stages of the outbreak. Likewise, had there been adequate information about its mode of transmission, prevention measures and symptoms, the outbreak could have been controlled. In this thesis we perform a classification of COVID-19 tweets using BERT.

BERT is a deep learning algorithm that is designed using transformers. It is broadly used on text data in natural language processing. We modify the BERT architecture for COVID-19 tweet classification. We also show how to train the algorithm to identify our tweets using biomedical literature abstracts as an alternate data source.

We discovered that the BERT model performs unsatisfactorily in our tests in comparison to our baseline model, logistic regression. We also learned that the BERT model requires a large amount of data for training, despite the fact that it has been pre-trained. We also discovered that training the model with a combination of tweets and literature abstracts improves its performance as opposed to training it with only literature abstracts.
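The abstract names logistic regression as the baseline against which BERT was compared. The following is a minimal sketch of what such a baseline could look like: TF-IDF features fed into scikit-learn's logistic regression. The toy tweets and the binary labels (1 = COVID-related, 0 = unrelated) are illustrative assumptions, not data or code from the thesis.

```python
# Hedged sketch of a logistic-regression baseline for tweet classification.
# The tweets, labels, and pipeline choices below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "COVID-19 spreads through respiratory droplets",
    "Wash your hands and wear a mask to prevent infection",
    "Fever and cough are common COVID-19 symptoms",
    "Just watched a great movie tonight",
    "Looking forward to the weekend",
    "My cat knocked over the plant again",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = COVID-related, 0 = unrelated (assumed labels)

# TF-IDF turns each tweet into a sparse word-weight vector;
# logistic regression then learns a linear decision boundary over it.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(tweets, labels)

pred = baseline.predict(["Mask wearing reduces COVID-19 transmission"])
print(pred[0])
```

A BERT-based classifier would replace the TF-IDF vectorizer with a pre-trained transformer encoder and a fine-tuned classification head; the thesis finds that on small tweet datasets this simple linear baseline can still be the stronger model.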