Cite Worthiness Detection; SOTA, and Model Applicability to Other Domains
Master thesis
Permanent link
https://hdl.handle.net/11250/3032533
Publication date
2022
Collections
- Student theses (TN-IDE) [823]
Description
Full text not available
Abstract
Citations are essential parts of scientific articles and other kinds of texts, and are used for different purposes, including validating claims. As a result, locating suitable places in a text for citations is crucial. This research aims to automate this process by using machine learning and deep learning methods to find sentences worthy of citations. It then examines the effect of publication year and the possibility of domain generalization. The research uses a quantitative method and develops an experimental design to address these problems.
After some pre-processing steps to create the required labeled dataset, this dissertation first evaluates the best state-of-the-art (SOTA) models and algorithms suitable for the problem. The second step is to experiment with the effect of publication year in order to include the best-quality data for training the models. These analyses show that recent publications are more suitable for inclusion in training datasets.
As the final step, this research examines the role of the scientific domain. The conventional process is to train, evaluate, and test models on data from a single field of study, as this is expected to give the best results. However, training and testing in the same field of study is not always possible because proper data may be unavailable. This work therefore also explores the possibility of training in one domain and generalizing to another. It concludes that some domains are closer to each other, and that models trained on those domains can be generalized to similar domains. At the same time, this study suggests that researchers should be more cautious when generalizing models between unrelated domains.