Cite Worthiness Detection; SOTA, and Model Applicability to Other Domains

2022

Citations are essential parts of scientific articles and other kinds of text, and they serve several purposes, including validating claims. Locating the places in a text that need a citation is therefore crucial. This research aims to automate this process by using machine learning and deep learning methods to identify sentences that are worthy of a citation. It then examines the effect of publication year and the possibility of generalizing across domains. The research follows a quantitative method and uses an experimental design to address these problems.
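Framed computationally, cite-worthiness detection is binary sentence classification: each sentence is labeled 1 if it should carry a citation and 0 otherwise. The sketch below illustrates that framing with a multinomial Naive Bayes baseline in plain Python. This baseline is an assumption chosen for brevity, not one of the SOTA models evaluated in the dissertation; the class name, tokenizer, and toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


class NaiveBayesCiteWorthiness:
    """Hypothetical baseline: multinomial Naive Bayes with Laplace smoothing.

    Label 1 = cite-worthy sentence, 0 = not cite-worthy.
    Assumes both labels occur in the training data.
    """

    def fit(self, sentences: list[str], labels: list[int]) -> "NaiveBayesCiteWorthiness":
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens: list[str], label: int) -> float:
        # Log prior of the class plus smoothed log likelihood of each known token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, sentence: str) -> int:
        tokens = tokenize(sentence)
        return max((0, 1), key=lambda label: self._log_score(tokens, label))


# Toy training data (hypothetical sentences, not from the dissertation's corpus).
train = [
    ("Previous studies have shown that transformers outperform recurrent models.", 1),
    ("Prior work reported similar findings on benchmark corpora.", 1),
    ("We thank the anonymous reviewers for their comments.", 0),
    ("Section 4 describes our experimental setup.", 0),
]
model = NaiveBayesCiteWorthiness().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("Earlier studies have shown strong results on this benchmark."))
```

Laplace smoothing keeps an unseen word/class pair from zeroing out a class score. If one label were missing from the training data, its prior would be log(0), which is why the sketch requires both classes.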

After pre-processing steps that create the required labeled dataset, this dissertation first evaluates the state-of-the-art (SOTA) models and algorithms best suited to the problem. The second step examines the effect of publication year, in order to select the highest-quality data for training the models. These analyses show that recent publications are more suitable for inclusion in training datasets.
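The abstract does not say how the labeled dataset was built. One common approach, assumed here, is distant supervision from existing citations: a sentence that originally carried a citation marker is labeled cite-worthy, and the marker is then removed so a model cannot simply learn to spot it. The sketch also shows a simple publication-year filter of the kind the second experiment implies. The marker patterns, the `Example` fields, and the `min_year` threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical marker patterns: numeric brackets such as "[3]" or "[3, 7-9]",
# and author-year parentheses such as "(Smith, 2019)" or "(Smith et al., 2019a)".
CITATION_MARKER = re.compile(
    r"\s*(?:\[\d+(?:\s*[,\u2013-]\s*\d+)*\]"
    r"|\([A-Z][A-Za-z\-]+(?: et al\.)?,? \d{4}[a-z]?\))"
)


@dataclass
class Example:
    text: str    # sentence with citation markers removed
    label: int   # 1 = cite-worthy (originally had a marker), 0 = not
    year: int    # publication year of the source article
    domain: str  # field of study of the source article


def label_sentence(sentence: str, year: int, domain: str) -> Example:
    """Label a sentence by whether it carried a citation, then strip the markers."""
    has_citation = CITATION_MARKER.search(sentence) is not None
    cleaned = CITATION_MARKER.sub("", sentence).strip()
    return Example(cleaned, int(has_citation), year, domain)


def filter_by_year(examples: list[Example], min_year: int) -> list[Example]:
    """Keep only examples from articles published in or after min_year."""
    return [ex for ex in examples if ex.year >= min_year]


a = label_sentence("Transformers dominate NLP benchmarks [3, 7].", 2021, "cs")
b = label_sentence("Prior work (Smith et al., 2019) reported this.", 2015, "bio")
c = label_sentence("We describe the setup in Section 4.", 2021, "cs")
recent = filter_by_year([a, b, c], min_year=2018)  # hypothetical threshold
print(a, b, c, recent, sep="\n")
```

On a real corpus the regular expression would have to cover the citation styles actually present; the two patterns here are only examples.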

As the final step, this research examines the role of the scientific domain. The conventional process is to train, evaluate, and test models on data from a single field of study, since this is expected to give the best results. However, training and testing in the same field of study is not always possible because suitable data may be unavailable. This work therefore also explores training in one domain and generalizing to another. It concludes that some domains are close to each other, and that models trained on one of them can be generalized to similar domains. At the same time, the study suggests that researchers should be cautious when applying a model trained in one domain to an unrelated domain.
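The cross-domain experiment can be organized as a train-on-source, test-on-target grid: the diagonal reproduces the conventional in-domain setup, and the off-diagonal cells measure how well a model transfers. The sketch assumes accuracy as the metric and uses a trivial majority-class learner as a placeholder so the harness runs on its own. The dissertation's actual metrics, models, and domains are not given here, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sentence = Tuple[str, int]      # (sentence text, cite-worthiness label)
Model = Callable[[str], int]    # maps a sentence to a predicted label


def cross_domain_matrix(
    data: Dict[str, Tuple[List[Sentence], List[Sentence]]],
    train_fn: Callable[[List[Sentence]], Model],
) -> Dict[Tuple[str, str], float]:
    """Train on each domain's training split; score accuracy on every domain's test split."""
    results = {}
    for source, (train_split, _) in data.items():
        model = train_fn(train_split)
        for target, (_, test_split) in data.items():
            correct = sum(model(text) == label for text, label in test_split)
            results[(source, target)] = correct / len(test_split)
    return results


def majority_baseline(train_split: List[Sentence]) -> Model:
    """Placeholder learner: always predicts the most frequent training label."""
    majority = Counter(label for _, label in train_split).most_common(1)[0][0]
    return lambda _text: majority


# Toy (train, test) splits per domain; texts are placeholders.
data = {
    "cs": ([("s1", 1), ("s2", 1), ("s3", 0)], [("t1", 1), ("t2", 0)]),
    "med": ([("s4", 0), ("s5", 0), ("s6", 1)], [("t3", 0), ("t4", 0), ("t5", 1), ("t6", 0)]),
}
matrix = cross_domain_matrix(data, majority_baseline)
for (source, target), acc in sorted(matrix.items()):
    print(f"train={source:<4} test={target:<4} accuracy={acc:.2f}")
```

Any learner that maps a training split to a `sentence -> label` function can be passed as `train_fn`. Comparing an off-diagonal cell with the in-domain score for the same target domain indicates how much performance is lost by training elsewhere.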