dc.description.abstract | Citations are essential parts of scientific articles and other kinds of texts, and are utilized
for different purposes, including validating claims. As a result, identifying the
places in a text that need citations is crucial. This research aims to automate this process
by using machine learning and deep learning methods to find sentences that warrant
citations. It then examines the effect of publication year and the possibility of domain
generalization. This research uses a quantitative research method and develops
an experimental design to address these problems.
After pre-processing steps that create the required labeled dataset, this dissertation
first evaluates state-of-the-art (SOTA) models and algorithms suited to
these problems. The second step investigates the effect of publication year in order to
select the highest-quality data for training the models. These analyses show that recent
publications are more suitable for inclusion in training datasets.
As the final step, this research examines the role of the scientific domain.
The conventional process is to train, evaluate, and test models on data from a single
field of study, as this is expected to yield the best results. However, training and testing
in the same field of study is not always possible because suitable data may be
unavailable. This scientific work also explores the possibility of training in one domain
and generalizing to another domain. It concludes that some domains are closer to one
another, and that models trained on them can be generalized to those similar
domains. At the same time, this study suggests that researchers should be cautious
when applying a model trained on one domain to an unrelated domain. | |