Machine Learning methods to detect improper and irrelevant citations
Master thesis
Published version

Date
2018-06-15
Collections
- Studentoppgaver (TN-IDE) [936]
Abstract
This study examines the relation between papers and their citations, using Machine Learning algorithms to detect improper and irrelevant citations. The model takes a paper's citations and classifies them into two classes, "Related" and "Barely related". We consider two Machine Learning algorithms, Decision Tree and Naive Bayes, and introduce a statistical algorithm, called the "Prior statistical algorithm", to classify the relation.
During the design of the classification models, the data required for implementation was collected from a large-scale, reliable data source. Conversion techniques were used to transform the data into a structured format.
The evaluation results show that the Prior statistical model is limited because it was applied to the dataset using only one feature. Of the two machine learning algorithms we employed, Naive Bayes outperformed Decision Tree: it was extremely fast and did not require a very large training set to obtain a good learning model. Decision Tree, however, was easier to implement and understand.
Description
Master's thesis in Computer science