Machine Learning methods to detect improper and irrelevant citations

Shenavari Shirazi, Anousheh

Master's thesis

Published version

Permanent link

http://hdl.handle.net/11250/2564360

Publication date

2018-06-15

Collections

Student theses (TN-IDE)

Abstract

This study focuses on the relation between papers and their citations, using Machine Learning algorithms to detect improper and irrelevant citations. The model takes a paper's citations and classifies each one into one of two classes, "Related" or "Barely related". We consider two Machine Learning algorithms, a Decision Tree and Naive Bayes, and also introduce a statistical algorithm, called the "Prior statistical algorithm", for classifying the relation.
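To make the two-class setup concrete, here is a minimal sketch of a categorical Naive Bayes classifier with Laplace (add-one) smoothing that labels a citation as "Related" or "Barely related". It is not the thesis implementation: the features `shared_keywords` and `same_field` and the toy training rows are hypothetical, chosen only to show how class priors and per-feature likelihoods combine under the independence assumption.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus log likelihoods (features assumed independent given the class)
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + 1
                den = self.class_counts[c] + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical features per citation: (shared_keywords, same_field)
X = [("high", "yes"), ("high", "no"), ("high", "yes"),
     ("low", "no"), ("low", "no"), ("low", "yes")]
y = ["Related", "Related", "Related",
     "Barely related", "Barely related", "Barely related"]

model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("high", "yes")))  # Related
print(model.predict(("low", "no")))    # Barely related
```

Training is a single counting pass over the data, which is consistent with the abstract's observation that Naive Bayes is fast and can learn a usable model from a modest training set.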

During the design of the classification models, the data required for the implementation were collected from a large-scale, reliable data source, and conversion techniques were used to transform the data into a structured format.
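The abstract does not name the data source or its format, so the following is only an illustrative sketch of the kind of conversion step described: nested, JSON-like paper records are flattened into one tabular row per (citing paper, cited paper) pair and written as CSV. The record schema (`id`, `keywords`, `references`) and the `keyword_overlap` feature (Jaccard similarity of keyword sets) are hypothetical assumptions, not the thesis's actual pipeline.

```python
import csv
import io
import json


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def to_rows(json_lines):
    """Flatten nested paper records (hypothetical schema) into one row per citation."""
    rows = []
    for line in json_lines:
        if not line.strip():
            continue  # skip blank lines
        paper = json.loads(line)
        for ref in paper.get("references", []):
            rows.append({
                "citing_id": paper["id"],
                "cited_id": ref["id"],
                "keyword_overlap": round(
                    jaccard(paper.get("keywords", []), ref.get("keywords", [])), 3),
            })
    return rows


def to_csv(rows):
    """Serialize the flattened rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["citing_id", "cited_id", "keyword_overlap"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


raw = [
    '{"id": "P1", "keywords": ["citation", "classification"], "references": ['
    '{"id": "P2", "keywords": ["citation", "analysis"]}, '
    '{"id": "P3", "keywords": ["image", "segmentation"]}]}',
]
rows = to_rows(raw)
print(to_csv(rows))
```

Once every citation is a fixed-width row, any of the classifiers discussed in the abstract can consume the table directly.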

The evaluation results show that the Prior statistical model is limited because it was applied to a dataset using only one feature. Of the two Machine Learning algorithms we employed, Naive Bayes outperformed the Decision Tree: it was extremely fast and did not require a very large training set to obtain a good model. The Decision Tree, however, was easier to implement and understand.
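The abstract does not define the "Prior statistical algorithm". One plausible reading, assumed here purely for illustration, is a classifier that looks at a single feature and predicts the label most frequent for that feature value in the training data, falling back to the overall majority class for unseen values. The sketch below uses hypothetical toy data in which "Related" depends on two features together, and reports training accuracy to show why relying on one feature caps performance.

```python
from collections import Counter, defaultdict


def fit_prior(values, labels):
    """Map each value of a single feature to its most frequent training label."""
    by_value = defaultdict(Counter)
    for v, c in zip(values, labels):
        by_value[v][c] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # overall majority class
    table = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    return table, fallback


def predict_prior(model, value):
    table, fallback = model
    return table.get(value, fallback)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical features per citation: (shared_keywords, same_field)
X = [("high", "yes"), ("high", "yes"), ("high", "no"),
     ("low", "yes"), ("low", "no"), ("low", "no")]
y = ["Related", "Related", "Barely related",
     "Barely related", "Barely related", "Barely related"]


def single_feature_accuracy(j):
    """Training accuracy of the one-feature baseline using feature j only."""
    values = [row[j] for row in X]
    model = fit_prior(values, y)
    return accuracy(y, [predict_prior(model, v) for v in values])


acc_f0 = single_feature_accuracy(0)
acc_f1 = single_feature_accuracy(1)
# Treating the pair of features as one combined value separates the toy data
combined = fit_prior(X, y)
acc_both = accuracy(y, [predict_prior(combined, row) for row in X])
print(acc_f0, acc_f1, acc_both)
```

Either single feature misclassifies one of the six rows, while the combined value fits them all. A raw lookup over feature combinations becomes sparse as features are added, which is where models such as Naive Bayes or a Decision Tree, which generalize across combinations, become useful.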

Description

Master's thesis in Computer science

Publisher

University of Stavanger, Norway

Series

Masteroppgave/UIS-TN-IDE/2018;

Except where otherwise noted, this item's license is described as Attribution 4.0 International