Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Mustafa, Ghulam; Rauf, Abid; Al-Shamayleh, Ahmad Sami; Sulaiman, Muhammad; Afzal, Muhammad Tanvir; Akhunzada, Adnan

dc.contributor.author	Mustafa, Ghulam
dc.contributor.author	Rauf, Abid
dc.contributor.author	Al-Shamayleh, Ahmad Sami
dc.contributor.author	Sulaiman, Muhammad
dc.contributor.author	Afzal, Muhammad Tanvir
dc.contributor.author	Akhunzada, Adnan
dc.date.accessioned	2024-04-10T08:56:20Z
dc.date.available	2024-04-10T08:56:20Z
dc.date.created	2023-11-28T11:35:47Z
dc.date.issued	2023
dc.identifier.citation	Mustafa, G., Rauf, A., Al-Shamayleh, A. S., Sulaiman, M., Alrawagfeh, W., Afzal, M. T., & Akhunzada, A. (2023). Optimizing document classification: Unleashing the power of genetic algorithms. IEEE Access.	en_US
dc.identifier.issn	2169-3536
dc.identifier.uri	https://hdl.handle.net/11250/3125721
dc.description.abstract	Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem, offering various methods to address it. Researchers have utilized diverse data sources in their approaches, such as citations, metadata, content, and hybrids of these. Among these sources, the metadata-based approach stands out for research paper classification because metadata is available at no cost. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features (title, keywords, abstract, and general terms) from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic-based model called BERT instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which introduces significant time complexity during computation. Additionally, our proposed model optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain, enhancing the overall accuracy of the document classification system while reducing the time complexity associated with selecting the most relevant features from this high-dimensional space. For classification purposes, we employed GNB and SVM classifiers. The outcomes of our study revealed that the combination of title and keywords outperformed other combinations.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Optimizing Document Classification: Unleashing the Power of Genetic Algorithms	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	The authors	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.source.journal	IEEE Access	en_US
dc.identifier.doi	10.1109/ACCESS.2023.3292248
dc.identifier.cristin	2203647
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
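
The abstract describes selecting a subset of embedding dimensions with a genetic algorithm and scoring it with a classifier. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes synthetic 12-dimensional data in place of the 768-dimensional BERT vectors and the SANTOS dataset, a small hand-written Gaussian naive Bayes as the fitness classifier, and hypothetical GA settings (truncation selection, one-point crossover, bit-flip mutation, a per-feature penalty) that the record does not specify.

```python
import math
import random

def make_data(rng, n_per_class=40, n_features=12, informative=(0, 3, 7)):
    """Synthetic stand-in for embeddings: only `informative` columns depend on the class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 2.0 * label
            X.append(row)
            y.append(label)
    return X, y

def gnb_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Fit Gaussian naive Bayes on the columns kept by `mask`; return test accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    stats = {}
    for c in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        means = [sum(r[j] for r in rows) / len(rows) for j in cols]
        variances = [sum((r[j] - m) ** 2 for r in rows) / len(rows) + 1e-9
                     for j, m in zip(cols, means)]
        stats[c] = (math.log(len(rows) / len(y_tr)), means, variances)
    correct = 0
    for x, t in zip(X_te, y_te):
        def log_post(c):
            prior, means, variances = stats[c]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                for j, m, v in zip(cols, means, variances))
        correct += max(stats, key=log_post) == t
    return correct / len(y_te)

def select_features(X_tr, y_tr, X_va, y_va, rng, pop_size=20, generations=15,
                    mutation_rate=0.05, penalty=0.01):
    """Evolve a boolean mask over feature columns; return the best mask seen."""
    n = len(X_tr[0])

    def fitness(mask):
        # Validation accuracy minus a small cost per selected feature.
        return gnb_accuracy(X_tr, y_tr, X_va, y_va, mask) - penalty * sum(mask)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0]]                 # elitism: carry over the top mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

rng = random.Random(0)
X, y = make_data(rng)
order = list(range(len(X)))
rng.shuffle(order)
train, valid = order[: len(order) // 2], order[len(order) // 2:]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
X_va, y_va = [X[i] for i in valid], [y[i] for i in valid]

mask = select_features(X_tr, y_tr, X_va, y_va, rng)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
print("validation accuracy:", gnb_accuracy(X_tr, y_tr, X_va, y_va, mask))
```

With real embeddings, the mask would span 768 positions and the fitness classifier could be swapped for a library GNB or SVM. Scoring on a held-out split rather than the training rows keeps the search from rewarding masks that merely memorize the training data.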

Associated file(s)

File name: Optimizing_Document_Classifica ...
Size: 1013 KB
Format: PDF

This item appears in the following collection(s)

Publications from CRIStin [4377]
Scientific publications (TN-IDE) [251]

Unless otherwise stated, this item is licensed under Attribution 4.0 International