Comparative Analysis of Sampling Methods for Imbalanced Classification

Li, Rongbing; Koloszyc, Piotr

dc.contributor.advisor	Farmanbar, Mina
dc.contributor.advisor	Sulaiman, Muhammad
dc.contributor.author	Li, Rongbing
dc.contributor.author	Koloszyc, Piotr
dc.date.accessioned	2023-09-07T15:51:20Z
dc.date.available	2023-09-07T15:51:20Z
dc.date.issued	2023
dc.identifier	no.uis:inspera:129718883:111453655
dc.identifier.uri	https://hdl.handle.net/11250/3087984
dc.description.abstract	Classification predictive modeling is the machine learning task of assigning class labels to instances. The prevalent models and evaluation metrics in classification learning focus largely on balanced problems, which are considered the easiest type, and assume that instances are equally distributed across class labels. When the distribution of instances among classes is imbalanced, many machine learning algorithms perform poorly, and common evaluation measures, including classification accuracy, become dangerously misleading. Numerous real-world problems, including fraud detection, churn prediction, and medical diagnosis, frequently exhibit imbalanced class distributions. In fact, imbalanced classes are encountered more often than balanced ones, which underlines the importance of solving this problem. This thesis primarily investigates innovative strategies for managing imbalanced data. One of the approaches examined is the use of the Majority and Minority repositioning Technique (MaMiPot) algorithms in combination with different variations of SMOTE, with K-means clustering applied before repositioning. Another method emphasized in this research is the implementation of Generative Adversarial Networks (GAN), a neural network-based technique designed to address imbalanced data issues. The approaches were evaluated on 25 imbalanced datasets obtained from the KEEL repository, with class imbalance ratios ranging from 5.14 to 129.44. To assess how well the proposed methods mitigate the class imbalance problem, several evaluation metrics were used: F-score, G-mean, and AUC. These metrics indicate how effectively each approach improves classification results on imbalanced datasets.
dc.language	eng
dc.publisher	uis
dc.title	Comparative Analysis of Sampling Methods for Imbalanced Classification
dc.type	Master thesis
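
The abstract names an imbalance ratio and three evaluation metrics without defining them. The following is a minimal sketch of how these quantities are commonly computed, not the thesis authors' implementation. It assumes binary labels with `1` as the minority (positive) class, takes F-score to mean F1, and defines the imbalance ratio as majority count divided by minority count. All function names and the toy labels are hypothetical and are not drawn from the KEEL datasets.

```python
from collections import Counter
from math import sqrt


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, tn, fn


def f_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall (0.0 when undefined)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 18 majority and 2 minority instances, and a classifier
# that always predicts the majority class.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(imbalance_ratio(y_true), accuracy, f_score(tp, fp, fn), g_mean(tp, fp, tn, fn))
```

On this toy data the always-majority classifier reaches high accuracy while F-score and G-mean are both zero, which illustrates why the abstract calls accuracy misleading on imbalanced data.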

Associated file(s)

File name: no.uis:inspera:129718883:11145 ...
Size: 1.652 MB
Format: PDF

This item appears in the following collection(s)

Student theses (TN-IDE) [866]
Student theses in information technology, computer engineering / cybernetics, signal processing