
dc.contributor.advisor   Farmanbar Mina
dc.contributor.advisor   Sulaiman Muhammad
dc.contributor.author    Koloszyc Piotr
dc.contributor.author    Li Rongbing
dc.date.accessioned      2023-09-07T15:51:21Z
dc.date.available        2023-09-07T15:51:21Z
dc.date.issued           2023
dc.identifier            no.uis:inspera:129718883:22842775
dc.identifier.uri        https://hdl.handle.net/11250/3087985
dc.description.abstract  Classification predictive modeling is the machine learning task of assigning class labels to instances. The prevalent models and evaluation metrics used in classification learning assume that data are evenly distributed across class labels, and they concentrate largely on balanced classification problems, which are considered the easiest type. When the distribution of instances among classes is imbalanced, many machine learning algorithms fail, and common evaluation measures such as classification accuracy become dangerously misleading. Imbalanced class distributions occur frequently in real-world problems such as fraud detection, churn prediction, and medical diagnosis. In fact, imbalanced classes are often encountered more frequently than balanced ones, which underlines how important it is to address this problem. This thesis investigates innovative strategies for handling imbalanced data. One approach examined uses the Majority and Minority repositioning Technique (MaMiPot) algorithms in combination with different variants of SMOTE, together with K-means clustering applied before repositioning. Another approach is the use of Generative Adversarial Networks (GANs), a neural-network-based generative technique, applied here to the imbalanced data problem. The approaches were evaluated on 25 imbalanced datasets from the KEEL repository, with class imbalance ratios ranging from 5.14 to 129.44. To assess how well the proposed methods mitigate the class imbalance problem, the F-score, G-mean, and AUC were used as evaluation metrics; together they indicate how effectively each approach improves classification results on imbalanced datasets.
dc.language              eng
dc.publisher             uis
dc.title                 Comparative Analysis of Sampling Methods for Imbalanced Classification
dc.type                  Master thesis
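
The abstract's claim that accuracy becomes misleading under class imbalance, and its use of F-score and G-mean instead, can be illustrated with a small Python sketch. The labels and predictions below are hypothetical toy data, not results from the thesis; the function names (`imbalance_ratio`, `confusion`, `scores`) are illustrative assumptions. The imbalance ratio is taken as majority-class count divided by minority-class count, the convention commonly used for KEEL's imbalanced datasets. AUC is left out because it needs ranked classifier scores rather than hard labels.

```python
import math

def imbalance_ratio(labels, minority=1):
    """Imbalance ratio (IR): majority-class count / minority-class count."""
    n_min = sum(1 for y in labels if y == minority)
    return (len(labels) - n_min) / n_min

def confusion(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fn, fp, tn

def scores(y_true, y_pred, positive=1):
    tp, fn, fp, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)           # zero if either class is ignored
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}

if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5                 # hypothetical toy labels
    print(imbalance_ratio(y_true))              # 19.0
    always_majority = [0] * 100                 # trivial classifier
    print(scores(y_true, always_majority))      # {'accuracy': 0.95, 'f1': 0.0, 'g_mean': 0.0}
    # finds 4 of the 5 minority samples at the cost of 10 false positives
    better = [1] * 10 + [0] * 85 + [1] * 4 + [0]
    print({k: round(v, 3) for k, v in scores(y_true, better).items()})
    # {'accuracy': 0.89, 'f1': 0.421, 'g_mean': 0.846}
```

The trivial classifier scores higher on accuracy (0.95 versus 0.89) yet has an F-score and G-mean of zero, which is exactly the failure mode the abstract describes.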

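SMOTE, which the thesis combines with MaMiPot, oversamples the minority class by interpolating between a minority sample and one of its nearest minority-class neighbours. The Python sketch below shows only plain SMOTE following that textbook description; it is not the thesis's implementation and does not cover MaMiPot, the SMOTE variants, or the K-means step. The function name `smote` and its parameters are hypothetical, and the default k = 5 follows the original SMOTE paper.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=None):
    """Minimal SMOTE sketch (hypothetical helper): each synthetic sample lies on
    the segment between a random minority sample and one of its k nearest
    minority-class neighbours, using Euclidean distance."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself (by index)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

if __name__ == "__main__":
    # hypothetical 2-D minority class
    minority = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.6), (1.2, 0.7)]
    for point in smote(minority, n_synthetic=3, k=2, seed=42):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, SMOTE never generates points outside the region the minority class already spans. In practice the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` (with a `k_neighbors` parameter and a `fit_resample` method); the hand-written version only makes the interpolation step explicit.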