<strong>Paper Title</strong><br>

AN EMPIRICAL COMPARISON OF RESAMPLING TECHNIQUES FOR SOFTWARE DEFECT PREDICTION<br>

<br>


<strong>Abstract</strong><br>

Class imbalance is a major problem in machine learning, particularly in software defect prediction, where defective modules are usually far outnumbered by non-defective ones. Resampling techniques are frequently used to address this problem by rebalancing the class distribution before a model is trained. In this empirical study, we evaluate five machine learning algorithms (random forest, k-nearest neighbors (KNN), neural network, gradient boosting, and support vector machine) in combination with ten resampling techniques: five undersampling and five oversampling methods. Using software defect prediction datasets, we measure performance in terms of precision, recall, and F1-score. Our results show how different resampling strategies can improve the performance of machine learning algorithms on imbalanced datasets. We discuss the implications of these results and offer guidance on selecting suitable strategies for addressing class imbalance in machine learning tasks. This research improves understanding of resampling techniques and their practical application in real-world scenarios, particularly in software engineering.

Keywords - Class Imbalance, Machine Learning, Resampling Techniques, Software Defect Prediction
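The following Python sketch illustrates the two resampling families and the three metrics named in the abstract. It implements random undersampling, random oversampling, and precision, recall, and F1-score from scratch using only the standard library. The abstract does not name the specific undersampling and oversampling methods used in the study. Random undersampling and random oversampling are therefore assumptions, chosen here as the simplest member of each family. The function names `random_undersample`, `random_oversample`, and `precision_recall_f1` are hypothetical and are not taken from the study's code.

```python
import random
from collections import Counter

# Hypothetical helpers: random under/oversampling stand in for the two resampling
# families; the specific methods evaluated in the study are not named here.


def _group_by_class(X, y):
    """Group feature rows by their class label, preserving first-seen label order."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class shrinks to the size of the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    X_out, y_out = [], []
    for label, rows in _group_by_class(X, y).items():
        # Sample without replacement: each kept row appears at most once.
        for xi in rng.sample(rows, target):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class grows to the size of the largest class."""
    rng = random.Random(seed)
    target = max(Counter(y).values())
    X_out, y_out = [], []
    for label, rows in _group_by_class(X, y).items():
        # Keep all original rows, then add copies drawn with replacement.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: 8 non-defective modules (label 0) and 2 defective modules (label 1).
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    print(Counter(random_undersample(X, y)[1]))
    print(Counter(random_oversample(X, y)[1]))
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In an experiment like the one described, resampling would be applied only to the training split. The test split keeps its original class distribution, so precision, recall, and F1-score reflect the imbalance the model would face in practice.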