I Declare Myself the #1 Enemy of Over/Undersampling, SMOTE and ADASYN, Here’s Why & How I…

<p>In machine learning, imbalanced datasets are a common and challenging problem. As data scientists, we often find ourselves building models on data where one class greatly outnumbers the other. A classic solution has been over/undersampling: these methods balance the data either by increasing the number of minority-class instances (oversampling) or by decreasing the number of majority-class instances (undersampling). After many projects and plenty of experience, however, I have concluded that these techniques are not the best way to deal with imbalanced datasets. Indeed, I have declared myself the #1 enemy of over/undersampling techniques, and here&rsquo;s why.</p>
<h1>Over/Undersampling: Not as Effective as You Might Think</h1>
<p>Over/undersampling methods come with a handful of drawbacks and risks that are easy to overlook when a quick, easy fix is on offer. Let&rsquo;s take a closer look at the issues associated with these techniques.</p>
<p><strong>Overfitting and Under-representation:</strong> Oversampling may seem logical at first glance, but it carries a risk of overfitting, especially with simple methods such as random oversampling, which merely duplicates existing minority-class instances. Overfitting occurs when the model memorizes these repeated instances instead of learning the general patterns in the data. As a result, the model may perform well on the training data but is likely to perform poorly on unseen data.</p>
<p>On the other hand, undersampling can discard significant information from the majority class, leaving crucial patterns in the data under-represented.
Hence, both oversampling and undersampling can lead to models that are incapable of generalizing well to new, unseen data.</p>
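<p>To make the two failure modes concrete, here is a minimal sketch using only the Python standard library. The dataset is a hypothetical toy example (95 majority rows labeled 0, 5 minority rows labeled 1), and <code>random_oversample</code> and <code>random_undersample</code> are illustrative helpers written for this example, not the author&rsquo;s code. In practice, imbalanced-learn&rsquo;s <code>RandomOverSampler</code> and <code>RandomUnderSampler</code> implement the same basic idea.</p>

```python
import random
from collections import Counter


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows (sampling with replacement)
    until the minority class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_extra = max(counts.values()) - counts[minority]
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    # No new information is created: each extra row is an exact copy.
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return X + extra, y + [minority] * n_extra


def random_undersample(X, y, majority, seed=0):
    """Keep a random subset of majority rows (sampling without replacement)
    so the majority class shrinks to the size of the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    keep = set(rng.sample(majority_idx, target))
    # Every majority row not in `keep` is simply thrown away.
    kept = [i for i, label in enumerate(y) if label != majority or i in keep]
    return [X[i] for i in kept], [y[i] for i in kept]


# Hypothetical toy data: 100 distinct 2-feature rows; the first 5 are class 1.
X = [(float(i), float(i % 7)) for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]

X_over, y_over = random_oversample(X, y, minority=1)
unique_minority = {x for x, label in zip(X_over, y_over) if label == 1}
print(dict(Counter(y_over)))   # both classes now carry 95 labels
print(len(unique_minority))    # 5

X_under, y_under = random_undersample(X, y, majority=0)
print(dict(Counter(y_under)))  # both classes now carry 5 labels
print(len(y) - len(y_under))   # 90
```

<p>Oversampling lifts the minority count from 5 to 95, yet every minority row is still one of the same 5 points, which is exactly what a flexible model can memorize. Undersampling reaches balance only by discarding 90 of the 95 majority rows.</p>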
Tags: SMOTE ADASYN