Class Imbalance: From SMOTE to SMOTE-N

<p>In the previous story, we explained how the naive random oversampling and random oversampling examples (ROSE) algorithms work. More importantly, we also defined the class imbalance problem and derived solutions for it intuitively. I highly recommend reading that <a href="https://medium.com/@essamwissam/class-imbalance-from-random-oversampling-to-rose-517e06d7a9b" rel="noopener">story</a> first to make sure you have a clear understanding of class imbalance.</p>
<p>In this story, we will continue by considering the SMOTE, SMOTE-NC and SMOTE-N algorithms. Before we do, it is worth pointing out that the two algorithms we considered in the last story fit the following implementation framework:</p>
<ol>
<li>Define how the algorithm, given the data belonging to a class <em>X</em> that needs <em>Nx</em> additional examples, generates those examples by oversampling</li>
<li>Given a ratios hyperparameter, compute the number of points that need to be added for each class</li>
<li>For each class, run the algorithm, then combine all newly added points with the original data to form the final oversampled dataset</li>
</ol>
<p>For both the random oversampling and ROSE algorithms, it was also true that to generate <em>Nx</em> examples for class <em>X</em> the algorithm does the following:</p>
<ul>
<li>Choose <em>Nx</em> points randomly with replacement from the data belonging to class <em>X</em></li>
<li>Apply some logic to each of the chosen points to generate a new point (e.g., replicating it, or placing a Gaussian on it and then sampling from that Gaussian)</li>
</ul>
<p>The remaining algorithms we will consider in this story fit the same framework.</p>
<p><strong>SMOTE (Synthetic Minority Oversampling Technique)</strong></p>
<p>Thus, to explain what SMOTE does, we only need to answer one question: what logic is performed on each of the <em>Nx</em> examples chosen randomly with replacement from class <em>X</em> in order to generate <em>Nx</em> new examples?</p>
<p><a href="https://towardsdatascience.com/class-imbalance-from-smote-to-smote-n-759d364d535b">Read More</a></p>
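<p>The shared framework described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular library: the names <code>oversample</code>, <code>generate_for_class</code> and <code>replicate</code> are hypothetical, and the meaning given to <code>ratios</code> (the target size of each class as a fraction of the majority class size) is an assumption made for this example. The per-point logic is passed in as a function, so replacing <code>replicate</code> with different logic would give a different oversampling algorithm within the same framework.</p>

```python
import numpy as np


def replicate(point, rng):
    """Per-point logic of naive random oversampling: copy the chosen point."""
    return point.copy()


def generate_for_class(X_class, n_needed, point_logic, rng):
    """Generate n_needed new points from the data of a single class."""
    # Choose n_needed points randomly, with replacement, from the class data.
    chosen = rng.integers(0, len(X_class), size=n_needed)
    # Apply the per-point logic to each chosen point to get a new point.
    return np.array([point_logic(X_class[i], rng) for i in chosen])


def oversample(X, y, ratios, point_logic=replicate, seed=0):
    """Hypothetical framework: ratios maps a class label to its target size,
    expressed as a fraction of the majority class size (an assumption)."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    majority = int(counts.max())
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels.tolist(), counts.tolist()):
        # Number of points to add so the class reaches its target size.
        target = int(round(ratios.get(label, 0.0) * majority))
        n_needed = max(target - count, 0)
        if n_needed == 0:
            continue
        X_parts.append(generate_for_class(X[y == label], n_needed, point_logic, rng))
        y_parts.append(np.full(n_needed, label))
    # Combine the newly added points with the original data.
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Toy data: 8 examples of class 0 and 2 examples of class 1.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_new, y_new = oversample(X, y, ratios={1: 1.0})
print(np.unique(y_new, return_counts=True)[1])  # class sizes after oversampling
```

<p>With <code>ratios={1: 1.0}</code>, class 1 is grown until it matches the majority class, so six replicated points are appended to the original ten. Under this sketch, describing a new algorithm only requires describing a new <code>point_logic</code> function.</p>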