Simplify Your Data Preparation With These 4 Lesser-Known Scikit-Learn Classes

<p>Data preparation is famously the least-loved aspect of Data Science. Done right, however, it needn&rsquo;t be such a headache.</p>
<p>While scikit-learn has fallen out of vogue as a&nbsp;<em>modelling</em>&nbsp;library in recent years, given the meteoric rise of PyTorch, LightGBM, and XGBoost, it&rsquo;s still easily one of the best&nbsp;<em>data preparation</em>&nbsp;libraries out there.</p>
<p>And I&rsquo;m not just talking about that old chestnut,&nbsp;<code>train_test_split</code>. If you&rsquo;re prepared to dig a little deeper, you&rsquo;ll find a treasure trove of helpful tools for more advanced data preparation techniques, all of which work with other libraries such as&nbsp;<code>lightgbm</code>,&nbsp;<code>xgboost</code>&nbsp;and&nbsp;<code>catboost</code>&nbsp;for subsequent modelling.</p>
<p>In this article, I&rsquo;ll walk through four scikit-learn classes that significantly speed up my data preparation workflows in my day-to-day job as a Data Scientist.</p>
<p><a href="https://towardsdatascience.com/simplify-your-data-preparation-with-these-4-lesser-known-scikit-learn-classes-70270c94569f">Read More</a></p>