6 Embarrassing Sklearn Mistakes You May Be Making And How to Avoid Them
<p>Learn to avoid the six most serious machine-learning-theory mistakes that beginners often make when using Sklearn.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:700/1*BWM6kIaIWoJkDNxi1F4mSA.jpeg" style="height:394px; width:700px" /></p>
<p>Image by me with Leonardo AI</p>
<p>Often, Sklearn throws big red error messages and warnings when you make a mistake. These messages suggest something is seriously wrong with your code, preventing the Sklearn magic from doing its job.</p>
<p>But what happens if you don’t get any errors or warnings? Does that mean you are crushing it so far? <em>Not necessarily</em>. Sklearn is packed with knobs and dials that make it the greatest ML library, its world-class <em>code design</em> being one example.</p>
<p>Mistakes in writing Sklearn code can easily be fixed. What <em>can</em> go unnoticed are the mistakes related to the <em>internal logic</em> and ML theory that power Sklearn’s algorithms and transformers.</p>
<p>These mistakes are especially common and subtle when you are a beginner. So this post covers the six such mistakes I made, and learned to avoid, when I was a beginner myself.</p>
<h2>1. Using <code>fit</code> or <code>fit_transform</code> everywhere</h2>
<p>Let’s start with the most serious mistake — a mistake that is related to <em>data leakage</em>. Data leakage is subtle and can be destructive to model performance.</p>
<p>It occurs when information that would not be available at prediction time is used during model training. Data leakage causes models to give overly optimistic results, even in cross-validation, but to perform terribly when tested on <em>actual</em> novel data.</p>
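<p>As a minimal sketch (using a small synthetic dataset purely for illustration), here is how calling <code>fit_transform</code> on the test set leaks its statistics into preprocessing, and the leak-free pattern of fitting on the training set only:</p>
<pre><code class="language-python">import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, just for demonstration
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()

# Wrong: the scaler learns the test set's mean and std,
# leaking information that wouldn't exist at prediction time
# X_test_scaled = scaler.fit_transform(X_test)

# Right: learn the scaling statistics from the training set only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse those same statistics to transform the test set
X_test_scaled = scaler.transform(X_test)
</code></pre>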
<p><a href="https://towardsdatascience.com/6-embarrassing-sklearn-mistakes-you-may-be-making-and-how-to-avoid-them-6be5bbdbb987"><strong>Read the full article on Towards Data Science</strong></a></p>