Three Simple Things About Regression That Every Data Scientist Should Know

<p>I consider myself more of a mathematician than a data scientist. I can&rsquo;t bring myself to execute methods blindly, with no understanding of what&rsquo;s going on under the hood; I have to get deep into the math before I trust the results. That habit matters, because it&rsquo;s very easy nowadays to just run a model and go home.</p>
<p>A model is only as good as your understanding of it, and I worry that a lot of people run models and simply accept the first thing that comes out of them. Regression is one of the most common forms of modeling out there, and you&rsquo;ll be a better data scientist if you understand a few simple things about how these models work and why they are set up the way they are.</p>
<h2>1. You are predicting an average, not an actual value</h2>
<p>When you run a regression model, you are usually finding a relationship between the input variables and some <strong>mean</strong> value related to the outcome. Take linear regression. When we run a linear regression, we make two very important assumptions about our outcome variable <em>y</em>:</p>
<ol>
<li>That, for any given values of the input variables, the possible values of <em>y</em> are distributed around a mean.</li>
<li>That the <strong>mean</strong> of <em>y</em> has an <em>additive</em> relationship with the input variables. That is, to get the mean of <em>y</em> you <strong>add up</strong> some numbers, one for each input variable.</li>
</ol>
<p><a href="https://keith-mcnulty.medium.com/three-simple-things-about-regression-that-every-data-scientist-should-know-d38ee17c5563"><strong>Click Here</strong></a></p>
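<p>To make both assumptions concrete, here is a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not the author&rsquo;s method. The data are simulated, and the coefficients (2.0, 1.5, -0.8) and the noise level are hypothetical choices. The true mean of <em>y</em> is built additively from two inputs, each observed <em>y</em> scatters around that mean, and an ordinary least squares fit (via <code>numpy.linalg.lstsq</code>) recovers the mean relationship. Its prediction at a new input estimates the average of <em>y</em> there, while actual values at that same input still vary.</p>

```python
import numpy as np

# Hypothetical simulated data (illustration only): the true mean of y is
# additive in x1 and x2, and individual observations scatter around it.
rng = np.random.default_rng(seed=0)
n = 5000
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
true_mean = 2.0 + 1.5 * x1 - 0.8 * x2          # additive mean of y
y = true_mean + rng.normal(0.0, 3.0, size=n)   # actual values vary around it

# Ordinary least squares: design matrix with an intercept column of ones.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", coef.round(2))

# The prediction for a new input estimates the MEAN of y at that input...
x_new = np.array([1.0, 4.0, 2.0])  # intercept term, x1 = 4, x2 = 2
pred = x_new @ coef
print("predicted mean of y at x1=4, x2=2:", round(pred, 2))

# ...while individual outcomes at that same input still spread out around it.
sims = 2.0 + 1.5 * 4.0 - 0.8 * 2.0 + rng.normal(0.0, 3.0, size=10)
print("some actual y values at that input:", sims.round(1))

# The prediction is additive: the intercept plus one term per input variable.
parts = coef * x_new
print("additive contributions:", parts.round(2), "sum =", round(parts.sum(), 2))
```

<p>In this simulation the true mean at x1 = 4, x2 = 2 is 2.0 + 6.0 - 1.6 = 6.4, so the prediction lands close to 6.4, while the individual simulated outcomes at that same point scatter around it with a standard deviation of 3.</p>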