Spurious Correlations: The Comedy and Drama of Statistics

<p>Since <em>Tyler Vigen</em> popularized the term &lsquo;spurious correlations&rsquo; for &ldquo;any random correlations dredged up from silly data&rdquo; (Vigen, 2014; see <a href="https://www.tylervigen.com/" rel="noopener ugc nofollow" target="_blank">Tyler Vigen&rsquo;s personal website</a>), many articles have examined the perils and pitfalls of this whimsical tendency to manipulate statistics until correlation appears to equal causation (see HBR, 2015; Medium, 2016; FiveThirtyEight, 2016).</p>
<p>As data scientists, we are tasked with providing statistical analyses that either reject or fail to reject null hypotheses. We are taught to be ethical in how we source, extract, and preprocess data, and in the statistical assumptions we make about it. This is no small matter: global companies rely on the validity and accuracy of our analyses, and it is just as important that our work be reproducible. Yet, in spite of all the &lsquo;good&rsquo; practice we are taught, there may be one occasion (or more) when a boss or client insists that you work the data until it supports the hypothesis and, above all, that you show how variable y causes variable x simply because the two are correlated.</p>
<p><a href="https://towardsdatascience.com/spurious-correlations-the-comedy-and-drama-of-statistics-b63bf99169d8"><strong>Learn More</strong></a></p>
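<p>To make &lsquo;working the data&rsquo; concrete, here is a minimal Python sketch. It uses purely illustrative data rather than anything from Vigen&rsquo;s site or the articles cited above. The sketch draws one target series and 500 unrelated random walks (cumulative sums of independent Gaussian steps), tests every walk against the target, and keeps only the strongest Pearson correlation. The helper names (<code>pearson_r</code>, <code>random_walk</code>, <code>dredge</code>), the series length, the candidate count, and the seed are all hypothetical choices made for this example.</p>

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, n_points):
    """Hypothetical helper: cumulative sum of independent N(0, 1) steps."""
    position = 0.0
    path = []
    for _ in range(n_points):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def dredge(n_points=10, n_candidates=500, seed=0):
    """Hypothetical helper: test many unrelated walks against one target walk
    and return the correlation with the largest absolute value."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    target = random_walk(rng, n_points)
    best_r = 0.0
    for _ in range(n_candidates):
        # Each candidate is generated independently of the target.
        candidate = random_walk(rng, n_points)
        r = pearson_r(target, candidate)
        if abs(r) > abs(best_r):
            best_r = r
    return best_r


if __name__ == "__main__":
    r = dredge()
    print(f"Strongest correlation among 500 unrelated walks: r = {r:.2f}")
```

<p>Every candidate is unrelated to the target by construction. Even so, reporting only the best of many comparisons will typically surface a correlation strong enough to look like a finding. Short, trending series such as random walks make this especially easy, so a correlation presented without the number of comparisons behind it says little, and nothing at all about causation.</p>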