Automate the exploratory data analysis (EDA) to understand the data faster and easier

<p>EDA is one of the most important steps in understanding a dataset. Almost all data analytics and data science professionals go through it before generating insights or building models.&nbsp;<strong>In real life, this process takes a lot of time, depending on the complexity and completeness of the dataset we have.</strong>&nbsp;Naturally, the more variables there are, the more exploring we have to do to get the summary we need before moving on to the next steps.</p>
<p>That&rsquo;s why R and Python, the most common programming languages for data analysis, have packages that make this process faster and easier, but not better. Why not better? Because they only show us a summary, which we then use to decide which &ldquo;interesting&rdquo; variables to explore more deeply.</p>
<blockquote> <p>The &ldquo;80/20 rule&rdquo; applies: 80 percent of a data analyst or scientist&rsquo;s valuable time is spent simply finding, cleansing, and organizing data, leaving only 20 percent to perform analysis.</p> </blockquote>
<h1>Which libraries can we use?</h1>
<p>In R, we can use these libraries:</p>
<ol> <li><code>dataMaid</code></li> <li><code>DataExplorer</code></li> <li><code>SmartEDA</code></li> </ol>
<p>In Python, we can use these libraries:</p>
<ol> <li><code>ydata-profiling</code></li> <li><code>dtale</code></li> <li><code>sweetviz</code></li> <li><code>autoviz</code></li> </ol>
<p>Let&rsquo;s try each library listed above to see what it looks like and how it can help us do exploratory data analysis! In this post, I will use the&nbsp;<code><a href="https://en.wikipedia.org/wiki/Iris_flower_data_set" rel="noopener ugc nofollow" target="_blank">iris</a></code>&nbsp;dataset, which is commonly used to learn how to code in R or Python.</p>
<p><a href="https://medium.com/codex/automate-the-exploratory-data-analysis-eda-to-understand-the-data-faster-not-better-2ed6ff230eed">Visit Now</a></p>
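<p>To make the &ldquo;it only shows us a summary&rdquo; point concrete, here is a minimal sketch of the kind of per-column overview these packages automate. It uses only the Python standard library. The helper name <code>summarize_columns</code> and the six-row excerpt of <code>iris</code> are assumptions made for illustration. They are not part of any library listed above, and the real packages produce much richer reports.</p>

```python
from statistics import mean

# A six-row excerpt of the iris dataset (two rows per species), hard-coded
# so the sketch runs without downloads. Real EDA would load all 150 rows.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]


def summarize_columns(columns, rows):
    """Return a per-column summary dict (hypothetical helper, not a library API)."""
    summary = {}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "count": len(present),
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        # Numeric columns also get min, max, and mean.
        if present and all(isinstance(v, (int, float)) for v in present):
            info.update(min=min(present), max=max(present), mean=round(mean(present), 2))
        summary[name] = info
    return summary


if __name__ == "__main__":
    for name, info in summarize_columns(COLUMNS, ROWS).items():
        print(name, info)
```

<p>A summary like this tells us which columns are numeric, how many distinct values each has, and whether anything is missing. It does not tell us why a variable behaves the way it does, which is the deeper exploration that still has to be done by hand.</p>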
Tags: EDA Python