Master Data Analysis using Trustworthy Databases

<p>Python is an excellent programming language for beginners who want to handle and analyze data. Along the way, we&rsquo;ll also talk about how to find relevant data sources and cover a bit of statistical testing. Let&rsquo;s break this down into a few steps:</p>
<h1>Step 1: Finding Relevant and Trustworthy Databases</h1>
<p>Depending on your field, there are many places where you can find datasets. Common data repositories include <a href="https://archive.ics.uci.edu/" rel="noopener ugc nofollow" target="_blank">UCI Machine Learning Repository</a>, <a href="https://www.kaggle.com/datasets" rel="noopener ugc nofollow" target="_blank">Kaggle</a>, <a href="https://datasetsearch.research.google.com/" rel="noopener ugc nofollow" target="_blank">Google Dataset Search</a>, and government databases (e.g., <a href="https://data.gov/" rel="noopener ugc nofollow" target="_blank">data.gov</a>, <a href="https://ec.europa.eu/eurostat/data/database" rel="noopener ugc nofollow" target="_blank">Eurostat</a>).</p>
<p>Make sure to verify the credibility of the data source and its relevance to your field. I&rsquo;ll explain how to do that in the last section of this article.</p>
<p>Also, note the data format (CSV, JSON, SQL databases, etc.). CSV is one of the simplest and most widely used formats: it is plain text, typically with one row per line and values separated by commas.</p>
<h1>Step 2: Collecting the Data in a Table</h1>
<p>Let&rsquo;s use the <code>pandas</code> library for Python to handle our data.</p>
<p>First, you&rsquo;ll need <a href="https://pandas.pydata.org/" rel="noopener ugc nofollow" target="_blank">pandas</a>. 
If you don&rsquo;t have it, install it via pip:</p>
<pre>pip install pandas</pre>
<p>Then, in your Python script, import it:</p>
<pre>import pandas as pd</pre>
<p>Assuming we have a CSV file named <code>my_data.csv</code>, we can load it into a pandas DataFrame (essentially a table) like so:</p>
<pre>df = pd.read_csv(&#39;my_data.csv&#39;)</pre>
<p>You can view the first 5 rows of your DataFrame using <code>df.head()</code>.</p>
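<p>To try the loading step without downloading anything, here is a minimal, self-contained sketch. It assumes a small hypothetical CSV held in memory with <code>io.StringIO</code> in place of <code>my_data.csv</code>; its <code>id</code>, <code>group</code>, and <code>value</code> columns and the helper name <code>load_table</code> are made up for illustration. With a real file, you would pass its path to <code>pd.read_csv</code> instead.</p>

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for my_data.csv (made-up values,
# for illustration only), so the sketch runs without any file on disk.
SAMPLE_CSV = """id,group,value
1,a,2.5
2,b,3.1
3,a,4.0
4,b,1.8
5,a,2.2
6,b,3.7
"""


def load_table(csv_source):
    """Hypothetical helper: read CSV data (a path or file-like object) into a DataFrame."""
    return pd.read_csv(csv_source)


df = load_table(io.StringIO(SAMPLE_CSV))

# head() returns the first 5 rows by default; pass n (e.g. head(10)) for more.
print(df.head())

# Quick sanity checks worth running on any freshly loaded table.
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # the type pandas inferred for each column
```

<p>Checking <code>df.shape</code> and <code>df.dtypes</code> right after loading is a cheap way to catch common problems, such as a file that uses a different separator (everything ends up in a single column) or a numeric column that pandas read as text because of stray characters.</p>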