10 Things I Learned from Reading Fundamentals of Data Engineering
<p>After two enriching years as a Data Engineer, I finally had the chance to dive into <a href="https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/" rel="noopener ugc nofollow" target="_blank"><em>Fundamentals of Data Engineering</em></a> written by the insightful minds of Joe Reis and Matt Housley.</p>
<p>Reading this book helped me connect my hands-on data experience with the theory behind it. Its ideas made me reflect on my work as a Data Engineer and bridge the gap between theory and practice.</p>
<p>In this blog post, I am excited to share 10 key learnings from this book. I focus on one key aspect from each chapter to keep this blog post focused and informative.</p>
<h2><strong>What is this book about?</strong></h2>
<p><em>Fundamentals of Data Engineering</em> is a foundational book that many experienced data engineers recommend. It helps a data engineer understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance, which are critical in any data environment regardless of the underlying technology.</p>
<p><img alt="" src="https://miro.medium.com/v2/resize:fit:250/0*OKKMuGAEopC9s8GW" style="height:328px; width:250px" /></p>
<p>Image by O’Reilly</p>
<h2>What Is Data Engineering? (Chapter 1)</h2>
<p>Data engineering is the design, implementation, and maintenance of systems and processes that transform raw data into high-quality, consistent information to support subsequent use cases, such as analysis and machine learning. Data engineering is where security, data management, DataOps, data architecture, orchestration, and software engineering intersect. A data engineer manages the data engineering lifecycle, from obtaining data from source systems to serving data for use cases such as analysis and machine learning.</p>
<h2>Data Engineering Lifecycle (Chapter 2)</h2>
<p>The data engineering lifecycle is divided into the following five stages:</p>
<ul>
<li><strong>Generation: Source systems</strong><br />
A source system is the origin of the data used in the data engineering lifecycle. The data engineer needs to know how source systems work, how they generate data, how often and quickly they generate data, and what kinds of data they generate.</li>
<li><strong>Storage</strong><br />
Choosing a storage solution is key to success in the rest of the data lifecycle. Which storage system to use depends on the use cases, data volumes, frequency of ingestion, data format, and size.</li>
<li><strong>Ingestion</strong><br />
Batch versus streaming: before choosing streaming, consider the use cases and benefits it would actually deliver, the best tool for those situations, and a business case that justifies the trade-offs. Streaming isn’t always straightforward; it brings extra costs and complications. Batch remains a great fit for many common tasks, like training models and sending out weekly reports.</li>
<li><strong>Transformation</strong><br />
The transformation stage is where data begins to create value for downstream users. Typically, data is transformed in source systems or in flight during ingestion, and business logic is a major driver of data transformation.</li>
<li><strong>Serving Data</strong><br />
Data has value when it’s used for practical purposes. Popular uses include analytics (business intelligence, operational analytics, which focuses on the fine-grained details of operations, and embedded analytics), ML, and reverse ETL.</li>
</ul>
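<p>To make the five stages a bit more concrete, here is a minimal Python sketch of how they could fit together in a toy batch pipeline. This is my own illustration, not code from the book: the function names, the sample events, and the in-memory list standing in for storage are all hypothetical, and a real system would use an actual source system, ingestion tool, and storage layer.</p>

```python
from collections import defaultdict
from typing import Iterator


def generate_events() -> list[dict]:
    """Generation: hypothetical raw events standing in for a source system."""
    return [
        {"user": "a", "amount": "10.5"},
        {"user": "b", "amount": "3"},
        {"user": "a", "amount": "4.5"},
        {"user": "c", "amount": "oops"},  # malformed record from the source
    ]


def ingest_in_batches(events: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Ingestion: move data out of the source in fixed-size batches."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


def store(batches: Iterator[list[dict]], storage: list[dict]) -> None:
    """Storage: persist raw records (an in-memory list is assumed here)."""
    for batch in batches:
        storage.extend(batch)


def transform(storage: list[dict]) -> dict[str, float]:
    """Transformation: apply business logic, here total amount per user."""
    totals: dict[str, float] = defaultdict(float)
    for record in storage:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip records whose amount cannot be parsed
        totals[record["user"]] += amount
    return dict(totals)


def serve(totals: dict[str, float]) -> list[str]:
    """Serving: expose the result in a form a report or dashboard could use."""
    return [f"{user}: {total:.2f}" for user, total in sorted(totals.items())]


def run_pipeline(batch_size: int = 2) -> list[str]:
    raw_storage: list[dict] = []
    store(ingest_in_batches(generate_events(), batch_size), raw_storage)
    return serve(transform(raw_storage))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

<p>Changing <code>batch_size</code> only changes how the data is moved, not the final result, which is one reason batch ingestion is a reasonable default when nothing requires real-time results.</p>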
<p><a href="https://blog.det.life/10-things-i-learned-from-reading-fundamentals-of-data-engineering-eea5dc8e5fb7">Visit Now</a></p>