Lakehouse — Databricks vs. AWS EMR
<p><strong>Disclaimer</strong>: <em>We decided which ETL tool to use in May. At that time, EMR Serverless was still in preview and did not support Delta Lake natively.</em></p>
<h2>About this blog series</h2>
<p>At claimsforce, our initial approach to big data was a two-tier architecture consisting of a Data Lake stage in Amazon S3 and a Data Warehouse stage in Amazon Redshift (outlined <a href="https://aws.amazon.com/blogs/startups/how-claimsforce-built-a-future-proof-lake-house-with-aws/" rel="noopener ugc nofollow" target="_blank">here</a>). Over time, we realized that maintaining two stages comes with disadvantages: additional engineering and maintenance effort, higher infrastructure costs, and data staleness. We aim to replace the combination of a Data Lake and a Data Warehouse with a single unified system, the Lakehouse. In this blog series, we will document our journey toward a Lakehouse setup.</p>
<p><a href="https://medium.com/claimsforce/lakehouse-databricks-vs-aws-emr-87e30c00b791"><strong>Learn More</strong></a></p>