Lakehouse — Databricks vs. AWS EMR

<p><strong>Disclaimer</strong>:&nbsp;<em>We decided which ETL tool to use in May. At that time, EMR Serverless was still in preview and did not support Delta Lake natively.</em></p> <h2>About this blog series</h2> <p>At claimsforce, our initial approach to big data was a two-tier architecture consisting of a Data Lake stage in Amazon S3 and a Data Warehouse stage in Amazon Redshift (outlined<a href="https://aws.amazon.com/blogs/startups/how-claimsforce-built-a-future-proof-lake-house-with-aws/" rel="noopener ugc nofollow" target="_blank">&nbsp;here</a>). Over time, we realized that maintaining two stages comes with disadvantages, such as additional engineering and maintenance effort, infrastructure costs, and data staleness. We aim to replace the combination of a Data Lake and a Data Warehouse with a single unified system: the Lakehouse. In this blog series, we will document our journey toward a Lakehouse setup.</p> <p><a href="https://medium.com/claimsforce/lakehouse-databricks-vs-aws-emr-87e30c00b791"><strong>Learn More</strong></a></p>
Tags: AWS EMR