Data LakeHouse Part III — Data Ingestion

The first step in building your Data Lakehouse is bringing your data into it. A word of caution: if you start this activity before careful planning and modeling, you may end up with a data swamp, as has happened with many data lakes.

Plan on bringing all data patterns into your Lakehouse: real time (streaming), near real time, and batch. The appropriate pattern is dictated by the use case (e.g., calculating a credit score for a potential customer), the technical capability of the platform generating the source data, and the business function being performed.

Delta Lake is the storage layer that provides the capability of storing data and tables in the Databricks Lakehouse. It extends Parquet data files with a file-based transaction log for ACID transactions, and it is the default storage format for all operations on Databricks. All tables on Databricks are Delta tables unless otherwise specified.
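To make the patterns concrete, here is a minimal PySpark sketch of batch and near-real-time ingestion into Delta tables, assuming a Databricks runtime where Auto Loader (`cloudFiles`) is available. The paths, table names, and trigger settings are illustrative placeholders, not part of the original article.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-ingestion").getOrCreate()

# Batch ingestion: load a daily extract and append it to a Delta table.
# Delta is the default format on Databricks, but it is named explicitly here.
batch_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/raw/customers/2024-01-01/")  # hypothetical landing path
)
batch_df.write.format("delta").mode("append").saveAsTable("bronze.customers")

# Near-real-time ingestion: Auto Loader incrementally picks up new files
# as they arrive in cloud storage and streams them into a Delta table.
stream_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/checkpoints/events/schema")
    .load("/raw/events/")  # hypothetical source directory
)
(
    stream_df.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/events/")
    .trigger(availableNow=True)  # or processingTime="1 minute" for continuous runs
    .toTable("bronze.events")
)
```

The same `writeStream` pipeline covers true real-time use cases by pointing `readStream` at a message bus instead of files; the checkpoint location is what gives both variants exactly-once delivery into the Delta table.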