The first step in building your Data Lakehouse is bringing your data into it. A word of caution: if you start this activity without careful planning and modeling, you may end up with a data swamp, as has happened with many data lakes.
Plan on bringing all data ingestion patterns into your Lakehouse: real time (streaming), near real time, and batch. The right pattern is dictated by the use case (e.g., computing a credit score for a prospective customer), the technical capabilities of the platform generating the source data, and the business function being performed; a sketch of the batch and streaming paths follows below.
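To make the patterns concrete, here is a minimal PySpark sketch of the batch and streaming ingestion paths, assuming Databricks Auto Loader for incremental file discovery; the source paths, schema, and table names (`/mnt/raw/...`, `bronze.customers`, `bronze.events`) are hypothetical placeholders.

```python
# A minimal sketch of batch vs. streaming ingestion on Databricks.
# Assumes the `bronze` schema exists and `spark` is the Databricks-provided
# SparkSession; paths and table names are hypothetical placeholders.

# Batch: load a full snapshot of source files into a Delta table.
batch_df = spark.read.format("json").load("/mnt/raw/customers/")
batch_df.write.format("delta").mode("overwrite").saveAsTable("bronze.customers")

# Streaming / near real time: incrementally ingest newly arriving files
# with Auto Loader, checkpointing progress so each file is processed once.
stream_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/mnt/raw/events/")
)
query = (
    stream_df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events/")
    .trigger(availableNow=True)  # or processingTime="1 minute" for near real time
    .toTable("bronze.events")
)
```

The same Auto Loader pipeline serves both near-real-time and scheduled ingestion: a continuous or short `processingTime` trigger gives low latency, while `availableNow=True` drains the backlog and stops, which suits batch-style scheduling.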
Delta Lake is the storage layer that underpins data and tables in the Databricks Lakehouse. It extends Parquet data files with a file-based transaction log that provides ACID transactions, and it is the default storage format for all operations on Databricks: unless otherwise specified, every table on Databricks is a Delta table.
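As a small illustration of this default-to-Delta behavior and the transaction log, the sketch below creates a table without naming a format and then reads its commit history; the schema and table name `demo.people` are hypothetical placeholders.

```python
# On Databricks, tables default to Delta, and every write is recorded as a
# versioned commit in the table's transaction log. `demo.people` is a
# hypothetical example table.
spark.sql("CREATE SCHEMA IF NOT EXISTS demo")
spark.sql("CREATE TABLE IF NOT EXISTS demo.people (id INT, name STRING)")

# This INSERT becomes one ACID commit in the Delta log.
spark.sql("INSERT INTO demo.people VALUES (1, 'Ada'), (2, 'Grace')")

# DESCRIBE HISTORY surfaces the transaction log: one entry per commit.
spark.sql("DESCRIBE HISTORY demo.people").select("version", "operation").show()
```

Because each commit is versioned in the log, Delta Lake can also serve older snapshots of the table (time travel) and safely coordinate concurrent readers and writers.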