Tag: Workflows

Effortless Scalability: How Asynchronous Lambda Invocation Transforms AWS Workflows

Situation: While working with multiple country/region files (one file dedicated to each country), we needed to run a specific set of data cleaning, validation, transformation, and enrichment tasks for each of these countries. To streamline this process and enable ...

Task Parameters and Values in Databricks Workflows

Databricks provides a set of powerful and dynamic orchestration capabilities that are leveraged to build scalable pipelines supporting data engineering, data science, and data warehousing workloads. Databricks Workflows allow users to create jobs that can have many tasks. Tasks can be exec...

Databricks Workflows: Orchestration Made Easy

When it comes to orchestration frameworks for data engineering, there are many different options. Airflow is either loved or hated based on who you ask, as I’ve discussed in a previous post. As someone who has used Airflow for close to 6 years now, I haven’t seen enough bad to put m...

Automating ML Workflows: Webhooks in Databricks with MLflow

A Use Case: Machine Learning (ML) is a dynamic field where models are continuously improved and updated. Consider an ML engineer at a tech company that deploys models for image recognition. Every time the engineer updates or improves a model, they must ensure it meets the required accuracy and pe...

Docker: The Unsung Hero in Modern Data Science Workflows

In the vast landscape of data science tools, Docker stands out not as the shiniest or the newest, but perhaps as the most transformative. It’s the bridge between development and deployment, theory and practice. But what makes Docker so important, especially in the context of real-world softwar...

Reclaim Your Data Ownership: Leveraging Unix Philosophy for Modern Digital Workflows

Amidst the ever-changing technological landscape, some battle-tested principles remain very relevant. The Unix Philosophy is one such guiding framework, offering not just historical context, but actionable insights into personal data management in today’s world. It represents a path ...