Tag: Apache

Getting started with Apache web server

Apache is open-source software that enables users to deploy their websites on the internet. It is responsible for accepting HTTP requests from visitors and sending back the requested information. Apache is developed and maintained by the Apache Software Foundation. Companies such as Linke...

A beginner’s guide to using Apache Hudi for data lake management

Data lakes have become an essential part of data management in today’s organisations. They provide a centralised repository that can store structured and unstructured data at any scale. However, managing data lakes can be a challenging task, especially for beginners. Apache Hudi is an open-sou...

Apache Doris 2.0.0 is Production-Ready!

We are more than excited to announce that, after six months of coding, testing, and fine-tuning, Apache Doris 2.0.0 is now production-ready. Special thanks to the 275 committers who altogether contributed over 4100 optimizations and fixes to the project. This new version highlights: Auto-sy...

Apache Airflow: Custom Task Triggering for Efficient Data Pipelines

Apache Airflow is an indispensable tool for orchestrating data pipelines, making it a must-know tool for any data engineer in 2023. Like any tool, Airflow has its advantages and disadvantages. While it boasts excellent built-in functionality, there are situations where custom solutions are required ...

Different Types of “Join Strategies” in “Apache Spark”

What is “Join Selection Strategy”? When “Any Type” of “Join”, like the “Left Join”, or, the “Inner Join” is “Performed” between “Two DataFrames”, “Apache Spark” “Internally” decides whic...

Introduction to “Partition” in “Apache Spark”

What is the “Importance” of “Partition”? “Apache Spark” is known for its “Speed”. The “Fast Speed” of “Computing” comes from the “Parallel Processing”. “Partition” is the “Key” for ...

How to Set Up Apache in a Docker Container on Ubuntu 22.04

Setting up Apache in a Docker container on Ubuntu 22.04 can be a straightforward process if you follow the step-by-step tutorial below. Docker allows you to isolate applications within containers, making it easier to manage and deploy them across different environments. Step 1: Install Docker ...

Unlocking the Power of Spark on Kubernetes with Apache YuniKorn

On October 12, 2023, a significant event took place at the LinkedIn office in Bangalore, Karnataka. The Hadoop MeetUp featured a variety of engaging talks and discussions on cutting-edge technologies. Among them, one talk that stole the spotlight was “Unlocking the Power of Spark on ...

Setting up Apache-Airflow in Windows using WSL 2

In the previous story, you learned to set up Ubuntu 20.04 on Windows 10 as a Linux subsystem distribution. In this article, I will walk you through the installation process of Apache Airflow in WSL 2 using a virtual environment. Installation of pip on WSL 2: To set up a virt...

Real-Time Logistics, Shipping, and Transportation with Apache Kafka

Logistics is the detailed organization and implementation of a complex operation. It manages the flow of things between the point of origin and the point of consumption to meet the requirements of customers or corporations. The resources managed in logistics may include tangible goods...