
With Cloudera discontinuing the free tier of its Big Data platform, there is a big void, especially for those looking for free or open-source solutions.

Over the next few articles, I will share details about Apache Hadoop, how to create a production-ready cluster, and how to integrate Spark, Hive and Sqoop to handle over 90% of daily workloads.

The first section covers the basics of Apache Hadoop: its architecture, how it works, and some of the important daemons and processes.

Hadoop is a distributed system for storage and processing. In an HA cluster, two or more separate…
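As a quick illustration of the HA idea, the active and standby roles can be checked from the command line (a minimal sketch; nn1, nn2 and rm1 are hypothetical service IDs, named by whatever is configured in hdfs-site.xml and yarn-site.xml):

```bash
# Ask each NameNode in the HA pair for its current role.
# "nn1" and "nn2" are placeholder service IDs from hdfs-site.xml.
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2

# The same concept applies to YARN ResourceManager HA.
yarn rmadmin -getServiceState rm1
```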


In the previous section, I shared the steps to create a high-availability, production-ready Apache Hadoop cluster with HDFS and YARN. Follow Big Data Solutions using Apache Hadoop with Spark, Hive and Sqoop (2 of 3) for the steps to configure the Apache Hadoop cluster.

In this section, we will continue our cluster setup with Spark, Hive and Sqoop integration. Spark, Hive and Sqoop are standard add-ons to Apache Hadoop and can handle 90% of daily workloads.

Spark is used for processing and transforming data, while Hive makes data stored in HDFS accessible through traditional SQL-like data…
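As a rough sketch of how the three tools fit together in a daily workflow (the database URL, file paths, script and table names below are illustrative placeholders, not values from this series):

```bash
# 1. Ingest a table from a relational database into HDFS with Sqoop.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders

# 2. Transform the ingested data with a Spark job submitted to YARN.
#    my_transform.py is a placeholder application.
spark-submit --master yarn --deploy-mode cluster my_transform.py

# 3. Query the result through Hive with plain SQL.
beeline -u jdbc:hive2://hiveserver:10000 \
  -e "SELECT COUNT(*) FROM orders_clean;"
```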


In the previous section, I explained the basics of Apache Hadoop: its architecture, how it works, and some of the important daemons and processes. Follow Big Data Solutions using Apache Hadoop with Spark, Hive and Sqoop (1 of 3) for more details and for the abbreviations used.

In this section, we will continue our cluster setup with a 2+3 layout: two master nodes running NN, JN, ZKFC, RM and JHS along with Spark, Hive and Sqoop; the first slave node with JN, DN and NM; and the remaining slave node(s) with DN and NM. We will use Apache Hadoop 3.2.1…
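To make the layout concrete, here is roughly how the daemons map to nodes using the Hadoop 3.x per-daemon start commands (a sketch only; the exact configuration and start order come in the steps that follow). NN = NameNode, JN = JournalNode, ZKFC = ZKFailoverController, RM = ResourceManager, JHS = JobHistoryServer, DN = DataNode, NM = NodeManager.

```bash
# On each of the two master nodes:
hdfs --daemon start journalnode       # JN
hdfs --daemon start namenode          # NN
hdfs --daemon start zkfc              # ZKFC
yarn --daemon start resourcemanager   # RM
mapred --daemon start historyserver   # JHS

# On the first slave node:
hdfs --daemon start journalnode       # JN
hdfs --daemon start datanode          # DN
yarn --daemon start nodemanager       # NM

# On the remaining slave node(s):
hdfs --daemon start datanode          # DN
yarn --daemon start nodemanager       # NM
```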

Bakul Gupta
