Hadoop Interview Questions

1 Comment. Posted in Hadoop, Interview questions

Q: What is a NameNode? A: The NameNode is the master node of HDFS. It holds the metadata about the data stored on the DataNodes, and it maintains and manages the blocks present on them. It usually runs on reliable, high-end hardware because, unless HDFS high availability is configured, it is a single point of failure in HDFS. Q: Is Secondary Name […]

MapReduce Hello World (Part 2)

1 Comment. Posted in Hadoop

In this post, we will write the word count program in Java. We explained the logic of this program in MapReduce Hello World (Part 1). Before writing the program, here are the data type differences between Java and MapReduce: the equivalent of int in MapReduce is IntWritable; the equivalent of String is […]

MapReduce Hello World (Part 1)

2 Comments. Posted in Hadoop

In this post, we will do the following: 1) Understand MapReduce basics 2) Write a word count program in MapReduce. Word count is also considered the Hello World program of MapReduce programming. What is MapReduce? MapReduce is the 'heart' of Hadoop and consists of two parts, 'map' and […]
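To make the word count idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API and is not the program from the post; the class name `WordCountSketch` and its methods are hypothetical, chosen only to show how a map step that emits (word, 1) pairs and a reduce step that sums them fit together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: simulates the map -> group -> reduce flow in memory.
// A real Hadoop job would use Mapper/Reducer classes and Writable types instead.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Grouping step: collect all values emitted for the same key.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Run all three steps over the input lines and return word -> total count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : group(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
    }
}
```

The sketch runs everything in one process; in Hadoop the same map and reduce logic would be distributed across nodes, with the framework handling the grouping between the two steps.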

Pig Tutorials/concepts

Posted in Hadoop, Pig

What is Pig? Pig is an engine for running data flows on Hadoop. The data flows are written in a language called Pig Latin. Pig uses HDFS for storage and internally uses MapReduce to execute data flow operations. Examples of Pig operations include join, filter, group by, and order by. Why Pig? Does […]

MapReduce design patterns

2 Comments. Posted in Hadoop

This post provides an introduction to MapReduce design patterns and explains the concepts behind them. Here are the categories of MapReduce design patterns: 1) Summarization pattern 2) Filtering pattern 3) Data Organization pattern 4) Join pattern 5) Meta pattern 6) Input Output pattern. Here is an introduction […]

Hadoop Basics

1 Comment. Posted in Hadoop

This post provides an introduction to the following concepts: Hadoop basics, what HDFS is, and what YARN is. Let's start with the simplest question first. What is Big Data? Big data is a term coined for huge volumes of data (in terabytes or petabytes) that are difficult to manage using a traditional DBMS. Here are […]