
Hive Tutorial

What is Hive?

  • Hive is a data warehousing package built on top of Hadoop
  • Mainly used for data analysis
  • Manages and queries structured data
  • Targeted at SQL developers: no need to learn Java or the Hadoop APIs
  • Its query language, HiveQL, is similar to SQL
  • Allows programmers to plug in custom mappers and reducers
  • Provides tools for easy data ETL
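
As a rough illustration of the points above, here is a minimal HiveQL sketch. The table web_logs, its columns, and the script parse_agent.py are hypothetical names made up for this example; the script is assumed to read tab-separated rows on stdin and write tab-separated rows to stdout.

```sql
-- Hypothetical table and script names, for illustration only.

-- A plain HiveQL aggregate, readable by anyone who knows SQL:
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;

-- Plugging in a custom mapper script with TRANSFORM.
-- ADD FILE ships the (assumed) script to the cluster nodes.
ADD FILE /tmp/parse_agent.py;

SELECT TRANSFORM (ip, user_agent)
       USING 'python parse_agent.py'
       AS (ip STRING, browser STRING)
FROM web_logs;
```

The first query needs no Java at all; the second shows one way a custom script can take the place of a mapper.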

Where to use Hive?

  • Log Processing
  • Data mining
  • Document Indexing
  • Client facing BI
  • Hypothesis Testing

Why use Hive when Pig is available?

Hive                                         | Pig
---------------------------------------------+---------------------------------------
HiveQL (a SQL-like language) is used         | Pig Latin is used
Used by analysts to generate daily reports   | Used by programmers and researchers
Declarative language, like SQL               | Procedural data flow language
Example: Select * from <table_name>;         | Example: X = load 'mytestdata'; Dump X;
Supports explicit schema                     | Supports implicit schema
Supports partitions                          | Does not support partitions
Supports a web interface                     | Does not support a web interface

Hive Architecture:

[Figure: Hive architecture]

Hive Components:

  • Shell
  • Metastore
  • Driver
  • Compiler
  • Execution Engine

Limitations of Hive:

  • Hive is not designed for online transaction processing (OLTP)
  • Hive does not offer row-level queries or row-level updates
  • Latency of Hive queries is very high

Features of Hive Query Language:

  • Filter rows using a WHERE clause
  • Store the results of a query in another table
  • Manage tables and partitions
  • Store the results of a query in a Hadoop DFS directory
  • Perform equijoins
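
The features listed above could look roughly like this in practice. All table names, column names, and the output path are assumptions invented for this sketch, and error_logs is assumed to already exist with three matching columns.

```sql
-- Hypothetical tables: web_logs, error_logs, users.

-- Filter rows with a WHERE clause
SELECT * FROM web_logs WHERE status = 404;

-- Store the results of a query in another (already existing) table
INSERT OVERWRITE TABLE error_logs
SELECT ip, url, status FROM web_logs WHERE status >= 500;

-- Store the results of a query in an HDFS directory (path is an assumption)
INSERT OVERWRITE DIRECTORY '/user/hive/output/errors'
SELECT ip, url FROM web_logs WHERE status >= 500;

-- Equijoin: the join condition is an equality between columns
SELECT l.url, u.name
FROM web_logs l
JOIN users u ON (l.user_id = u.id);
```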

Hive supports the following primitive and complex types:

  • Primitive Types
    • Boolean
    • Integer
    • Float/Double
    • String type
  • Composite Types
    • Structs
    • Maps
    • Arrays
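
To make the type list concrete, here is a small sketch of a table that mixes primitive and composite types. The table employees, its columns, and the delimiters are hypothetical choices for this example.

```sql
-- Hypothetical table; column names and delimiters are illustrative.
CREATE TABLE employees (
  name    STRING,
  active  BOOLEAN,
  age     INT,
  salary  DOUBLE,
  skills  ARRAY<STRING>,
  phones  MAP<STRING, STRING>,
  address STRUCT<street:STRING, city:STRING, zip:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Arrays are indexed by position, maps by key, structs with dot notation
SELECT name, skills[0], phones['home'], address.city
FROM employees;
```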

Hive data models:

  • Database: a namespace for tables
  • Table: a schema within a namespace
  • Partitions: determine how a table's data is laid out in HDFS (one subdirectory per partition value)
  • Buckets or clusters: partitions are divided into buckets
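
A minimal sketch of how partitions and buckets might be declared. The table page_views, its columns, the partition value, and the bucket count of 8 are all assumptions chosen for illustration.

```sql
-- Hypothetical table, partitioned by date and bucketed by user.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)        -- partition column, not stored in the data files
CLUSTERED BY (user_id) INTO 8 BUCKETS    -- rows are hashed on user_id into 8 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Managing partitions
ALTER TABLE page_views ADD PARTITION (view_date = '2015-01-01');
SHOW PARTITIONS page_views;
```

Queries that filter on view_date only need to read the matching partition rather than the whole table.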

Create Database and Use database:

> Create Database <database_name>;

> use <database_name>;
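
For example, with a hypothetical database named sales_db, the two statements above might be used like this:

```sql
-- sales_db is a made-up name for illustration.
CREATE DATABASE IF NOT EXISTS sales_db;  -- IF NOT EXISTS avoids an error on re-run
SHOW DATABASES;                          -- lists sales_db alongside the built-in default database
USE sales_db;                            -- later statements now run inside sales_db
```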

Create Table:

> Create table <table_name>(col_name1 datatype, col_name2 datatype, ...)
  row format delimited fields terminated by ',' stored as textfile;
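
A filled-in version of the syntax above, as a sketch. The table orders, its columns, and the local file /tmp/orders.csv are assumptions for this example.

```sql
-- Hypothetical table backed by comma-separated text files.
CREATE TABLE IF NOT EXISTS orders (
  order_id INT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local CSV file (assumed to exist) into the table, then inspect the schema
LOAD DATA LOCAL INPATH '/tmp/orders.csv' INTO TABLE orders;
DESCRIBE orders;
```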
  

External Table:

  • Creates a table over data in another HDFS location. Hive does not delete the underlying data when the table is dropped; only the table metadata is removed

Syntax:

CREATE EXTERNAL TABLE <table_name>(<col_name> STRING) LOCATION '/user/root/external_table';
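
A concrete sketch of an external table. The table raw_events, its columns, and the HDFS directory are hypothetical; the directory is assumed to already contain comma-separated files.

```sql
-- Hypothetical external table over files that already live in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_time STRING,
  event_type STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/root/raw_events';

-- Dropping the table removes only its metadata;
-- the files under /user/root/raw_events stay in place.
DROP TABLE raw_events;
```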

© 2015, www.techkatak.com. All rights reserved.