HBase Basics

What is the difference between HBase and an RDBMS?

HBase                                          | RDBMS
Column-oriented                                | Row-oriented
Flexible schema; columns added on the fly      | Fixed schema
Handles sparse tables well                     | Not optimized for sparse tables
Joins require MapReduce and are not optimized  | Joins are native and optimized
Scales horizontally by adding hardware         | Hard to shard and scale
Good for structured and semi-structured data   | Good for structured data
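
To make the "flexible schema" and "sparse table" rows concrete, here is a minimal Python sketch of the idea. It is a toy in-memory model, not the HBase client API: the ToyTable class, its methods, and the column names such as "info:phone" are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for an HBase-style table: row key -> {column -> value}.

    Illustration only; this is NOT the HBase API.
    """

    def __init__(self):
        # Each row holds only the columns that were actually written.
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Any column name is accepted; no schema change is needed first.
        self.rows[row_key][column] = value

    def get(self, row_key):
        # Return a copy of the row, or an empty dict for an unknown row.
        return dict(self.rows.get(row_key, {}))

    def stored_cells(self):
        # Count only cells that exist; absent columns cost nothing.
        return sum(len(cols) for cols in self.rows.values())


table = ToyTable()
table.put("user1", "info:name", "Ann")
table.put("user1", "info:email", "ann@example.com")
table.put("user2", "info:name", "Bob")
table.put("user2", "info:phone", "555-0100")  # new column added on the fly

print(table.get("user2"))
print(table.stored_cells())
```

In this sketch the two rows share only one column, and nothing is stored for the columns a row does not use. A fixed-schema table with the same three columns would have to hold a NULL for each missing value.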

When to use HBase?

  • High volumes of data to store
  • Column-oriented data
  • Unstructured or semi-structured data
  • High scalability requirements
  • Versioned data
  • Data generated by a MapReduce (MR) workflow
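
The "versioned data" point can be sketched as follows. This is a toy Python model of a single cell that keeps several timestamped versions, not the HBase client API: the VersionedCell class, the as_of parameter, and the max_versions default of 3 are assumptions made for illustration.

```python
class VersionedCell:
    """Toy model of one cell that keeps several timestamped versions.

    Illustration only; this is NOT the HBase API.
    """

    def __init__(self, max_versions=3):
        # max_versions is a hypothetical knob for this sketch.
        self.max_versions = max_versions
        self.versions = []  # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        # A write with an existing timestamp replaces that version.
        self.versions = [(t, v) for t, v in self.versions if t != timestamp]
        self.versions.append((timestamp, value))
        # Keep newest first, then drop versions beyond the limit.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, as_of=None):
        # Newest version by default, or newest at or before `as_of`.
        for timestamp, value in self.versions:
            if as_of is None or timestamp <= as_of:
                return value
        return None


cell = VersionedCell(max_versions=2)
cell.put(100, "v1")
cell.put(200, "v2")
cell.put(300, "v3")  # "v1" is evicted because only 2 versions are kept

print(cell.get())           # newest version
print(cell.get(as_of=250))  # newest version written at or before 250
```

Reads return the newest value unless an older point in time is requested, and versions beyond the configured limit are discarded.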

When not to use HBase?

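  • When you have only a few thousand or a few million rows
  • When you need RDBMS features that HBase lacks, such as typed columns, secondary indexes, or an advanced query language
  • When you have fewer than 5 DataNodes (HDFS replicates each block three times by default, so very small clusters gain little)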


