Tuesday, September 10, 2013

HDFS 2 - Hadoop 2.x

Notes from a talk/meeting at Hortonworks about the next version of HDFS.

What high availability (HA) means in Hadoop 1.x vs 2.x

In 1.x, HA is implemented by:
- Linux HA
- Shared storage between NN instances.

In 2.x, HA no longer requires shared storage.
NameNode edits are journaled to disk, and any disk will do: on the RM, the active NN, the standby NN, or even a DN (although that is not recommended).
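
To make the journaling idea above more concrete, here is a minimal sketch in Python. It is not Hadoop code: the `JournalNode` class and `log_edit` function are hypothetical names invented for illustration. It assumes the quorum-style behaviour of the 2.x journal, where an edit counts as durable once a majority of journal daemons have written it to their own local disk, so no single shared disk is needed.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical stand-in for one journal daemon writing edits to its local disk."""
    name: str
    available: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, op):
        # An unreachable node cannot acknowledge the write.
        if not self.available:
            return False
        self.edits.append((txid, op))
        return True


def log_edit(journals, txid, op):
    """Return True when a strict majority of journal nodes acknowledged the edit."""
    acks = sum(1 for jn in journals if jn.append(txid, op))
    return acks > len(journals) // 2


# Three journals, one of them down: two acknowledgements out of three still form a majority.
journals = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", available=False)]
committed = log_edit(journals, 1, "mkdir /data")
```

With three journals the write survives the loss of one of them, which is why the journals can sit on ordinary disks on different machines instead of on one shared device.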

New HDFS features:
- Write pipeline, append mode.
- Ability to recognize and take advantage of SSDs, exposed at the application level.
- Removed the Hadoop 1.x limit of roughly 400 million objects in the NN namespace, via NN federation.
- Block management pool: will be moved to the DN in the next 2.x iteration.
- Snapshots, stored in HDFS itself, in the same file system.
- Short-circuit reads: going directly to the local disk for a faster response.
- Use of NFS v4 - no gateway
- n + k fail-over.
- Use of Protocol Buffers (also implemented in the next version of HBase). They will transparently replace the Writable interface for serialization.
- Stinger / Tez initiative.
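
As an illustration of the NN federation item in the list above, the sketch below shows how a client could route a path to one of several NameNodes, each owning part of the namespace. This is only a toy in Python under stated assumptions: the mount table contents, the host names, and `resolve_namenode` are hypothetical and are not the Hadoop API.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that part of the namespace.
MOUNT_TABLE = {
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/data/logs": "nn3.example.com",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Pick the NameNode whose mount point is the longest prefix of the path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mount_table[best]
```

Because each NameNode holds only its own slice of the namespace, the total number of files is no longer bounded by the memory of a single NN.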

