Tuesday, September 10, 2013

HDFS 2 - Hadoop 2.x

Notes from a talk/meeting at Hortonworks about the next version of HDFS.


What high availability (HA) means in Hadoop 1.x vs 2.x

In 1.x, HA is implemented by:
- Linux HA
- Shared storage between NN instances.

In 2.x, HA no longer requires shared storage.
NN edits are journaled to disk by journal nodes, which can run on any machine: RM, active NN, standby NN, even a DN (although not recommended).
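
As a rough illustration of how journaling can work without shared storage, the sketch below models a quorum-style edit log: the active NN sends each edit to a set of journal nodes and treats it as durable only once a majority acknowledge it. The names used here (`JournalNode`, `log_edit`) are hypothetical and are not the Hadoop API; this is a toy model under those assumptions, not the real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in (hypothetical) for a journal daemon storing edits on local disk."""
    name: str
    alive: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid: int, op: str) -> bool:
        # A dead node cannot acknowledge the write.
        if not self.alive:
            return False
        self.edits.append((txid, op))
        return True


def log_edit(journals, txid, op):
    """Send one edit to every journal; succeed only if a majority acknowledge."""
    acks = sum(1 for j in journals if j.append(txid, op))
    return acks > len(journals) // 2


journals = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]
ok_all_up = log_edit(journals, 1, "mkdir /data")

journals[2].alive = False  # one of three journals fails
ok_one_down = log_edit(journals, 2, "create /data/file")

journals[1].alive = False  # two of three journals fail
ok_two_down = log_edit(journals, 3, "delete /data/file")
```

In this model, three journals tolerate one failure; more generally, 2N + 1 journals tolerate N failures, because a majority can still acknowledge each edit.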

New HDFS features:
- Write pipeline, append mode
- Ability to understand / take advantage of SSDs; exposed at the application level.
- Removed the 400 M namespace limit of Hadoop 1.x in the NN, via NN federation.
- Block management pool - will be moved to the DN in the next 2.x iteration.
- Snapshots. These will be stored in HDFS, in the same system.
- Short-circuit reads: going to the local disk directly for a faster response.
- Use of NFS v4 - no gateway
- n + k fail-over.
- Use of Protocol Buffers (also implemented in the next version of HBase). Will transparently replace the Writable interface for serialization.
- Stinger / Tez initiative.
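
To picture the federation point above: with the namespace split across several independent NNs, a client needs some way to find which NN owns a given path. One way to sketch this is a mount table that maps path prefixes to NNs, similar in spirit to a client-side mount table such as ViewFs. The host names and the `resolve` helper below are hypothetical, chosen only for illustration.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that part of the namespace.
MOUNT_TABLE = {
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/data/logs": "nn3.example.com",
}


def resolve(path, table=MOUNT_TABLE):
    """Return the NameNode owning `path`, using the longest matching prefix."""
    best = None
    for prefix in table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return table[best]
```

Each NN then holds only its own slice of the namespace in memory, which is how federation gets around a single-NN limit.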
