Hadoop HA setup

With the advent of Hadoop 2.x there finally is a working High Availability
solution. Two of them, in fact. Both are now easy to configure and use,
and they no longer require external components like DRBD. It all comes
neatly packed into the Cloudera Hadoop distribution, the precursor of
this solution.

Read on to find out how to use it.

The most important weakness of previous Hadoop releases was the
single point of failure, which happened to be the NameNode. The NameNode,
as a key component of every Hadoop cluster, is responsible for managing
the filesystem namespace and block locations. Losing its data results in
losing all the data stored on the DataNodes: HDFS is no longer able to
locate specific files or their blocks. This renders your cluster inoperable.

So it is crucial to be able to detect and counter NameNode problems.
The most desirable behavior is to have a hot backup that ensures
no-downtime cluster operation. To achieve this, the second NameNode
needs up-to-date information on the filesystem metadata, and it also
needs to be already up and running; starting a NameNode with an existing
set of data may easily take many minutes, as it has to parse the actual
filesystem state.

The previously used solution, deploying a SecondaryNameNode, was somewhat
flawed. It took a long time to recover after a failure, and it was not a
hot-backup solution, which only added to the problem. Something better
was required.

So, what needed to be made redundant were the contents of the edits
directory, plus the block location reports that each DataNode sends to
the NameNode, which in an HA deployment must go to both NameNodes. This
was accomplished in two steps: the first, with the release of CDH 4 beta,
was a solution based on a shared edits directory; then, with CDH 4.1,
came a quorum-based solution.

Find out how to configure those on your cluster.

Shared edits directory solution

This kind of setup assumes that a shared storage directory exists in the
cluster. It should be provided by some kind of network-based filesystem.
You could try NFS or GlusterFS.
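As a rough sketch (the filer hostname, export path and mount options below
are assumptions chosen to match the configuration that follows), mounting
such a share on both master nodes could look like this:

# on both master1 and master2, as root; filer1:/exports is an assumed NFS export
mkdir -p /mnt/filer1
mount -t nfs -o tcp,hard,intr filer1:/exports /mnt/filer1
# create the shared edits directory referenced below (doing this once is enough)
mkdir -p /mnt/filer1/dfs/ha-name-dir-shared
chown hdfs:hdfs /mnt/filer1/dfs/ha-name-dir-shared

With the share mounted on both masters, the HA-related configuration looks
like this: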

<property>
  <name>fs.default.name</name>
  <value>hdfs://example-cluster</value>
</property>
<!-- common server name -->
<property>
  <name>dfs.nameservices</name>
  <value>example-cluster</value>
</property>

<!-- HA configuration -->
<property>
  <name>dfs.ha.namenodes.example-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.example-cluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.example-cluster.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.example-cluster.nn1</name>
  <value>0.0.0.0:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.example-cluster.nn2</name>
  <value>0.0.0.0:50070</value>
</property>

<!-- Storage for edits' files -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>

<!-- Client failover -->
<property>
  <name>dfs.client.failover.proxy.provider.example-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Fencing configuration -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/user/.ssh/id_dsa</value>
</property>


<!-- Automatic failover configuration -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>

This setup is quite OK, as long as you're comfortable with maintaining a
separate service (network storage) for handling the HA state. It seems
error prone to me, because it adds another service whose high
availability has to be ensured. NFS seems to be a bad choice here,
because as far as I know it does not offer HA out of the box.

On the other hand, there is GlusterFS, a distributed filesystem that you
can deploy on multiple bricks with an increased replication level.

Nevertheless, it still brings the additional burden of another service to
maintain.

Quorum based solution

With the release of CDH 4.1.0 we can now use a much better integrated
solution called JournalNode. All the updates are synchronized through
JournalNodes: each JournalNode holds the same data, and both NameNodes
are able to receive filesystem state updates from those daemons.

This solution is much more consistent with the Hadoop ecosystem.

Please note that the config is almost identical to the one needed for the
shared edits directory solution. The only difference is the value of
dfs.namenode.shared.edits.dir, which now points to all the JournalNodes
deployed in our cluster.

<property>
  <name>fs.default.name</name>
  <value>hdfs://example-cluster</value>
</property>
<!-- common server name -->
<property>
  <name>dfs.nameservices</name>
  <value>example-cluster</value>
</property>

<!-- HA configuration -->
<property>
  <name>dfs.ha.namenodes.example-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.example-cluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.example-cluster.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.example-cluster.nn1</name>
  <value>0.0.0.0:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.example-cluster.nn2</name>
  <value>0.0.0.0:50070</value>
</property>

<!-- Storage for edits' files -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/example-cluster</value>
</property>

<!-- Client failover -->
<property>
  <name>dfs.client.failover.proxy.provider.example-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Fencing configuration -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/user/.ssh/id_dsa</value>
</property>


<!-- Automatic failover configuration -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>

Infrastructure

In both cases you need to run the ZooKeeper-based failover controller
(hadoop-hdfs-zkfc). This daemon negotiates which NameNode should become
active and which one standby.
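A minimal sketch of setting it up, assuming a Debian-based, packaged CDH
install (the HA state znode in ZooKeeper only has to be formatted once):

# on both NameNode hosts
apt-get install hadoop-hdfs-zkfc
# initialize the HA state znode in ZooKeeper; run this once, from either NameNode host
sudo -u hdfs hdfs zkfc -formatZK
# then start the daemon on both masters
service hadoop-hdfs-zkfc start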

But that’s not all. Depending on the way you’ve chosen to deploy HA, you
need to do a few other things:

Shared edits dir

With the shared edits dir you need to deploy a networked filesystem and
mount it on both your NameNodes. After that you can run your cluster and
be happy with your new HA.
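If you are enabling HA on an existing, non-HA NameNode, the usual extra
step is to seed the shared directory and the second NameNode. A rough
outline, assuming a packaged CDH install running as the hdfs user:

# on the current NameNode, with the NameNode daemon stopped:
sudo -u hdfs hdfs namenode -initializeSharedEdits   # copy the local edits into the shared directory
# start the first NameNode again, then on the second master:
sudo -u hdfs hdfs namenode -bootstrapStandby        # pull the current namespace from the active NameNode
# finally start the second NameNode and the zkfc daemons on both masters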

Quorum based

For QJournal to operate you need to install one new package,
hadoop-hdfs-journalnode, which provides the startup scripts for the
JournalNode daemons. Choose at least three nodes that will be responsible
for handling the edits state and deploy JournalNodes on them, as sketched
below.
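A rough sketch of the deployment on each of the three chosen nodes (the
package manager commands assume a Debian-based CDH install, and the local
storage path is just an example):

# on each JournalNode host
apt-get install hadoop-hdfs-journalnode
# give the daemon a local directory for its copy of the edits, e.g. in hdfs-site.xml:
#   <property>
#     <name>dfs.journalnode.edits.dir</name>
#     <value>/data/dfs/jn</value>
#   </property>
mkdir -p /data/dfs/jn && chown hdfs:hdfs /data/dfs/jn
service hadoop-hdfs-journalnode start

Once all three JournalNodes are up, the same initializeSharedEdits and
bootstrapStandby steps from the shared edits section apply when converting
an existing NameNode.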

Conclusion

Thanks to the guys from Cloudera we can now use enterprise-grade High
Availability features for Hadoop. Eliminating the single point of
failure in your cluster is essential for easy maintainability of your
infrastructure.

Given the above choices, I’d suggest using the QJournal setup, because of
its relatively small impact on the overall cluster architecture. Its
good performance and fairly simple setup make it easy to start using
Hadoop in an HA setup.

Are you using Hadoop with HA? What are your impressions?
