{"id":10656,"date":"2012-10-30T12:40:00","date_gmt":"2012-10-30T11:40:00","guid":{"rendered":"http:\/\/zygm0nt.github.com\/blog\/2012\/10\/30\/hadoop-ha"},"modified":"2023-03-22T16:34:23","modified_gmt":"2023-03-22T15:34:23","slug":"hadoop-ha-setup","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2012\/10\/30\/hadoop-ha-setup\/","title":{"rendered":"Hadoop HA setup"},"content":{"rendered":"<p>With the advent of Hadoop&#8217;s 2.x version, there is finally a working<br \/>\nHigh-Availability solution &#8211; two of them, in fact. Both are now easy to<br \/>\nconfigure and use, and they no longer require external<br \/>\ncomponents, like<br \/>\n<a href=\"http:\/\/blog.cloudera.com\/blog\/2009\/07\/hadoop-ha-configuration\/\">DRBD<\/a>.<br \/>\nIt is all neatly packed into the Cloudera Hadoop distribution &#8211; the<br \/>\nprecursor of this solution.<\/p>\n<p>Read on to find out how to use it.<\/p>\n<p><!-- more --><\/p>\n<p>The most important weakness of previous Hadoop releases was the<br \/>\nsingle point of failure, which happened to be the NameNode. The NameNode, as a key<br \/>\ncomponent of every Hadoop cluster, is responsible for managing<br \/>\nfilesystem namespace information and block locations. Losing its data results in losing all the data<br \/>\nstored on the DataNodes: HDFS is no longer able to locate specific files<br \/>\nor their blocks. This renders your cluster inoperable.<\/p>\n<p>So it is crucial to be able to detect and counter problems with the NameNode.<br \/>\nThe most desirable behavior is to have a hot backup that would ensure<br \/>\nno-downtime cluster operation. To achieve this, the second NameNode<br \/>\nneeds up-to-date information on the filesystem metadata, and it needs<br \/>\nto be already up and running. 
Starting a NameNode with an existing set of data<br \/>\nmay easily take many minutes, as it has to parse the actual filesystem state.<\/p>\n<p>The previously used solution &#8211; deploying a SecondaryNameNode &#8211; was somewhat<br \/>\nflawed. It took a long time to recover after a failure, and it was not a<br \/>\nhot-backup solution, which added to the problem. Some other<br \/>\nsolution was required.<\/p>\n<p>So, what needed to be made redundant were the edits dir contents and<br \/>\nthe block location maps, which each of the DataNodes &#8211;<br \/>\nin case of an HA deployment &#8211; must send to both NameNodes. This was accomplished in<br \/>\ntwo steps: the first, with the release of the CDH 4 beta, was a solution based<br \/>\non a shared edits directory; then, with CDH 4.1, came a quorum-based solution.<\/p>\n<p>Find out how to configure both on your cluster.<\/p>\n<h2 id=\"shared-edits-directory-solution\">Shared edits directory solution<\/h2>\n<p>This kind of setup assumes that a shared storage directory exists<br \/>\nin the cluster. It should be deployed using some kind of<br \/>\nnetwork-based filesystem. 
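As a hypothetical sketch, the share could be mounted on both NameNodes like this (the filer host and export path are example values; the mount point matches the shared edits dir used in the config below):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\"># example only: mount the shared storage under \/mnt\/filer1 on each NameNode\r\nmount -t nfs filer1:\/export\/dfs \/mnt\/filer1<\/pre>\n<p>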
You could try with NFS or GlusterFS.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"xml\">&lt;property&gt;\r\n  &lt;name&gt;fs.default.name&lt;\/name&gt;\r\n  &lt;value&gt;hdfs:\/\/example-cluster&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"xml\">&lt;!-- common server name --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.nameservices&lt;\/name&gt;\r\n  &lt;value&gt;example-cluster&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- HA configuration --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.namenodes.example-cluster&lt;\/name&gt;\r\n  &lt;value&gt;nn1,nn2&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.rpc-address.example-cluster.nn1&lt;\/name&gt;\r\n  &lt;value&gt;master1:8020&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.rpc-address.example-cluster.nn2&lt;\/name&gt;\r\n  &lt;value&gt;master2:8020&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.http-address.example-cluster.nn1&lt;\/name&gt;\r\n  &lt;value&gt;0.0.0.0:50070&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.http-address.example-cluster.nn2&lt;\/name&gt;\r\n  &lt;value&gt;0.0.0.0:50070&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Storage for edits' files --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.shared.edits.dir&lt;\/name&gt;\r\n  &lt;value&gt;file:\/\/\/mnt\/filer1\/dfs\/ha-name-dir-shared&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Client failover --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.client.failover.proxy.provider.example-cluster&lt;\/name&gt;\r\n  &lt;value&gt;org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Fencing configuration --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.fencing.methods&lt;\/name&gt;\r\n  
&lt;value&gt;sshfence&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.fencing.ssh.private-key-files&lt;\/name&gt;\r\n  &lt;value&gt;\/home\/user\/.ssh\/id_dsa&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Automatic failover configuration --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.automatic-failover.enabled&lt;\/name&gt;\r\n  &lt;value&gt;true&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;ha.zookeeper.quorum&lt;\/name&gt;\r\n  &lt;value&gt;zk1:2181,zk2:2181,zk3:2181&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n<\/pre>\n<p>This setup is quite OK, as long as you&#8217;re comfortable with maintaining a<br \/>\nseparate service (network storage) for handling the HA state. It seems<br \/>\nerror-prone to me, because it adds another service whose high<br \/>\navailability must itself be ensured. NFS seems to be a bad choice here,<br \/>\nbecause AFAIK it does not offer HA out of the box.<\/p>\n<p>On the other hand, we have GlusterFS, a distributed filesystem<br \/>\nthat you can deploy on multiple bricks with an increased replication level.<\/p>\n<p>Nevertheless, it still brings the additional burden of another service to<br \/>\nmaintain.<\/p>\n<h2 id=\"quorum-based-solution\">Quorum based solution<\/h2>\n<p>With the release of CDH 4.1.0 we are able to use a much better<br \/>\nintegrated solution, called JournalNode. All the updates are now<br \/>\nsynchronized through JournalNodes. Each JournalNode has the same data,<br \/>\nand all the NameNodes are able to receive filesystem state updates from<br \/>\nthose daemons.<\/p>\n<p>This solution is much more consistent with the Hadoop ecosystem.<\/p>\n<p>Please note that the config is almost identical to the one needed for<br \/>\nthe shared edits directory solution. The only difference is the value of<br \/>\n<em>dfs.namenode.shared.edits.dir<\/em>. 
This now points to all the JournalNodes<br \/>\ndeployed in our cluster.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"xml\">&lt;property&gt;\r\n  &lt;name&gt;fs.default.name&lt;\/name&gt;\r\n  &lt;value&gt;hdfs:\/\/example-cluster&lt;\/value&gt;\r\n&lt;\/property&gt;<\/pre>\n<\/div>\n<div>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"xml\">&lt;!-- common server name --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.nameservices&lt;\/name&gt;\r\n  &lt;value&gt;example-cluster&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- HA configuration --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.namenodes.example-cluster&lt;\/name&gt;\r\n  &lt;value&gt;nn1,nn2&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.rpc-address.example-cluster.nn1&lt;\/name&gt;\r\n  &lt;value&gt;master1:8020&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.rpc-address.example-cluster.nn2&lt;\/name&gt;\r\n  &lt;value&gt;master2:8020&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.http-address.example-cluster.nn1&lt;\/name&gt;\r\n  &lt;value&gt;0.0.0.0:50070&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.http-address.example-cluster.nn2&lt;\/name&gt;\r\n  &lt;value&gt;0.0.0.0:50070&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Storage for edits' files --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.namenode.shared.edits.dir&lt;\/name&gt;\r\n  &lt;value&gt;qjournal:\/\/node1:8485;node2:8485;node3:8485\/example-cluster&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Client failover --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.client.failover.proxy.provider.example-cluster&lt;\/name&gt;\r\n  &lt;value&gt;org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Fencing configuration --&gt;\r\n&lt;property&gt;\r\n  
&lt;name&gt;dfs.ha.fencing.methods&lt;\/name&gt;\r\n  &lt;value&gt;sshfence&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.fencing.ssh.private-key-files&lt;\/name&gt;\r\n  &lt;value&gt;\/home\/user\/.ssh\/id_dsa&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n\r\n&lt;!-- Automatic failover configuration --&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;dfs.ha.automatic-failover.enabled&lt;\/name&gt;\r\n  &lt;value&gt;true&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;ha.zookeeper.quorum&lt;\/name&gt;\r\n  &lt;value&gt;zk1:2181,zk2:2181,zk3:2181&lt;\/value&gt;\r\n&lt;\/property&gt;<\/pre>\n<\/div>\n<h1 id=\"infrastructure\">Infrastructure<\/h1>\n<p>In both cases you need to run the ZooKeeper-based Failover Controller<br \/>\n(<em>hadoop-hdfs-zkfc<\/em>). This daemon negotiates which NameNode should<br \/>\nbecome active and which should remain standby.<\/p>\n<p>But that&#8217;s not all. Depending on the way you&#8217;ve chosen to deploy HA, you<br \/>\nneed to do a few other things:<\/p>\n<h2 id=\"shared-edits-dir\">Shared edits dir<\/h2>\n<p>With a shared edits dir you need to deploy a networked filesystem and mount<br \/>\nit on your NameNodes. After that you can run your cluster and be happy<br \/>\nwith your new HA.<\/p>\n<h2 id=\"quroum-based\">Quorum based<\/h2>\n<p>For QJournal to operate you need to install one new package, called<br \/>\n<em>hadoop-hdfs-journalnode<\/em>. It provides startup scripts for the Journal<br \/>\nNode daemons. Choose at least three nodes that will be responsible for<br \/>\nhandling the edits state, and deploy JournalNodes on them.<\/p>\n<h1 id=\"conclusion\">Conclusion<\/h1>\n<p>Thanks to the guys from Cloudera we can now use enterprise-grade High<br \/>\nAvailability features for Hadoop. 
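The infrastructure steps above might look roughly like this on a CDH 4.1 cluster (a sketch only; package and service names as shipped by Cloudera, so verify them against your distribution):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\"># on each of the (at least three) journal nodes\r\napt-get install hadoop-hdfs-journalnode\r\nservice hadoop-hdfs-journalnode start\r\n\r\n# run once, from one of the NameNodes: initialize the HA state in ZooKeeper\r\nhdfs zkfc -formatZK\r\n\r\n# on both NameNodes\r\nservice hadoop-hdfs-zkfc start<\/pre>\n<p>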
Eliminating the single point of<br \/>\nfailure in your cluster is essential for easy maintainability of your<br \/>\ninfrastructure.<\/p>\n<p>Given the above choices, I&#8217;d suggest using the QJournal setup, because of<br \/>\nits relatively small impact on the overall cluster architecture. Its<br \/>\ngood performance and fairly simple setup let users easily<br \/>\nstart running Hadoop in an HA setup.<\/p>\n<p>Are you using Hadoop with HA? What are your impressions?<\/p>\n","protected":false},"excerpt":{"rendered":"With the advent of Hadoop&#8217;s 2.x version, there finally is a working High-Availability solution. Even two of those.&hellip;\n","protected":false},"author":11,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10656"}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=10656"}],"version-history":[{"count":9,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10656\/revisions"}],"predecessor-version":[{"id":15544,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10656\/revisions\/15544"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=10656"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=10656"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=10656"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}