[:en] Operational problems with Zookeeper

This post is a summary of what has been presented by Kathleen Ting on StrangeLoop conference. You can watch the original here: http://www.infoq.com/presentations/Misconfiguration-ZooKeeper I’ve decided to put this selection here for quick reference. …This post is a summary of what has been presented by Kathleen Ting on StrangeLoop conference. You can watch the original here: http://www.infoq.com/presentations/Misconfiguration-ZooKeeper I’ve decided to put this selection here for quick reference. …

This post is a summary of what has been presented by Kathleen Ting on
StrangeLoop conference. You can watch the original here:
http://www.infoq.com/presentations/Misconfiguration-ZooKeeper

I’ve decided to put this selection here for quick reference.

Connection mismanagement

  • too many connections
    WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@247] - Too many connections from /xx.x.xx.xxx - max is 60
    
  • running out of ZK connections?
    • set maxClientCnxns=200 in zoo.cfg
  • HBase client leaking connections?
    • fixed in HBASE-3777, HBASE-4773, HBASE-5466
    • manually close connections
  • connection closes prematurely
    ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately.
  • in hbase-site.xml set hbase.zookeeper.recoverable.waittime=30000ms
  • pig hangs connecting to HBase

    CAUSE: location of ZK quorum is not known to Pig

    • use Pig 10, which includes PIG-2115
    • if there is an overlap between TaskTrackers and ZK quorum nodes
      • set hbase.zookeeper.quorum to final in hbase-site.xml
      • otherwise add hbaze.zoopeeker.quorum=hadoophbasemaster.lan:2181 in pig.properties
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectionException: Connection refused!

Time mismanagement

  • client session timed out
INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session <id>, timeout of 40000ms exceeded
    • ZK and HBase need the same session timeout values
      • zoo.cfg: maxSession=Timeout=180000
      • hbase-site.xml: zookeeper.session.timeout=180000
    • don’t co-locate ZK with IO-intense DataNode or RegionServer
    • specify right amount of heap and tune GC flags
      • turn on parallel/CMS/incremental GC
  •  
  • clients lose connections
    WARN org.apache.zookeeper.ClientCnxn - Session <id> for server <name>, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Broken pipe
    
    • don’t use SSD drive for ZK transaction log

Disk management

  • unable to load database – unable to run quorum server
    FATAL Unable to load database on disk !  java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for <file> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152)!
    
    • archive and wipe /var/zookeeper/version-2 if other two ZK servers
      are running
  • unable to load database – unreasonable length exception
    FATAL Unable to load database on disk java.io.IOException: Unreasonable length = 1048583 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
    • server allows a client to set data larger than the server can read from disk
    • if a znode is not readable, increase jute.maxbuffer
      • look for "Packet len <xx> is out of range" in the client log
      • increase it by 20%
      • set in JVMFLAGS="-Djute.maxbuffer=yy" bin/zkCli.sh
      • fixed in ZOOKEEPER-151
  • failure to follow leader
    WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader java.net.SocketTimeoutException: Read timed out

    CAUSE:

    • disk IO contention, network issues
    • ZK snapshot is too large (lots of ZK nodes)

    SOLVE:

    • reduce IO contention by putting dataDir on dedicated spindle
    • increase initLimit on all ZK servers and restart, see
      ZOOKEEPER-1521
    • monitor network

Best Practices

DOs

  • separate spindles for dataDir & dataLogDir
  • allocate 3 or 5 ZK servers
  • tune garbage collection
  • run zkCleanup.sh script via cron

DON’Ts

  • dont’ co-locate ZK with I/O intense DataNode or RegionServer
  • don’t use SSD drive for ZK transaction log

You may use Zookeeper as an observer – a non-voting member:

  • in zoo.cfg
    peerType=observer
You May Also Like

Simple HBase ORM

When dealing with data stored in HBase, you are quick to come to a conclusion, that it is extremaly inconvenient to reach to it via HBase native API. It is very verbose and you always need to convert between bytes and simple types - a pain. While I wa...When dealing with data stored in HBase, you are quick to come to a conclusion, that it is extremaly inconvenient to reach to it via HBase native API. It is very verbose and you always need to convert between bytes and simple types - a pain. While I wa...

Thought static method can’t be easy to mock, stub nor track? Wrong!

No matter why, no matter is it a good idea. Sometimes one just wants to check or it's necessary to be done. Mock a static method, woot? Impossibru!

In pure Java world it is still a struggle. But Groovy allows you to do that really simple. Well, not groovy alone, but with a great support of Spock.

Lets move on straight to the example. To catch some context we have an abstract for the example needs. A marketing project with a set of offers. One to many.

import spock.lang.Specification

class OfferFacadeSpec extends Specification {

    OfferFacade facade = new OfferFacade()

    def setup() {
        GroovyMock(Project, global: true)
    }

    def 'delegates an add offer call to the domain with proper params'() {
        given:
            Map params = [projId: projectId, name: offerName]

        when:
            Offer returnedOffer = facade.add(params)

        then:
            1 * Project.addOffer(projectId, _) >> { projId, offer -> offer }
            returnedOffer.name == params.name

        where:
            projectId | offerName
            1         | 'an Offer'
            15        | 'whasup!?'
            123       | 'doskonała oferta - kup teraz!'
    }
}
So we test a facade responsible for handling "add offer to the project" call triggered  somewhere in a GUI.
We want to ensure that static method Project.addOffer(long, Offer) will receive correct params when java.util.Map with user form input comes to the facade.add(params).
This is unit test, so how Project.addOffer() works is out of scope. Thus we want to stub it.

The most important is a GroovyMock(Project, global: true) statement.
What it does is modifing Project class to behave like a Spock's mock. 
GroovyMock() itself is a method inherited from SpecificationThe global flag is necessary to enable mocking static methods.
However when one comes to the need of mocking static method, author of Spock Framework advice to consider redesigning of implementation. It's not a bad advice, I must say.

Another important thing are assertions at then: block. First one checks an interaction, if the Project.addOffer() method was called exactly once, with a 1st argument equal to the projectId and some other param (we don't have an object instance yet to assert anything about it).
Right shit operator leads us to the stub which replaces original method implementation by such statement.
As a good stub it does nothing. The original method definition has return type Offer. The stub needs to do the same. So an offer passed as the 2nd argument is just returned.
Thanks to this we can assert about name property if it's equal with the value from params. If no return was designed the name could be checked inside the stub Closure, prefixed with an assert keyword.

Worth of  mentioning is that if you want to track interactions of original static method implementation without replacing it, then you should try using GroovySpy instead of GroovyMock.

Unfortunately static methods declared at Java object can't be treated in such ways. Though regular mocks and whole goodness of Spock can be used to test pure Java code, which is awesome anyway :)No matter why, no matter is it a good idea. Sometimes one just wants to check or it's necessary to be done. Mock a static method, woot? Impossibru!

In pure Java world it is still a struggle. But Groovy allows you to do that really simple. Well, not groovy alone, but with a great support of Spock.

Lets move on straight to the example. To catch some context we have an abstract for the example needs. A marketing project with a set of offers. One to many.

import spock.lang.Specification

class OfferFacadeSpec extends Specification {

    OfferFacade facade = new OfferFacade()

    def setup() {
        GroovyMock(Project, global: true)
    }

    def 'delegates an add offer call to the domain with proper params'() {
        given:
            Map params = [projId: projectId, name: offerName]

        when:
            Offer returnedOffer = facade.add(params)

        then:
            1 * Project.addOffer(projectId, _) >> { projId, offer -> offer }
            returnedOffer.name == params.name

        where:
            projectId | offerName
            1         | 'an Offer'
            15        | 'whasup!?'
            123       | 'doskonała oferta - kup teraz!'
    }
}
So we test a facade responsible for handling "add offer to the project" call triggered  somewhere in a GUI.
We want to ensure that static method Project.addOffer(long, Offer) will receive correct params when java.util.Map with user form input comes to the facade.add(params).
This is unit test, so how Project.addOffer() works is out of scope. Thus we want to stub it.

The most important is a GroovyMock(Project, global: true) statement.
What it does is modifing Project class to behave like a Spock's mock. 
GroovyMock() itself is a method inherited from SpecificationThe global flag is necessary to enable mocking static methods.
However when one comes to the need of mocking static method, author of Spock Framework advice to consider redesigning of implementation. It's not a bad advice, I must say.

Another important thing are assertions at then: block. First one checks an interaction, if the Project.addOffer() method was called exactly once, with a 1st argument equal to the projectId and some other param (we don't have an object instance yet to assert anything about it).
Right shit operator leads us to the stub which replaces original method implementation by such statement.
As a good stub it does nothing. The original method definition has return type Offer. The stub needs to do the same. So an offer passed as the 2nd argument is just returned.
Thanks to this we can assert about name property if it's equal with the value from params. If no return was designed the name could be checked inside the stub Closure, prefixed with an assert keyword.

Worth of  mentioning is that if you want to track interactions of original static method implementation without replacing it, then you should try using GroovySpy instead of GroovyMock.

Unfortunately static methods declared at Java object can't be treated in such ways. Though regular mocks and whole goodness of Spock can be used to test pure Java code, which is awesome anyway :)

Super Confitura Man

How Super Confitura Man came to be :)

Recently at TouK we had a one-day hackathon. There was no main theme for it, you just could post a project idea, gather people around it and hack on that idea for a whole day - drinks and pizza included.

My main idea was to create something that could be fun to build and be useful somehow to others. I’d figured out that since Confitura was just around a corner I could make a game, that would be playable at TouK’s booth at the conference venue. This idea seemed good enough to attract Rafał Nowak @RNowak3 and Marcin Jasion @marcinjasion - two TouK employees, that with me formed a team for the hackathon.

Confitura 01

The initial plan was to develop a simple mario-style game, with preceduraly generated levels, random collectible items and enemies. One of the ideas was to introduce Confitura Man as the main character, but due to time constraints, this fall through. We’ve decided to just choose a random available sprite for a character - hence the onion man :)

Confitura 02

How the game is played?

Since we wanted to have a scoreboard and have unique users, we’ve printed out QR codes. A person that would like to play the game could pick up a QR code, show it against a camera attached to the play booth. The start page scanned the QR code and launched the game with username read from paper code.

The rest of the game was playable with gamepad or keyboard.

Confitura game screen

Technicalities

Writing a game takes a lot of time and effort. We wanted to deliver, so we’ve decided to spend some time in the days before the hackathon just to bootstrap the technology stack of our enterprise.

We’ve decided that the game would be written in some Javascript based engine, with Google Chrome as a web platform. There are a lot of HTML5 game engines - list of html5 game engines and you could easily create a game with each and every of them. We’ve decided to use Phaser IO which handles a lot of difficult, game-related stuff on its own. So, we didn’t have to worry about physics, loading and storing assets, animations, object collisions, controls input/output. Go see for yourself, it is really nice and easy to use.

Scoreboard would be a rip-off from JIRA Survivor with stats being served from some web server app. To make things harder, the backend server was written in Clojure. With no experience in that language in the team, it was a bit risky, but the tasks of the server were trivial, so if all that clojure effort failed, it could be rewritten in something we know.

Statistics

During the whole Confitura day there were 69 unique players (69 QR codes were used), and 1237 games were played. The final score looked like this:

  1. Barister Lingerie 158 - 1450 points
  2. Boilerdang Custardbath 386 - 1060 points
  3. Benadryl Clarytin 306 - 870 points

And the obligatory scoreboard screenshot:

Confitura 03

Obstacles

The game, being created in just one day, had to have problems :) It wasn’t play tested enough, there were some rough edges. During the day we had to make a few fixes:

  • the server did not respect the highest score by specific user, it was just overwritting a user’s score with it’s latest one,
  • there was one feature not supported on keyboard, that was available on gamepad - turbo button
  • server was opening a database connection each time it got a request, so after around 5 minutes it would exhaust open file limit for MongoDB (backend database), this was easily fixed - thou the fix is a bit hackish :)

These were easily identified and fixed. Unfortunately there were issues that we were unable to fix while the event was on:

  • google chrome kept asking for the permission to use webcam - this was very annoying, and all the info found on the web did not work - StackOverflow thread
  • it was hard to start the game with QR code - either the codes were too small, or the lighting around that area was inappropriate - I think this issue could be fixed by printing larger codes,

Technology evaluation

All in all we were pretty happy with the chosen stack. Phaser was easy to use and left us with just the fun parts of the game creation process. Finding the right graphics with appropriate licensing was rather hard. We didn’t have enough time to polish all the visual aspects of the game before Confitura.

Writing a server in clojure was the most challenging part, with all the new syntax and new libraries. There were tasks, trivial in java/scala, but hard in Clojure - at least for a whimpy beginners :) Nevertheless Clojure seems like a really handy tool and I’d like to dive deeper into its ecosystem.

Source code

All of the sources for the game can be found here TouK/confitura-man.

The repository is split into two parts:

  • game - HTML5 game
  • server - clojure based backend server

To run the server you need to have a local MongoDB installation. Than in server’s directory run: $ lein ring server-headless This will start a server on http://localhost:3000

To run the game you need to install dependencies with bower and than run $ grunt from game’s directory.

To launch the QR reading part of the game, you enter http://localhost:9000/start.html. After scanning the code you’ll be redirected to http://localhost:9000/index.html - and the game starts.

Conclusion

Summing up, it was a great experience creating the game. It was fun to watch people playing the game. And even with all those glitches and stupid graphics, there were people vigorously playing it, which was awesome.

Thanks to Rafał and Michał for great coding experience, and thanks to all the players of our stupid little game. If you’d like to ask me about anything - feel free to contact me by mail or twitter @zygm0nt

Recently at TouK we had a one-day hackathon. There was no main theme for it, you just could post a project idea, gather people around it and hack on that idea for a whole day - drinks and pizza included.

My main idea was to create something that could be fun to build and be useful somehow to others. I’d figured out that since Confitura was just around a corner I could make a game, that would be playable at TouK’s booth at the conference venue. This idea seemed good enough to attract >Conclusion