What is NoSQL good for?

… or how I ended up writing a CouchDB proof of concept app?

Once upon a time I set out on a journey to discover the NoSQL land. I’ve decided that doing simple queries wouldn’t be interesting enough. That’s why I’ve chose to create an app that would be based on some NoSQL database. The main idea was to create an app, that would dynamically update itself with geographic data flowing in. Since there are myriads of geo-data that are available on the internet, you can pick your favorite one and load them into your SQL database of choice. In my case the primary source of data was a proprietary database, or more specifically – one table in it continuously updated with new data. To make that data visible on my map I needed to: * buffer the huge amount of those records – so as not to overhoul other services with large traffic, and not to flood the frontend * convert then to my representation * display them – have presentation layer in a browser – since browser-based frontend was the easiest and fastest to develop The idea of the front-end HTML page was to show new points on the map. From the moment of opening the page records that appear in database table should be shown interactively on the screen.

Toys used

For the first step I chose to use RabbitMQ broker. A queue on the broker would receive messages – one message per database table’s row. Then I’d use some simple groovy middle ware to convert the data to appropriate format and put it onto another db – this time db specific to my app. You may ask why incorporate another database. It would be good for separating environments – assuming the original data contains some vulnerable content that should be anatomised, or we just don’t feel comfortable exposing the whole database of some XYZ-system just to have access to its one table. Since for my presentation layer I chose HTML+JS without any application server-based back-end I’ve decided on CouchDB . This seemed like a perfect match for this scenario. Why? – ease of use, REST API, with JSON responses – just great for interacting with my simple front-end. The flow of things was as shown on the image below:

diagram

Avro – for the beginning

As you can see, I’ve chosen JSON as my data-format. I’ve been considering Apache Avro in the first place but using it was a real pain in the ass. Avro itself is used in Apache Hadoop as a serialization layer, so it would seem OK, but it has virtually no documentation. But once you tear through the unintuitive interface and manage to handle all those unthinkable exceptions you get a few pros for this library. It’s great in that it does not require code generation – I like it being made on the fly. It also offers sending data in binary format, which was not necessary, but never the less is a nice feature. What I certainly didn’t like about it was its orientation on the files rather than chunks of data – so it was not so obvious how should I send data through the wire. Than I found out it can produce JSON output, which would work for me, except the output could not have been parsed by other JSON libraries :) (I’ve asked on stackoverflow about that, but with no luck). If my whining haven’t put you back and still would like to see how to use Avro, try this unit test in project’s GitHub repo: AvroSimpleTest.groovy

Svenson

I’ve dropped Avro in favour of a simple JSON lib called (Svenson and that was painless. The only thing I was forced to do was create my model class in Java – the rest of the project is written in Groovy. I’ve no idea why was that necessary, and didn’t want to look into it.

RabbitMQ

Further on the way is RabbitMQ, to which records are filled by a feeding middle-ware written in Groovy. Since I use ActiveMQ on a day-to-day basis, I’ve decided to try something new. This broker is a really nice piece of software. Being written in Erlang makes it really fast. What’s more it has some extensive capabilities and is easy to approach for anyone similar with messaging (JMS and friends). For such a lightweight product it is really powerful – implements AMQP!

CouchDB

From the broker’s queue messages are again fetched by a middle-ware just to be put into CouchDB view. This database is also written in Erlang. It’s very reliable, however the way it handles refreshing view isn’t the most pleasant one – performance-wise. Word of advice – if you’re on Debian derivative, be cautious with apt-repository version. It’s rather _ancient_. Also remember to add allow_jsonp = true to you config file /opt/couchbase/etc/couchdb/local.ini. It’s not enabled by default, and not having this set would result with empty responses from the CouchDB server. The problem here is, that the browser doesn’t allow quering a web server with hostname other than the one the script originates. More on this case here. Seems like my problem could be overcame by changing url in index.html and hostname couchdb listens on to the same address. I’ve also created a view, that would expose an event by key: view code

Presenting the dots

As a back-end I’ve done some JQuery based AJAX calls – nothing too fancy. All things necessary for presentation layer are in this file.

Things to consider

Please bear in mind that this whole application is rather a playground, not a full-fledged project!! After creating all the parts I have some doubts about some architectural decisions I made. I don’t think the security have been taken into account seriously enough. Also scalability was never an issue ;-) If you have some thoughts about any of the aspects mentioned in this post, please feel free to comment or contact me directly :) And also you may try the application by yourself – it’s on GitHub.

You May Also Like

CasperJS for Java developers

Why CasperJS

Being a Java developer is kinda hard these days. Java may not be dead yet, but when keeping in sync with all the hipster JavaScript frameworks could make us feel a bit outside the playground. It’s even hard to list JavaScript frameworks with latest releases on one website.

In my current project, we are using AngularJS. It’a a nice abstraction of MV* pattern in frontend layer of any web application (we use Grails underneath). Here is a nice article with an 8-point Win List of Angular way of handling AJAX calls and updating the view. So it’s not only a funny new framework but a truly helper of keeping your code clean and neat.

But there is also another area when you can put helpful JS framework in place of plan-old-java one - functional tests. Especially when you are dealing with one page app with lots of asynchronous REST/JSON communication.

Selenium and Geb

In Java/JVM project the typical is to use Selenium with some wrapper like Geb. So you start your project, setup your CI-functional testing pipeline and… after 1 month of coding your tests stop working and being maintainable. The frameworks itselves are not bad, but the typical setup is so heavy and has so many points of failure that keeping it working in a real life project is really hard.

Here is my list of common myths about Selenium: * It allows you to record test scripts via handy GUI - maybe some static request/response sites. In modern web applications with asynchronous REST/JSON communication your tests must contain a lot of “waitFor” statements and you cannot automate where these should be included. * It allows you to test your web app against many browsers - don’t try to automate IE tests! You have to manually open your app in IE to see how it actually bahaves! * It integrates well with continuous integration servers like Jenkins - you have to setup Selenium Grid on server with X installed to run tests on Chrome or Firefox and a Windows server for IE. And the headless HtmlUnit driver lacks a lot of JS support.

So I decided to try something different and introduce a bit of JavaScript tooling in our project by using CasperJS.

Introduction

CasperJS is simple but powerful navigation scripting & testing utility for PhantomJS - scritable headless WebKit (which is an rendering engine used by Safari and Chrome). In short - CasperJS allows you to navigate and make assertions about web pages as they’d been rendered in Google Chrome. It is enough for me to automate the functional tests of my application.

If you want a gentle introduction to the world of CasperJS I suggest you to read: * Official website, especially installation guide and API * Introductionary article from CasperJS creator Nicolas Perriault * Highlevel testing with CasperJS by Kevin van Zonneveld * grails-angular-scaffolding plugin by Rob Fletcher with some working CasperJS tests

Full example

I run my test suite via following script:

casperjs test --direct --log-level=debug --testhost=localhost:8080 --includes=test/casper/includes/casper-angular.coffee,test/casper/includes/pages.coffee test/casper/specs/

casper-angular.coffe

casper.test.on "fail", (failure) ->
    casper.capture(screenshot)

testhost   = casper.cli.get "testhost"
screenshot = 'test-fail.png'

casper
    .log("Using testhost: #{testhost}", "info")
    .log("Using screenshot: #{screenshot}", "info")

casper.waitUntilVisible = (selector, message, callback) ->
    @waitFor ->
        @visible selector
    , callback, (timeout) ->
        @log("Selector [#{selector}] not visible, failing")
        withParentSelector selector, (parent) ->
            casper.log("Output of parent selector [#{parent}]")
            casper.debugHTML(parent)
        @echo message, "RED_BAR"
        @capture(screenshot)
        @test.fail(f("Wait timeout occured (%dms)", timeout))

withParentSelector = (selector, callback) ->
    if selector.lastIndexOf(" ") > 0
       parent = selector[0..selector.lastIndexOf(" ")-1]
       callback(parent)

Sample pages.coffee:

x = require('casper').selectXPath

class EditDocumentPage

    assertAt: ->
        casper.test.assertSelectorExists("div.customerAccountInfo", 'at EditDocumentPage')

    templatesTreeFirstCategory: 'ul.tree li label'
    templatesTreeFirstTemplate: 'ul.tree li a'
    closePreview: '.closePreview a'
    smallPreview: '.smallPreviewContent img'
    bigPreview: 'img.previewImage'
    confirmDelete: x("//div[@class='modal-footer']/a[1]")

casper.editDocument = new EditDocumentPage()

End a test script:

testhost = casper.cli.get "testhost" or 'localhost:8080'

casper.start "http://#{testhost}/app", ->
    @test.assertHttpStatus 302
    @test.assertUrlMatch /\/fakeLogin/, 'auto login'
    @test.assert @visible('input#Create'), 'mock login button'
    @click 'input#Create'

casper.then ->
    @test.assertUrlMatch /document#\/edit/, 'new document'
    @editDocument.assertAt()
    @waitUntilVisible @editDocument.templatesTreeFirstCategory, 'template categories not visible', ->
        @click @editDocument.templatesTreeFirstCategory
        @waitUntilVisible @editDocument.templatesTreeFirstTemplate, 'template not visible', ->
            @click @editDocument.templatesTreeFirstTemplate

casper.then ->
    @waitUntilVisible @editDocument.smallPreview, 'small preview not visible', ->
        # could be dblclick / whatever
        @mouseEvent('click', @editDocument.smallPreview)

casper.then ->
    @waitUntilVisible @editDocument.bigPreview, 'big preview should be visible', ->
        @test.assertEvalEquals ->
            $('.pageCounter').text()
        , '1/1', 'page counter should be visible'
        @click @editDocument.closePreview

casper.then ->
    @click 'button.cancel'
    @waitUntilVisible '.modal-footer', 'delete confirmation not visible', ->
        @click @editDocument.confirmDelete

casper.run ->
    @test.done()

Here is a list of CasperJS features/caveats used here:

  • Using CoffeeScript is a huge win for your test code to look neat
  • When using casper test command, beware of different (than above articles) logging setup. You can pass --direct --log-level=debug from commandline for best results. Logging is essential here since Phantom often exists without any error and you do want to know what just happened.
  • Extract your helper code into separate files and include them by using --includes switch.
  • When passing server URL as a commandline switch remember that in CoffeeScript variables are not visible between multiple source files (unless getting them via window object)
  • It’s good to override standard waitUntilVisible with capting a screenshot and making a proper log statement. In my version I also look for a parent selector and debugHTML the content of it - great for debugging what is actually rendered by the browser.
  • Selenium and Geb have a nice concept of Page Objects - an abstract models of pages rendered by your application. Using CoffeeScript you can write your own classes, bind selectors to properties and use then in your code script. Assigning the objects to casper instance will end up with quite nice syntax like @editDocument.assertAt().
  • There is some issue with CSS :first and :last selectors. I cannot get them working (but maybe I’m doing something wrong?). But in CasperJS you can also use XPath selectors which are fine for matching n-th child of some element (x("//div[@class='modal-footer']/a[1]")).
    Update: :first and :last are not CSS3 selectors, but JQuery ones. Here is a list of CSS3 selectors, all of these are supported by CasperJS. So you can use nth-child(1) is this case. Thanks Andy and Nicolas for the comments!

Working with CasperJS can lead you to a few hour stall, but after getting things working you have a new, cool tool in your box!