What is NoSQL good for?

… or how I ended up writing a CouchDB proof-of-concept app

Once upon a time I set out on a journey to discover the NoSQL land. I decided that doing simple queries wouldn't be interesting enough, so I chose to create an app based on some NoSQL database. The main idea was an app that would dynamically update itself with geographic data flowing in. Since there are myriads of geo-data available on the internet, you can pick your favorite source and load it into the SQL database of your choice. In my case the primary source of data was a proprietary database – or more specifically, one table in it, continuously updated with new data. To make that data visible on my map I needed to:

  • buffer the huge amount of those records – so as not to overload other services with heavy traffic, and not to flood the frontend,
  • convert them to my own representation,
  • display them – have a presentation layer in a browser, since a browser-based frontend was the easiest and fastest to develop.

The idea of the front-end HTML page was to show new points on the map: from the moment of opening the page, records that appear in the database table should be shown interactively on the screen.

Toys used

For the first step I chose the RabbitMQ broker. A queue on the broker would receive messages – one message per database table row. Then I'd use some simple Groovy middleware to convert the data to the appropriate format and put it into another db – this time a db specific to my app. You may ask why incorporate another database. It is good for separating environments – assuming the original data contains some sensitive content that should be anonymised, or we just don't feel comfortable exposing the whole database of some XYZ-system just to have access to one of its tables. Since for my presentation layer I chose HTML+JS without any application server-based back-end, I decided on CouchDB. This seemed like a perfect match for this scenario. Why? Ease of use, and a REST API with JSON responses – just great for interacting with my simple front-end. The flow of things was as shown on the image below:

diagram
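
To make the flow concrete, below is a minimal sketch of the consuming middleware in Groovy: it takes messages off a RabbitMQ queue and pushes each one into CouchDB over its REST API. The queue name 'geo-events', the database name 'events' and the localhost addresses are assumptions for illustration; it uses the official com.rabbitmq:amqp-client library.

import com.rabbitmq.client.*

def factory = new ConnectionFactory(host: 'localhost')
def channel = factory.newConnection().createChannel()

// One message per converted table row, already JSON-encoded upstream.
channel.basicConsume('geo-events', true, new DefaultConsumer(channel) {
    void handleDelivery(String tag, Envelope env, AMQP.BasicProperties props, byte[] body) {
        // POST the JSON document straight into CouchDB's REST API.
        def conn = new URL('http://localhost:5984/events').openConnection()
        conn.requestMethod = 'POST'
        conn.setRequestProperty('Content-Type', 'application/json')
        conn.doOutput = true
        conn.outputStream.withStream { it.write(body) }
        assert conn.responseCode == 201  // CouchDB answers 201 Created
    }
})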

Avro – for starters

As you can see, I chose JSON as my data format. I had been considering Apache Avro in the first place, but using it was a real pain in the ass. Avro itself is used in Apache Hadoop as a serialization layer, so it would seem OK, but it has virtually no documentation. Once you tear through the unintuitive interface and manage to handle all those unthinkable exceptions, though, you get a few pros from this library. It's great in that it does not require code generation – I like schemas being made on the fly. It also offers sending data in binary format, which was not necessary, but is nevertheless a nice feature. What I certainly didn't like about it was its orientation towards files rather than chunks of data – it was not so obvious how I should send data through the wire. Then I found out it can produce JSON output, which would work for me, except the output could not be parsed by other JSON libraries :) (I've asked about that on Stack Overflow, but with no luck). If my whining hasn't put you off and you'd still like to see how to use Avro, try this unit test in the project's GitHub repo: AvroSimpleTest.groovy
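
If you'd like a taste of the no-codegen style praised above before diving into that test, here is a minimal sketch; the GeoEvent record and its fields are made up for illustration, and it assumes org.apache.avro:avro on the classpath.

import org.apache.avro.Schema
import org.apache.avro.generic.*
import org.apache.avro.io.EncoderFactory

// Parse the schema at runtime - no generated classes involved.
def schema = new Schema.Parser().parse('''
    {"type": "record", "name": "GeoEvent",
     "fields": [{"name": "lat", "type": "double"},
                {"name": "lon", "type": "double"}]}
''')

def record = new GenericData.Record(schema)
record.put('lat', 52.23)
record.put('lon', 21.01)

// Serialize to Avro's JSON encoding (the output mentioned above).
def out = new ByteArrayOutputStream()
def encoder = EncoderFactory.get().jsonEncoder(schema, out)
new GenericDatumWriter(schema).write(record, encoder)
encoder.flush()
println out.toString()  // {"lat":52.23,"lon":21.01}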

Svenson

I dropped Avro in favour of a simple JSON lib called Svenson, and that was painless. The only thing I was forced to do was to create my model class in Java – the rest of the project is written in Groovy. I have no idea why that was necessary, and I didn't want to look into it.
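
For illustration, a rough sketch of a Svenson round-trip; GeoEvent is a hypothetical Java bean with lat/lon properties, and this assumes Svenson's default JSON/JSONParser entry points.

import org.svenson.JSON
import org.svenson.JSONParser

// Serialize the Java bean to JSON and parse it back into an instance.
def json = JSON.defaultJSON().forValue(new GeoEvent(lat: 52.23, lon: 21.01))
def event = JSONParser.defaultJSONParser().parse(GeoEvent, json)
assert event.lat == 52.23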

RabbitMQ

Further on the way is RabbitMQ, to which records are fed by a feeding middleware written in Groovy. Since I use ActiveMQ on a day-to-day basis, I decided to try something new. This broker is a really nice piece of software. Being written in Erlang makes it really fast. What's more, it has some extensive capabilities and is easy to approach for anyone familiar with messaging (JMS and friends). For such a lightweight product it is really powerful – it implements AMQP!
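
For completeness, a minimal sketch of the feeding side, publishing one message per new table row; the 'geo-events' queue is the same assumption as in the flow sketch above.

import com.rabbitmq.client.ConnectionFactory

def channel = new ConnectionFactory(host: 'localhost').newConnection().createChannel()
channel.queueDeclare('geo-events', true, false, false, null)  // durable, non-exclusive

// In the real middleware each row polled from the source table would be
// converted to JSON first; here the payload is hard-coded for illustration.
channel.basicPublish('', 'geo-events', null, '{"lat": 52.23, "lon": 21.01}'.bytes)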

CouchDB

From the broker's queue, messages are again fetched by a middleware, just to be put into a CouchDB view. This database is also written in Erlang. It's very reliable; however, the way it handles refreshing views isn't the most pleasant one, performance-wise. Word of advice – if you're on a Debian derivative, be cautious with the apt-repository version. It's rather _ancient_. Also remember to add allow_jsonp = true to your config file /opt/couchbase/etc/couchdb/local.ini. It's not enabled by default, and not having this set would result in empty responses from the CouchDB server. The problem here is that the browser doesn't allow querying a web server with a hostname other than the one the script originates from. More on this case here. It seems my problem could be overcome by changing the url in index.html and the hostname CouchDB listens on to the same address. I've also created a view that would expose an event by key: view code
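
I won't reproduce the exact view here, but a design document in that spirit might look like the sketch below; the database name, design document name and emitted fields are all assumptions, and the map function in the string is the JavaScript that CouchDB executes.

def designDoc = '''{
  "views": {
    "events_by_key": {
      "map": "function(doc) { if (doc.lat && doc.lon) { emit(doc._id, doc); } }"
    }
  }
}'''

def conn = new URL('http://localhost:5984/events/_design/app').openConnection()
conn.requestMethod = 'PUT'
conn.setRequestProperty('Content-Type', 'application/json')
conn.doOutput = true
conn.outputStream.withStream { it.write(designDoc.bytes) }
assert conn.responseCode == 201

// The view can then be queried (with JSONP for the cross-domain case), e.g.:
// GET http://localhost:5984/events/_design/app/_view/events_by_key?callback=cb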

Presenting the dots

In place of a back-end, the page makes some jQuery-based AJAX calls straight to CouchDB – nothing too fancy. All things necessary for the presentation layer are in this file.

Things to consider

Please bear in mind that this whole application is rather a playground, not a full-fledged project!! After creating all the parts I have some doubts about some of the architectural decisions I made. I don't think security has been taken into account seriously enough. Also, scalability was never an issue ;-) If you have some thoughts about any of the aspects mentioned in this post, please feel free to comment or contact me directly :) You may also try the application yourself – it's on GitHub.

You May Also Like

New HTTP Logger Grails plugin

I've written a new Grails plugin - httplogger. It logs:

  • request information (url, headers, cookies, method, body),
  • grails dispatch information (controller, action, parameters),
  • response information (elapsed time and body).

It is mostly useful for logging your REST traffic. Full HTTP web pages can be huge and logging them generally wastes your space. I suggest mapping all of your REST controllers with the same path prefix in UrlMappings, e.g. /rest/, and configuring this plugin with that path.
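
As an illustration of that suggestion, the mapping in grails-app/conf/UrlMappings.groovy could look roughly like this (the controller and action names are made up):

class UrlMappings {
    static mappings = {
        // keep every REST endpoint under the common /rest/ prefix,
        // so the plugin can be pointed at just that path
        "/rest/books/$id?"(controller: 'book', action: 'show')
        "/rest/authors/$id?"(controller: 'author', action: 'show')
    }
}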

Here is some simple output just to give you a taste of it.

17:16:00,331 INFO  filters.LogRawRequestInfoFilter  -
17:16:00,340 INFO  filters.LogRawRequestInfoFilter  -
17:16:00,342 INFO  filters.LogGrailsUrlsInfoFilter  -
17:16:00,731 INFO  filters.LogOutputResponseFilter  - >> #1 returned 200, took 405 ms.
17:16:00,745 INFO  filters.LogOutputResponseFilter  - >> #1 responded with '{count:0}'
17:18:55,799 INFO  filters.LogRawRequestInfoFilter  -
17:18:55,799 INFO  filters.LogRawRequestInfoFilter  -
17:18:55,800 INFO  filters.LogRawRequestInfoFilter  -
17:18:55,801 INFO  filters.LogOutputResponseFilter  - >> #2 returned 404, took 3 ms.
17:18:55,802 INFO  filters.LogOutputResponseFilter  - >> #2 responded with ''

Official plugin information can be found on the Grails plugins website: http://grails.org/plugins/httplogger, or you can browse the code on GitHub: TouK/grails-httplogger.

Spock basics

Spock (homepage) is, as its authors say, a 'testing and specification framework'. Spock combines a very elegant and natural syntax with powerful capabilities. And what is most important, it is easy to use.

One note at the very beginning: I assume that you are already familiar with the principles of Test Driven Development and that you know how to use a testing framework, for example JUnit.

So how can I start?


Writing Spock specifications is very easy. We need a basic configuration of the Spock and Groovy dependencies (if you are using a mavenized project with Eclipse, look at my previous post: Spock, Java and Maven). Once we have everything set up and running smoothly, we can write our first specs (a spec, or specification, is the equivalent of a test class in other frameworks like JUnit or TestNG).
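
If you use Gradle instead of Maven, a minimal sketch of the test dependencies could look like this (the versions are assumptions from the Spock 0.7 era; pick ones matching your Groovy version):

dependencies {
    testCompile 'org.codehaus.groovy:groovy-all:2.0.8'
    testCompile 'org.spockframework:spock-core:0.7-groovy-2.0'
}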

What is great about Spock is the fact that we can use it to test both Groovy projects and pure Java projects, or even mixed ones.


Let's go!


Every spec class must inherit from the spock.lang.Specification class; only then will the test runner recognize it as a test class and run its tests. We will write a few specs for a simple User class, plus a few tests not connected with this particular class.

We start with defining our class:
import spock.lang.*

class UserSpec extends Specification {

}
Now we can proceed to defining test fixtures and test methods.

All activities we want to perform before each test method are to be put in the def setup() {...} method, and everything we want to run after each test should be put in the def cleanup() {...} method (they are the equivalents of JUnit methods with @Before and @After annotations).

It can look like this:
class UserSpec extends Specification {
    User user
    Document document

    def setup() {
        user = new User()
        document = DocumentTestFactory.createDocumentWithTitle("doc1")
    }

    def cleanup() {
    }
}
Of course we can use field initialization for instantiating test objects:
class UserSpec extends Specification {
    User user = new User()
    Document document = DocumentTestFactory.createDocumentWithTitle("doc1")

    def setup() {
    }

    def cleanup() {
    }
}

Which is more readable or preferred? It is just a matter of taste, because according to the Spock docs the behaviour is the same in both cases.

It is worth mentioning that JUnit's @BeforeClass/@AfterClass are also present in Spock, as def setupSpec() {...} and def cleanupSpec() {...}. They will be run before the first test method and after the last one.
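
For example (a small sketch; the shared resource is hypothetical):

class UserSpec extends Specification {

    def setupSpec() {
        // runs once, before the first feature method -
        // e.g. start an expensive shared resource
    }

    def cleanupSpec() {
        // runs once, after the last feature method
    }
}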


First tests


In Spock every method in a specification class, except setup/cleanup, is treated by the runner as a test method (unless you annotate it with @Ignore).

A very interesting feature of Spock and Groovy is the ability to name methods with full sentences, just like regular strings:
class UserSpec extends Specification {
    // ...

    def "should assign comment to user"() {
        // ...
    }
}
With such a naming convention we can write a real specification and include details about the specified behaviour in the method name, which is very convenient when reading test reports and analyzing errors.

A test method (also called a feature method) is logically divided into a few blocks, each with its own purpose. Blocks are defined like labels in Java (but they are transformed with Groovy AST transformation features) and some of them must be put in the code in a specific order.

The most basic and common schema for a Spock test is:
class UserSpec extends Specification {
    // ...

    def "should assign comment to user"() {
        given:
        // do initialization of test objects
        when:
        // perform actions to be tested
        then:
        // collect and analyze results
    }
}

But there are more blocks like:
  • setup
  • expect
  • where
  • cleanup
In the next sections I am going to describe each block briefly, with little examples.

given block

This block is used to set up test objects and their state. It has to be the first block in a test and cannot be repeated. Below is a little example of how it can be used:
class UserSpec extends Specification {
    // ...

    def "should add project to user and mark user as project's owner"() {
        given:
        User user = new User()
        Project project = ProjectTestFactory.createProjectWithName("simple project")
        // ...
    }
}

In this code the given block contains the initialization of test objects and nothing more. We create a simple user without any specified attributes, and a project with a given name. In case some of these objects could be reused in more feature methods, it could be worth putting their initialization in the setup method.

when and then blocks

The when block contains the action we want to test (the Spock documentation calls it a 'stimulus'). This block always occurs in a pair with the then block, where we verify that the response satisfies certain conditions. Assume we have this simple test case:
class UserSpec extends Specification {
    // ...

    def "should assign user to comment when adding comment to user"() {
        given:
        User user = new User()
        Comment comment = new Comment()
        when:
        user.addComment(comment)
        then:
        comment.getUserWhoCreatedComment().equals(user)
    }

    // ...
}

In the when block there is a call to the tested method and nothing more. After we are sure our action was performed, we can check for the desired conditions in the then block.

The then block is very well structured: every line in it is treated by Spock as a boolean statement. That means Spock expects us to write comparisons and expressions returning true or false, so we can create a then block with statements such as:
user.getName() == "John"
user.getAge() == 40
!user.isEnabled()
Each of these lines will be treated as a single assertion and evaluated by Spock.

Sometimes we expect that our method throws an exception under given circumstances. We can write a test for it with the use of the thrown method:
class CommentSpec extends Specification {
    def "should throw exception when adding null document to comment"() {
        given:
        Comment comment = new Comment()
        when:
        comment.setCommentedDocument(null)
        then:
        thrown(RuntimeException)
    }
}

In this test we want to make sure that passing incorrect parameters is correctly handled by the tested method and that it throws an exception in response. In case you want to be certain that a method does not throw a particular exception, simply use the notThrown method.
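
A complementary sketch for the second case, reusing the classes from the earlier examples:

class CommentSpec extends Specification {
    def "should accept non-null document without exception"() {
        given:
        Comment comment = new Comment()
        when:
        comment.setCommentedDocument(DocumentTestFactory.createDocumentWithTitle("doc1"))
        then:
        notThrown(RuntimeException)
    }
}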


expect block

The expect block is primarily used when we do not want to separate the when and then blocks because it would be unnatural. It is especially useful for simple tests (and according to TDD rules all tests should be simple and short) with only one condition to check, like in this example (it is simple but should show the idea):
def "should create user with given name"() {
given:
User user = UserTestFactory.createUser("john doe")
expect:
user.getName() == "john doe"
}



More blocks!


Those were very simple tests with the standard Spock test layout and the canonical division into given/when/then parts. But Spock offers more possibilities in writing tests and provides more blocks.


setup/cleanup blocks

These two blocks have the very same functionality as the def setup and def cleanup methods in a specification. They allow us to perform some actions before and after a test. But unlike those methods (which are shared between all tests), the blocks work only in the methods they are defined in.
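
A small sketch of such method-local blocks, using a temporary file so the cleanup block has something real to do:

def "should persist user name to a temporary file"() {
    setup:
    File file = File.createTempFile("user", ".txt")
    when:
    file.text = "John"
    then:
    file.text == "John"
    cleanup:
    file.delete()
}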


where - easy way to create readable parameterized tests

Very often when we create unit tests there is a need to "feed" them with sample data to test various cases and border values. With Spock this task is very easy and straightforward. To provide test data to a feature method, we need to use the where block. Let's take a look at a little piece of code:

def "should successfully validate emails with valid syntax"() {
expect:
emailValidator.validate(email) == true
where:
email }

In this example, Spock creates a variable called email which is used when calling the method being tested. Internally, the feature method is called once, but the framework iterates over the given values and evaluates the expect/when block as many times as there are values (however, if we use the @Unroll annotation, Spock can create a separate run for each of the given values; more about it in one of the next examples).

Now, let's assume that we want our feature method to test both successful and failed validations. To achieve that goal we can create a few parameterized variables, for both the input parameter and the expected result. Here is a little example:

def "should perform validation of email addresses"() {
expect:
emailValidator.validate(email) == result
where:
email result }
Well, it looks nice, but Spock can do much better. It offers a tabular format for defining test parameters, which is much more readable and natural. Let's take a look:
def "should perform validation of email addresses"() {
expect:
emailValidator.validate(email) == result
where:
email | result
"WTF" | false
"@domain" | false
"foo@bar.com" | true
"a@test" | false
}
In this code, each column of our "table" is treated as a separate variable, and the rows are the values for subsequent test iterations.

Another useful feature of Spock when parameterizing tests is its ability to "unroll" each parameterized test. The feature method from the previous example could be defined as follows (the body stays the same, so I do not repeat it):
@Unroll("should validate email #email")
def "should perform validation of email addresses"() {
// ...
}
With that annotation, Spock generates a few methods, each with its own name, and runs them separately. We can use variables from the where block in the @Unroll argument by preceding them with the '#' sign, which signals to Spock to use their values in the generated method names.


What next?


Well, that was just a quick and short journey through Spock and its capabilities. However, with this basic tutorial you are ready to write many unit tests. In one of my future posts I am going to describe more features of Spock, focusing especially on its mocking abilities.