What Really Grinds My Gears: Apache NiFi

Introduction

Complaining and doing nothing to solve a problem. That’s what everybody does on the Internet. And that’s precisely what I am going to do. Why? Apache NiFi has recently proved to be powerful and effective tool for processing gigabytes of data in our telco integration project. Yet, sometimes I feel like I am just clicking too much to achieve small things. It hurts just like hitting your head against a door frame every time you enter your bedroom. Let’s take a closer look at it.

Navigation

Dividing complex data flows into the process groups is a great way of achieving modularization and reusability, however, navigation between them is rather painful. Imagine set of process groups with nested process groups with nested process groups… Every time I want to go from A->B->C to A->E->F I have to use a well-hidden breadcrumb for A and double click to get into E and F.

What is the problem? At the beginning I couldn’t even find the breadcrumb! I didn’t expect it to be at the very bottom of the screen as it is an untypical location for it. What I did expect however was a big shiny arrow go back. Moreover, having navigated to A->E->F I wanted to go back to A->B->C. As there was no “go forward / redo” button I had to search for components E and F among others which made me sacrifice my precious time.

Why do I need to navigate so much you ask? Well, I am lazy and have rather short memory at the same time. What developing flows means for me is checking what I did yesterday and adjusting it to current problem. Maybe if I had a possibility to open yesterday’s flow in another tab I wouldn’t change those views at all? My internet browser can obviously do that but it has some limits like copy-paste. Similar idea is a screen division. I know it is all difficult to achieve but I would find it a major improvement.

List queue

When I integrate with external data source all I have is documentation and hope for its correctness. In fact this rule applies to every new NiFi processor I try. When something fails I investigate the reason of failure. When everything seems right I ensure that it truly is. In both scenarios I check the first element in the queue.

Again, my expectations are simple. I double click the queue to see what it contains because it is the most interesting feature of a queue: it holds elements. According to my experience 95% of times when I do anything related to a queue I check its first element. Sadly, when double clicking I see a configuration window which comes useful from time to time. Still, in my opinion it is not as important as the content itself.

So how do you see elements of a queue anyway? Just like starting engine with cables: not easy but one can get used to it. Open context menu, select “List queue” and click the “info” button on the very left. Clicking the button is the weirdest step for me. It does not look like a button at all and I discovered its ‘clickability’ only because my friend told me about it.

I am sure there must be faster way to take a quick look at the queue content. I know that the current state is consistent with the processors front-end design but I do not feel a there’s such a strong link between queues and processors. What you typically do with a processor is configuration but what you typically do with a queue is checking the content details.

Error display

NiFi displays error in flows with big red label located in the top right corner of the processor / process group. I have nothing against it. Yet, what does a true programmer do when an error occurs? He or she copies an error message and enters stack overflow :)

And here comes the problem: there is no way to copy an error from NiFi GUI. My workaround is to open logs. Usually, it takes ages. Sometimes it is even impossible. As a software vendor company we struggle with the limited access to hosts and services on everyday basis. Just let me copy my own errors and i will do my job. Trust me I’m an engineer.

Surprisingly in the review my colleague noted that I can actually copy errors from Bulletin board located in the main settings menu. However, I find it a bit dissapointing as the messages are incomplete and the menu itself is hidden deeply.

Templates

Exporting templates from one NiFi instance to another is another unintuitive feature. What I expect is to right click on a process group and choose export option. What I get is a sequence of actions:

  1. Create template
  2. Open “templates” from the main settings menu in a top-right corner
  3. Choose a very small “download” button that doesn’t look like a download button

I do not understand why I can’t export a process group which is not a template. Of course it has to be imported as a template but exporting seems to be just an on demand XML generation. What are the benefits of such dependency?

Summary

I hope that this post will not be taken as offensive by any means. As a programmer I know how difficult it is to design beautiful, functional and consistent frontend even for very simple application. Therefore, I would like to think about this case study as a guidance for everyone who designs UX. Ask your users about their habits, analyse how the application is used. Automating your client’s job will bring you tons of respect.

And what is your experience with NiFi GUI? I think it looks pretty good but still may be more functional and convenient. I hope it will be soon.

Finally I would like to thanks Michał Hofman and Bartłomiej Tartanus for the review.

You May Also Like

How to use mocks in controller tests

Even since I started to write tests for my Grails application I couldn't find many articles on using mocks. Everyone is talking about tests and TDD but if you search for it there isn't many articles.

Today I want to share with you a test with mocks for a simple and complete scenario. I have a simple application that can fetch Twitter tweets and present it to user. I use REST service and I use GET to fetch tweets by id like this: http://api.twitter.com/1/statuses/show/236024636775735296.json. You can copy and paste it into your browser to see a result.

My application uses Grails 2.1 with spock-0.6 for tests. I have TwitterReaderService that fetches tweets by id, then I parse a response into my Tweet class.


class TwitterReaderService {
Tweet readTweet(String id) throws TwitterError {
try {
String jsonBody = callTwitter(id)
Tweet parsedTweet = parseBody(jsonBody)
return parsedTweet
} catch (Throwable t) {
throw new TwitterError(t)
}
}

private String callTwitter(String id) {
// TODO: implementation
}

private Tweet parseBody(String jsonBody) {
// TODO: implementation
}
}

class Tweet {
String id
String userId
String username
String text
Date createdAt
}

class TwitterError extends RuntimeException {}

TwitterController plays main part here. Users call show action along with id of a tweet. This action is my subject under test. I've implemented some basic functionality. It's easier to focus on it while writing tests.


class TwitterController {
def twitterReaderService

def index() {
}

def show() {
Tweet tweet = twitterReaderService.readTweet(params.id)
if (tweet == null) {
flash.message = 'Tweet not found'
redirect(action: 'index')
return
}

[tweet: tweet]
}
}

Let's start writing a test from scratch. Most important thing here is that I use mock for my TwitterReaderService. I do not construct new TwitterReaderService(), because in this test I test only TwitterController. I am not interested in injected service. I know how this service is supposed to work and I am not interested in internals. So before every test I inject a twitterReaderServiceMock into controller:


import grails.test.mixin.TestFor
import spock.lang.Specification

@TestFor(TwitterController)
class TwitterControllerSpec extends Specification {
TwitterReaderService twitterReaderServiceMock = Mock(TwitterReaderService)

def setup() {
controller.twitterReaderService = twitterReaderServiceMock
}
}

Now it's time to think what scenarios I need to test. This line from TwitterReaderService is the most important:


Tweet readTweet(String id) throws TwitterError

You must think of this method like a black box right now. You know nothing of internals from controller's point of view. You're only interested what can be returned for you:

  • a TwitterError can be thrown
  • null can be returned
  • Tweet instance can be returned

This list is your test blueprint. Now answer a simple question for each element: "What do I want my controller to do in this situation?" and you have plan test:

  • show action should redirect to index if TwitterError is thrown and inform about error
  • show action should redirect to index and inform if tweet is not found
  • show action should show found tweet

That was easy and straightforward! And now is the best part: we use twitterReaderServiceMock to mock each of these three scenarios!

In Spock there is a good documentation about interaction with mocks. You declare what methods are called, how many times, what parameters are given and what should be returned. Remember a black box? Mock is your black box with detailed instruction, e.g.: I expect you that if receive exactly one call to readTweet with parameter '1' then you should throw me a TwitterError. Rephrase this sentence out loud and look at this:


1 * twitterReaderServiceMock.readTweet('1') >> { throw new TwitterError() }

This is a valid interaction definition on mock! It's that easy! Here is a complete test that fails for now:


import grails.test.mixin.TestFor
import spock.lang.Specification

@TestFor(TwitterController)
class TwitterControllerSpec extends Specification {
TwitterReaderService twitterReaderServiceMock = Mock(TwitterReaderService)

def setup() {
controller.twitterReaderService = twitterReaderServiceMock
}

def "show should redirect to index if TwitterError is thrown"() {
given:
controller.params.id = '1'
when:
controller.show()
then:
1 * twitterReaderServiceMock.readTweet('1') >> { throw new TwitterError() }
0 * _._
flash.message == 'There was an error on fetching your tweet'
response.redirectUrl == '/twitter/index'
}
}

| Failure: show should redirect to index if TwitterError is thrown(pl.refaktor.twitter.TwitterControllerSpec)
| pl.refaktor.twitter.TwitterError
at pl.refaktor.twitter.TwitterControllerSpec.show should redirect to index if TwitterError is thrown_closure1(TwitterControllerSpec.groovy:29)

You may notice 0 * _._ notation. It says: I don't want any other mocks or any other methods called. Fail this test if something is called! It's a good practice to ensure that there are no more interactions than you want.

Ok, now I need to implement controller logic to handle TwitterError.


class TwitterController {

def twitterReaderService

def index() {
}

def show() {
Tweet tweet

try {
tweet = twitterReaderService.readTweet(params.id)
} catch (TwitterError e) {
log.error(e)
flash.message = 'There was an error on fetching your tweet'
redirect(action: 'index')
return
}

[tweet: tweet]
}
}

My tests passes! We have two scenarios left. Rule stays the same: TwitterReaderService returns something and we test against it. So this line is the heart of each test, change only returned values after >>:


1 * twitterReaderServiceMock.readTweet('1') >> { throw new TwitterError() }

Here is a complete test for three scenarios and controller that passes it.


import grails.test.mixin.TestFor
import spock.lang.Specification

@TestFor(TwitterController)
class TwitterControllerSpec extends Specification {

TwitterReaderService twitterReaderServiceMock = Mock(TwitterReaderService)

def setup() {
controller.twitterReaderService = twitterReaderServiceMock
}

def "show should redirect to index if TwitterError is thrown"() {
given:
controller.params.id = '1'
when:
controller.show()
then:
1 * twitterReaderServiceMock.readTweet('1') >> { throw new TwitterError() }
0 * _._
flash.message == 'There was an error on fetching your tweet'
response.redirectUrl == '/twitter/index'
}

def "show should inform about not found tweet"() {
given:
controller.params.id = '1'
when:
controller.show()
then:
1 * twitterReaderServiceMock.readTweet('1') >> null
0 * _._
flash.message == 'Tweet not found'
response.redirectUrl == '/twitter/index'
}


def "show should show found tweet"() {
given:
controller.params.id = '1'
when:
controller.show()
then:
1 * twitterReaderServiceMock.readTweet('1') >> new Tweet()
0 * _._
flash.message == null
response.status == 200
}
}

class TwitterController {

def twitterReaderService

def index() {
}

def show() {
Tweet tweet

try {
tweet = twitterReaderService.readTweet(params.id)
} catch (TwitterError e) {
log.error(e)
flash.message = 'There was an error on fetching your tweet'
redirect(action: 'index')
return
}

if (tweet == null) {
flash.message = 'Tweet not found'
redirect(action: 'index')
return
}

[tweet: tweet]
}
}

The most important thing here is that we've tested controller-service interaction without logic implementation in service! That's why mock technique is so useful. It decouples your dependencies and let you focus on exactly one subject under test. Happy testing!