What Really Grinds My Gears: Apache NiFi

Introduction

Complaining and doing nothing to solve a problem. That’s what everybody does on the Internet. And that’s precisely what I am going to do. Why? Apache NiFi has recently proved to be powerful and effective tool for processing gigabytes of data in our telco integration project. Yet, sometimes I feel like I am just clicking too much to achieve small things. It hurts just like hitting your head against a door frame every time you enter your bedroom. Let’s take a closer look at it.

Navigation

Dividing complex data flows into the process groups is a great way of achieving modularization and reusability, however, navigation between them is rather painful. Imagine set of process groups with nested process groups with nested process groups… Every time I want to go from A->B->C to A->E->F I have to use a well-hidden breadcrumb for A and double click to get into E and F.

What is the problem? At the beginning I couldn’t even find the breadcrumb! I didn’t expect it to be at the very bottom of the screen as it is an untypical location for it. What I did expect however was a big shiny arrow go back. Moreover, having navigated to A->E->F I wanted to go back to A->B->C. As there was no “go forward / redo” button I had to search for components E and F among others which made me sacrifice my precious time.

Why do I need to navigate so much you ask? Well, I am lazy and have rather short memory at the same time. What developing flows means for me is checking what I did yesterday and adjusting it to current problem. Maybe if I had a possibility to open yesterday’s flow in another tab I wouldn’t change those views at all? My internet browser can obviously do that but it has some limits like copy-paste. Similar idea is a screen division. I know it is all difficult to achieve but I would find it a major improvement.

List queue

When I integrate with external data source all I have is documentation and hope for its correctness. In fact this rule applies to every new NiFi processor I try. When something fails I investigate the reason of failure. When everything seems right I ensure that it truly is. In both scenarios I check the first element in the queue.

Again, my expectations are simple. I double click the queue to see what it contains because it is the most interesting feature of a queue: it holds elements. According to my experience 95% of times when I do anything related to a queue I check its first element. Sadly, when double clicking I see a configuration window which comes useful from time to time. Still, in my opinion it is not as important as the content itself.

So how do you see elements of a queue anyway? Just like starting engine with cables: not easy but one can get used to it. Open context menu, select “List queue” and click the “info” button on the very left. Clicking the button is the weirdest step for me. It does not look like a button at all and I discovered its ‘clickability’ only because my friend told me about it.

I am sure there must be faster way to take a quick look at the queue content. I know that the current state is consistent with the processors front-end design but I do not feel a there’s such a strong link between queues and processors. What you typically do with a processor is configuration but what you typically do with a queue is checking the content details.

Error display

NiFi displays error in flows with big red label located in the top right corner of the processor / process group. I have nothing against it. Yet, what does a true programmer do when an error occurs? He or she copies an error message and enters stack overflow :)

And here comes the problem: there is no way to copy an error from NiFi GUI. My workaround is to open logs. Usually, it takes ages. Sometimes it is even impossible. As a software vendor company we struggle with the limited access to hosts and services on everyday basis. Just let me copy my own errors and i will do my job. Trust me I’m an engineer.

Surprisingly in the review my colleague noted that I can actually copy errors from Bulletin board located in the main settings menu. However, I find it a bit dissapointing as the messages are incomplete and the menu itself is hidden deeply.

Templates

Exporting templates from one NiFi instance to another is another unintuitive feature. What I expect is to right click on a process group and choose export option. What I get is a sequence of actions:

  1. Create template
  2. Open “templates” from the main settings menu in a top-right corner
  3. Choose a very small “download” button that doesn’t look like a download button

I do not understand why I can’t export a process group which is not a template. Of course it has to be imported as a template but exporting seems to be just an on demand XML generation. What are the benefits of such dependency?

Summary

I hope that this post will not be taken as offensive by any means. As a programmer I know how difficult it is to design beautiful, functional and consistent frontend even for very simple application. Therefore, I would like to think about this case study as a guidance for everyone who designs UX. Ask your users about their habits, analyse how the application is used. Automating your client’s job will bring you tons of respect.

And what is your experience with NiFi GUI? I think it looks pretty good but still may be more functional and convenient. I hope it will be soon.

Finally I would like to thanks Michał Hofman and Bartłomiej Tartanus for the review.

You May Also Like

Sygnalizacyjne ABC

Poniższy artykuł oparty jest na wspaniałej pozycji książkowej “System Sygnalizacji nr 7 G. Danielewicz, W.Kabaciński”. Gorąco zachęcam do…

33rd Degree day 1 review

33rd Degree is over. After the one last year, my expectations were very high, but Grzegorz Duda once again proved he's more than able to deliver. With up to five tracks (most of the time: four presentations + one workshop), and ~650 attendees,  there was a lot to see and a lot to do, thus everyone will probably have a little bit different story to tell. Here is mine.

Twitter: From Ruby on Rails to the JVM

Raffi Krikorian talking about Twitter and JVM
The conference started with  Raffi Krikorian from Twitter, talking about their use for JVM. Twitter was build with Ruby but with their performance management a lot of the backend was moved to Scala, Java and Closure. Raffi noted, that for Ruby programmers Scala was easier to grasp than Java, more natural, which is quite interesting considering how many PHP guys move to Ruby these days because of the same reasons. Perhaps the path of learning Jacek Laskowski once described (Java -> Groovy -> Scala/Closure) may be on par with PHP -> Ruby -> Scala. It definitely feels like Scala is the holy grail of languages these days.

Raffi also noted, that while JVM delivered speed and a concurrency model to Twitter stack, it wasn't enough, and they've build/customized their own Garbage Collector. My guess is that Scala/Closure could also be used because of a nice concurrency solutions (STM, immutables and so on).

Raffi pointed out, that with the scale of Twitter, you easily get 3 million hits per second, and that means you probably have 3 edge cases every second. I'd love to learn listen to lessons they've learned from this.

 

Complexity of Complexity


The second keynote of the first day, was Ken Sipe talking about complexity. He made a good point that there is a difference between complex and complicated, and that we often recognize things as complex only because we are less familiar with them. This goes more interesting the moment you realize that the shift in last 20 years of computer languages, from the "Less is more" paradigm (think Java, ASM) to "More is better" (Groovy/Scala/Closure), where you have more complex language, with more powerful and less verbose syntax, that is actually not more complicated, it just looks less familiar.

So while 10 years ago, I really liked Java as a general purpose language for it's small set of rules that could get you everywhere, it turned out that to do most of the real world stuff, a lot of code had to be written. The situation got better thanks to libraries/frameworks and so on, but it's just patching. New languages have a lot of stuff build into, which makes their set of rules and syntax much more complex, but once you get familiar, the real world usage is simple, faster, better, with less traps laying around, waiting for you to fall.

Ken also pointed out, that while Entity Service Bus looks really simple on diagrams, it's usually very difficult and complicated to use from the perspective of the programmer. And that's probably why it gets chosen so often - the guys selling/buying it, look no deeper than on the diagram.

 

Pointy haired bosses and pragmatic programmers: Facts and Fallacies of Software Development

Venkat Subramaniam with Dima
Dima got lucky. Or maybe not.

Venkat Subramaniam is the kind of a speaker that talk about very simple things in a way, which makes everyone either laugh or reflect. Yes, he is a showman, but hey, that's actually good, because even if you know the subject quite well, his talks are still very entertaining.
This talk was very generic (here's my thesis: the longer the title, the more generic the talk will be), interesting and fun, but at the end I'm unable to see anything new I'd have learned, apart from the distinction between Dynamic vs Static and Strong vs Weak typing, which I've seen the last year, but managed to forgot. This may be a very interesting argument for all those who are afraid of Groovy/Ruby, after bad experience with PHP or Perl.

Build Trust in Your Build to Deployment Flow!


Frederic Simon talked about DevOps and deployment, and that was a miss in my  schedule, because of two reasons. First, the talk was aimed at DevOps specifically, and while the subject is trendy lately, without big-scale problems, deployment is a process I usually set up and forget about. It just works, mostly because I only have to deal with one (current) project at a time. 
Not much love for Dart.
Second, while Frederic has a fabulous accent and a nice, loud voice, he tends to start each sentence loud and fade the sound at the end. This, together with mics failing him badly, made half of the presentation hard to grasp unless you were sitting in the first row.
I'm not saying the presentation was bad, far from it, it just clearly wasn't for me.
I've left a few minutes before the end, to see how many people came to Dart presentation by Mike West. I was kind of interested, since I'm following Warsaw Google Technology User Group and heard a few voices about why I should pay attentions to that new Google language. As you can see from the picture on the right, the majority tends to disagree with that opinion.

 

Non blocking, composable reactive web programming with Iteratees

Sadek Drobi's talk about Iteratees in Play 2.0 was very refreshing. Perhaps because I've never used Play before, but the presentation was flawless, with well explained problems, concepts and solutions.
Sadek started with a reflection on how much CPU we waste waiting for IO in web development, then moved to Play's Iteratees, to explain the concept and implementation, which while very different from the that overused Request/Servlet model, looked really nice and simple. I'm not sure though, how much the problem is present when you have a simple service, serving static content before your app server. Think apache (and faster) before tomcat. That won't fix the upload/download issue though, which is beautifully solved in Play 2.0

The Future of the Java Platform: Java SE 8 & Beyond


Simon Ritter is an intriguing fellow. If you take a glance at his work history (AT&T UNIX System Labs -> Novell -> Sun -> Oracle), you can easily see, he's a heavy weight player.
His presentation was rich in content, no corpo-bullshit. He started with a bit of history of JCP and how it looks like right now, then moved to the most interesting stuff, changes. Now I could give you a summary here, but there is really no point: you'd be much better taking look at the slides. There are only 48 of them, but everything is self-explanatory.
While I'm very disappointed with the speed of changes, especially when compared to the C# world, I'm glad with the direction and the fact that they finally want to BREAK the compatibility with the broken stuff (generics, etc.).  Moving to other languages I guess I won't be the one to scream "My god, finally!" somewhere in 2017, though. All the changes together look very promising, it's just that I'd like to have them like... now? Next year max, not near the heat death of the universe.

Simon also revealed one of the great mysteries of Java, to me:
The original idea behind JNI was to make it hard to write, to discourage people form using it.
On a side note, did you know Tegra3 has actually 5 cores? You use 4 of them, and then switch to the other one, when you battery gets low.

BOF: Spring and CloudFoundry


Having most of my folks moved to see "Typesafe stack 2.0" fabulously organized by Rafał Wasilewski and  Wojtek Erbetowski (with both of whom I had a pleasure to travel to the conference) and knowing it will be recorded, I've decided to see what Josh Long has to say about CloudFoundry, a subject I find very intriguing after the de facto fiasco of Google App Engine.

The audience was small but vibrant, mostly users of Amazon EC2, and while it turned out that Josh didn't have much, with pricing and details not yet public, the fact that Spring Source has already created their own competition (Could Foundry is both an Open Source app and a service), takes a lot from my anxiety.

For the review of the second day of the conference, go here.

Using Eclipse snippets for faster JUnit test creation (with Mockito!)

I'm using this snippet to create a template of new unit test method supporting BDD mockito tests. This is a good example for adding static imports to a class from snippets.@${testType:newType(org.junit.Test)}public void should${testname}() { ${staticIm...I'm using this snippet to create a template of new unit test method supporting BDD mockito tests. This is a good example for adding static imports to a class from snippets.@${testType:newType(org.junit.Test)}public void should${testname}() { ${staticIm...