Testing NiFi Flow – The good, the bad and the ugly

Introduction

Some time has passed since we wrote our last blogpost about Apache NiFi where we pointed out what could be improved. It’s a very nice tool, so we are still using it, but we’ve found some other things that could be improved to make it even better. Of course, we could write another post where all we do is complain, but does that make the world better? Unfortunately not. So we decided that we could do better than that. We took the most painful issue and implemented a solution – that’s how NiFi Flow Tester was created.

Use case

Let’s assume you have to create a simple flow in NiFi according to some specification your client gave you:

  1. read some XML files from directory
  2. validate them (using XSD file)
  3. convert them from XML to JSON
  4. log if something failed
  5. pass it for further processing if everything went well

after studying documentation and some googling you end up with this flow: First you need to list all the files, then read their content, validate it, convert to JSON and pass it further. Looks great. Now it’s time for some tests – you have two options.

Test in NiFi directly – the bad

To test it manually, you need to copy some file to the input directory, wait to see what happens and if everything went well – you can verify it (again, manually) using ‘View data provenance’. Let’s try this approach. Something went wrong. Can you tell where? Something inside ValidateXml, but what exactly? So now to the LogAttribute -> View provenance data and check record attributes to find this one:

validatexml.invalid.error
The markup in the document following the root element must be well-formed.

Maybe there’s something wrong with XML file that your client gave you?

> cat data.txt
<person>
    <name>Foo</name>
    <type>student</type>
    <age>31</age>
</person>
<person>
    <name>Invalid age</name>
    <type>student</type>
    <age>-11</age>
</person>
<person>
    <name>Bar</name>
    <type>student</type>
    <age>12</age>
</person>

Looks like your client has some strange file format, where each line is a separate XML file. Ok, that’s fine you just need to add one more processor which splits text by each line and do the manual tests AGAIN… but at this point, you should have asked a question – is there a better way to do this? After all, we’re programmers and we love to write code.

Nifi Mock – the ugly

Nifi Mock library comes with a processor testing tool – TestRunner. Let’s use this to test our flow! TestRunner can only run one processor, but we can work it out. Let’s start with first one:

val listFileRunner = TestRunners.newTestRunner(new ListFile)
listFileRunner.setProperty(ListFile.DIRECTORY, s"$testDir/person/")
listFileRunner.run()
val listFileResults = listFileRunner.getFlowFilesForRelationship("success")

We create a runner with the processor, set a directory to read from, run it and get the results from the relationship. That was easy, let’s create another TestRunner with FetchFile processor, enqueue results from the previous step, run and collect results.

val fetchFileRunner = TestRunners.newTestRunner(new FetchFile)
listFileResults.foreach(f => fetchFileRunner.enqueue(f))
fetchFileRunner.run()
val fetchFileResults = fetchFileRunner.getFlowFilesForRelationship("success")

Great! Next one:

val splitTextRunner = TestRunners.newTestRunner(new SplitText)
splitTextRunner.setProperty(SplitText.LINE_SPLIT_COUNT, "1")
fetchFileResults.foreach(f => splitTextRunner.enqueue(f)) 
splitTextRunner.run()
val splitTextResults = splitTextRunner.getFlowFilesForRelationship(SplitText.REL_SPLITS)

Two more processors and we are done. And don’t forget to check all the runners and results names when you are done with copy-pasting this code! That’s not cool. I mean, we love to write code but this is too much boilerplate, isn’t it? I’m sure you saw the pattern here. We saw it too and that’s why we introduced a solution.

Nifi Flow Tester – the good

In our new library we have two ways to create flow. Today we will focus on the simple one, which is better for prototyping. First, we need to create new NifiFlowBuilder() and add some nodes to it:

new NifiFlowBuilder()
  .addNode(
    "ListFile",
    new ListFile,
    Map(ListFile.DIRECTORY.getName -> s"$testDir/person/")
  )
  .addNode("FetchFile", new FetchFile, Map())
  .addNode(
    "SplitText",
    new Split Text,
    Map(SplitText.LINE_SPLIT_COUNT.getName -> "1")
  )
  .addNode(
    "ValidateXml",
    new ValidateXml,
    Map(ValidateXml.SCHEMA_FILE.getName -> s"$testDir/person-schema.xsd")
  )
  .addNode(
    "TransformXml",
    new TransformXml,
    Map(TransformXml.XSLT_FILE_NAME.getName -> s"$testDir/xml-to-json.xsl")
  )

First, you need to specify the node name, which allows you to identify your node. It can be the same as the class name as long as you don’t use the same processor twice in flow. Then of course you need to specify the processor with all the parameters as simple Map[String, String]. The safest way to do this is to use processor’s class fields of type PropertyDescriptor. Unfortunately, they’re not always public, so sometimes you have written their name – instead of ListFile.DIRECTORY.getName we could write Input Directory.

When all the nodes are specified, we can add connections between them using node names. The first parameter is the source node, the second is a destination and the third is the selected relationship. Again, using processor class fields is less error prone but sometimes these fields are not public (for instance ListFile or FetchFile processor)

.addConnection("ListFile", "FetchFile", "success")
.addConnection("FetchFile", "SplitText", "success")
.addConnection("SplitText", "ValidateXml", SplitText.REL_SPLITS)
.addConnection("ValidateXml", "TransformXml", ValidateXml.REL_VALID)
.addOutputConnection("TransformXml", "success")

In the end, we add a connection to the output port to get the results. Everything looks good. Now just call build method to create flow. The entire code looks like this:

val flow = new NifiFlowBuilder()
      .addNode("ListFile", new ListFile, Map(ListFile.DIRECTORY.getName -> s"$testDir/person/"))
      .addNode("FetchFile", new FetchFile, Map())
      .addNode("SplitText", new SplitText, Map(SplitText.LINE_SPLIT_COUNT.getName -> "1"))
      .addNode("ValidateXml", new ValidateXml, Map(ValidateXml.SCHEMA_FILE.getName -> s"$testDir/person-schema.xsd"))
      .addNode("TransformXml", new TransformXml, Map(TransformXml.XSLT_FILE_NAME.getName -> s"$testDir/xml-to-json.xsl"))
      .addConnection("ListFile", "FetchFile", "success")
      .addConnection("FetchFile", "SplitText", "success")
      .addConnection( "SplitText", "ValidateXml", SplitText.REL_SPLITS)
      .addConnection("ValidateXml", "TransformXml", ValidateXml.REL_VALID)
      .addOutputConnection("TransformXml", "success")
      .build()

Now we just need to run the flow, collect the results and verify them:

flow.run() 
val files = flow.executionResult.outputFlowFiles 
files.head.assertContentEquals("""{"person":{"name":"Foo","type":"student","age":31}}""")

Disclaimer: this is not the best way to test if JSON is correct, but it was done for simplicity.

Conclusion

Apache NiFi seems to be perfect unless you start a serious data integration. Without the ability to test your changes fast, you will become extremely frustrated with clicking every time your client notices a bug. Sometimes it may be a bug of the NiFi itself, but you will never know it until you debug the code. That’s something our library can help you with. You can find it here: https://github.com/TouK/plumber

You May Also Like

Using WsLite in practice

TL;DR

There is a example working GitHub project which covers unit testing and request/response logging when using WsLite.

Why Groovy WsLite ?

I’m a huge fan of Groovy WsLite project for calling SOAP web services. Yes, in a real world you have to deal with those - big companies have huge amount of “legacy” code and are crazy about homogeneous architecture - only SOAP, Java, Oracle, AIX…

But I also never been comfortable with XFire/CXF approach of web service client code generation. I wrote a bit about other posibilites in this post. With JAXB you can also experience some freaky classloading errors - as Tomek described on his blog. In a large commercial project the “the less code the better” principle is significant. And the code generated from XSD could look kinda ugly - especially more complicated structures like sequences, choices, anys etc.

Using WsLite with native Groovy concepts like XmlSlurper could be a great choice. But since it’s a dynamic approach you have to be really careful - write good unit tests and log requests. Below are my few hints for using WsLite in practice.

Unit testing

Suppose you have some invocation of WsLite SOAPClient (original WsLite example):

def getMothersDay(long _year) {
    def response = client.send(SOAPAction: action) {
       body {
           GetMothersDay('xmlns':'http://www.27seconds.com/Holidays/US/Dates/') {
              year(_year)
           }
       }
    }
    response.GetMothersDayResponse.GetMothersDayResult.text()
}

How can the unit test like? My suggestion is to mock SOAPClient and write a simple helper to test that builded XML is correct. Example using great SpockFramework:

void setup() {
   client = Mock(SOAPClient)
   service.client = client
}

def "should pass year to GetMothersDay and return date"() {
  given:
      def year = 2013
  when:
      def date = service.getMothersDay(year)
  then:
      1 * client.send(_, _) >> { Map params, Closure requestBuilder ->
            Document doc = buildAndParseXml(requestBuilder)
            assertXpathEvaluatesTo("$year", '//ns:GetMothersDay/ns:year', doc)
            return mockResponse(Responses.mothersDay)
      }
      date == "2013-05-12T00:00:00"
}

This uses a real cool feature of Spock - even when you mock the invocation with “any mark” (_), you are able to get actual arguments. So we can build XML that would be passed to SOAPClient's send method and check that specific XPaths are correct:

void setup() {
    engine = XMLUnit.newXpathEngine()
    engine.setNamespaceContext(new SimpleNamespaceContext(namespaces()))
}

protected Document buildAndParseXml(Closure xmlBuilder) {
    def writer = new StringWriter()
    def builder = new MarkupBuilder(writer)
    builder.xml(xmlBuilder)
    return XMLUnit.buildControlDocument(writer.toString())
}

protected void assertXpathEvaluatesTo(String expectedValue,
                                      String xpathExpression, Document doc) throws XpathException {
    Assert.assertEquals(expectedValue,
            engine.evaluate(xpathExpression, doc))
}

protected Map namespaces() {
    return [ns: 'http://www.27seconds.com/Holidays/US/Dates/']
}

The XMLUnit library is used just for XpathEngine, but it is much more powerful for comparing XML documents. The NamespaceContext is needed to use correct prefixes (e.g. ns:GetMothersDay) in your Xpath expressions.

Finally - the mock returns SOAPResponse instance filled with envelope parsed from some constant XML:

protected SOAPResponse mockResponse(String resp) {
    def envelope = new XmlSlurper().parseText(resp)
    new SOAPResponse(envelope: envelope)
}

Request and response logging

The WsLite itself doesn’t use any logging framework. We usually handle it by adding own sendWithLogging method:

private SOAPResponse sendWithLogging(String action, Closure cl) {
    SOAPResponse response = client.send(SOAPAction: action, cl)
    log(response?.httpRequest, response?.httpResponse)
    return response
}

private void log(HTTPRequest request, HTTPResponse response) {
    log.debug("HTTPRequest $request with content:\n${request?.contentAsString}")
    log.debug("HTTPResponse $response with content:\n${response?.contentAsString}")
}

This logs the actual request and response send through SOAPClient. But it logs only when invocation is successful and errors are much more interesting… So here goes withExceptionHandler method:

private SOAPResponse withExceptionHandler(Closure cl) {
    try {
        cl.call()
    } catch (SOAPFaultException soapEx) {
        log(soapEx.httpRequest, soapEx.httpResponse)
        def message = soapEx.hasFault() ? soapEx.fault.text() : soapEx.message
        throw new InfrastructureException(message)
    } catch (HTTPClientException httpEx) {
        log(httpEx.request, httpEx.response)
        throw new InfrastructureException(httpEx.message)
    }
}
def send(String action, Closure cl) {
    withExceptionHandler {
        sendWithLogging(action, cl)
    }
}

XmlSlurper gotchas

Working with XML document with XmlSlurper is generally great fun, but is some cases could introduce some problems. A trivial example is parsing an id with a number to Long value:

def id = Long.valueOf(edit.'@id' as String)

The Attribute class (which edit.'@id' evaluates to) can be converted to String using as operator, but converting to Long requires using valueOf.

The second example is a bit more complicated. Consider following XML fragment:

<edit id="3">
   <params>
      <param value="label1" name="label"/>
      <param value="2" name="param2"/>
   </params>
   <value>123</value>
</edit>
<edit id="6">
   <params>
      <param value="label2" name="label"/>
      <param value="2" name="param2"/>
   </params>
   <value>456</value>
</edit>

We want to find id of edit whose label is label1. The simplest solution seems to be:

def param = doc.edit.params.param.find { it['@value'] == 'label1' }
def edit = params.parent().parent()

But it doesn’t work! The parent method returns multiple edits, not only the one that is parent of given param

Here’s the correct solution:

doc.edit.find { edit ->
    edit.params.param.find { it['@value'] == 'label1' }
}

Example

The example working project covering those hints could be found on GitHub.

Multi phased processing in scala

Last time in our project we had to add progress bar for visualization of long time running process. Process was made of a few phases and we had to print in which phase we currently are. In first step we conclude that we need to create a class of Progre...