{"id":13544,"date":"2019-01-10T18:11:42","date_gmt":"2019-01-10T17:11:42","guid":{"rendered":"https:\/\/touk.pl\/blog\/?p=13544"},"modified":"2023-03-20T16:18:48","modified_gmt":"2023-03-20T15:18:48","slug":"testing-nifi-flow-the-good-the-bad-and-the-ugly","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2019\/01\/10\/testing-nifi-flow-the-good-the-bad-and-the-ugly\/","title":{"rendered":"Testing NiFi Flow &#8211; The good, the bad and the ugly"},"content":{"rendered":"<h1 id=\"introduction\">Introduction<\/h1>\n<p>Some time has passed since we wrote our last <a href=\"https:\/\/touk.pl\/blog\/2018\/07\/19\/what-really-grinds-my-gears-apache-nifi\/\">blogpost<\/a> about Apache NiFi where we pointed out what could be improved. It\u2019s a very nice tool, so we are still using it, but we\u2019ve found some other things that could be improved to make it even better. Of course, we could write another post where all we do is complain, but does that make the world better? Unfortunately not. So we decided that we could do better than that. 
We took the most painful issue and implemented a solution \u2013 that\u2019s how NiFi Flow Tester was created.<\/p>\n<h1 id=\"use-case\">Use case<\/h1>\n<p>Let\u2019s assume you have to create a simple flow in NiFi according to some specification your client gave you:<\/p>\n<ol>\n<li>read some XML files from a directory<\/li>\n<li>validate them (using an XSD file)<\/li>\n<li>convert them from XML to JSON<\/li>\n<li>log if something failed<\/li>\n<li>pass them on for further processing if everything went well<\/li>\n<\/ol>\n<p>After studying the documentation and some googling you end up with this flow: <a href=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-13545\" src=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05-1024x863.png\" alt=\"\" width=\"660\" height=\"556\" srcset=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05-1024x863.png 1024w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05-300x253.png 300w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05-768x647.png 768w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.18.05.png 1576w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/a> First you need to list all the files, then read their content, validate it, convert it to JSON and pass it on. Looks great. Now it\u2019s time for some tests \u2013 you have two options.<\/p>\n<h2 id=\"test-in-nifi-directly-the-bad\">Test in NiFi directly \u2013 the bad<\/h2>\n<p>To test it manually, you need to copy a file to the input directory, wait to see what happens and, if everything went well, verify it (again, manually) using \u2018View data provenance\u2019. Let\u2019s try this approach. 
<a href=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-13550\" src=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47-1024x858.png\" alt=\"\" width=\"660\" height=\"553\" srcset=\"https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47-1024x858.png 1024w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47-300x251.png 300w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47-768x644.png 768w, https:\/\/touk.pl\/blog\/wp-content\/uploads\/2018\/09\/Zrzut-ekranu-2018-09-27-o-14.42.47.png 1580w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/a> Something went wrong. Can you tell where? Something inside ValidateXml, but what exactly? So now go to LogAttribute -> View data provenance and check the record attributes to find this one:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-linenumbers=\"false\">validatexml.invalid.error\r\nThe markup in the document following the root element must be well-formed.\r\n<\/pre>\n<p>Maybe there\u2019s something wrong with the XML file that your client gave you?<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"xml\">> cat data.txt\r\n<person>\r\n    <name>Foo<\/name>\r\n    <type>student<\/type>\r\n    <age>31<\/age>\r\n<\/person>\r\n<person>\r\n    <name>Invalid age<\/name>\r\n    <type>student<\/type>\r\n    <age>-11<\/age>\r\n<\/person>\r\n<person>\r\n    <name>Bar<\/name>\r\n    <type>student<\/type>\r\n    <age>12<\/age>\r\n<\/person><\/pre>\n<p>Looks like your client has some strange file format, where each line is a separate XML file. 
OK, that\u2019s fine: you just need to add one more processor that splits the text line by line and run the manual tests AGAIN\u2026 but at this point you should be asking yourself \u2013 is there a better way to do this? After all, we\u2019re programmers and we love to write code.<\/p>\n<h2 id=\"nifi-mock-the-ugly\">NiFi Mock \u2013 the ugly<\/h2>\n<p>The NiFi Mock library comes with a processor testing tool \u2013 <code>TestRunner<\/code>. Let\u2019s use this to test our flow! <code>TestRunner<\/code> can only run one processor, but we can work around that. Let\u2019s start with the first one:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">val listFileRunner = TestRunners.newTestRunner(new ListFile)\r\nlistFileRunner.setProperty(ListFile.DIRECTORY, s\"$testDir\/person\/\")\r\nlistFileRunner.run()\r\nval listFileResults = listFileRunner.getFlowFilesForRelationship(\"success\")\r\n<\/pre>\n<p>We create a runner with the processor, set a directory to read from, run it and get the results from the relationship. That was easy, so let\u2019s create another <code>TestRunner<\/code> with the <code>FetchFile<\/code> processor, enqueue the results from the previous step, run it and collect the results.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">val fetchFileRunner = TestRunners.newTestRunner(new FetchFile)\r\nlistFileResults.foreach(f => fetchFileRunner.enqueue(f))\r\nfetchFileRunner.run()\r\nval fetchFileResults = fetchFileRunner.getFlowFilesForRelationship(\"success\")\r\n<\/pre>\n<p>Great! Next one:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">val splitTextRunner = TestRunners.newTestRunner(new SplitText)\r\nsplitTextRunner.setProperty(SplitText.LINE_SPLIT_COUNT, \"1\")\r\nfetchFileResults.foreach(f => splitTextRunner.enqueue(f))\r\nsplitTextRunner.run()\r\nval splitTextResults = splitTextRunner.getFlowFilesForRelationship(SplitText.REL_SPLITS)\r\n<\/pre>\n<p>Two more processors and we are done. 
And don\u2019t forget to check all the runner and result names when you are done copy-pasting this code! That\u2019s not cool. I mean, we love to write code but this is too much boilerplate, isn\u2019t it? I\u2019m sure you saw the pattern here. We saw it too and that\u2019s why we introduced a solution.<\/p>\n<h2 id=\"nifi-flow-tester-the-good\">NiFi Flow Tester \u2013 the good<\/h2>\n<p>In our new library we have two ways to create a flow. Today we will focus on the simple one, which is better for prototyping. First, we need to create <code>new NifiFlowBuilder()<\/code> and add some nodes to it:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">new NifiFlowBuilder()\r\n  .addNode(\r\n    \"ListFile\",\r\n    new ListFile,\r\n    Map(ListFile.DIRECTORY.getName -> s\"$testDir\/person\/\")\r\n  )\r\n  .addNode(\"FetchFile\", new FetchFile, Map())\r\n  .addNode(\r\n    \"SplitText\",\r\n    new SplitText,\r\n    Map(SplitText.LINE_SPLIT_COUNT.getName -> \"1\")\r\n  )\r\n  .addNode(\r\n    \"ValidateXml\",\r\n    new ValidateXml,\r\n    Map(ValidateXml.SCHEMA_FILE.getName -> s\"$testDir\/person-schema.xsd\")\r\n  )\r\n  .addNode(\r\n    \"TransformXml\",\r\n    new TransformXml,\r\n    Map(TransformXml.XSLT_FILE_NAME.getName -> s\"$testDir\/xml-to-json.xsl\")\r\n  )\r\n<\/pre>\n<p>First, you need to specify the node name, which allows you to identify your node. It can be the same as the class name as long as you don\u2019t use the same processor twice in the flow. Then of course you need to specify the processor with all its parameters as a simple <code>Map[String, String]<\/code>. The safest way to do this is to use the processor\u2019s class fields of type <code>PropertyDescriptor<\/code>. 
Unfortunately, they\u2019re not always public, so sometimes you have to write their names by hand \u2013 instead of <code>ListFile.DIRECTORY.getName<\/code> we could write <code>Input Directory<\/code>.<\/p>\n<p>When all the nodes are specified, we can add connections between them using node names. The first parameter is the source node, the second is the destination and the third is the selected relationship. Again, using processor class fields is less error-prone, but sometimes these fields are not public (for instance in the <code>ListFile<\/code> or <code>FetchFile<\/code> processors).<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">.addConnection(\"ListFile\", \"FetchFile\", \"success\")\r\n.addConnection(\"FetchFile\", \"SplitText\", \"success\")\r\n.addConnection(\"SplitText\", \"ValidateXml\", SplitText.REL_SPLITS)\r\n.addConnection(\"ValidateXml\", \"TransformXml\", ValidateXml.REL_VALID)\r\n.addOutputConnection(\"TransformXml\", \"success\")\r\n<\/pre>\n<p>In the end, we add a connection to the output port to get the results. Everything looks good. Now just call the <code>build<\/code> method to create the flow. 
The entire code looks like this:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">val flow = new NifiFlowBuilder()\r\n      .addNode(\"ListFile\", new ListFile, Map(ListFile.DIRECTORY.getName -> s\"$testDir\/person\/\"))\r\n      .addNode(\"FetchFile\", new FetchFile, Map())\r\n      .addNode(\"SplitText\", new SplitText, Map(SplitText.LINE_SPLIT_COUNT.getName -> \"1\"))\r\n      .addNode(\"ValidateXml\", new ValidateXml, Map(ValidateXml.SCHEMA_FILE.getName -> s\"$testDir\/person-schema.xsd\"))\r\n      .addNode(\"TransformXml\", new TransformXml, Map(TransformXml.XSLT_FILE_NAME.getName -> s\"$testDir\/xml-to-json.xsl\"))\r\n      .addConnection(\"ListFile\", \"FetchFile\", \"success\")\r\n      .addConnection(\"FetchFile\", \"SplitText\", \"success\")\r\n      .addConnection(\"SplitText\", \"ValidateXml\", SplitText.REL_SPLITS)\r\n      .addConnection(\"ValidateXml\", \"TransformXml\", ValidateXml.REL_VALID)\r\n      .addOutputConnection(\"TransformXml\", \"success\")\r\n      .build()\r\n<\/pre>\n<p>Now we just need to run the flow, collect the results and verify them:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"scala\">flow.run()\r\nval files = flow.executionResult.outputFlowFiles\r\nfiles.head.assertContentEquals(\"\"\"{\"person\":{\"name\":\"Foo\",\"type\":\"student\",\"age\":31}}\"\"\")\r\n<\/pre>\n<p>Disclaimer: this is not the best way to test if JSON is correct, but it was done for simplicity.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>Apache NiFi seems perfect until you start doing serious data integration. Without the ability to test your changes fast, you will become extremely frustrated with clicking through the UI every time your client notices a bug. Sometimes it may be a bug in NiFi itself, but you will never know until you debug the code. That\u2019s something our library can help you with. 
You can find it here: <a href=\"https:\/\/github.com\/TouK\/plumber\">https:\/\/github.com\/TouK\/plumber<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Introduction Some time has passed since we wrote our last blogpost about Apache NiFi where we pointed out&hellip;\n","protected":false},"author":74,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[252,113,30],"class_list":["post-13544","post","type-post","status-publish","format-standard","category-development-design","tag-big-data","tag-scala","tag-testing"],"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13544","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/74"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=13544"}],"version-history":[{"count":45,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13544\/revisions"}],"predecessor-version":[{"id":15450,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13544\/revisions\/15450"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=13544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=13544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=13544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}