TouK Nussknacker – using Apache Flink made easier for analysts and business

A few weeks ago we (TouK) released our latest open source project, Nussknacker, on GitHub. What is it and why should you care?

Why?


First, some history: more than a year ago one of our clients decided it was high time for Real Time Marketing. They had pretty large data streams and lots of ideas for marketing campaigns, so one of the key success factors was the ability to process the data fast. We prepared a POC based on Apache Flink and it turned out to be a really great piece of technology - fast and accurate.

There was just one problem - how to write and customize processes? Flink, like many modern stream processing engines, provides a rich and friendly DSL, both in Java and Scala. But for our client that was not really enough. See, they are an enterprise that does not employ developers - they have many decent, competent analysts who know the business and SQL - but not Java or (Heaven forbid…) Scala.

Nussknacker to the rescue!


So we decided to create a simple process designer for them. It looks more or less like this:

editor.png

You draw a diagram, fill in the details (like filter expressions - “#input.usageData > 0” or SMS/mail content), then you press the Deploy button and voilà - your brand new process is running on a Flink cluster!

Of course, first somebody (that is, a developer) has to prepare the data sources (most probably Kafka topics), design the data model (POJOs or case classes) and implement the actions - like sending an email or sending an event to another Kafka topic. But once the data model and external services are defined, an analyst can define and deploy processes all by him/herself.
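To make that split of responsibilities a bit more concrete, here is a toy sketch in plain Python. It is not Nussknacker code: the UsageEvent model, the send_sms action and the run_process helper are all hypothetical names invented for illustration, and the filter is simply a Python counterpart of the “#input.usageData > 0” expression mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# --- Prepared once by a developer (all names are hypothetical) ---

@dataclass
class UsageEvent:
    """Data model for events read from a source such as a Kafka topic."""
    client_id: str
    usage_data: int


def send_sms(event: UsageEvent, outbox: List[str]) -> None:
    """Action stub: a real implementation would call an SMS gateway."""
    outbox.append(f"SMS to {event.client_id}: thanks for using our service")


# --- Defined later by an analyst, without touching the code above ---

def usage_filter(event: UsageEvent) -> bool:
    """Plain-Python counterpart of the filter '#input.usageData > 0'."""
    return event.usage_data > 0


def run_process(
    events: Iterable[UsageEvent],
    keep: Callable[[UsageEvent], bool],
    action: Callable[[UsageEvent, List[str]], None],
) -> List[str]:
    """Apply the filter to every event and run the action on those that pass."""
    outbox: List[str] = []
    for event in events:
        if keep(event):
            action(event, outbox)
    return outbox


sample = [UsageEvent("a", 0), UsageEvent("b", 12), UsageEvent("c", 3)]
messages = run_process(sample, usage_filter, send_sms)
print(messages)
```

The point of the sketch is only the boundary: the model and the action change rarely and need a developer, while the filter is the part an analyst would edit in the designer.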

More features


Does it sound a bit scary to let your users run stream processes with a GUI? A bad filter condition can have serious performance implications if you’re dealing with streams of tens of thousands of events per second... That’s why we let users test their diagrams first - each test case can be generated from a sample of real data coming from e.g. Kafka and then run in a sandboxed Flink mini-cluster.

We also have many more features that make working with Nussknacker and Flink easier: subprocesses, versioning, generating PDF documentation, migration between environments and last but certainly not least - integration with InfluxDB/Grafana to provide detailed insight into how a process is doing:
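The idea of a dry run on sample data can be illustrated with a few lines of plain Python. This is only a toy model and says nothing about how Nussknacker's test mode is implemented: the dry_run function, the node names and the sample events are assumptions made up for the example.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Event = Dict[str, int]
NamedFilter = Tuple[str, Callable[[Event], bool]]


def dry_run(events: Iterable[Event], filters: List[NamedFilter]) -> Counter:
    """Push sample events through a chain of named filters.

    Counts how many events reach each node; no action is executed,
    so nothing is sent to real customers or external systems.
    """
    counts: Counter = Counter()
    for event in events:
        counts["source"] += 1
        for name, predicate in filters:
            if not predicate(event):
                break  # the event is dropped at this node
            counts[name] += 1
    return counts


# Hypothetical sample, standing in for events captured from a real topic.
sample = [{"usageData": 0}, {"usageData": 5}, {"usageData": 250}]
filters = [
    ("hasUsage", lambda e: e["usageData"] > 0),
    ("heavyUser", lambda e: e["usageData"] > 100),
]
counts = dry_run(sample, filters)
print(dict(counts))
```

Seeing how many sample events survive each node is a cheap way to notice a condition that lets through far more (or far fewer) events than intended before the process is deployed.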

monitoring.png

Where can I use it?


What are Nussknacker's use cases? Our main deployments deal with RTM (Real Time Marketing). One of our clients started with RTM and then found out that Nussknacker is also a great choice for real-time fraud detection. The industries we work with include telcos, banks and media companies. We are also thinking about other possibilities - for example IoT.

Sounds cool?


If you are interested in giving semi-technical analysts easy access to streaming data - give Nussknacker a try!

You can find the code on GitHub: https://github.com/touk/nussknacker. We also have a nice, Docker-based quickstart: https://touk.github.io/nussknacker/Quickstart.html.

And if you are coming to Flink Forward next week in Berlin - join me on Wednesday afternoon to hear more - https://berlin.flink-forward.org/kb_sessions/touk-nussknacker-creating-flink-jobs-with-gui/.

In the coming days and weeks we’ll post more information on the TouK blog, both on Nussknacker architecture and internals and on interesting use cases - stay tuned :)

OSGi Blueprint visualization

What is blueprint?

Blueprint is a dependency injection framework for OSGi bundles. A blueprint file can be written by hand or generated using the Blueprint Maven Plugin. It is just an XML file describing beans, services and references. Each OSGi bundle can have one or more blueprint files.

Blueprint files represent the architecture of a bundle. Let's visualize it using a Groovy script (available in my GitHub repository) together with Graphviz, and then analyze the result.
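To show the general idea of turning a blueprint into a graph, here is a minimal Python sketch. It is not the Groovy script from the repository: it only handles top-level beans and their argument/property refs, and the bean ids and classes in the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the OSGi Blueprint 1.0 schema.
NS = "{http://www.osgi.org/xmlns/blueprint/v1.0.0}"

# A tiny hand-written blueprint; ids and classes are made up.
BLUEPRINT = """<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
  <reference id="mailer" interface="com.example.Mailer"/>
  <bean id="repo" class="com.example.Repo"/>
  <bean id="notifier" class="com.example.Notifier">
    <argument ref="repo"/>
    <property name="mailer" ref="mailer"/>
  </bean>
  <service ref="notifier" interface="com.example.Notifier"/>
</blueprint>"""


def dependency_edges(xml_text):
    """Return (bean id, referenced id) pairs for top-level beans."""
    root = ET.fromstring(xml_text)
    edges = []
    for bean in root.findall(NS + "bean"):
        for child in bean:
            is_injection = child.tag in (NS + "argument", NS + "property")
            if is_injection and child.get("ref"):
                edges.append((bean.get("id"), child.get("ref")))
    return edges


def to_dot(edges):
    """Render the edges as a Graphviz digraph in DOT syntax."""
    lines = ["digraph blueprint {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


edges = dependency_edges(BLUEPRINT)
dot = to_dot(edges)
print(dot)
```

The printed DOT text can be saved to a file and rendered with Graphviz, for example with dot -Tpng graph.dot -o graph.png.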

Example generation

Prerequisites: all you need is Groovy and Graphviz installed on your OS.

I mostly work with bundles whose blueprint is generated, so as an example I will use a blueprint file generated from the Blueprint Maven Plugin tests. All examples are included in the GitHub repository.

Generation is invoked by running the run.sh script with a destination file prefix (the png extension will be added to it) and the path to a blueprint file:

mkdir -p target

./run.sh target/fullBlueprint fullBlueprint.xml

Visualization is available here.

Separating domains

First, if you look at the image, you will see that some beans are grouped. You can easily extract such domains, here the trees rooted at beanWithConfigurationProperties and beanWithCallbackMethods, into separate blueprint files (and, in the future, separate bundles) and generate images from them:

./run.sh target/beanWithCallbackMethods example/firstCut/beanWithCallbackMethods.xml
./run.sh target/beanWithConfigurationProperties example/firstCut/beanWithConfigurationProperties.xml
./run.sh target/otherStuff example/firstCut/otherStuff.xml

Now we have three somewhat cleaner images: beanWithConfigurationProperties.png, beanWithCallbackMethods.png and otherStuff.png.

We can also generate an image from more than one blueprint:

./run.sh target/joinFirstCut example/firstCut/otherStuff.xml example/firstCut/beanWithConfigurationProperties.xml example/firstCut/beanWithCallbackMethods.xml

And the result is here. The image contains beans grouped by file, but if you do not like that, you can force generation without such separation using the --no-group-by-file option:

./run.sh target/joinFirstCutGrouped example/firstCut/otherStuff.xml example/firstCut/beanWithConfigurationProperties.xml example/firstCut/beanWithCallbackMethods.xml --no-group-by-file

This generates a single image with all beans from all files, without grouping them by file.
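One way such grouping can be expressed in Graphviz is with cluster subgraphs, which are drawn as labelled boxes. The sketch below assumes that approach; I am not claiming this is how the script does it, and the to_dot function and the beanA/beanB/beanC names are hypothetical.

```python
def to_dot(beans_by_file, edges, group_by_file=True):
    """Build DOT text, optionally wrapping each file's beans in a cluster."""
    lines = ["digraph blueprint {"]
    for index, (file_name, beans) in enumerate(beans_by_file.items()):
        if group_by_file:
            # Subgraphs whose name starts with "cluster" are drawn as boxes.
            lines.append(f"  subgraph cluster_{index} {{")
            lines.append(f'    label="{file_name}";')
            lines += [f'    "{bean}";' for bean in beans]
            lines.append("  }")
        else:
            lines += [f'  "{bean}";' for bean in beans]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical beans and edges, not taken from the example blueprints.
beans_by_file = {
    "otherStuff.xml": ["beanA", "beanB"],
    "beanWithCallbackMethods.xml": ["beanC"],
}
edges = [("beanA", "beanB"), ("beanC", "beanA")]

grouped = to_dot(beans_by_file, edges)
flat = to_dot(beans_by_file, edges, group_by_file=False)
print(grouped)
```

In the flat variant the same nodes and edges are emitted without the subgraph wrappers, so Graphviz is free to lay out all beans together.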

Exclusion

Sometimes it is difficult to spot and extract other domains. In that case it is easier to run some experiments on the blueprint. For example, the bean my1 is a dependency of too many other beans. You could consider converting the my1 bean to an OSGi service and extracting it to another bundle.

Let's exclude the my1 bean from generation via the -e option and see what happens:

./run.sh target/otherStuffWithoutMy example/firstCut/otherStuff.xml -e my1

The result is available here. Now we can see that the tree rooted at the bean myFactoryBeanAsService could be separated, and my1 could be injected into it as an OSGi service in another bundle.

You can exclude more than one bean by adding an -e switch for each of them, e.g. -e my1 -e m2 -e myBean123.
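Why excluding a widely shared bean helps can be seen with a small graph experiment. The Python sketch below is an illustration only: the names my1 and myFactoryBeanAsService come from the example above, but helper, otherBean, my2 and all of the edges are invented, and the exclude/components functions are not part of the script.

```python
def exclude(edges, excluded):
    """Drop every edge that touches an excluded bean."""
    return [(s, d) for s, d in edges if s not in excluded and d not in excluded]


def components(edges):
    """Group beans into connected components, ignoring edge direction."""
    neighbours = {}
    for s, d in edges:
        neighbours.setdefault(s, set()).add(d)
        neighbours.setdefault(d, set()).add(s)
    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        result.append(sorted(group))
    return result


# Made-up dependency edges: my1 is shared by two otherwise unrelated trees.
edges = [
    ("myFactoryBeanAsService", "my1"),
    ("myFactoryBeanAsService", "helper"),
    ("otherBean", "my1"),
    ("otherBean", "my2"),
]

with_my1 = components(edges)
without_my1 = components(exclude(edges, {"my1"}))
print(len(with_my1), without_my1)
```

With my1 present everything forms one connected blob; once it is excluded, the remaining beans fall apart into separate groups, which are the candidates for separate blueprint files or bundles.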

Conclusion

Blueprint is great for dependency injection in OSGi bundles, but it is easy to end up with quite a big context containing many domains. It is much easier to recognize or search for such domains using the blueprint visualizer script.

