Complex flows with Apache Camel

At work, we’re mainly integrating services and systems. Since we’re on a constant lookout for new and better technologies, and for ways to do things more easily and sustainably, we usually use Apache Camel for this task, which is a Swiss-army knife for the integration engineer. What’s more, this tool corresponds well with our approach to integration solutions:
* try to operate on XML messages, so you get the advantage of XPath, XSLT and other benefits,
* don’t convert XML into Java classes back and forth, so you avoid worrying about problems like XML conversion,
* try to keep the flow of the process simple.

However, at first sight Apache Camel seems to have some drawbacks, mainly in the area of practical solutions ;-). It’s a very handy tool if you need a pipeline with some marginal processing of the data that passes through it. It gets a lot harder to wrap your head around once you consider branching and intermediate calls to external services. This may be tricky to write properly in Camel’s DSL. Here is a simple pipeline example:
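A minimal sketch of such a straight pipeline, written inside a RouteBuilder’s configure() method (the endpoint names are illustrative, not taken from any real system):

from("direct:in")
    .to("direct:validate")     // the output of each step becomes the input of the next one
    .to("direct:transform")
    .to("log:pl.touk.debug");  // finally, log the resulting exchange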

And here is the exact scenario we’re discussing: what I’d like to show is a solution to this problem. If you’re using a recent version of Camel this may be easier and a little different, but it should still more or less work this way. This code is written for Apache Camel 1.4 – a rather ancient version, but that’s what we’re forced to use. Oh well. OK, enough whining! So, I created a test class to illustrate the case. The route defined in the TestRouter class is responsible for:
1. receiving input,
2. setting an exchange property to the result of a given XPath expression, which effectively is the name of the first XML element in the input stream,
3. then sending the input data to three different external services, each of which replies with some fictional data – notice routes a, b and c; the SimpleContentSetter processor just responds with a given text,
4. processing the responses from all three services with the RequestEnricher bean, which is described below,
5. eventually logging the exchange in the specified category.

Here is some code for this:

import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.model.ProcessorType;
import org.junit.Before;
import org.junit.Test;

public class SimpleTest {

    private CamelContext ctx = new DefaultCamelContext();

    @Before
    public void setUp() throws Exception {
        // register the route and start the context before each test
        TestRouter tr = new TestRouter();
        ctx.addRoutes(tr);
        ctx.start();
    }

    @Test
    public void shouldCheck() throws Exception {
        // getInOut(...) is a small helper building the test exchange sent into the route
        ctx.createProducerTemplate().send("direct:in", getInOut(""));
    }

    class TestRouter extends RouteBuilder {

        public void configure() throws Exception {

            ((ProcessorType) from("direct:in")
                // remember the name of the root element of the incoming XML
                .setProperty("operation").xpath("local-name(/*)", String.class)
                // send the same input to all three services and aggregate their outputs
                .multicast(new MergeAggregationStrategy())
                .to("direct:a", "direct:b", "direct:c")
                .end()
                .setBody().simple("${in.body}"))
            // post-process the aggregated result with the RequestEnricher bean
            .bean(RequestEnricher.class, "enrich")
                .to("log:pl.touk.debug");

            // the three "external services" are faked with simple processors
            from("direct:a").process(new SimpleContentSetter(""));
            from("direct:b").process(new SimpleContentSetter(""));
            from("direct:c").process(new SimpleContentSetter(""));
        }
    }
}
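The SimpleContentSetter processor used above isn’t listed here, but a minimal sketch of it might look like this, assuming it simply answers every exchange with the text passed to its constructor:

import org.apache.camel.Exchange;
import org.apache.camel.Processor;

// A guess at SimpleContentSetter: it replies to every exchange
// with the fixed text given in the constructor.
public class SimpleContentSetter implements Processor {

    private final String content;

    public SimpleContentSetter(String content) {
        this.content = content;
    }

    public void process(Exchange exchange) throws Exception {
        exchange.getOut().setBody(content);
    }
}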

What’s unusual in this code is the fact that what Camel normally does when you write a piece of DSL like:

.to("direct:a", "direct:b", "direct:c")

is pass the input to service a, then a’s output gets passed to b and becomes its input, then b’s output becomes c’s input. The problem is that you lose the output from a and b, not to mention that you might want to send the same input to all three services. That’s where a little tool called multicast() comes in handy. It offers you the ability to aggregate the outputs of those services. You may even create an AggregationStrategy that will do it the way you like. The MergeAggregationStrategy class, shown below, does exactly that kind of work – it joins the outputs from all three services. A lot of info about the proper use of AggregationStrategies can be found in this post by Torsten Mielke.
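To make the contrast concrete, here is a hedged sketch (the route names are illustrative) of the default pipeline semantics next to the multicast variant used above:

// Default pipeline: each endpoint's output becomes the next endpoint's input,
// so at the end only c's reply survives.
from("direct:pipelined")
    .to("direct:a", "direct:b", "direct:c");

// Multicast: the same input goes to a, b and c, and their replies
// are merged by the supplied AggregationStrategy.
from("direct:multicasted")
    .multicast(new MergeAggregationStrategy())
    .to("direct:a", "direct:b", "direct:c")
    .end();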

import org.apache.camel.Exchange;
import org.apache.camel.Message;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class MergeAggregationStrategy implements AggregationStrategy {

    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // if a previous service call already failed, keep the failed exchange
        if (oldExchange.isFailed()) {
            return oldExchange;
        }
        // append the newly received reply to what has been collected so far
        transformMessage(oldExchange.getIn(), newExchange.getIn());
        transformMessage(oldExchange.getOut(), newExchange.getOut());
        return newExchange;
    }

    private void transformMessage(Message oldM, Message newM) {
        String oldBody = oldM.getBody(String.class);
        String newBody = newM.getBody(String.class);
        newM.setBody(oldBody + newBody);
    }

}

However nice this may look (or not), what you’re left with is a mix of multiple XMLs. Normally this won’t do you much good; a better thing to do is to parse this output in some way. What we’re using for this is Groovy :), which is great for the task of parsing XML and a lot less verbose than ordinary Java. Let’s assume a scenario in which the aggregated output is to be processed with the following steps in mind:

  • use ** as the result element,
  • take the attributes param1, param2 and param3 from element ** and add them to the result element **.

The RequestEnricher bean below does just that:
import groovy.xml.dom.DOMCategory

import org.apache.camel.Exchange
import org.apache.camel.Property

public class RequestEnricher {

    public String enrich(@Property(name = "operation") String operation, Exchange ex) {

        use(DOMCategory) {
            // namespaces used while navigating the parsed document
            def dhl = new groovy.xml.Namespace("http://example.com/common/dhl/schema", 'dhl')
            def pc = new groovy.xml.Namespace("http://example.com/pc/types", 'pc')
            def doc = new XmlParser().parseText(ex.in.body)

            // pick the element that will become the result
            def pcRequest = doc."aaaa"[0]

            // copy the three parameters onto the result element as attributes
            ["param1", "param2", "param3"].each {
                def node = doc.'**'[("" + it)][0]
                if (node)
                    pcRequest['@' + it] = node.text()
            }

            // the last expression is the method's return value
            gNodeListToString([pcRequest])
        }
    }

    String gNodeListToString(list) {
        StringBuilder sb = new StringBuilder()
        list.each { listItem ->
            StringWriter sw = new StringWriter()
            new XmlNodePrinter(new PrintWriter(sw)).print(listItem)
            sb.append(sw.toString())
        }
        return sb.toString()
    }

}

 

What we’re doing here, especially in the last line of the enrich method, is the conversion to String – Camel has some problems if we spit out Groovy objects. The rest is just some Groovy-specific ways of manipulating XML. But looking at the enrich method’s parameters, there is the @Property annotation, which binds the property assigned earlier in the router code to one of the arguments. That is a really cool feature, and there are more such annotations:
* @XPath
* @Header
* @Headers and @Properties – give you whole maps of headers or properties

This pretty much concludes the subject :) Have fun, and if in doubt, leave a comment with your question!
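As a parting example, here is a hedged sketch of how a few of these bindings could be combined in a single bean method (the class name, method name and usage are made up for illustration):

import java.util.Map;

import org.apache.camel.Exchange;
import org.apache.camel.Headers;
import org.apache.camel.Property;

// Illustrative only: Camel fills in the annotated parameters before calling the method.
public class BindingShowcase {

    public String handle(@Property(name = "operation") String operation,
                         @Headers Map headers,
                         Exchange exchange) {
        // "operation" arrives already extracted from the exchange property set in the route,
        // "headers" is the whole map of the in-message headers
        return operation + " (" + headers.size() + " headers)";
    }
}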

