SortedSet + Joda DateTime == danger


It’s been quite a long time since I wrote something on this blog… Two things occurred that made me do this.
Firstly, I’m going to talk at Java Developer’s Conference in Cairo and at Booster conference in Bergen next month, so I want to have some content when I put a link on my slides ;)
Secondly, last week I encountered a really weird situation. In fact, it was an endless loop.
Yep.
It was in a rather critical place of our app, on a semi-production environment, so it was quite embarrassing. What’s more, the code had worked before, had been untouched for about half a year, and had pretty good test coverage. It looked more or less like this (I’ve left some stuff out, so now it looks too complex for its task):

import scala.annotation.tailrec
import scala.collection.SortedSet
import org.joda.time.DateTime

@tailrec
def findDates(dates: SortedSet[DateTime], date: Long, acc: List[DateTime]): (SortedSet[DateTime], List[DateTime]) =
  if (dates.isEmpty || dates.head.getMillis < date) {
    (dates, acc)
  } else {
    findDates(dates - dates.head, date, acc :+ dates.head)
  }

Just simple tail recursion, how can it loop endlessly? It turns out it can. Actually, for some specific data, dates - dates.head == dates.
Why? The reason is that DateTime’s natural ordering is not consistent with equals. If you look into the Comparable documentation, it says:

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave “strangely” when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

What does this mean? That you should only use sorted collections with classes that satisfy the following: if a.compareTo(b) == 0, then a.equals(b) == true. And in Joda’s DateTime Javadoc you can read:

Compares this object with the specified object for ascending millisecond instant order. This ordering is inconsistent with equals, as it ignores the Chronology.
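In other words, two DateTime objects pointing at the same millisecond instant but created in different time zones compare as equal, yet are not equal. A minimal sketch (the specific zones are arbitrary, any two different ones will do):

import org.joda.time.{DateTime, DateTimeZone}

val utc    = new DateTime(0L, DateTimeZone.UTC)
val warsaw = new DateTime(0L, DateTimeZone.forID("Europe/Warsaw"))

println(utc.compareTo(warsaw)) // 0     - the same millisecond instant
println(utc == warsaw)         // false - equals also compares the Chronology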

And it turns out that this was our case: in our data there were dates that were equal with respect to milliseconds, but in different timezones. What’s more, not every such pair of dates leads to disaster; they have to cause some mess in the underlying red-black tree… The solution was to introduce a wrapper (we used one anyway, actually) that defines a comparison consistent with equality.
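For illustration, a minimal sketch of such a wrapper (the DateKey name is made up; the point is that compare, equals and hashCode all look only at the millisecond instant, so compare == 0 implies equals):

import org.joda.time.DateTime

// Hypothetical wrapper whose ordering is consistent with equals:
// compare, equals and hashCode all use only the millisecond instant.
final class DateKey(val dt: DateTime) extends Ordered[DateKey] {
  def compare(that: DateKey): Int = dt.getMillis compare that.dt.getMillis

  override def equals(other: Any): Boolean = other match {
    case that: DateKey => dt.getMillis == that.dt.getMillis
    case _             => false
  }

  override def hashCode: Int = dt.getMillis.hashCode
}

A SortedSet[DateKey] then behaves the way the Set contract promises, and removing the head element actually shrinks the set.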

You May Also Like

Read emails from IMAP with Spring Integration

What's the easiest way to read emails from IMAP account in Java? Depends what your background is. If you have any experience in Apache Camel, ServiceMix, Mule, you already know the answer. If you don't, and your application is using Spring alr...

Grails render as JSON catch

One of the reasons your controller doesn't render a proper response in JSON format might be a wrong package name in your imports. It is easy to overlook: imports are at the top of a file, you look at your code and everything seems to be fine, except the response is still not in JSON format.

Consider this simple controller:

class RestJsonCatchController {
    def grailsJson() {
        render([first: 'foo', second: 5] as grails.converters.JSON)
    }

    def netSfJson() {
        render([first: 'foo', second: 5] as net.sf.json.JSON)
    }
}

And now, with fingers crossed... We have a winner!

$ curl localhost:8080/example/restJsonCatch/grailsJson
{"first":"foo","second":5}
$ curl localhost:8080/example/restJsonCatch/netSfJson
{first=foo, second=5}

As you can see, only grails.converters.JSON converts your response to JSON format. There is no such converter for net.sf.json.JSON, so Grails has no converter to apply and simply renders the Map's toString() output.

Conclusion: always carefully look at your imports if you're working with JSON in Grails!

Edit: Burt suggested that this is a bug. I've submitted a JIRA issue: GRAILS-9622 render as class that is not a codec should throw exception

Recently at storm-users

I've been reading through the storm-users Google Group recently; the resolution to do so was heavily inspired by Adam Kawa's post "Football zero, Apache Pig hero". Since I've encountered a lot of insightful and very interesting information, I've decided to describe some of it in this post.

  • nimbus will work in HA mode - There's a pull request open for it already... but some recent work (distributing topology files via BitTorrent) will greatly simplify the implementation. Once the BitTorrent work is done, we'll look at reworking the HA pull request. (storm’s pull request)

  • pig on storm - Pig on Trident would be a cool and welcome project. Join and groupBy have very clear semantics there, as those concepts exist directly in Trident. The extensions needed to Pig are the concept of incremental, persistent state across batches (mirroring those concepts in Trident). You can read a complete proposal.

  • implementing topologies in pure Python with Petrel looks like this:

import storm

class Bolt(storm.BasicBolt):
    def initialize(self, conf, context):
        '''This method is executed only once, when the bolt starts.'''
        storm.log('initializing bolt')

    def process(self, tup):
        '''This method is executed every time a new tuple arrives.'''
        msg = tup.values[0]
        storm.log('Got tuple %s' % msg)

if __name__ == "__main__":
    Bolt().run()

  • Fliptop is happy with storm - see their presentation here

  • topology metrics in 0.9.0: The new metrics feature allows you to collect arbitrary custom metrics over fixed windows. Those metrics are exported to a metrics stream that you can consume by implementing IMetricsConsumer and configuring it via Config.java#L473. Use TopologyContext#registerMetric to register new metrics (a sketch follows this list).

  • storm vs flume - some users' point of view: I use Storm and Flume and find that they are better at different things - it really depends on your use case as to which one is better suited. First and foremost, they were originally designed to do different things: Flume is a reliable service for collecting, aggregating, and moving large amounts of data from source to destination (e.g. log data from many web servers to HDFS). Storm is more for real-time computation (e.g. streaming analytics) where you analyse data in flight and don't necessarily land it anywhere. Having said that, Storm is also fault-tolerant and can write to external data stores (e.g. HBase), and you can do real-time computation in Flume (using interceptors).
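To make the metrics bullet above concrete, here is a minimal sketch of a bolt registering a CountMetric reported over a 60-second window. It targets the Storm 0.9.x Java API as I understand it; the CountingBolt class and the tuple_count metric name are made up for illustration:

import java.util.{Map => JMap}

import backtype.storm.metric.api.CountMetric
import backtype.storm.task.{OutputCollector, TopologyContext}
import backtype.storm.topology.OutputFieldsDeclarer
import backtype.storm.topology.base.BaseRichBolt
import backtype.storm.tuple.Tuple

// Hypothetical bolt that counts processed tuples. The count is flushed to
// the metrics stream (and any registered IMetricsConsumer) every 60 seconds.
class CountingBolt extends BaseRichBolt {
  private var collector: OutputCollector = _
  private var tupleCount: CountMetric = _

  override def prepare(conf: JMap[_, _], context: TopologyContext,
                       collector: OutputCollector): Unit = {
    this.collector = collector
    tupleCount = context.registerMetric("tuple_count", new CountMetric, 60)
  }

  override def execute(tuple: Tuple): Unit = {
    tupleCount.incr()
    collector.ack(tuple)
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = ()
}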

That's all for this day - however, I'll keep on reading through storm-users, so watch this space for more info on storm development.
