Subtle feature of CONNECT BY

Hierarchical queries (CONNECT BY … START WITH …) are used when retrieving data from tree-like structures. You use them if for example you want to retrieve all people who are directly or indirectly below a certain person in a company hierarchy. However, if you use joins and add some other conditions in the WHERE clause then the effect of the query is not obvious.

The good description of hierarchical queries can be found on the page

http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/queries4a.htm#2053937 There are 2 important points in this article:
* If the|WHERE| predicate contains a join, Oracle applies the join predicates /before/ doing the |CONNECT| |BY| processing.
* If the |WHERE| clause does not contain a join, Oracle applies all predicates other than the|CONNECT| |BY| predicates _/after/_doing the |CONNECT| |BY|processing without affecting the other rows of the hierarchy. What does it mean? How does it work? In order to explain this I will describe the error I came across before I understood how the hierarchical queries really work. The query was meant to return all direct and not direct superiors for a given customer (some customers work in companies).

SELECT c.* FROM customer_all c
CONNECT BY PRIOR c.customer_id_high = c.customer_id
START WITH c.customer_id = :cust_id

This worked fine. However we had to change the query in order to return billing address of customers. Each customer can have many addresses, but only one of them is a billing one. Such address has the flag CCBILL set to ‘X’. Thus, the query was changed into this one:

SELECT level, c.*, cc.*
  FROM customer_all c,
       ccontact_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'
CONNECT BY PRIOR c.customer_id_high = c.customer_id
START WITH c.customer_id = :cust_id

That was a mistake. The condition c.customer_id = cc.customer_id was applied before CONNECT BY whearas cc.ccbill = ‘X’ was executed after CONNECT BY. A customer (let’s suppose his/her customer_id = 123456) had two addresses, while his superior had one address and his superior’s superior had also one address. The query was executed in the following way (I omit stage irrelevant to this example): Stage 1: 2 records are found for the client 123456: one for his billing address and one for another address. They have LEVEL=1 Stage 2: CONNECT BY clause is applied to both records with LEVEL=1 from the previous stage. Thus, the superior of 123456 is added twice to the result set and there are two records with LEVEL=2 Stage 3: CONNECT BY clause is applied to both records with LEVEL=2 from the previous stage. Thus the superior of the superior of 123456 is added twice to the result set and there are two records with LEVEL=3 Stage 4: The non-billing address of customer 123456 is removed from the result set, but his superiors records which were added because of this record are not removed, as their address is not a billing one (they do not have any non-billing address). As a result, the query returned 5 records, although it was expected to return 3 records. The records of the superior and superior’s superior were duplicated because of the second address of 123456. The record with the second address was removed but the branch it created was preserved. The solution was to check in CONNECT BY clause if the address in the PRIOR record is the billing one

SELECT level, c.*, cc.*
  FROM customer_all c, ccontact_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'
CONNECT BY PRIOR c.customer_id_high = c.customer_id
       AND PRIOR cc.ccbill = 'X'
 START WITH c.customer_id = :cust_id

There is a clearer solution available – to make a join with addresses table after executing the hierarchical query:

SELECT *
  FROM (SELECT c. *
          FROM customer_all c,
        CONNECT BY PRIOR c.customer_id_high = c.customer_id
         START WITH c.customer_id = :cust_id) c,
       ccontract_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'

There is another interesting issue. When I changed the join to ANSI style (JOIN … ON instead of condition in the WHERE clause) then the execution time increased a few hundred thousand times. It seems that Oracle 9 on which I tested this query does not cope well with ANSI join syntax. This is not a surprise. Once we had to rapidly convert many views in a system. They used ANSI-style joins and when they were used in queries using Oracle-style joins then their performance dropped rapidly. We had to convert the views to use Oracle-style joins.

You May Also Like

Using WsLite in practice

TL;DR

There is a example working GitHub project which covers unit testing and request/response logging when using WsLite.

Why Groovy WsLite ?

I’m a huge fan of Groovy WsLite project for calling SOAP web services. Yes, in a real world you have to deal with those - big companies have huge amount of “legacy” code and are crazy about homogeneous architecture - only SOAP, Java, Oracle, AIX…

But I also never been comfortable with XFire/CXF approach of web service client code generation. I wrote a bit about other posibilites in this post. With JAXB you can also experience some freaky classloading errors - as Tomek described on his blog. In a large commercial project the “the less code the better” principle is significant. And the code generated from XSD could look kinda ugly - especially more complicated structures like sequences, choices, anys etc.

Using WsLite with native Groovy concepts like XmlSlurper could be a great choice. But since it’s a dynamic approach you have to be really careful - write good unit tests and log requests. Below are my few hints for using WsLite in practice.

Unit testing

Suppose you have some invocation of WsLite SOAPClient (original WsLite example):

def getMothersDay(long _year) {
    def response = client.send(SOAPAction: action) {
       body {
           GetMothersDay('xmlns':'http://www.27seconds.com/Holidays/US/Dates/') {
              year(_year)
           }
       }
    }
    response.GetMothersDayResponse.GetMothersDayResult.text()
}

How can the unit test like? My suggestion is to mock SOAPClient and write a simple helper to test that builded XML is correct. Example using great SpockFramework:

void setup() {
   client = Mock(SOAPClient)
   service.client = client
}

def "should pass year to GetMothersDay and return date"() {
  given:
      def year = 2013
  when:
      def date = service.getMothersDay(year)
  then:
      1 * client.send(_, _) >> { Map params, Closure requestBuilder ->
            Document doc = buildAndParseXml(requestBuilder)
            assertXpathEvaluatesTo("$year", '//ns:GetMothersDay/ns:year', doc)
            return mockResponse(Responses.mothersDay)
      }
      date == "2013-05-12T00:00:00"
}

This uses a real cool feature of Spock - even when you mock the invocation with “any mark” (_), you are able to get actual arguments. So we can build XML that would be passed to SOAPClient's send method and check that specific XPaths are correct:

void setup() {
    engine = XMLUnit.newXpathEngine()
    engine.setNamespaceContext(new SimpleNamespaceContext(namespaces()))
}

protected Document buildAndParseXml(Closure xmlBuilder) {
    def writer = new StringWriter()
    def builder = new MarkupBuilder(writer)
    builder.xml(xmlBuilder)
    return XMLUnit.buildControlDocument(writer.toString())
}

protected void assertXpathEvaluatesTo(String expectedValue,
                                      String xpathExpression, Document doc) throws XpathException {
    Assert.assertEquals(expectedValue,
            engine.evaluate(xpathExpression, doc))
}

protected Map namespaces() {
    return [ns: 'http://www.27seconds.com/Holidays/US/Dates/']
}

The XMLUnit library is used just for XpathEngine, but it is much more powerful for comparing XML documents. The NamespaceContext is needed to use correct prefixes (e.g. ns:GetMothersDay) in your Xpath expressions.

Finally - the mock returns SOAPResponse instance filled with envelope parsed from some constant XML:

protected SOAPResponse mockResponse(String resp) {
    def envelope = new XmlSlurper().parseText(resp)
    new SOAPResponse(envelope: envelope)
}

Request and response logging

The WsLite itself doesn’t use any logging framework. We usually handle it by adding own sendWithLogging method:

private SOAPResponse sendWithLogging(String action, Closure cl) {
    SOAPResponse response = client.send(SOAPAction: action, cl)
    log(response?.httpRequest, response?.httpResponse)
    return response
}

private void log(HTTPRequest request, HTTPResponse response) {
    log.debug("HTTPRequest $request with content:\n${request?.contentAsString}")
    log.debug("HTTPResponse $response with content:\n${response?.contentAsString}")
}

This logs the actual request and response send through SOAPClient. But it logs only when invocation is successful and errors are much more interesting… So here goes withExceptionHandler method:

private SOAPResponse withExceptionHandler(Closure cl) {
    try {
        cl.call()
    } catch (SOAPFaultException soapEx) {
        log(soapEx.httpRequest, soapEx.httpResponse)
        def message = soapEx.hasFault() ? soapEx.fault.text() : soapEx.message
        throw new InfrastructureException(message)
    } catch (HTTPClientException httpEx) {
        log(httpEx.request, httpEx.response)
        throw new InfrastructureException(httpEx.message)
    }
}
def send(String action, Closure cl) {
    withExceptionHandler {
        sendWithLogging(action, cl)
    }
}

XmlSlurper gotchas

Working with XML document with XmlSlurper is generally great fun, but is some cases could introduce some problems. A trivial example is parsing an id with a number to Long value:

def id = Long.valueOf(edit.'@id' as String)

The Attribute class (which edit.'@id' evaluates to) can be converted to String using as operator, but converting to Long requires using valueOf.

The second example is a bit more complicated. Consider following XML fragment:

<edit id="3">
   <params>
      <param value="label1" name="label"/>
      <param value="2" name="param2"/>
   </params>
   <value>123</value>
</edit>
<edit id="6">
   <params>
      <param value="label2" name="label"/>
      <param value="2" name="param2"/>
   </params>
   <value>456</value>
</edit>

We want to find id of edit whose label is label1. The simplest solution seems to be:

def param = doc.edit.params.param.find { it['@value'] == 'label1' }
def edit = params.parent().parent()

But it doesn’t work! The parent method returns multiple edits, not only the one that is parent of given param

Here’s the correct solution:

doc.edit.find { edit ->
    edit.params.param.find { it['@value'] == 'label1' }
}

Example

The example working project covering those hints could be found on GitHub.

New HTTP Logger Grails plugin

I've wrote a new Grails plugin - httplogger. It logs:

  • request information (url, headers, cookies, method, body),
  • grails dispatch information (controller, action, parameters),
  • response information (elapsed time and body).

It is mostly useful for logging your REST traffic. Full HTTP web pages can be huge to log and generally waste your space. I suggest to map all of your REST controllers with the same path in UrlMappings, e.g. /rest/ and configure this plugin with this path.

Here is some simple output just to give you a taste of it.

17:16:00,331 INFO  filters.LogRawRequestInfoFilter  - 17:16:00,340 INFO  filters.LogRawRequestInfoFilter  - 17:16:00,342 INFO  filters.LogGrailsUrlsInfoFilter  - 17:16:00,731 INFO  filters.LogOutputResponseFilter  - >> #1 returned 200, took 405 ms.
17:16:00,745 INFO filters.LogOutputResponseFilter - >> #1 responded with '{count:0}'
17:18:55,799 INFO  filters.LogRawRequestInfoFilter  - 17:18:55,799 INFO  filters.LogRawRequestInfoFilter  - 17:18:55,800 INFO  filters.LogRawRequestInfoFilter  - 17:18:55,801 INFO  filters.LogOutputResponseFilter  - >> #2 returned 404, took 3 ms.
17:18:55,802 INFO filters.LogOutputResponseFilter - >> #2 responded with ''

Official plugin information can be found on Grails plugins website here: http://grails.org/plugins/httplogger or you can browse code on github: TouK/grails-httplogger.