Subtle feature of CONNECT BY

Hierarchical queries (CONNECT BY … START WITH …) are used when retrieving data from tree-like structures. You use them if for example you want to retrieve all people who are directly or indirectly below a certain person in a company hierarchy. However, if you use joins and add some other conditions in the WHERE clause then the effect of the query is not obvious.

The good description of hierarchical queries can be found on the page

http://download.oracle.com/docs/cd/B10501_01/server.920/a96540/queries4a.htm#2053937 There are 2 important points in this article:
* If the|WHERE| predicate contains a join, Oracle applies the join predicates /before/ doing the |CONNECT| |BY| processing.
* If the |WHERE| clause does not contain a join, Oracle applies all predicates other than the|CONNECT| |BY| predicates _/after/_doing the |CONNECT| |BY|processing without affecting the other rows of the hierarchy. What does it mean? How does it work? In order to explain this I will describe the error I came across before I understood how the hierarchical queries really work. The query was meant to return all direct and not direct superiors for a given customer (some customers work in companies).

SELECT c.* FROM customer_all c
CONNECT BY PRIOR c.customer_id_high = c.customer_id
START WITH c.customer_id = :cust_id

This worked fine. However we had to change the query in order to return billing address of customers. Each customer can have many addresses, but only one of them is a billing one. Such address has the flag CCBILL set to ‘X’. Thus, the query was changed into this one:

SELECT level, c.*, cc.*
  FROM customer_all c,
       ccontact_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'
CONNECT BY PRIOR c.customer_id_high = c.customer_id
START WITH c.customer_id = :cust_id

That was a mistake. The condition c.customer_id = cc.customer_id was applied before CONNECT BY whearas cc.ccbill = ‘X’ was executed after CONNECT BY. A customer (let’s suppose his/her customer_id = 123456) had two addresses, while his superior had one address and his superior’s superior had also one address. The query was executed in the following way (I omit stage irrelevant to this example): Stage 1: 2 records are found for the client 123456: one for his billing address and one for another address. They have LEVEL=1 Stage 2: CONNECT BY clause is applied to both records with LEVEL=1 from the previous stage. Thus, the superior of 123456 is added twice to the result set and there are two records with LEVEL=2 Stage 3: CONNECT BY clause is applied to both records with LEVEL=2 from the previous stage. Thus the superior of the superior of 123456 is added twice to the result set and there are two records with LEVEL=3 Stage 4: The non-billing address of customer 123456 is removed from the result set, but his superiors records which were added because of this record are not removed, as their address is not a billing one (they do not have any non-billing address). As a result, the query returned 5 records, although it was expected to return 3 records. The records of the superior and superior’s superior were duplicated because of the second address of 123456. The record with the second address was removed but the branch it created was preserved. The solution was to check in CONNECT BY clause if the address in the PRIOR record is the billing one

SELECT level, c.*, cc.*
  FROM customer_all c, ccontact_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'
CONNECT BY PRIOR c.customer_id_high = c.customer_id
       AND PRIOR cc.ccbill = 'X'
 START WITH c.customer_id = :cust_id

There is a clearer solution available – to make a join with addresses table after executing the hierarchical query:

SELECT *
  FROM (SELECT c. *
          FROM customer_all c,
        CONNECT BY PRIOR c.customer_id_high = c.customer_id
         START WITH c.customer_id = :cust_id) c,
       ccontract_all cc
 WHERE c.customer_id = cc.customer_id
   AND cc.ccbill = 'X'

There is another interesting issue. When I changed the join to ANSI style (JOIN … ON instead of condition in the WHERE clause) then the execution time increased a few hundred thousand times. It seems that Oracle 9 on which I tested this query does not cope well with ANSI join syntax. This is not a surprise. Once we had to rapidly convert many views in a system. They used ANSI-style joins and when they were used in queries using Oracle-style joins then their performance dropped rapidly. We had to convert the views to use Oracle-style joins.

You May Also Like

Multi phased processing in scala

Last time in our project we had to add progress bar for visualization of long time running process. Process was made of a few phases and we had to print in which phase we currently are. In first step we conclude that we need to create a class of Progre...

Spock, Java and Maven

Few months ago I've came across Groovy - powerful language for JVM platform which combines the power of Java with abilities typical for scripting languages (dynamic typing, metaprogramming).

Together with Groovy I've discovered spock framework (https://code.google.com/p/spock/) - specification framework for Groovy (of course you can test Java classes too!). But spock is not only test/specification framework - it also contains powerful mocking tools.

Even though spock is dedicated for Groovy there is no problem with using it for Java classes tests. In this post I'm going to describe how to configure Maven project to build and run spock specifications together with traditional JUnit tests.


Firstly, we need to prepare pom.xml and add necessary dependencies and plugins.

Two obligatory libraries are:
<dependency>
<groupid>org.spockframework</groupId>
<artifactid>spock-core</artifactId>
<version>0.7-groovy-2.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupid>org.codehaus.groovy</groupId>
<artifactid>groovy-all</artifactId>
<version>${groovy.version}</version>
<scope>test</scope>
</dependency>
Where groovy.version is property defined in pom.xml for more convenient use and easy version change, just like this:
<properties>
<gmaven-plugin.version>1.4</gmaven-plugin.version>
<groovy.version>2.1.5</groovy.version>
</properties>

I've added property for gmaven-plugin version for the same reason ;)

Besides these two dependencies, we can use few additional ones providing extra functionality:
  • cglib - for class mocking
  • objenesis - enables mocking classes without default constructor
To add them to the project put these lines in <dependencies> section of pom.xml:
<dependency>
<groupid>cglib</groupId>
<artifactid>cglib-nodep</artifactId>
<version>3.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupid>org.objenesis</groupId>
<artifactid>objenesis</artifactId>
<version>1.3</version>
<scope>test</scope>
</dependency>

And that's all for dependencies section. Now we will focus on plugins necessary to compile Groovy classes. We need to add gmaven-plugin with gmaven-runtime-2.0 dependency in plugins section:
<plugin>
<groupid>org.codehaus.gmaven</groupId>
<artifactid>gmaven-plugin</artifactId>
<version>${gmaven-plugin.version}</version>
<configuration>
<providerselection>2.0</providerSelection>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<dependencies>
<dependency>
<groupid>org.codehaus.gmaven.runtime</groupId>
<artifactid>gmaven-runtime-2.0</artifactId>
<version>${gmaven-plugin.version}</version>
<exclusions>
<exclusion>
<groupid>org.codehaus.groovy</groupId>
<artifactid>groovy-all</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupid>org.codehaus.groovy</groupId>
<artifactid>groovy-all</artifactId>
<version>${groovy.version}</version>
</dependency>
</dependencies>
</plugin>

With these configuration we can use spock and write our first specifications. But there is one issue: default settings for maven-surefire plugin demand that test classes must end with "..Test" postfix, which is ok when we want to use such naming scheme for our spock tests. But if we want to name them like CommentSpec.groovy or whatever with "..Spec" ending (what in my opinion is much more readable) we need to make little change in surefire plugin configuration:
<plugin>
<groupid>org.apache.maven.plugins</groupId>
<artifactid>maven-surefire-plugin</artifactId>
<version>2.15</version>
<configuration>
<includes>
<include>**/*Test.java</include>
<include>**/*Spec.java</include>
</includes>
</configuration>
</plugin>

As you can see there is a little trick ;) We add include directive for standard Java JUnit test ending with "..Test" postfix, but there is also an entry for spock test ending with "..Spec". And there is a trick: we must write "**/*Spec.java", not "**/*Spec.groovy", otherwise Maven will not run spock tests (which is strange and I've spent some time to figure out why Maven can't run my specs).

Little update: instead of "*.java" postfix for both types of tests we can write "*.class" what is in my opinion more readable and clean:
<include>**/*Test.class</include>
<include>**/*Spec.class</include>
(thanks to Tomek Pęksa for pointing this out!)

With such configuration, we can write either traditional JUnit test and put them in src/test/java directory or groovy spock specifications and place them in src/test/groovy. And both will work together just fine :) In one of my next posts I'll write something about using spock and its mocking abilities in practice, so stay in tune.

Confitura 2013 afterthoughts

Confitura, the biggest free-of-charge Java conference in Europe, took place on the 6th of July in Warsaw. TouK's presence was heavy, with 5 separate talks, all chosen in call for papers, no sponsored bullshit. We were sponsoring deck chairs during the...Confitura, the biggest free-of-charge Java conference in Europe, took place on the 6th of July in Warsaw. TouK's presence was heavy, with 5 separate talks, all chosen in call for papers, no sponsored bullshit. We were sponsoring deck chairs during the...