xquery4j in action

In my previous article, I introduced a wrapper library for Saxon, xquery4j http://github.com/rafalrusin/xquery4j.
Here, I will explain how to use it to create an article generator in Java and XQuery for XHTML, called Article. You can download it here: http://github.com/rafalrusin/Article. It’s a simple DSL for article generation.

I think it is something worth noticing, because the whole project took me just a while to implement and has interesting features. Those are:

  • embedded code syntax highlighting for a lot of programming languages (using external program highlight),
  • creating href entries for links, so you don’t need to type URL twice
  • it integrates natively with XHTML constructs

This is an example of an input it takes:

<a:article xmlns='http://www.w3.org/1999/xhtml' xmlns:a="urn:article">
Some text
<a:code lang="xml"><![CDATA[

]]>

It generates XHTML output for it, using command

./run <input.xml >output.xhtml

The interesting thing is that XQuery expression for this transformation is very simple to do in Saxon. This is the complete code of it:

declare namespace a="urn:article";
declare default element namespace "http://www.w3.org/1999/xhtml";

declare function a:processLine($l) {
for $i in $l/node()
return
typeswitch ($i)
case element(a:link, xs:untyped) return <a href=“{$i/text()}”>{$i/text()}
default return $i
};

declare function a:articleItem($i) {
typeswitch ($i)
case element(a:l, xs:untyped) return (a:processLine($i),
)

case element(a:code, xs:untyped) return
( a:highlight($i/text(), $i/@lang)/body/* ,
)

default return “error;”
};

<html xmlns=“http://www.w3.org/1999/xhtml”>

<br /> a.xml


<link rel=“stylesheet” type=“text/css” href=“highlight.css”/>


{
for $i in a:article/*
return
a:articleItem($i)
}

Inside this expression, there is bound a:highlight Java function, which takes two strings on input (a code and a language) and returns DOM Node containing XHTML output from highlight command.
Since there is not much trouble with manipulating DOM using xquery4j, we can get as simple solution as this for a:highlight function:

public static class Mod {
public static Node highlight(final String code, String lang) throws Exception {
Validate.notNull(lang);
final Process p = new ProcessBuilder("highlight", "-X", "--syntax", lang).start();
Thread t = new Thread(new Runnable() {

public void run() {
try {
OutputStream out = p.getOutputStream();
IOUtils.write(code, out);
out.flush();
out.close();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
});
t.start();
String result = IOUtils.toString(p.getInputStream());
t.join();
return DOMUtils.parse(result).getDocumentElement();
}
}

Please note that creating a separate thread for feeding input into highlight command is required, since Thread’s output queue is limited and potentially might lead to dead lock. So we need to concurrently collect output from spawned Process.
However at the end, when we need to convert a String to DOM and we use xquery4j’s DOMUtils.parse(result), so it’s a very simple construct.

You May Also Like

4Developers 2010 Review

I've been to 4Developers in 2009 in Cracow, together with Tomasz Przybysz and we had very nice impressions, no wonder then I wanted to signed up for 2010 edition in Poznań as well. Tomasz was sick, but Jakub Kurlenda decided to come with me. This time...I've been to 4Developers in 2009 in Cracow, together with Tomasz Przybysz and we had very nice impressions, no wonder then I wanted to signed up for 2010 edition in Poznań as well. Tomasz was sick, but Jakub Kurlenda decided to come with me. This time...

Phonegap / Cordova and cross domain ssl request problem on android.

In one app I have participated, there was a use case:
  • User fill up a form.
  • User submit the form.
  • System send data via https to server and show a response.
During development there wasn’t any problem, but when we were going to release production version then some unsuspected situation occurred. I prepare the production version accordingly with standard flow for Android environment:
  • ant release
  • align
  • signing
During conduct tests on that version, every time I try to submit the form, a connection error appear. In that situation, at the first you should check whitelist in cordova settings. Every URL you want to connect to, must be explicit type in:
res/xml/cordova.xml
If whitelist looks fine, the error is most likely caused by inner implementation of Android System. The Android WebView does not allow by default self-signed SSL certs. When app is debug-signed the SSL error is ignored, but if app is release-signed connection to untrusted services is blocked.



Workaround


You have to remember that secure connection to service with self-signed certificate is risky and unrecommended. But if you know what you are doing there is some workaround of the security problem. Behavior of method
CordovaWebViewClient.onReceivedSslError
must be changed.


Thus add new class extended CordovaWebViewClient and override ‘onReceivedSslError’. I strongly suggest to implement custom onReceiveSslError as secure as possible. I know that the problem occours when app try connect to example.domain.com and in spite of self signed certificate the domain is trusted, so only for that case the SslError is ignored.

public class MyWebViewClient extends CordovaWebViewClient {

   private static final String TAG = MyWebViewClient.class.getName();
   private static final String AVAILABLE_SLL_CN
= "example.domain.com";

   public MyWebViewClient(DroidGap ctx) {
       super(ctx);
   }

   @Override
   public void onReceivedSslError(WebView view,
SslErrorHandler handler,
android.net.http.SslError error) {

String errorSourceCName = error.getCertificate().
getIssuedTo().getCName();

       if( AVAILABLE_SLL_CN.equals(errorSourceCName) ) {
           Log.i(TAG, "Detect ssl connection error: " +
error.toString() +
„ so the error is ignored”);

           handler.proceed();
           return;
       }

       super.onReceivedSslError(view, handler, error);
   }
}
Next step is forcing yours app to  use custom implementation of WebViewClient.

public class Start extends DroidGap
{
   private static final String TAG = Start.class.getName();

   @Override
   public void onCreate(Bundle savedInstanceState)
   {
       super.onCreate(savedInstanceState);
       super.setIntegerProperty("splashscreen", R.drawable.splash);
       super.init();

       MyWebViewClient myWebViewClient = new MyWebViewClient(this);
       myWebViewClient.setWebView(this.appView);

       this.appView.setWebViewClient(myWebViewClient);
       
// yours code

   }
}
That is all ypu have to do if minSdk of yours app is greater or equals 8. In older version of Android there is no class
android.net.http.SslError
So in class MyCordovaWebViewClient class there are errors because compliator doesn’t see SslError class. Fortunately Android is(was) open source, so it is easy to find source of the class. There is no inpediments to ‘upgrade’ app and just add the file to project. I suggest to keep original packages. Thus after all operations the source tree looks like:

Class SslError placed in source tree. 
 Now the app created in release mode can connect via https to services with self-signed SSl certificates.

Zookeeper + Curator = Distributed sync

An application developed for one of my recent projects at TouK involved multiple servers. There was a requirement to ensure failover for the system’s components. Since I had already a few separate components I didn’t want to add more of that, and since there already was a Zookeeper ensemble running - required by one of the services, I’ve decided to go that way with my solution.

What is Zookeeper?

Just a crude distributed synchronization framework. However, it implements Paxos-style algorithms (http://en.wikipedia.org/wiki/Paxos_(computer_science)) to ensure no split-brain scenarios would occur. This is quite an important feature, since I don’t have to care about that kind of problems while using this app. You just need to create an ensemble of a couple of its instances - to ensure high availability. It is basically a virtual filesystem, with files, directories and stuff. One could ask why another filesystem? Well this one is a rather special one, especially for distributed systems. The reason why creating all the locking algorithms on top of Zookeeper is easy is its Ephemeral Nodes - which are just files that exist as long as connection for them exists. After it disconnects - such file disappears.

With such paradigms in place it’s fairly easy to create some high level algorithms for synchronization.

Having that in place, it can safely integrate multiple services ensuring loose coupling in a distributed way.

Zookeeper from developer’s POV

With all the base services for Zookeeper started, it seems there is nothing else, than just connect to it and start implementing necessary algorithms. Unfortunately, the API is quite basic and offers files and directories abstractions with the addition of different node type (file types) - ephemeral and sequence. It is also possible to watch a node for changes.

Using bare Zookeeper is hard!

Creating connections is tedious - and there is lots of things to take care of. Handling an established connection is hard - when establishing connection to ensemble, it’s necessary to negotiate a session also. During the whole process a number of exceptions can occur - these are “recoverable” exceptions, that can be gracefully handled and not break the connection.

    class="c8"><span>So, Zookeeper API is hard.</span></p><p class="c1"><span></span></p><p class="c8"><span>Even if one is proficient with that API, then there come recipes. The reason for using Zookeeper is to be able to implement some more sophisticated algorithms on top of it. Unfortunately those aren&rsquo;t trivial and it is again quite hard to implement them without bugs.</span>

And since distributed systems are hard, why would anyone want another difficult to handle tool?

Enter Curator

<p
    class="c8"><span>Happily, guys from Netflix implemented a nice abstraction for dealing with Zookeeper internals. They called it Curator and use it extensively in the company&rsquo;s environment. Curator offers consistent API for Zookeeper&rsquo;s functionality. It even implements a couple of recipes for distributed systems.</span>

File read/write

<p
    class="c8"><span>The basic use of Zookeeper is as a distributed configuration repository. For this scenario I only need read/write capabilities, to be able to write and read files from the Zookeeper filesystem. This code snippet writes a sample json to a file on ZK filesystem.</span>

<a href="#"
                                                                                                  name="0"></a>

EnsurePath ensurePath = new EnsurePath(markerPath);
ensurePath.ensure(client.getZookeeperClient());
String json = “...”;
if (client.checkExists().forPath(statusFile(core)) != null)
     client.setData().forPath(statusFile(core), json.getBytes());
else
     client.create().forPath(statusFile(core), json.getBytes());


Distributed locking

Having multiple systems there may be a need of using an exclusive lock for some resource, or perhaps some big system requires it’s components to synchronize based on locks. This “recipe” is an ideal match for those situations.

ref="#"
                                                                                    name="b0329bbbf14b79ffaba1139881914aea887ef6a3"></a>



lock = new InterProcessSemaphoreMutex(client, lockPath);
lock.acquire(5, TimeUnit.MINUTES);
… do sth …
lock.release();


 (from https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/LockingRemotely.java)

Sevice Advertisement

<p

    class="c8"><span>This is quite an interesting use case. With many small services on different servers it is not wise to exchange ip addresses and ports between them. When some of those services may go down, while other will try to replace them - the task gets even harder. </span>

That’s why, with Zookeeper in place, it can be utilised as a registry of existing services.

If a service starts, it registers into the ServiceRegistry, offering basic information, like it’s purpose, role, address, and port.

Services that want to use a specific kind of service request an access to some instance. This way of configuring easily decouples services from their configuration.

Basically this scenario needs ? steps:

<span>1. Service starts and registers its presence (</span><span class="c5"><a class="c0"
                                                                               href="https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerAdvertiser.java#L44">https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerAdvertiser.java#L44</a></span><span>)</span><span>:</span>



ServiceDiscovery discovery = getDiscovery();
            discovery.start();
            ServiceInstance si = getInstance();
            log.info(si);
            discovery.registerService(si);



2. Another service - on another host or in another JVM on the same machine tries to discover who is implementing the service (https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerFinder.java#L50):

<a href="#"

                                                                                                  name="3"></a>

instances = discovery.queryForInstances(serviceName);

The whole concept here is ridiculously simple - the service advertising its presence just stores a file with its whereabouts. The service that is looking for service providers just look into specific directory and read stored definitions.

In my example, the structure advertised by services looks like this (+ some getters and constructor - the rest is here: https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/model/WorkerMetadata.java):



public final class WorkerMetadata {
    private final UUID workerId;
    private final String listenAddress;
    private final int listenPort;
}


Source code

<p

    class="c8"><span>The above recipes are available in Curator library (</span><span class="c5"><a class="c0"
                                                                                                    href="http://curator.incubator.apache.org/">http://curator.incubator.apache.org/</a></span><span>). Recipes&rsquo;
usage examples are in my github repo at </span><span class="c5"><a class="c0"
                                                                   href="https://github.com/zygm0nt/curator-playground">https://github.com/zygm0nt/curator-playground</a></span>

Conclusion

<p
    class="c8"><span>If you&rsquo;re in need of a reliable platform for exchanging data and managing synchronization, and you need to do it in a distributed fashion - just choose Zookeeper. Then add Curator for the ease of using it. Enjoy!</span>


  1. image comes from: http://www.flickr.com/photos/jfgallery/2993361148
  2. all source code fragments taken from this repo: https://github.com/zygm0nt/curator-playground

An application developed for one of my recent projects at TouK involved multiple servers. There was a requirement to ensure failover for the system’s components. Since I had already a few separate components I didn’t want to add more of that, and since there already was a Zookeeper ensemble running - required by one of the services, I’ve decided to go that way with my solution.

What is Zookeeper?

Just a crude distributed synchronization framework. However, it implements Paxos-style algorithms (http://en.wikipedia.org/wiki/Paxos_(computer_science)) to ensure no split-brain scenarios would occur. This is quite an important feature, since I don’t have to care about that kind of problems while using this app. You just need to create an ensemble of a couple of its instances - to ensure high availability. It is basically a virtual filesystem, with files, directories and stuff. One could ask why another filesystem? Well this one is a rather special one, especially for distributed systems. The reason why creating all the locking algorithms on top of Zookeeper is easy is its Ephemeral Nodes - which are just files that exist as long as connection for them exists. After it disconnects - such file disappears.

With such paradigms in place it’s fairly easy to create some high level algorithms for synchronization.

Having that in place, it can safely integrate multiple services ensuring loose coupling in a distributed way.

Zookeeper from developer’s POV

With all the base services for Zookeeper started, it seems there is nothing else, than just connect to it and start implementing necessary algorithms. Unfortunately, the API is quite basic and offers files and directories abstractions with the addition of different node type (file types) - ephemeral and sequence. It is also possible to watch a node for changes.

Using bare Zookeeper is hard!

Creating connections is tedious - and there is lots of things to take care of. Handling an established connection is hard - when establishing connection to ensemble, it’s necessary to negotiate a session also. During the whole process a number of exceptions can occur - these are “recoverable” exceptions, that can be gracefully handled and not break the connection.

    class="c8"><span>So, Zookeeper API is hard.</span></p><p class="c1"><span></span></p><p class="c8"><span>Even if one is proficient with that API, then there come recipes. The reason for using Zookeeper is to be able to implement some more sophisticated algorithms on top of it. Unfortunately those aren&rsquo;t trivial and it is again quite hard to implement them without bugs.</span>

And since distributed systems are hard, why would anyone want another difficult to handle tool?

Enter Curator

<p
    class="c8"><span>Happily, guys from Netflix implemented a nice abstraction for dealing with Zookeeper internals. They called it Curator and use it extensively in the company&rsquo;s environment. Curator offers consistent API for Zookeeper&rsquo;s functionality. It even implements a couple of recipes for distributed systems.</span>

File read/write

<p
    class="c8"><span>The basic use of Zookeeper is as a distributed configuration repository. For this scenario I only need read/write capabilities, to be able to write and read files from the Zookeeper filesystem. This code snippet writes a sample json to a file on ZK filesystem.</span>

<a href="#"
                                                                                                  name="0"></a>

EnsurePath ensurePath = new EnsurePath(markerPath);
ensurePath.ensure(client.getZookeeperClient());
String json = “...”;
if (client.checkExists().forPath(statusFile(core)) != null)
     client.setData().forPath(statusFile(core), json.getBytes());
else
     client.create().forPath(statusFile(core), json.getBytes());


Distributed locking

Having multiple systems there may be a need of using an exclusive lock for some resource, or perhaps some big system requires it’s components to synchronize based on locks. This “recipe” is an ideal match for those situations.

ref="#"
                                                                                    name="b0329bbbf14b79ffaba1139881914aea887ef6a3"></a>



lock = new InterProcessSemaphoreMutex(client, lockPath);
lock.acquire(5, TimeUnit.MINUTES);
… do sth …
lock.release();


 (from https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/LockingRemotely.java)

Sevice Advertisement

<p

    class="c8"><span>This is quite an interesting use case. With many small services on different servers it is not wise to exchange ip addresses and ports between them. When some of those services may go down, while other will try to replace them - the task gets even harder. </span>

That’s why, with Zookeeper in place, it can be utilised as a registry of existing services.

If a service starts, it registers into the ServiceRegistry, offering basic information, like it’s purpose, role, address, and port.

Services that want to use a specific kind of service request an access to some instance. This way of configuring easily decouples services from their configuration.

Basically this scenario needs ? steps:

<span>1. Service starts and registers its presence (</span><span class="c5"><a class="c0"
                                                                               href="https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerAdvertiser.java#L44">https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerAdvertiser.java#L44</a></span><span>)</span><span>:</span>



ServiceDiscovery discovery = getDiscovery();
            discovery.start();
            ServiceInstance si = getInstance();
            log.info(si);
            discovery.registerService(si);



2. Another service - on another host or in another JVM on the same machine tries to discover who is implementing the service (https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerFinder.java#L50):

<a href="#"

                                                                                                  name="3"></a>

instances = discovery.queryForInstances(serviceName);

The whole concept here is ridiculously simple - the service advertising its presence just stores a file with its whereabouts. The service that is looking for service providers just look into specific directory and read stored definitions.

In my example, the structure advertised by services looks like this (+ some getters and constructor - the rest is here: https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/model/WorkerMetadata.java):



public final class WorkerMetadata {
    private final UUID workerId;
    private final String listenAddress;
    private final int listenPort;
}


Source code

<p

    class="c8"><span>The above recipes are available in Curator library (</span><span class="c5"><a class="c0"
                                                                                                    href="http://curator.incubator.apache.org/">http://curator.incubator.apache.org/</a></span><span>). Recipes&rsquo;
usage examples are in my github repo at </span><span class="c5"><a class="c0"
                                                                   href="https://github.com/zygm0nt/curator-playground">https://github.com/zygm0nt/curator-playground</a></span>

Conclusion

<p
    class="c8"><span>If you&rsquo;re in need of a reliable platform for exchanging data and managing synchronization, and you need to do it in a distributed fashion - just choose Zookeeper. Then add Curator for the ease of using it. Enjoy!</span>


  1. image comes from: http://www.flickr.com/photos/jfgallery/2993361148
  2. all source code fragments taken from this repo: https://github.com/zygm0nt/curator-playground