The road to Kotlin Symbol Processing

There’s a long back story to Java annotations. Introduced in 2004 with Java 5, with annotation processing built into the javac compiler since Java 6 in 2006, they may be thought of as an industry-standard approach to metaprogramming. However, as annotation processing is mostly used in the Android ecosystem, it is not that popular a technique in backend development. I had an opportunity to dig into the subject while developing Krush, which is based on Kotlin annotation processing (KAPT). In this article I’ll try to show you the different approaches to annotation processing, starting from pure Java solutions, then moving to KAPT and finally to its successor, Kotlin Symbol Processing.

Java annotation processors

Lombok *

Lombok is one of the first projects that come to a Java developer’s mind when thinking about annotation processing. You just add a single dependency to your pom.xml, install a plugin in IntelliJ IDEA, and some kind of magic turns your classes annotated with @Value into functional immutable data structures. But in fact, Lombok is not a 100% pure example of an annotation processor… If you run your debugger and step into a generated toString / equals method, you’ll see no generated code:

[Image: lombok debug]

Lombok starts as a regular annotation processor, but during its run it modifies the compiler’s abstract syntax tree to insert the desired methods into your classes, which is not an intended use case for annotation processors. In some references this technique is even called a hack:

Lombok … uses annotation processing as a bootstrapping mechanism to include itself into the compilation process and modify the AST via some internal compiler APIs. This hacky technique has nothing to do with the intended purpose of annotation processing

[source]

Apart from not being the cleanest solution, Lombok for some time broke other annotation-processing-based libraries configured to run on the same source code.

AutoValue

A similar library that uses annotation processing in a clean way is AutoValue. It processes the @AutoValue and @AutoValue.Builder annotations to generate immutable classes as ordinary source code, which you can safely step into using a debugger. Consider the following example:

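import com.google.auto.value.AutoValue;

@AutoValue
abstract class Book {

   // AutoValue_Book is generated by the annotation processor at compile time
   static Builder builder() {
       return new AutoValue_Book.Builder();
   }

   @AutoValue.Builder
   interface Builder {
       Builder title(String title);
       Builder author(String author);
       Book build();
   }

   // abstract getters are implemented in the generated subclass
   abstract String title();
   abstract String author();
}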

Then, if you create an instance of the class using a Builder, you can see generated toString / equals / hashCode methods in the debugger:

[Image: autovalue debug]

By quickly looking at the above code you can see some characteristics of using an annotation processor in your project:

  • there is a minimal framework/convention that must be built into your classes (abstract getters, static builder method)
  • you are using “third party” code (classes in this example), either by directly referring to it in your source code or at runtime (like in the @AutoValue example)
  • there is a need to run a partial compilation before being able to use the generated code

You can find other examples of Java annotation processors on this annotation processing list.
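
To give a feel for what sits behind a library like AutoValue, here is a minimal sketch of a processor written against the standard javax.annotation.processing API. It is not taken from any of the libraries above: the com.example.Described annotation, the DescribedProcessor class and the shape of the generated *Description class are all made up for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical processor: for every type annotated with the made-up annotation
// com.example.Described it writes a small <TypeName>Description source file.
@SupportedAnnotationTypes("com.example.Described")
class DescribedProcessor extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                if (!(element instanceof TypeElement)) {
                    continue; // this sketch only handles annotated types
                }
                TypeElement type = (TypeElement) element;
                String pkg = processingEnv.getElementUtils()
                        .getPackageOf(type).getQualifiedName().toString();
                String className = type.getSimpleName() + "Description";
                String qualifiedName = pkg.isEmpty() ? className : pkg + "." + className;
                // The Filer only hands out a file; its content is plain text we build ourselves.
                try (Writer writer = processingEnv.getFiler()
                        .createSourceFile(qualifiedName, type).openWriter()) {
                    writer.write(render(pkg, className, type.getQualifiedName().toString()));
                } catch (IOException e) {
                    processingEnv.getMessager()
                            .printMessage(Diagnostic.Kind.ERROR, e.toString(), type);
                }
            }
        }
        return true; // the annotation is claimed by this processor
    }

    // Builds the generated source by string concatenation (no templating support in the API).
    static String render(String pkg, String className, String describedType) {
        String packageLine = pkg.isEmpty() ? "" : "package " + pkg + ";\n\n";
        return packageLine
                + "final class " + className + " {\n"
                + "    static final String DESCRIBED_TYPE = \"" + describedType + "\";\n"
                + "}\n";
    }
}
```

A real processor would also have to be declared public and registered (for example in META-INF/services/javax.annotation.processing.Processor) so that javac can discover it.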

KAPT

I wasn’t aware how much annotation processing is used by libraries from the Android ecosystem. Butterknife, Room, Moshi, Hilt… these names are not quite familiar if you’re a backend developer working on the JVM. As Kotlin started gaining popularity in the Android community, it was crucial for it to support existing ecosystem libraries made for Java. That’s why KAPT was introduced – a Kotlin Annotation Processing Tool. The idea behind it was very simple:

  • make a minimal step to transform Kotlin code to Java
  • run annotation processing on Java sources, just as in any Java project

Krush

One of the examples of using KAPT is Krush, our lightweight persistence layer for Kotlin based on the Exposed SQL DSL. Krush interprets standard JPA annotations on the entity classes to generate both Exposed DSL mappings and convenient methods to transfer data from / to entity classes.

Consider the following entity:

@Entity
data class Reservation(

   @Id
   val uid: UUID = UUID.randomUUID(),

   @Enumerated(EnumType.STRING)
   val status: Status = Status.FREE
)

By adding Krush we will have the following mapping generated:

// generated
public object ReservationTable : Table("reservation") {
  public val uid: Column<UUID> = uuid("uid")
  public override val primaryKey: Table.PrimaryKey = PrimaryKey(uid)
  public val status: Column<Status> =
      enumerationByName("status", 255, pl.touk.krush.Status::class)
}

However, during the Krush implementation we observed some drawbacks of using annotation processing in a 100% Kotlin codebase:

  • no direct support for code generation. The Java annotation processing API offers only limited support for code generation through the Filer interface, which just lets you create new files and leaves producing their content entirely to you. A partial solution to this is using a third-party code generation library, like KotlinPoet, which contains a feature-rich DSL for generating Kotlin classes, properties etc.
  • missing Kotlin-specific information in the API. The annotation processing API describes the structure of the code being compiled through the javax.lang.model package. However, this model is Java-specific, so you don’t have access to top-level functions, file annotations etc. Some of the additional Kotlin metadata can be retrieved by using the kotlinpoet-metadata integration.
  • the need to generate Java stubs before running the annotation processing itself slows down the overall project build. Some sources estimate that stub generation takes about ⅓ of the whole kotlinc run time, which can be painful for large codebases.

KSP

The downsides of using KAPT in pure Kotlin projects (especially Android-based ones) prompted Google to develop a Kotlin-specific approach called Kotlin Symbol Processing (KSP). It is now the preferred way to implement annotation processing in Kotlin, and since its release KAPT has been put into maintenance mode. KSP provides a Kotlin-specific API representing your source code, separate from javax.lang.model, removes the need to generate any Java stubs and integrates better with code generation libraries. Let’s look at the basic features of KSP by doing a hands-on example.

Implementing @Slf4j

Let’s write a simple processor, an equivalent of Lombok’s @Slf4j.
We should start with our annotation, a SymbolProcessor implementation and a provider for it:

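import com.google.devtools.ksp.processing.Resolver
import com.google.devtools.ksp.processing.SymbolProcessor
import com.google.devtools.ksp.processing.SymbolProcessorEnvironment
import com.google.devtools.ksp.processing.SymbolProcessorProvider
import com.google.devtools.ksp.symbol.KSAnnotated
import com.google.devtools.ksp.symbol.KSClassDeclaration
import com.google.devtools.ksp.symbol.KSVisitorVoid
import com.google.devtools.ksp.validate

annotation class Slf4j

class Slf4jProcessor(val env: SymbolProcessorEnvironment) : SymbolProcessor {

   override fun process(resolver: Resolver): List<KSAnnotated> {
       val symbols = resolver.getSymbolsWithAnnotation(Slf4j::class.java.name)
       // symbols that cannot be validated yet are returned, so KSP defers them to a later round
       val ret = symbols.filter { !it.validate() }.toList()
       symbols
           .filter { it is KSClassDeclaration && it.validate() }
           .forEach { it.accept(Slf4jProcessorVisitor(), Unit) }
       return ret
   }

   inner class Slf4jProcessorVisitor : KSVisitorVoid()
  
}

class Slf4jProcessorProvider : SymbolProcessorProvider {
   override fun create(environment: SymbolProcessorEnvironment) = Slf4jProcessor(environment)
}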

Apart from a bunch of bootstrapping code, you may notice that the main part of the implementation lives in Slf4jProcessorVisitor. KSP traverses your source code using the visitor pattern, so what you have to do is write the appropriate visitXXX method(s) to implement your processor’s functionality:

override fun visitClassDeclaration(classDeclaration: KSClassDeclaration, data: Unit) {
    val packageName = classDeclaration.packageName.asString()
    val ksType = classDeclaration.asType(emptyList())

    val fileSpec = FileSpec.builder(
        packageName = packageName,
        fileName = classDeclaration.simpleName.asString() + "Ext"
    ).apply {
        val className = ksType.toClassName()
        val loggerName = "_${className.simpleName.replaceFirstChar { it.lowercase() }}Logger"
        addProperty(
            PropertySpec.builder(loggerName, Logger::class.java)
                .addModifiers(KModifier.PRIVATE)
                .initializer("%T.getLogger(%T::class.java)", LoggerFactory::class.java, className)
                .build()
        )
        addProperty(
            PropertySpec.builder("logger", Logger::class.java)
                .receiver(className)
                .getter(
                    FunSpec.getterBuilder()
                        .addStatement("return $loggerName")
                        .build()
                )
                .build()
        )
    }.build()

    fileSpec.writeTo(codeGenerator = env.codeGenerator, aggregating = false)
}

I’m using KotlinPoet here, which is quite nicely integrated with KSP through the writeTo method: if you want to generate a new file, you just build a FileSpec and then call writeTo to write it to the appropriate folder configured by the KSP plugin.

In short, for each class annotated with @Slf4j we generate a file with a private logger property (_serviceLogger for a class named Service), which is initialized with the standard LoggerFactory.getLogger call. KotlinPoet comes with a nice templating system, which allows you to just pass class declarations from the KSP model instead of resolving them manually, with imports etc. The second property is an extension of our annotated class, which we express by setting a receiver; we can also add a custom getter by using another KotlinPoet call.
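
If the visitor pattern is new to you, the idea behind the traversal can be sketched without KSP at all. The toy model below only mimics the shape of the API (nodes with an accept method, a visitor with no-op defaults in the spirit of KSVisitorVoid); ToyNode, ToyClass, ToyFunction and FunctionNameCollector are hypothetical names, and the sketch is written in plain Java to stay dependency-free.

```java
import java.util.ArrayList;
import java.util.List;

// Visitor with empty default methods: implementations override only what they need.
interface ToyVisitor<D> {
    default void visitClass(ToyClass node, D data) {}
    default void visitFunction(ToyFunction node, D data) {}
}

// Every node knows which visit method to call for itself (double dispatch).
interface ToyNode {
    <D> void accept(ToyVisitor<D> visitor, D data);
}

final class ToyFunction implements ToyNode {
    final String name;

    ToyFunction(String name) {
        this.name = name;
    }

    @Override
    public <D> void accept(ToyVisitor<D> visitor, D data) {
        visitor.visitFunction(this, data);
    }
}

final class ToyClass implements ToyNode {
    final String name;
    final List<ToyFunction> functions;

    ToyClass(String name, List<ToyFunction> functions) {
        this.name = name;
        this.functions = functions;
    }

    @Override
    public <D> void accept(ToyVisitor<D> visitor, D data) {
        visitor.visitClass(this, data);
    }
}

// Collects function names; the traversal into children is the visitor's own decision.
final class FunctionNameCollector implements ToyVisitor<List<String>> {
    @Override
    public void visitClass(ToyClass node, List<String> out) {
        for (ToyFunction function : node.functions) {
            function.accept(this, out);
        }
    }

    @Override
    public void visitFunction(ToyFunction node, List<String> out) {
        out.add(node.name);
    }

    static List<String> collect(ToyNode root) {
        List<String> out = new ArrayList<>();
        root.accept(new FunctionNameCollector(), out);
        return out;
    }
}
```

Our Slf4jProcessorVisitor plays the role of FunctionNameCollector here: it overrides a single visit method and ignores every other kind of node.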

If we did everything right, this is how the generated code should look for an annotated class:

import org.slf4j.Logger
import org.slf4j.LoggerFactory

private val _serviceLogger: Logger = LoggerFactory.getLogger(Service::class.java)

public val Service.logger: Logger
  get() = _serviceLogger

Which should allow us to use the logger in our class:

@Slf4j
class Service {
   fun test() {
       logger.info("Hello from KSP!")
   }
}

Summary

I hope this article helped you learn the history of annotation processing and the motivation behind the KSP project. KAPT is now in maintenance mode, so KSP should be the default choice for new, pure Kotlin projects. However, if you are thinking of migrating an existing KAPT-based library, be aware of the complications that arise when javax.lang.model is too tightly coupled to your model. For example, the Dagger project has a rough road to supporting KSP: they must first introduce a common model based on XProcessing to support both KSP and traditional annotation processing.

The code for the example @Slf4j processor can be found here.

You May Also Like

Zookeeper + Curator = Distributed sync

An application developed for one of my recent projects at TouK involved multiple servers. There was a requirement to ensure failover for the system’s components. Since I already had a few separate components, I didn’t want to add more, and since there was already a Zookeeper ensemble running (required by one of the services), I decided to go that way with my solution.

What is Zookeeper?

Just a crude distributed synchronization framework. However, it implements Paxos-style algorithms (http://en.wikipedia.org/wiki/Paxos_(computer_science)) to ensure no split-brain scenarios occur. This is quite an important feature, since I don’t have to care about that kind of problem while using it. You just need to create an ensemble of a couple of its instances to ensure high availability. It is basically a virtual filesystem, with files, directories and stuff. One could ask: why another filesystem? Well, this one is rather special, especially for distributed systems. The reason why creating all the locking algorithms on top of Zookeeper is easy is its Ephemeral Nodes, which are just files that exist as long as the connection that created them exists. After it disconnects, such a file disappears.

With such paradigms in place it’s fairly easy to create some high level algorithms for synchronization.

Having that in place, it can safely integrate multiple services ensuring loose coupling in a distributed way.
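
To see why ephemeral nodes make locking easy, here is a toy, in-memory model of the idea. It is only a sketch and has nothing to do with the real Zookeeper API: ToyEphemeralStore, its methods and the session ids are all hypothetical. Whoever manages to create the node holds the lock, and the lock is freed automatically when the owning session goes away.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a Zookeeper namespace; none of these names are Zookeeper API.
final class ToyEphemeralStore {
    // path -> id of the session that created (and therefore owns) the node
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Creates an ephemeral node; returns false if the path is already taken. */
    boolean createEphemeral(String path, String sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    /** Simulates a disconnect: every node owned by the session disappears. */
    void closeSession(String sessionId) {
        owners.values().removeIf(sessionId::equals);
    }

    /** Returns the session currently owning the node, if the node exists. */
    Optional<String> ownerOf(String path) {
        return Optional.ofNullable(owners.get(path));
    }
}
```

With this model, a crashed lock holder never blocks the others forever: once its session is closed, the next createEphemeral call on the same path succeeds.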

Zookeeper from developer’s POV

With all the base services for Zookeeper started, it seems there is nothing else to do than just connect to it and start implementing the necessary algorithms. Unfortunately, the API is quite basic and offers file and directory abstractions with the addition of different node types (file types): ephemeral and sequence. It is also possible to watch a node for changes.

Using bare Zookeeper is hard!

Creating connections is tedious, and there are lots of things to take care of. Handling an established connection is hard: when establishing a connection to the ensemble, it’s also necessary to negotiate a session. During the whole process a number of exceptions can occur; these are “recoverable” exceptions that can be gracefully handled without breaking the connection.

So, the Zookeeper API is hard.

Even if one is proficient with that API, then there come recipes. The reason for using Zookeeper is to be able to implement some more sophisticated algorithms on top of it. Unfortunately those aren’t trivial and it is again quite hard to implement them without bugs.

And since distributed systems are hard, why would anyone want another difficult to handle tool?

Enter Curator

Happily, guys from Netflix implemented a nice abstraction for dealing with Zookeeper internals. They called it Curator and use it extensively in the company’s environment. Curator offers a consistent API for Zookeeper’s functionality. It even implements a couple of recipes for distributed systems.

File read/write

The basic use of Zookeeper is as a distributed configuration repository. For this scenario I only need read/write capabilities, to be able to write and read files from the Zookeeper filesystem. This code snippet writes a sample json to a file on the ZK filesystem.

// Make sure the parent path exists before touching the status file
EnsurePath ensurePath = new EnsurePath(markerPath);
ensurePath.ensure(client.getZookeeperClient());
String json = "...";
// Update the node if it is already there, otherwise create it
if (client.checkExists().forPath(statusFile(core)) != null) {
    client.setData().forPath(statusFile(core), json.getBytes());
} else {
    client.create().forPath(statusFile(core), json.getBytes());
}


Distributed locking

With multiple systems there may be a need for an exclusive lock on some resource, or perhaps some big system requires its components to synchronize based on locks. This “recipe” is an ideal match for those situations.




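lock = new InterProcessSemaphoreMutex(client, lockPath);
if (lock.acquire(5, TimeUnit.MINUTES)) { // false means the timeout elapsed without the lock
    try {
        … do sth …
    } finally {
        lock.release(); // release even if the work above throws
    }
}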


 (from https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/LockingRemotely.java)

Service Advertisement

This is quite an interesting use case. With many small services on different servers it is not wise to exchange IP addresses and ports between them. When some of those services may go down, while others try to replace them, the task gets even harder.

That’s why, with Zookeeper in place, it can be utilised as a registry of existing services.

When a service starts, it registers in the ServiceRegistry, offering basic information such as its purpose, role, address and port.

Services that want to use a specific kind of service request access to some instance. This way of configuring easily decouples services from their configuration.

Basically this scenario needs two steps:

1. Service starts and registers its presence (https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerAdvertiser.java#L44):



ServiceDiscovery discovery = getDiscovery();
discovery.start();
ServiceInstance si = getInstance();
log.info(si);
discovery.registerService(si);



2. Another service - on another host or in another JVM on the same machine tries to discover who is implementing the service (https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/curator/WorkerFinder.java#L50):


instances = discovery.queryForInstances(serviceName);

The whole concept here is ridiculously simple: the service advertising its presence just stores a file with its whereabouts. The service that is looking for service providers just looks into a specific directory and reads the stored definitions.

In my example, the structure advertised by services looks like this (+ some getters and constructor - the rest is here: https://github.com/zygm0nt/curator-playground/blob/master/src/main/java/pl/touk/model/WorkerMetadata.java):



public final class WorkerMetadata {
    private final UUID workerId;
    private final String listenAddress;
    private final int listenPort;
}


Source code

The above recipes are available in the Curator library (http://curator.incubator.apache.org/). Recipes’ usage examples are in my github repo at https://github.com/zygm0nt/curator-playground

Conclusion

If you’re in need of a reliable platform for exchanging data and managing synchronization, and you need to do it in a distributed fashion - just choose Zookeeper. Then add Curator for the ease of using it. Enjoy!


  1. image comes from: http://www.flickr.com/photos/jfgallery/2993361148
  2. all source code fragments taken from this repo: https://github.com/zygm0nt/curator-playground
