Hamming Error Correction with Kotlin – part 1

Hamming code is one of the classics of computer science and telecommunications.

In this article, we’ll revisit the topic and implement a stateless Hamming(7,4) encoder using Kotlin.

Hamming Error Correction

Our communication channels and data storage are error-prone: bits can flip due to electric or magnetic interference, background radiation, or simply the low quality of the materials used.

Since the neutron flux is roughly 300 times higher at around 10 km altitude, systems operating at high altitudes need particular attention. The Cassini-Huygens mission is a good case study: in space, its memory reported single-bit errors every day, and on one day (during a solar particle event) the error count jumped more than fourfold. Hence the need for efficient error correction.

Richard Hamming’s code is one of the solutions to the problem. It’s a perfect code (at least according to Hamming’s definition) that can detect and correct errors in transmitted messages.

Simply put, it adds metadata to the message (in the form of parity bits) that can be used to validate the message and correct errors in it.

A Brief Explanation

I bet you’ve already wondered what the (7,4) in “Hamming(7,4)” means.

Simply put, N and M in “Hamming(N, M)” represent the block length and the message size – so, (7,4) means that it encodes four bits into seven bits by adding three additional parity bits – as simple as that.

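This particular version can detect and correct single-bit errors. Alternatively, it can detect (but not correct) double-bit errors, though not both at the same time: a double-bit error looks like a single-bit error and would get “corrected” into the wrong codeword. Doing both requires the extended Hamming(8,4) code, which adds one more overall parity bit.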

In a Hamming codeword, parity bits always occupy the indexes that are powers of two (using 1-based indexing).

So, if our initial message is 1111, the codeword will look something like [][]1[]111, with three parity bits for us to fill in.

To calculate the parity bit at position n (where n is a power of two), we start at its position in the codeword and take n elements, skip n elements, take n elements, skip n elements, and so on. If the number of ones among the taken data bits is odd, we set the parity bit to one; otherwise, to zero.

In our case:

  • For the first parity bit, we check indexes 1,3,5,7 -> (1)()1()111
  • For the second parity bit, we check indexes 2,3,6,7 -> (1)(1)1()111
  • For the third parity bit, we check indexes 4,5,6,7 -> (1)(1)1(1)111

And that’s all – the codeword is 1111111.
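The take-n/skip-n rule has an equivalent binary formulation: position i is checked by the parity bit at position p exactly when i has bit p set, i.e. (i and p) != 0. A minimal sketch of that variant, with the hypothetical names checkedBy() and encode4(), fills in the 1111 example by hand; it only illustrates the rule and is not the implementation developed below.

```kotlin
// Position i (1-based) is checked by the parity bit at position p (a power of two)
// exactly when bit p is set in i; this matches "take p, skip p, take p, ...".
fun checkedBy(p: Int, length: Int): List<Int> = (1..length).filter { (it and p) != 0 }

// Hypothetical hand-rolled Hamming(7,4) encoder for the worked example above.
fun encode4(data: String): String {
    require(data.length == 4 && data.all { it == '0' || it == '1' })
    val code = CharArray(7) { '0' }
    // data bits go to the positions that are not powers of two: 3, 5, 6, 7
    listOf(3, 5, 6, 7).forEachIndexed { k, pos -> code[pos - 1] = data[k] }
    for (p in listOf(1, 2, 4)) {
        // count ones among the checked positions, skipping the parity slot itself
        val ones = checkedBy(p, 7).filter { it != p }.count { code[it - 1] == '1' }
        code[p - 1] = if (ones % 2 == 1) '1' else '0'
    }
    return code.joinToString("")
}

// checkedBy(1, 7) == [1, 3, 5, 7], checkedBy(2, 7) == [2, 3, 6, 7], checkedBy(4, 7) == [4, 5, 6, 7]
// encode4("1111") == "1111111"
```

Since a power of two has a single bit set, no parity position is ever checked by another parity bit, so the order in which the three parity bits are filled does not matter.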

In this case, it might be tempting to think that every message made of only ones gets encoded to a codeword made of only ones, but that’s not the case (for example, 11 is encoded to 01111). On the other hand, a message made of only zeros will always be encoded to zeros exclusively.

Encoding

First things first, we can leverage Type Driven Development to make our lives easier when working with Strings representing raw and encoded messages:

data class EncodedString(val value: String)

data class BinaryString(val value: String)

Using this approach, it’ll be slightly harder to mix them up.
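The wrapper types can also carry validation. A minimal sketch, assuming we want to reject non-binary input early; this init block is a hypothetical extension and not part of the article’s types.

```kotlin
// Hypothetical variant of the article's wrapper: fail fast on anything but '0'/'1'.
data class BinaryString(val value: String) {
    init {
        require(value.all { it == '0' || it == '1' }) { "Not a binary string: $value" }
    }
}

// Codewords are binary too, but keeping a separate type stops us from
// passing an encoded value where a raw message is expected.
data class EncodedString(val value: String)
```

With this variant, BinaryString("10a1") throws IllegalArgumentException at construction time instead of failing somewhere deep inside the encoder.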

We’ll need a method for calculating the codeword size for a given message. We simply find the lowest number of parity bits r such that m + r + 1 ≤ 2^r, where m is the message length, and add it to m. This works for messages of any length, not only 4-bit ones:

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
  .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength
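Where does the inequality come from? A short derivation, added for clarity: the r parity checks together produce an r-bit result (the syndrome), which has to be able to name each of the m + r codeword positions plus the “no error” case.

```latex
\begin{aligned}
2^{r} &\ge m + r + 1 \\
m = 4:&\quad 2^{3} = 8 \ge 4 + 3 + 1 \;\Rightarrow\; r = 3,\ n = 7 \\
m = 8:&\quad 2^{3} = 8 < 8 + 3 + 1,\quad 2^{4} = 16 \ge 8 + 4 + 1 \;\Rightarrow\; r = 4,\ n = 12
\end{aligned}
```

Starting the search at r = 2 is safe because r = 1 never satisfies the bound for a non-empty message (2 < m + 2).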

Next, we’ll need methods for calculating the parity and data bits at given (0-based) codeword indexes for a given message:

fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
  parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
    .map { getDataBit(it, msg).toInt() }
    .reduce { a, b -> a xor b }
    .toString()

fun getDataBit(ind: Int, input: BinaryString) = input
  .value[ind - Integer.toBinaryString(ind).length].toString()
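The index arithmetic in getDataBit() deserves a closer look. The 0-based index ind is the 1-based position ind + 1, and the parity slots in front of it are the powers of two up to ind: 1, 2, 4, … up to the highest power of two ≤ ind, which is exactly the bit length of ind. Because ind + 1 is not a power of two for a data bit, subtracting that bit length yields the bit’s index in the original message. A small sketch comparing the shortcut with a brute-force count (parityPositionsBefore() is a hypothetical helper):

```kotlin
// The article's shortcut: subtract the bit length of the 0-based index.
fun dataIndex(ind: Int): Int = ind - Integer.toBinaryString(ind).length

// Brute-force reference: count the parity slots (1-based powers of two)
// at positions 1..ind, i.e. in front of the 0-based index ind.
fun parityPositionsBefore(ind: Int): Int = (1..ind).count { (it and (it - 1)) == 0 }

// Data bits sit at 0-based indices 2, 4, 5, 6, 8, ... and map to message
// indices 0, 1, 2, 3, 4, ...
```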

Where parityIndicesSequence() is defined as:

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
  .take(endEx - start)
  .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }
  .drop(1) // ignore the parity bit
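To connect parityIndicesSequence() with the hand-worked bullet list, here is a small check using a hypothetical checkedPositions() helper that converts its 0-based output back to 1-based positions. Because of drop(1), the parity slot itself is excluded, so only the data positions remain.

```kotlin
// The article's helper: 0-based codeword indices covered by the parity bit
// at 0-based index `start`, excluding the parity slot itself.
fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % (2 * (start + 1)) < start + 1 }
    .drop(1) // ignore the parity bit

// Hypothetical helper: the same indices as 1-based positions, for comparison
// with the hand-worked example (parity bits at positions 1, 2 and 4).
fun checkedPositions(parityIndex: Int, codewordLength: Int): List<Int> =
    parityIndicesSequence(parityIndex, codewordLength).map { it + 1 }.toList()

// checkedPositions(0, 7) == [3, 5, 7]
// checkedPositions(1, 7) == [3, 6, 7]
// checkedPositions(3, 7) == [5, 6, 7]
```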

Now, we can put it all together to form the actual solution, which simply goes through the whole codeword and fills it with parity bits and actual data:

override fun encode(input: BinaryString): EncodedString {
    fun toHammingCodeValue(it: Int, input: BinaryString) =
      when ((it + 1).isPowerOfTwo()) {
          true -> hammingHelper.getParityBit(it, input)
          false -> hammingHelper.getDataBit(it, input)
      }

    return hammingHelper.getHammingCodewordIndices(input.value.length)
      .map { toHammingCodeValue(it, input) }
      .joinToString("")
      .let(::EncodedString)
}

Note that isPowerOfTwo() is our custom extension function and is not available out-of-the-box in Kotlin:

internal fun Int.isPowerOfTwo() = this != 0 && this and this - 1 == 0
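Since getHammingCodewordIndices() isn’t shown, here is a self-contained sketch that gathers the helpers above into one file so the encoder can be run and checked. It assumes that getHammingCodewordIndices() simply yields the 0-based indices 0 until codewordSize(msgLength), as in the inlined version below; codewordIndices() is a hypothetical stand-in name. It also uses fold(0) instead of reduce, so an empty set of covered bits would yield 0 instead of throwing.

```kotlin
data class BinaryString(val value: String)
data class EncodedString(val value: String)

fun Int.isPowerOfTwo() = this != 0 && (this and (this - 1)) == 0

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
    .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength

// Assumption: stand-in for the article's getHammingCodewordIndices()
fun codewordIndices(msgLength: Int) = (0 until codewordSize(msgLength)).asSequence()

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % (2 * (start + 1)) < start + 1 }
    .drop(1) // ignore the parity bit

fun getDataBit(ind: Int, input: BinaryString) =
    input.value[ind - Integer.toBinaryString(ind).length].toString()

fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
    parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
        .map { getDataBit(it, msg).toInt() }
        .fold(0) { a, b -> a xor b }
        .toString()

// Parity bits at 1-based powers of two, data bits everywhere else.
fun encode(input: BinaryString): EncodedString =
    codewordIndices(input.value.length)
        .map { if ((it + 1).isPowerOfTwo()) getParityBit(it, input) else getDataBit(it, input) }
        .joinToString("")
        .let(::EncodedString)
```

The expected values in the checks come straight from the article’s own test cases, e.g. encode(BinaryString("1001000")) should give EncodedString("00110010000").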

Inlined

The interesting thing is that the whole computation can be inlined into a single, Goliath-sized sequence:

override fun encode(input: BinaryString) = generateSequence(0) { it + 1 }
  .take(generateSequence(2) { it + 1 }
    .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length)
  .map {
      when ((it + 1).isPowerOfTwo()) {
          true -> generateSequence(it) { it + 1 }
            .take(generateSequence(2) { it + 1 }
              .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length - it)
            .filterIndexed { i, _ -> i % ((2 * (it + 1))) < it + 1 }
            .drop(1)
            .map {
                input
                  .value[it - Integer.toBinaryString(it).length].toString().toInt()
            }
            .reduce { a, b -> a xor b }
            .toString()
          false -> input
            .value[it - Integer.toBinaryString(it).length].toString()
      }
  }
  .joinToString("")
  .let(::EncodedString)

Not the most readable version, but an interesting one to look at.

In Action

We can verify that the implementation works as expected by leveraging JUnit 5 and parameterized tests:

@ParameterizedTest(name = "{0} should be encoded to {1}")
@CsvSource(
  "1,111",
  "01,10011",
  "11,01111",
  "1001000,00110010000",
  "1100001,10111001001",
  "1101101,11101010101",
  "1101001,01101011001",
  "1101110,01101010110",
  "1100111,01111001111",
  "0100000,10011000000",
  "1100011,11111000011",
  "1101111,10101011111",
  "1100100,11111001100",
  "1100101,00111000101",
  "10011010,011100101010")
fun shouldEncode(first: String, second: String) {
    assertThat(sut.encode(BinaryString(first)))
      .isEqualTo(EncodedString(second))
}

…and with a home-made property test:

@Test
@DisplayName("should always encode zeros to zeros")
fun shouldEncodeZeros() {
    generateSequence("0") { it + "0" }
      .take(1000)
      .map { sut.encode(BinaryString(it)).value }
      .forEach {
          assertThat(it).doesNotContain("1")
      }
}
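The zeros-to-zeros check covers only one corner of the input space. A stronger property, sketched here as a hypothetical addition that is not part of the article’s test suite, holds for every valid codeword with this layout: XOR-ing the 1-based positions of all the ones gives 0 (the syndrome), because each parity bit p makes the number of ones among the positions with bit p set even. Flipping a single bit then makes the syndrome equal to that bit’s position, which is what a decoder can exploit.

```kotlin
// Syndrome: XOR of the 1-based positions of all '1' bits; 0 for every valid codeword.
fun syndrome(codeword: String): Int =
    codeword.indices
        .filter { codeword[it] == '1' }
        .fold(0) { acc, i -> acc xor (i + 1) }

// Hypothetical helper: flips the bit at the given 1-based position.
fun flip(codeword: String, position: Int): String =
    codeword.mapIndexed { i, c -> if (i == position - 1) (if (c == '1') '0' else '1') else c }
        .joinToString("")
```

In the JUnit style used above, it could be asserted as assertThat(syndrome(sut.encode(BinaryString(msg)).value)).isEqualTo(0) for generated messages.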

Going Parallel

The most important property of this implementation is statelessness. We achieve it by using only pure functions and avoiding shared mutable state: all necessary data is always passed explicitly as input parameters and never held in any form of internal state.

Unfortunately, this results in some repetition and performance overhead that could’ve been avoided by modifying a single mutable list and passing it around. On the other hand, we can now use our resources more wisely by parallelizing the whole operation, which should result in a performance improvement.

Without measuring, that’s just wishful thinking, so let’s run the code.

We can parallelize the operation (naively) using Java 8’s parallel streams:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().parallelStream()
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)

To avoid giving the sequential implementation an unfair advantage (so far, it skips the toList() conversion), we’ll need to change it slightly:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().stream() // to be fair.
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)
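One caveat worth checking before reading the numbers: reduce("") { t, u -> t + u } allocates a new String at every step, so the sequential version copies on the order of n² characters in total, while the parallel reduction concatenates smaller chunks and combines them in a tree. Part of the gap may therefore come from the concatenation pattern rather than from parallel computation alone. A minimal sketch of a fairer variant using Collectors.joining(); joinValues() is a hypothetical helper, and valueAt stands in for toHammingCodeValue(it, input).

```kotlin
import java.util.stream.Collectors

// Joins per-index results without repeated String concatenation.
// `valueAt` stands in for the article's toHammingCodeValue(it, input).
fun joinValues(indices: List<Int>, parallel: Boolean, valueAt: (Int) -> String): String {
    val stream = if (parallel) indices.parallelStream() else indices.stream()
    // joining() preserves encounter order, so both variants yield the same string
    return stream.map { valueAt(it) }.collect(Collectors.joining())
}
```

Benchmarking both encoders with joining() instead of reduce() would show how much of the speedup survives; no such numbers are reported here.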

And now, we can perform some benchmarking using JMH (message.size == 10_000):

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.parallel":
 3.690 ±(99.9%) 0.018 ms/op [Average]
 (min, avg, max) = (3.524, 3.690, 3.974), stdev = 0.076
 CI (99.9%): [3.672, 3.708] (assumes normal distribution)

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.sequential":
 10.877 ±(99.9%) 0.097 ms/op [Average]
 (min, avg, max) = (10.482, 10.877, 13.498), stdev = 0.410
 CI (99.9%): [10.780, 10.974] (assumes normal distribution)


# Run complete. Total time: 00:15:14

Benchmark                   Mode  Cnt   Score   Error  Units
SimpleBenchmark.parallel    avgt  200   3.690 ± 0.018  ms/op
SimpleBenchmark.sequential  avgt  200  10.877 ± 0.097  ms/op

As we can see, the parallelized implementation is roughly three times faster here (3.690 vs. 10.877 ms/op). Of course, results might change drastically depending on various factors, so don’t think that we’ve found a silver bullet; they don’t exist.

For example, here are the results for encoding a very short message (message.size == 10):

Benchmark                   Mode Cnt Score   Error Units
SimpleBenchmark.parallel    avgt 200 0.024 ± 0.001 ms/op
SimpleBenchmark.sequential  avgt 200 0.003 ± 0.001 ms/op

In this case, the overhead of splitting the work among multiple threads makes the parallelized implementation roughly eight times slower!

Here’s the full table for the reference:

Benchmark            (messageSize)  Mode  Cnt   Score    Error  Units
Benchmark.parallel              10  avgt  200   0.022 ±  0.001  ms/op
Benchmark.sequential            10  avgt  200   0.003 ±  0.001  ms/op

Benchmark.parallel             100  avgt  200   0.038 ±  0.001  ms/op
Benchmark.sequential           100  avgt  200   0.031 ±  0.001  ms/op

Benchmark.parallel            1000  avgt  200   0.273 ±  0.011  ms/op
Benchmark.sequential          1000  avgt  200   0.470 ±  0.008  ms/op

Benchmark.parallel           10000  avgt  200   3.731 ±  0.047  ms/op
Benchmark.sequential         10000  avgt  200  12.425 ±  0.336  ms/op

Conclusion

We saw how to implement a thread-safe Hamming(7,4) encoder using Kotlin and what parallelization can potentially give us.

In the second part of the article, we’ll implement a Hamming decoder and see how we can correct single-bit errors and detect double-bit ones.

Code snippets can be found on GitHub.


Clojure web development – state of the art

I’ve been getting familiar with Clojure for more than a year now, and the more I dive into it, the more it becomes the language for me. Once you defeat the “parentheses fear”, everything else makes the difference: tooling, community, good engineering practices. So now it’s time for me to convince others. In this post, I’ll try to walk through a simple web application built from scratch to show the key tools and libraries used to develop with Clojure in late 2015.

Note for Clojurians: This material is rather elementary and may be useful if you already know some Clojure but have never built anything bigger than a hello world application.

Note for Java developers: This material shows how to replace Spring, Angular, grunt, live-reload with a bunch of Clojure tools and libraries and a bit of code.

The repo with final code and individual steps is here.

Bootstrap

I think everyone agrees that component is the industry standard for managing the lifecycle of Clojure applications. If you are a Java developer, you may think of it as a Spring (DI) replacement: you declare dependencies between “components”, which are resolved on “system” startup. So you just say “my component needs a repository/database pool” and the component library “injects” it for you.

To keep things simple, I like to start with the duct web app template. It’s a nice starter component application following the 12-factor philosophy. So let’s start with it:

lein new duct clojure-web-app +example

The +example parameter tells duct to create an example endpoint with HTTP routes, which will be helpful. To finish bootstrapping, run lein setup inside the clojure-web-app directory.

OK, let’s dive into the code. Component- and injection-related code lives in the system.clj file:

(defn new-system [config]
  (let [config (meta-merge base-config config)]
    (-> (component/system-map
         :app  (handler-component (:app config))
         :http (jetty-server (:http config))
         :example (endpoint-component example-endpoint))
        (component/system-using
         {:http [:app]
          :app  [:example]
          :example []}))))

In the first section, you instantiate the components without their dependencies, which are then resolved in the second section. So in this example, the “http” component (server) requires “app” (the application abstraction), which in turn is injected with “example” (the actual routes). If your component needs others, you can simply get them by name (precisely: by Clojure keyword).

To start the system, you must fire up a REPL, an interactive environment running within the context of your application:

lein repl

When you see the prompt, type (go). The application should start, and you can visit http://localhost:3000 to see an example page.

A huge benefit of the component approach is that you get a fully reloadable application. When you change literally anything (configuration, endpoints, implementation), you can just type (reset) in the REPL and your application is up to date with the code. It’s a feature of the language; no JRebel or Spring Loaded needed.

Adding REST endpoint

OK, in the next step let’s add a basic REST endpoint returning JSON. We need to add two dependencies to the project.clj file:

:dependencies
 ...
  [ring/ring-json "0.3.1"]
  [cheshire "5.1.1"]

Ring-json adds JSON support to your routes (in Ring, this is called middleware), and cheshire is a Clojure JSON library (like Jackson in Java). Modifying project dependencies is one of the few tasks that require restarting the REPL, so hit CTRL-C and type lein repl again.

To configure the JSON middleware, we have to add wrap-json-body and wrap-json-response just before wrap-defaults in system.clj:

(:require 
 ...
 [ring.middleware.json :refer [wrap-json-body wrap-json-response]])

(def base-config
   {:app {:middleware [[wrap-not-found :not-found]
                      [wrap-json-body {:keywords? true}]
                      [wrap-json-response]
                      [wrap-defaults :defaults]]

And finally, in endpoint/example.clj, we add a route with a JSON response:

(:require 
 ...
 [ring.util.response :refer [response]]))

(defn example-endpoint [config]
  (routes
    (GET "/hello" [] (response {:hello "world"}))
    ...

Reload the app with (reset) in the REPL and test the new route with curl:

curl -v http://localhost:3000/hello

< HTTP/1.1 200 OK
< Date: Tue, 15 Sep 2015 21:17:37 GMT
< Content-Type: application/json; charset=utf-8
< Set-Cookie: ring-session=37c337fb-6bbc-4e65-a060-1997718d03e0;Path=/;HttpOnly
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Content-Length: 151
* Server Jetty(9.2.10.v20150310) is not blacklisted
< Server: Jetty(9.2.10.v20150310)
<
* Connection #0 to host localhost left intact
{"hello": "world"}

It works! In case of any problems, you can find a working version in this commit.

Adding frontend with figwheel

Coding the backend in Clojure is great, but what about the frontend? As you may already know, Clojure can be compiled not only to JVM bytecode, but also to JavaScript. This may sound familiar if you’ve used, e.g., CoffeeScript. But ClojureScript’s philosophy is not only to provide some syntactic sugar, but also to improve your development cycle with great tooling and fully interactive development. Let’s see how to achieve that.

The best way to introduce ClojureScript to a project is Figwheel. First, let’s add the Figwheel plugin and configuration to project.clj:

:plugins
   ...
   [lein-figwheel "0.3.9"]

And the cljsbuild configuration:

:cljsbuild
    {:builds [{:id "dev"
               :source-paths ["src-cljs"]
               :figwheel true
               :compiler {:main       "clojure-web-app.core"
                          :asset-path "js/out"
                          :output-to  "resources/public/js/clojure-web-app.js"
                          :output-dir "resources/public/js/out"}}]}

In short, this tells the ClojureScript compiler to take sources from src-cljs, with Figwheel support, and put the resulting JavaScript into the resources/public/js/clojure-web-app.js file. So we need to include this file in a simple HTML page:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
  <div id="main">
  </div>
  <script src="js/clojure-web-app.js" type="text/javascript"></script>
</body>
</html>

To serve this static file (saved as resources/public/index.html), we need to change some defaults and add a corresponding route. In system.clj, change api-defaults to site-defaults both in the require section and in base-config. In example.clj, add the following route:

;; assumes clojure.java.io is required as io in the ns declaration
(GET "/" [] (io/resource "public/index.html"))

Again, running (reset) in the REPL should reload everything.

But where is our ClojureScript source file? Let’s create a core.cljs file in the src-cljs/clojure_web_app directory (hyphens in namespace names become underscores in file paths):

(ns ^:figwheel-always clojure-web-app.core)

(enable-console-print!)

(println "hello from clojurescript")

Open another terminal and run lein figwheel. It should compile the ClojureScript and print ‘Prompt will show when figwheel connects to your application’. Open http://localhost:3000. The Figwheel window should then show a prompt:

To quit, type: :cljs/quit
cljs.user=>

Type (js/alert "hello"). Boom! If everything worked, you should see an alert in your browser. Open the developer console in your browser; you should see hello from clojurescript printed there. Change it in core.cljs to (println "figwheel rocks") and save the file. Without reloading the page, you should see the updated message. Figwheel rocks! Again, in case of any problems, refer to this commit.

In the next post, I’ll show how to fetch data from MongoDB, serve it with REST to the browser, and write React/Om components to render it. Stay tuned!