Hamming Error Correction with Kotlin – part 1

Hamming code is one of the Computer Science/Telecommunication classics.

In this article, we’ll revisit the topic and implement a stateless Hamming(7,4) encoder using Kotlin.

Hamming Error Correction

Our communication channels and data storage are error-prone: bits can flip due to electric or magnetic interference, background radiation, or simply because of low-quality materials.

Since the neutron flux is ~300 times higher at around 10 km altitude, particular attention is necessary when dealing with systems operating at high altitudes. The case study of Cassini-Huygens proves it: in space, the number of reported errors was over four times higher than on Earth, hence the need for efficient error correction.

Richard Hamming’s code is one of the solutions to the problem. It’s a perfect code (at least, according to Hamming’s definition) which can expose and correct errors in transmitted messages.

Simply put, it adds metadata to the message (in the form of parity bits) that can be used for validation and correction of errors in messages.

A Brief Explanation

You might have already wondered what (7,4) in “Hamming(7,4)” means.

Simply put, N and M in “Hamming(N, M)” represent the block length and the message size. So (7,4) means that it encodes four bits into seven bits by adding three parity bits. As simple as that.

This particular version can detect and correct single-bit errors. It can also detect (but not correct) double-bit errors, though not both at once: a decoder that corrects single-bit errors will mistake a double-bit error for a single-bit one. Telling the two apart requires an extra overall parity bit, as in the extended Hamming(8,4) code.

In a Hamming codeword, parity bits always occupy the indexes that are powers of two (if we use 1-based indexing).

So, if our initial message is 1111, the codeword will look somewhat like [][]1[]111 – with three parity bits for us to fill in.
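As a quick illustration of this layout, the sketch below lists which 1-based positions hold parity bits and which hold data for a codeword of a given length. The names parityPositions and dataPositions are made up for this example and are not part of the article's code.

```kotlin
// Hypothetical helpers for illustration only; all positions are 1-based.

// Parity bits sit at positions 1, 2, 4, 8, ... up to the codeword length.
fun parityPositions(codewordLength: Int): List<Int> =
    generateSequence(1) { it * 2 }
        .takeWhile { it <= codewordLength }
        .toList()

// Every remaining position (not a power of two) carries a data bit.
fun dataPositions(codewordLength: Int): List<Int> =
    (1..codewordLength).filter { (it and (it - 1)) != 0 }
```

For a seven-bit codeword this gives parity positions 1, 2, 4 and data positions 3, 5, 6, 7, which matches the [][]1[]111 layout above.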

To calculate the parity bit at position n (a power of two), we start at its position in the codeword, take n elements, skip n elements, take n elements, skip n elements, and so on. The parity bit itself is left out of the count. If the number of ones among the taken elements is odd, we set the parity bit to one, otherwise to zero.

In our case:

  • For the first parity bit, we check indexes 1,3,5,7 -> (1)()1()111
  • For the second parity bit, we check indexes 2,3,6,7 -> (1)(1)1()111
  • For the third parity bit, we check indexes 4,5,6,7 -> (1)(1)1(1)111

And that’s all – the codeword is 1111111.
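The three checks above can be written down directly for the fixed (7,4) case. This is only a sketch for a four-bit message, not the general encoder built later in the article, and hamming74 is a name invented for this example.

```kotlin
// Hypothetical helper for illustration: encodes exactly four data bits.
// Data bits d1..d4 land on 1-based positions 3, 5, 6 and 7.
fun hamming74(data: List<Int>): List<Int> {
    require(data.size == 4 && data.all { it == 0 || it == 1 }) { "expected four bits" }
    val (d1, d2, d3, d4) = data
    val p1 = d1 xor d2 xor d4   // covers positions 3, 5, 7
    val p2 = d1 xor d3 xor d4   // covers positions 3, 6, 7
    val p4 = d2 xor d3 xor d4   // covers positions 5, 6, 7
    // Codeword layout: p1 p2 d1 p4 d2 d3 d4
    return listOf(p1, p2, d1, p4, d2, d3, d4)
}
```

Calling hamming74(listOf(1, 1, 1, 1)) yields seven ones, as in the walkthrough, while hamming74(listOf(1, 0, 1, 1)) yields 0110011.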

It might be tempting to think that every message containing only ones will be encoded to a codeword containing only ones, but that’s not the case. For example, 11 is encoded to 01111, because the first parity bit covers an even number of ones. On the other hand, every message containing only zeros will always be encoded to zeros exclusively.

Encoding

First things first, we can leverage Type Driven Development to make our lives easier when working with Strings representing raw and encoded messages:

data class EncodedString(val value: String)

data class BinaryString(val value: String)

Using this approach, it’ll be slightly harder to mix them up.

We’ll need a method for calculating the encoded codeword size for a given message. In this case, we simply find the lowest number of parity bits that can cover the given message:

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
  .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength
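To get a feel for the inequality, the sketch below splits the computation into two steps and can be used to try a few message lengths. The function codewordSize behaves like the one above; parityBits is a helper name introduced only for this example.

```kotlin
// Hypothetical helper: the smallest r (starting at 2) with msgLength + r + 1 <= 2^r.
fun parityBits(msgLength: Int): Int =
    generateSequence(2) { it + 1 }
        .first { r -> msgLength + r + 1 <= (1 shl r) }

// Same result as the article's codewordSize: message bits plus parity bits.
fun codewordSize(msgLength: Int): Int = msgLength + parityBits(msgLength)
```

A 4-bit message needs 3 parity bits (codeword size 7), a 7-bit message needs 4 (size 11), and an 11-bit message still fits into 15 bits, while a 12-bit message already needs a fifth parity bit (size 17).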

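Next, we’ll need a method for calculating parity and data bits at given indexes for a given message. Note that the code uses 0-based indexes, so position p from the explanation above corresponds to index p - 1 here: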

fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
  parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
    .map { getDataBit(it, msg).toInt() }
    .reduce { a, b -> a xor b }
    .toString()

fun getDataBit(ind: Int, input: BinaryString) = input
  .value[ind - Integer.toBinaryString(ind).length].toString()

Where parityIndicesSequence() is defined as:

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
  .take(endEx - start)
  .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }
  .drop(1) // ignore the parity bit
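To see what this sequence produces, it can help to compare it with the well-known bitmask formulation: a 1-based position is covered by the parity bit at position p exactly when the two share the bit p. The sketch below repeats parityIndicesSequence from above and adds coveredByBitmask, a hypothetical helper written only for this comparison. Both work on 0-based indexes.

```kotlin
// Copied from the article: take/skip in blocks of (start + 1), then drop the parity bit itself.
fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }
    .drop(1) // ignore the parity bit

// Hypothetical alternative: index i is covered when positions (i + 1) and (start + 1) share a bit.
fun coveredByBitmask(start: Int, endEx: Int): List<Int> =
    (start + 1 until endEx).filter { ((it + 1) and (start + 1)) != 0 }
```

For a seven-bit codeword, the three parity indexes 0, 1 and 3 yield [2, 4, 6], [2, 5, 6] and [4, 5, 6], which are the 1-based positions 3,5,7, then 3,6,7, then 5,6,7 from the earlier walkthrough.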

Now, we can put it all together to form the actual solution, which is simply going through the whole codeword and filling it with parity bits and actual data:

override fun encode(input: BinaryString): EncodedString {
    fun toHammingCodeValue(it: Int, input: BinaryString) =
      when ((it + 1).isPowerOfTwo()) {
          true -> hammingHelper.getParityBit(it, input)
          false -> hammingHelper.getDataBit(it, input)
      }

    return hammingHelper.getHammingCodewordIndices(input.value.length)
      .map { toHammingCodeValue(it, input) }
      .joinToString("")
      .let(::EncodedString)
}

Note that isPowerOfTwo() is our custom extension function and is not available out-of-the-box in Kotlin:

internal fun Int.isPowerOfTwo() = this != 0 && (this and (this - 1)) == 0
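The snippets above depend on a hammingHelper object that is not shown in the article. A minimal, self-contained sketch that wires the pieces together could look as follows. It assumes that getHammingCodewordIndices simply yields the 0-based indexes from 0 up to the codeword size, as the first lines of the inlined version in the next section suggest. The top-level layout (no helper class, no override) is a simplification made for this example.

```kotlin
data class EncodedString(val value: String)
data class BinaryString(val value: String)

fun Int.isPowerOfTwo() = this != 0 && (this and (this - 1)) == 0

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
    .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength

// Assumption: plain 0-based codeword indexes, 0 until codewordSize.
fun getHammingCodewordIndices(msgLength: Int) = generateSequence(0) { it + 1 }
    .take(codewordSize(msgLength))

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }
    .drop(1) // ignore the parity bit

// Maps a 0-based codeword index of a data slot to the matching message character.
fun getDataBit(ind: Int, input: BinaryString) = input
    .value[ind - Integer.toBinaryString(ind).length].toString()

// XORs all data bits covered by the parity bit at the given 0-based index.
fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
    parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
        .map { getDataBit(it, msg).toInt() }
        .reduce { a, b -> a xor b }
        .toString()

fun encode(input: BinaryString): EncodedString =
    getHammingCodewordIndices(input.value.length)
        .map { if ((it + 1).isPowerOfTwo()) getParityBit(it, input) else getDataBit(it, input) }
        .joinToString("")
        .let(::EncodedString)
```

With this in place, encode(BinaryString("1111")) returns EncodedString("1111111"), the codeword from the walkthrough. Input validation (an empty message, characters other than 0 and 1) is left out of the sketch.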

Inlined

The interesting thing is that the whole computation can be inlined to a single Goliath sequence:

override fun encode(input: BinaryString) = generateSequence(0) { it + 1 }
  .take(generateSequence(2) { it + 1 }
    .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length)
  .map {
      when ((it + 1).isPowerOfTwo()) {
          true -> generateSequence(it) { it + 1 }
            .take(generateSequence(2) { it + 1 }
              .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length - it)
            .filterIndexed { i, _ -> i % ((2 * (it + 1))) < it + 1 }
            .drop(1)
            .map {
                input
                  .value[it - Integer.toBinaryString(it).length].toString().toInt()
            }
            .reduce { a, b -> a xor b }
            .toString()
          false -> input
            .value[it - Integer.toBinaryString(it).length].toString()
      }
  }
  .joinToString("")
  .let(::EncodedString)

Not the most readable version, but interesting to look at.

In Action

We can verify that the implementation works as expected by leveraging JUnit 5 and parameterized tests:

@ParameterizedTest(name = "{0} should be encoded to {1}")
@CsvSource(
  "1,111",
  "01,10011",
  "11,01111",
  "1001000,00110010000",
  "1100001,10111001001",
  "1101101,11101010101",
  "1101001,01101011001",
  "1101110,01101010110",
  "1100111,01111001111",
  "0100000,10011000000",
  "1100011,11111000011",
  "1101111,10101011111",
  "1100100,11111001100",
  "1100101,00111000101",
  "10011010,011100101010")
fun shouldEncode(first: String, second: String) {
    assertThat(sut.encode(BinaryString(first)))
      .isEqualTo(EncodedString(second))
}

… and by using a home-made property test:

@Test
@DisplayName("should always encode zeros to zeros")
fun shouldEncodeZeros() {
    generateSequence("0") { it + "0" }
      .take(1000)
      .map { sut.encode(BinaryString(it)).value }
      .forEach {
          assertThat(it).doesNotContain("1")
      }
}

Going Parallel

The most important property of this implementation is statelessness. It was achieved by using only pure functions and avoiding shared mutable state: all necessary data is always passed explicitly as input parameters and never held in any form of internal state.

Unfortunately, this results in some repetition and performance overhead that could have been avoided if we were just modifying one mutable list and passing it around. On the other hand, we can now use our resources more wisely by parallelizing the whole operation, which should result in a performance improvement.

Without running the code, that’s just wishful thinking, so let’s do that.

We can parallelize the operation (naively) using Java 8’s parallel streams:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().parallelStream()
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)

To avoid giving the sequential implementation an unfair advantage (no toList() conversion so far), we’ll need to change the implementation slightly:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().stream() // to be fair.
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)
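A side note on why the reduce("") { t, u -> t + u } step stays correct when run in parallel: string concatenation is associative and the empty string is its identity, which is what Stream.reduce requires, and an ordered stream keeps the encounter order. The sketch below checks this on a stand-in function. bitAt is a hypothetical replacement for toHammingCodeValue, and the Collectors.joining() variant is shown only as a possible alternative, not as something the article uses.

```kotlin
import java.util.stream.Collectors
import java.util.stream.IntStream

// Hypothetical stand-in for toHammingCodeValue: any pure index-to-string function works here.
fun bitAt(i: Int): String = (i % 2).toString()

fun joinSequential(n: Int): String =
    IntStream.range(0, n).boxed()
        .map { bitAt(it) }
        .reduce("") { t, u -> t + u }

// Same pipeline, but split across the common fork-join pool.
fun joinParallel(n: Int): String =
    IntStream.range(0, n).boxed().parallel()
        .map { bitAt(it) }
        .reduce("") { t, u -> t + u }

// Alternative: let a collector build the result instead of pairwise concatenation.
fun joinParallelCollected(n: Int): String =
    IntStream.range(0, n).parallel()
        .mapToObj { bitAt(it) }
        .collect(Collectors.joining())
```

All three functions return the same string for the same n. Whether the collector variant is actually faster would have to be measured.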

And now, we can perform some benchmarking using JMH (message.size == 10_000):

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.parallel":
 3.690 ±(99.9%) 0.018 ms/op [Average]
 (min, avg, max) = (3.524, 3.690, 3.974), stdev = 0.076
 CI (99.9%): [3.672, 3.708] (assumes normal distribution)

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.sequential":
  10.877 ±(99.9%) 0.097 ms/op [Average]
  (min, avg, max) = (10.482, 10.877, 13.498), stdev = 0.410
  CI (99.9%): [10.780, 10.974] (assumes normal distribution)


# Run complete. Total time: 00:15:14

Benchmark                   Mode  Cnt   Score   Error  Units
SimpleBenchmark.parallel    avgt  200   3.690 ± 0.018  ms/op
SimpleBenchmark.sequential  avgt  200  10.877 ± 0.097  ms/op

As we can see, there is a major performance improvement in favor of the parallelized implementation. Of course, results might change drastically because of various factors, so do not think that we’ve found a silver bullet. They do not exist.

For example, here are the results for encoding a very short message (message.size == 10):

Benchmark                   Mode Cnt Score   Error Units
SimpleBenchmark.parallel    avgt 200 0.024 ± 0.001 ms/op
SimpleBenchmark.sequential  avgt 200 0.003 ± 0.001 ms/op

In this case, the overhead of splitting the operation among multiple threads makes the parallelized implementation eight times slower.

Here’s the full table for reference:

Benchmark            (messageSize) Mode Cnt Score   Error    Units
Benchmark.parallel   10            avgt 200 0.022   ± 0.001  ms/op
Benchmark.sequential 10            avgt 200 0.003   ± 0.001  ms/op
 
Benchmark.parallel   100           avgt 200 0.038   ± 0.001  ms/op
Benchmark.sequential 100           avgt 200 0.031   ± 0.001  ms/op

Benchmark.parallel   1000          avgt 200 0.273   ± 0.011  ms/op 
Benchmark.sequential 1000          avgt 200 0.470   ± 0.008  ms/op

Benchmark.parallel   10000         avgt 200 3.731   ± 0.047  ms/op
Benchmark.sequential 10000         avgt 200 12.425  ± 0.336  ms/op

Conclusion

We saw how to implement a thread-safe Hamming(7,4) encoder using Kotlin and what parallelization can potentially give us.

In the second part of the article, we’ll implement a Hamming decoder and see how we can correct single-bit errors and detect double-bit ones.

Code snippets can be found on GitHub.

You May Also Like

CasperJS for Java developers

Why CasperJS

Being a Java developer is kinda hard these days. Java may not be dead yet, but keeping in sync with all the hipster JavaScript frameworks can make us feel a bit outside the playground. It’s even hard to list JavaScript frameworks with their latest releases on one website.

In my current project, we are using AngularJS. It’s a nice abstraction of the MV* pattern in the frontend layer of any web application (we use Grails underneath). Here is a nice article with an 8-point Win List of the Angular way of handling AJAX calls and updating the view. So it’s not only a fun new framework but a real help in keeping your code clean and neat.

But there is also another area where you can put a helpful JS framework in place of a plain-old-Java one: functional tests. Especially when you are dealing with a single-page app with lots of asynchronous REST/JSON communication.

Selenium and Geb

In a Java/JVM project, the typical choice is to use Selenium with some wrapper like Geb. So you start your project, set up your CI functional testing pipeline and… after one month of coding your tests stop working and become unmaintainable. The frameworks themselves are not bad, but the typical setup is so heavy and has so many points of failure that keeping it working in a real-life project is really hard.

Here is my list of common myths about Selenium:

  • It allows you to record test scripts via a handy GUI - maybe for some static request/response sites. In modern web applications with asynchronous REST/JSON communication your tests must contain a lot of “waitFor” statements and you cannot automate where these should be included.
  • It allows you to test your web app against many browsers - don’t try to automate IE tests! You have to manually open your app in IE to see how it actually behaves!
  • It integrates well with continuous integration servers like Jenkins - you have to set up Selenium Grid on a server with X installed to run tests on Chrome or Firefox, and a Windows server for IE. And the headless HtmlUnit driver lacks a lot of JS support.

So I decided to try something different and introduce a bit of JavaScript tooling in our project by using CasperJS.

Introduction

CasperJS is a simple but powerful navigation scripting & testing utility for PhantomJS, a scriptable headless WebKit (the rendering engine used by Safari and Chrome). In short, CasperJS allows you to navigate and make assertions about web pages as if they had been rendered in Google Chrome. It is enough for me to automate the functional tests of my application.

If you want a gentle introduction to the world of CasperJS, I suggest reading:

  • Official website, especially the installation guide and API
  • Introductory article from CasperJS creator Nicolas Perriault
  • Highlevel testing with CasperJS by Kevin van Zonneveld
  • grails-angular-scaffolding plugin by Rob Fletcher with some working CasperJS tests

Full example

I run my test suite via the following script:

casperjs test --direct --log-level=debug --testhost=localhost:8080 --includes=test/casper/includes/casper-angular.coffee,test/casper/includes/pages.coffee test/casper/specs/

casper-angular.coffee:

casper.test.on "fail", (failure) ->
    casper.capture(screenshot)

testhost   = casper.cli.get "testhost"
screenshot = 'test-fail.png'

casper
    .log("Using testhost: #{testhost}", "info")
    .log("Using screenshot: #{screenshot}", "info")

casper.waitUntilVisible = (selector, message, callback) ->
    @waitFor ->
        @visible selector
    , callback, (timeout) ->
        @log("Selector [#{selector}] not visible, failing")
        withParentSelector selector, (parent) ->
            casper.log("Output of parent selector [#{parent}]")
            casper.debugHTML(parent)
        @echo message, "RED_BAR"
        @capture(screenshot)
        @test.fail(f("Wait timeout occurred (%dms)", timeout))

withParentSelector = (selector, callback) ->
    if selector.lastIndexOf(" ") > 0
       parent = selector[0..selector.lastIndexOf(" ")-1]
       callback(parent)

Sample pages.coffee:

x = require('casper').selectXPath

class EditDocumentPage

    assertAt: ->
        casper.test.assertSelectorExists("div.customerAccountInfo", 'at EditDocumentPage')

    templatesTreeFirstCategory: 'ul.tree li label'
    templatesTreeFirstTemplate: 'ul.tree li a'
    closePreview: '.closePreview a'
    smallPreview: '.smallPreviewContent img'
    bigPreview: 'img.previewImage'
    confirmDelete: x("//div[@class='modal-footer']/a[1]")

casper.editDocument = new EditDocumentPage()

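And a test script:

# Parentheses are needed here: without them, CoffeeScript would parse the whole
# `"testhost" or 'localhost:8080'` expression as the argument to cli.get.
testhost = casper.cli.get("testhost") or 'localhost:8080'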

casper.start "http://#{testhost}/app", ->
    @test.assertHttpStatus 302
    @test.assertUrlMatch /\/fakeLogin/, 'auto login'
    @test.assert @visible('input#Create'), 'mock login button'
    @click 'input#Create'

casper.then ->
    @test.assertUrlMatch /document#\/edit/, 'new document'
    @editDocument.assertAt()
    @waitUntilVisible @editDocument.templatesTreeFirstCategory, 'template categories not visible', ->
        @click @editDocument.templatesTreeFirstCategory
        @waitUntilVisible @editDocument.templatesTreeFirstTemplate, 'template not visible', ->
            @click @editDocument.templatesTreeFirstTemplate

casper.then ->
    @waitUntilVisible @editDocument.smallPreview, 'small preview not visible', ->
        # could be dblclick / whatever
        @mouseEvent('click', @editDocument.smallPreview)

casper.then ->
    @waitUntilVisible @editDocument.bigPreview, 'big preview should be visible', ->
        @test.assertEvalEquals ->
            $('.pageCounter').text()
        , '1/1', 'page counter should be visible'
        @click @editDocument.closePreview

casper.then ->
    @click 'button.cancel'
    @waitUntilVisible '.modal-footer', 'delete confirmation not visible', ->
        @click @editDocument.confirmDelete

casper.run ->
    @test.done()

Here is a list of CasperJS features/caveats used here:

  • Using CoffeeScript is a huge win for keeping your test code neat.
  • When using the casper test command, beware that the logging setup differs from the one described in the articles above. You can pass --direct --log-level=debug from the command line for best results. Logging is essential here since Phantom often exits without any error and you do want to know what just happened.
  • Extract your helper code into separate files and include them by using the --includes switch.
  • When passing the server URL as a command-line switch, remember that in CoffeeScript variables are not visible between multiple source files (unless you access them via the window object).
  • It’s good to override the standard waitUntilVisible with a version that captures a screenshot and makes a proper log statement. In my version I also look for a parent selector and debugHTML its content - great for debugging what is actually rendered by the browser.
  • Selenium and Geb have a nice concept of Page Objects - abstract models of pages rendered by your application. Using CoffeeScript you can write your own classes, bind selectors to properties and use them in your test script. Assigning the objects to the casper instance will end up with quite nice syntax like @editDocument.assertAt().
  • There is some issue with CSS :first and :last selectors. I cannot get them working (but maybe I’m doing something wrong?). But in CasperJS you can also use XPath selectors which are fine for matching the n-th child of some element (x("//div[@class='modal-footer']/a[1]")).
    Update: :first and :last are not CSS3 selectors, but jQuery ones. Here is a list of CSS3 selectors, all of which are supported by CasperJS. So you can use nth-child(1) in this case. Thanks Andy and Nicolas for the comments!

Working with CasperJS can stall you for a few hours, but after getting things working you have a new, cool tool in your box!