Hamming Error Correction with Kotlin – part 1

The Hamming code is one of the classics of computer science and telecommunications.

In this article, we’ll revisit the topic and implement a stateless Hamming(7,4) encoder using Kotlin.

Hamming Error Correction

Our communication channels and data storage are error-prone: bits can flip due to electric or magnetic interference, background radiation, or simply because of low-quality materials.

Since the neutron flux is roughly 300 times higher at an altitude of around 10 km, particular attention is necessary when dealing with systems operating at high altitudes. The Cassini-Huygens case study illustrates this: in space, the number of reported errors was over four times higher than on Earth, hence the need for efficient error correction.

Richard Hamming’s code is one of the solutions to the problem. It’s a perfect code (at least, according to Hamming’s definition) which can expose and correct errors in transmitted messages.

Simply put, it adds metadata to the message (in the form of parity bits) that can be used for validation and correction of errors in messages.

A Brief Explanation

I bet you’ve already wondered what (7,4) in “Hamming(7,4)” means.

N and M in “Hamming(N, M)” represent the block length and the message size. So, (7,4) means that it encodes four bits into seven bits by adding three parity bits. As simple as that.

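This particular version can detect and correct single-bit errors. It can also detect double-bit errors, but only when it is used purely for detection: once single-bit correction is attempted, a double-bit error is indistinguishable from a single-bit error at a different position.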

In a Hamming codeword, parity bits always occupy the indexes that are powers of two (if we use 1-based indexing).

So, if our initial message is 1111, the codeword will look somewhat like [][]1[]111 – with three parity bits for us to fill in.

To calculate the parity bit at position n (a power of two), we start at that position in the codeword, take n elements, skip n elements, take n elements, skip n elements… and so on. The parity bit’s own slot is still empty at this point, so only data bits are counted: if the number of ones among the taken bits is odd, we set the parity bit to one, otherwise to zero.

In our case:

  • For the first parity bit, we check indexes 1, 3, 5, 7 -> (1)()1()111
  • For the second parity bit, we check indexes 2, 3, 6, 7 -> (1)(1)1()111
  • For the third parity bit, we check indexes 4, 5, 6, 7 -> (1)(1)1(1)111

And that’s all – the codeword is 1111111.
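
The three checks above can be written out directly for the fixed (7,4) case. This is only a minimal sketch for illustration: `encode74` is a hypothetical helper that is not part of the article’s code, and it hard-codes the positions instead of deriving them.

```kotlin
// Minimal Hamming(7,4) sketch; `encode74` is a hypothetical name used only for this example.
fun encode74(data: String): String {
    require(data.length == 4 && data.all { it == '0' || it == '1' }) { "expected four bits" }
    val d = data.map { it - '0' }      // d[0..3] sit at 1-based positions 3, 5, 6, 7
    val p1 = d[0] xor d[1] xor d[3]    // parity at position 1 covers positions 3, 5, 7
    val p2 = d[0] xor d[2] xor d[3]    // parity at position 2 covers positions 3, 6, 7
    val p4 = d[1] xor d[2] xor d[3]    // parity at position 4 covers positions 5, 6, 7
    return listOf(p1, p2, d[0], p4, d[1], d[2], d[3]).joinToString("")
}

fun main() {
    println(encode74("1111")) // 1111111
    println(encode74("1011")) // 0110011
}
```

XOR-ing the covered bits is the same as checking whether the number of ones among them is odd.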

In this case, it might be tempting to think that every message containing only ones will be encoded to a codeword comprising only ones, but that’s not the case. However, every message containing only zeros will always be encoded to zeros exclusively.

Encoding

First things first, we can leverage Type Driven Development to make our lives easier when working with Strings representing raw and encoded messages:

data class EncodedString(val value: String)

data class BinaryString(val value: String)

Using this approach, it’ll be slightly harder to mix them up.

We’ll need a method for calculating the encoded codeword size for a given message. In this case, we simply find the lowest number of parity bits that can cover the given message:

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
  .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength
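
The condition in `codewordSize` follows from the fact that r parity bits can address at most 2^r - 1 codeword positions, so the message length m must satisfy m + r <= 2^r - 1, which is the same as m + r + 1 <= 2^r. A small usage sketch (the function is repeated so that the example is self-contained, and the chosen message lengths are arbitrary):

```kotlin
// Same function as above, repeated to keep the example self-contained.
fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
    .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength

fun main() {
    for (m in listOf(1, 4, 7, 8, 11, 12)) {
        println("$m data bits -> ${codewordSize(m)} codeword bits")
    }
    // 1 data bits -> 3 codeword bits
    // 4 data bits -> 7 codeword bits
    // 7 data bits -> 11 codeword bits
    // 8 data bits -> 12 codeword bits
    // 11 data bits -> 15 codeword bits
    // 12 data bits -> 17 codeword bits
}
```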

Next, we’ll need a method for calculating parity and data bits at given indexes for a given message. Note that the code uses 0-based indexes, so the parity bits sit at indexes 0, 1, 3, 7, and so on:

fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
  parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
    .map { getDataBit(it, msg).toInt() }
    .reduce { a, b -> a xor b }
    .toString()

fun getDataBit(ind: Int, input: BinaryString) = input
  .value[ind - Integer.toBinaryString(ind).length].toString()

Where parityIndicesSequence() is defined as:

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
  .take(endEx - start)
  .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }
  .drop(1) // ignore the parity bit
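
To see how this lines up with the manual walkthrough, here is a short sketch that prints the covered indexes for a seven-bit codeword. Indexes are 0-based here, so they are shifted by one compared to the 1-based positions used in the explanation (the function is repeated to keep the example self-contained):

```kotlin
// Same function as above, repeated to keep the example self-contained.
fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % (2 * (start + 1)) < start + 1 }
    .drop(1) // ignore the parity bit itself

fun main() {
    // 0-based parity indexes 0, 1, 3 correspond to 1-based positions 1, 2, 4
    for (parityIndex in listOf(0, 1, 3)) {
        println("parity at $parityIndex covers ${parityIndicesSequence(parityIndex, 7).toList()}")
    }
    // parity at 0 covers [2, 4, 6]
    // parity at 1 covers [2, 5, 6]
    // parity at 3 covers [4, 5, 6]
}
```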

Now, we can put it all together to form the actual solution, which simply goes through the whole codeword and fills it with parity bits and actual data:

override fun encode(input: BinaryString): EncodedString {
    fun toHammingCodeValue(it: Int, input: BinaryString) =
      when ((it + 1).isPowerOfTwo()) {
          true -> hammingHelper.getParityBit(it, input)
          false -> hammingHelper.getDataBit(it, input)
      }

    return hammingHelper.getHammingCodewordIndices(input.value.length)
      .map { toHammingCodeValue(it, input) }
      .joinToString("")
      .let(::EncodedString)
}

Note that isPowerOfTwo() is our custom extension function and is not available out-of-the-box in Kotlin:

// `this > 0` also rules out Int.MIN_VALUE, which would otherwise pass the bit trick
internal fun Int.isPowerOfTwo() = this > 0 && (this and (this - 1)) == 0
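
For experimenting, the pieces can be assembled into one self-contained file. This is a sketch under assumptions: `hammingHelper.getHammingCodewordIndices` is not shown in the article, so it is assumed here to yield the 0-based indexes from 0 up to the codeword size (which is what the inlined version of the encoder does), and everything is declared as top-level functions instead of members of a helper class.

```kotlin
data class BinaryString(val value: String)
data class EncodedString(val value: String)

fun Int.isPowerOfTwo() = this > 0 && (this and (this - 1)) == 0

fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }
    .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength

// Assumed stand-in for hammingHelper.getHammingCodewordIndices (not shown in the article)
fun getHammingCodewordIndices(msgLength: Int) =
    generateSequence(0) { it + 1 }.take(codewordSize(msgLength))

fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }
    .take(endEx - start)
    .filterIndexed { i, _ -> i % (2 * (start + 1)) < start + 1 }
    .drop(1) // ignore the parity bit itself

// For a data index, the number of parity slots before it equals the bit length of the 0-based index
fun getDataBit(ind: Int, input: BinaryString) =
    input.value[ind - Integer.toBinaryString(ind).length].toString()

fun getParityBit(codeWordIndex: Int, msg: BinaryString) =
    parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))
        .map { getDataBit(it, msg).toInt() }
        .reduce { a, b -> a xor b }
        .toString()

fun encode(input: BinaryString): EncodedString =
    getHammingCodewordIndices(input.value.length)
        .map { if ((it + 1).isPowerOfTwo()) getParityBit(it, input) else getDataBit(it, input) }
        .joinToString("")
        .let(::EncodedString)

fun main() {
    println(encode(BinaryString("1111")).value) // 1111111
    println(encode(BinaryString("11")).value)   // 01111
    println(encode(BinaryString("0000")).value) // 0000000
}
```

The second call shows that a message of ones does not always encode to a codeword of ones.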

Inlined

The interesting thing is that the whole computation can be inlined to a single Goliath sequence:

override fun encode(input: BinaryString) = generateSequence(0) { it + 1 }
  .take(generateSequence(2) { it + 1 }
    .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length)
  .map {
      when ((it + 1).isPowerOfTwo()) {
          true -> generateSequence(it) { it + 1 }
            .take(generateSequence(2) { it + 1 }
              .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length - it)
            .filterIndexed { i, _ -> i % ((2 * (it + 1))) < it + 1 }
            .drop(1)
            .map {
                input
                  .value[it - Integer.toBinaryString(it).length].toString().toInt()
            }
            .reduce { a, b -> a xor b }
            .toString()
          false -> input
            .value[it - Integer.toBinaryString(it).length].toString()
      }
  }
  .joinToString("")
  .let(::EncodedString)

Not the most readable version, but interesting to look at.

In Action

We can verify that the implementation works as expected by leveraging JUnit5 and Parameterized Tests:

@ParameterizedTest(name = "{0} should be encoded to {1}")
@CsvSource(
  "1,111",
  "01,10011",
  "11,01111",
  "1001000,00110010000",
  "1100001,10111001001",
  "1101101,11101010101",
  "1101001,01101011001",
  "1101110,01101010110",
  "1100111,01111001111",
  "0100000,10011000000",
  "1100011,11111000011",
  "1101111,10101011111",
  "1100100,11111001100",
  "1100101,00111000101",
  "10011010,011100101010")
fun shouldEncode(first: String, second: String) {
    assertThat(sut.encode(BinaryString(first)))
      .isEqualTo(EncodedString(second))
}

… and by using a home-made property test:

@Test
@DisplayName("should always encode zeros to zeros")
fun shouldEncodeZeros() {
    generateSequence("0") { it + "0" }
      .take(1000)
      .map { sut.encode(BinaryString(it)).value }
      .forEach {
          assertThat(it).doesNotContain("1")
      }
}

Going Parallel

The most important property of this implementation is statelessness. We achieve it by using only pure functions and avoiding shared mutable state: all necessary data is always passed explicitly as input parameters and never held in any form of internal state.

Unfortunately, this results in some repetition and performance overhead that could have been avoided if we were just modifying one mutable list and passing it around. On the other hand, we can now use our resources more wisely by parallelizing the whole operation, which should result in a performance improvement.

Without running the code, that’s just wishful thinking, so let’s do that.

We can parallelize the operation (naively) using Java 8’s parallel streams:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().parallelStream()
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)

To not give the sequential implementation an unfair advantage (no toList() conversion so far), we’ll need to change the implementation slightly:

override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)
  .toList().stream() // to be fair.
  .map { toHammingCodeValue(it, input) }
  .reduce("") { t, u -> t + u }
  .let(::EncodedString)
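
Both variants rely on `reduce("") { t, u -> t + u }` producing the characters in their original order. That holds because string concatenation is associative and the empty string is its identity, so a parallel stream over an ordered list can combine partial results without reordering them. A minimal sketch that checks this; the helper names are hypothetical, and `Collectors.joining()` is shown only as an alternative, not as what the article’s benchmark used:

```kotlin
import java.util.stream.Collectors

// Hypothetical helpers for illustration; none of these names come from the article's code.
fun concatSequential(parts: List<String>): String =
    parts.stream().reduce("") { t, u -> t + u }

fun concatParallel(parts: List<String>): String =
    parts.parallelStream().reduce("") { t, u -> t + u }

// Alternative that avoids building many intermediate strings
fun joinParallel(parts: List<String>): String =
    parts.parallelStream().collect(Collectors.joining())

fun main() {
    val parts = (0 until 10_000).map { (it % 2).toString() }
    val expected = parts.joinToString("")
    println(concatSequential(parts) == expected) // true
    println(concatParallel(parts) == expected)   // true
    println(joinParallel(parts) == expected)     // true
}
```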

And now, we can perform some benchmarking using JMH (message.size == 10_000):

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.parallel":
 3.690 ±(99.9%) 0.018 ms/op [Average]
 (min, avg, max) = (3.524, 3.690, 3.974), stdev = 0.076
 CI (99.9%): [3.672, 3.708] (assumes normal distribution)

Result "com.pivovarit.hamming.benchmarks.SimpleBenchmark.sequential":
  10.877 ±(99.9%) 0.097 ms/op [Average]
  (min, avg, max) = (10.482, 10.877, 13.498), stdev = 0.410
  CI (99.9%): [10.780, 10.974] (assumes normal distribution)


# Run complete. Total time: 00:15:14

Benchmark                   Mode  Cnt   Score   Error  Units
SimpleBenchmark.parallel    avgt  200   3.690 ± 0.018  ms/op
SimpleBenchmark.sequential  avgt  200  10.877 ± 0.097  ms/op

As we can see, the parallelized implementation is noticeably faster for this input. Of course, results might change drastically because of various factors, so do not think that we’ve found a silver bullet: they do not exist.

For example, here are the results for encoding a very short message (message.size == 10):

Benchmark                   Mode Cnt Score   Error Units
SimpleBenchmark.parallel    avgt 200 0.024 ± 0.001 ms/op
SimpleBenchmark.sequential  avgt 200 0.003 ± 0.001 ms/op

In this case, the overhead of splitting the operation among multiple threads makes the parallelized implementation eight times slower (!).

Here’s the full table for reference:

Benchmark             (messageSize)  Mode  Cnt   Score    Error  Units
Benchmark.parallel               10  avgt  200   0.022 ±  0.001  ms/op
Benchmark.sequential             10  avgt  200   0.003 ±  0.001  ms/op

Benchmark.parallel              100  avgt  200   0.038 ±  0.001  ms/op
Benchmark.sequential            100  avgt  200   0.031 ±  0.001  ms/op

Benchmark.parallel             1000  avgt  200   0.273 ±  0.011  ms/op
Benchmark.sequential           1000  avgt  200   0.470 ±  0.008  ms/op

Benchmark.parallel            10000  avgt  200   3.731 ±  0.047  ms/op
Benchmark.sequential          10000  avgt  200  12.425 ±  0.336  ms/op

Conclusion

We saw how to implement a thread-safe Hamming(7,4) encoder using Kotlin and what parallelization can potentially give us.

In the second part of the article, we’ll implement a Hamming decoder and see how we can correct single-bit errors and detect double-bit ones.

Code snippets can be found on GitHub.
