{"id":13246,"date":"2017-10-17T09:32:02","date_gmt":"2017-10-17T07:32:02","guid":{"rendered":"https:\/\/touk.pl\/blog\/?p=13246"},"modified":"2017-10-17T09:32:02","modified_gmt":"2017-10-17T07:32:02","slug":"hamming-error-correction-with-kotlin-part-1","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2017\/10\/17\/hamming-error-correction-with-kotlin-part-1\/","title":{"rendered":"Hamming Error Correction with Kotlin &#8211; part 1"},"content":{"rendered":"<p>Hamming code is one of the Computer Science\/Telecommunication classics.<\/p>\n<p>In this article, we\u2019ll revisit the topic and implement a stateless Hamming(7,4) encoder using Kotlin.<\/p>\n<h2 id=\"hamming-error-correction\">Hamming Error Correction<\/h2>\n<p>Our communication channels and data storages <a href=\"http:\/\/www.cs.toronto.edu\/~bianca\/papers\/sigmetrics09.pdf\">are error-prone<\/a> \u2013 bits can flip due to various things like electric\/magnetic interferences, background radiation, or just because of the low quality of materials used.<\/p>\n<p>Since the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Neutron_flux\">neutron flux<\/a> is ~300 higher at around 10km altitude, a particular attention is necessary when dealing with systems operating at high altitudes \u2013 the case study of the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cassini\u2013Huygens\">Cassini-Huygens<\/a> proves it \u2013 in space, a number of reported errors was over four times bigger than on earth, hence the need for efficient error correction.<\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Richard_Hamming\">Richard Hamming<\/a>\u2018s Code is one of the solutions to the problem. It\u2019s a <em>perfect code<\/em> (at least, according to Hamming\u2019s definition) which can expose\u00a0and correct errors in transmitted messages.<\/p>\n<p>Simply put, it adds metadata to the message (in the form of parity bits) that can be used for validation and correction of errors in messages.<\/p>\n<h3 id=\"a-brief-explanation\">A Brief Explanation<\/h3>\n<p>I bet you already wondered what did (7,4) in \u201cHamming(7,4)\u201d mean.<\/p>\n<p>Simply put, N and M in \u201cHamming(N, M)\u201d represent\u00a0the block length and the message size \u2013 so, <strong>(7,4) means that it\u00a0encodes four bits into seven bits by adding three additional parity bits<\/strong> \u2013 as simple as that.<\/p>\n<p>This particular version can detect and correct single-bit errors, and detect (but not correct) double-bit errors.<\/p>\n<p>In the Hamming\u2019s codeword, parity bits always occupy all indexes that are powers of two (if we use 1-based-indexing).<\/p>\n<p>So, if our initial message is <em>1111,\u00a0<\/em>the codeword will look somewhat like <em>[][]1[]111<\/em> \u2013 with three parity bits\u00a0for us to fill in.<\/p>\n<p>If we want to calculate the n-th parity bit, we start on its position in a codeword, we take n elements, skip n elements, take n elements, skip n elements\u2026 and so on.\u00a0If the number of taken ones is odd, we set the parity bit to one, otherwise zero.<\/p>\n<p>In our case:<\/p>\n<ul>\n<li>For the first parity bit, we check indexes 1,3,5,7 \u00a0 \u00a0 \u00a0 -> <em>(1)()1()111<\/em><\/li>\n<li>For the second parity bit, we check indexes 2,3,6,7 ->\u00a0<em>(1)(1)1()111<\/em><\/li>\n<li>For the third parity bit, we check indexes 4,5,6,7 \u00a0 \u00a0 ->\u00a0<em>(1)(1)1(1)111<\/em><\/li>\n<\/ul>\n<p>And that\u2019s all \u2013 the codeword is <em>1111111<\/em>.<\/p>\n<p>In this case, it might be tempting to think that every sequence containing only ones will be encoded to another sequence comprising only ones\u2026 but that\u2019s not the case\u2026 but every message containing only zeros will always be encoded to zeros exclusively.<\/p>\n<h2 id=\"encoding\">Encoding<\/h2>\n<p>First things first, we can leverage Type Driven Development for making our life easier when working with Strings representing raw and encoded messages:<\/p>\n<pre>data class EncodedString(val value: String)\n\ndata class BinaryString(val value: String)<\/pre>\n<p>Using this approach, it\u2019ll be slightly harder to mix them up.<\/p>\n<p>We\u2019ll need a method for calculating the encoded codeword size for a given message. In this case, we simply find the lowest number of parity pairs that can cover the given message:<\/p>\n<pre>fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }\n  .first { r -> msgLength + r + 1 <= (1 shl r) } + msgLength<\/pre>\n<p>Next, we\u2019ll need a method for calculating parity and data bits at given indexes for a given message:<\/p>\n<pre>fun getParityBit(codeWordIndex: Int, msg: BinaryString) =\n  parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))\n    .map { getDataBit(it, msg).toInt() }\n    .reduce { a, b -> a xor b }\n    .toString()\n\nfun getDataBit(ind: Int, input: BinaryString) = input\n  .value[ind - Integer.toBinaryString(ind).length].toString()<\/pre>\n<p>Where <em>parityIndicesSequence()<\/em> is defined as:<\/p>\n<pre>fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }\n  .take(endEx - start)\n  .filterIndexed { i, _ -> i % ((2 * (start + 1))) < start + 1 }\n  .drop(1) \/\/ ignore the parity bit<\/pre>\n<p>Now, we can put it all together to form the actual solution, which simply is simply going through the whole codeword and filling it with parity bits and actual data:<\/p>\n<pre>override fun encode(input: BinaryString): EncodedString {\n    fun toHammingCodeValue(it: Int, input: BinaryString) =\n      when ((it + 1).isPowerOfTwo()) {\n          true -> hammingHelper.getParityBit(it, input)\n          false -> hammingHelper.getDataBit(it, input)\n      }\n\n    return hammingHelper.getHammingCodewordIndices(input.value.length)\n      .map { toHammingCodeValue(it, input) }\n      .joinToString(\"\")\n      .let(::EncodedString)\n}<\/pre>\n<p>Note that <em>isPowerOfTwo()<\/em> is our custom extension function and is not available out-of-the-box in Kotlin:<\/p>\n<pre>internal fun Int.isPowerOfTwo() = this != 0 && this and this - 1 == 0<\/pre>\n<h4 id=\"inlined\">Inlined<\/h4>\n<p>The interesting thing is that the whole computation can be inlined to a single Goliath sequence:<\/p>\n<pre>override fun encode(input: BinaryString) = generateSequence(0) { it + 1 }\n  .take(generateSequence(2) { it + 1 }\n    .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length)\n  .map {\n      when ((it + 1).isPowerOfTwo()) {\n          true -> generateSequence(it) { it + 1 }\n            .take(generateSequence(2) { it + 1 }\n              .first { r -> input.value.length + r + 1 <= (1 shl r) } + input.value.length - it)\n            .filterIndexed { i, _ -> i % ((2 * (it + 1))) < it + 1 }\n            .drop(1)\n            .map {\n                input\n                  .value[it - Integer.toBinaryString(it).length].toString().toInt()\n            }\n            .reduce { a, b -> a xor b }\n            .toString()\n          false -> input\n            .value[it - Integer.toBinaryString(it).length].toString()\n      }\n  }\n  .joinToString(\"\")\n  .let(::EncodedString)<\/pre>\n<p>Not the most readable version, but interesting to have a look.<\/p>\n<h3 id=\"in-action\">In Action<\/h3>\n<p>We can verify that the implementation works as expected by leveraging JUnit5 and Parameterized Tests:<\/p>\n<pre>@ParameterizedTest(name = \"{0} should be encoded to {1}\")\n@CsvSource(\n  \"1,111\",\n  \"01,10011\",\n  \"11,01111\",\n  \"1001000,00110010000\",\n  \"1100001,10111001001\",\n  \"1101101,11101010101\",\n  \"1101001,01101011001\",\n  \"1101110,01101010110\",\n  \"1100111,01111001111\",\n  \"0100000,10011000000\",\n  \"1100011,11111000011\",\n  \"1101111,10101011111\",\n  \"1100100,11111001100\",\n  \"1100101,00111000101\",\n  \"10011010,011100101010\")\nfun shouldEncode(first: String, second: String) {\n    assertThat(sut.encode(BinaryString(first)))\n      .isEqualTo(EncodedString(second))\n}<\/pre>\n<p>\u2026 and by using a home-made property testing:<\/p>\n<pre>@Test\n@DisplayName(\"should always encode zeros to zeros\")\nfun shouldEncodeZeros() {\n    generateSequence(\"0\") { it + \"0\" }\n      .take(1000)\n      .map { sut.encode(BinaryString(it)).value }\n      .forEach {\n          assertThat(it).doesNotContain(\"1\")\n      }\n}<\/pre>\n<h2 id=\"going-parallel\"><span style=\"font-family: Bitter, Georgia, serif;font-size: 30px\">Going Parallel<\/span><\/h2>\n<p>The most important property of this implementation is statelessness \u2013 it could be achieved by making sure that we\u2019re using only pure functions and avoiding shared mutable state \u2013 all necessary data is always passed explicitly as input parameters and not held in any form of internal state.<\/p>\n<p>Unfortunately, it results in some repetition and performance overhead that could\u2019ve been avoided if we\u2019re just modifying one mutable list and passing it around\u2026 but now we can utilize our resources wiser by parallelizing the whole operation \u2013 which should result in a performance improvement.<\/p>\n<p>Without running the code that\u2019s just wishful thinking so let\u2019s do that.<\/p>\n<p>We can parallelize the operation (naively) using Java 8\u2019s parallel streams:<\/p>\n<pre>override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)\n  .toList().parallelStream()\n  .map { toHammingCodeValue(it, input) }\n  .reduce(\"\") { t, u -> t + u }\n  .let(::EncodedString)<\/pre>\n<p>To not give the sequential implementation an unfair advantage (no t<em>oList()<\/em> conversion so far), we\u2019ll need to change the implementation slightly:<\/p>\n<pre>override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)\n  .toList().stream() \/\/ to be fair.\n  .map { toHammingCodeValue(it, input) }\n  .reduce(\"\") { t, u -> t + u }\n  .let(::EncodedString)<\/pre>\n<p>And now, we can perform some benchmarking using JMH (message.size == 10_000):<\/p>\n<pre>Result \"com.pivovarit.hamming.benchmarks.SimpleBenchmark.parallel\":\n 3.690 \u00b1(99.9%) 0.018 ms\/op [Average]\n (min, avg, max) = (3.524, 3.690, 3.974), stdev = 0.076\n CI (99.9%): [3.672, 3.708] (assumes normal distribution)\n\nResult \"com.pivovarit.hamming.benchmarks.SimpleBenchmark.sequential\":\n  10.877 \u00b1(99.9%) 0.097 ms\/op [Average]\n  (min, avg, max) = (10.482, 10.877, 13.498), stdev = 0.410\n  CI (99.9%): [10.780, 10.974] (assumes normal distribution)\n\n\n# Run complete. Total time: 00:15:14\n\nBenchmark                   Mode  Cnt   Score   Error  Units\nSimpleBenchmark.parallel    avgt  200   <strong>3.690<\/strong> \u00b1 0.018  <strong>ms\/op<\/strong>\nSimpleBenchmark.sequential  avgt  200  <strong>10.877<\/strong> \u00b1 0.097  <strong>ms\/op<\/strong><\/pre>\n<p>As we can see, we can notice a major performance improvement in favor of the parallelized implementation \u2013 of course; results might drastically change because of various factors so do not think that we\u2019ve found a silver bullet \u2013 they do not exist.<\/p>\n<p>For example, here\u2019re the results for encoding a very short message (message.size == 10)):<\/p>\n<pre>Benchmark                   Mode Cnt Score   Error Units\nSimpleBenchmark.parallel    avgt 200 <strong>0.024<\/strong> \u00b1 0.001 <strong>ms\/op<\/strong>\nSimpleBenchmark.sequential  avgt 200 <strong>0.003<\/strong> \u00b1 0.001 <strong>ms\/op<\/strong><\/pre>\n<p>In this case, the overhead of splitting the operation among multiple threads makes the parallelized implementation perform eight times slower(sic!).<\/p>\n<p>Here\u2019s the full table for the reference:<\/p>\n<pre>Benchmark            (messageSize) Mode Cnt Score   Error    Units\nBenchmark.parallel   10            avgt 200 0.022   \u00b1 0.001  ms\/op\nBenchmark.sequential 10            avgt 200 0.003   \u00b1 0.001  ms\/op\n \nBenchmark.parallel   100           avgt 200 0.038   \u00b1 0.001  ms\/op\nBenchmark.sequential 100           avgt 200 0.031   \u00b1 0.001  ms\/op\n\nBenchmark.parallel   1000          avgt 200 0.273   \u00b1 0.011  ms\/op \nBenchmark.sequential 1000          avgt 200 0.470   \u00b1 0.008  ms\/op\n\nBenchmark.parallel   10000         avgt 200 3.731   \u00b1 0.047  ms\/op\nBenchmark.sequential 10000         avgt 200 12.425  \u00b1 0.336  ms\/op\n<\/pre>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>We saw how to implement a thread-safe <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hamming(7,4)\">Hamming(7,4)<\/a> encoder using Kotlin and what parallelization can potentially give us.<\/p>\n<p>In the second part of the article, we\u2019ll implement a Hamming decoder and see how we can correct single-bit errors and detect double-bit ones.<\/p>\n<p>Code snippets <a href=\"https:\/\/github.com\/pivovarit\/articles\/tree\/master\/kotlin-null-nonsafety\">can be found on GitHub.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Hamming code is one of the Computer Science\/Telecommunication classics. In this article, we\u2019ll revisit the topic and implement&hellip;\n","protected":false},"author":68,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[556],"class_list":["post-13246","post","type-post","status-publish","format-standard","category-development-design","tag-kotlin"],"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=13246"}],"version-history":[{"count":5,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246\/revisions"}],"predecessor-version":[{"id":13251,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246\/revisions\/13251"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=13246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=13246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=13246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}