{"id":13246,"date":"2017-10-17T09:32:02","date_gmt":"2017-10-17T07:32:02","guid":{"rendered":"https:\/\/touk.pl\/blog\/?p=13246"},"modified":"2017-10-17T09:32:02","modified_gmt":"2017-10-17T07:32:02","slug":"hamming-error-correction-with-kotlin-part-1","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2017\/10\/17\/hamming-error-correction-with-kotlin-part-1\/","title":{"rendered":"Hamming Error Correction with Kotlin &#8211; part 1"},"content":{"rendered":"<p>Hamming code is one of the Computer Science\/Telecommunication classics.<\/p>\n<p>In this article, we&#8217;ll revisit the topic and implement a stateless Hamming(7,4) encoder using Kotlin.<\/p>\n<h2 id=\"hamming-error-correction\">Hamming Error Correction<\/h2>\n<p>Our communication channels and data storages <a href=\"http:\/\/www.cs.toronto.edu\/~bianca\/papers\/sigmetrics09.pdf\">are error-prone<\/a> &#8211; bits can flip due to various things like electric\/magnetic interferences, background radiation, or just because of the low quality of materials used.<\/p>\n<p>Since the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Neutron_flux\">neutron flux<\/a> is ~300 higher at around 10km altitude, a particular attention is necessary when dealing with systems operating at high altitudes &#8211; the case study of the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cassini\u2013Huygens\">Cassini-Huygens<\/a> proves it &#8211; in space, a number of reported errors was over four times bigger than on earth, hence the need for efficient error correction.<\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Richard_Hamming\">Richard Hamming<\/a>&#8216;s Code is one of the solutions to the problem. It&#8217;s a <em>perfect code<\/em> (at least, according to Hamming&#8217;s definition) which can expose\u00a0and correct errors in transmitted messages.<\/p>\n<p>Simply put, it adds metadata to the message (in the form of parity bits) that can be used for validation and correction of errors in messages.<\/p>\n<h3 id=\"a-brief-explanation\">A Brief Explanation<\/h3>\n<p>I bet you already wondered what did (7,4) in &#8220;Hamming(7,4)&#8221; mean.<\/p>\n<p>Simply put, N and M in &#8220;Hamming(N, M)&#8221; represent\u00a0the block length and the message size &#8211; so, <strong>(7,4) means that it\u00a0encodes four bits into seven bits by adding three additional parity bits<\/strong> &#8211; as simple as that.<\/p>\n<p>This particular version can detect and correct single-bit errors, and detect (but not correct) double-bit errors.<\/p>\n<p>In the Hamming&#8217;s codeword, parity bits always occupy all indexes that are powers of two (if we use 1-based-indexing).<\/p>\n<p>So, if our initial message is <em>1111,\u00a0<\/em>the codeword will look somewhat like <em>[][]1[]111<\/em> &#8211; with three parity bits\u00a0for us to fill in.<\/p>\n<p>If we want to calculate the n-th parity bit, we start on its position in a codeword, we take n elements, skip n elements, take n elements, skip n elements&#8230; and so on.\u00a0If the number of taken ones is odd, we set the parity bit to one, otherwise zero.<\/p>\n<p>In our case:<\/p>\n<ul>\n<li>For the first parity bit, we check indexes 1,3,5,7 \u00a0 \u00a0 \u00a0 -&gt; <em>(1)()1()111<\/em><\/li>\n<li>For the second parity bit, we check indexes 2,3,6,7 -&gt;\u00a0<em>(1)(1)1()111<\/em><\/li>\n<li>For the third parity bit, we check indexes 4,5,6,7 \u00a0 \u00a0 -&gt;\u00a0<em>(1)(1)1(1)111<\/em><\/li>\n<\/ul>\n<p>And that&#8217;s all &#8211; the codeword is <em>1111111<\/em>.<\/p>\n<p>In this case, it might be tempting to think that every sequence containing only ones will be encoded to another sequence comprising only ones&#8230; but that&#8217;s not the case&#8230; but every message containing only zeros will always be encoded to zeros exclusively.<\/p>\n<h2 id=\"encoding\">Encoding<\/h2>\n<p>First things first, we can leverage Type Driven Development for making our life easier when working with Strings representing raw and encoded messages:<\/p>\n<pre>data class EncodedString(val value: String)\n\ndata class BinaryString(val value: String)<\/pre>\n<p>Using this approach, it&#8217;ll be slightly harder to mix them up.<\/p>\n<p>We&#8217;ll need a method for calculating the encoded codeword size for a given message. In this case, we simply find the lowest number of parity pairs that can cover the given message:<\/p>\n<pre>fun codewordSize(msgLength: Int) = generateSequence(2) { it + 1 }\n  .first { r -&gt; msgLength + r + 1 &lt;= (1 shl r) } + msgLength<\/pre>\n<p>Next, we&#8217;ll need a method for calculating parity and data bits at given indexes for a given message:<\/p>\n<pre>fun getParityBit(codeWordIndex: Int, msg: BinaryString) =\n  parityIndicesSequence(codeWordIndex, codewordSize(msg.value.length))\n    .map { getDataBit(it, msg).toInt() }\n    .reduce { a, b -&gt; a xor b }\n    .toString()\n\nfun getDataBit(ind: Int, input: BinaryString) = input\n  .value[ind - Integer.toBinaryString(ind).length].toString()<\/pre>\n<p>Where <em>parityIndicesSequence()<\/em> is defined as:<\/p>\n<pre>fun parityIndicesSequence(start: Int, endEx: Int) = generateSequence(start) { it + 1 }\n  .take(endEx - start)\n  .filterIndexed { i, _ -&gt; i % ((2 * (start + 1))) &lt; start + 1 }\n  .drop(1) \/\/ ignore the parity bit<\/pre>\n<p>Now, we can put it all together to form the actual solution, which simply is simply going through the whole codeword and filling it with parity bits and actual data:<\/p>\n<pre>override fun encode(input: BinaryString): EncodedString {\n    fun toHammingCodeValue(it: Int, input: BinaryString) =\n      when ((it + 1).isPowerOfTwo()) {\n          true -&gt; hammingHelper.getParityBit(it, input)\n          false -&gt; hammingHelper.getDataBit(it, input)\n      }\n\n    return hammingHelper.getHammingCodewordIndices(input.value.length)\n      .map { toHammingCodeValue(it, input) }\n      .joinToString(\"\")\n      .let(::EncodedString)\n}<\/pre>\n<p>Note that <em>isPowerOfTwo()<\/em> is our custom extension function and is not available out-of-the-box in Kotlin:<\/p>\n<pre>internal fun Int.isPowerOfTwo() = this != 0 &amp;&amp; this and this - 1 == 0<\/pre>\n<h4 id=\"inlined\">Inlined<\/h4>\n<p>The interesting thing is that the whole computation can be inlined to a single Goliath sequence:<\/p>\n<pre>override fun encode(input: BinaryString) = generateSequence(0) { it + 1 }\n  .take(generateSequence(2) { it + 1 }\n    .first { r -&gt; input.value.length + r + 1 &lt;= (1 shl r) } + input.value.length)\n  .map {\n      when ((it + 1).isPowerOfTwo()) {\n          true -&gt; generateSequence(it) { it + 1 }\n            .take(generateSequence(2) { it + 1 }\n              .first { r -&gt; input.value.length + r + 1 &lt;= (1 shl r) } + input.value.length - it)\n            .filterIndexed { i, _ -&gt; i % ((2 * (it + 1))) &lt; it + 1 }\n            .drop(1)\n            .map {\n                input\n                  .value[it - Integer.toBinaryString(it).length].toString().toInt()\n            }\n            .reduce { a, b -&gt; a xor b }\n            .toString()\n          false -&gt; input\n            .value[it - Integer.toBinaryString(it).length].toString()\n      }\n  }\n  .joinToString(\"\")\n  .let(::EncodedString)<\/pre>\n<p>Not the most readable version, but interesting to have a look.<\/p>\n<h3 id=\"in-action\">In Action<\/h3>\n<p>We can verify that the implementation works as expected by leveraging JUnit5 and Parameterized Tests:<\/p>\n<pre>@ParameterizedTest(name = \"{0} should be encoded to {1}\")\n@CsvSource(\n  \"1,111\",\n  \"01,10011\",\n  \"11,01111\",\n  \"1001000,00110010000\",\n  \"1100001,10111001001\",\n  \"1101101,11101010101\",\n  \"1101001,01101011001\",\n  \"1101110,01101010110\",\n  \"1100111,01111001111\",\n  \"0100000,10011000000\",\n  \"1100011,11111000011\",\n  \"1101111,10101011111\",\n  \"1100100,11111001100\",\n  \"1100101,00111000101\",\n  \"10011010,011100101010\")\nfun shouldEncode(first: String, second: String) {\n    assertThat(sut.encode(BinaryString(first)))\n      .isEqualTo(EncodedString(second))\n}<\/pre>\n<p>&#8230; and by using a home-made property testing:<\/p>\n<pre>@Test\n@DisplayName(\"should always encode zeros to zeros\")\nfun shouldEncodeZeros() {\n    generateSequence(\"0\") { it + \"0\" }\n      .take(1000)\n      .map { sut.encode(BinaryString(it)).value }\n      .forEach {\n          assertThat(it).doesNotContain(\"1\")\n      }\n}<\/pre>\n<h2 id=\"going-parallel\"><span style=\"font-family: Bitter, Georgia, serif;font-size: 30px\">Going Parallel<\/span><\/h2>\n<p>The most important property of this implementation is statelessness &#8211; it could be achieved by making sure that we&#8217;re using only pure functions and avoiding shared mutable state &#8211; all necessary data is always passed explicitly as input parameters and not held in any form of internal state.<\/p>\n<p>Unfortunately, it results in some repetition and performance overhead that could&#8217;ve been avoided if we&#8217;re just modifying one mutable list and passing it around&#8230; but now we can utilize our resources wiser by parallelizing the whole operation &#8211; which should result in a performance improvement.<\/p>\n<p>Without running the code that&#8217;s just wishful thinking so let&#8217;s do that.<\/p>\n<p>We can parallelize the operation (naively) using Java 8&#8217;s parallel streams:<\/p>\n<pre>override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)\n  .toList().parallelStream()\n  .map { toHammingCodeValue(it, input) }\n  .reduce(\"\") { t, u -&gt; t + u }\n  .let(::EncodedString)<\/pre>\n<p>To not give the sequential implementation an unfair advantage (no t<em>oList()<\/em> conversion so far), we&#8217;ll need to change the implementation slightly:<\/p>\n<pre>override fun encode(input: BinaryString) = hammingHelper.getHammingCodewordIndices(input.value.length)\n  .toList().stream() \/\/ to be fair.\n  .map { toHammingCodeValue(it, input) }\n  .reduce(\"\") { t, u -&gt; t + u }\n  .let(::EncodedString)<\/pre>\n<p>And now, we can perform some benchmarking using JMH (message.size == 10_000):<\/p>\n<pre>Result \"com.pivovarit.hamming.benchmarks.SimpleBenchmark.parallel\":\n 3.690 \u00b1(99.9%) 0.018 ms\/op [Average]\n (min, avg, max) = (3.524, 3.690, 3.974), stdev = 0.076\n CI (99.9%): [3.672, 3.708] (assumes normal distribution)\n\nResult \"com.pivovarit.hamming.benchmarks.SimpleBenchmark.sequential\":\n  10.877 \u00b1(99.9%) 0.097 ms\/op [Average]\n  (min, avg, max) = (10.482, 10.877, 13.498), stdev = 0.410\n  CI (99.9%): [10.780, 10.974] (assumes normal distribution)\n\n\n# Run complete. Total time: 00:15:14\n\nBenchmark                   Mode  Cnt   Score   Error  Units\nSimpleBenchmark.parallel    avgt  200   <strong>3.690<\/strong> \u00b1 0.018  <strong>ms\/op<\/strong>\nSimpleBenchmark.sequential  avgt  200  <strong>10.877<\/strong> \u00b1 0.097  <strong>ms\/op<\/strong><\/pre>\n<p>As we can see, we can notice a major performance improvement in favor of the parallelized implementation &#8211; of course; results might drastically change because of various factors so do not think that we&#8217;ve found a silver bullet &#8211; they do not exist.<\/p>\n<p>For example, here&#8217;re the results for encoding a very short message (message.size == 10)):<\/p>\n<pre>Benchmark                   Mode Cnt Score   Error Units\nSimpleBenchmark.parallel    avgt 200 <strong>0.024<\/strong> \u00b1 0.001 <strong>ms\/op<\/strong>\nSimpleBenchmark.sequential  avgt 200 <strong>0.003<\/strong> \u00b1 0.001 <strong>ms\/op<\/strong><\/pre>\n<p>In this case, the overhead of splitting the operation among multiple threads makes the parallelized implementation perform eight times slower(sic!).<\/p>\n<p>Here&#8217;s the full table for the reference:<\/p>\n<pre>Benchmark            (messageSize) Mode Cnt Score   Error    Units\nBenchmark.parallel   10            avgt 200 0.022   \u00b1 0.001  ms\/op\nBenchmark.sequential 10            avgt 200 0.003   \u00b1 0.001  ms\/op\n \nBenchmark.parallel   100           avgt 200 0.038   \u00b1 0.001  ms\/op\nBenchmark.sequential 100           avgt 200 0.031   \u00b1 0.001  ms\/op\n\nBenchmark.parallel   1000          avgt 200 0.273   \u00b1 0.011  ms\/op \nBenchmark.sequential 1000          avgt 200 0.470   \u00b1 0.008  ms\/op\n\nBenchmark.parallel   10000         avgt 200 3.731   \u00b1 0.047  ms\/op\nBenchmark.sequential 10000         avgt 200 12.425  \u00b1 0.336  ms\/op\n<\/pre>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>We saw how to implement a thread-safe <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hamming(7,4)\">Hamming(7,4)<\/a> encoder using Kotlin and what parallelization can potentially give us.<\/p>\n<p>In the second part of the article, we&#8217;ll implement a Hamming decoder and see how we can correct single-bit errors and detect double-bit ones.<\/p>\n<p>Code snippets <a href=\"https:\/\/github.com\/pivovarit\/articles\/tree\/master\/kotlin-null-nonsafety\">can be found on GitHub.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Hamming code is one of the Computer Science\/Telecommunication classics. In this article, we&#8217;ll revisit the topic and implement&hellip;\n","protected":false},"author":68,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[556],"class_list":{"0":"post-13246","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-development-design","7":"tag-kotlin"},"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=13246"}],"version-history":[{"count":5,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246\/revisions"}],"predecessor-version":[{"id":13251,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13246\/revisions\/13251"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=13246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=13246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=13246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}