{"id":13534,"date":"2018-08-07T12:00:39","date_gmt":"2018-08-07T10:00:39","guid":{"rendered":"https:\/\/medium.com\/p\/19ac7e13fd06"},"modified":"2023-03-16T16:49:26","modified_gmt":"2023-03-16T15:49:26","slug":"analyzing-2018-world-cup-match-data-with-clojure","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2018\/08\/07\/analyzing-2018-world-cup-match-data-with-clojure\/","title":{"rendered":"Analyzing 2018 World Cup match data with Clojure"},"content":{"rendered":"<p>So, the 2018 World Cup in Russia is over. There were lots of good moments, and some worse ones (especially for us, here in Poland\u2026). But is there something we can learn from this event as programmers? Recently, I had a few free evenings and played a bit with a <a href=\"https:\/\/worldcup.sfg.io\/\">World Cup API<\/a>. In this post I want to show you the Clojure way of dealing with REST APIs and where it beats other languages in this field.<br \/>\n<!--more--><br \/>\n<strong>TL;DR:<\/strong> The <a href=\"https:\/\/github.com\/pjagielski\/worldcup\">code is on GitHub<\/a>. The setup is based on Atom and the <a href=\"https:\/\/github.com\/jasongilman\/proto-repl\">proto-repl<\/a> package, which makes the interactive development experience really pleasant. If you want to follow along with my solution, I highly recommend going through its installation guide.<\/p>\n<p>As a problem to solve, I wanted something non-trivial that would show off some of Clojure\u2019s superpowers. Finally, I came up with this:<\/p>\n<blockquote><p>The problem: find all matches in which the winning team made a comeback, i.e. it conceded the first goal but still managed to win.<\/p><\/blockquote>\n<p>For example, Sweden\u2013Germany in the group&nbsp;stage:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/619\/1*8J65h0s381WyH1zHAtQ7gw.png\" alt=\"\"><\/figure>\n<p>Ok, so a good starting point is to fetch all match data for further processing. 
I\u2019m using <code>clj-http<\/code> as the HTTP client (with <code>cheshire<\/code> for JSON parsing support):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(ns worldcup.matches\r\n  (:require\r\n    [clj-http.client :as http]))\r\n\r\n(def api-root \"https:\/\/worldcup.sfg.io\")\r\n\r\n;; {:as :json} makes clj-http parse the response body (via cheshire)\r\n;; into Clojure data: maps with keyword keys\r\n(defn get-all-matches []\r\n  (-&gt;\r\n    (http\/get (str api-root \"\/matches\") {:as :json})\r\n    :body))\r\n<\/pre>\n<p>The Clojure philosophy relies heavily on working with built-in data structures. In statically typed languages you\u2019d probably start by modelling the response JSON as POJO classes; here you just get a map and can start investigating it right from the beginning. Huge difference.<\/p>\n<p>One thing that may be unclear in the code above is the <code>-&gt;<\/code> <a href=\"https:\/\/clojuredocs.org\/clojure.core\/-%3E\">macro<\/a>. It is really helpful for chaining nested operations. Instead of nesting <code>(xxx (yyy (zzz&nbsp;\u2026<\/code> calls, you can use the <code>-&gt;<\/code> and <code>-&gt;&gt;<\/code> macros, which pass each result as the first (<code>-&gt;<\/code>) or the last (<code>-&gt;&gt;<\/code>) argument of the next form. This makes the code read much like its Java equivalent:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">;; Clojure\r\n(-&gt;&gt; lst\r\n  (map op1)\r\n  (map op2)\r\n  (filter p1)\r\n  first)\r\n\r\n\/\/ Java\r\nlst.stream()\r\n    .map(op1)\r\n    .map(op2)\r\n    .filter(p1)\r\n    .findFirst()\r\n<\/pre>\n<p>Ok, let\u2019s start investigating the API response. With <code>proto-repl<\/code> it is trivial to examine the value of any data structure straight from your editor. 
Let\u2019s look at the match data using the <code>Proto REPL: Execute Block<\/code> action (the <code>ctrl+alt+,s<\/code> shortcut by&nbsp;default):<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/868\/1*8_4a0My5-Xs9pX-jRH_RbQ.png\" alt=\"\"><\/p>\n<p>It\u2019s common practice to wrap interactive invocations in a <code>comment<\/code> form, e.g. <code>(comment (first (get-all-matches)))<\/code>, so that they are ignored when the whole file is evaluated at&nbsp;once.<\/p>\n<p>Ok, so we have some basic attributes of the match, the number of goals for each team and some team events. Let\u2019s investigate the events:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/975\/1*cd7mzu6s3bu2ABN2wwLIeQ.png\" alt=\"\"><\/figure>\n<p>Looks promising: we have the times of all goals scored by each team. With this, our algorithm could look like&nbsp;this:<\/p>\n<ol>\n<li>Get the times of goals for each&nbsp;team<\/li>\n<li>Sort the goals by&nbsp;time<\/li>\n<li>Get the team that scored the first&nbsp;goal<\/li>\n<li>Check whether the winner is the other&nbsp;team<\/li>\n<\/ol>\n<p>Let\u2019s start by filtering goals out of each team\u2019s events. First, we need to find out how goals are represented among match&nbsp;events:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/729\/1*ncUOl7JJfJAqTrMqC8SYtA.png\" alt=\"\"><\/p>\n<p>Ok, so the goal event types are <code>goal<\/code>, <code>goal-penalty<\/code> and <code>goal-own<\/code>:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">;; an event is a goal if it is a regular goal, a penalty or an own goal\r\n(defn goal? [e]\r\n  (let [event-type (:type_of_event e)]\r\n    (or (= \"goal\" event-type)\r\n        (= \"goal-penalty\" event-type)\r\n        (= \"goal-own\" event-type))))\r\n<\/pre>\n<p><figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/977\/1*kmypLavpZTnk5DFEXtmANw.png\" alt=\"\"><\/figure>\n<p>Looks good.<\/p>\n<p>We\u2019ll also need the time when each goal was scored, and it comes in a rather strange format, e.g. <code>90'+4'<\/code>. 
We need to split on the <code>+<\/code> character, <code>trim<\/code> each part, remove the <code>'<\/code> characters and then add the additional (stoppage) time to get the exact minute of the&nbsp;goal:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(require '[clojure.string :as string])\r\n\r\n;; \"90'+4'\" -&gt; (\"90'\" \"4'\") -&gt; (90 4) -&gt; 94\r\n(defn goal-time [t]\r\n  (-&gt;&gt;\r\n    (string\/split t #\"\\+\")\r\n    (map string\/trim)\r\n    (map #(string\/replace % #\"'\" \"\"))\r\n    (map #(Integer\/parseInt %))\r\n    (reduce +)))\r\n<\/pre>\n<p><figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/796\/1*tCCjmnEy2SVuvpVSx3lOKQ.png\" alt=\"\"><\/figure>\n<p><strong>EDIT:<\/strong><\/p>\n<p><a href=\"https:\/\/twitter.com\/bartektartanus\">Bartek Tartanus<\/a> (thanks!) found a bug in simply adding the additional time to the regular time: suppose there was a late goal in the first half and an early goal in the second (<code>45'+5'<\/code> and <code>46<\/code>). My algorithm would pick the second-half goal as the earlier one, because 46 is less than 50. We need to pass the half (or the extra-time period) along with the minute to sort goal times correctly. 
Fortunately, Clojure compares vectors of equal length element by element, so sorting <code>[part minute]<\/code> pairs works just the way we need:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*7p29ZTHeXZdwhG5TQekpTg.png\" alt=\"\" \/><\/p>\n<p>We need to calculate both the half (part) of the match and the exact minute of each goal:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">;; which part of the match the regular minute (before any \"+\") falls into\r\n(defn part [time-parts]\r\n  (let [minute (first time-parts)]\r\n    (cond\r\n      (&lt;= minute 45) 1\r\n      (&lt;= minute 90) 2\r\n      (&lt;= minute 105) 3\r\n      (&lt;= minute 120) 4))) ;; parts 3 and 4 are for extra-time\r\n\r\n;; \"45'+5'\" -&gt; (45 5), \"46'\" -&gt; (46)\r\n(defn parts [time-str]\r\n  (-&gt;&gt;\r\n    (string\/split time-str #\"\\+\")\r\n    (map string\/trim)\r\n    (map #(string\/replace % #\"'\" \"\"))\r\n    (map #(Integer\/parseInt %))))\r\n\r\n;; \"45'+5'\" -&gt; [1 50], \"46'\" -&gt; [2 46]\r\n(defn part-and-time [time-str]\r\n  (let [time-parts (parts time-str)]\r\n    [(part time-parts) (reduce + time-parts)]))\r\n<\/pre>\n<p>Now we can test the improved sorting:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*LlfUshbUe76Wnl8LYGVW9Q.png\" alt=\"\"><\/p>\n<p>Seems to work! So let\u2019s try to find the team that scored the first goal. 
My idea was to mix both&nbsp;<code>:home<\/code> and&nbsp;<code>:away<\/code> goal times in one collection:<\/p>\n<pre>([:home [1 38]] [:home [1 39]] [:away [1 28]])<\/pre>\n<p>and then sort them by&nbsp;time:<\/p>\n<pre>([:away [1 28]] [:home [1 38]] [:home [1 39]])<\/pre>\n<p>to find the side that scored the first goal (<code>:away<\/code> in this&nbsp;case).<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">;; team-goal-times is defined further below\r\n(declare team-goal-times)\r\n\r\n(defn first-scored-side [match]\r\n  (let [goal-times\r\n          (concat\r\n            (team-goal-times (:home_team_events match) :home)\r\n            (team-goal-times (:away_team_events match) :away))]\r\n    ;; earliest [side [part minute]] entry, then its side keyword\r\n    (-&gt;&gt; goal-times (sort-by second) first first)))\r\n<\/pre>\n<p>I use <code>first<\/code> to get the first pair from the sorted collection and then <code>first<\/code> again to get the side from that&nbsp;pair:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*nuzeEJ0uijTMN1m3_XzVYg.png\" alt=\"\"><\/figure>\n<p>To build the list of goal times tagged with the side, I used this function:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(defn team-goal-times [events side]\r\n  (-&gt;&gt; events\r\n    (filter goal?)\r\n    (map :time)\r\n    (map part-and-time)\r\n    ;; pair each [part minute] with the side, e.g. [:home [1 38]]\r\n    (map vector (repeat side))))\r\n<\/pre>\n<p>What\u2019s happening in this last line? 
<code>repeat<\/code> returns an infinite lazy sequence of the given element, and <code>vector<\/code> simply creates an <a href=\"https:\/\/clojure.org\/reference\/data_structures#Vectors\">indexed collection<\/a> from its arguments.<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/868\/1*Q-S-BqqEPllaY5ykpTFP1w.png\" alt=\"\"><\/figure>\n<p>And <code>map<\/code> can also take several collections, applying the function to corresponding elements: <code>&lt;c1_first, c2_first&gt;, &lt;c1_second, c2_second&gt;<\/code>&#8230; It stops as soon as the shortest collection is exhausted, which is why pairing a finite list with an infinite <code>repeat<\/code> is safe.<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/844\/1*z0-uyJkIzVWY4tQi0UFxAw.png\" alt=\"\"><\/figure>\n<p>Which easily leads us to&nbsp;this:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*-xx5oIrfnVpk4porEPj-NA.png\" alt=\"\"><\/figure>\n<p>Now we are ready to write the <code>come-back?<\/code> function:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(defn winner-side [match]\r\n  (let [winner (:winner match)]\r\n    ;; nil when neither team is the winner, e.g. a draw\r\n    (cond\r\n      (= winner (:home_team_country match)) :home\r\n      (= winner (:away_team_country match)) :away)))\r\n\r\n;; a come-back: there is a winner, and it is not the side that scored first\r\n(defn come-back? [match]\r\n  (let [first-scored-side (first-scored-side match)\r\n        winner-side (winner-side match)]\r\n    (and (not (nil? winner-side))\r\n         (not= winner-side first-scored-side))))\r\n<\/pre>\n<p>Let\u2019s see if it works. To cut down the noise, we keep only the team names and the winner from each match:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(defn teams [match]\r\n  (select-keys match [:home_team_country :away_team_country :winner]))\r\n<\/pre>\n<p><figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/921\/1*uR1-0VG8xmUEnNUEqa945g.png\" alt=\"\"><\/figure>\n<p>Ok, we got some results! 
But, hmm\u2026 Morocco\u2013Iran?<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/684\/1*HBhQf2OxgrDt4AJUCldePA.png\" alt=\"\"><\/figure>\n<p>0:1 with a last-minute own goal? This definitely doesn\u2019t look like a comeback. Let\u2019s look at this match\u2019s&nbsp;events:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/669\/1*9Br1yQ6PICw5KCdYc07M3w.png\" alt=\"\"><\/figure>\n<p>Ok, now I can see the problem. Own goals are listed among the wrong side\u2019s events! This goal should be in Iran\u2019s events, otherwise our solution won\u2019t&nbsp;work.<\/p>\n<p>We could either add a special case for own goals or simply move them to the correct side\u2019s events. I decided to go with the second solution, knowing that Clojure is really good at transforming nested data structures, e.g. with the <code>update<\/code> and <code>update-in<\/code> functions:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/891\/1*aOQFNujFOHkWyE8nfNqyYw.png\" alt=\"\"><\/figure>\n<p><code>update<\/code> takes a map, a key and a function, applies the function to the value under that key and returns a new map with the rest left untouched. Any extra arguments are passed to the function after the old value, so <code>(update m k f x)<\/code> calls <code>(f (get m k) x)<\/code>. <code>update-in<\/code> works the same way, but takes a whole path (a vector of keys) to some nested element of your data structure.<\/p>\n<p>Let\u2019s use <code>update<\/code> to move all own goals from <code>:home_team_events<\/code> to&nbsp;<code>:away_team_events<\/code> and vice versa:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">;; an own goal counts for the opposing side\r\n(defn own-goal? [event]\r\n  (= \"goal-own\" (:type_of_event event)))\r\n\r\n(defn own-goals [events]\r\n  (-&gt;&gt; events (filter own-goal?)))\r\n\r\n(defn remove-own-goals [events]\r\n  (remove own-goal? 
events))\r\n\r\n(defn fix-match [match-to-fix]\r\n  ;; collect own goals before removing them from their original side\r\n  (let [own-home-goals (own-goals (:home_team_events match-to-fix))\r\n        own-away-goals (own-goals (:away_team_events match-to-fix))]\r\n    (-&gt; match-to-fix\r\n      (update :home_team_events remove-own-goals)\r\n      (update :away_team_events remove-own-goals)\r\n      ;; (update m k concat xs) calls (concat (get m k) xs)\r\n      (update :home_team_events concat own-away-goals)\r\n      (update :away_team_events concat own-home-goals))))\r\n<\/pre>\n<p>Finally, let\u2019s fix the <code>come-back?<\/code> function so that it works on the fixed match:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">(defn come-back? [match-to-fix]\r\n  (let [match (fix-match match-to-fix)\r\n        first-scored-side (first-scored-side match)\r\n        winner-side (winner-side match)]\r\n    (and (not (nil? winner-side))\r\n         (not= winner-side first-scored-side))))\r\n<\/pre>\n<p>Let\u2019s check the results again:<\/p>\n<figure><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/894\/1*cL51zGOOjspsEC3QpMS4BQ.png\" alt=\"\"><\/figure>\n<p>Yes, we have it! We found 9 such games, with Croatia leading the list: it came back 3 times (twice winning on penalties).<\/p>\n","protected":false},"excerpt":{"rendered":"So, Russia 2018 World Cup is over. 
There were lots of good moments, and some worse (especially for&hellip;\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[491,649,648,647],"class_list":{"0":"post-13534","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-development-design","7":"tag-clojure","8":"tag-data-analysis","9":"tag-rest-api","10":"tag-world-cup"},"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13534","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=13534"}],"version-history":[{"count":10,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13534\/revisions"}],"predecessor-version":[{"id":15308,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/13534\/revisions\/15308"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=13534"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=13534"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=13534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}