{"id":10623,"date":"2012-06-18T10:08:09","date_gmt":"2012-06-18T09:08:09","guid":{"rendered":"http:\/\/zygm0nt.github.com\/blog\/2012\/06\/18\/hadoop-for-enterprises"},"modified":"2023-03-22T14:39:11","modified_gmt":"2023-03-22T13:39:11","slug":"hadoop-for-enterprises","status":"publish","type":"post","link":"https:\/\/touk.pl\/blog\/2012\/06\/18\/hadoop-for-enterprises\/","title":{"rendered":"Hadoop for Enterprises"},"content":{"rendered":"<div class=\"post\">\n<p>Hadoop has been gaining a lot of attention lately as a big data processing framework. It is no longer only the big players who see that they can embrace the data their sites or products generate and build their businesses on it. For that to happen, two things are needed: the data itself and a means of processing really large amounts of it.<\/p>\n<p><span class=\"image-wrap\"><img decoding=\"async\" style=\"border: 0px solid black;width: 400px\" src=\"http:\/\/blog.innovative-labs.com\/blog\/3488314950_453466f762_b.jpg\" alt=\"\" \/><\/span><\/p>\n<p>Gathering data is relatively easy. The data don\u2019t have to be structured, and you don\u2019t need to plan their usage up front. Just start collecting them, and then you can experiment with their potential uses. 
If they turn out to be useless rubbish \u2013 deleting them won\u2019t be hard <img loading=\"lazy\" decoding=\"async\" class=\"emoticon\" src=\"http:\/\/zygm0nt.github.com\/confluence\/images\/icons\/emoticons\/smile.gif\" alt=\"\" width=\"20\" height=\"20\" align=\"absmiddle\" border=\"0\" \/> But imagine the value they may bring to your business:<\/p>\n<ul>\n<li>faster services \u2013 working on optimized data<\/li>\n<li>more clients \u2013 because of more relevant search results<\/li>\n<li>happy clients \u2013 your service can \u201cread their minds\u201d <img loading=\"lazy\" decoding=\"async\" class=\"emoticon\" src=\"http:\/\/zygm0nt.github.com\/confluence\/images\/icons\/emoticons\/smile.gif\" alt=\"\" width=\"20\" height=\"20\" align=\"absmiddle\" border=\"0\" \/><\/li>\n<li>etc.<\/li>\n<\/ul>\n<p>Many companies use the Hadoop ecosystem for their own needs. You can read about some of them here: <a class=\"external-link\" href=\"http:\/\/wiki.apache.org\/hadoop\/PoweredBy\" rel=\"nofollow\">http:\/\/wiki.apache.org\/hadoop\/PoweredBy<\/a>. But since that page lacks insight into specific applications of Hadoop, I\u2019ve tried to delve into the details of how Hadoop helped tame some companies\u2019 big data sets.<\/p>\n<h2 id=\"facebook\"><a name=\"test-Facebook\"><\/a>Facebook<\/h2>\n<p>As a widely used social network, Facebook requires no introduction. However, if you\u2019ve been living under a rock for the last couple of years, just visit their website: <a class=\"external-link\" href=\"http:\/\/facebook.com\" rel=\"nofollow\">http:\/\/facebook.com<\/a><\/p>\n<p>Their main use of Hadoop is data warehousing. Because they need fast and reliable access to the data, they required real-time querying of their huge and constantly growing data set. They had to switch from MySQL databases because of the increasing workloads they experienced with standard databases. 
What they got \u201cout of the box\u201d with Hadoop was all the benefits of a distributed file system (HDFS). They took the ideas behind it even further and implemented a truly highly available file system without a single point of failure.<\/p>\n<p>Facebook has three interesting usage scenarios in which Hadoop plays a major role:<\/p>\n<ul>\n<li>Titan \u2013 Facebook\u2019s messaging system. It processes messages exchanged between users and ensures that this happens fast and without glitches. Here Hadoop is used mainly as huge, unlimited storage.<\/li>\n<li>Puma \u2013 Facebook Insights \u2013 a tool providing page statistics for advanced Facebook users. Working on streams of data (clicks, likes, shares, comments and impressions), it graphs those data and makes the results available almost instantly.<\/li>\n<li>ODS \u2013 Operational Data Store \u2013 stores Facebook\u2019s internal metrics: collections of OS and cluster health metrics. It also facilitates multiple accounting solutions.<\/li>\n<\/ul>\n<h2 id=\"twitter\"><a name=\"test-Twitter\"><\/a>Twitter<\/h2>\n<p>This popular micro-blogging platform, where you can register an account and follow the micro-messages of friends and celebrities, does some pretty interesting things with its Hadoop cluster.<\/p>\n<p>One of their motivations is to speed up their website. That is why they compute users\u2019 friendships in Twitter\u2019s social graph with Hadoop. Using connections between users, they calculate users\u2019 relationships to each other and estimate groups of users.<\/p>\n<p>Since the service\u2019s users generate lots of content, the company conducts research based on natural language processing. They probe what can be learned about users from their tweets. They use tweet content for advertising, trend analysis and much more.<\/p>\n<p>From tweets and users\u2019 behaviour they characterise usage scenarios. 
They also gather usage statistics, such as the number of daily searches and the number of tweets. Based on this seemingly irrelevant data, they compare different types of users. Twitter analyzes the data to determine whether mobile users, users of third-party clients or power users use Twitter differently from average users. Of course, these seem like really specific applications, but they are nevertheless very original and are based on data that Twitter has been gathering for some time now.<\/p>\n<h2 id=\"ebay\"><a name=\"test-EBay\"><\/a>eBay<\/h2>\n<p>As the biggest auction site on the Internet, eBay uses Hadoop to increase search relevance based on click-stream and user data. This seems pretty obvious, considering their area of operation.<\/p>\n<p>However, they also do one other interesting thing \u2013 they try hard to automatically fill in the metadata of auctioned items, based on the descriptions and other data provided by users. They employ a data mining approach for this task and, judging from their constant growth, it seems to work <img loading=\"lazy\" decoding=\"async\" class=\"emoticon\" src=\"http:\/\/zygm0nt.github.com\/confluence\/images\/icons\/emoticons\/smile.gif\" alt=\"\" width=\"20\" height=\"20\" align=\"absmiddle\" border=\"0\" \/><\/p>\n<h2 id=\"linkedin\"><a name=\"test-LinkedIn\"><\/a>LinkedIn<\/h2>\n<p>A social network for professionals, though a lot smaller than Facebook. Based on click-streams, they discover relations between users. 
All the data about the latest visits to your profile or people you may know from other places \u2013 this comes from Hadoop-based analysis of the clicks people make all the time on their site.<\/p>\n<p>There is also a very neat feature called InMaps (<a class=\"external-link\" href=\"http:\/\/inmaps.linkedinlabs.com\/\" rel=\"nofollow\">http:\/\/inmaps.linkedinlabs.com\/<\/a>), which analyses declared schools and companies and generates data for a graph with your friends clustered.<\/p>\n<h2 id=\"last-fm\"><a name=\"test-Last.fm\"><\/a>Last.fm<\/h2>\n<p>This online radio site, praised by many for its invaluable recommendation system, seems like a rather small and simple service. But behind the facade of a simple web page, lots of data are being processed so that their services can reach a certain level of perfection.<\/p>\n<p>This large volume of data comes from scrobbles. Each time a user of the service listens to a song, a note about this fact is generated \u2013 called a scrobble. Based on scrobbles and user profiles, they calculate global band popularity charts, maps of bands\u2019 popularity and many more usage statistics and timeline charts.<\/p>\n<p><span class=\"image-wrap\"><img decoding=\"async\" style=\"border: 0px solid black;width: 400px\" src=\"http:\/\/blog.innovative-labs.com\/blog\/7346959440_71648c9fe7_b.jpg\" alt=\"\" \/><\/span><\/p>\n<h2 id=\"conclusion\"><a name=\"test-Conclusion\"><\/a>Conclusion<\/h2>\n<p>All these companies simply try to detect and trace new patterns in seemingly chaotic data sets. Perhaps you could do the same? 
Analyze your data and expand your business value?<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"Hadoop\u2019s usage as a big data processing framework gains a lot of attention lately. Now, not only big&hellip;\n","protected":false},"author":11,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-10623","post","type-post","status-publish","format-standard","category-development-design"],"_links":{"self":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10623","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/comments?post=10623"}],"version-history":[{"count":4,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10623\/revisions"}],"predecessor-version":[{"id":15505,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/posts\/10623\/revisions\/15505"}],"wp:attachment":[{"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/media?parent=10623"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/categories?post=10623"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/touk.pl\/blog\/wp-json\/wp\/v2\/tags?post=10623"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}