July 9, 2014

Hadoop 101

What is “Hadoop,” and what value does it bring to a company? That is the question everybody seems to be asking. Though many become lost amid the industry jargon, the answer is actually quite simple: Hadoop is the dominant platform for storing and processing big data. As open-source software that can process structured, semi-structured, and unstructured data, Hadoop is the touchstone for companies seeking to use big data to its fullest potential while cutting costs.

To understand Hadoop’s potential value to organizations, it is important to first understand its benefits. Its value stems from two main features: its ability to store very large files, with capacity that grows as machines are added, and its ability to process that data. For example, say you have a file several times larger than your PC’s storage capacity. The Hadoop Distributed File System (HDFS) splits such a file into blocks and spreads them across multiple nodes, or servers, making it possible to store files that no single machine could hold. The second feature, known as MapReduce, processes all of this data in a distinctive way. Instead of moving the data to the software that processes it, which would take an eternity, MapReduce sends the processing code to the nodes where the data already resides. Its first component, known as “Map,” extracts the desired values or information from the large data sets, and its “Reduce” component aggregates these results to produce an “answer,” or final outcome. A simple example of such an answer is the total number of times a certain word appears across a collection of more than a million articles.
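The word-count example can be sketched in a few lines of plain Python. This is only a single-machine illustration of the Map and Reduce steps under simplifying assumptions, not Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample articles are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(article: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one article."""
    for word in article.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted counts by word (the step between Map and Reduce)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all counts for one word into a single total."""
    return (word, sum(counts))


def word_count(articles: Iterable[str]) -> Dict[str, int]:
    """Run Map over every article, group by word, then Reduce each group."""
    pairs = (pair for article in articles for pair in map_phase(article))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical stand-in for the "million articles" in the text.
    sample_articles = [
        "big data needs big storage",
        "Hadoop stores big data",
    ]
    totals = word_count(sample_articles)
    print(totals["big"])  # prints 3
```

In a real cluster, many copies of the Map step would run in parallel on the nodes that hold each block of data, and the framework would handle the grouping before Reduce runs; here everything happens in one process to keep the idea visible.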

Hadoop programming tools like Pig and Hive add functionality such as data summarization, query, and analysis, making Hadoop even more user-friendly. The question CEOs should be asking about adopting Hadoop, however, should not be centered on Hadoop at all. What CEOs should determine first and foremost is how they are going to use big data analysis and how it is going to benefit their business.

Near-term interest in Hadoop may be short-lived as more and more applications begin to capitalize on the need for big data by running on Hadoop. As mentioned in Forbes, these applications will provide more value by answering the pertinent questions more efficiently, shifting attention away from Hadoop itself. Soon enough, few will be talking about Hadoop: it will merely be the conduit through which data is collated, and the applications will be what we actually care about.

At the same time, demand for application scalability, cloud app development, data collection, data management, and other capabilities is creating a need for big data technologies beyond Hadoop, according to an InformationWeek article. These emerging technologies, which include NoSQL databases, will begin to receive more interest and develop in tandem with Hadoop. CEOs will then have a new task: deciding which big data technology best suits their business needs.

Selected Hadoop-related Comparables

Note: SGI, SMCI, and TDC are hardware names whose systems can run Hadoop workloads, and they are much cheaper, from a valuation perspective, than more pure-play names like SPLK and WANdisco.

Trading Metrics

Operating Metrics
