Q&A with Jack Norris of MapR Technologies
VSM: Why are companies adopting Hadoop to address “Big Data”?
JN: We’re on the cusp of one of the biggest paradigm shifts in computing. An IDC study released earlier this year confirmed that data is growing faster than Moore’s Law. The network is now the bottleneck, and the best way to compete is by leveraging compute and data on the same node through Hadoop. Google popularized the MapReduce framework used by Hadoop, and recognized this paradigm shift earlier than anyone else. When they launched, they were yet another search engine in a very crowded market. This new computing paradigm allowed them to index more data, drive better results and do it much, much more economically than their competitors. That’s what’s driving Hadoop adoption today: the ability for companies in many industries to dramatically improve their competitive advantage.
VSM: What advantages does Hadoop provide?
JN: Hadoop provides the broadest, most flexible analytic foundation, which translates into many advantages. First, simple algorithms that are run on big data sets beat complex models on smaller data sets every time. This has been confirmed in various applications, including complex areas such as natural language processing. Second, Hadoop can handle a wide variety of data, particularly unstructured data. There is no need to engage in complex data transformation exercises to pre-process data. Third, Hadoop is by far the most economical way to handle large data sets and continue to scale across commodity hardware.
VSM: What are some of the biggest technology challenges Hadoop users face today?
JN: Customers are facing many kinds of issues as they deploy Hadoop. Some of these challenges include:
- Getting data in and out of Hadoop. Hadoop has a special distributed file system API that requires programs to batch load data into a cluster and batch unload it again.
- Deploying Hadoop into mission-critical business projects. The lack of reliability of current Hadoop software platforms is a major impediment to expansion.
- Protecting data against application and user errors. Hadoop has no backup and restore capabilities. Users have to contend with data loss or resort to very expensive solutions that reside outside the actual Hadoop cluster.
According to industry research firm ESG, half of the companies it surveyed plan to leverage commercial distributions of Hadoop as opposed to the open source version. This trend indicates organizations are moving from experimental and pilot projects to mainstream applications with mission-critical requirements that include high availability, better performance, data protection, security, and ease of use.
VSM: How did MapR approach the Hadoop market?
JN: Meeting customer needs called for more than simply wrapping an open source project with additional services or some management components. It required fundamental changes (low-level architectural breakthroughs) that drive improvements that are orders of magnitude better in speed, scale and efficiency. We made those investments and spent two full years in development before releasing the MapR Distribution for Apache Hadoop.

