Big Data: Peanut Butter and Jelly without the Jelly - Executive Viewpoint 2013 Prediction: MapR Technologies

By Ted Dunning
Thursday, January 17th 2013

Over the next year I expect to see some interesting trends that are a bit like a peanut butter and jelly sandwich without the jelly, or doughnuts without the holes: an apparent contradiction in terms. I also expect that people will develop a real taste for doughnuts both with and without holes.

First,

I expect to hear that lots of people are installing Hadoop clusters but never getting around to running map-reduce programs on them. This will happen partly because the Hadoop ecosystem components that surround the core map-reduce capabilities of Hadoop are becoming capable enough in their own right that more and more users will discover that what they need isn't map-reduce specifically, but big data more generally.

Many of these map-reduce-less clusters will be running HBase or commercial equivalents like MapR's M7. Some will be running Apache Drill to do large aggregating computations without map-reduce. Some will be running Storm to do high-speed real-time computations. Others will be hosting Solr clusters to deliver recommendations. Some will be doing all of the above.

The new idea is that Hadoop clusters will start to really grow up and take on a diversity of roles in the computing pantheon rather than being limited to the few roles available in the past.

Interestingly, this change will be a huge sign of success for Hadoop and the Hadoop community even if it means that Hadoop will sometimes be doing everything except one of the main functions Hadoop was originally designed to do. This benchmark for success may at first glance seem like a contradiction, but it really means that Hadoop is growing up as a platform.

Second,

I expect to hear that many more people are adopting machine-learning techniques than ever before. Except, that is, for the fact that they will call these techniques by exotic names like “counting”. Or “thresholding”.
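To make the joke concrete: a useful chunk of practical machine learning really is nothing more than counting events and putting a threshold on the counts. Here is a minimal Python sketch, with made-up login data and an arbitrary five-times-the-median rule chosen purely for illustration:

    from collections import Counter

    # Hypothetical event log: "machine learning" by counting and thresholding.
    events = [
        ("alice", "login_failure"), ("bob", "login_failure"),
        ("mallory", "login_failure"), ("mallory", "login_failure"),
        ("mallory", "login_failure"), ("mallory", "login_failure"),
        ("mallory", "login_failure"), ("mallory", "login_failure"),
        ("alice", "login_success"),
    ]

    # Count login failures per user.
    failures = Counter(user for user, kind in events if kind == "login_failure")

    # Flag anyone whose failure count is far above the typical (median) count.
    typical = sorted(failures.values())[len(failures) // 2]
    suspicious = {user for user, n in failures.items() if n > 5 * typical}

    print(suspicious)  # {'mallory'} -- a table of counts and one cutoff, nothing fancier

No gradients, no priors: the "model" is a table of counts and a single cutoff.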

Many people assume that machine learning requires a lot of complexity and a lot of high-level math. Machine learning has traditionally implied scary stuff with names like Bayesian inference and second-order optimization using conjugate gradients. It used to imply either getting a PhD yourself or hiring a stable of developers with degrees in math, sometimes sporting prickly personalities.

But it is an extraordinary thing that as data gets bigger, algorithms can get simpler.

We know that more and more people have big data. That means more and more people have data susceptible to analysis by simple algorithms.

As a result, the trend will be toward more people doing machine learning, particularly with big data systems. Many of these people won't even realize they are using machine learning because what they are doing will seem so simple.
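A recommendation engine built on nothing but co-occurrence counts is a good example of this kind of invisible machine learning. The sketch below uses a tiny, hypothetical purchase history; at production scale the same tallying would run on a cluster, but the algorithm itself never gets more complicated than counting which items show up together:

    from collections import Counter, defaultdict
    from itertools import combinations

    # Hypothetical shopping baskets, purely for illustration.
    baskets = [
        {"peanut_butter", "jelly", "bread"},
        {"peanut_butter", "bread"},
        {"peanut_butter", "jelly"},
        {"doughnuts", "coffee"},
    ]

    # Count how often each pair of items appears in the same basket.
    cooccur = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1

    # "Recommend" whatever is most often seen alongside an item the shopper already has.
    def recommend(item, k=2):
        return [other for other, _ in cooccur[item].most_common(k)]

    print(recommend("peanut_butter"))  # e.g. ['bread', 'jelly']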

This will be the year that Peter Norvig's famous dictum, that big data and simple algorithms beat complex algorithms every day, will finally come home to roost.

The prickly personalities will, as always, be strictly optional...