What Is The Future of Big Data?

Originally published by Bernard Marr on LinkedIn: What Is The Future of Big Data?

I had the pleasure of speaking to Mike Olson, one of the founders of Cloudera, to explore the future of big data.

Cloudera was established in 2008, when few had heard of the term “Big Data”, and has gone on to establish itself as a driving force in the field. Not only does it provide the Open Source technology which underpins many of today’s most demanding and groundbreaking analytics projects, it also invests heavily in the development of new tools and applications. These are opening up access to technologies such as machine learning, real-time analytics, and more efficient use of unstructured data to a bigger-than-ever potential user base.

After leaving Oracle, Olson worked on developing open source database software before teaming up with former Yahoo, Google and Facebook engineers who had previous experience with Hadoop.

In 2009 their company, Cloudera, became the first commercial vendor of Hadoop, facilitating an explosion in the use of Big Data analytics in industry. Hadoop offered affordable access to large-scale distributed storage and to fundamental technologies such as MapReduce, both necessary for what we today call Big Data projects.

Olson tells me: “When we started in 2008 no one was talking about Big Data at all. The only people who knew about Hadoop were Java programmers working for Facebook or Yahoo.

“So in the early days we had to be super-evangelical. Why does data matter? Why do we need so much of it, and why is this platform the right approach?”

Fast forward just three or four years and this is no longer the case – every analyst is declaring that Big Data is the tool which will redefine business, and the strange-sounding word “Hadoop” is on the tip of every tongue in the tech industry.

However, Hadoop still existed primarily as two somewhat complex components: the HDFS file system, which allows huge amounts of data to be spread across vast volumes of cheap, off-the-shelf storage components, and the MapReduce framework, which enables that data to be retrieved and processed.
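To give a feel for the MapReduce idea, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are made up for illustration. It assumes the classic word-count task, with each string standing in for a chunk of a file held on a different storage node.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical stand-in for chunks of one file spread across machines.
    chunks = ["big data big", "data platform"]
    print(word_count(chunks))
```

In a real cluster the map and reduce steps would run in parallel on the machines that hold the data, which is the part this single-process sketch leaves out.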

“You could land the data in one place,” Olson tells me, “and you could get at it with obscure tools like MapReduce, but you had to write the tools to do it.

“What’s happened in the last few years is an explosion – not just of vendors of the platform, companies like ours – but also a rich ecosystem of other companies innovating in the space, adding value and also competing to drive real value for the customer.”

Undoubtedly it was that ecosystem – further Open Source developments such as HBase, Spark and Impala (created by Cloudera) – which has driven the opportunities we are seeing today with Big Data. No longer purely the domain of those trained in statistics and computer science, Big Data is now put to work in the medical field to create new treatments and cures, in financial services to prevent fraudulent transactions, and by humanitarian organizations to deal with the results of war and natural disasters.