It’s now established that major Hadoop conferences drive a news cycle, and last week’s Hadoop Summit in San Jose, Calif., fit the pattern well. Although I couldn’t be at the event itself, a few vendors briefed me on their announcements for the show. In fact, each vendor had multiple announcements for the show and, taken together, they further the 2014 trend of Hadoop becoming more ubiquitous and mature. These announcements also strengthen a less-obvious trend this year: the diversification of the Hadoop platform.
Mainstream chops
MapR has announced a new Hadoop App Gallery, something that could take a lot of the friction out of working with the Hadoop platform. A plethora of solutions for the platform has existed for a while, but discoverability for individual analyses was a lot higher for those “in the know” than for someone, or some company, new to the ecosystem. If Hadoop is to become an enterprise standard, this kind of discoverability is required to get it there. MapR also announced a partnership with Syncsort, which will enable users of its Hadoop distro to offload ETL workloads to mainframe resources: another sensible, enterprise-oriented move.
As nice as it is in theory to deploy Hadoop on a bunch of server boxes that a company might source and configure on its own, many enterprise customers will prefer something more plug-and-play. So while the phrase “Hadoop appliance” may sound like an oxymoron, it is nonetheless something customers will want. Teradata’s announcement of an enhanced Teradata Appliance for Hadoop will therefore likely resonate with several companies, especially those that are already Teradata customers. And if those customers want to ease their deployment with professional services in addition to the power-and-ping appliance, Teradata is beefing up those offerings, too.
So what about that diversification? To begin with, Teradata is introducing its own Hadoop distribution, which it calls the Teradata Open Distribution for Hadoop (TDH). TDH, which is included with the Appliance for Hadoop, is in fact based on Hortonworks Data Platform (HDP), but with Teradata extras thrown in. Microsoft’s HDInsight distribution of Hadoop is similarly a superset of HDP, although it runs on the Windows platform.
It’s not just for MapReduce anymore
Hadoop Summit also brought announcements of new alternatives to MapReduce on the Hadoop platform. MapR is announcing that, later this month, it will begin supporting Apache Drill (a project with which it is heavily involved) for use with its Hadoop distribution. At first blush Drill looks like another SQL-on-Hadoop solution, perhaps because, among other things, it is. But rather than requiring data in Hive format with a declared schema in HCatalog, Drill can query virtually any file in HDFS. Drill is also designed to work really well with HBase, and has full support for querying nested data inside HBase column families. Finally, Drill supports full ANSI SQL, rather than Hive’s dialect of the query language, known as HiveQL.