Businesses eye cloud for big data deployments
As 2016 draws to a close, a new study suggests big data is growing in maturity and surging in the cloud.
AtScale, which specialises in BI on Hadoop using OLAP-like cubes, recently conducted a survey of more than 2,550 big data professionals at 1,400 companies across 77 countries. The survey was conducted in conjunction with Cloudera, Hortonworks, MapR, Cognizant, Trifacta and Tableau.
AtScale’s 2016 Big Data Maturity Survey found that nearly 70% of respondents have been using big data for more than a year (compared with 59% last year). More than three quarters (76%) of respondents are using Hadoop today, and 73% say they are now using Hadoop in production (compared with 65% last year). Additionally, 74% have more than 10 Hadoop nodes and 20% have more than 100 nodes.
“The maturity of respondents in this survey is a key consideration,” Thomas Dinsmore, big data analytics industry analyst and author of the book “Disruptive Analytics,” said in a statement Wednesday. “One in five respondents has more than 100 nodes and 74% of them are in production, indicating double-digit growth year-over-year.”
Respondents also say they are increasingly turning to the cloud to host their big data analytics. Fifty-three percent of respondents say they have already deployed big data in the cloud and 14% of respondents have all their big data in the cloud. Nearly three quarters (72%) plan to use the cloud for a big data deployment in the future.
“There’s been a clear surge in use of big data in the cloud over the last year and what’s perhaps as interesting is the fact that respondents are far more likely to achieve tangible value when their data is in the cloud,” says AtScale CTO and co-founder Matt Baird.
Hadoop is better off-premises
“Hadoop is freaking hard,” adds Dave Mariani, CEO and founder of AtScale. “It’s really hard to deploy, it’s really hard to manage. I see a lot of customers really like not having to worry about managing their Hadoop cluster. Being able to elastically scale, not just add new nodes but also shrink them, and to use object storage as a persistent layer to do that, that is a completely different notion than on-premises Hadoop.”
Alongside big data’s increasing maturity, the primary workloads are also shifting.
“The number one workload last year was ETL, then business intelligence, then data science,” says Bruno Aziza, chief marketing officer of AtScale. “This year, the number one workload was business intelligence.”
BI is big
ETL and data science remain popular big data workloads, but business intelligence (BI), which was already trending upward last year, has become the predominant workload with 75% of respondents using or planning to use BI on big data. And that is not slowing down any time soon if the indications are correct. Fully 97% of respondents said they would do as much or more with big data over the next three months.
While there has been a lot of hype around Spark, the survey found that 42% of organisations use Spark for educational purposes but have no real project using it yet. A third of respondents say Spark is primarily in development today, while 25% say they have deployed Spark in development and production.
“There’s a lot of excitement around Spark, but very little real-life deployment,” Aziza says.
“If you look at those planning on using Hadoop, most people go in thinking, ‘I’m going to be using Spark as my primary engine.’ But when you actually start using Hadoop, most people use Hive,” Mariani adds. “You would never use Spark for an ETL pipeline. You’re going to use Hive for that. But we would never use Hive for interactive queries; we’d use Spark or Impala for that.”
It should be noted, however, that organisations that have deployed Spark in production were 85% more likely to achieve value.
When it comes to concerns around big data, accessibility, security and governance have become the fastest-growing areas of concern year-over-year, with worries related to governance growing the most at 21%.
IDG News Service