Database structure

How Big Data is changing the database landscape for good

11 November 2015

Mention the word “database,” and most people think of the venerable RDBMS that has dominated the landscape for more than 30 years. That, however, may soon change.

A whole crop of new contenders is now vying for a piece of this key enterprise market, and while their approaches are diverse, most share one thing in common: a razor-sharp focus on Big Data.

Much of what’s driving this new proliferation of alternatives is what’s commonly referred to as the “three Vs” underlying Big Data: volume, velocity and variety.

Essentially, data today is coming at us faster and in greater volumes than ever before; it’s also more diverse. It’s a new data world, in other words, and traditional relational database management systems weren’t really designed for it.

“The dirty little secret of Big Data is that data still sits in little silos that don’t mesh with other data. We’ve proven it can all be represented mathematically, so it all integrates,” said Charles Silver of Algebraix.

“Basically, they cannot scale to big, or fast, or diverse data,” said Gregory Piatetsky-Shapiro, president of KDnuggets, an analytics and data-science consultancy.

That is what Harte Hanks recently found. Up until 2013 or so, the marketing services agency was using a combination of different databases including Microsoft SQL Server and Oracle Real Application Clusters (RAC).

Scale-out platform
“We were noticing that with the growth of data over time, our systems couldn’t process the information fast enough,” said Sean Iannuzzi, the company’s head of technology and development. “If you keep buying servers, you can only keep going so far. We wanted to make sure we had a platform that could scale outward.”

Minimising disruption was a key goal, Iannuzzi said, so “we couldn’t just switch to Hadoop.”

Instead, the company chose Splice Machine, which essentially puts a full SQL database on top of the popular Hadoop Big Data platform and lets existing applications connect with it, he said.

Harte Hanks is now in the early stages of implementation, but it’s already seeing benefits, Iannuzzi said, including improved fault tolerance, high availability, redundancy, stability and “performance gains overall”.

There’s a sort of perfect storm propelling the emergence of new database technologies, said Carl Olofson, a research vice president with IDC.

First, “the equipment we’re using is much more capable of handling large data collections quickly and flexibly than in the past,” Olofson noted.

In the old days, such collections “pretty much had to be put on spinning disk” and the data had to be structured in a particular way, he explained.

Now there’s 64-bit addressability, making it possible to set up larger memory spaces, as well as much faster networks and the ability to string multiple computers together to act as single, large databases.
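The arithmetic behind the 64-bit point can be sketched briefly (an illustrative calculation, not from the article):

```python
# Address-space arithmetic behind 64-bit addressability.
# A 32-bit pointer can address 2**32 bytes; a 64-bit pointer, 2**64.

GIB = 1024 ** 3

addr_32 = 2 ** 32  # 4 GiB ceiling that once constrained in-memory databases
addr_64 = 2 ** 64  # 16 EiB, far beyond any physical RAM today

print(addr_32 // GIB)      # GiB reachable with 32-bit addresses: 4
print(addr_64 // addr_32)  # a full 32-bit space fits in 64-bit space ~4 billion times
```

That jump is what makes keeping large working sets entirely in memory practical, rather than structuring everything around spinning disk.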

“Those things have opened up possibilities that weren’t available before,” Olofson said.

Changing workloads
Workloads, meanwhile, have also changed. Whereas 10 years ago websites were largely static, for example, today we have live web-service environments and interactive shopping experiences. That, in turn, demands new levels of scalability, he said.

Companies are using data in new ways as well. Whereas traditionally most of our focus was on processing transactions – recording how much we sold, for instance, and storing that data in a place where it could be analysed – today we are doing more.

Application state management is one example.

Say you are playing an online game. The technology must record each session you have with the system and connect them together to present a continuous experience, even if you switch devices or the various moves you make are processed by different servers, Olofson explained.
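The pattern Olofson describes, session state kept outside any single server, can be sketched in miniature (all names here are hypothetical; a real system would use a shared store such as Redis rather than an in-process dict):

```python
# Toy sketch of externalised session state (hypothetical names).
# Any "server" handling a move reads and writes the same shared store,
# so the player sees one continuous session across devices and servers.

shared_store = {}  # stand-in for a real shared store (e.g. Redis)

def handle_move(server_id, player, move):
    """Process a move on any server; state lives in the shared store."""
    session = shared_store.setdefault(player, {"moves": [], "servers": set()})
    session["moves"].append(move)
    session["servers"].add(server_id)
    return session

handle_move("server-a", "alice", "enter_crystal_room")
handle_move("server-b", "alice", "leave_crystal_room")  # a different server
print(shared_store["alice"]["moves"])  # both moves survive, in order
```

Because the state persists outside the servers, it is also available later for the kind of analysis described below.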

That data must be made persistent so that companies can analyse questions such as “why no one ever crosses the crystal room,” for example. In an online shopping context, a counterpart might be why more people aren’t buying a particular brand of shoe after they click on the colour choices.

“Before, we weren’t trying to solve those problems, or – if we were – we were trying to squeeze them into a box that didn’t quite fit,” Olofson said.

Hadoop is a heavyweight among today’s new contenders. Though it’s not a database per se, it’s grown to fill a key role for companies tackling Big Data. Essentially, Hadoop is a data-centric platform for running highly parallelised applications, and it’s very scalable.
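The model Hadoop popularised – map tasks running in parallel over partitions of the data, with a reduce step combining their outputs – can be illustrated with a deliberately simplified single-machine sketch (this shows the programming model only, not Hadoop’s actual API):

```python
# Miniature map/shuffle/reduce word count, echoing the model Hadoop
# popularised. Real Hadoop distributes these phases across a cluster.
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool stands in for cluster nodes

def map_phase(chunk):
    """Emit (word, 1) pairs for one partition of the input."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Sum counts per word (in real Hadoop, a shuffle groups pairs by key first)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

chunks = ["big data big", "data data platform"]    # input partitions
with Pool(2) as pool:                              # "parallel" map tasks
    mapped = pool.map(map_phase, chunks)
pairs = [pair for part in mapped for pair in part] # flatten (stand-in shuffle)
print(reduce_phase(pairs))                         # {'big': 2, 'data': 3, 'platform': 1}
```

Scaling comes from adding partitions and nodes rather than buying a bigger server – the same scale-out property Harte Hanks was after.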


TechCentral.ie