Data waves

Google’s Cloud Dataflow: real-time, cloud-based big data analytics


17 April 2015

Streaming is the future, and it is needed now. That was the message from a Google product manager about the company’s cloud services. He was not talking about video, though: the director of product management for the Google Cloud Platform was talking about analytics.

“We really believe that streaming is the way the world is going. Instead of looking at data from two months or two years ago, the data you really care about is happening right now,” said Tom Kershaw.

To this end, Google has launched a real-time data processing engine called Google Cloud Dataflow, first announced a year ago. It has also added new features to its BigQuery analysis tool, introduced in 2010. The two cloud services can be used together to facilitate the real-time processing of large amounts of data, Kershaw said.

Now available as a beta, Google Cloud Dataflow provides the ability to analyse data as it comes from a live stream of updates. Google takes care of all the hardware provisioning and software configuration, allowing users to ramp up the service without worrying about the underlying infrastructure. The service can also analyse data already stored on disk, in batch mode, allowing an organisation to mix historical and current analysis in the same workflow.
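The idea of mixing historical and current analysis in one workflow can be illustrated with a minimal sketch. It uses only the Python standard library and is not the Cloud Dataflow SDK; the names `running_counts`, `live_feed` and `historical` are hypothetical, chosen for illustration. The point is simply that one transform can be written once and applied to stored data and to a live feed alike.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count so far) for each event as it arrives."""
    counts = Counter()
    for key in events:
        counts[key] += 1
        yield key, counts[key]


def live_feed() -> Iterator[str]:
    """Hypothetical stand-in for a live stream of updates."""
    yield from ["click", "purchase", "click"]


# Hypothetical stand-in for data already stored on disk (batch input).
historical = ["click", "click", "purchase"]

# The same transform runs over the stored data followed by the live data.
results = list(running_counts(chain(historical, live_feed())))

# Keep the latest running count seen for each key.
final_counts = dict(results)
print(final_counts)
```

Because the transform accepts any iterable, it does not need to know whether its input is a finite batch or an open-ended stream.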

The service provides a way “for any Java or Python programmer to write applications using big data,” Kershaw said. “It makes it easy to run end-to-end jobs across very complex data sets.”

In addition to moving Cloud Dataflow into an open beta program, Google has also updated its BigQuery service.

BigQuery provides a Structured Query Language (SQL) interface for large unstructured datasets. SQL is commonly used for traditional relational databases, so it is almost universally understood by database administrators. With this update, Google has improved the service so it can now ingest up to 100,000 rows per second per table.

The company has expanded the footprint of BigQuery so European customers can now use the service. BigQuery data can be stored in Google’s European data centres, which will help organisations that need to meet the European Union’s data sovereignty regulations.

The company has also added row-level permissions to BigQuery, which can limit the accessibility of information based on the user’s credentials. This allows organisations to protect portions of the data, such as names and addresses, while allowing wider access to other portions, such as anonymous purchase history, to be used for research or other purposes.
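As a rough illustration of the concept, and not of BigQuery’s actual permission syntax, the sketch below tags each row with the roles allowed to read it and filters on the caller’s roles. The data, the `allowed_roles` field and the `visible_rows` function are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical rows: each carries the set of roles allowed to read it.
ROWS: List[Dict] = [
    {"data": "name and address of customer 17",
     "allowed_roles": {"support"}},
    {"data": "anonymous purchase: 2 items, EUR 40",
     "allowed_roles": {"support", "research"}},
]


def visible_rows(user_roles: Set[str]) -> List[str]:
    """Return only the rows that share at least one role with the user."""
    return [row["data"] for row in ROWS if row["allowed_roles"] & user_roles]


# A researcher sees the anonymous purchase history but not the personal details.
print(visible_rows({"research"}))
```

A user holding the hypothetical `support` role would see both rows, while a user with no matching role would see none.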

BigQuery and Dataflow can be used in conjunction with each other, Kershaw said. “The two are very much aligned. You can use Cloud Dataflow for processing and BigQuery to analyse,” he said.

 

IDG News Service


TechCentral.ie