Science not size

6 January 2014

You always know when something has become thoroughly mainstream: there are university courses. With no disrespect to the many pioneering ICT activities of our third-level colleges, the offering of Master’s level programmes brings Data Science firmly into the fold. The next step in orthodoxy, forecast to occur within five years according to one of our senior academic interviewees, is a primary degree in Data Science. It looks set to join Computer Science as a general degree subject, complemented by more specialist postgraduate degrees, diplomas and certificate courses.

On the other hand, there is considerable ambiguity about what ‘data science’ actually means or encompasses. In practical terms, however, the differing opinions and schools of thought are quite literally academic. ICT professionals, vendors, marketers and the general market understand perfectly well the general principle of giving primacy to the data rather than the systems. From galactic astronomy and customer relations to social media and entertainment, we now understand better than ever before that the information is what counts. It is intimately tied to the enabling technology, itself constantly evolving and mutating. But all our human use of the technology is aimed at information-related objectives.

Terms of abuse
Big Data is a much abused term these days, if indeed it deserves to be called a ‘term’ at all. But again there is a general understanding of what it means and the challenges it presents, especially if the volume is recognised as the least significant of the Vs. Variety and velocity are clearly putting it up to our present technology, especially in analytics and the universal objective of making usable sense of the data. Once we move along to those analytical functions, and the realm of the data scientist, we have to add V for veracity. Big or tiny, in order to use data we always need to know how accurate and trustworthy it is, whether for better decision making (the war cry of the IT vendors for decades), analysis, projections or even definitive records.

This is unambiguously the domain of the data scientist and why it is at least as much a role and a corporate or personal capability as it is a ‘discipline’ in the accepted professional sense. In order to make practical use of the vast masses of data we are producing and harvesting, we need masters of information to make sense of it from origin to analysis to user-friendly reports. This is a much wider role than that of business or systems analysts, although in practice there may well be some overlap, particularly in large organisations.

Statistical science
“It all began in many respects with the sheer volume of raw data that began to be made available,” says Professor Alan Smeaton of DCU, one of the founding directors of the new Insight Centre for Data Analytics, a new multi-institution body that also includes UCD, UCC and NUIG. “There have been and are different approaches to understanding and making use of all of that data, with statistical science as our constant friend in that regard. But the traditional scientific approach is to understand phenomena by modelling them, from Copernicus to DNA. In practice that meant that we would try to fit data to the model and its principles and it worked most of the time.

“With today’s wealth of data people have begun to re-examine that approach. Because the sheer volume of data is too large to process and understand in the old way the phenomena that are taking place, perhaps our friend statistics could help because we have enough data to infer behaviour? That, in essence, is what a data scientist does. Scientists have for centuries tried to understand phenomena. A data scientist does not try to understand the phenomenon but to apply various statistical tools to understand the behaviour of the phenomenon. In the practical application of data science we no longer need a model for the underlying process. We observe its behaviour and with enough statistical information we can track and predict the behaviour of the phenomenon. Adding ever more data just refines our understanding of that behaviour.”

TechCentral.ie