Data science

1 February 2013

We have had a problem with top level terminology in computing, telecommunications, Internet and Web and all of the other applications of electronic technology since the late 20th century. The simple Information Technology (IT) seemed to embrace pretty well everything but the industry itself felt obliged to make it Information and Communications Technology (ICT), perhaps because the traditional IT side hoped to ensure that those electromechanical, analogue upstarts in telecommunications did not hijack the whole sector with their mobile gadgets, or that the billions of new punters would realise that their shiny new smart phones actually are computers, if language means anything. (Itself a moot point in the ICT sector.)

In the Web world, many of the distinctions are now blurred if not meaningless. Once everything is digital, voice and video are fundamentally the same as accounts archives or Google content. This digital universe is not entirely seamless yet, but so long as whatever program or data you want to wield speaks IP or HTML or whatever is the requisite electronic lingua franca, the digital junctions should be invisible to mere humans.

Pro elements
It is in this broad context that we really need a term for the serious professional elements of ICT in a corporate context, either as users or as service suppliers. More and more of the systems and platforms have already disappeared into commodity or cloud. Servers and SANs are far from sacred and in fact storage itself is almost purely commodity, even the miracle of solid state, although the data management software has moved to the forefront. That is probably the most outstanding example of what is happening. Primacy has shifted firmly to the data/information and the applications/tasks, arguably where it should always have been even as we were bemused by gadgets and gizmos and go-faster stripes. Hardware systems performance has not diminished in importance, but it has been left behind in that specialist space that contains the services and their specifications and the commodity elements, whether cloud or chip.

 

In corporate ICT we are seeing ‘data science’ become an increasingly useful and relevant term. Although it has been around for about three decades, it has now become popular and largely replaced ‘computer science’ in the industry and, tellingly, in the academic world. This is largely because of the rapid growth of analytics as the essential complement to massive data growth, and the consumerisation of ICT in every possible way. With all of today’s Big Data and potentially valuable information, plus the demands and expectations of all of those billions of users, ICT generally has belatedly been forced to put the emphasis firmly on applications and content or data.

Information value
Information of all kinds has a value into the indefinite future. This means that the management of data has assumed a primary role as processing and data storage technology have increasingly been commoditised or gone to the cloud. For the moment, data science is particularly associated with database and data management software development and analytics. Clearly, Big Data and the general growth in data volumes have raised challenges that demand specialist responses.

A data scientist today may be expected to have competence in ICT, especially all aspects of data management such as database design and data warehousing. Importantly, this will be combined with other relevant disciplines such as mathematics and statistics and their application in areas such as market research, actuarial work and financial risk management. The upper end of applied data science is likely to be in advanced computing areas such as data modelling and visualisation, high performance computing and probability or uncertainty modelling. Since there is no hard and fast definition or even description of a ‘data scientist’, it will depend in practice on the organisation and the role. There is certainly no particular reason why an individual’s core competency has to be in ICT, any more than there is for the role of the CIO, other than that it is still the most likely career route. Mathematics, actuarial studies, market research or other disciplines may well offer an equally valid skills foundation.

Those skill sets that we are now calling data science are in demand by all of the big software vendors and consultancy firms, especially those that are involved in analytics. Many in the ICT sector believe that the multi-skilled data scientist will be central to the in-house expertise required by any large organisation in the near future. It is a natural complement to the role of the CIO or perhaps even the future career path to the CIO role or whatever that mutates into over time.

Data science teams
When we talked to four senior experts about the growing importance of the data scientist, it was notable that all of them emphasised the importance of the individual and his or her real life experience. Data science is scientific in its approach and methodology but like all ‘soft’ disciplines it also involves the application of judgement and experience. "That’s why I think we are generally going to be talking about ‘the data science team’ rather than individuals," says Michael Connaughton, Oracle’s EMEA director of Big Data. "Big Data is the challenge and gaining value from it distils out to advanced business analytics. So mathematical, statistical science is at one end but we can’t actually do without business experience and expertise. In fact, in Oracle we certainly see Big Data as more of a cultural challenge than a technological one."

The market agrees that Big Data is ‘a good thing’ and it is generally accepted that we can exploit it to transform much of business, government and other sectors. "More things are becoming possible that were almost inconceivable before, in terms of both data and technology. Much of the hype is about gaining unique information and insights, but in fact it is equally likely that Big Data will prove to be a transformative source of greater operational efficiency in many areas," said Connaughton.

The efficiencies will come from the ability to automate decisions in all sorts of areas because the relevant information will be complete and immediately available for rules-based responses. That is already being proven in financial services, for example, where the combination of enormous data resources and enhanced risk models is capable of automating 10% to 20% or even more of lower level, routine approval decisions. "That sort of ability not only reduces costs but frees up resources for the higher level, more difficult decisions," he says.

Tool evolution
Yet as recently as about three years ago Big Data was recognised, but not utilised, because it was literally too hard for generally available technologies. "The advent of Hadoop, in-memory analytics and continuing gains in speed and processing power have all brought us to where we can plan to harvest the benefits of Big Data. And not just what we directly own: third party data sources and social networks can add rich information that we ignore at our peril."

This is where data science comes into play, Connaughton says. "With our renewed focus on analytics in making use of Big Data and applying it in business, government, healthcare and other sectors, data science seems in many ways a natural progression. Now that we are seeing university courses and degrees in data science, it has clearly become mainstream. In Oracle we are seeing the real life tasks and potential and the challenges. So I’m very much inclined to say that in real business situations the data scientist needs much more than the statistics and ICT knowledge. An effective practitioner needs to understand the business context at all times and definitely has to have the communications ability to explain to and persuade the strategic decision makers in the organisation."

Smart business
A similar view is taken by one of Accenture’s leading analytics experts, Paul Pierotti, who is its UK and Ireland lead for its public sector and health analytics. "Today we are seeing that the key value of smart business analytics is realised when the insights that are generated are integrated into business processes. That in turn is possible because the combination of cloud computing and advances in technology mean that most of the technical and resource constraints on applied business analytics have been overcome."

But even as the analytics tools become more sophisticated there is no magic formula, Pierotti believes. "We have more powerful capabilities, especially now in relation to Big Data. But all of the real life conditions still apply, like making a convincing business case for the investment. In designing and applying analytics, there is simply no substitute for human experience and judgement in each specific sector and organisation. These are very much still expert tools for expert users."

He points to fraud prevention and detection and general risk analysis in the financial services sector as an example of the proven success of the latest generation of analytics. "Dealing with attempted fraud in the financial world is notoriously like squeezing a balloon: tighten in one place and it bulges out in another. But smart analytics tools are enabling particular financial institutions to address and reduce their fraud risk management. It is in fact a significant sector where there is an identifiable return on investment from integrating analytics in the management systems."

Pierotti is another expert who is convinced that business and communications skills are as important to the data scientist as the proficiency in analytics and systems. "The capabilities are here and developing all the time, so organisations will find their own ways to utilise and manage them. The CIO is already speaking at board level, so that is fairly obviously where the potential contribution of smart data analytics can be communicated at the highest level. It is also, in practice, where judgements are made about the business value of applied data science, which these days will have business analytics at its core."

Overload
The general phenomenon everywhere is data overload, so extracting value and meaning from all of that data is what all organisations are aiming for. This is purely aspirational until structured and disciplined approaches are taken to get at the right data and analyse it in an appropriate way. "We used to call it ‘data processing’ and with today’s knowledge and experience that is still not a bad term," says Dr Pól Mac Aonghusa, senior researcher at IBM Research Ireland with a special interest currently in its Smarter Cities programme.

"In many respects the primary need is to re-introduce logical, credible and scientific processes to whatever we wish to do with all of that data, especially in looking to analyse it for meaningful insights. I sometimes think business and other organisations lose all critical capabilities when they encounter social media and rapid change in markets and fashions. Think of some of the basics. To decide on how useful or reliable data is you need to know and make a judgement on its provenance. That is the kind of territory that statisticians and researchers of all kinds know about. In the traditional organisation, DBAs have for years gone quietly about disciplining the data, organising and normalising it before it is fed into systems."

Imagination
These are the kinds of skills that feed into what we are now calling data science, Mac Aonghusa says. "In practice we are probably talking about a sort of mid-spectrum approach. On the one hand, we do not want to hamper a bit of imagination, even creativity by sticking too close to pedantic rigour. On the other hand, a keen critical faculty is essential to ensure that the data and the analysis make sense. Digital analysis is essentially just numbers, but numbers have different significance in different circumstances. A 70% chance of a fine day tomorrow is very different from a 70% chance of a hurricane. In that context, for example, hurricane history and analysis in the USA is only partly meteorological. The data is essential in refining and revising public disaster response plans."

People and leadership
Another advocate of the team approach to providing the data science skill set in an organisation is Andrew McLaren, EMC Head of Enterprise Information Management and Analytics in EMEA, which he happily translates as ‘Mr Big Data’ for EMC and its larger clients. He is equally happy to put the emphasis on people and leadership and top level strategic thinking rather than the technologies involved. "Big Data is just that, enormous and expensive to deal with. So every approach to looking to extract value from it has to be careful, strategic and directly related to the culture and needs of the organisation. The constant question has to be ‘Where is the value?’ It also belongs firmly at the C-suite level. This is not a technology discussion, nor is it simple or easily simplified.

"That is why, for example, we insist on conducting an education programme over three days or more with the senior executives in any client organisation we are about to engage with," says McLaren. "Big Data analytics offers huge opportunity (it is not too much to say quite literally benefits for mankind) and the potential return on well-directed investment is equally great. The whole area is also moving very fast, which is why we have the need for data scientists with combined skills rather than the technical specialists of the past."

McLaren is personally inclined to be agnostic about the technology involved in Big Data and analytics, he says, and is much more concerned with helping to develop a deeper understanding across all large organisations which are, of course, leading the investment and development. The now-famous ‘three Vs’ of the Big Data challenge (Volume, Variety and Velocity) should have two more Vs added, he thinks: Veracity and Value.

Judgement
"We need the discipline and careful judgement of data science just as much as the high performance technology, all the more so because the whole field is still at a very early stage. There is a lot of rushing to results and a fair few commercial pretenders in the analytics space. It is all too easy to impress for a while by leveraging inappropriate or even invalid analytics. In the end, there is a well proven lesson that we probably have to re-learn again and again: the quality and value of the result always depends on the quality of the data that was used."


TechCentral.ie