
Decisions: Big Data and Analytics

Ease of access, breadth of presentation and citizen-driven analytics are just some of the drivers behind the current crop of tools on the market, finds PAUL HEARNS
Image: Stockfresh

13 December 2018

 

Business analytics is not a new discipline. For as long as there have been businesses, there have been people developing tools to examine how that business performs. With so many tools becoming available in recent years to handle ever larger and more complex sources of data, what are the developments that need to be tracked?

With the relative riches of data volume, the major trends seem to be towards greater usability and accessibility of tools, easier aggregation, combination and preparation of data, and more meaningful ways of presenting insights for maximum impact.

“At different scales, different users are looking for different things, but generally they are looking for tools that support self-service, rich and complex visualisation,” Shane Groeger, Accenture

Critical capabilities
A Gartner report from earlier this year entitled “Critical Capabilities for Analytics and Business Intelligence Platforms,” said “Analytics and Business Intelligence (ABI) platforms are evolving beyond data visualisation and dashboards to encompass augmented and advanced analytics. Data and analytics leaders should enable a broader set of users with new expanded capabilities to increase the business impact of their investments.”

The authors said that platforms still show substantial differences in functional capabilities, particularly in support for complexity of analysis and data models, as well as in scalability.

As the market matures, says the report, the capabilities offered to build and deliver basic, user-friendly analytic dashboards and interactive visualisations are becoming less differentiated.

The trend toward assisting users with augmented data discovery functionality continues; however, the report warns that no platform provides a complete capability as yet.

The functionality to support the more advanced analytic needs of the emerging citizen data scientist (CDS) group of users varies significantly by platform.

The report says that, when viewed across the whole span of capabilities, significant differences remain between competing platforms and, therefore, also between which are most appropriate for a given use case. In some cases, this is a reflection of strategies adopted by vendors to target particular use cases or differences in product maturity.

Broad landscape
There is a broad landscape of tooling in the analytics and big data space, according to Shane Groeger, applied intelligence delivery lead, Accenture in Ireland.

The typical end-to-end capability platform, said Groeger, is a visual analytics tool, sitting on top of a data management solution, whether a relational database management system (RDBMS), or other.

“We see a lot of data integration programmes,” he said, “where you are integrating more complex data (structured) from different systems and enriching that with the semi-structured and unstructured data.”

“Accessible tools and easy to use visualisation functions enable business analytics to be democratised, pushed down into the organisation to frontline workers, who often have the context to put the dynamic data to use in real time and observe meaningful developments that executives wouldn’t recognise,” Eric Luellen, BioInformatix

The most common data visualisation tools, in Groeger’s experience, are Tableau, QlikView and Power BI.

At different scales, different users are looking for different things, he said, but generally they are looking for tools that support self-service, rich and complex visualisation.

This can lead to a trade-off in tool selection: some tools are easier to get started with but less functionally rich, while others are harder to get started with but allow far greater flexibility in the capability they deliver.

Base set
According to Eric Luellen, a data science strategist, technology product manager, and co-founder of health informatics provider BioInformatix, ease of access, combined with as-a-service availability and dynamically linked data sources form a base set of requirements for any ABI platform.

Luellen emphasises that user-level ease of dynamic graphing, where no developers are required, is key to the growing democratisation trend. The Gartner report refers to this as the citizen data scientist, but recognises that the functionality to support this emerging group “varies significantly by platform”.

Gartner advises that vendors should “expand the scope of usage beyond dashboards and data visualisation by deploying advanced and augmented capabilities to a subset of self-service users to increase the depth and breadth of analyses performed.”

“These minimum functions,” said Luellen, “enable business analytics to be democratised, pushed down into the organisation to frontline workers, who often have the context to put the dynamic data to use in real time and observe meaningful developments that executives wouldn’t recognise.”

However, to allow the citizen data scientist to properly use the various tools, data access, preparation and enrichment also need improvement.

“Sometimes analytics can be vast, messy and unpredictable; what is needed is a more structured exploration capability,” Andrew Cotgreave, Tableau

 




 

Preparation needs
According to Andrew Cotgreave, technical evangelist, senior director, Tableau Software, data professionals can spend up to 80% of their time preparing data, so making those processes easier is a priority across the range of tools.

He said augmenting platforms with automated tasks will “ease the tedium”. There is an increasing use of algorithms to clean and group data, and create data roles.
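As a rough illustration of the kind of preparation work such automation targets, the following Python sketch uses pandas to standardise column names, tidy free-text values and group near-identical labels; the file and column names are hypothetical, and this is not any vendor’s actual implementation.

```python
# Minimal data-preparation sketch (hypothetical file and column names).
# Illustrates the cleaning and grouping work that prep tools increasingly automate.
import pandas as pd

df = pd.read_csv("sales_export.csv")  # hypothetical raw export

# Standardise column names: lower-case, underscores instead of spaces.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Trim stray whitespace and normalise case in a free-text field.
df["region"] = df["region"].str.strip().str.title()

# Drop exact duplicates and rows missing the key measure.
df = df.drop_duplicates().dropna(subset=["revenue"])

# Group near-identical labels into a single category, akin to a "data role".
role_map = {"N. America": "North America", "Na": "North America"}
df["region"] = df["region"].replace(role_map)

print(df.head())
```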

Cotgreave said that Tableau has done much work in opening up developer functions and options, to allow them to embed more dashboard functionality and other visualisation capabilities in more applications.

However, he added that what Tableau does differently is massive iteration and exploration.

“Many tools have detailed dashboards,” said Cotgreave, “but Tableau allows exploration of the data and facilitates greater ability to see where the data might lead.”

Sometimes analytics can be “vast, messy and unpredictable,” he said, but Tableau allows a more structured exploration.

Community support
Accenture’s Groeger made an important point about selecting tools that is often overlooked. As noted with the rise of the citizen data scientist, such people can benefit greatly from peer discussion and experience sharing.

Support communities are good, and the more active the better, said Groeger. There may be a wealth of wisdom in a community that can help get your own users up to speed more quickly, he observed, and leave them better informed as to how best to exploit the tools and resources.

Cotgreave expanded on this point, saying that as the pace of development of the tools increases, democratisation of ABI can only keep up if there is a corresponding focus on data literacy.

Even the point of how to look at charts, he said, sounds simple and yet is often not taught. We need to move more towards seeing data exploration as a mass discipline, he argues. Data explorers can allow people to see more in the data. Natural language processing (NLP) will help, but there is a whole set of human skills necessary to present the intelligence discerned by data explorers to others.

“The companies who invest in that are the ones who will succeed,” said Cotgreave.

This is something that Luellen takes up too.

Counting and statistics
“I think the big things that people miss when assessing analytics tools are: the distinction between counting and statistics; and the import of having user-ready tools that don’t require developers for every use or change.”

So many times, Luellen reports, clients or prospective clients send what they consider a “data set”, which is really just counted tallies or totals instead of a flat file of the underlying data.

“Or, they present the ‘analytics’ they’ve been using, and they are just column or line graphs that count frequency. Of course, true statistics means taking descriptive factors such as mean, median, standard deviations, and variances and applying statistical testing tools to show associations at different levels, P-values, etc. This is where true analytics begins — to determine what factors are associated with each other and how strongly, which enables the ability to predict where a trend is headed and what is influencing it,” said Luellen.
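To make the distinction concrete, here is a minimal Python sketch, using made-up figures, that contrasts a simple frequency tally with descriptive statistics and a test of association reporting a P-value, the point at which Luellen says true analytics begins.

```python
# Counting versus statistics: a minimal sketch with made-up numbers.
import numpy as np
from scipy import stats

# Hypothetical underlying data: marketing spend and weekly sales.
spend = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
sales = np.array([10, 12, 15, 14, 18, 21, 20, 24])

# "Counting": a tally says nothing about the relationship.
print("weeks observed:", len(sales), "total sales:", sales.sum())

# "Statistics": descriptive measures plus a test of association.
print("mean sales:", sales.mean(), "std dev:", sales.std(ddof=1))
r, p_value = stats.pearsonr(spend, sales)
print(f"correlation r={r:.2f}, P-value={p_value:.4f}")
```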

The user versus developer issue is more nuanced, he argues. Often, technology departments promote technology options and choices that give them job security, necessitating their ongoing heavy involvement.

“This is a real disservice to the organisation because many tool types can now be found that enable users to work with the technology directly, which removes constraints of personnel, time, and budget from the technology evolution process. It also tends to save a fortune,” he advises.

While all the contributors acknowledged that much work is being done in various quarters with the likes of augmented and virtual reality (AR/VR) technologies for visualisation, there is no mass application or adoption as yet.

ML integration
However, Accenture’s Groeger said machine learning (ML) integration with the likes of R and Python will increase, so that visualisation tools can execute ML models and algorithms directly. This will go some way towards allowing the tools to take user feedback in real time and present visualisations around it.

“That’s all not quite there yet. But it feels like a near term thing where they can close the gap,” he said.
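As a hedged illustration of what such integration can look like, the sketch below defines a small Python scoring function of the kind a visualisation tool could call out to at query time; the model, features and function name are assumptions for illustration, not a description of any specific product’s API.

```python
# Hypothetical scoring function a BI/visualisation tool might call per row
# to enrich a dashboard with model output; a sketch, not a product API.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train once on historical data (made-up features and labels).
X_train = np.array([[620, 1], [700, 0], [540, 1], [760, 0], [580, 1], [690, 0]])
y_train = np.array([1, 0, 1, 0, 1, 0])  # 1 = customer churned
model = LogisticRegression().fit(X_train, y_train)

def churn_probability(credit_score: float, complaints: int) -> float:
    """Return churn probability for one customer row."""
    return float(model.predict_proba([[credit_score, complaints]])[0, 1])

# A dashboard could call this for each row to colour a visualisation.
print(churn_probability(600, 2))
```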

Looking to the near term, Luellen sees more automation.

“Generally, my expectation is to see more automated data-science tools, things like DataRobot,” he said. “The working model that the vast majority of companies still use is trying to recruit and retain scarce data-science talent to analyse their spaces, build data sets, and machine-learning models one at a time, making informed guesses as to which models will be best to compare — all this takes many months. Conversely, automated data-science tools like DataRobot (or Salford Systems) enable data scientists to compare or run dozens to hundreds of models at once thereby maximising the efficacy of the models and speed to produce and evaluate them by many months. They often turn six months of work into six days.”
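The gain Luellen describes can be sketched in ordinary Python: instead of hand-building one model at a time, a loop cross-validates several candidate models on the same data and ranks them. This is only a toy version, on a public scikit-learn dataset, of what automated platforms such as DataRobot do at far larger scale.

```python
# Toy version of automated model comparison: cross-validate several
# candidate models on the same data and rank them by mean accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean accuracy {score:.3f}")
```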

Adding to Groeger’s point, Luellen says the advent of automated data-science tools is key because data scientists who know both statistics and coding in Python or R are, as yet, extremely scarce, and automated data-science tools negate much of the need for coding.

“Automated data science tools can be effectively used by most people with a strong background in statistics or biostatistics,” said Luellen. “There are more statistics-proficient professionals than there are statisticians who code in those two languages. Therefore, the automated data-science tools go a long way toward correcting the talent scarcity problem.”

 

 

Democratisation needs
“It should be possible to restrict user access to the information produced by the tool, and for the tool to support restrictions on user access to underlying data sources”

Singlepoint Hugh Nolan, lead solution architect

Data democratisation has become a critical enabler for business decision making. Today’s business users are far more tech savvy than they were even 10 years ago and more comfortable with using tools to look at underlying data to build their own insights.

There is quite an array of tools available now to gather, combine and analyse data and convert it into usable information that can be presented and distributed throughout the enterprise. The tools to be used need to be evaluated on a business-by-business basis. An industry that is based on high volume manufacturing and distribution of machine parts has quite different analytic requirements from a business that specialises in high value, low volume contract management.

Whatever the focus, there are some key elements to look for in a data analysis tool:

  • it should be able to consume and transform data from a wide variety of structured and unstructured data sources;
  • it should provide a UI that allows analysed datasets to be built without the need to learn a coding language such as SQL (e.g. drag and drop);
  • it should provide the ability to do advanced analytics using a coding language, if so required (i.e. not restricted to just drag and drop);
  • it must support easy presentation and distribution of the information resulting from the analysis (i.e. charting, email and dashboard distribution).
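For the third point (advanced analytics via a coding language rather than drag and drop), a minimal Python sketch shows a structured and a semi-structured source being combined and summarised for a chart; the file names and fields are hypothetical.

```python
# Combining a structured CSV with semi-structured JSON and summarising
# the result for a chart; file names and fields are hypothetical.
import json

import matplotlib.pyplot as plt
import pandas as pd

orders = pd.read_csv("orders.csv")                # structured source
with open("customer_events.json") as f:           # semi-structured source
    events = pd.json_normalize(json.load(f))

combined = orders.merge(events, on="customer_id", how="left")
summary = combined.groupby("region")["order_value"].sum()

summary.plot(kind="bar", title="Order value by region")
plt.tight_layout()
plt.savefig("order_value_by_region.png")          # ready for distribution
```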

These elements meet the democratisation needs that enable the business. There are further features that should be looked at to protect the business. The obvious one is security. It should be possible to restrict user access to the information produced by the tool, and for the tool to support restrictions on user access to underlying data sources. The tool should support data curation features so that the sources and assumptions around the underlying data used to build the business insight are documented and can be clearly communicated. The business user who creates a report may not be around to explain how she created it six months later. She’s too busy in her new role as CEO.

 

For the team
“Use data to support your decision-making. Don’t use data to make your decision”

Sogeti John McIntyre, head of Analytics and Cognitive Services

Big Data is going to change everything and help us to gain a greater understanding of the world around us. With the volumes of data we are collecting, we will be able to predict everything from illness and maintenance to customer needs, and more.

Well… maybe.

There are many companies that work with sports teams to analyse data generated by their top sports stars. They turn data into information which helps keep professional athletes from getting injured. With their products, teams can monitor athletes to make sure they get the best performance possible. However, their data does not come from just one club or sport. Athlete injury predictions are based on data from all the clubs and sports they work with.

Let us say a professional soccer team has 50 professionals: how many injury data points could this one club generate? Big Data is only useful when you collect a variety of data. If we only collect data about things which happen a lot, we will only be able to predict these common events. If we want to predict other events, we need lots of variety in the data. If we want to predict injuries, we need lots of data on injuries, not data on perfectly fit athletes.

So, to take advantage of Big Data you need to collect more data variety:

  • If you want to predict whether your staff will find another job, I would not just look at your own internal HR data; I would analyse LinkedIn data and find out how many similar resources change their role and why.
  • If you want to find the best marketing campaign for your product, don’t look only at your own campaigns. Look at your competitors and buy point-of-sale data which shows how competitors’ sales changed.

Also:

  • Consider “as a service” products which give you indirect access to a variety of data.
  • If you are collecting and storing your own data, combine it with as many external datasets as you can (a simple sketch of such a join follows below).
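As a very small illustration of that last point, the Python sketch below joins an internal HR extract with a hypothetical external dataset on role; the files and columns are invented for illustration only.

```python
# Joining internal data with an external dataset; files and columns
# are hypothetical, for illustration only.
import pandas as pd

internal_hr = pd.read_csv("internal_hr.csv")      # e.g. role, tenure, left_company
external = pd.read_csv("industry_turnover.csv")   # e.g. role, market_turnover_rate

combined = internal_hr.merge(external, on="role", how="left")

# Compare internal attrition with the wider market, per role.
print(combined.groupby("role")[["left_company", "market_turnover_rate"]].mean())
```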

Lastly, use data to support your decision-making. Don’t use data to make your decision.

 

 

Under the free comment aspect of Decisions, Servecentric’s Roe talks public cloud, and Asystec’s Morris describes the network assessment service

 

Public Cloud with a difference
“Our Enterprise Cloud is a simple to use self-service IaaS service with exceptional price/performance and unrivalled support” Brian Roe, Servecentric (Image: Servecentric)

Servecentric Brian Roe, commercial director

While the proliferation of cloud services has undoubtedly revolutionised the IT consumption market, adoption has, at times, presented new challenges to organisations, particularly in relation to support, performance, connectivity, complexity and, of course, cost. Many of us have been hit with dreaded end of month ‘bill shock’ as cloud costs often don’t meet expectations!

Servecentric Enterprise Cloud is an enterprise-class infrastructure as a service (IaaS) public cloud based in Dublin, Ireland. The service is designed to provide a full-featured, self-service platform for users to run applications and services of any type in a scalable and secure environment.

Unlike some Public Cloud services, Servecentric is fully flexible and allows users to build unbundled servers with individually chosen CPU cycle, RAM and high-performance SSD or lower cost HDD storage values. In addition, the service provides hybrid opportunities with low cost, ultra-low latency 1Gb (or 10Gb optional) backbones in Servecentric’s Dublin Data Centre.

Servecentric’s simple-to-use service includes full-featured migration tools and the ability to spin up templated servers in under 30 seconds, and virtually any x86/x64 application can be run without modification.

The service also boasts support services with an average response time of 30 seconds, included as standard for all users. This is a key differentiator, with support levels and costs being cited as a problem area with hyperscale providers.

Finally, for those with data privacy concerns, Servecentric Ltd. is a privately owned Irish company, and while operated locally, Servecentric Enterprise Cloud is part of a global CloudSigma network with 14 locations across Europe, North America, Asia and Australia.

Visit cloud.servecentric.com for more information and a simple online price calculator, or contact Servecentric directly on (01) 448 1400 or at cloud@servecentric.com

 

 

Network assessments
“To prioritise security decisions, organisations need to understand where their vulnerabilities are, and concentrate on managing them”

Asystec Carrie Morris, VMware account manager

Security decisions – what to prioritise? When it comes to security decisions, organisations struggle to keep up with the demands being placed upon them. The rapid adoption of IoT networks and devices, and a highly mobile workforce, mean network vulnerabilities are increasing at an alarming rate.

Understanding activity in the network is key to ensuring that when, not if, an attack happens, it can be locked down. Traditional approaches to data centre security focus on building a strong perimeter to keep threats outside the network; however, little is done to understand what is happening within it.

To prioritise security decisions, organisations need to understand where their vulnerabilities are, and concentrate on managing them.
A network assessment will analyse the traffic patterns in your data centre, giving you visibility of security risks, pinpointing any issues, and allowing you to put in place a plan to address those issues and secure your organisation.

What will you get from a network assessment?

  • Insights into the security risks in your environment
  • A preview of actionable micro-segmentation recommendations
  • Opportunities to optimise your network
  • Comprehensive net flow assessment and analysis to model security groups and firewall rules (a simple illustration follows this list)
  • Recommendations to make micro-segmentation easier to deploy
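As a loose illustration of how flow records can be grouped to suggest candidate security groups and firewall rules, the sketch below tallies observed source-to-destination traffic; the flow records and grouping rule are invented and far simpler than what a real assessment tool does.

```python
# Grouping hypothetical flow records to suggest candidate firewall rules;
# invented data, far simpler than a real network assessment tool.
from collections import defaultdict

# Each record: (source subnet, destination subnet, destination port)
flows = [
    ("10.0.1.0/24", "10.0.2.0/24", 1433),   # app tier -> database
    ("10.0.1.0/24", "10.0.2.0/24", 1433),
    ("10.0.0.0/24", "10.0.1.0/24", 443),    # web tier -> app tier
    ("10.0.3.0/24", "10.0.2.0/24", 1433),   # unexpected path worth reviewing
]

observed = defaultdict(int)
for src, dst, port in flows:
    observed[(src, dst, port)] += 1

# Each distinct (source, destination, port) is a candidate rule;
# rarely seen paths are flagged for review before being allowed.
for (src, dst, port), count in sorted(observed.items(), key=lambda kv: -kv[1]):
    status = "allow" if count > 1 else "review"
    print(f"{src} -> {dst}:{port}  seen {count}x  -> {status}")
```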

How can Asystec help? The volume of attacks continues to increase daily and it is becoming increasingly difficult to stay on top of emerging risks. Having deep insight into your network traffic is key to any security strategy.

Data management solutions company Asystec, together with key partners, can offer solutions that will give organisations the ability to manage enterprise risk and compliance issues whether in a physical or virtual environment across the data centre.

For further information on a network assessment or to book your free assessment, email Carrie.Morris@asystec.ie or call (085) 220 8970

 

 
