
Manage your data more effectively or your competitors may bury you under it

Effective data management is essential to getting the most out of AI, says Redmond O'Leary
Redmond O'Leary, InterSystems

20 January 2022

In association with InterSystems

Even the most successful businesses can be overwhelmed by the task of organising the torrents of data that flow into their systems from many different sources.

The need to organise, clean and manage all data with maximum effectiveness has become essential. Without effective data management, organisations will never empower their teams to use AI, machine learning or visualisation tools that open the door to greater efficiency, rapid innovation and new revenues. Yet as global research and advisory firm Gartner said in a recent paper, the path to becoming data-driven is complicated by the increasing diversity of data and its distributed nature.

Managing the ever-accumulating mass of information is difficult, even when dealing with relatively straightforward transactional data. But management and preparation become highly problematic when data arrives in many formats, including unstructured streaming video, still images, social media posts, monitoring logs, meeting notes and documents prepared by many hands. As organisations struggle with management, data fields are easily confused, undermining the trust that end-users within the business need to have in the data. An organisation that is a heavy user of images, for example, will run into the serious problem of mislabelled photographic information, creating significant hurdles to efficiency, customer experience and innovation.

The damage inflicted by poor management of disparate data

The consequences of bad data management can be far-reaching. Artificial intelligence-driven security applications that depend on high quality data to identify threats may exhibit unacceptable biases and become totally unusable without major adjustment, for example. Organisations in highly regulated environments could find they fail to meet reporting requirements, risking fines and penalties.

Sorting out a confused mass of data costs businesses dearly, undermines efficiency and hands an advantage to better-placed competitors. Organisations that can manage and prepare their data easily and quickly, and use it in near real time, will be able to unlock the true value it holds, perhaps identifying a goldmine or saving resources by reducing the storage of useless duplicates.

The data lake and other solutions need help now

In the recent past, organisations employed data lakes to resolve these problems. After pouring all their data into a lake, they hoped to subsequently find the tools to handle the ‘seven Vs’ of Big Data: volume, variety, velocity, veracity, value, variability and visualisation.

Data lakes do go some of the way towards this, and have the advantage of being relatively low-cost, but they are not the best option when organisations want insights from their live or operational data in real, or near real, time. Businesses with lakes can end up storing data for its own sake, with little insight into its quality. Although there are analytics and business intelligence tools that will give a picture of an organisation’s data health, well-paid and highly qualified data scientists still need to put the information in order and clean it up, which is not the greatest use of their time.

An alternative or successor to the data lake is the ‘data warehouse’. Businesses use warehouses to access large volumes of all types of data, such as customer transactions, product line volumes, and so on. The warehouse is a structured version of the lake and while it offers better access to usable data across an entire organisation, it is expensive to maintain and also requires skills to import, structure, maintain and back up data effectively.

The ‘data mart’ is another alternative. As a subset of the warehouse, it works well for a department such as HR or payroll, being faster, more secure and lower cost. It is also easier to maintain than a warehouse. Yet its obvious drawback is that it primarily serves one function or division of a business and does not support multiple enterprise-wide uses.

Cloud migration adds massively to the complications

These problems of managing and operationalising data are increasingly complicated, even for enterprises that have embarked on digital transformation and migrated their data and applications to the cloud.

Even the cloud has legacy systems that make data workloads difficult and expensive to move, inhibiting innovation and increasing cost. The cloud itself is now becoming a silo, gaining in complexity as it hosts more information and more applications. Data may be secure in the cloud, but it still must be organised and managed and its quality checked and improved before an enterprise can use it. This is rendered more complex as organisations adopt hybrid on-premises/cloud infrastructures and by the cloud vendors’ own system requirements. Cloud vendors may change or update systems according to their own schedules, affecting an enterprise more widely than data management alone.

Businesses therefore need a different approach so they can organise, manage and cleanse all their data to render it fully operational, regardless of source or location. But given the massive volumes of data the modern enterprise accumulates, how can it achieve this without massive architectural upheaval or the implementation of numerous, costly and time-consuming new data management applications?

Simplification on a unified platform is essential

The answer lies in simplification and the concept of the smart data fabric. Rather than undertaking the huge task of consolidating all data, organisations can deploy a data platform, which is the essential element of the fabric. Cloud-ready platforms can simplify, standardise and streamline data from almost all sources, transforming data lakes, warehouses and marts so that a business can embed advanced data capabilities for use by its front-line teams.

A global investment bank, for example, had a petabyte-sized, Hadoop-based data lake which was unable to support real-time streaming data, generating many difficulties in performance, scalability and response times. There was no support for advanced analytics or ad hoc data exploration. After deploying a data management platform, the bank had a fast data layer that gave it all the capabilities it lacked, processing incoming transactional data at high speed and supporting multiple analytic requirements across portfolio analysis, risk and compliance. In its order management system, the implementation delivered a 75% reduction in operational costs.

Data lake acceleration on this scale is typical of the advances now available, providing smooth-flowing integration and management of disparate enterprise data through a single set of APIs, regardless of where the data is located. By putting a data platform in place, an enterprise can complement its lake with a semantic layer, conferring the ability to combine and analyse data. It can join terabytes of live data with deep historical information as required.

This approach will be equally effective across data warehouses, data marts, relational and column store databases, where the platform will clean, organise and harmonise data, providing the necessary quality and consistency.

New capabilities that the business now demands

Regardless of underlying infrastructure, an in-built analytics layer answers the need for natural language processing capabilities, business intelligence applications and machine learning. Businesses deploy it to make insights available to end-users and employ automation to make the lives of their data scientists easier. The most advanced business tools and real-time applications become available on demand, transforming productivity and decision-making.

Given the scale of the problems generated by ever-expanding data volumes, complexity and a shortage of in-house data science expertise, organisations need to thoroughly re-examine how they address these challenges. If businesses insert data platforms into their infrastructure as the basis of a smart data fabric, they can remove the major barriers to being a data-driven organisation. They will become more agile and efficient, turbo-charging innovation and profitability.

Redmond O’Leary is sales manager for Ireland at InterSystems
