

TECHPRO
Rethinking messy BI

CIARAN KELLY advises on taking the initial steps toward next-generation data architecture


Software & Services | 12 Aug 2009
As CIO, you know you have an information problem. You've spent countless euros and staff hours getting your data warehouse, financial systems, customer systems, and other transaction systems to generate meaningful reports. You've led Herculean efforts to regularise, transform, and load that data into consistent formats that business intelligence (BI), enterprise resource planning (ERP), analysis, reporting, dashboard, and content management tools can handle.

Yet company executives keep asking for more detailed information to make better decisions, especially about the emerging challenges in the ever-changing markets the company is trying to navigate.

The reason for this state of affairs is not that BI and related systems are bad, but that they were designed for only a small part of the information needs businesses have today. The data structures in typical enterprise tools, such as those from IBM Cognos, Informatica, Oracle, SAP, and SAP BusinessObjects, are very good for what they do. But they weren't intended to meet an increasingly common need: to reuse the data in combination with other internal and external information. Business users seek mashup capabilities because they derive insights from such explorations and analyses that internal, purpose-driven systems were never designed to achieve.

We call this "messy BI." People have always engaged in informal explorations - gleaning insights from spreadsheets, trade publications, and conversations with colleagues - but the rise of the Internet and local intranets has made information available from so many sources that the exploration now possible is of a new order of richness and complexity.

Call it the Google effect; people expect to be able to find rich stores of information to help test ideas, do what-if analyses, and get a sense of where their markets may be moving.

There's no way traditional information systems can handle all the sources, many of which are structured differently or not structured at all. And because the utility of any source changes over time, even if you could integrate all the data you thought were useful into your analytics systems, there would be many you didn't identify that users would want. You don't want to create a haystack just because someone might want a specific straw at some point.

More flexible

Fortunately, the emerging concept of Linked Data points to how CIOs can extend their information architecture to support the ever-shifting mass of information sources not tidily available in enterprise information systems.

The Linked Data approach can help CIOs provide what their business colleagues seek by bringing in a more flexible, agile information architecture that unlocks more value from their current information systems and extends its reach to the wealth of information beyond them. (See Figure 1)

Figure 1


Information systems are typically deployed on the premise that if you migrate enough data to them, you'll get better decisions-as if software systems could replace human insight. But the premise is false, treating everything as predictable or static, known or knowable, and therefore capable of being automated. The Linked Data concept understands that this is not the case; it focuses instead on helping people to identify relevant information and to analyse it better. Humans excel at this kind of relevance processing, so why not take better advantage of their ability?

But simply using Linked Data technologies is not the path to success either. Throwing tools based on the Resource Description Framework (RDF), Web Ontology Language (OWL), and other evolving Semantic Web technologies at business users, or letting them adopt technologies helter-skelter on their own, will only create chaotic inconsistency, a manifold version of the spreadsheet problem with which many CIOs already struggle.

CIOs shouldn't aim to create a monolithic system to provide business staff the exploratory capabilities they seek. That would be an expensive, time-consuming investment for something whose value is difficult to quantify and whose best practices are not yet known. Instead, CIOs need to create what PwC calls an information mediation layer that lets business staff explore what-if scenarios, assess strategies and risks, and gain insight from the messy reality of the world inside and outside a company's four walls.

As outlined in a previous paper, an information mediation layer orchestrates information from disparate sources for exploratory analysis rather than discovering an immutable "single source of truth" for archival and reporting purposes.

The CIO needs to create the framework for exploration, one that helps the analysis fit meaningfully with the enterprise's existing information sources and their often unstated assumptions-without limiting that exploration or imposing a closed worldview on it. The goal of this framework and its associated tools is to allow mapping and filtering on the fly, so you don't have to conduct expensive, time-consuming normalisation activities for one-off or low-volume analyses-assuming those were even possible.

Advantages of Linked Data

Unlike corporate data warehouses and other standard information systems, the Linked Data concept accepts that information has different structures depending on the purpose and context in which it was created. Linked Data tries to bridge those differences using semantics (the meaning of the information in context) and ontologies (the relationships among information sources).

Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. The tools do not decide the connections, although the use of RDF and OWL tags can help automate the initial state for the user to review before applying human intelligence.
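To make the "join" analogy concrete, here is a minimal Python sketch, not a real Linked Data toolchain. It assumes two hypothetical sources (`crm` and `market_feed`) whose field names differ, and a hand-written `CONCEPT_MAP` standing in for the metadata that RDF and OWL tags would supply. The match is deliberately loose, and the result is only a list of candidate pairs for a person to review.

```python
# Hypothetical mapping from each source's field names to shared concepts.
# In practice this role would be played by RDF/OWL metadata, not a dict.
CONCEPT_MAP = {
    "crm": {"cust_name": "customer", "region_cd": "region"},
    "market_feed": {"company": "customer", "territory": "region"},
}


def to_concepts(source, record):
    """Re-key a record by shared concept; unmapped fields keep their own names."""
    mapping = CONCEPT_MAP[source]
    return {mapping.get(field, field): value for field, value in record.items()}


def normalise(value):
    """Loose matching rule: ignore case and surplus whitespace."""
    return " ".join(str(value).lower().split())


def concept_join(left, right, concept):
    """Pair records whose values for a shared concept match after normalisation."""
    index = {}
    for rec in right:
        index.setdefault(normalise(rec[concept]), []).append(rec)
    pairs = []
    for rec in left:
        for match in index.get(normalise(rec[concept]), []):
            pairs.append((rec, match))
    return pairs


crm = [to_concepts("crm", {"cust_name": "Acme Ltd", "region_cd": "EU", "revenue": 120})]
feed = [to_concepts("market_feed", {"company": "  acme ltd ", "territory": "EU", "sentiment": "positive"})]

# Candidate links only; the analyst decides which ones to accept.
candidates = concept_join(crm, feed, "customer")
```

Neither source is rewritten to suit the other. The pairing exists only for the duration of the exploration.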

Many organisations already recognise the importance of standards for metadata. What many don't understand is that working to standardise metadata without ontology is like teaching children to read without a dictionary. Using ontologies to organise the semantic rationalisation of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalisation because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system.

The ontological approach also keeps the CIO's office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner's ontology exposes the context semantics that data definitions lack.

Applying the Linked Data approach complements architectural approaches such as service-oriented architecture (SOA), inline operational analytics, and event-driven architectures that allow various functions to interact as needed to create a dynamic, flexible result that stays within the specified bounds. And it supports the inter-enterprise process flows common in today's networks of value chains, whether a traditional supply-and-delivery chain of retailing goods or an information-validation chain such as that of the pharmaceutical industry and its regulators.

Linked Data technologies, such as RDF, also have scalability and efficiency in their favor, says Jason Kolb, a technical lead at Cisco Systems who previously ran a BI company called Latigent. "By contrast, data warehousing's cost and inefficiency may be prohibitive at the large scale necessary in the near future," he says.

Figure 3


Two Linked Data paths

We recommend that CIOs begin to rethink their information strategy with the Linked Data approach in mind. We do not recommend you embark on a big-bang initiative; that's unrealistic for an emerging technology whose best practices have yet to be learned. But we do recommend you test some of the principles of this approach as part of your larger information and data efforts. Here are some specific suggestions for ways to do this.

Depending on your own strengths and priorities, we see two possible paths for you to take with Linked Data technologies such as RDF and OWL. The paths are not mutually exclusive; you could pursue both if resources and inclinations permit.

The first path would be to extend your current data warehouse and structured data stores to account for the missing dimension of ontology- and semantics-oriented metadata. This extension will provide the necessary context to your data, allowing uses beyond the strict purpose originally intended. This extension could be phased in over time and would unlock more value from data investments you've already made.

It would ensure a consistency at the core that does matter: You want a common language all the way through the stack-you want one way of describing your resources and their relationships throughout.

The second path would be to empower your business users with exploration tools that they could use with existing internal data and with external data of their choosing. These tools would let them find the best business cases and make immediate use of the Linked Data technologies at a low cost to IT, since most of these tools are reasonably priced. Think of this as building and operating the "car"-your technology platforms and associated processes-that executes the business users' "driving." In essence, you would create the heads-up dashboard display that has contextual and configurable gauges for the people driving your business-unlike the fixed gauges of today's structured systems-and let them make their own assessments and explorations. In this approach, you let data become the applications, adding the power of action and insight to data.

Both approaches start from a common base: establishing a basic business ontology that expresses the relationships among the business's key processes and entities. The ontology provides the common framework by which the various data sources-internal and external-can be "joined" in the exploratory analysis, ensuring that they are mapped to and filtered against common concepts no matter where they originated. The same ontology development could be extended outside your walls through partnerships with others in your industry, as Chevron is beginning to do in the oil and gas business.
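As a rough illustration of what a "basic business ontology" might look like in its simplest form, the Python sketch below stores relationships as subject-predicate-object triples, the shape RDF uses. The concepts and predicates (`Supplier`, `supplies`, and so on) are invented for the example, and a real deployment would more likely use an RDF store and OWL vocabulary than an in-memory set.

```python
# Illustrative ontology: each entry is a (subject, predicate, object) triple.
# All names here are hypothetical examples, not a recommended vocabulary.
ONTOLOGY = {
    ("Supplier", "supplies", "Part"),
    ("Part", "used_in", "Product"),
    ("Customer", "buys", "Product"),
    ("OEM", "is_a", "Customer"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def related_concepts(triples, concept):
    """Concepts directly linked to `concept`, in either direction."""
    outgoing = {o for s, _, o in triples if s == concept}
    incoming = {s for s, _, o in triples if o == concept}
    return outgoing | incoming


# Which relationships point at Product, and what touches Part?
about_product = match(ONTOLOGY, obj="Product")
near_part = related_concepts(ONTOLOGY, "Part")
```

Because every source is mapped to the same small set of concepts, a question such as "what relates to Product?" can be asked once rather than once per system.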

Linked Data strategy

Because the Linked Data approach is still emerging, even though we forecast it will be crucial to an organisation's ability to deploy an information mediation layer, a CIO should approach the effort as directional and exploratory, not as a project to complete. The CIO is in the best position to evangelise this concept, having both the knowledge of the core information systems already in place and the relationships with business users to understand their information needs-and to connect those to the possibilities of the Linked Data approach.

The explorations previously described would provide valuable insight into where the Linked Data approach truly helps solve "messy BI" issues and what technologies work best for areas deemed valuable. Thus, the CIO can adjust course and priorities without fear of being seen to under-deliver, thanks to the explicitly exploratory nature of any Linked Data effort. Because Linked Data thinking is still evolving, the CIO should expect to bring in support for several areas, whether through consultancies, training, or staff members tasked to educate themselves. These areas include enterprise architecture models; RDF and OWL structures; taxonomies, semantics, and ontologies; scenario building for strategic thinking around enterprise domain subsets; and master data management (MDM).

The CIO must be prepared for the discovery that, despite their promise, the Linked Data technologies don't deliver as hoped. Even so, the exploration of this approach should - at a minimum - create a better understanding of the organisation's "messy BI" problem and how it can be lessened. The exploratory effort also burnishes the CIO's reputation as a visionary and a strategic leader.

Information framework

Identifying the benefits of an approach to handle the "messy BI" gap is itself a significant first step. Organisations either don't know they have a problem, leaving them at risk, or they use inappropriate technologies to solve it, wasting time and money.

A CIO's middle name is "information," making the CIO the obvious person to lead the organisation's thinking about ontology, semantics, and metadata-the core values of information that make it more valuable to everyone than the typical structured data. The CIO should lead the enterprise's information thinking, because the technology systems IT created and manages exist to deal with information. Losing sight of that short-changes the business and relegates the CIO to little more than an infrastructure manager.

Therefore, the CIO should lead the development of the business ontology. The CIO should help key parts of the business-those with the highest business value-build their subsets. The CIO and the line-of-business managers will then have the key ontological domains in place that begin to create the metadata to apply both to new data and retroactively to existing data where it matters.

Once in place, they can lead to harmonised operating models within the organisation. And that leads to agility and better decision making. (See Figure 3)

Figure 2


For example, thinking about the ontologies of supplier and customer can create a better context for taking advantage of transaction data in mashups that combine with messy data to explore everything from potential product alternatives to unmet customer demands. In this way, you can still use the database-structured information at the base of your information stack without having to transform it for those flexible explorations.

To successfully apply semantics and ontologies to your existing structured data, you need that data to be consistent. All too often, an enterprise's information management systems are inconsistent; they have differing data definitions (often handled through mapping at the information-movement stage) and, worse, different contexts or no context for those definitions (which results in different meanings). The classic cases are the definition of a customer and of a sale, but inconsistencies exist in all kinds of data elements.

Thus, it's crucial to focus on MDM approaches to rationalise the structured data and their context (metadata) in your existing information systems. The more inconsistent your internal systems are, the more difficult it will be to map that data semantically or ontologically to external sources. Thus, an MDM effort is imperative not only to reduce the cost and increase the effectiveness of your internal systems, but also to enable you to work with external sources using Linked Data approaches.

Thinking through your business ontology and semantics to create the right framework to support Linked Data explorations should help you think through your organisation's overall information architecture, identifying which information has a contextual source of authority, which has a temporal source, and which has a single, master source. Knowing these sources of authority helps establish where the framework needs to be rigid and where it does not, as well as in what way it should be rigid or not.

For example, a customer might be contextual-an internal customer, an original equipment manufacturer (OEM), or an individual consumer-and thus the ontology allows multiple mappings to this concept that the user can choose from for a current exploration. But a part number is allowed to be only one thing.

Another key facet of ontologies and semantics is that they are not necessarily strictly hierarchical. Relationships can occur at and across different levels, and some information is naturally arranged more in a tagged cloud than in a hierarchical form. This adds more flexibility to the Linked Data exploration for a real-world analysis, but it can be difficult for IT staff to think beyond hierarchical data arrangements. Take care not to force everything into a hierarchy.
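The customer and part-number distinction can be sketched in a few lines of Python. This is a hypothetical illustration only: the `ConceptRegistry` class, the authority labels "contextual" and "master", and the source field names are all assumptions made for the example. The point is simply that the framework is flexible where meaning depends on context and rigid where there is a single master source.

```python
class ConceptRegistry:
    """Hypothetical registry: contextual concepts accept several mappings,
    master concepts accept only one."""

    def __init__(self):
        self._authority = {}  # concept -> "contextual" or "master"
        self._mappings = {}   # concept -> {context: source_field}

    def declare(self, concept, authority):
        if authority not in ("contextual", "master"):
            raise ValueError(f"unknown authority: {authority}")
        self._authority[concept] = authority
        self._mappings[concept] = {}

    def map(self, concept, context, source_field):
        mappings = self._mappings[concept]
        # A master concept may be mapped once; a second context is rejected.
        if self._authority[concept] == "master" and mappings and context not in mappings:
            raise ValueError(f"{concept} has a single master source")
        mappings[context] = source_field

    def choices(self, concept):
        """Mappings a user can pick from for the current exploration."""
        return dict(self._mappings[concept])


registry = ConceptRegistry()
registry.declare("customer", "contextual")
registry.declare("part_number", "master")

registry.map("customer", "internal", "erp.cost_centre")
registry.map("customer", "oem", "crm.oem_account")
registry.map("customer", "consumer", "web.shopper_id")
registry.map("part_number", "catalogue", "erp.part_no")
```

An analyst exploring OEM demand would pick the "oem" mapping for customer, while any attempt to give part number a second meaning is refused.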

User explorations

The benefit of information mediation is most immediate at the business unit level, where business analysts and other strategic thinkers are primed to explore ideas.

"With properly linked data, people can piece together the puzzle themselves," notes Uche Ogbuji, a partner at Zepheira, which provides semantics-oriented analysis tools and training. In the past, he adds, assembling the puzzle pieces was viewed as an engineering challenge, leading to a resource-intensive effort to structure data for traditional analysis techniques.

Providing a baseline ontology for the business unit and helping its analysts "join" that ontology with those available outside the company through beta tools will let you test the value of this approach, test your ontology in real-world contexts, and create buy-in from key business users to drive further investment. Plus, ontology development, as the BBC Earth's Tom Scott says, is a "contact sport"-it is best when there's lots of feedback, experimentation, and information exchange, so IT should not do it alone.

The CIO needs to loosen controls over the information and tools used in this exploration. You're not building a system but testing an approach. Yes, you must retain control over your organisation's core information, which is typically the data used in transactions and archive systems. But the non-destructive nature of the Linked Data approach means you can expose that data more freely. In other words, users should be able to explore and transform data on the fly (since they're not changing the source data and its metadata-just the copy, at most-in their virtual explorations).
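A minimal Python sketch of that non-destructive principle follows. It assumes the core data is exposed as read-only records (here via the standard library's `types.MappingProxyType`) and that every exploration works on throwaway copies. The sample records and the `tag_segment` transform are invented for illustration.

```python
from types import MappingProxyType

# Core records exposed read-only: callers can look but not modify.
CORE_SALES = (
    MappingProxyType({"customer": "Acme Ltd", "amount_eur": 1200}),
    MappingProxyType({"customer": "Birch plc", "amount_eur": 800}),
)


def explore(records, transform):
    """Apply a transform to throwaway copies; the source records are untouched."""
    return [transform(dict(rec)) for rec in records]


def tag_segment(rec):
    """Example transform: add an analyst-defined label to the copy."""
    rec["segment"] = "large" if rec["amount_eur"] >= 1000 else "small"
    return rec


view = explore(CORE_SALES, tag_segment)
```

The analyst's labels exist only in `view`. The core records are unchanged, and an attempt to write to them directly raises a `TypeError`.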

A good rule of thumb is that semantic reach and control reach both telescope inversely to distance: The further information sources are from your core data, the less precise they will be and the more freedom users should have to manipulate their meaning to impose a precise context for their particular exploration.

Strategic application

The beauty of the Linked Data approach's intelligent information linking is that data normalisation is a transient state, like a database join, that leaves the original data untouched-eliminating the huge data-rationalisation effort that usually destroys metadata along the way. This fact also makes it easier to bring external data into an analysis-from the Web, information brokers, and your value networks.

The key for an analysis is to map the metadata from the various sources, something that having a core ontology simplifies and that semantic tools help deliver for each exploration. Think of this Linked Data mapping as an information mashup. This linking is generally about providing context for the information being explored, and it's the context that provides the specificity that makes the analysis useful.

Large, heterogeneous data sets can seem impossible to structure. However, creating domain-specific ontologies is feasible, and if they are shared, you can follow them to other domains and reuse what's been created in those domains. Plus, the Linked Data approach means that "collaboration between departments can happen [because] full agreement doesn't have to be enforced. Semantic technology makes it possible to agree to disagree, and the schemas can reflect the degree of agreement and disagreement," Ogbuji says. These attributes show the powerful advantage of the Linked Data approach.
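One simple way to picture schemas that "reflect the degree of agreement and disagreement" is a side-by-side comparison of two departments' term definitions. The Python sketch below is an assumption-laden illustration, not a description of any semantic tool: the `sales` and `finance` dictionaries and their definitions are invented, and real schemas would carry far richer structure than plain strings.

```python
def agreement_report(schema_a, schema_b):
    """Split two term->definition schemas into agreed, disputed, and one-sided terms."""
    shared = schema_a.keys() & schema_b.keys()
    agreed = {term for term in shared if schema_a[term] == schema_b[term]}
    return {
        "agreed": sorted(agreed),
        "disputed": sorted(shared - agreed),
        "only_a": sorted(schema_a.keys() - schema_b.keys()),
        "only_b": sorted(schema_b.keys() - schema_a.keys()),
    }


# Hypothetical departmental vocabularies.
sales = {
    "customer": "party that signed a contract",
    "sale": "signed order",
    "region": "sales territory",
}
finance = {
    "customer": "party that has been invoiced",
    "sale": "signed order",
    "ledger": "general ledger account",
}

report = agreement_report(sales, finance)
```

Here the two departments agree on "sale" and differ on "customer". Both definitions of customer stay available, and the disagreement is recorded rather than forced to a resolution.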

As CIO, you would be foolish not to put these approaches on your agenda. These approaches can help your organisation perform better in ways that will improve the business-improvements based on what you ultimately are supposed to lead: the strategic application of information. Placing information mediation through Linked Data on your agenda puts you squarely in the strategic role your company increasingly expects you to play, rather than focusing on bits and bytes best left to hands-on technology-operations subordinates.

