Ontologies & Business Intelligence

The best way to predict the future is to create it

Peter Drucker


Success stories happen when words can morph actual circumstances into hoped-for ones. To achieve that, business intelligence has to span the range between enterprises and their business environment.

Looking for business opportunities (Bruce Chatwin)

With regard to technical ranges, that objective is made easier by digitized business environments and flows; as for semantic ranges, profiled ontologies (aka semantic layers) offer support at process as well as corporate levels:

These capabilities have long been managed separately, according to operational and strategic concerns; they must now be yoked together in order to deal with the generalization of digitized business flows and the ironing out of traditional market segmentations. As a corollary, business intelligence must frame all pertinent information within enterprise architecture, and that's what new technologies like knowledge graphs are meant to do.

Business Intelligence & Decision-making

The dual meaning of the concept of Intelligence, both cognitive ability and information content, is replicated for business intelligence (BI), which encompasses front- and back-office activities, the former for decision-making, the latter for knowledge management.

To begin with data analytics, the objective is to process the flows of raw data into meaningful information to be put to use as knowledge:

Making sense: the aim of data analytics is to refine the flows of raw data into meaningful information

As far as business intelligence is concerned, the focus is to be put on four basic tasks:

  • Data understanding: gives form and meaning to raw material.
  • Business understanding: charts business contexts and concerns in terms of objects and processes descriptions.
  • Modeling: integrates data and business understanding with information systems models (descriptive, predictive, or operational).
  • Evaluation: assesses and improves accuracy and effectiveness with regard to objectives and decision-making.

That frame is to support decision-making, whether operational, tactical, or strategic. A seamless integration is made both feasible and necessary by the ubiquity of digitized businesses and the integration of enterprises with their environments. While the respective objectives and time-frames of internal (information systems) and external (business intelligence) models are to remain distinct, resources and schedules should be managed within a common decision-making framework.

As it happens, combining the well-established OODA (Observation, Orientation, Decision, Action) loop with enterprise architecture would meet decision-making requirements regarding the integration of business processes and supporting systems on one hand, and the continuity between tiers of decision-making on the other hand:

  1. Observation: understanding the nature, origin, and time-frame of changes in business environments (aka territories).
  2. Orientation: assessment of the reliability and shelf-life of pertaining information (aka maps) with regard to stakes and current positions and operations.
  3. Decision: weighting of options with regard to enterprise stakes and capabilities.
  4. Action: carrying out of decisions according to stakes and time-frames.
Mingling the loops of data analytics and decision-making
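The four OODA steps above can be sketched as a minimal control cycle in which each step refines a shared picture of the environment. This is a purely illustrative sketch: all function names and data shapes are assumptions, not part of any established framework.

```python
# Illustrative OODA cycle: each step narrows raw signals down to actions.
# All names and data structures are hypothetical.

def observe(environment: dict) -> dict:
    """Collect raw signals about changes in business environments (territories)."""
    return {k: v for k, v in environment.items() if v is not None}

def orient(signals: dict, stakes: dict) -> dict:
    """Assess pertaining information (maps) with regard to stakes."""
    return {k: v for k, v in signals.items() if k in stakes}

def decide(assessment: dict, capabilities: set) -> list:
    """Weight options with regard to enterprise stakes and capabilities."""
    return [k for k in assessment if k in capabilities]

def act(decisions: list) -> list:
    """Carry out decisions according to stakes and time-frames."""
    return [f"execute:{d}" for d in decisions]

def ooda_cycle(environment, stakes, capabilities):
    return act(decide(orient(observe(environment), stakes), capabilities))
```

For instance, `ooda_cycle({"price_drop": 0.1, "noise": None}, {"price_drop": "high"}, {"price_drop"})` filters out the irrelevant signal and yields a single action.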

With regard to BI’s front-office, ontologies can be used to synchronize data analytics and decision-making loops; with regard to back-office they would serve as conceptual frames to knowledge management.

Business Intelligence: In The Loop

The aim of data analytics is to give structure and meaning to raw data. That is to be achieved through an iterative process mixing four basic operations:

  • Data understanding (a>b): identification of entities, facets, and features, setting apart generic syntax (blue) from domain specific semantics (green).
  • Business understanding (b>c): mapping targeted business entities, facets, and features to current business objects and processes supported by systems.
  • Modeling (c>d): integration with business models and information architecture.
  • Evaluation: profiled ontologies are used to consolidate external (business environment) and internal (information systems) models according to business operational and strategic objectives.
Basic cycle of ontology driven data mining: data understanding (a>b), parsing (b>c), business understanding (c>d), ontology design.

As with every iterative process, loops are to be defined with regard to invariants and exit conditions. Yet, compared to traditional data modeling, which is by nature descriptive or prescriptive, data analytics is focused on predictive capabilities: the objective is not to specify systems but to explore the business relevance of putative circumstances. Since changes in syntactic and semantic representations are governed by completely different rules, probing loops should be defined accordingly.

Concerning syntax, the objective is to factor out a backbone that could be processed independently of meanings:

  • Invariants: anchors representing categories of objects or behaviors with unambiguous identities and semantics.
  • Iterations: fleshing out anchors with features, facets or connections defined independently of meanings.
  • Exit condition: a stable syntactic backbone unaffected by semantic adjustments.
Data Analytics Loops: moving syntactic and semantic lines

Concerning semantics, the objective is to detail the meaning of terms in relation with domains:

  • Invariants: the syntactic backbone.
  • Iterations: adding semantic features, facets or connections to anchors in relation with semantic domains.
  • Exit condition: all elements are set with semantics and ready for BI evaluation.
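The two probing loops can be sketched as follows: the syntactic loop iterates until the backbone (anchors plus semantically neutral connectors) reaches a fixed point, after which the semantic loop decorates the stable backbone with domain meanings. The example data is invented for illustration.

```python
# Hypothetical sketch of the two probing loops.

def syntactic_loop(anchors, propose_connectors):
    """Iterate until no neutral connector can be added (stable backbone: exit)."""
    backbone = set(anchors)
    while True:
        additions = propose_connectors(backbone) - backbone
        if not additions:          # exit condition: backbone unchanged
            return backbone
        backbone |= additions

def semantic_loop(backbone, domain_features):
    """Invariant: the backbone; iterations attach semantic features to its elements."""
    return {node: domain_features.get(node, []) for node in backbone}

# Toy example: a connector emerges once both anchors are in the backbone.
anchors = {"artist", "title"}
def propose(b):
    return b | ({("artist", "records", "title")} if "artist" in b and "title" in b else set())

backbone = syntactic_loop(anchors, propose)
semantics = semantic_loop(backbone, {"title": ["genre"]})
```

The first pass adds the connector; the second pass adds nothing, so the loop exits with a stable backbone ready for semantic decoration.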

The difficulty is to avoid circular changes and recursive traps; since there is no reason to assume that syntax and semantics can be set apart upfront, which would enable the respective loops to be carried out independently, iterations must be anchored to shared and stable elements. That’s what data understanding is meant to do.

Data understanding: Terms, Anchors, Syntax

The first step of data understanding is to find an initial set of anchors, i.e. categories of objects or behaviors with stable identities and unambiguous semantics. Such anchors (#), usually selected from current business functions, will serve two purposes: on one hand they are to be used to map data to business contexts and concerns; on the other hand they are to be fleshed out with facets, features, and associations:

Semantic categories for anchors (#) & basic syntactic constructs
  • Facets (aka aspects) are categories of objects or behaviors with clear semantics but dependent identities.
  • Features are properties (attributes or operations) whose realization and semantics are tied to anchors or facets.
  • Power-types (aka partitions) define classifications.
  • Associations may be represented by unspecified connectors or by specific constructs for inheritance, aggregates, compositions, or collections.
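The basic categories above can be encoded directly; the sketch below mirrors the text's vocabulary (anchors, facets, features, power-types) but is not drawn from any established metamodel.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative encoding of the semantic categories; all names are taken
# from the text, the structure itself is an assumption.

@dataclass
class Feature:                       # property tied to an anchor or facet
    name: str
    kind: str                        # "attribute" or "operation"

@dataclass
class Facet:                         # clear semantics, dependent identity
    name: str
    features: List[Feature] = field(default_factory=list)

@dataclass
class Anchor:                        # stable identity, unambiguous semantics
    name: str
    features: List[Feature] = field(default_factory=list)
    facets: List[Facet] = field(default_factory=list)
    partition: Optional[str] = None  # power-type used to classify instances

# Toy usage from the music example:
album = Anchor("Album",
               features=[Feature("title", "attribute")],
               partition="Genre")
```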

Applying a simple RDF (Resource Description Framework) graph, a preliminary objective would be to align the nature of semantics with the type of nodes; taking music marketing as an example:

How to allocate generic constructs (bottom left) and domain specific semantics (bottom, middle and right) to RDF graph nodes and connectors.
  • Primary nodes (rectangles) would represent elements specific to domains, to be decorated with features (bottom right).
  • Connection nodes (circles and arrows) would be used for semantically neutral (aka syntactic) associations to be uniformly implemented across domains, e.g. with predicate calculus (bottom left).

As already noted, overlaps are to be expected, with connectors supporting both syntactic (left, blue) and semantic (middle, mauve) associations. Assuming that a subset of “pure” syntactic connectors can be secured and combined with anchors, iterations would proceed with business understanding of domain semantics, either as nodes’ features or nodes’ connectors.
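The separation between neutral and domain-specific connectors can be sketched with a minimal triple store in plain Python (no RDF library assumed); the predicate names and music-marketing triples are invented for illustration.

```python
# Minimal triple-store sketch separating semantically neutral (syntactic)
# connectors from domain-specific (semantic) ones. All names illustrative.

SYNTACTIC = {"isA", "partOf", "memberOf"}   # uniform across domains

class Graph:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def syntactic(self):
        """The backbone: triples processable independently of meanings."""
        return {t for t in self.triples if t[1] in SYNTACTIC}

    def semantic(self):
        """Domain-specific connectors, left open to iterative probing."""
        return {t for t in self.triples if t[1] not in SYNTACTIC}

# Music marketing example:
g = Graph()
g.add("Album", "isA", "Product")        # syntactic: generic construct
g.add("Title", "partOf", "Album")       # syntactic: composition
g.add("Artist", "performs", "Title")    # semantic: domain-specific connector
```

Securing the `SYNTACTIC` subset upfront is what allows the semantic iterations to proceed without destabilizing the backbone.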

Business Understanding: Contexts, Concerns, Semantics

As illustrated above, and insofar as decision-making is concerned, intelligence is primarily a matter of observation and orientation, which implies adjusting perspectives according to business contexts and objectives; that is precisely what ontologies are meant to do. Along that reasoning, a preliminary distinction should be made between reference domains set outside the enterprise (e.g. regulations) and the ones set at enterprise level, the former for concepts and documents defined independently of purpose, the latter for targeted items in domains of concern. Using the music example:

  • Contexts set independently of purpose: technical specifications of musical instruments, regulations for authors and musicians rights.
  • Contexts set by purposes: commercial and artistic activities.
Key domains for music marketing

That distinction can be refined and generalized to define a taxonomy of contexts based on governance and stability:

  • Social: No authority, volatile, continuous and informal changes.
  • Institutional: Regulatory authority, steady, changes subject to established procedures.
  • Professional: Agreed upon between parties, steady, changes subject to established procedures.
  • Corporate: Defined by enterprises, periodic, changes subject to internal decision-making.
  • Personal: Customary, defined by named individuals (e.g research paper).
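The taxonomy of contexts can be carried by a simple enumeration recording each context's governance and change regime; this is a direct, purely illustrative encoding of the list above.

```python
from enum import Enum

# The taxonomy of contexts, keyed by governance and stability;
# the string values paraphrase the list in the text.
class Context(Enum):
    SOCIAL        = ("no authority", "volatile")
    INSTITUTIONAL = ("regulatory authority", "steady")
    PROFESSIONAL  = ("agreed between parties", "steady")
    CORPORATE     = ("defined by enterprises", "periodic")
    PERSONAL      = ("named individuals", "customary")

    @property
    def governance(self):
        return self.value[0]

    @property
    def stability(self):
        return self.value[1]
```

An ontology tagged with `Context.CORPORATE`, for instance, would be known to change periodically and only through internal decision-making.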

Ontologies set along that taxonomy of contexts could also be refined so as to be crossed with enterprise architecture layers: enterprise, systems, platforms, e.g.:

Ontologies, capabilities (Who, What, How, Where, When), and architectures (enterprise, systems, platforms).

Crossing business taxonomies with enterprise architecture capabilities could significantly improve the integration of business processes and supporting systems, provided that the clear-cut semantics of engineering models can be neatly mapped to the miscellany of business intelligence items, namely: terms (“car”), concepts (vehicle), documents (car insurance), actual instances (my car), role instances (my bus), actual categories (blue buses), types (Toyota Prius), etc.

That could be achieved by introducing a taxonomy of concerns taking into account the epistemic nature of targeted items: terms, documents, engineered artifacts, and business objects and processes. That would outline four basic concerns that may or may not be combined:

  • Thesaurus: ontologies covering terms and concepts.
  • Document Management: ontologies covering documents with regard to topics.
  • Organization and Business: ontologies pertaining to enterprise organization and business objects and activities.
  • Engineering: ontologies pertaining to the symbolic representation (aka surrogates) of organizations, businesses, and systems.
Ontologies: Purposes & Targets

Contexts and concerns taxonomies could then be mapped into semantic layers and implemented as knowledge graphs.
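Crossing the two taxonomies can be sketched as a semantic layer keyed by (context, concern), so that knowledge-graph items are filed and retrieved along both dimensions. The keys mirror the two lists above; the filed item is invented.

```python
# Hypothetical sketch: semantic layers keyed by (context, concern).
CONTEXTS = ("social", "institutional", "professional", "corporate", "personal")
CONCERNS = ("thesaurus", "documents", "business", "engineering")

layers = {(ctx, con): [] for ctx in CONTEXTS for con in CONCERNS}

def file_item(item, context, concern):
    """File a knowledge item into its semantic layer."""
    layers[(context, concern)].append(item)

# From the music example: a regulation is institutional, document-oriented.
file_item("musicians' rights regulation", "institutional", "documents")
```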

Models & Ontologies: Semantic Layers

In contrast to information systems, the primary purpose of business intelligence models is predictive. As a result, discrepancies are to be expected between the semantic categories used to probe business opportunities and the ones used to build supporting systems. Hence the need to set apart modeling options reflecting business objectives. Taking music marketing as an example:

  • Technical and regulatory domains are only for concepts (instruments and rights) and documents defined externally.
  • Musicians, audiences, and authors are identified as actual persons; audience behaviors are to be explored and explained (P.Clusters).
  • Beneficiaries are considered only as roles, possibly without actual impersonation.
  • Instruments are instances of actual resources; rights are instances of symbolic accounts. Both are described by concepts whose features could be used to assess and adapt artistic and commercial policies.
  • Sales are computed from event receipts (voucher, coupon, pass, etc.), recorded albums or titles, and licences from other media.
  • Titles, albums, and rights are managed symbolic representations (aka surrogates); title features are to be explored and explained (M.Clusters).
  • Locations are instances of actual spaces.
BI models combine business objects identified by information systems, with given (grey) or managed (green) semantic layers.

Yet, in line with the SSP core assumption, identified entities in predictive models are to be anchored to their symbolic counterparts in descriptive models; ontologies can then be used to identify categories and flesh them out with features and connections. It should be borne in mind that this is an iterative process, which means that connectors should be given some latitude: while generic ones (e.g. collections or compositions) can be explicitly defined, connectors bearing domain-specific semantics constitute a key independent variable of predictive modeling.

Models & Ontologies: Epistemic Layers

Compared to context-based semantic layers, epistemic layers are set according to the nature of targeted items:

  • Concepts and documents, which are meant to stay as they are and are therefore managed in thesauruses and content management systems (CMS).
  • Business objects and behaviors meant to be analyzed, which may or may not be already represented in EA models.
  • Business objects and behaviors already represented in EA models.
Epistemic layers

That distinction is a key success factor for integration because it sets apart modeling concerns: on one hand business intelligence, by nature predictive, open and versatile, on the other hand enterprise architecture, by nature descriptive, circumscribed, and stable.

Driven by big tech firms’ investments in natural language interfaces, ontologies are making a spectacular return to fashion under the names of semantic layers and knowledge graphs. The ubiquity and performance of these technologies rely on a clear distinction between representation and contents:

  • Logic and truth-preserving operators are used to represent and process knowledge independently of specific semantics.
  • Semantic layers are used to organize knowledge according to its epistemic nature (concepts, documents, artifacts, …), target (environment, enterprise, systems, platforms, …), and contexts (institutional, professional, corporate, social, …).

Such a principled knowledge architecture opens the door to the full panoply of statistical and machine learning tools.

Evaluation: Business Intelligence Distilled

Compared to descriptive modeling whose aim is to reduce complexity, expanding prospects is the bread and butter of predictive modeling and business intelligence. As demonstrated by semantic layers and knowledge graphs, ontologies radically change the way statistical and machine learning methods and tools can be combined.

Taking music marketing for example, descriptive modeling classifications assign individuals to categories depending on features; in contrast, predictive modeling takes nothing for granted as the objective is to pick or even design the categories and features that would best befit business contexts and objectives.

With features, correlations would be used to select the attributes with the best predictive (aka informative) outcome and use them to segment populations according to notional categories. From the music example, one would try to correlate personal attributes and pick the ones that best explain the music titles bought by individuals in the data set.
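Such a correlation probe can be computed from first principles; the sketch below implements the Pearson coefficient without external libraries and applies it to invented figures relating buyer age to purchases of a given title.

```python
from math import sqrt

# Pearson correlation from first principles, used to rank personal
# attributes by predictive value. The data set is made up for the sketch.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative probe: does age correlate with purchases of a given title?
age       = [18, 22, 25, 34, 41, 52]
purchases = [ 9,  8,  7,  4,  3,  1]
r = pearson(age, purchases)   # strongly negative: younger buyers dominate
```

An attribute with |r| close to 1 would be kept as a segmentation criterion; one with |r| close to 0 would be discarded.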

Business Intelligence Distilled

Alternatively, statistical regression could be used to fit collected data into notional categories. For instance, one could try to assign individuals to marketing profiles based on their income and education.
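A one-predictor least-squares fit is enough to illustrate that regression step; the income figures, scores, and the "premium"/"standard" profile names are all invented for the sketch.

```python
# Ordinary least squares on a single predictor (income, in k€), fitted to a
# notional marketing score; all figures and profile names are hypothetical.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx       # (slope, intercept)

income = [20, 30, 40, 50, 60]
score  = [1.0, 1.8, 3.1, 3.9, 5.2]
slope, intercept = fit_line(income, score)

def profile(x, threshold=3.0):
    """Assign a marketing profile from the fitted score."""
    return "premium" if slope * x + intercept >= threshold else "standard"
```

An individual with an income of 50 would be fitted above the threshold and assigned the "premium" profile; one at 25 would fall below it.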

Similarity could also be used to assess distances between music titles measured with regard to some business-defined criteria, and cluster individuals accordingly, e.g: distances between neighbors are estimated from musical characteristics, and then used to group titles depending on their suitability to be performed live.
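The similarity scheme can be sketched as greedy single-link grouping over business-defined criteria; the titles, the (tempo, acoustic share) coordinates, and the radius are invented for illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Distances between titles measured on business-defined criteria
# (tempo in bpm, acoustic share); all figures are made up.
titles = {
    "ballad_a":  (70, 0.9),
    "ballad_b":  (75, 0.8),
    "club_mix":  (128, 0.1),
    "club_edit": (124, 0.2),
}

def cluster(items, radius):
    """Greedy single-link grouping: a title joins a group if it lies
    within `radius` of any member; otherwise it seeds a new group."""
    groups = []
    for name, point in items.items():
        for group in groups:
            if any(dist(point, items[m]) <= radius for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

groups = cluster(titles, radius=10)
```

With these figures the two ballads end up in one group and the two club tracks in another, a rough proxy for suitability to be performed live.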

As the relevance and reliability of statistical methods mostly depend on samples (for individuals), predictors (for categories), and estimators (for metrics), these schemes by themselves leave two options: build models along prior hypotheses, or rely on random probes. Machine learning comes in the middle: on one hand mining samples, identifying predictors, and selecting estimators; on the other hand shaping emerging forms and causalities.

But juggling between explicit and implicit knowledge would not be possible without framing explanatory models into semantic layers, aka profiled ontologies.

Conclusion: EA Maps & Roadmaps

Once integrated into enterprise architectures, ontologies can be used to map business intelligence according to contexts and concerns:

  • Base maps: spatial (left), or topological or logical (right).
  • Superimposed maps to support multidimensional exploration of base ones along specific dimensions: geography, customers, sales, products, income, institutions, climate, etc.
Ontologies can be superimposed like geographical maps

Then, set on the broader perspective of economic intelligence, such maps could serve as a basis for roadmaps and support holistic enterprise governance and planning.

Annex A: Ontological Kernel

An ontological kernel has been developed to illustrate the integration of information processing, from data mining to knowledge management and decision-making:

  • Data is first captured through aspects.
  • Categories are used to process data into information on one hand, design production systems on the other hand.
  • Concepts serve as bridges to knowledgeable information.

A beta version is available for comments on the Stanford/Protégé portal with the link: Caminao Ontological Kernel (CaKe).

Annex B: Syntax Summary

A core of formal constructs can be defined for connectors with regard to:

  • Target: nodes (instance or types), and connectors (origin-2-destination).
  • Mutability for links (aggregation vs composition) or abstractions (functional vs structural).
  • Boolean operators.
Basic syntax constructs

Further Reading

External Links