Focus: Data vs Information


Distinctions must serve a purpose and be assessed accordingly. On that account, what would be the point of setting apart data and information, and on what basis could that be done?

From Data Stripes to Information Structure (Victor Vasarely)

Until recently the two terms seem to have been used interchangeably; until, that is, the digital revolution. But the generalization of digital environments and the tumbling down of traditional barriers surrounding enterprises have upended the playing field as well as the rules of the game.

Previously, with data analytics, information modeling, and knowledge management mostly carried out as separate threads, there was little concern about semantic overlaps; no more. Lest they fall behind, enterprises have to combine observation (data), reasoning (information), and judgment (knowledge) as a continuous process. But such integration implies in return more transparency and traceability with regard to resources (e.g., external or internal) and objectives (e.g., operational or strategic); that’s when a distinction between data and information becomes necessary.

Economics: Resources vs Assets

Understood as a whole or separately, there is little doubt that data and information have become a key success factor, calling for more selective and effective management schemes.

Being immersed in digital environments, enterprises first depend on accurate, reliable, and timely observations of their business surroundings. But in the new digital world the flows of data are so massive and so transient that mining meaningful and reliable pieces is by itself a decisive success factor. Next, assuming data flows are duly processed, part of the outcome has to be consolidated into models, to be managed on a persistent basis (e.g., customer records or banking transactions), the rest being put on temporary shelves for customary uses, or immediately thrown away (e.g., personal data subject to privacy regulations). Such a transition constitutes a pivotal inflection point for systems architectures and governance as it sorts out data resources with limited lifespan from information assets with strategic relevance. Not to mention the sensitivity of regulatory compliance to data management.
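The sorting of processed data flows described above can be sketched as a small triage function. This is only an illustration under stated assumptions: the `Disposition` names and the `privacy_restricted` and `strategic` flags are hypothetical, introduced here to make the three outcomes explicit.

```python
from enum import Enum


class Disposition(Enum):
    PERSIST = "information asset, consolidated into models"
    SHELVE = "data resource, kept on a temporary shelf"
    DISCARD = "thrown away immediately"


def triage(record: dict) -> Disposition:
    """Sort one processed data record using hypothetical flags."""
    # Assumption: privacy restrictions take precedence over business interest.
    if record.get("privacy_restricted", False):
        return Disposition.DISCARD
    # Records with strategic relevance are managed on a persistent basis.
    if record.get("strategic", False):
        return Disposition.PERSIST
    # Everything else is kept temporarily for customary uses.
    return Disposition.SHELVE
```

In practice the criteria would come from governance policies rather than two boolean flags; the point is only that the sorting is an explicit, traceable decision.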

Processes: Operations vs Intelligence

Making sense of data is pointless without putting the resulting information to use, which in digital environments implies a tight integration of data and information processing. Yet, as already noted, tighter integration of processes calls for greater traceability and transparency, in particular with regard to the origin and scope: external (enterprise business and organization) or internal (systems). The purposes of data and information processing can be squared accordingly:

  • The top left corner is where business models and strategies are meant to be defined.
  • The top right corner corresponds to traditional data or information models derived from business objectives, organization, and requirement analysis.
  • The bottom line corresponds to analytic models for business (left) and operations (right).

Squaring the purposes of Data & Information Processing

That view illustrates the shift of paradigm induced by the digital transformation. Previously, most mappings would be set along straight lines:

  • Horizontally (same nature), e.g., requirement analysis (a) or configuration management (b). With source and destination at the same level, the terms employed (data or information) have no practical consequence.
  • Vertically (same scope), e.g., systems logical to physical models (c) or business intelligence (d). With source and destination set in the same semantic context, the distinction (data or information) can be ignored.

The digital transformation makes room for diagonal transitions set across heterogeneous targets, e.g., mapping data analytics with conceptual or logical models (e).

That double mix of levels and scopes constitutes the nexus of decision-making processes; their transparency is contingent on a conceptual distinction between data and information.
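The straight and diagonal transitions can be sketched as a classifier over the four corners of the square. The `Corner` type, its `nature` and `scope` labels, and `classify_transition` are names assumed for this illustration; reading the top row as information and the bottom row as data is likewise an interpretation of the figure, not something the text states outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    nature: str  # assumed: "information" (top row) or "data" (bottom row)
    scope: str   # assumed: "business" (left column) or "systems" (right column)


def classify_transition(source: Corner, target: Corner) -> str:
    """Name the kind of mapping between two corners of the square."""
    same_nature = source.nature == target.nature
    same_scope = source.scope == target.scope
    if same_nature and same_scope:
        return "none"
    if same_nature:
        return "horizontal"  # the data/information wording has no consequence
    if same_scope:
        return "vertical"    # same semantic context, distinction can be ignored
    return "diagonal"        # both change, so the distinction matters


business_models = Corner("information", "business")
system_models = Corner("information", "systems")
business_analytics = Corner("data", "business")
operations_analytics = Corner("data", "systems")
```

Only the diagonal case changes nature and scope at once, which is where an explicit distinction between data and information is needed to keep the mapping traceable.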

At the operational level the benefits of the distinction are best expressed through what is commonly known as the OODA (Observe, Orient, Decide, Act) loop:

  • Data is used to align operations (systems) with observations (territories).
  • Information is used to align categories (maps) with objectives.

Roles of Data (red) & Information (blue) in integrated decision-making processes

Then, the conceptual distinction between data and information is instrumental for the integration of operational and strategic decision-making processes:

  • Data analytics feeding business intelligence
  • Information modeling supporting operational assessment.

Not by chance, these distinctions can be aligned with architecture layers.

Architectures: Instances vs Categories

Blending data with information overlooks a difference of nature, the former being associated with actual instances (external observation or systems operations), the latter with symbolic descriptions (categories or types). That intrinsic difference can be aligned with architecture layers (resources are consumed, assets are managed), and decision-making processes (operations deal with instances, strategies with categories).

With regard to architectures, the relationship between instances (data) and categories (information) can be neatly aligned with capability layers, as represented by the Pagoda blueprint:

  • The platform layer deals with data reflecting observations (external facts) and actions (system operations).
  • The functional layer deals with information, i.e., the symbolic representation of business and organization categories.
  • The business and organization layer defines the business and organization categories.

It must also be noted that setting apart what pertains to individual data independently of the information managed by systems clearly props up compliance with privacy regulations.

Architectures & Decision-making

With regard to decision-making processes, business intelligence uses the distinction to integrate levels, from operations to strategic planning, the former dealing with observations and operations (data), the latter with concepts and categories (information and knowledge).

Representations: Knowledge Architecture

As noted above, the distinction between data and information is a necessary counterpart of the integration of operational and intelligence processes; that in turn implies bringing data, information, and knowledge under a common conceptual roof, respectively as resources, assets, and services:

  1. Resources: data is captured through continuous and heterogeneous flows from a wide range of sources.
  2. Assets: information is built by adding identity, structure, and semantics to data.
  3. Services: knowledge is information put to use through decision-making.
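The three steps above can be sketched with a restaurant payment. The record layout, the `Payment` category, and the review threshold are all assumptions made for the example; they only illustrate how identity, structure, and semantics turn a raw observation into information, and how a decision puts that information to use.

```python
from dataclasses import dataclass

# Resource: a raw observation, with no identity or agreed meaning attached.
raw_observation = {"source": "pos-terminal-7", "payload": "42.50;EUR;table 12"}


@dataclass(frozen=True)
class Payment:
    """Asset: information built from data."""
    payment_id: str  # identity
    amount: float    # structure: typed fields instead of a flat string
    currency: str    # semantics: the amount is money in a given currency
    table: int


def to_information(observation: dict, payment_id: str) -> Payment:
    """Add identity, structure, and semantics to a raw observation."""
    amount, currency, table = observation["payload"].split(";")
    return Payment(payment_id, float(amount), currency, int(table.split()[-1]))


def needs_review(payment: Payment, threshold: float = 40.0) -> bool:
    """Service: knowledge as information put to use in a decision."""
    return payment.amount > threshold


payment = to_information(raw_observation, "pay-001")
```

The raw dictionary can be discarded once the `Payment` has been built, which mirrors the difference between resources that are consumed and assets that are managed.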

Ontologies, which are meant to encompass each and every kind of knowledge, are ideally suited for the management of whatever pertains to enterprise architecture, thesauri, models, heuristics, etc.


That approach has been tested with the Caminao ontological kernel using OWL2; a beta version is available for comments on the Stanford/Protégé portal with the link: Caminao Ontological Kernel (CaKe_).

Conclusion: From Metadata to Machine Learning

The significance of the distinction between data and information shows up at the two ends of the spectrum:

On one hand, it clarifies the meaning of metadata, to be understood as attributes of observations independently of semantics, a dimension that plays a critical role in machine learning.

On the other hand, enshrining the distinction between what can be known of individual facts or phenomena and what can be abstracted into categories enables an open and dynamic knowledge management, also a critical requisite for machine learning.



About the ‘S’ in MBSE


As demonstrated by a simple Google search, the MBSE acronym seems to be widely and consistently understood. Yet, the consensus about ‘M’ standing for models comes with different meanings for ‘S’ standing either for software or different kinds of systems.

Tools At Hand (Annette Messager)

In practice, the scope of model-based engineering has been mostly limited to design-to-code (‘S’ for software) and manufacturing (‘S’ for physical systems); leaving the engineering of symbolic systems like organizations largely overlooked.

Models, Software, & Systems

Models are symbolic representations of actual (descriptive models) or contrived (prescriptive models) domains. Applied to systems engineering, models are meant to serve specific purposes: requirements analysis, simulation, software design, etc. With software as the end-product of system engineering, design models can be seen as a special case of models characterized by target (computer code) and language (executable instructions). Hence the letter ‘S’ in the MBSE acronym, which can stand for ‘system’ as well as ‘software’.

As far as practicalities are concerned, the latter is the usual understanding, specifically for the use of design models to generate code, either for software applications, or as part of devices combining software and hardware.

When enterprise systems are taken into consideration, such a limited perspective comes with consequences:

  • It puts the focus on domain specific implementations, ignoring the benefits for enterprise architecture.
  • It perpetuates procedural processes built from predefined activities instead of declarative ones governed by the status of artefacts.
  • It gives up on the conceptual debt between models of business and organization on one side, models of systems on the other side.

These stand in the path of the necessary integration of enterprise architectures immersed in digital environments.

Organizations as Symbolic Systems

As social entities enterprises are set in symbolic realms: organizational, legal, and monetary. Now, due to the digital transformation, even their operations are taking a symbolic turn. So, assuming models could be reinstated as abstractions at enterprise level, MBSE would become the option of choice, providing a holistic view across organizations and systems (conceptual and logical models) while encapsulating projects and applications (design models).

MBSE provides a holistic view of organisations and systems.

That distinction between symbolic and actual alignments, the former with conceptual and logical models set between organization and systems, the latter with design models set between projects and applications, is the cornerstone of enterprise architecture. Hence the benefits of implementing it through model-based system engineering.

Leveraging MBSE

While MBSE frameworks supporting the final cycle of engineering (from design downstream) come with a proven track record, there is nothing equivalent upstream linking business and organization to systems, except for engineering silos using domain specific languages. Redefined in terms of enterprise architecture abstractions, MBSE could bring leveraged benefits all along the development process independently of activity, skills, organization or methods, for enterprises as well as services and solutions providers.

As a modeling framework, it would enhance the traceability and transparency for products (quality) as well as processes (delays and budgets) along and across supply chains.

‘S’ For Service

Implemented as a service, MBSE could compound the benefits of cloud-based environments (accessibility, convenience, security, etc.), and could also be customized without undermining interoperability.

To that end, MBSE as a service could be reframed in terms of:

Customers (projects): services should address cross-organizational and architecture concerns, from business intelligence to code optimization, and from portfolio management to components release.

Policy (processes): services should support full neutrality with regard to organizations and methods, which implies that tasks and work units should be defined only with regard to the status of artifacts.

Messages (artefacts): the specification of artefacts must be strictly aligned with enterprise architecture layers.

Contracts (work units and outcomes): services are to support the definition of work units and the assessment of outcomes:

  • Work units are to be defined bottom-up from artefacts.
  • Outcomes are to be assessed with regard to work units.
  • Value in Models Transformations:
  • Transparency and Traceability: Two distinct model sets – Architecture Models and Implementation Models.

Endpoints (collaboration): if services are to be neutral with regard to the way they are provided, the collaboration between the wide range of parties involved is to be managed accordingly; that can only be achieved through a collaboration framework built on layered and profiled ontologies.
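The policy and contracts items above, with work units defined bottom-up from artefacts and governed only by their status, can be sketched as follows. The `Artifact` and `WorkUnit` classes and the two status values are assumptions made for the illustration, not part of any MBSE standard.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    name: str
    status: str = "pending"  # hypothetical statuses: "pending" or "done"


@dataclass
class WorkUnit:
    name: str
    inputs: list      # artifacts that must be done before the unit can start
    output: Artifact  # artifact the unit is meant to deliver

    def is_ready(self) -> bool:
        """Readiness depends only on artifact statuses, not on a fixed procedure."""
        inputs_done = all(a.status == "done" for a in self.inputs)
        return inputs_done and self.output.status != "done"


def ready_units(units: list) -> list:
    """Names of the work units that can be carried out now."""
    return [u.name for u in units if u.is_ready()]


requirements = Artifact("requirements", status="done")
analysis_model = Artifact("analysis model")
design_model = Artifact("design model")

units = [
    WorkUnit("analysis", [requirements], analysis_model),
    WorkUnit("design", [analysis_model], design_model),
]
```

No sequence of activities is declared anywhere: marking `analysis_model` as done is what makes the design unit eligible, which is the sense in which such processes are declarative rather than procedural.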

As a concluding remark, cross-breeding MBSE with Software as a Service (SaaS) could help to integrate systems and knowledge architectures, paving the way to a comprehensive deployment of machine learning technologies.


Squared Outline: Models As Currency

Like every artifact, models can be defined by nature and function. With regard to nature, models are symbolic representations, descriptive (categories of actual instances) or prescriptive (blueprints of artifacts). With regard to function, models can be likened to currency, as they serve as means of exchange, instruments of measure, or repository.

Along that understanding, models can be neatly characterized by their intent:

  1. No use of models: direct exchange (barter) can be achieved between business analysts and software engineers.
  2. Models are needed as medium supporting exchange between organizational units with different business or technical concerns.
  3. Models are used to assess contents with regard to size, complexity, quality, …
  4. Models are kept and maintained for subsequent use or reuse.

Depending on organizations, providers and customers could then be identified, as well as modeling languages.


Squared Outline: Metrics

As is the case with every measurement, software engineering metrics must be defined by clear targets and purposes, and using them should affect neither.

On that account, a clear distinction should be maintained between business value (set independently of supporting systems), the size and complexity of functionalities, and the work effort needed for their development. As far as systems are concerned, the Function Points approach can be defined with regard to the nature of requirements (business or system), and their scope (primary for artifact, adjustment for architecture):

  1. Measures of business requirements are based on intrinsic domain complexity (domains function points, or DFP), adjusted for activities (adjustment function point, or AFP); they are set at artifact level independently of operational constraints or supporting systems.
  2. Business requirements metrics are added up and adjusted for operational constraints.
  3. Functional requirements measures target the subset of business requirements meant to be supported by systems. As such they are best defined at use case level (use case function points, or UCFP).
  4. Metrics for quality of service may be specific to functionalities or contingent on architectures and operational constraints.

Whatever the difficulties of implementation, function points remain the only principled approach to software and systems assessment, and consequently to reliable engineering costs/benefits analysis and planning.
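The first two measures listed above can be sketched as a simple roll-up. The arithmetic is an assumption: the text does not say how adjustments are combined, so this sketch adds AFP to DFP at artifact level and applies operational constraints as a multiplicative factor on the total. The requirement names and point values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessRequirement:
    name: str
    dfp: float  # domain function points: intrinsic domain complexity
    afp: float  # adjustment function points: adjustment for activities

    def points(self) -> float:
        # Assumption: the activity adjustment is additive at artifact level.
        return self.dfp + self.afp


def business_size(requirements: list, operational_factor: float = 1.0) -> float:
    """Add up artifact-level measures, then adjust for operational constraints.

    Assumption: operational constraints act as a single multiplicative factor.
    """
    return sum(r.points() for r in requirements) * operational_factor


requirements = [
    BusinessRequirement("reservations", dfp=10.0, afp=2.0),
    BusinessRequirement("billing", dfp=6.0, afp=1.0),
]
```

Whatever the actual weighting scheme, keeping the artifact-level measure separate from the operational adjustment preserves the distinction between intrinsic complexity and the constraints of a given deployment.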


Squared Outline: Agile

The Agile development model should not be seen as a panacea or identified with specific methodologies. Instead it should be understood as a default option to be applied whenever phased solutions can be factored out.

Agile (a,b) versus phased (d,b,c) development processes

  1. Scope: Of the twelve agile principles, ten apply to any kind of development, and only two are specific, namely shared ownership and continuous delivery.
  2. Characteristics: Assuming conditions are met, agile software engineering can be fully and neatly defined by a combination of user stories and iterative development.
  3. Alternative: When conditions cannot be met, i.e., when phased solutions are required, model-based system engineering frameworks should be used to integrate business-driven projects (agile) with architecture oriented ones (phased).
  4. Variants and extensions: Even when conditions about shared ownership and continuous delivery are met, scaling issues may have to be taken into account; in that case they should be sorted out between broader business objectives on one hand, systems architecture engineering on the other hand.

These guidelines are not meant to define how agile projects are to be carried out, only to determine their scope and relevance along other systems engineering processes.


Squared Outline: Enterprise Architecture

Whatever their nature, architectures can be defined as structured collections of assets and mechanisms shared by a set of active entities with common purposes: houses for dwelling, factories for manufacturing processes, office buildings for administrative ones, human beings for living, etc.

Layers of Problems & Solution

Along that reasoning enterprise architectures should be defined in terms of one distinction and three layers:

  1. A distinction between specific and changing business contexts and opportunities on one hand, shared and stable capabilities on the other hand (represented with the Zachman framework above).
  2. The enterprise layer deals with the representation of business environment and objectives (aka business model), organization and processes.
  3. The system layer deals with the functionalities of supporting systems independently of platforms.
  4. The platform layer deals with actual systems implementations.

It must be noted that while the layered perspective is widely agreed (names may differ), taxonomies often overlap.


Redeeming Conceptual Debts


To take advantage of their immersion into digital environments enterprises have to differentiate between data (environment’s facts), information (systems’ representations), and knowledge (enterprise behavior).

Outside / Insight (Anna Hulacova)

That cannot be achieved without ironing out the semantic discrepancies between corresponding representations.

Symbolic Representations

In line with the Symbolic System modeling paradigm, the aim of computer systems is to manage the symbolic representations of business objects and processes pertaining to enterprise contexts and concerns. That view can be summarized in terms of maps and territories:

Maps and territories of systems and their environment

Behind the various labels and modus operandi, maps can be defined on three basic layers:

  • Conceptual models, targeting enterprise organization and business independently of supporting systems.
  • Logical models, targeting the symbolic objects managed by supporting systems as surrogates of business objects and activities.
  • Physical models, targeting the actual implementation of symbolic surrogates as binary objects.

Pagoda Architecture Blueprint

These maps can be aligned with commonly agreed enterprise architecture layers, respectively for organizations and processes, systems functionalities, and platforms, with a fourth added for analytical models of business environments.

Conceptual Debt

Ideally, that alignment should pave the way to the integration of systems and knowledge architectures, as represented by the Pagoda blueprint:

Insofar as systems engineering is concerned, that would require two kinds of transformations: from conceptual to logical models (aka analysis), and from logical to physical models (aka design).

While the latter is just a matter of expertise (thanks to the GoF), that’s not the case for the former, which has to deal with a semantic gap between descriptions of specific and changing business domains and organizations on one side, generic and stable systems architectures on the other side.

As a result, what can be termed a conceptual debt has accumulated with the number of logical models supporting physical ones without the backing of relevant ones for business or organization. The objective is therefore to bring all models into a broader knowledge architecture.

Models & Ontologies

As introduced by Greek philosophers, ontologies are systematic accounts of whatever is known about a domain of concern. From that point, three basic observations can be made:

  1. Ontologies are made of categories of things, beings, or phenomena; as such they may range from lexicon or simple catalogs to philosophical doctrines.
  2. Ontologies are driven by cognitive (i.e., non-empirical) purposes, namely the validity and consistency of symbolic representations.
  3. Ontologies are meant to be directed at specific domains of concerns, whatever their epistemic nature: engineering, business, politics, religions, mythologies, astrology, etc.

With regard to models, only the second observation puts ontologies apart: compared to models, ontologies are about understanding and are not necessarily driven by empirical purposes.

On that account ontologies appear as an option of choice for the integration of symbolic representations:

  • Data: instances identified at territory level, associated with terms or labels; they are mapped to business intelligence (environments) and operational (systems) models.
  • Information: categories associated with sets of instances; categories can be used for requirements analysis or software design.
  • Knowledge: ideas or concepts connect changing and overlapping sets of terms and categories; documents can be associated to any kind of item.

With models consistently mapped to ontologies, the conceptual debt could be restructured in the broader context of enterprise knowledge architecture.

Ontologies & Knowledge

As expounded by Davis, Shrobe, and Szolovits in their pivotal article, knowledge is made of five constituents:

  1. Surrogates, used as symbolic counterparts of actual objects and phenomena.
  2. Ontological commitments defining the categories of things that may exist in the domain under consideration.
  3. Fragmentary theory of intelligent reasoning defining what things can do or can be done with.
  4. Medium making knowledge understandable by computers.
  5. Medium making knowledge understandable by humans.

Points 1 and 5 are not affected by the conceptual gap, the former being dealt with through the anchoring of identified individuals to surrogates (see below), and the latter through human interfaces. That leaves points 2-4 as the conceptual hub where information models have to be integrated into knowledge architecture.

Assuming RDF (Resource Description Framework) graphs are used for knowledge representation (point 4), and taking a restaurant for example, the contents of information models (point 2) will be denoted by:

  • Primary nodes (rectangles), for elements specific to cooking and customers relationship management, to be decorated with features (bottom right).
  • Connection nodes (circles and arrows), for semantically neutral (aka syntactic) associations to be uniformly implemented across domains, e.g., with predicate calculus (bottom left).
  • Semantic connectors supporting both syntactic and semantic associations (bottom, middle).

Inserting information into knowledge architecture

Using ontologies to integrate models into knowledge architecture is to enable the restructuring of the conceptual debt.
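The separation of neutral and domain-specific associations in the restaurant graph can be sketched with plain subject-predicate-object tuples. This is a simplification that stands in for RDF triples without using any RDF library; the node names, the predicate names, and the choice of which predicates count as neutral are assumptions made for the example.

```python
# Hypothetical set of semantically neutral (syntactic) predicates.
NEUTRAL_PREDICATES = {"isA", "partOf"}

# Each triple is (subject, predicate, object), in the spirit of an RDF graph.
triples = [
    ("Starter", "isA", "Dish"),
    ("Kitchen", "partOf", "Restaurant"),
    ("Customer", "books", "Table"),
    ("Wine", "pairsWith", "Dish"),
]


def split_by_semantics(graph: list) -> tuple:
    """Separate neutral associations from domain-specific ones."""
    neutral = [t for t in graph if t[1] in NEUTRAL_PREDICATES]
    specific = [t for t in graph if t[1] not in NEUTRAL_PREDICATES]
    return neutral, specific


def neighbours(node: str, graph: list) -> list:
    """Nodes directly connected to `node`, whatever the predicate."""
    linked = {o for s, _, o in graph if s == node}
    linked |= {s for s, _, o in graph if o == node}
    return sorted(linked)


neutral, specific = split_by_semantics(triples)
```

Neutral associations can be processed the same way in every domain, whereas the specific ones carry the business meaning that has to be carried over into the knowledge architecture.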

Minding Semantic Gaps

Keeping with the financial metaphor, conceptual debts can be expressed in terms of spreads between models, and as such could be restructured through models transformation.

To begin with, all representations have to be anchored to environments through identified (#) instances.

Anchoring systems to their environment

Then, instances are to be associated to categories according to features (properties or relationships):

  • Customers, reservations, tables, and waiters are identified individuals managed through symbolic surrogates.
  • Names of dishes and ingredients do not refer to symbolic surrogates representing business objects, but are just labels pointing to recipes (documents).
  • Idem for the names of wines, except for exceptional vintages with identified bottles to be managed through symbolic surrogates.
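The difference between identified individuals and mere labels can be sketched with two small types. `Surrogate`, `Label`, `represent_wine`, the identifiers, and the document paths are all hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Surrogate:
    """Symbolic counterpart of an identified (#) individual managed by the system."""
    identity: str
    category: str


@dataclass(frozen=True)
class Label:
    """A name with no managed identity, only pointing to a document."""
    text: str
    document: str


def represent_wine(name: str, bottle_id: Optional[str] = None) -> Union[Surrogate, Label]:
    """Wines are labels, except for identified bottles of exceptional vintages."""
    if bottle_id is not None:
        return Surrogate(identity=bottle_id, category="Wine")
    # Hypothetical document path for the wine review.
    return Label(text=name, document="reviews/" + name.lower().replace(" ", "-"))


customer = Surrogate(identity="#customer-17", category="Customer")
dish = Label(text="Coq au vin", document="recipes/coq-au-vin")
house_red = represent_wine("House red")
rare_bottle = represent_wine("Exceptional vintage", bottle_id="#bottle-004")
```

The same name can thus end up on either side depending on whether the business needs to track the individual item, which is a modeling decision rather than a property of the thing itself.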

As defined above, these models can be equivalently expressed as ontologies:

  • Properties are single-valued attributes.
  • Relationships define links between categories.
  • Aspects are structured sets of features meant to be valued through category instances.
  • Documents are contents to be accessed directly or through networks (e.g., preparations or wine reviews).

Fleshing out model backbone with features, relationships, and documents (black, italic)

It must be noted that the distinction between neutral and specific contents is not meant to be universal but to be justified by pragmatic concerns, for instance:

  • Addresses are not defined as aspects but as category instances so that surrogates of actual addresses can be used to optimize deliveries.
  • Links to customers and addresses, being self-explanatory, can be defined as non specific.
  • The relationship from dishes to ingredients is structured and specific.

Sorting out truth-preserving constructs from domain specific ones is a key success factor for models transformation, and consequently debt restructuring.

Restructuring The Debts

Restructuring financial debts means redefining assets and incomes; with regard to systems it would mean reassessing architectures with regard to value chains.

To begin with, the Pagoda blueprint central pillar is to support the integration of systems and knowledge architectures and consequently the dynamic alignment of systems capabilities, meant to be stable and shared, with business opportunities, by nature changing and specific.

Then, the pairing of systems and knowledge architectures, like a DNA double helix, is to be used to restructure both technical and conceptual debts.

Pairing assets and incomes across architectures

With regard to technical debts, restructuring should not present significant difficulties:

  • Pairing income flows (applications) to tangible assets (platforms) can be done at data level.
  • Model transformations between data (code) and information (models) levels can be achieved using homogeneous domain specific and programming languages.

Things are more complex with conceptual debts, for pairing as well as transformations:

  • There is no direct pairing because value chains (processes) are set across assets (organization).
  • Model transformations are to bridge the semantic gap between the symbolic representations of environments (knowledge) and systems (information).

Nonetheless, these difficulties can be overcome by combining integrated architectures and ontologies.

Regarding the structure of the conceptual debt, the income part is to be defined through business objectives (customers, products, channels, supply chain, etc.), and assets to be defined by corresponding enterprise architectures capabilities.

How to mind the gap between external and systems representations.

Regarding models transformations, ontologies will be used to mind the semantic gap between environments (knowledge) and systems (information) representations:

  • Power-types: describe instances of categories (age, income, education, …).
  • Specialization and generalization: defined with regard to intent, subsets for individuals (wine, gender), sub-types for aspects (temperature, serve in menu).
  • Knowledge based relationships (dashed line): used to describe objects and phenomena, actual, planned, or expected (face recognition of customers, influence of weather on dishes, association of wines and dishes, …).
  • Concepts: introduced to relate information and knowledge, e.g., gourmet.

Ontological descriptions

With the backbones of symbolic representations soundly anchored to environments, it would be possible to complement functional and logical models with their conceptual counterparts and by doing so to eliminate conceptual debts. A symmetric policy could be applied to refactoring in order to redeem the technical debt associated with legacy code.

Managing Conceptual Debt

Like financial ones, conceptual debts are facts of life that have to be managed on a continuous basis, which entails:

  • A separate management of models directly tied to systems, and ontologies with broader justification.
  • A distinction between a kernel (aka knowledge engine), environment profiles, and business domains.

EA & Knowledge Management
