Focus: Data vs Information

Preamble

Distinctions must serve a purpose and be assessed accordingly. On that account, what would be the point of setting apart data and information, and on what basis could that be done.


From Data Stripes to Information Structure (Victor Vasarely)

Until recently the two terms seem to have been used indifferently; until, that is, the digital revolution. But the generalization of digital surroundings and the tumbling down of traditional barriers surrounding enterprises have upturned the playground as well as the rules of the game.

Previously, with data analytics, information modeling, and knowledge management mostly carried out as separate threads, there wasn’t much concerns about semantic overlaps; no more. Lest they fall behind, enterprises have to combine observation (data), reasoning (information), and judgment (knowledge) as a continuous process. But such integration implies in return more transparency and traceability with regard to resources (e.g external or internal) and objectives (e.g operational or strategic); that’s when a distinction between data and information becomes necessary.

Economics: Resources vs Assets

Understood as a whole or separately, there is little doubt that data and information have become a key success factor, calling for more selective and effective management schemes.

Being immersed in digital environments, enterprises first depend on accurate, reliable, and timely observations of their business surroundings. But in the new digital world the flows of data are so massive and so transient that mining meaningful and reliable pieces is by itself a decisive success factor. Next, assuming data flows duly processed, part of the outcome has to be consolidated into models, to be managed on a persistent basis (e.g customer records or banking transactions), the rest being put on temporary shelves for customary uses, or immediately thrown away (e.g personal data subject to privacy regulations). Such a transition constitutes a pivotal inflexion point for systems architectures and governance as it sorts out data resources with limited lifespan from information assets with strategic relevance. Not to mention the sensibility of regulatory compliance to data management.

Processes: Operations vs Intelligence

Making sense of data is pointless without putting the resulting information to use, which in digital environments implies a tight integration of data and information processing. Yet, as already noted, tighter integration of processes calls for greater traceability and transparency, in particular with regard to the origin and scope: external (enterprise business and organization) or internal (systems). The purposes of data and information processing can be squared accordingly:

  • The top left corner is where business models and strategies are meant to be defined.
  • The top right corner corresponds to traditional data or information models derived from business objectives, organization, and requirement analysis.
  • The bottom line correspond to analytic models for business (left) and operations (right).

Squaring the purposes of Data & Information Processing

That view illustrates the shift of paradigm induced by the digital transformation. Prior, most mappings would be set along straight lines:

  • Horizontally (same nature), e.g requirement analysis (a) or configuration management (b). With source and destination at the same level, the terms employed (data or information) have no practical consequence.
  • Vertically (same scope), e.g systems logical to physical models (c) or business intelligence (d). With source and destination set in the same semantic context the distinction (data or information) can be ignored.

The digital transformation makes room for diagonal transitions set across heterogeneous targets, e.g mapping data analytics with conceptual or logical models (e).

That double mix of levels and scopes constitutes the nexus of decision-making processes; their transparency is contingent on a conceptual distinction between data and information.

At operational level the benefits of the distinction are best expressed through what is commonly known as the OODA (Observation, Orientation, Decision, Action) loop:

  • Data is used to align operations (systems) with observations (territories).
  • Information is used to align categories (maps) with objectives.

Roles of Data (red) & Information (blue) in integrated decision-making processes

Then, the conceptual distinction between data and information is instrumental for the integration of operational and strategic decision-making processes:

  • Data analytics feeding business intelligence
  • Information modeling supporting operational assessment.

Not by chance, these distinctions can be aligned with architecture layers.

Architectures: Instances vs Categories

Blending data with information overlooks a difference of nature, the former being associated with actual instances (external observation or systems operations), the latter with symbolic descriptions (categories or types). That intrinsic difference can be aligned with architecture layers (resources are consumed, assets are managed), and decision-making processes (operations deal with instances, strategies with categories).

With regard to architectures, the relationship between instances (data) and categories (information) can be neatly aligned with capability layers, as represented by the Pagoda blueprint:

  • The platform layer deals with data reflecting observations (external facts) and actions (system operations).
  • The functional layer deals with information, i.e the symbolic representation of business and organization categories.
  • The business and organization layer defines the business and organization categories.

It must also be noted that setting apart what pertains to individual data independently of the information managed by systems clearly props up compliance with privacy regulations.


Architectures & Decision-making

With regard to decision-making processes, business intelligence uses the distinction to integrate levels, from operations to strategic planning, the former dealing with observations and operations (data), the latter with concepts and categories (information and knowledge).

Representations: Knowledge Architecture

As noted above, the distinction between data and information is a necessary counterpart of the integration of operational and intelligence processes; that implies in return to bring data, information, and knowledge under a common conceptual roof, respectively as resources, assets, and service:

  1. Resources: data is captured through continuous and heterogeneous flows from a wide range of sources.
  2. Assets: information is built by adding identity, structure, and semantics to data.
  3. Services: knowledge is information put to use through decision-making.

Ontologies, which are meant to encompass all and every kind of knowledge, are ideally suited for the management of whatever pertains to enterprise architecture, thesaurus, models, heuristics, etc.

CaKe_DataInfoKnow

That approach has been tested with the Caminao ontological kernel using OWL2; a beta version is available for comments on the Stanford/Protégé portal with the link: Caminao Ontological Kernel (CaKe 2.0).

Conclusion: From Metadata to Machine Learning

The significance of the distinction between data and information shows up at the two ends of the spectrum:

On one hand, it straightens the meaning of metadata, to be understood as attributes of observations independently of semantics, a dimension that plays a critical role in machine learning.

On the other hand, enshrining the distinction between what can be known of individuals facts or phenomena and what can be abstracted into categories is to enable an open and dynamic knowledge management, also a critical requisite for machine learning.

FURTHER READING

External Links

Squared Outline: Models As Currency

As every artifact, models can be defined by nature and function. With regard to nature, models are symbolic representations, descriptive (categories of actual instances) or prescriptive (blueprints of artifacts). With regard to function, models can be likened to currency, as they serve as means of exchange, instruments of measure, or repository.

Along that understanding, models can be neatly characterized by their intent:

  1. No use of models, direct exchange (barter) can be achieved between business analysts and software engineers.
  2. Models are needed as medium supporting exchange between organizational units with different business or technical concerns.
  3. Models are used to assess contents with regard to size, complexity, quality, …
  4. Models are kept and maintained for subsequent use or reuse.

Depending on organizations, providers and customers could then be identified, as well as modeling languages.

FURTHER READINGS

Squared Outline: Metrics

As it’s the case of every measurement, software engineering metrics must be defined by clear targets and purposes, and using them shouldn’t affect neither of them.

On that account, a clear distinction should be maintain between business value (set independently of supporting systems), the size and complexity of functionalities, and the work effort needed for their development. As far as systems are concerned, the Function Points approach can be defined with regard to the nature of requirements (business or system), and their scope (primary for artifact, adjustment for architecture):

  1. Measures of business requirements are based on intrinsic domain complexity (domains function points, or DFP), adjusted for activities (adjustment function point, or AFP); they are set at artifact level independently of operational constraints or supporting systems.
  2. Business requirements metrics are added up and adjusted for operational constraints.
  3. Functional requirements measures target the subset of business requirements meant to be supported by systems. As such they are best defined at use case level (use case function points (UCFP).
  4. Metrics for quality of service may be specific to functionalities or contingent on architectures and operational constraints.

Whatever the difficulties of implementation, function points remain the only principled approach to software and systems assessment, and consequently to reliable engineering costs/benefits analysis and planning.

FURTHER READING

Squared Outline: Agile

The Agile development model should not be seen as a panacea or identified with specific methodologies. Instead it should be understood as a default option to be applied whenever phased solutions can be factored out.

Agile (a,b) versus phased (d,b,c,) development processes
  1. Scope: Of the twelve agile principles, ten apply to any kind of development, and only two are specific, namely shared ownership and continuous delivery .
  2. Characteristics: Assuming conditions are met, agile software engineering can be fully and neatly defined by a combination of users stories and iterative development.
  3. Alternative: When conditions cannot be met, i.e when phased solutions are required, model-based system engineering frameworks should be used to integrate business-driven projects (agile) with architecture oriented ones (phased).
  4. Variants and extensions: Even when conditions about shared ownership and continuous delivery are met, scaling issues may have to be taken into account; in that case they should be sorted out between broader business objectives on one hand, systems architecture engineering on the other hand

These guidelines are not meant to define how agile projects are to be carried out, only to determine their scope and relevance along other systems engineering processes.

Further Reading

Squared Outline: Enterprise Architecture

Whatever their nature, architectures can be defined as structured collections of assets and mechanisms shared by a set of active entities with common purposes: houses for dwelling, factories for manufacturing processes, office buildings for administrative ones, human beings for living, etc.

EASquare_PbsSols
Layers of Problems & Solution

Along that reasoning enterprises architectures should be defined in terms of one distinction and three layers:

  1. A distinction between specific and changing business contexts and opportunities on one hand, shared and stable capabilities on the other hand (represented with the Zachman’s framework above).
  2. The enterprise layer deals with the representation of business environment and objectives (aka business model), organization and processes.
  3. The system layer deals with the functionalities of supporting systems independently of platforms.
  4. The platform layer deals with actual systems implementations.

It must be noted that while the layered perspective is widely agreed (names may differ), taxonomies often overlap.

Further Reading

Squared Outline: Actors


UML Actors (aka Roles) are meant to provide a twofold description of interactions between systems and their environment: organization and business process on one hand, system and applications on the other hand.

That can only be achieved by maintaining a conceptual distinction between actual agents, able to physically interact with systems, and actors (aka roles), which are their symbolic avatars as perceived by applications.

As far as the purpose is to describe interactions, actors should be primary characterized by the nature of language (symbolic or not), and identification coupling (biological or managed):

  1. Symbolic communication, no biological identification (systems)
  2. Analog communication, no biological identification (active devices or equipments)
  3. Symbolic communication, biological identification (people)
  4. Analog communication, biological identification (live organisms)

While there has been some confusion between actors (or roles) and agents, a clear-cut distinction is now a necessity due to the centrality of privacy issues, whether it is from business or regulatory perspective.

FURTHER READING

Squared Outline: States

States are used to describe relevant aspects in contexts and how the changes are to affect systems representations and behaviors.

On that account, events and states are complementary: the former are to notify relevant changes, the latter are to represent the partitions (²) that makes these changes relevant. Transitions are used to map the causes and effects of changes.

  1. State of physical objects.
  2. State of processes’ execution.
  3. State of actors’ expectations.
  4. State of symbolic representations.

Beside alignment with events, defining states consistently across objects, processes, and actors is to significantly enhance the traceability and transparency of architectures designs.

FURTHER READINGS

Squared Outline: Cases vs Stories

Use cases and users’ stories being the two major approaches to requirements, outlining their respective scope and purpose should put projects on a sound basis.

Cases & Stories

To that end requirements should be neatly classified with regard to scope (enterprise or system) and level (architectures or processes).

  • Users stories are set at enterprise level independently of the part played by supporting systems.
  • Use cases cut across users stories and consider only the part played by supporting systems.
  • Business stories put users stories (and therefore processes) into the broader perspective of business models.
  • Business cases put use cases (and therefore applications) into the broader perspective of systems capabilities.

Position on that simple grid should the be used to identify stakeholders and pick between an engineering model, agile or phased.

FURTHER READING

Squared Outline: Layers

The immersion of enterprises into digital environments is blurring the traditional distinctions between architecture layers. Hence the need of clarifying the basic notions.


The Pagoda Architecture Blueprint is derived from the Zachman’s framework

Beyond the differences in terminologies (layers, levels, tiers, etc), four basic taxonomies can be applied:

  1. Enterprise architecture: business processes and organization, systems, platforms (Pagoda blueprint).
  2. Functional architecture: interfaces, control, persistency, services (Model/View/Controller).
  3. Representation: physical, logical, conceptual (Pagoda blueprint).
  4. Economic intelligence: data, information, knowledge

While some alignments are intrinsic, making explicit use of taxonomies is useful because they serve specific purposes.

n.b. The term “application layer” is usually defined in the context of communication architectures.

Further Reading

Squared Outline: Events

In line with the Symbolic Systems paradigm, events are best understood in terms of interactions between systems (or enterprises) and the environment they represent (or live off).

Events with regard to nature and coupling

On that account events are to be associated with four kinds of notifications:

  1. Actual (aka non symbolic) changes in the state of objects.
  2. Actual (aka non symbolic) changes in the state of activities.
  3. Changes in the state of expectations (e.g requests/acknowledgments).
  4. Neutral changes in symbolic representations (e.g messages).

It must be noted that while synchronization constraints (e.g UML’s calls vs signals) characterize events’ communication semantics, they say nothing about events themselves.

That distinction is of a particular importance for the definition of time as a
change in a dedicated physical device, ensuring that, according to Einstein, events don’t happen at once.

FURTHER READINGS