Distinctions must serve a purpose and be assessed accordingly. On that account, what would be the point of setting apart data and information, and on what basis could that be done.
Until recently the two terms seem to have been used indifferently; until, that is, the digital revolution. But the generalization of digital surroundings and the tumbling down of traditional barriers surrounding enterprises have upturned the playground as well as the rules of the game.
Previously, with data analytics, information modeling, and knowledge management mostly carried out as separate threads, there wasn’t much concerns about semantic overlaps; no more. Lest they fall behind, enterprises have to combine observation (data), reasoning (information), and judgment (knowledge) as a continuous process. But such integration implies in return more transparency and traceability with regard to resources (e.g external or internal) and objectives (e.g operational or strategic); that’s when a distinction between data and information becomes necessary.
Economics: Resources vs Assets
Understood as a whole or separately, there is little doubt that data and information have become a key success factor, calling for more selective and effective management schemes.
Being immersed in digital environments, enterprises first depend on accurate, reliable, and timely observations of their business surroundings. But in the new digital world the flows of data are so massive and so transient that mining meaningful and reliable pieces is by itself a decisive success factor. Next, assuming data flows duly processed, part of the outcome has to be consolidated into models, to be managed on a persistent basis (e.g customer records or banking transactions), the rest being put on temporary shelves for customary uses, or immediately thrown away (e.g personal data subject to privacy regulations). Such a transition constitutes a pivotal inflexion point for systems architectures and governance as it sorts out data resources with limited lifespan from information assets with strategic relevance. Not to mention the sensibility of regulatory compliance to data management.
Processes: Operations vs Intelligence
Making sense of data is pointless without putting the resulting information to use, which in digital environments implies a tight integration of data and information processing. Yet, as already noted, tighter integration of processes calls for greater traceability and transparency, in particular with regard to the origin and scope: external (enterprise business and organization) or internal (systems). The purposes of data and information processing can be squared accordingly:
- The top left corner is where business models and strategies are meant to be defined.
- The top right corner corresponds to traditional data or information models derived from business objectives, organization, and requirement analysis.
- The bottom line correspond to analytic models for business (left) and operations (right).
That view illustrates the shift of paradigm induced by the digital transformation. Prior, most mappings would be set along straight lines:
- Horizontally (same nature), e.g requirement analysis (a) or configuration management (b). With source and destination at the same level, the terms employed (data or information) have no practical consequence.
- Vertically (same scope), e.g systems logical to physical models (c) or business intelligence (d). With source and destination set in the same semantic context the distinction (data or information) can be ignored.
The digital transformation makes room for diagonal transitions set across heterogeneous targets, e.g mapping data analytics with conceptual or logical models (e).
That double mix of levels and scopes constitutes the nexus of decision-making processes; their transparency is contingent on a conceptual distinction between data and information.
At operational level the benefits of the distinction are best expressed through what is commonly known as the OODA (Observation, Orientation, Decision, Action) loop:
- Data is used to align operations (systems) with observations (territories).
- Information is used to align categories (maps) with objectives.
Then, the conceptual distinction between data and information is instrumental for the integration of operational and strategic decision-making processes:
- Data analytics feeding business intelligence
- Information modeling supporting operational assessment.
Not by chance, these distinctions can be aligned with architecture layers.
Architectures: Instances vs Categories
Blending data with information overlooks a difference of nature, the former being associated with actual instances (external observation or systems operations), the latter with symbolic descriptions (categories or types). That intrinsic difference can be aligned with architecture layers (resources are consumed, assets are managed), and decision-making processes (operations deal with instances, strategies with categories).
With regard to architectures, the relationship between instances (data) and categories (information) can be neatly aligned with capability layers, as represented by the Pagoda blueprint:
- The platform layer deals with data reflecting observations (external facts) and actions (system operations).
- The functional layer deals with information, i.e the symbolic representation of business and organization categories.
- The business and organization layer defines the business and organization categories.
It must also be noted that setting apart what pertains to individual data independently of the information managed by systems clearly props up compliance with privacy regulations.
With regard to decision-making processes, business intelligence uses the distinction to integrate levels, from operations to strategic planning, the former dealing with observations and operations (data), the latter with concepts and categories (information and knowledge).
Representations: Knowledge Architecture
As noted above, the distinction between data and information is a necessary counterpart of the integration of operational and intelligence processes; that implies in return to bring data, information, and knowledge under a common conceptual roof, respectively as resources, assets, and service:
- Resources: data is captured through continuous and heterogeneous flows from a wide range of sources.
- Assets: information is built by adding identity, structure, and semantics to data.
- Services: knowledge is information put to use through decision-making.
That approach has been tested with the Caminao ontological kernel using OWL2; a beta version is available for comments on the Stanford/Protégé portal with the link: Caminao Ontological Kernel (CaKe_).
Conclusion: From Metadata to Machine Learning
The significance of the distinction between data and information shows up at the two ends of the spectrum:
On one hand, it straightens the meaning of metadata, to be understood as attributes of observations independently of semantics, a dimension that plays a critical role in machine learning.
On the other hand, enshrining the distinction between what can be known of individuals facts or phenomena and what can be abstracted into categories is to enable an open and dynamic knowledge management, also a critical requisite for machine learning.
- Systems, Information, Knowledge
- Knowledge Architecture
- Conceptual Debts
- EA and the Pagoda Architecture Blueprint
- EA: Entropy Antidote
- Open Concepts
- Models & Meta-models
- Ontologies & Models
- Open Ontologies: From Silos to Architectures
- Ontologies & Enterprise Architecture,
- Enterprise Governance & Knowledge
- Data Mining & Requirements Analysis
- Ontologies as Productive Assets
- Caminao Ontological Kernel (Protégé/OWL 2)
- Abstractions & Emerging Architectures
- Models Transformation
- Knowledge Based Models Transformation
- The Cases for Reuse
- The Economics of Reuse
- Legacy & Modernization