The limits of my language mean the limits of my world
The world is the totality of facts, not of things
The swift and widespread tide of Large language models (LLMs), and more generally of Generative AI technologies, seems to be relegating enterprise architecture (EA) to the back seat. But tides turn, and EA may return accompanied by a significant other that could enable the integration and interoperability of data analytics, information systems, and corporate knowledge. That’s the objective of the Caminao kernel (CaKe): an OWL-based ontological framework with built-in knowledge modalities.
Ontologies & Languages
Ontologies are about knowledge, more precisely about what can be said about realities, assuming that we don’t know for sure what reality is, or how many there can be, a particular challenge for enterprise governance.
Knowledge maps are by nature symbolic and as such must be expressed through language, knowledge graphs (KGs) being a de facto standard. That concomitance of contents (knowledge) and arrangements (graphs) induces some confusion, which masks the modalities of knowledge (the nature and degrees of knowns and unknowns); these modalities are generally ignored or uniformly represented through semantic connectors. A built-in distinction is thus needed between knowledge as facts (data and percepts), categories (information and reason), and concepts (ideas, values, intents, and judgments), with modalities defined accordingly:
- Conceptual (aka intensional), for the nature of mental representations: abstract, concrete, virtual, nominal, …
- Symbolic (aka logical), for the representation of existential, structural, and behavioral features deemed to be relevant: identification, instantiation, life cycle, communication, …
- Empiric (aka extensional), for the way facts are known: observed, asserted, assessed, deduced, …
Overlaps ensure the consistency of symbolic contents through thesauruses (facts/concepts), taxonomies (facts/categories), and ontologies (concepts/categories).
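As a minimal sketch, the three kinds of symbolic contents and their pairwise overlaps can be written down as a small lookup. The names `Content`, `CATALOGS`, and `catalog_for` are illustrative assumptions and are not part of CaKe.

```python
from enum import Enum


class Content(Enum):
    """The three kinds of symbolic contents distinguished in the text."""
    FACT = "fact"          # data and percepts (empiric modality)
    CATEGORY = "category"  # information and reason (symbolic modality)
    CONCEPT = "concept"    # ideas, values, intents (conceptual modality)


# Pairwise overlaps, as listed in the text. frozenset keys make the
# lookup independent of the order of the two arguments.
CATALOGS = {
    frozenset({Content.FACT, Content.CONCEPT}): "thesaurus",
    frozenset({Content.FACT, Content.CATEGORY}): "taxonomy",
    frozenset({Content.CONCEPT, Content.CATEGORY}): "ontology",
}


def catalog_for(a: Content, b: Content) -> str:
    """Return the kind of catalog that keeps contents a and b consistent."""
    if a is b:
        raise ValueError("an overlap needs two different kinds of contents")
    return CATALOGS[frozenset({a, b})]


print(catalog_for(Content.CONCEPT, Content.FACT))  # thesaurus
```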
Models & Ontologies
Ignoring knowledge modalities opens the door to a confusion between models and ontologies; a confusion especially detrimental when new technologies like LLMs (for documents) or data-mining (for facts) are combined with established ones. That confusion is reinforced by the legacy of an outdated data modeling paradigm which, besides lip service, makes no conceptual room for information and knowledge, as illustrated by the rebranding of Entity-Relationship (E/R) models as Knowledge graphs (KGs).
Take the simple example of an actual character (WA Mozart) and his virtual avatars in movies. Except for a few ontological connectors (eg isa or include), E/R modeling assumes that the semantics of entities and relationships is domain-specific whatever their epistemic nature: nominal tokens, documents (music pieces, movies), managed records (subscribers), or actual people.
Such miscellany of nodes (entities) and connecting nodes (relationships) can be misguidedly equated with (flattened) knowledge graphs, and consequently ontologies, with knowledge modalities being represented like standard E/R relationships. Such an approach comes with two major caveats:
- The lack of transparency regarding the semantics of connectors (ontological or domain specific) undermines designed interoperability between applications or systems set on different kinds of knowledge, typically implicit or nominal (eg LLMs) and explicit (eg ontologies).
- Any scaling up is bound to produce spaghetti messes of exponential complexity.
The primary objective of ontology designers should therefore be to sort symbolic contents with regard to their nature: facts, concepts, or categories.
Representing the example with the OWL/WebProtégé Caminao Kernel (CaKe):
- The term “Mozart” comes with a primary nominal reference (nominRef#_) to actual WA Mozart, and a nominal reference (nominRef_) to the movie Amadeus
- The movie includes (composition) fictional characters with identifying references (linkRef#_), and references (linkRef_) to the actual WA Mozart
- The actual WA Mozart appears as a person and a musician, musician being defined as an actual role with a conceptual reference to agents, contrary to fictional roles
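The references listed above can be sketched as a small typed graph in plain Python. This is one possible reading of the example, not CaKe's actual OWL encoding: the connector names (`nominRef#_`, `nominRef_`, `linkRef#_`, `linkRef_`) come from the text, while the `Node` and `Link` types, the modality assigned to each node, and the direction of each link are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    name: str
    modality: str  # conceptual modality: "nominal", "concrete", or "virtual"


class Link(NamedTuple):
    source: Node
    connector: str
    target: Node


TERM = Node("Mozart", "nominal")            # the word itself
PERSON = Node("WA Mozart", "concrete")      # the actual person
MOVIE = Node("Amadeus", "concrete")         # a document (assumed concrete)
CHARACTER = Node("Mozart as portrayed in Amadeus", "virtual")

LINKS = [
    Link(TERM, "nominRef#_", PERSON),       # primary nominal reference
    Link(TERM, "nominRef_", MOVIE),         # secondary nominal reference
    Link(MOVIE, "linkRef#_", CHARACTER),    # identifying reference (assumed direction)
    Link(CHARACTER, "linkRef_", PERSON),    # plain reference to the actual person
]


def targets(source: Node, connector: str) -> list[Node]:
    """Nodes reached from source through the given connector."""
    return [l.target for l in LINKS
            if l.source == source and l.connector == connector]


def primary_referent(term: Node) -> Optional[Node]:
    """What a term primarily names, if a primary reference is recorded."""
    found = targets(term, "nominRef#_")
    return found[0] if found else None


print(primary_referent(TERM).name)  # WA Mozart
```

Because each node carries its own modality and each link its own connector, the actual person and the fictional character stay distinguishable even though both are reached from the same term.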
In order to take advantage of their immersion in digital environments enterprises must ensure the interoperability of all their intangible resources and assets independently of supporting languages and tools. With ontological prisms as symbolic hubs, interfaces with EA engineering platforms would be of two kinds:
Cross interfaces, to provide shallow access to all resources according to names (thesauruses), structure (XML), and purpose (RDF):
- Thesauruses ensure the consistency of the semantics attached to names
- Resource Description Framework (RDF) is the Swiss Army Knife for the graph-based representation of symbolic contents
- Extensible Markup Language (XML) is the Swiss Army Knife for the representation of documents structure, ensuring the storing, transmitting, and reconstructing of arbitrary contents independently of their meaning
These interfaces give access to symbolic artifacts independently of purposes and semantics.
Dedicated interfaces pertain to symbolic artifacts according to purpose and semantics, eg:
- Data files: JSON
- Data/Facts: SQL (datasets), statistical series (SAS, MATLAB, SPSS), references (NIEM), Object-oriented analysis (ORM, NIAM)
- Information/Categories: general (UML) and specialised (Archimate, SysML, BPMN) modeling languages and methods (OOD)
- Knowledge and concepts: ontology (OWL) and logic (Prolog) languages
Using OWL, ontology programming interfaces (OPIs) could be developed with general purpose environments (eg XML, UML, SQL) or specific ones (eg LLMs, Prolog, ORM).
Caminao Kernel (CaKe)
This is a brief summary of CaKe’s built-in ontological modalities and properties (CaKe 4.0).
Ontological modalities can be understood as an ultimate form of meta-models bringing under a common roof empiric, conceptual, and logical representations. The Caminao kernel introduces three kinds of modalities:
- Empiric modalities characterise extensional percepts: observations, assertions, assessments, deductions, managed surrogates
- Conceptual modalities characterise intensional representations: abstract (no direct reference to environments), concrete (pertaining to physical or symbolic environments), virtual (pertaining to hypothetical or fictional environments), and nominal (pertaining to the vocabulary used to label environments), …
- Logical modalities characterise symbolic descriptions, ie the categories used to describe objects and phenomena: identification, instantiation, agency, containment, life cycle, communication, …
Postfixes are used to mark CaKe built-in constructs (_), roots of conceptual modalities (Ξ), modeling templates (≈), and user-defined anchors (#).
Modalities are combined with ontological properties in order to support the interoperability and transparency of ontologies.
Interoperability means that representations can be combined across modalities, and transparency can be achieved through filters applied to modalities. To that effect CaKe adds generic (ontological) connectors to OWL/Protégé properties:
- Nominal connectors associate terms to facts, concepts, or categories
- Instance connectors are used to define reference and execution links between individuals
- Category connectors are used to define structural and functional relationships between descriptions of sets, types, or classes
- Logic connectors associate expressions to facts, concepts, or categories
- Ontology connectors are used to build thesauruses
- Organizational connectors are used to define contexts
- Process connectors are used to define activities
Tailored views can thus be built by selecting properties using Protégé filters.
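The idea of filtering on connector kinds can be sketched in plain Python, outside of Protégé. This is an illustration of the principle and not a Protégé API: `nominRef_` and `linkRef_` appear in the Mozart example, but their assignment to a kind, the property name `subtypeOf_`, and the `view` helper are assumptions.

```python
from typing import Iterable

Triple = tuple[str, str, str]  # (subject, property, object)

# Kind of connector for each property (assumed assignment, see lead-in).
PROPERTY_KIND = {
    "nominRef_": "nominal",     # terms to facts, concepts, or categories
    "linkRef_": "instance",     # reference links between individuals
    "subtypeOf_": "category",   # hypothetical structural relationship
}

TRIPLES: list[Triple] = [
    ("Mozart (term)", "nominRef_", "WA Mozart"),
    ("Mozart (character)", "linkRef_", "WA Mozart"),
    ("Composer", "subtypeOf_", "Musician"),
]


def view(triples: Iterable[Triple], kinds: set[str]) -> list[Triple]:
    """Keep only the triples whose property belongs to one of the kinds."""
    return [t for t in triples if PROPERTY_KIND.get(t[1]) in kinds]


# A view restricted to vocabulary, ignoring individuals and categories.
print(view(TRIPLES, {"nominal"}))
```

Selecting `{"instance", "category"}` instead would hide the vocabulary layer and keep only individuals and their descriptions.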
Use Cases Overview
Whatever the perspective (facts, concepts, or categories), enterprise environments are in constant change, and the primary objective of ontologies should be to ensure the consistency of changes between factual observations, business models and organisation, and supporting systems.
Business is by nature opportunistic, and success consequently depends on temporal knowledge set at the hub of different time frames:
- Changes in data reflect changes in environments, and associated observations are by themselves time dependent
- Changes in concepts are driven by social values and purposes which may or may not be directly determined by changes in environments
- Changes in categories can be emerging from facts (bottom-up) as well as designed from concepts (top-down), before being embodied in organisations and processes
Taking for granted that knowledge is meant to be shared across a wide range of social entities governed by different purposes, its management must ensure the continuity and consistency of representations across the respective time frames; to that end three primary activities must be integrated:
- Naming: putting labels on relevant individuals or sets thereof in environments
- Thinking: defining the mental representations (meanings) necessary to support communication
- Modeling: defining the symbolic representations meant to be shared across organisations and/or systems
Ontologies can achieve such integration with two mechanisms:
- Anchors, to associate elements to their original dimension: identified facts (#), conceptual modalities (Ξ), and modeling templates (≈).
- Hybrid catalogs, between facts and concepts (thesauruses), facts and categories (taxonomies), and concepts and categories (ontologies).
Knowledge can then be nurtured bottom-up from facts to concepts (eg data mining) or categories (eg process mining), or top-down explaining facts with concepts (analysis) or organizing them along categories (architecture, design, realization).
Bottom-up Use Cases
Bottom-up use cases are driven by changes in named observations organised as datasets or documents, the former for series of homogeneous elements, the latter for structured sets of heterogeneous ones.
On clockwise (facts/concepts) paths, data mining can be applied to digital (eg soundtracks) and symbolic (eg subset of subscribers fond of Mozart) datasets to identify concepts and profiles; indexing tools (eg SPIRES) can be used to build glossaries and thesauruses; machine learning methods like Large language models (LLMs) generate meanings and visuals from textual tokens.
On counterclockwise (facts/categories) paths, physical models can be built for datasets using statistical methods and tools (eg SPSS, SAS, MATLAB); and logical and conceptual ones built for documents using parsers and modeling methods and tools (eg NIAM, ORM).
Top-down Use Cases
Top-down use cases are driven by intents or designs, starting with concepts and/or categories to explore environments, and/or develop actual and symbolic artefacts.
On clockwise (concepts/categories) paths, use cases cover general development steps:
- Modeling (Ξ/≈) translates values and goals into categories
- Realisation (≈/#) transforms models into artefacts
On counterclockwise (concepts/facts) paths, use cases can be summarily grouped into three kinds of exploration:
- Fetches (#?) are meant to retrieve datasets and/or documents based on names and/or features
- Searches (≈?) are meant to retrieve datasets and/or documents based on categories and/or features
- Queries (Ξ?) are meant to add intents to searches
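The three kinds of exploration can be sketched as follows. The `Item` record, the small catalog, and the modelling of an intent as a predicate are all assumptions made for illustration; the text only states that fetches work on names, searches on categories, and that queries add intents to searches.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Item:
    name: str                                    # label used by fetches (#?)
    category: str                                # category used by searches (≈?)
    features: dict = field(default_factory=dict)


# Illustrative catalog of documents.
CATALOG = [
    Item("Requiem", "MusicPiece", {"composer": "WA Mozart", "year": 1791}),
    Item("Don Giovanni", "MusicPiece", {"composer": "WA Mozart", "year": 1787}),
    Item("Amadeus", "Movie", {"year": 1984}),
]


def _match(item: Item, features: dict) -> bool:
    return all(item.features.get(k) == v for k, v in features.items())


def fetch(items, name: str, **features) -> list[Item]:
    """Fetch (#?): retrieve by name and, optionally, features."""
    return [i for i in items if i.name == name and _match(i, features)]


def search(items, category: str, **features) -> list[Item]:
    """Search (≈?): retrieve by category and, optionally, features."""
    return [i for i in items if i.category == category and _match(i, features)]


def query(items, category: str, intent: Callable[[Item], bool], **features) -> list[Item]:
    """Query (Ξ?): a search narrowed by an intent, here a plain predicate."""
    return [i for i in search(items, category, **features) if intent(i)]


# Hypothetical intent: only works from 1790 onwards.
late_works = query(CATALOG, "MusicPiece", lambda i: i.features["year"] >= 1790)
print([i.name for i in late_works])  # ['Requiem']
```

Building `query` on top of `search` mirrors the wording of the list: an intent refines a search rather than replacing it.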
These use cases are meant to provide the nuts and bolts of enterprise architecture governance.
EA Abstractions & Realisations
While abstraction is routinely considered a panacea for model interoperability, differences in abstraction semantics are generally ignored. That neglect is of little consequence when abstraction is applied to the same kind of symbolic representations, but it turns out to be critical when abstraction is set across different modeling semantics, hence the need for a distinction between:
- Homogeneous (aka modeling) abstractions, for generalisations and specialisations between representations with consistent semantics: data, business, or system models
- Heterogeneous (aka architecture) abstractions, for increasing symbolic levels of representations across different semantics
- Realisation, for decreasing symbolic levels of representations across different semantics
Homogeneous abstraction semantics can be defined for concepts (isa), facts (partitions and subsets), and categories (subtypes).
- Musicians can appear in facts as terms, actual individuals, or fictional ones
- Individuals can be partitioned along any kind of feature: age, instrument, style, etc
- Partitions can be represented by categories, with types and subtypes defined with regard to managed features: poet, composer, performer, soloist, etc
- Individuals and/or partitions can be mapped to concepts with is-a or kind-of relationships which may or may not coincide with categories, features, and subtypes
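A minimal sketch of two of these homogeneous abstractions, partitions for facts and subtypes for categories, is given below. The individuals are placeholders, and the subtype hierarchy (for instance soloist under performer) is an assumption, not something stated in the text.

```python
from collections import defaultdict

# Hypothetical individuals; ids and feature values are placeholders.
INDIVIDUALS = [
    {"id": "m1", "instrument": "piano", "style": "classical"},
    {"id": "m2", "instrument": "violin", "style": "classical"},
    {"id": "m3", "instrument": "piano", "style": "jazz"},
]

# Assumed subtype hierarchy over categories (child -> parent).
SUBTYPE_OF = {
    "soloist": "performer",
    "performer": "musician",
    "composer": "musician",
}


def partition(individuals, feature):
    """Facts: group individual ids by the value they take for one feature."""
    blocks = defaultdict(list)
    for individual in individuals:
        blocks[individual[feature]].append(individual["id"])
    return dict(blocks)


def is_subtype(candidate: str, ancestor: str) -> bool:
    """Categories: walk up the hierarchy from candidate looking for ancestor."""
    while candidate in SUBTYPE_OF:
        candidate = SUBTYPE_OF[candidate]
        if candidate == ancestor:
            return True
    return False


print(partition(INDIVIDUALS, "instrument"))  # {'piano': ['m1', 'm3'], 'violin': ['m2']}
print(is_subtype("soloist", "musician"))     # True
```

The same individuals can be partitioned along any feature, while the subtype hierarchy is fixed by the model: the two abstractions need not coincide.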
As long as abstractions can be circumscribed to bounded semantics, decisions can be made within relevant organisational units and hidden from the enterprise architecture level.
Heterogeneous abstractions and realisation semantics must be defined at enterprise architecture level in line with main governance domains.
Business Intelligence serves as a bridge between extensional representations of physical and symbolic environments on the one hand, and intensional representations of concepts, values, and objectives on the other hand. Clockwise, business analysts abstract facts into concepts; counterclockwise, they try to understand how concepts are realised through facts.
System engineering does the same for requirements and operations (environments) and symbolic representations of organisation and systems. Clockwise, system engineers use models to realise actual systems; counterclockwise, business and system analysts abstract models from requirements.
Strategic planning, contrary to business intelligence and system engineering, is not bound to environments as its aim is to align values, objectives, and time frames on the one hand, organisation and system architectures on the other hand. Clockwise, business stakeholders and enterprise architects define blueprints and roadmaps meant to realise business models; counterclockwise, enterprise and systems architects try to weave emerging transformations with planned ones.