A Knowledge Engineering Framework

A Knowledge Factory (Ala ad-Din Mansur-Shirazi)

For enterprises, and more generally for organizations, a comprehensive and effective digital transformation entails the integration of systems and knowledge architectures. For those operating in competitive environments, it also calls for continuous individual and collective learning. Hence the benefits of a knowledge engineering framework enabling the interoperability of the basic functions of enterprise governance: business intelligence, systems design, and decision-making.

Ontologies & Knowledge Management

Taking a leaf from Spinoza, organizations can expand their knowledge (or learn) in three ways:

  • Through senses, typically by applying data- and process mining to facts observed at the digital level.
  • Through reasoning, by applying knowledge graphs to the information managed by systems.
  • Through judgment, as carried out by people and organizations putting information to use as knowledge.

Those principles can be effectively applied to EA engineering using a knowledge-based framework combining ontologies and epistemic modalities.

Ontologies

The aim of ontologies is to provide a comprehensive and consistent account of the nature of things. For organizations, that entails a twofold integration of symbolic contents (for shared representations) and semantics (for domain-specific communications).

Contents & Representations

Symbolic contents must be organized according to their status:

  • Terms, employed to name individual entries whatever their nature (glossaries or lexicons)
  • Facts from environments (data)
  • Categories, meant to make sense of facts and define objectives, organization, and systems (models)
  • Documents, vessels used to store, exchange, or communicate contents
  • Concepts, for organizing cognitive (aka mental) representations independently of reifications and/or representations
Anatomy of Ontologies

Then, since contents are expressed through languages, they must be mapped to linguistic dimensions:

  • Syntactic: rules defining how the terms can be combined
  • Lexical: for the individual meaning of the terms employed
  • Semantic: for the meaning of the syntactic constructs (aka phrases)
  • Pragmatic: for the meaning of the semantic constructs depending on contexts

Finally, in order to ensure their interoperability, ontologies must maintain an epistemic distinction between what is known and how it is known.

That organization of contents in terms of representation (thesauruses, models, ontologies), epistemic nature (modalities), and language constitutes the backbone of ontologies.

Semantics & Communication

The aim of the kernel is to ensure the consistency and interoperability of symbolic resources (data, information, knowledge) independently of contexts and concerns. The first objective is thus to avoid the ambiguous, contradictory, and circular definitions thwarting most institutional standards; that can be done with a set of axioms operating like a music scale, with equivalent benefits:

  • It circumscribes semantics controversies to small sets of clear-cut definitions meant to be either accepted or rejected as they are.
  • Axioms can serve as firebreaks in thesauruses preventing circular definitions.
  • Like keys for music variations, axioms enable open-ended definitions, facilitate translations, and can serve as semantic bridges between alternative representations.

Applying the same principle, models could be expressed or translated (played) using different sets of axioms (or keys) depending on purpose, without unwarranted assumptions with regard to the truth of representations. Taking the kernel as an example, seven axioms are selected with regard to instantiation (attribute, instance, identity), containment (collection), and behavior (event, state):

It must be noted that the only purpose of these axioms is to eliminate circular definitions and improve the organization of thesauruses.
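The firebreak role of axioms can be illustrated with a minimal sketch. It assumes a thesaurus represented as a plain mapping from each term to the terms used in its definition; that representation, the function name, and the sample entries are hypothetical, and only the axiom names come from the text above.

```python
from typing import Dict, List, Optional, Set

# Axiom names as listed in the text; the thesaurus entries below are hypothetical.
AXIOMS: Set[str] = {"attribute", "instance", "identity",
                    "collection", "event", "state"}


def find_circular_definition(thesaurus: Dict[str, List[str]],
                             axioms: Set[str]) -> Optional[List[str]]:
    """Return one chain of terms that define each other, or None."""
    done: Set[str] = set()

    def visit(term: str, path: List[str]) -> Optional[List[str]]:
        if term in axioms or term not in thesaurus:
            return None  # axiom (firebreak) or undefined term: do not expand
        if term in path:
            return path[path.index(term):] + [term]  # definition loops back
        if term in done:
            return None  # already fully explored without finding a cycle
        for used in thesaurus[term]:
            cycle = visit(used, path + [term])
            if cycle:
                return cycle
        done.add(term)
        return None

    for term in thesaurus:
        cycle = visit(term, [])
        if cycle:
            return cycle
    return None


circular = {"object": ["thing"], "thing": ["object"]}
anchored = {"object": ["instance", "identity"], "instance": ["object"]}

print(find_circular_definition(circular, set()))   # a cycle is reported
print(find_circular_definition(anchored, AXIOMS))  # None: "instance" is an axiom
```

Because axioms are accepted as they are, the search never expands them, so a definition that would otherwise loop back through "instance" stops there.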

Ontological Prisms

The digital revolution calls for a change of paradigm that could take into account the difference between:

  • Data, for facts as defined in environments
  • Information, for the categories used to manage symbolic representations
  • Knowledge, for the concepts used to define enterprises’ objectives, assets (tangible or otherwise), and value chains

That can be achieved by a simple shift of perspective from layered pyramids to prisms ensuring a seamless and consistent integration of thesauruses, models, and knowledge graphs.

Architecture

Ontological prisms can be seen as symbolic gears between organizations and their environment, gears that could be implemented by digital twins.

On the one hand, facts (data) and concepts (knowledge) are meant to be mapped to physical (digital) and symbolic (business) environments.

On the other hand, categories (information) are meant to provide a comprehensive, consistent, and actionable representation of enterprises’ architectures.

Ontological diffraction & the Fabric of Knowledge

In between, ontological prisms ensure the conceptual interoperability of ontological contents through built-in epistemic modalities:

  • Extensional modalities: whether facts are observed, asserted, assessed, deduced, managed
  • Intensional modalities: whether concepts are meant to be abstract (meanings, no instances), concrete (symbolic or physical instances), nominal (terms or labels, no meanings), or virtual (meanings set in the mind of beholders)
  • Representation modalities: how categories are apprehended

Epistemic modalities can be used to manage the external validity and the internal consistency of representations.
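As a rough illustration, the modalities listed above could be carried as explicit tags on facts and concepts. The class and field names below are assumptions made for the sketch, not kernel constructs; representation modalities are left out because the text does not enumerate them.

```python
from dataclasses import dataclass
from enum import Enum


class FactModality(Enum):
    """Modalities of facts, as listed in the text."""
    OBSERVED = "observed"
    ASSERTED = "asserted"
    ASSESSED = "assessed"
    DEDUCED = "deduced"
    MANAGED = "managed"


class ConceptModality(Enum):
    """Modalities of concepts, as listed in the text."""
    ABSTRACT = "abstract"   # meanings, no instances
    CONCRETE = "concrete"   # symbolic or physical instances
    NOMINAL = "nominal"     # terms or labels, no meanings
    VIRTUAL = "virtual"     # meanings set in the mind of beholders


@dataclass(frozen=True)
class Fact:
    label: str
    modality: FactModality


@dataclass(frozen=True)
class Concept:
    label: str
    modality: ConceptModality

    def has_meaning(self) -> bool:
        # Only nominal concepts are described as carrying no meaning.
        return self.modality is not ConceptModality.NOMINAL


reading = Fact("temperature reading", FactModality.OBSERVED)  # hypothetical fact
label = Concept("Lute", ConceptModality.NOMINAL)
idea = Concept("Harmony", ConceptModality.ABSTRACT)           # hypothetical concept
```

Keeping the modality separate from the label means validity checks can be written against the tags alone, without looking into domain specifics.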

Functionalities

Ontological prisms enable the composition/decomposition of enterprises’ symbolic resources ensuring the interoperability of functionalities at different levels.

First, enterprise architecture governance issues:

  • Requirements (actual facts)
  • Data analytics (virtual facts)
  • Business analysis (facts/concepts)
  • Business intelligence (concepts)
  • Strategic planning (concepts/categories)
  • Systems engineering (categories)
  • Systems modeling (facts/categories)
EA Governance Issues

With regard to systems engineering, ontological prisms enable the integration and interoperability of the three main modeling approaches:

  • Object oriented analysis and design (OO), when the purpose is to manage symbolic representations (aka surrogates)
  • Goal oriented (GO), when the purpose is to define business objectives and design business processes
  • Aspect oriented (AO), when the purpose is to give structure and meaning to observed data
Integration & Interoperability of Modeling Paradigms

On a broader perspective, ontological prisms can help to align interfaces and languages with contexts and purposes:

  • Human interfaces: natural, formal, or generative languages
  • Machine interfaces: formal, modeling, and programming languages
  • Nature interfaces (physical or digital): nominals (taxonomies) and scientific languages
Prism alignment with interfaces and languages

Like the mythological chimeras, enterprise architectures can be seen as live assemblages of physical and symbolic artifacts that constantly morph forms and switch functions. 

Ontological prisms can thus provide the gears of continuous, smooth, and double-edged transformations: realization of symbolic (objectives or models) artifacts into physical (hardware) or digital (code) ones; and their reverse as learning new symbolic representations (from environments) or retrieving past ones (from legacy code).

Caminao Kernel

The aim of the Caminao kernel is to support ontological prisms; it is built on core OWL constructs set along clear boundaries.

Categories & Modalities

The Kernel is characterized by built-in constructs (postfix ‘_’), modal categories (postfix ‘Ξ’), and stereotypes and patterns (postfix ‘≈’); OWL data properties are not concerned and are left to ontology designers.

The kernel categories are aligned with established notions of objects, agents, activities, events, locations; these categories can be subtyped independently of the modal ones.

Kernel Modal Categories (Ξ)

OWL 2 hierarchies (yellow lines) are used to define subtypes on a conceptual basis without making assumptions regarding abstraction semantics, except for templates. For example, the way Orchestra and Organisation are meant to realize their modal categories is left to ontology designers.

Aspects & Properties

Aspects are used for the description of instances identified locally, i.e. independently of individuals in symbolic or physical environments; it ensues that the instances of aspects can only exist as local components identified within organizations or systems. 

  • Features (Properties and operations).
  • Numeric and logic expressions.
  • Abstract data types (collections, graphs, …)
  • States (lifecycles)

Since aspects and properties do not represent individuals in environments, their meaning is not affected by shifts in external semantics and can thus be set in terms of standard generic or functional constructs, ensuring interoperability with modeling languages.

With OWL 2, the kernel defines aspects in the Class hierarchy, connectors as object properties, and attributes as data properties.

Local objects as aspects, connectors as object properties, attributes as data properties
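The mapping stated above can be written down as a small lookup, sketched here without any OWL library. The `declare` helper and the "Lifecycle" aspect name are hypothetical; `owl:Class`, `owl:ObjectProperty`, and `owl:DatatypeProperty` are the standard OWL 2 terms for classes, object properties, and data properties.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Kernel element kind -> OWL 2 construct, following the sentence above.
KERNEL_TO_OWL: Dict[str, str] = {
    "aspect": "owl:Class",
    "connector": "owl:ObjectProperty",
    "attribute": "owl:DatatypeProperty",
}


def declare(name: str, kind: str) -> Triple:
    """Return the typing triple for a kernel element (hypothetical helper)."""
    return (name, "rdf:type", KERNEL_TO_OWL[kind])


declarations: List[Triple] = [
    declare("Lifecycle", "aspect"),      # hypothetical aspect name
    declare("subset_", "connector"),
    declare("birthDate", "attribute"),
]

for triple in declarations:
    print(triple)
```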

Kernel connectors are built-in constructs meant to be used independently of OWL 2 hierarchies; the aim is to ensure ecumenism (with regard to methods), continuity (along engineering cycles), and interoperability (across engineering environments):

  • Syntax connectors constitute the kernel backbone. They ensure an epistemic distinction between instance and category levels, to be applied uniformly independently of domain semantics.
  • Ontology connectors deal with semantics (terms) and reasoning (logic)
  • Organization and engineering connectors are specific to enterprise architecture

The semantics of these connectors is conditioned by the prism dimension (facts, concepts, categories), with the realization_ connectors employed across dimensions.

Since there is no sub-typing of OWL2/Protégé properties, hierarchies in menus are just indicative.

Instances

Most conceptual models define individuals as objects that can be uniquely identified in reality without considering the meaning of reality itself. The kernel goes further and introduces epistemic modalities to characterize the reality of targeted individuals.

Regarding identity, a distinction is made between intrinsic and managed identities:

  • Intrinsic identities cannot be dissociated from objects, typically for biological constraints
  • Managed identities (symbolic or physical) are set by context, typically institutions or organizations (social identities), or industries (manufactured identities)

Regarding reality, a distinction is also made between nature (symbolic or physical) and status (actual or virtual).

Compared to the exclusive semantics of modeling languages, ontologies make room for multiple and changing options; e.g.:

Stereotypes (≈) & Modal Categories (Ξ)
  • Characters in movies (fictional) can point to actual persons with social and physical identities
  • Persons can be identified directly or through roles (e.g. MusicComposer) and capacities (e.g. Musician)
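The distinctions made for identity (intrinsic or managed), nature (symbolic or physical), and status (actual or virtual) can be sketched as three independent tags on an individual. The `Individual` class, its `points_to` field, and the sample names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Identity(Enum):
    INTRINSIC = "intrinsic"   # cannot be dissociated from the object
    MANAGED = "managed"       # set by context (institutions, industries)


class Nature(Enum):
    SYMBOLIC = "symbolic"
    PHYSICAL = "physical"


class Status(Enum):
    ACTUAL = "actual"
    VIRTUAL = "virtual"


@dataclass(frozen=True)
class Individual:
    name: str
    identity: Identity
    nature: Nature
    status: Status
    points_to: Optional["Individual"] = None  # hypothetical link between individuals


# Hypothetical example: a fictional character pointing to an actual person.
actor = Individual("some actor", Identity.INTRINSIC, Nature.PHYSICAL, Status.ACTUAL)
character = Individual("some character", Identity.MANAGED, Nature.SYMBOLIC,
                       Status.VIRTUAL, points_to=actor)
```

Since the three tags vary independently, the same person can be reached directly or through a character, a role, or a capacity without changing the underlying individual.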

Combined with core consolidated syntax, the all-inclusive semantics of ontological prisms ensures the interoperability of all enterprises’ symbolic resources.

Ontologies & Knowledge Management

With thesauruses providing semantic clearing houses for terms across ontologies, the aim of ontological prisms is to guarantee the continuity and consistency of meanings, i.e. the mapping of new facts, concepts, or categories with existing ones.

Knowledge management (M) and engineering (E).

That can be achieved through six basic areas, three for the management of homogeneous contents (M), and three for cross-content engineering (E).

Data & Facts

Facts are derived from data obtained from symbolic (business) or physical (digital) environments. They can be directly observed, asserted from past experience, or assessed or deduced from partial observations. Observed facts can also be managed.

Facts are rooted in physical or symbolic environments

From a technical perspective, facts can be acquired through three basic stages: nominal items, datasets, or propositions.

  • Nominal items constitute the first stage when observations come with no names. As epitomized by training phases of supervised machine learning, data items must be turned into nominal (i.e. labelled) ones before being used to define facts. With terms logged in thesauruses, facts themselves can be obtained by establishing connections between nominal items.
  • One step further, datasets are tabulated data with labels meant to be further developed into categories.
  • Facts can also be introduced as assertions and propositions, for instance using formal languages like Prolog.

Taking for granted an existing body of knowledge (facts, concepts, and categories), the first step is to map new facts with current knowledge.

Consider two datasets, one for musical instruments, the other for musicians:

The musical instruments dataset includes two labelled items (nominal mode) to be possibly assessed (nominalRef_) as known types, one as Harmonica, the other as Lute or Oud.

Contrary to engineering connectors (realizationBy/Of_), nominalRef_ is used between nominal items and concepts or categories (nominalRef#_ for external identification).

The musicians dataset includes two kinds of (observed) nominal items, labelled ones (e.g. dsElem:Dylan) and anonymous ones (dsElem:MusicianY).

Labelled items can be matched to managed surrogates (e.g. musician:R.A.Zimmerman (aka B.Dylan)) and/or individuals identified in the environment (e.g. Bob Dylan).

It must be noted that, in contrast to the musical instruments example, the mapping of nominal items (dsElem:MusicianY) is anchored to an external personal identification (nominalRef#_).
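The two mappings can be sketched as follows, under the assumption that nominal items are simple records and that known types and surrogates are looked up by label. The `NominalItem` class, the `assess` function, the labels, and the identifiers `dsElem:Item2` and `person#Y` are hypothetical; the connector names appear only as comments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NominalItem:
    ident: str
    label: Optional[str] = None                             # None for anonymous items
    nominal_refs: List[str] = field(default_factory=list)   # nominalRef_ candidates
    external_ref: Optional[str] = None                      # nominalRef#_ (external id)


def assess(item: NominalItem, known: Dict[str, List[str]]) -> NominalItem:
    """Attach the known types or surrogates a labelled item may refer to."""
    if item.label is not None:
        item.nominal_refs = list(known.get(item.label, []))
    return item


# Hypothetical lookup tables for the two datasets discussed above.
known_instruments = {"harmonica": ["Harmonica"], "lute?": ["Lute", "Oud"]}
known_musicians = {"Dylan": ["musician:R.A.Zimmerman"]}

instrument = assess(NominalItem("dsElem:Item2", label="lute?"), known_instruments)
dylan = assess(NominalItem("dsElem:Dylan", label="Dylan"), known_musicians)
anonymous = assess(NominalItem("dsElem:MusicianY", external_ref="person#Y"),
                   known_musicians)
```

A labelled item may keep several candidate types (Lute or Oud) until it is assessed, while an anonymous item carries no candidates and relies on its external identification alone.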

Facts can then be defined through the association between types and aspects or by partitioning sets of objects.

The ORM (Object-role modeling) provides a reference for fact-based modeling, and more precisely the definition of domain objects and datatypes; with the kernel that is done in terms of categories and aspects:

Object-role modeling as a first move

Facts are established using connectors: reference, between object types (e.g. MusicInstrument), or inclusion, between object types and datatypes (e.g. birthPlace, birthDate).

Establishing facts using subset, reference, or inclusion connectors

ORM datatypes (aka aspects) can be represented by features with intrinsic semantics, e.g. name or birthDate, or by coded references, e.g. geographicSpace.

Subsets can be used between sets of individuals or sets of values

Partitions are aspects used to characterize subsets of object types sharing selected features. When introduced with facts, i.e. before the definition of categories, partitions can be set independently of properties, e.g. Author sortedBy_ and include_ AuthorGenre can be set separately until consolidated with a category; functional partitions may also remain implicit (e.g. Author sorted by personAge()).

Partitions & Subsets

Partitions and subsets may or may not be turned into subtypes, e.g.:

  • Orchestras are partitioned with regard to genre using coded attributes (a)
  • Orchestras and orchestra sections are partitioned with regard to type, using powertypes to manage the values shared by corresponding subsets, e.g. sections’ musicians (b)
  • Sub-types are introduced when features are different between subsets, e.g. sections are specific to philharmonic orchestras (c)

Partitions, powertypes, and subsets can serve as modeling bridges between facts (data view) and categories (information view).
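The three options (a), (b), and (c) can be contrasted in a short sketch. The class layout, the `shared` dictionary, and the sample orchestras are assumptions chosen for illustration; they are not taken from the kernel.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SectionType:
    """(b) Powertype: each instance holds values shared by a subset of sections."""
    name: str
    shared: Dict[str, str] = field(default_factory=dict)


@dataclass
class Section:
    kind: SectionType
    musicians: List[str] = field(default_factory=list)


@dataclass
class Orchestra:
    name: str
    genre: str  # (a) partition expressed as a coded attribute


@dataclass
class PhilharmonicOrchestra(Orchestra):
    """(c) Subtype: sections are specific to philharmonic orchestras."""
    sections: List[Section] = field(default_factory=list)


def partition_by_genre(orchestras: List[Orchestra]) -> Dict[str, List[str]]:
    """Group orchestra names by their genre attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for orchestra in orchestras:
        groups[orchestra.genre].append(orchestra.name)
    return dict(groups)


strings = SectionType("strings", {"family": "bowed"})  # hypothetical shared value
orchestras = [
    Orchestra("Quartet A", "jazz"),
    PhilharmonicOrchestra("Phil B", "classical", [Section(strings, ["violinist 1"])]),
    Orchestra("Band C", "jazz"),
]
print(partition_by_genre(orchestras))
```

The attribute is enough while nothing else depends on the partition; the powertype appears once subsets share values; the subtype appears only when features differ.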

Categories & Information

The aim of categories is to define structural or behavioral aspects shared by individuals of a same type. Categories can serve two kinds of purposes:

  • Extensional categories (descriptive or predictive models), for the representation of individuals identified (and thus observed) externally
  • Intensional categories (prescriptive or technical models), for the specification of individuals identified (and thus designed) internally
Categories provide managed representations between facts and concepts

Depending on target, categories can be introduced along top-down (from concepts) or bottom-up (from facts) modeling perspectives; in any case, to ensure the consistency of perspectives, categories should be characterized by modalities before being specialized, generalized, or decorated with aspects.

Modalities can be used to manage external and internal validity without having to look into domains’ specifics:

  • External validity: consistent representation of facts by categories independently of abstraction levels
  • Internal validity: consistency of designs (structures and behaviors) independently of environments and abstraction levels

Orchestra, for example, can first be checked for structures by comparing its behavior (real time) with that of Organization (active), and then for aspects, for instance that Registration (Orchestra) can rely on a social identity (Organization).

Contrary to traditional stereotypes, Kernel modalities are meant to be applied across categories independently of inheritance hierarchies, preventing overlaps or conflicts with modeling languages semantics, and thus ensuring interoperability.

In addition to generic OWL hierarchies, the kernel introduces three types of specific connectors:

  • subset_ and subsetOf_, for structural hierarchies, i.e. when the features of base and subtypes are bound to a common identity
  • extendedBy_ and extensionOf_, for functional hierarchies, i.e. when the features of base and subtypes can be set independently of identities
  • realizedBy_ and realizationOf_, for instantiation, between concepts and facts or categories and artefacts
Orchestras defined by structure (subset_) and/or function (extendedBy_)
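The paired connectors lend themselves to a small sketch in which asserting one direction also records its inverse. The reading of each pair (the base type on the left of subset_, extendedBy_, and realizedBy_) is an assumption, as are the `connect` helper and the "TouringOrchestra" name.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Connector pairs as named in the text; the direction of each is assumed.
_PAIRS = [
    ("subset_", "subsetOf_"),
    ("extendedBy_", "extensionOf_"),
    ("realizedBy_", "realizationOf_"),
]
INVERSE: Dict[str, str] = {a: b for a, b in _PAIRS}
INVERSE.update({b: a for a, b in _PAIRS})


def connect(graph: Set[Triple], source: str, connector: str, target: str) -> None:
    """Record a connector and its inverse in a set of triples."""
    graph.add((source, connector, target))
    graph.add((target, INVERSE[connector], source))


graph: Set[Triple] = set()
connect(graph, "Orchestra", "subset_", "PhilharmonicOrchestra")
connect(graph, "Orchestra", "extendedBy_", "TouringOrchestra")  # hypothetical extension
```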

Jointly with their structural/functional nature, abstraction constructs can be defined with regard to purpose:

  • Partitions can be seen as a ground zero, represented by attributes when no additional features have to be managed (e.g. music genre)
  • Powertypes are introduced when partitioned subsets are associated with shared values. In that case powertypes instances are used to manage the shared features (e.g. sections’ musicians)

Subtypes are needed when variants induce different aspects, e.g. orchestras’ instruments:

It should be remembered that these abstraction constructs apply uniformly, independently of categories’ modalities and semantics, ensuring the interoperability of models.

Concepts & Knowledge

The first objective of the concepts view is to introduce a semiotics glue to ontologies. To that effect a distinction is made between terms, represented as nominal instances of concepts (or instances of symbols), and the concepts themselves, which can be mapped to actual (physical or symbolic) instances, or virtual or abstract ones.

An all-inclusive semiotics roof for facts and categories

Concepts can be realized by facts and/or represented by categories (the realization_ connectors are used for both).

Signs & Symbols

The concept view comes with two categories of connectors: thesaurus ones for the semiotic glue (introduced above with the definition of axioms), and reasoning ones for standard operators supporting reasoning across views and modalities.

The primary objective of reasoning connectors is to formally attach facts with concepts and categories through logic expressions or business rules, the former about internal consistency, the latter about external validity. To that effect four constructs can be combined:

  • Standard modeling connectors: connectRef1n_ and connectRef#_(a)
  • Logic predicates: recipeWith() (b)
  • Rules with propositions (instances of predicates): asserted/deontic (c) or inferred/alethic (d)
  • Quantifiers: existential (e)
Reasoning with Ontologies
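A minimal sketch of how a predicate, a rule, and a quantifier might combine, using plain Python rather than an ontology reasoner. Only recipeWith() is named in the text; the recipes, the "mustOffer" and "requires" proposition names, and the `infer` function are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Proposition = Tuple[str, str, str]

# Hypothetical recipes used to evaluate the predicate.
RECIPES: Dict[str, List[str]] = {
    "bisque": ["lobster", "cream"],
    "green salad": ["lettuce", "oil"],
}


def recipe_with(recipe: str, ingredient: str) -> bool:
    """Logic predicate in the spirit of recipeWith()."""
    return ingredient in RECIPES.get(recipe, [])


def exists(items: Iterable[str], predicate: Callable[[str], bool]) -> bool:
    """Existential quantifier: true if at least one item satisfies the predicate."""
    return any(predicate(item) for item in items)


# Asserted (deontic) proposition: what the menu must offer.
asserted: Set[Proposition] = {("mustOffer", "menu", "bisque")}


def infer(propositions: Set[Proposition]) -> Set[Proposition]:
    """Inferred (alethic) propositions: ingredients that necessarily follow."""
    inferred: Set[Proposition] = set()
    for name, subject, recipe in propositions:
        if name == "mustOffer":
            for ingredient in RECIPES.get(recipe, []):
                inferred.add(("requires", subject, ingredient))
    return inferred


some_lobster_recipe = exists(RECIPES, lambda r: recipe_with(r, "lobster"))
inferred = infer(asserted)
```

Asserted propositions express what is decided, inferred ones what follows from them; keeping the two sets apart is what allows internal consistency and external validity to be checked separately.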

Thesaurus connectors can also be used to navigate across synonyms, causal chains, analogies and metonymies.

Systems & Knowledge Engineering

Knowledge is all about the limits of the Here and Now, pushing the edges of space and time along its three dimensions; not by chance, these dimensions can also be aligned with Spinoza’s learning modalities:

  • Data, for the record of past and present observations and experience
  • Information, for a reasoned representation of present activities and supporting systems in actual environments
  • Knowledge, for judgment and decision-making about present and future undertakings

Ontological prisms can thus render explicit the intricacy and temporality of observations (data), decision-making (knowledge), and supporting systems (information). That can be illustrated with basic undertakings:

Engineering & the Arrow of Time
  • Business intelligence, to anticipate future changes in environments (virtual facts) from past (actual) facts, independently of established categories
  • Goals and strategies, to design and plan changes from present business models and organization in line with anticipations and objectives
  • Engineering, to design, build, and manage supporting systems in line with strategies and goals

Ontological prisms provide the symbolic gears between these three junctures, and consequently between systems and knowledge engineering processes.

Business Intelligence

Business intelligence begins with business analysis and ends with strategic planning, the former crossing facts and concepts, the latter long-term objectives and current assets.

Business Analysis: Actual & Virtual Facts

At inception, a key benefit of using ontological prisms is compliance with privacy regulations, achieved by maintaining a clear distinction between observed data (facts) and managed information (categories). Modalities can thus be used to manage observations and assess frequencies and probabilities; for instance:

  • A dataset for meals ordered in December last year (a) with the frequency of lobsters (b)
  • A predicted dataset for meals to be ordered in December next year (c) with a probability function for lobster orders (d)
Planning next year’s lobsters
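The lobster example can be sketched numerically. The order list, the planned number of meals, and the naive assumption that next year's probability equals last year's observed frequency are all hypothetical choices made for the illustration.

```python
from collections import Counter
from typing import List


def frequency(orders: List[str], dish: str) -> float:
    """Observed share of a dish in a dataset of past orders (actual facts)."""
    if not orders:
        return 0.0
    return Counter(orders)[dish] / len(orders)


def expected_orders(probability: float, planned_meals: int) -> float:
    """Expected count of a dish in a predicted dataset (virtual facts)."""
    return probability * planned_meals


# (a) Hypothetical meals ordered in December last year.
december_last_year = ["lobster", "steak", "lobster", "salad"]

# (b) Observed frequency of lobsters.
lobster_frequency = frequency(december_last_year, "lobster")

# (c)/(d) Naive forecast: reuse the observed frequency as next year's probability.
forecast = expected_orders(lobster_frequency, planned_meals=200)
print(lobster_frequency, forecast)
```

The observed dataset and the predicted one have the same shape; only their modality (actual or virtual) and the function attached to them (frequency or probability) differ.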

Next, logic modalities would help to reason with facts and to extrapolate hypothetical futures (business intelligence) from historical datasets (data analytics); for instance:

  • Standard modeling connectors: connectRef1n_ and connectRef#_ (a)
  • Logic predicates: recipeWith() (b)
  • Rules with propositions (instances of predicates): asserted/deontic (c) or inferred/alethic (d)
  • Quantifiers: existential (e)

Reasoning with Facts

As logic modalities are applied differently to facts (heuristics), categories (reason), and concepts (judgment), they bear direct consequences for traceability (systems) and accountability (people and organization).

Goals & Strategies

The semantics of Goal illustrate the importance of modalities and interoperability, as the term can be used to characterise activities (Agent-oriented models), intents (Goal-oriented models), requirements (i*), or strategies. Knowledge graphs (KGs) are often seen as a panacea, but that ignores the intricacies induced by mixing kinds of targets (outcomes or undertakings), their nature (physical or symbolic), types of agents (individual or collective), and status (actual or virtual). The ensuing exponential complexity is bound to turn even simple issues into unmanageable spaghetti plates; hence the need to characterize goals and strategies in terms of modalities:

Goals and Strategies are bridges between mental and managed representations
  • Objectives refer to virtual objects or phenomena
  • Goals are objectives set with regard to categories of virtual objects or phenomena
  • Strategies describe how to turn virtual representations into actual ones.

Transitions from virtual to actual reality must take into account organizational, functional, and technical dependencies:

  • Organizational dependencies reflect decision-making processes as derived from business cases (BC)
  • Functional dependencies as defined by use cases (UC)
  • Technical dependencies are derived from resources and outcomes

Ontological modalities can then be used to detail the key aspects of goals and strategies:

  • Realms (where) and time frame (when): enterprise (facts) or business environment (categories)
  • Agents (who): business environment (customers, providers, competitors, …), enterprise (stakeholders, employees, organizational units, …)
  • Objectives (what): resources (customers’ profiles, bills, recipes, …), processes (promotions, recipes, CRM, …)
  • Projects (how): organization and engineering process
  • Agents’ intents (why)
Ontological representation of goals and strategies:

Taking for example a simple Diner architecture, current locations and systems are represented as facts:

Current Diner’s EA

Assuming that the Diner’s M&A strategic goal is to migrate systems in restaurants and warehouses to a service oriented architecture, objectives set at a strategic level could be first defined nominally along with legacy (managed≈) context.

Nominal objectives for Supply chain

Built-in categories for business, user, and technical cases can be used to align specifications and deliverables with business, functional, and technical dependencies:

Business view of projects

Use cases are represented by nominal instances when their development is carried out by external units, by typed deliverables otherwise.

Modalities can also be used to represent functional extensions of actual business processes, e.g. introducing diners’ local menus to an existing purchases application:

On a broader perspective, modalities can be used to represent other agents’ goals; that point may be irrelevant for problem-solving but become central for decision-making involving collaborating or competing parties.

Systems Engineering

From an enterprise governance perspective engineering processes describe the realization of use cases, from organization to operations, with projects defined as realizations of processes.

Engineering

Assuming that developments are carried out through dedicated workshops, projects are built from tasks defined bottom-up with regard to final and/or intermediate deliverables, models or code:

Engineering view of projects

It ensues that, instead of top-down estimates of pre-defined activities, tasks can be monitored through dynamic indicators attached to actual engineering flows:

  • Metrics: assessments can be carried out for business value (use cases), complexity (models), and efforts (tasks).
  • Status, with regard to expectations (projects’ owner) and commitments (projects’ teams).
  • Test plans and test cases

Tests management illustrates the benefits of ontological prisms and modalities.

Tests Management

Test cases are meant to be defined in terms of facts (data) and categories (information):

  • On the one hand (data) inputs (asserted facts) and outcomes (derived facts) are managed through datasets
  • On the other hand (information) derived facts are meant to match instances of categories considered with regard to anchors, objects, aspects, and logic
Backbone of test plan representation

Managing the instances asserted as inputs and derived as outcomes comes with two intrinsic caveats: non-regression and versatility.

Tests, and more generally quality management, are most effective when carried out iteratively and incrementally. That cannot be achieved without ensuring non-regression, i.e. that artefacts, once successfully tested, are immune to outside changes made through iterations and/or increments; in other words, that the same test cases will produce the same outcomes all along engineering processes.

Test cases should also be versatile, i.e. they should be rerunnable all along engineering cycles in different environments, typically: development (unit tests), system (integration), and operations (acceptance). Hence the benefit of using modalities.

Taking reservations for example, a test case would use two datasets of virtual instances, one for persons (dset:TestInReservs) and one for expected reservations (dset:TestOutReservs), plus a dataset of actual reservations serving as references (dset:TestRefReservs).
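A minimal sketch of such a test case follows. The `reserve` function stands in for the system under test, the dataset contents are invented, and using the reference dataset only to check that expected reservations have the same fields as actual ones is an assumption about its role.

```python
from typing import Callable, Dict, List

Reservation = Dict[str, str]


def run_test_case(system: Callable[[str], Reservation],
                  inputs: List[str],
                  expected: List[Reservation]) -> bool:
    """Derive outcomes from asserted inputs and compare them with expectations."""
    derived = [system(person) for person in inputs]
    return derived == expected


def same_shape(expected: List[Reservation], references: List[Reservation]) -> bool:
    """Check that expected reservations use the same fields as actual ones."""
    reference_keys = {frozenset(reference) for reference in references}
    return all(frozenset(item) in reference_keys for item in expected)


def reserve(person: str) -> Reservation:
    """Hypothetical system under test."""
    return {"person": person, "status": "confirmed"}


# Virtual instances (test environment) and actual references; contents are invented.
test_in_reservs = ["test:PersonA", "test:PersonB"]
test_out_reservs = [
    {"person": "test:PersonA", "status": "confirmed"},
    {"person": "test:PersonB", "status": "confirmed"},
]
test_ref_reservs = [{"person": "cust:0001", "status": "confirmed"}]

# Versatility: the same case is rerun unchanged in each environment.
results = {
    environment: run_test_case(reserve, test_in_reservs, test_out_reservs)
    for environment in ("development", "integration", "operations")
}
```

Because inputs and expected outcomes are kept as datasets rather than embedded in the test logic, rerunning the same case after each iteration gives a direct check of non-regression.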

Tests Cases & Modalities

Subtypes for virtual instances of persons and reservations are introduced to set apart test and operational environments. These instances can be partitioned with regard to tests’ nature (e.g. structural integrity or functional consistency), context (e.g. business domain or system services), or environment (e.g. development, integration, operations).

The use of ontologies and modalities to integrate test cases with requirements on the one hand, managed categories on the other hand, can be pushed further in support of Test driven development and more generally quality management.

Engineering Templates

Taking advantage of ontological prisms, “canned” engineering knowledge can be managed through stereotypes, patterns, and profiles:

Engineering Templates
  • Profiles: shared representation meant to characterize typical statutory, business, or technical environments.
  • Stereotypes: shared set of features meant to characterize typical instances. Contrary to types, and to prevent conflicts with the semantics of modeling languages, stereotypes must be defined as specialization of modalities independently of categories.
  • Patterns: structured set of design categories meant to characterize typical representation issues.

Using the OWL ontological kernel these templates are sketched representations of types (Account≈, CarReservation), environments (RentACar≈), and issues (Reservation≈), respectively:

Engineering Templates

Conclusion: Engineering Knowledge

In contrast to systems engineering, EA engineering must take into account domain-specific knowledge that is by nature open-ended. For enterprises immersed in digital environments, such two-pronged engineering processes entail some gearing between engineering and knowledge, with learning-by-doing serving as glue. The ontological areas defined above for management and engineering can be revisited to represent the learning processes pairing engineering and knowledge within organizations.

Management areas now refer to symbolic contents:

  • Facts area, for both actions and observations
  • Concepts area, for individual cognitive processes, including imagination
  • Categories area, for shared and/or collective representations

Taking a leaf from Spinoza, learning can expand knowledge in 3+1 ways:

  • Through observation, typically by applying statistics and/or machine-learning to facts
  • Through reasoning, by applying logic to concepts or categories
  • Through judgment, carried out by people directly or through organizational units
  • Through experience (or implicit knowledge), individual or collective

These learning capabilities provide the gears in cross areas:

  • Communication (observation, experience, judgment), for the matching of facts with symbolic representations and cognitive processes
  • Conception (judgment, experience, reasoning), for the actualisation of social, scientific, or cultural ideas or concepts
  • Realisation (reasoning, experience, observation), for the production of actual (physical and/or symbolic) artifacts

Engineering and learning processes can thus be combined across individual and collective levels.
