The growing footprint of generative AI (GAI) in user-oriented applications raises questions about knowledge value chains and, more generally, about the relationship between language and intelligence. The evolution of languages may provide some clues.
An ecumenical approach to artificial intelligence starts with cognitive resources, namely data (facts), information (categories) and knowledge (concepts):
- Naming relevant objects and phenomena in environments
- Developing mental representations of environments, values and objectives
- Developing shared symbolic representations
and the corresponding cross functions:
- Communication (facts/concepts): Exchange of meaningful signs or symbols between living organisms (plants included) and machines.
- Classification (facts/categories): Grouping of identified objects and phenomena according to shared features.
- Reasoning (concepts/categories): Truth-preserving processing of mental and/or symbolic representations.
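Taken together, these functions lend themselves to a toy illustration. The sketch below (hypothetical facts, features, and rules, chosen for illustration only) shows classification grouping named facts into categories, and a rudimentary truth-preserving step lifting categories to concepts:

```python
# Toy sketch of two cross functions: classification groups facts (data)
# into categories (information); reasoning applies a concept to them in
# a truth-preserving way. All facts and rules here are hypothetical.
from collections import defaultdict

def classify(facts, feature):
    """Group identified facts according to a shared feature."""
    categories = defaultdict(list)
    for fact in facts:
        categories[fact[feature]].append(fact["name"])
    return dict(categories)

# Facts: named observations with features (illustrative only).
facts = [
    {"name": "sparrow", "locomotion": "flies"},
    {"name": "trout",   "locomotion": "swims"},
    {"name": "swallow", "locomotion": "flies"},
]

print(classify(facts, "locomotion"))
# {'flies': ['sparrow', 'swallow'], 'swims': ['trout']}

# Reasoning: truth-preserving use of a concept ("whatever flies is airborne").
rules = {"flies": "airborne"}
concepts = {name: rules[feature]
            for feature, names in classify(facts, "locomotion").items()
            if feature in rules for name in names}
print(concepts)  # {'sparrow': 'airborne', 'swallow': 'airborne'}
```

The point of the sketch is the division of labour: classification only needs observed features, while reasoning needs rules that hold regardless of the observations.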
Communication & Representation
Evolution set the original distinction between spoken and written languages, as well as the one between communication and representation. For the human species these distinctions came with a transition through symbols and mediated communication, a transition that didn't happen for other species. It can thus be argued that while communication is common to all animal species, combining communication (signs) with representation (symbols) remains specific to humans, and presumably to human intelligence.
As illustrated by the evolution of languages, representation technologies are polymorphic: alphabets (written languages) have been employed with phonetics (spoken languages), and logograms (mediated communication) have been expressed not only as signs but also as phonemes (conversational communication). Moreover, as epitomised by Kanji, the technologies are interoperable: a common logographic system supports (written) representations shared across different alphabetic systems used for (spoken) communication.
Linguistics has taken a new turn with the arrival of computers as a third agent alongside humans and nature, with computational linguistics introducing a layered perspective:
- Nominal layer: words used to put names on facts (sounds, images, texts)
- Modeling and/or programming layer: grammars meant to be executed by machines (syntax and semantics)
- Natural layer: lexicons, semantics, and grammars as used by humans (pragmatics)
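The contrast between layers can be made concrete with a toy sketch (hypothetical lexicon and grammar, not an actual NLP pipeline): the nominal layer puts lexical names on words, and the modeling layer is a grammar simple enough to be executed by a machine; the natural layer, with its pragmatics, is precisely what resists such executable treatment:

```python
# Toy illustration of the nominal and modeling layers. The lexicon and
# the single grammar rule are hypothetical, chosen for brevity.
LEXICON = {"cat": "NOUN", "dog": "NOUN", "sleeps": "VERB", "barks": "VERB"}

def tag(sentence):
    """Nominal layer: map words onto lexical categories."""
    return [LEXICON[word] for word in sentence.split()]

def parse(sentence):
    """Modeling layer: accept only the executable rule S -> NOUN VERB."""
    return tag(sentence) == ["NOUN", "VERB"]

print(parse("dog barks"))   # True
print(parse("barks dog"))   # False
```

Nothing in this machinery tells us what "dog barks" is meant to achieve in a given conversation; that pragmatic residue belongs to the natural layer.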
At first, computational linguistics considered natural and machine languages as isomorphic layered constructs, until new cognitive developments undermined the mind-as-a-machine illusion and machine learning technologies replaced the structural isomorphism with an operational one.
Large Language Models
Broadly speaking, machine learning technologies tend to replace the mind-as-a-machine paradigm with a machine-as-mind one, using large language models (LLMs) as a test-bed. To that end LLMs, and more generally GAI, consider words as facts whose meanings can be mined from massive communication datasets, with grammars and canned pragmatics providing the backbone of conversations.
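That mining of meanings from communication datasets can be sketched in a drastically simplified form with distributional semantics over a toy corpus (illustrative only; actual LLMs learn dense embeddings from massive data, not counts from four sentences):

```python
# Distributional sketch: a word's "meaning" is the company it keeps.
# Corpus and words are made up for illustration.
from collections import Counter
from math import sqrt

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate the cheese",
    "the dog ate the bone",
]

def cooccurrence(word):
    """Count the words appearing in the same sentence as `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def cosine(a, b):
    """Similarity of two count vectors; missing keys count as zero."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "cat" and "dog" share contexts (chased, the), so they come out closer
# to each other than "cat" is to "cheese".
print(cosine(cooccurrence("cat"), cooccurrence("dog")))
print(cosine(cooccurrence("cat"), cooccurrence("cheese")))
```

No category intervenes between the words and their induced proximity, which is exactly the generative shortcut discussed below.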
That approach can be better understood when set in its operational context and compared to empiric and formal alternatives (see figure above):
- Empiric (or scientific) approaches use domain-specific syntax and semantics to map facts into categories and models.
- Formal (or logic) approaches use generic and truth-preserving syntax and semantics to align concepts and presumptive models with categories.
- Generative (or nominal) approaches bypass categories and rely instead on semantic grammars (a combination of syntax and semantics) and pragmatics for the alignment of facts and meanings.
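The formal path above can be illustrated with a minimal forward-chaining sketch (hypothetical facts and rules), showing how generic, truth-preserving syntax aligns concepts with categories:

```python
# Forward chaining over is_a triples: everything derived is entailed by
# the initial facts and rules, hence truth-preserving. Facts and rules
# are hypothetical.
facts = {("socrates", "is_a", "human")}
rules = [           # (X, Y): whatever is_a X is also is_a Y
    ("human", "mortal"),
    ("mortal", "perishable"),
]

def forward_chain(facts, rules):
    """Apply rules until no new triple can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for subj, _, cat in list(derived):
            for x, y in rules:
                if cat == x and (subj, "is_a", y) not in derived:
                    derived.add((subj, "is_a", y))
                    changed = True
    return derived

print(forward_chain(facts, rules))
# includes ('socrates', 'is_a', 'mortal') and ('socrates', 'is_a', 'perishable')
```

Contrary to the generative shortcut, every conclusion here can be traced back to explicit premises, which is what "truth-preserving" buys.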
That operational perspective points to an intrinsic caveat of generative approaches: their reliance on implicit contents, as represented by the area below the NW/SE diagonal in the diagram above.
For organizations, learning raises a two-pronged challenge: it must bridge the gap between individual and collective knowledge on the one hand, and between human and system agency on the other. That conundrum can be sorted out by expressing learning in terms of symbolic and non-symbolic knowledge:
- Symbolic knowledge is explicit, represented by models and/or knowledge graphs
- Non-symbolic knowledge is implicit, embodied in individual know-how, collaboration routines, and neural networks
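The divide can be sketched with a toy example (hypothetical entities and weights): the same kind of association is explicit and queryable in a knowledge graph, but only embodied in the weights of a neural network:

```python
# Symbolic knowledge: explicit triples can be queried and explained.
graph = {("order", "references", "customer"),
         ("invoice", "references", "order")}

def related(graph, entity):
    """Return the entities an entity explicitly references."""
    return {obj for subj, _, obj in graph if subj == entity}

print(related(graph, "invoice"))  # {'order'} — explicit and explainable

# Non-symbolic knowledge: a tiny one-layer perceptron encodes an
# association in weights (hypothetical learned values); it answers,
# but no symbol in it represents what it knows.
weights = [0.7, -0.3]

def answers(features):
    return sum(w * f for w, f in zip(weights, features)) > 0

print(answers([1, 0]))  # True, but no triple explains why
```

The asymmetry is the crux of the organizational challenge: the first form can be shared and audited as such, the second can only be exercised.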
Assuming that symbolic knowledge can be implemented through systems, the objective would be to use languages to turn implicit contents into explicit knowledge, typically:
- Empiric languages, which apply statistics and machine learning to actual observations in order to build descriptive and/or predictive models of environments
- Generative languages, which apply the same techniques to textual corpora in order to build semantic networks
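The two paths can be contrasted with a toy sketch (made-up observations and text, for illustration only): the empiric one fits a predictive model to observations, the generative one mines a semantic network from word adjacency:

```python
# Empiric path: a least-squares slope turns observed (x, y) pairs into
# a descriptive/predictive model. Data points are made up.
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(round(slope, 2))  # ≈ 1.94, an explicit model of the observations

# Generative path: link adjacent words, yielding a crude semantic
# network from text alone, with no observed environment behind it.
text = "orders reference customers ; invoices reference orders"
network = set()
for clause in text.split(";"):
    tokens = clause.split()
    network.update(zip(tokens, tokens[1:]))
print(("invoices", "reference") in network)  # True
```

The empiric model answers questions about environments; the mined network only answers questions about what the texts say, which is where the symbolic gap noted below comes from.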
While the theoretical and pragmatic bases of that approach are well established for the systems path through empiric and formal languages, that's not the case for the organizational path, due to generative languages' lack of a symbolic dimension. Hence the flurry of initiatives towards foundation languages meant to harness both generative and knowledge graph technologies.
Further Reading
- Signs & Symbols
- Generative & General Artificial Intelligence
- Thesauruses, Taxonomies, Ontologies
- EA Engineering interfaces
- Ontologies Use cases
- Cognitive Capabilities
- LLMs & the matter of transparency
- LLMs & the matter of regulations