How Artificial Intelligence Can Speak Human Language

1. Introduction: Why Is “Natural Language Processing” So Important for AI?

In brief, what is called “Artificial Intelligence” (AI) is a discipline that uses computer technology to simulate, or partially simulate, human intelligence. One very important aspect of human intelligent activity is, put simply, the ability to “speak”: the capacity for flexible communication of thought based on a vocabulary and grammar shared within a particular linguistic community. For AI research, enabling computers to “speak human language” has an important theoretical significance: such work helps us understand the position of linguistic ability within an intelligent system from the perspective of “artificial cognitive architecture”, thereby solidifying the connection between AI research and the broader field of cognitive science.

From another perspective, the scientific and engineering effort to make AI “speak human language” will also bring substantial practical benefits. Artificial intelligence machines that can “understand human language” will be capable of tasks including (but not limited to): (a) email processing; (b) automatic generation of reading summaries; (c) automatic translation; (d) text generation.

In the AI community, the programming research responsible for these tasks is labeled “Natural Language Processing” (NLP). As the name suggests, the task of NLP is to program computers so that the relevant programs can “understand” human natural language.

However, at this point a discerning reader might ask: “Is the ability to speak a necessary and sufficient condition for being considered intelligent?”

I tend to believe the answer is affirmative. In other words, if an observed object can reach a language level we recognize (i.e., meets the standard of “speaking”), one can infer that it possesses intelligence; conversely, if it is intelligent, one can infer that it has a relatively high level of language ability.

Some readers might object that my view of language-processing ability as the core of intelligence seems somewhat “logocentric”, neglecting the role of “embodiment” in the constitution of intelligence. On this view, an intelligent agent is intelligent not primarily because it can speak, but because it can move freely in physical space, perceive light, smell, and temperature, avoid danger, and so on; “speaking” is plainly secondary.

From my perspective, such criticism misses the point of my argument. Far from denying it, I fully acknowledge that “embodiment” constitutes an important prerequisite for “speaking”. My claim concerns evidence instead: an object’s purely non-verbal physical behavior seldom allows observers to judge the level of its more abstract abilities, whereas its verbal behavior does. From this perspective, judging the intelligence of an observed object by its verbal rather than its physical behavior has a unique methodological advantage. It follows naturally that for artificial agents, too, performance in the NLP field can serve as an indicator of overall intelligence.

2. Does the Current Level of AI Development Meet the Demand for “Speaking Human Language”?

Currently, smart speakers, the various human-computer dialogue applications installed on mobile phones, “Baidu Translate”, and “Google Translate” are all significant achievements of this research. It should be noted, however, that the apparent prosperity of such products does not mean that current NLP products have reached the level of “speaking human language”, for the following reasons:

First, machine translation mechanisms such as “Google Translate”, various automatic speech recognition mechanisms, and various chatbot systems are all specific NLP mechanisms designed for different NLP tasks, rather than a comprehensive solution for all NLP problems. In contrast, for a complete natural person, semantic recognition, speech recognition, translation, and other language functions are integrated within one brain, each operating under a unified set of psychological and physiological principles.

Second, for humans, language capability itself is used to “do things”, such as helping language users organize complex information in decision-making activities or persuading someone to take a certain action. Thus, language ability is inherently intertwined with logical reasoning ability, theory of mind, and other cognitive abilities. However, in the current academic division of labor in the AI industry, the relationship between NLP research and modules for common sense reasoning, non-deductive reasoning, and other technologies is relatively isolated.

Third, the application of deep learning in the NLP field typically relies on the internet to provide large corpora and training samples, and these corpora and samples are ultimately produced by humans. Without internet support, such technology finds it difficult to generate appropriate outputs for input text on its own, whereas humans with normal linguistic intelligence can communicate fluently without relying on internet resources. Current mainstream NLP research therefore lacks sufficiently powerful “local information processing capability”.

Fourth, because current mainstream NLP technology is closely tied to big-data collection, language materials that are fundamentally difficult to obtain through big-data techniques are also hard for current mainstream NLP technologies to process fully. By contrast, humans with appropriate linguistic intelligence can quickly grasp particular puns, metaphors, and ironic meanings from contextual cues, or learn a dialect with the help of only a few teachers after a period of effort. In this respect, the level achieved by current NLP research remains far below the average level of human linguistic intelligence.

I believe that the problems currently facing NLP technology are not merely engineering problems; they also have profound philosophical roots.

3. Why Does Natural Language Processing Research Need the Intervention of Philosophical Perspectives?

Overall, the relationship between philosophy and NLP research is not fundamentally different from the relationship between philosophy and scientific-engineering research planning in general. The task of the philosophical researcher is to reveal the assumptions that NLP research has left unarticulated and to subject them to reflective scrutiny. Broadly speaking, since NLP research must presuppose certain views about the nature of language, the branch of philosophy known as the philosophy of language bears especially closely on NLP research. Among the relevant issues, the following four questions are especially worth raising:

Question 1: What is the relationship between language and the world? Is language representation a modeling of the external world beyond the speaker, or a modeling of the speaker’s own internal world of ideas?

This question touches on a major controversy in the history of the philosophy of language. Philosophers of an objectivist bent, such as Plato, Frege, Kripke, and Putnam, tend to hold that the role of language is to signify external, objective things; philosophers with a subjectivist or idealist coloring, such as Locke, Husserl, and Oshima Shozo, argue instead that the primary task of language is to represent the speaker’s internal thoughts rather than to refer to external objects. This dispute has produced corresponding divergences in technical paths within NLP. The two research routes differ as a “God’s-eye perspective” differs from a “human perspective”: the objectivist route, built on the God’s-eye perspective, presupposes that programmers have adequate knowledge of at least certain aspects of the external world; the subjectivist route, built on the human perspective, presupposes only that programmers know the inferential relationships among the representational symbols constructed within the NLP system, while whether these inferential relationships strictly correspond to causal relationships among elements of the external world remains “unknown”.

I support the “human-perspective” NLP route. Otherwise we would have to presuppose that certain knowledge about the external world stored in an NLP system is “immutable”, and this presupposition would deprive the system of the flexibility it needs. Unfortunately, research based on the “human perspective” is not currently the mainstream, which is precisely why philosophers are needed to carry out the relevant “corrective” work at the conceptual level.

Question 2: Are the rules in language a priori and immutable, or experiential and variable?

There are roughly three ways of drawing the boundary between the “a priori” and the “experiential”:

(a) Expand the scope of the “a priori”, treating even the experiential aspects of natural-language grammar as a priori. This approach, however, is difficult to square with the empirical fact that grammars evolve, and can only be regarded as an abstract possibility.

(b) Expand the scope of the “experiential”, holding that all apparently a priori grammar can be digested through statistical data. This is the mainstream line of current deep-learning-based NLP research.

(c) Unlike the first two, this approach divides “rules” into two categories: some are “experiential”, such as the various surface grammars of particular languages; others are a priori, such as a “deep grammar” that runs through all surface grammars. The linguistics based on Chomsky’s notion of “universal grammar”, and the NLP research it has influenced, take this line.
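The core idea of approach (b) can be made concrete with a minimal, hypothetical sketch (the toy corpus and the helper function below are illustrative inventions, not from any system discussed here): a bigram counter that extracts word-order regularities from raw text alone, with no grammatical rule coded anywhere.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the large web-scale samples mentioned above.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word follows which: no grammar is programmed in.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """Return the statistically most frequent successor of `word`."""
    return follows[word].most_common(1)[0][0]

# Regularities such as "sat -> on" emerge purely from frequency counts.
print(predict("sat"))  # -> on
```

Scaled up from bigram counts to deep neural networks trained on internet-sized corpora, this is the sense in which approach (b) claims that statistics can “digest” grammar.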

I personally advocate a modified, empiricist-leaning version of (c). What I share with Chomsky is the belief that the ultimate explanation of the composition of all languages can appeal to a unified set of grammatical categories. Where we differ is this: he holds that, because the linguistic phenomena to be explained vary in complexity, the explanatory grammatical categories must leave enough “redundancy” at the “complex” end, thereby becoming a “universal grammar” equipped with every grammatical switch; I do not accept this judgment.

Question 3: What is the relationship between language and psychological architecture?

Currently, NLP researchers concern themselves with establishing suitable mappings between particular types of linguistic input and output, rather than with how these linguistic phenomena emerge from the underlying psychological-cognitive architecture. In my view, studying language merely at the level of verbal behavior is superficial: the phenomena at that level are too complex, which inevitably drives up the cost of data collection and modeling. If instead we view complex verbal behavior as the varied outputs that a more general mental architecture produces under different external stimuli, we can greatly reduce modeling costs and reserve logical space for such systems to upgrade themselves automatically under specific external conditions.

However, this approach inevitably elevates the mainstream NLP research path into a grand research program for general artificial intelligence. Such a holistic roadmap may strike some researchers as hopeless, since the typical mode of AI research is to propose engineering work targeted at specific application scenarios and then extend the results to other scenarios, whereas the approach I advocate first sets aside all application scenarios, clarifies the general character of intelligent reasoning at the philosophical and scientific level, and only then turns to questions of technical application.

Question 4: To what extent does the cognitive architecture theory required for natural language processing need to be “embodied”?

We single out this question in order to clarify the following issue: is such “embodiment” work of essential significance for NLP research, or only of marginal significance? Put more concretely: must architects consider in advance which sensory-motor devices the relevant AI body will be equipped with, and reserve important “slots” for such devices in the NLP interface? Or may architects ignore the question entirely and leave such considerations to experts in other fields? In modern philosophy this either-or appears as the question: can the human rational faculty operate relatively independently while the various sensory faculties are suspended? Answering “no” yields the empiricist view (whose engineering counterpart naturally emphasizes continuity between NLP interface design and the design of the AI body’s external devices); answering “yes” yields the rationalist view (whose engineering counterpart naturally emphasizes the separability of the two).

My answer to this question is neither purely rationalist nor purely empiricist, but carries a Kantian, reconciliatory flavor. In my view there is an important intermediate layer, overlooked by rationalists and empiricists alike, between pure conceptual construction and low-level sensory information: the intuitive form of spatio-temporal relationships. On the one hand, this intuitive form evidently has a certain pre-conceptual quality (the perception of the space of a room, for example, cannot be reduced to a geometric description of that space); on the other hand, it abstracts across the various sensory pathways and is therefore closer to the conceptual (the internal spatial form of a classroom as grasped by a blind person, for example, still overlaps to a high degree with that seen by a sighted person).

Conclusion

The discussion in this “introduction” shows that research on NLP problems does indeed have indicative significance for AI research as a whole. Yet the philosophical dimension of the problem has not been fully recognized by the NLP community. Rather, the research topics of the current NLP community are driven almost entirely by incidental engineering or commercial demands, lacking overall planning at the philosophical (or even scientific) level. This state of mutual disconnection, in which philosophy and engineering remain “two separate skins” that never touch, is clearly unsatisfactory.

This article is selected from “Communications of Natural Dialectics”, Volume 44, Issue 1, 2022
