Faced with the remarkable intelligence exhibited by the GPT model, we need to understand correctly the profound impact of this breakthrough in artificial intelligence.
Zhou Hongyi, founder of 360 Group, recently elaborated in a live stream on four remarkable capabilities exhibited by the GPT model: emergence, hallucination, language transfer, and logical enhancement. These phenomena seem to herald the arrival of a true era of general artificial intelligence. Many people criticize large models because, when a question has no definite answer, they generate a seemingly serious response at random. Zhou believes, however, that this ability to generate nonsense is precisely the watershed of intelligence, and that many of the new GPT "brains" of the future will need to retain this ability to hallucinate to some degree.
To show that GPT truly possesses intelligence, I will describe four inexplicable phenomena. Even the Google scientists who created the Transformer model, the OpenAI scientists who developed ChatGPT, and the Microsoft experts who ran the tests know only these conclusions; they cannot explain why the phenomena occur.
The first is emergence, which refers to a sudden, exponential improvement in the model's reasoning ability. In the small-model era, many AI companies trained reasoning models with too few parameters and too little capacity; you can think of them as small brains that could only develop search ability, without ever truly developing reasoning ability or the ability to form a chain of thought.
However, during OpenAI's training of GPT, it was discovered that once the model's parameters reached roughly 100 billion, its reasoning ability suddenly improved exponentially and it began to answer multi-step reasoning questions:
For example, how do you put an elephant in a refrigerator? A human breaks it down into three steps: first, open the refrigerator door; second, put the elephant inside; third, close the refrigerator. Reaching the answer requires multi-step logical reasoning, which is a crucial human thinking pattern. This powerful chain-of-thought reasoning ability is based not on search but on parameters.
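To make the idea of multi-step, chain-of-thought reasoning concrete, here is a minimal sketch of a chain-of-thought prompt sent through the OpenAI Python SDK. The model name, the prompt wording, and the use of an API key from the environment are assumptions for illustration only; none of this comes from the original talk.

```python
# Minimal chain-of-thought prompting sketch (illustrative, not from the talk).
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

# Ask the model to reason step by step, mirroring the
# "open the door, put the elephant in, close the door" decomposition.
response = client.chat.completions.create(
    model="gpt-4",  # assumed model name
    messages=[
        {
            "role": "user",
            "content": (
                "How do you put an elephant in a refrigerator? "
                "Think step by step and list each step."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```

Prompted this way, a sufficiently large model typically returns an explicit sequence of steps rather than a one-line answer, which is exactly the multi-step behavior described above.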
From the perspective of evolution, there was a leap in intelligence as apes evolved into humans, and it must have occurred because at some stage the number of neural units in the human brain increased, much as a model's parameters and the stimuli it receives increase.
When we train large models, as the parameters increase, artificial intelligence almost miraculously gains new, strong reasoning abilities, and these abilities keep growing stronger, resembling the process of human evolution.
The second is hallucination. In essence, it refers to nonsense. Many criticize large models because when there is no definite answer to a question, they generate a seemingly serious response at random.
From a certain perspective, this is indeed a disadvantage in specific situations. And because search can only retrieve things that already exist, this disadvantage can be technically mitigated by adding search and knowledge-base corrections.
But have you ever wondered why it produces nonsense? Doesn't this precisely indicate its intelligence? I remember that "Sapiens: A Brief History of Humankind" says that in human evolution, the watershed between humans and animals was that humans became capable of generating nonsense.
Humans can depict non-existent things. If you tell a gorilla that there are three apples, it can learn that. But if you say, "Tomorrow I will give you three apples," the gorilla cannot comprehend something that has not yet happened. It is precisely because humans can fantasize about and predict non-existent things that we have communities, religions, and groups.
Even the creativity that humans take pride in: is it really born from nothing? In fact, most of what we create is a fusion of two unrelated concepts that results in innovation. Of course, perhaps 99% of the combinations of two unrelated concepts are nonsense, but the remaining 1% may be genuine innovation.
This ability to fantasize will always be difficult to eliminate, and it is unrelated to your training data. Take tonight: when I decided to do this live stream, maybe an hour earlier I was unwilling to do it, but an hour later I changed my mind. It is like quantum entanglement: a random die is cast.
The ability to generate nonsense is precisely the watershed of intelligence. I believe many of the new GPT brains of the future will need to retain this ability to hallucinate to some extent. When it answers factual questions, I do not need its hallucination; but when it writes novels or scripts for me, I need exactly this ability.
By the way, every night when you dream, you generate nonsense, because in dreams your neural networks short-circuit, connecting concepts that would never meet during the day. For example, last night I dreamed that Luo Zhenyu was chasing me; the neurons representing Luo Zhenyu, chasing, and me happened to connect, and that produced the dream.
The third inexplicable phenomenon is the transfer of language ability. Before large models emerged, we were all working on AI translation, but translation quality worldwide was not very good. The reason is that the rules of different languages differ: Chinese requires word segmentation, Arabic is written from right to left, Latin-alphabet languages do not map cleanly onto our ideographic characters, and so on.
However, in forging its large model, OpenAI trained on roughly 95% Latin-script text and only about 5% Chinese corpus. A strange phenomenon then occurred: the logical ability, reasoning ability, and knowledge the model acquired while learning English carried over well to other languages. When you use ChatGPT, it responds in Chinese and often does a good job. This phenomenon is also quite interesting.
My speculation is that inside the large model, although Arabic, Chinese, Japanese, and the Latin languages look different, they are all symbolic systems invented by humans to describe the world. Behind the different human languages and representations there must be some common rules. We who learn languages may not have discovered them, but ChatGPT, through training, has, and so it achieves the transfer of language ability.
The last one is logical enhancement. One very important function of ChatGPT is writing programs, which is its area of expertise, because GPT is at its core a symbolic system, a language model.
Computer languages are the purest. By contrast, human natural language is the most complex, full of ambiguity and polysemy: in different contexts and under different readings, it can express different meanings. For example, when a bus announces, "The front door is about to arrive; please get off from the back door," should I get off from the front door or the back door? Examples like this are everywhere.
However, everyone has noticed that after ChatGPT learned from billions of lines of code on GitHub, the sense of logic it acquired from programming surprisingly carries over to natural language as well: its logic improves rapidly when it answers questions in natural language.
Many parents ask me, "With ChatGPT being what it is, should children still learn?" My answer is: of course they should. If you do not learn, your brain will not grow new neural connections; it stays unused and receives no feedback.
Children who learn programming today may not necessarily work as programmers when they grow up; the profession of programmer may itself change in the future. But the improvement in logical judgment and expression that comes from learning programming is certain. ChatGPT itself has verified this.
An important part of using GPT well is prompting. If your prompt is poorly written, GPT may pick a random piece of text to brush you off; if the prompt is good, challenging, and critical, it will yield better results.
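To make this concrete, here is a minimal sketch comparing a vague prompt with a more specific, demanding one, again using the OpenAI Python SDK; the model name and the prompt wording are illustrative assumptions, not something taken from the original talk.

```python
# Illustrative comparison of a vague prompt and a specific, demanding prompt.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# A vague prompt tends to draw a generic, dismissive answer.
print(ask("Tell me about digital security."))

# A specific, challenging prompt tends to draw a sharper, more useful answer.
print(ask(
    "List three concrete weaknesses of signature-based antivirus against "
    "previously unseen malware, and for each weakness explain what a "
    "large-model-based approach might do differently."
))
```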
This ability to provide prompts also needs to be cultivated. It is just like my live broadcasts: I like to find someone to interview me, because if I talk endlessly by myself, with no input and no prompts, my cortex goes flat as I speak. If audience members are willing to ask me challenging, critical questions, that stimulates my urge to debate and discuss, and I talk much more.
In summary, I have presented many viewpoints to help everyone form a correct understanding of the GPT large language model. Andy Grove, the former CEO of Intel, wrote in his famous book "Only the Paranoid Survive" that no industrial revolution arrives with a bang; it first appears as a weak noise signal.
If the GPT large language model heralds a massive revolution, you must not misjudge it. If you think it is just a model from twenty years ago, just Bayesian statistics, or an insignificant "fill-in-the-blank machine" or neural-network application, you may be making a cognitive error.
How GPT is specifically used is a technical issue, but the most important core strategic question is whether you recognize GPT as a strong artificial intelligence and whether its emergence represents the arrival of a super artificial intelligence era.
I want to add a viewpoint about general artificial intelligence. First, in natural language processing, essentially all other processing methods will be replaced by large language models based on the Transformer decoder. GPT-4 has added multimodal capabilities; it can understand images and hear sounds.
In the past, speech recognition had its own algorithms, and image recognition likewise had its own. Those algorithms are based on deep learning networks such as CNNs, RNNs, and DNNs, which resemble the human visual neural network and remain at the level of perception.
However, today’s large language model simulates the working principles of human brain neural networks; it has reached the cognitive level, which is a completely different level.
It does not merely recognize things; it understands the world. In the past, facial recognition merely turned a photo into an ID: when a facial-recognition camera recognized Zhou Hongyi, it simply compared the image against a stored photo of Zhou Hongyi in a database. That is image encoding, without any cognitive understanding of Zhou Hongyi's company 360, digital security, artificial intelligence, and so on. Large language models will completely overturn these algorithms.
The chief scientist of OpenAI made a profound point: once you have built a complete understanding of the world's knowledge with a large language model, your ability to recognize photos and objects on that foundation is completely different. This is the second layer of meaning of general artificial intelligence.
As we know, artificial intelligence has run into problems in many fields, such as robot control, humanoid walking, motion control, and autonomous driving.
Why does autonomous driving run into so many problems? Because its algorithms are pieced together from many traditional AI algorithms in fragmented vertical domains: rules, perception-layer obstacle recognition, object recognition. They are not unified, and there are always many cases that must be learned and labeled; once the system encounters situations that cannot be labeled or learned on its own, its capabilities are severely limited.
In the future, as the capabilities of large language models further improve, they may truly simulate a driver's cognitive grasp of the world. Large language models may eventually overturn today's autonomous driving algorithms, and their multimodal processing may, within a few years, make what we now call L4 or L5, genuinely intelligent driving, a reality.
This is also why we define today's GPT as general artificial intelligence: it replaces the old fragmented approach of dividing artificial intelligence into 100 small tasks and solving them with 100 small models. Instead, it uses one large model to comprehensively encode, index, and reason over all human knowledge, thereby establishing a complete understanding of the world. This is the third layer of meaning of general artificial intelligence.