Recently, Zhang Bo, an academician of the Chinese Academy of Sciences and honorary director of the Institute of Artificial Intelligence at Tsinghua University, said in a speech at ISC.AI 2024, the 12th Internet Security Conference, that current artificial intelligence lacks a theory and has only produced models and algorithms targeted at specific fields. Because both the software and the hardware are specialized, the market for each is small. In his view, this is why a large artificial intelligence industry has not yet emerged.

Academician Zhang Bo is now 89 years old. Over the past few decades, he has trained a generation of artificial intelligence talent at Tsinghua University and is one of the founders of the artificial intelligence discipline in China. Many popular “Tsinghua-affiliated” large model companies, such as Shengshu Technology, Zhipu AI, Mianbi Intelligence, and Kimi, have benefited from the technical foundation laid at Tsinghua, and their core technical staff were trained, directly or indirectly, under Zhang Bo.
In this speech, Academician Zhang not only pointed out the defects and problems of current artificial intelligence technology but also provided directions for future improvements.
When Considering Foundation Models, Three Main Capabilities and One Major Defect Must Be Considered
According to Academician Zhang, because of theoretical limitations, the artificial intelligence industry in its previous stage had to develop in conjunction with specific application fields. The artificial intelligence developed during that stage was therefore specialized, or “weak,” artificial intelligence. However, he also pointed out that current foundation models have achieved generality on language problems. “When we consider foundation models, we need to consider three main capabilities and one major defect. This is very important and serves as the starting point for our consideration of future industrial development.”
He explained that the strengths of large language models are their powerful language generation, their natural human-machine interaction, and their strong reasoning ability. “The language generation of large language models belongs to the open domain and can produce diverse results that humans can all understand. Even when the output is ‘nonsense’, we can still understand what the nonsense is about. This is very important. Humans and machines can engage in natural language dialogue in an open domain. We previously thought it would take generations of effort to achieve this goal, but unexpectedly, it was reached in 2020.”
Academician Zhang stated that the defect of large models is ‘hallucination’. “Because we require diverse outputs, errors will inevitably occur. This error is very different from the errors that machines produce. The errors produced by machines are often controllable, while this error is inherent and will definitely occur, and it is uncontrollable. Therefore, this is also a problem we need to consider when thinking about its applications in the future.”
Weighing the three main capabilities against the one major defect, Academician Zhang summarized where large models are currently suitable: scenarios with a high tolerance for errors. He stated that, from an industrial perspective, the application of large models follows a “U” shape. Planning and design at the front end require diverse content, and service and recommendation at the back end also require diversity; both ends tolerate errors well. The middle part, however, has to be assessed case by case.
Despite the problems, Academician Zhang still stated that “models must be used” regardless of the circumstances, “because once we have the model foundation, the efficiency and quality of applications will definitely improve. In the past, we developed software to provide services on empty computers, and an empty computer is equivalent to being illiterate. Now with large models, the platform is at least a high school student, and development efficiency will definitely improve. This will be the direction in the future.”
Academician Zhang also analyzed the root cause of hallucinations. He believes the fundamental limitation of these models is that all work done by machines is externally driven: humans teach them what to do, and they do not act on their own initiative. In addition, the results they generate are strongly influenced by prompts, unlike humans, who complete work driven by their own internal intentions.
Four Future Development Directions for Large Models: Alignment, Multimodal, Agents, Embodied Intelligence
Academician Zhang introduced four future development directions that he considers crucial for improving large models.
The first is alignment with humans. “Large models do not have the ability to judge right from wrong and cannot update themselves; every update is driven by humans. Without a breakthrough on this point, machines cannot self-evolve. Large models need external prompts, so correcting the errors of large models under human guidance is our first task.”
The second is multimodal generation. “Multimodal generation will be very important for the development of the industry in the future. Currently, we see that large models mainly generate text, but if we use the same method to generate images, sounds, videos, and code, the level of generation will be close to that of humans. The reason we can generate images so well now is mainly that we link images with text. Therefore, the most essential breakthrough is in text processing.”
The third is AI agents. “We need to integrate large models with the surrounding virtual environment and let the environment point out their errors. Because we only know what is right or wrong after doing something, the concept of agents is very important: it allows the environment to prompt the agents and gives them a chance to reflect and correct errors.”
The fourth is embodied intelligence. “By adding robots, we enable large models to work in the physical world. How should general-purpose robots be developed in the future? I believe the answer is ‘general software, diverse hardware’. While Musk promotes humanoid robots, I believe the future will not be limited to humanoid robots.”
In Academician Zhang’s view, developing the third generation of artificial intelligence requires establishing a theory. There is currently no theoretical explanation for why large models work, which leads to confusion and misunderstanding. As machines grow in scale, the inability of theory to explain them will cause panic. Artificial intelligence technology needs to be safe, controllable, trustworthy, reliable, and scalable, and until such technology is fully developed, artificial intelligence will continue to have safety issues.