Since the launch of ChatGPT, generative artificial intelligence has become a central topic in AI development. In its basic form, generative AI trains large models on massive datasets so that they learn the statistical patterns of human language, images, and video, and then generate the requested digital content in response to user prompts. Compared with traditional AI, which is limited to specific functions such as classification and recognition, generative AI focuses on producing creative content, showing unprecedented capability in text dialogue, document drafting, code writing, image creation, voice synthesis, and video generation.
Generative AI integrates more than sixty years of AI research, above all the deep-learning breakthroughs of the past decade. First, it learns human knowledge from vast corpora, internalizing the intrinsic rules and statistical patterns of text, speech, images, and video, and can generate new content of each kind on demand. Second, it follows the large-model learning paradigm: pre-training on extensive data to acquire general features, fine-tuning for specific tasks to optimize performance, and further reinforcement learning from human feedback. It can therefore not only absorb general human knowledge but also continuously learn specialized knowledge and skills, and, guided by human reward signals, produce output that conforms to ethical and legal constraints. Finally, it fuses multimodal data, dynamically integrating text, speech, images, and video to generate richer and more diverse digital content, and supporting human-like interaction through recognition of voice, gestures, and facial expressions. In short, generative AI exhibits multimodal, generalized cognitive and interactive intelligence, a powerful shift “from specialized to general,” and has shown clear advantages in automated document generation, automated programming, intelligent customer service, supply-chain management, product R&D, smart education, smart healthcare, and many other sectors.
Although generative AI has demonstrated unprecedented cognitive and multimodal interaction capabilities over the past two to three years, its inherent limitations are becoming increasingly apparent. How to keep advancing the technology so that it can be applied widely in practice has become a focal concern.
First, the large models underlying generative AI have clear limitations in precise understanding and logical reasoning. On one hand, because of the hallucination phenomenon, they readily output factual errors. On the other, their content generation is essentially probabilistic next-word prediction, so they cannot sustain the long-chain, dynamic logical reasoning that humans perform. Together, these issues make it hard to embed generative AI directly into real business scenarios. Eliminating cognitive hallucinations and strengthening or compensating for weak reasoning, with practical applications in mind, are therefore key problems for its further development.
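The point about probabilistic word prediction can be made concrete with a toy bigram language model (an illustrative sketch in plain Python, nothing like a production LLM): it chooses each next word by sampling observed frequencies, so it can fluently emit sequences that never appeared in its data, a crude analogue of hallucination.

```python
import random
from collections import Counter, defaultdict

# Toy bigram "language model": count which word follows which in a
# tiny corpus, then generate text by sampling those counts.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(start: str, n: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = bigrams[out[-1]]
        if not followers:
            break
        words, counts = zip(*followers.items())
        # Sample the next word in proportion to observed frequency:
        # no fact-checking, no reasoning, just statistics.
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the", 6))
# May emit e.g. "the cat sat on the rug": fluent, yet never present
# in the corpus as a whole sentence.
```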
Second, generative AI faces a scaling-efficiency bottleneck. Large-model development follows a power-law relationship between performance and parameter scale: the larger the model and the more training data it consumes, the stronger its predictive capability. Many companies have therefore assumed that simply scaling up will yield general artificial intelligence in the near future. Yet over the past five years the parameter counts of large models have grown exponentially, and with them the demand for intelligent computing power. GPT-4, the largest of these models, is reported to have reached a trillion-parameter scale, requiring clusters of thousands of GPUs running for months of training and tuning; some companies continue to expand, investing in clusters of tens of thousands or even millions of GPUs, and building and operating such infrastructure means overcoming power-supply and heat-dissipation challenges. At the same time, the supply of high-quality, information-dense corpora limits further scaling. Improving large-model performance requires large volumes of high-quality training data, which today comes mainly from aggregated public internet data; reports project that by 2028 large-model training will have exhausted the publicly available internet, creating a data crisis for continued model growth. It is therefore necessary to turn to vertical domains, mine private data in depth, and expand high-quality data sharing so that large models can better adapt to domain needs.
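The power law mentioned above is usually written as L(N) = (N_c / N)^α: test loss falls polynomially as parameter count N grows. The sketch below uses hypothetical constants of the same order as those reported in published scaling-law studies; it is meant only to show the diminishing returns behind the scaling bottleneck.

```python
# Illustrative power-law scaling of loss with parameter count N:
#   L(N) = (N_c / N) ** alpha
# N_C and ALPHA are hypothetical constants chosen for illustration,
# of the same order as those in published scaling-law studies.
N_C = 8.8e13
ALPHA = 0.076

def loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA

for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> loss {loss(n):.3f}")

# Every 10x in parameters shrinks the loss by the same *factor*
# (10 ** -ALPHA, about 0.84), so absolute gains keep diminishing:
# the economic core of the scaling bottleneck described above.
```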
It follows that trying to reach an absolutely general intelligent model purely by scaling is not a sustainable route, technically or economically. Development should instead center on the business logic and real scenarios of vertical domains, deliberately using large models' strengths within business processes while compensating for their weaknesses, so that general and specialized technical routes are combined and large models can be widely deployed and deliver value across industries. Concretely, this means building specialized small models for a domain's key and difficult problems and combining them with a base large model into domain-adapted intelligent agents that enable and upgrade generative AI within existing systems.
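The "base large model plus specialized small models" pattern can be sketched as a simple dispatcher. All model names and the keyword router below are hypothetical illustrations; a real system would route with a learned classifier and call actual model endpoints.

```python
from typing import Callable

# Hypothetical stand-ins for a general base model and two domain
# specialists; real systems would call model-serving endpoints here.
def base_model(query: str) -> str:
    return f"[general answer to: {query}]"

def legal_model(query: str) -> str:
    return f"[contract-law answer to: {query}]"

def medical_model(query: str) -> str:
    return f"[clinical answer to: {query}]"

# Toy keyword router mapping domain triggers to specialist models.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "contract": legal_model,
    "diagnosis": medical_model,
}

def answer(query: str) -> str:
    """Route to a specialist when one matches; else fall back to the base model."""
    for keyword, model in SPECIALISTS.items():
        if keyword in query.lower():
            return model(query)
    return base_model(query)

print(answer("Review this contract clause"))  # handled by the legal specialist
print(answer("What's the weather?"))          # falls back to the base model
```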
As generative AI develops, its ethical and security risks grow more prominent, and the emergent cognitive abilities and inherent flaws of large models pose new challenges for the social governance of AI. First, large-model-based generative AI systems lack reliable safety barriers: under attack, they readily output sensitive information or content reflecting erroneous values, and because large models are algorithmic black boxes, the interpretability and transparency of their behavior remain open research problems. Second, the widespread application of generative AI creates numerous derivative risks, the most prominent being the governance of deeply synthesized content: model-generated content often cannot be distinguished from the real thing, handing new technical means to online fraud and the spread of false content. There is thus an urgent need to systematically implement content identification, watermark verification, and similar measures, and to establish a workable platform for tracing AI-generated content, so as to keep the digital ecosystem healthy. Finally, agile governance of generative AI research and application is moving toward systematic, law-based regulation: legal and regulatory frameworks are being actively developed both domestically and internationally, aiming to prevent the technology's potential and actual risks while promoting and regulating its healthy development.
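The content-identification and traceability idea can be sketched with a minimal provenance record: the generating service attaches a keyed hash (HMAC) to each output, and a verifier holding the same key can later confirm that the content came from that service unaltered. Real schemes, such as C2PA-style provenance or statistical text watermarks, are far more elaborate; this shows only the core idea, with a hypothetical key and model ID.

```python
import hashlib
import hmac

SERVICE_KEY = b"demo-secret-key"  # hypothetical shared key, for illustration only

def sign_content(text: str, model_id: str) -> dict:
    """Attach a provenance record: which model generated the text, plus a keyed tag."""
    tag = hmac.new(SERVICE_KEY, f"{model_id}:{text}".encode(), hashlib.sha256).hexdigest()
    return {"model_id": model_id, "text": text, "tag": tag}

def verify_content(record: dict) -> bool:
    """Recompute the tag; a mismatch means the text or its attribution was altered."""
    expected = hmac.new(SERVICE_KEY,
                        f"{record['model_id']}:{record['text']}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])

rec = sign_content("An AI-generated press summary.", "demo-llm-1")
print(verify_content(rec))   # True: content is intact and attributable
rec["text"] = "A doctored summary."
print(verify_content(rec))   # False: tampering is detected
```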
Tiered agile governance has therefore become a focus of current research: concentrating governance effort on the development and deployment of ultra-large-scale neural network models while applying simpler agile governance to the development of the many small and medium-sized models, striking a balance between advancing the new technology and governing it effectively.
The author is a professor at the School of Artificial Intelligence, Beihang University
(This content originates from China Information Security)