“When did you start noticing humans?”
“When the first primitive man started gazing at the starry sky.”
The AI apes have already begun to gaze at humanity.
In the past two years, algorithm practitioner Wang Chaoyue has been shocked by AI on two occasions.
The first was in March last year, when OpenAI launched the AI painting product DALL·E. Simply type a sentence into the computer, and DALL·E could understand it and automatically generate a corresponding image, something no system had done before.
Any communication across “tribes” is a mutation of civilization, and a response from an unknown machine system leaves humans as shocked and curious as an encounter with a UFO. In a modern society where the distance between people keeps growing, machines now seem able to read a person’s innermost thoughts.
“You can clearly feel its progress compared to GAN (generative adversarial networks, a class of deep generative models introduced in 2014). The technology behind DALL·E is revolutionary,” Wang Chaoyue told Lei Feng Network.
The second time was in April this year when Google released the 540 billion parameter model PaLM. With the increase in parameters, PaLM’s text understanding and logical reasoning capabilities have significantly improved, even being able to explain jokes in text and tell readers why a joke is funny.
Before this, the most common way to mock AI was to say that a model’s reasoning was as weak as a 3-year-old child’s. But as large models have developed, they can now perform arithmetic and logical reasoning, with cognitive abilities approaching or even exceeding humans’ on some tasks. “For example,” Wang Chaoyue illustrated, “there are many jokes that I cannot understand at first glance, but it can explain them to me, which shows that on some language understanding tasks it understands better than I do.”
Wang Chaoyue is a senior researcher in generative AI, having focused on AIGC-related research since the release of GAN in 2014. At that time, GAN was a hot topic in deep generative network research, but its popularity pales in comparison to the significant breakthroughs in AIGC over the past two years. The two technologies mentioned above have become the “catalyst” igniting the AI community’s celebration in the second half of this year:
The key technology behind DALL·E, CLIP, gives text and images a common ground, and has become the cornerstone of groundbreaking achievements like DALL·E, DALL·E 2, and Stable Diffusion. Meanwhile, large language models like PaLM, though costly, have made remarkable progress in understanding human language, a prerequisite for AI to comprehend humans.
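To make CLIP’s role concrete: the model embeds a sentence and an image into the same vector space and scores how well they match. Below is a minimal, illustrative sketch using the public Hugging Face checkpoint of OpenAI’s CLIP; the image path and captions are made-up placeholders for the example:

```python
# Minimal sketch: scoring text-image agreement with CLIP.
# Uses the public "openai/clip-vit-base-patch32" checkpoint via the
# `transformers` library; the image file and captions are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image (assumed to exist)
captions = ["a flamingo standing in a pool", "a city skyline at night"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image[0][j] is the similarity between the image and caption j;
# this shared embedding space is the text-image "common ground"
# that DALL·E-style systems build on.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```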
“The breakthroughs in AI technology over the past two years have been incredibly rapid,” said Lan Zhenzhong, founder of Xincheng Technology (the Dream Thief team). He often feels excited and thrilled while reading papers: “After CLIP came out, MAE (a self-supervised paradigm from He Kaiming’s team that brings the masked-prediction pretraining so successful in language tasks over to vision) followed, and then there was Stable Diffusion…”
Since Stable Diffusion launched in August this year, Lan Zhenzhong and his team have moved quickly, shipping the AI painting product “Dream Thief” in under a month. It rapidly gained popularity in China: it produces high-quality images in as little as one second, its daily retention rate is close to 50% (higher than 90% of mini-programs), and it received large To B orders in under two months.
Images generated by “Dream Thief”
On the second day after the launch of “Dream Thief” (September 1), the first domestic “AIGC White Paper” was released at the World Artificial Intelligence Conference (WAIC) in Shanghai. Wang Chaoyue contributed to the white paper and led the chapters organizing the AIGC technical system and its outlook.
The release of the AIGC white paper attracted the attention of many peers attending the conference. Not only researchers in the field of artificial intelligence but also practitioners in the metaverse field:
“At that time, Sequoia Capital’s article on generative AI had not yet come out, and everyone still didn’t know what AIGC was. This indicates that the importance of digital content generation is a consensus in the industry.”
Everything quickly followed: technological breakthroughs brought a boom in applications. Midjourney took off overseas, and the text-to-image wave drew attention to previously niche AIGC branches such as text generation, video generation, and music generation. Industry insiders were surprised to realize that overseas companies like Jasper.ai had already proven that commercialization works.
Following the previous generation of perceptual intelligence primarily focused on recognition and detection, the “creative intelligence” used for generation and editing has become the new darling of capital.
More surprisingly, this wave of AIGC craze has also attracted the attention of many outsiders, such as self-media KOLs, illustrators, and content creators. Some are panicking, writing manifestos; others are delighted, hoping to embrace cutting-edge technology.
But regardless of people’s acceptance, an irreversible trend is already happening.
The Great Age of AIGC Has Begun
In 1519, an expedition fleet set sail from Spain, heading west, marking the beginning of the great age of navigation for human civilization.
Later, global historians would remember a navigator named Magellan, as well as the curiosity that first drove him to sail: is the Earth flat or round? Magellan was an advocate of the spherical-Earth theory; if the Earth were flat, the voyage could never succeed, but if it were round, he would eventually return to his starting point.
In 1950, another scientific explorer, Alan Turing, harbored a similar curiosity: can machines respond to human behavior as if they were conscious? He proposed the famous “Turing Test”, which opened the era of artificial intelligence research.
Now, AI researchers seem to have gained a similar desire and enthusiasm for exploration in the technical exploration of AIGC. They want to know: can machines understand human thoughts and logic, creating from 0 to 1?
The answer is: after nearly a decade of technological development, they believe it is possible and are confident that the current exploration of AIGC has reached an engineering stage.
As with Magellan’s voyage, the destination is now clear and the navigation map (theories and frameworks) is taking shape; what remains is to verify whether the technical route can reach it.
Take text-to-image generation as an example. AI’s ability to paint from textual descriptions is still imperfect: different prompts yield images of varying quality, and weak understanding of long text can drop keywords and leave images incomplete. But these are concrete research problems, and solving them is only a matter of time.
Why do we say that the map of AIGC has been outlined? This is mainly due to three aspects: large models, multimodal capabilities, and controllability.
In 2020, OpenAI launched the 175 billion parameter pre-trained language model GPT-3, which sparked a wave of large-model research both at home and abroad. From then on, AI’s language expression and understanding capabilities made significant leaps, and AI began to write passable articles in very little time.
In fact, a wave of text generation companies such as Jasper.ai and Copy.ai emerged overseas around that time. They built machine auto-writing platforms: a user inputs keywords, and in a few minutes the AI writes a long article whose logic and expression are comparable to human writing, replacing much of the human labor in writing and generating commercial value.
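At its core, the workflow these platforms wrap is a single call to a large language model. Below is a minimal sketch against the completion-style API that the openai Python library exposed for GPT-3 at the time; the API key, prompt wording, and keyword list are placeholders, and text-davinci-002 is one of the GPT-3 models that was then available:

```python
# Minimal sketch of keyword-to-article generation on top of GPT-3,
# using the completion-style API of the era. Key and prompt contents
# are placeholders, not any vendor's actual product logic.
import openai

openai.api_key = "sk-..."  # your own key

title = "Why daily retention matters for mini-programs"
keywords = ["retention", "mini-program", "growth"]

prompt = (
    f"Write a well-structured article titled '{title}'.\n"
    f"Cover these keywords naturally: {', '.join(keywords)}.\n\nArticle:"
)

response = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3 model available in 2022
    prompt=prompt,
    max_tokens=900,    # roughly a long-form draft
    temperature=0.7,   # some creativity, but stay on topic
)
print(response.choices[0].text.strip())
```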
However, OpenAI does not open the GPT-3 interface to Mainland China and Hong Kong, so domestic AI researchers have had difficulty using it, and text generation applications have not gained traction domestically. Over the past two years, many large companies and universities in China have begun researching Chinese large models, but open-source progress remains slow, and many AI developers are stopped by high training costs, restricting the development of Chinese-language AI applications.
In this wave of AIGC, AI large models have played a key role in understanding human language. Thanks to the development of large models, not only has the effect of text generation improved significantly, but text-based image generation has also made great progress compared to the GAN era.
Wang Chaoyue told Lei Feng Network that while writing the “AIGC White Paper,” there was some internal debate: should the title be “AIGC” (AI-Generated Content) or “Generative AI”? In the end, Wang Chaoyue voted for AIGC, because “generative model” is an established academic term that generally describes how a model fits a specific distribution, as GAN does. What DALL·E 2 does, however, has in a sense gone beyond fitting one data distribution, demonstrating universal image generation capabilities.
For example, the most famous application of GAN is face generation: the model looks at a vast number of face photos, learns that faces follow one distribution, and captures their features. In 2014, when few methods could generate high-dimensional image data, GAN was a strong generative approach, but its limitations were equally fundamental:
First, it requires a specific dataset (such as faces) and generalizes poorly. After GAN was released, it was used to train all kinds of facial effects, but a single GAN can only learn one effect; to produce another, a new GAN must be trained. Second, GAN has never performed well at controlling image generation through text descriptions, which greatly limits its potential to become a controllable, universal architecture.
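The “one dataset, one model” limitation follows directly from the adversarial training objective, sketched below in PyTorch. The tiny fully connected networks and dimensions here are toy assumptions for illustration, not any production GAN:

```python
# Toy sketch of the GAN objective: the generator G learns to fit the ONE
# distribution that `real` batches are drawn from (e.g. face photos),
# which is why a trained GAN does not transfer to other content.
import torch
import torch.nn as nn

latent_dim, img_dim = 100, 64 * 64
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):  # real: (batch, img_dim) images from one dataset
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    fake = G(torch.randn(real.size(0), latent_dim))
    # Discriminator: tell real samples from generated ones.
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: try to fool the discriminator.
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```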
In contrast, the DALL·E (and later DALL·E 2) released by OpenAI uses a general-purpose design: a large language model that handles multiple language tasks, the CLIP model that bridges the text and image modalities, and a diffusion model that controls image generation. Together they can compose concepts and elements while preserving realism, generating far more complex scenes.
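DALL·E’s own weights are closed, but the open-source Stable Diffusion implements the same recipe the paragraph describes: a CLIP text encoder steering a diffusion model. A minimal sketch with the `diffusers` library follows; the checkpoint name is the public v1.5 release, and the prompt is an illustrative choice:

```python
# Minimal text-to-image sketch with open-source Stable Diffusion,
# the same text-encoder + diffusion recipe described above.
# Requires a GPU; checkpoint and prompt are illustrative choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt is encoded by CLIP's text encoder; its embeddings condition
# every denoising step of the diffusion model.
image = pipe("two flamingos outside a glass wall, casting long shadows").images[0]
image.save("flamingos.png")
```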
For example, AI can edit images from textual descriptions, accounting for shadows, reflections, and surface textures when adding or moving elements. If a user specifies that flamingos should appear at the location marked in image 3, the AI indeed generates two flamingos outside the glass, casting shadows:
When told to place a flamingo at the center of the pool (as in image 2), the AI instead generates a flamingo-shaped swim ring that fits the pool environment:
Text-image multimodal research can broadly be divided into three stages: (1) image captioning, letting the computer describe what is happening in a picture; (2) visual question answering, where, given an image and a question such as “what items are on the table?”, the machine must understand both the question and the image; (3) text-to-image generation, letting the machine paint from a sentence of description.
One important contribution of multimodal research is the data source: it provides paired text-image training data, the crucial raw material that helps AIGC models learn cognition.
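Concretely, paired data enables a contrastive objective: in each batch, an image’s embedding is pulled toward the embedding of its own caption and pushed away from every other caption. A minimal sketch of this CLIP-style loss follows; the function name and arguments are illustrative, not library code:

```python
# Sketch of the CLIP-style contrastive loss that paired text-image
# data makes possible. Row i of img_emb and txt_emb must come from
# the same pair; everything else in the batch serves as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor,
                     txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0))         # true pairs on the diagonal
    # Symmetric cross-entropy: image->text and text->image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```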
Representative applications of stages one and two include AI-generated film commentary on short video platforms and intelligent dialogue bots. In the third stage, the machine must understand human language, common sense, and the laws of the physical world, or it cannot perform human-directed cross-modal creation. Products like DALL·E, Midjourney, and Dream Thief have already demonstrated breakthroughs in understanding humans and the world.
Numerous experiments have shown that when models are large enough and training data abundant, AI gradually comprehends abstract concepts of human language (such as common sense and rules). During his doctoral studies under Tao Dacheng, Wang Chaoyue and the team repeatedly demonstrated through model-capacity analysis that larger models learn general knowledge better and generalize better.
This is a capability that previous generative models did not exhibit. This also determines that AIGC is not just about generation but is an application ecosystem built on cognition and understanding. When AI possesses basic cognition and understanding, and machines think and create like humans, it is no longer a mirage but a reality that is happening.
Commercialization: Erupting in Silence
Modern social activity is, in essence, built from digital content: voice, text, images, videos… and AIGC can supply the basic elements for creating that content.
In fact, AIGC (Artificial Intelligence Generated Content) has always existed, but it was only this year that it was enthusiastically embraced by domestic capital. One reason is the maturity of technology, and the other is that capital, which previously focused on visual AI commercialization, turned around and discovered that overseas NLP companies like Jasper.ai have started to generate significant profits.
Because of its strengths in creating digital content, AIGC has also been touted as a tool for building the future metaverse by enthusiasts riding the past year’s metaverse craze. Behind the hype, however, many practitioners believe AIGC can create the next generation of digital worlds faster than the metaverse can: a brand-new track that belongs to AIGC itself.
The reason is the essential difference between the technologies AIGC and the current metaverse rely on. Take graphics (the key technology for creating digital humans): graphics focuses on simulation and replication, while AIGC emphasizes originality and creation. A graphics-based digital human needs a real person as reference, but AIGC generates voice, text, and images from 0 to 1, without precedent.
To borrow an analogy from the film “The Unbeatable”: AIGC is like Zhang Jingchu, while graphics is like Aaron Kwok (in the film, Zhang Jingchu plays the original artist, and Aaron Kwok the forger who can only imitate her work).
Because every word AIGC writes and every pixel it paints is computed from scratch, the images and articles users create with AIGC models are one of a kind: absolutely original works.
Being digital content at its core, plus this unique originality, means the AIGC track has plenty of room: the former implies it can grow into standardized products like internet content platforms, while the latter means it can win market recognition on par with human creators.
Taking text generation as an example, NLP companies like Jasper.ai have incubated a new profession overseas, the “AI ghostwriter”:
A human user types a title and keywords into an AI text-generation platform, the AI produces a long article, and the person then edits it and sells it to companies that need large volumes of high-quality articles for search engine optimization, pocketing the difference between the sale price and the AI subscription fee.
The profit model for image generation is similar: for example, overseas, users subscribe to Midjourney’s membership, generate exquisite images with AI, and then sell the images to stock photo agencies like iStock, earning the difference.
Since Google Search favors original articles, and AI-written articles are entirely original rather than mere compilations of existing information, Google gives such articles more traffic and better search rankings.
This has enabled companies like Jasper.ai to grab market share quickly. Jasper.ai claims its revenue exceeded 40 million USD last year and is expected to double this year; as of September it had 70,000 paying users and a valuation of 1.5 billion USD, just 18 months after its founding.
Lan Zhenzhong told Lei Feng Network that AI-written articles are now highly readable. His team once ran a WeChat public account that used a Chinese large model to write horoscope predictions; it drew a considerable readership, with one reader even commenting, “Editor, you must be a Capricorn, you understand me so well.” Besides Dream Thief, their text generation tool “HeyFriday” has also gained several thousand paying users overseas in a short time.
The image generation field is growing just as fast. Within three months of launch, the overseas AI painting product Midjourney registered over 3 million users, and according to exclusive information from Lei Feng Network, Dream Thief has generated 10 million images in under two months since its launch.
Many industry insiders have stated: “Simply put, the core of the internet is traffic, and the core of traffic is content. The essence of AIGC is a technology for producing content.”
This also means that, compared with the previous generation of visual AI that must pair with terminal hardware, or with the metaverse’s vast worldview, AIGC’s commercialization is more concrete, with lower investment costs and a faster path to profit. A more radical view holds that AIGC can give rise to a “content generation platform” comparable to, or even surpassing, today’s traffic-backed internet content platforms (such as Xiaohongshu and Douyin).

Images generated by “Dream Thief” based on user descriptions
In the current context of high content demand, the transformation in content production methods brought by AIGC is also beginning to change content consumption patterns. The market that respects originality is starting to revere AIGC.
The latest reaction comes from stock photo companies:
At the end of October, the well-known overseas stock photo agency Shutterstock announced a collaboration with OpenAI, letting users type text and instantly generate original images that meet their needs. (Indeed, many industry insiders believe that in the AIGC wave, stock libraries and image editing software will be the first industries to be disrupted or replaced.)
This collaboration is not only a timely response from a traditional industry; it also signals that AIGC’s commercialization is starting to take shape: building a new content platform on top of generation.
Many people do not yet grasp what this means, but in some eyes, AIGC’s influence is shifting from serving individual users to serving entire industries. Today’s content platforms rely mainly on keyword search and recommendation; with AIGC, the content a user consumes will come from the AI’s understanding of that user. Recommended content draws on a finite library, while generated content is endless…
Even AIGC’s pioneers, the very people who created this track, are still astonished that machines can match or even surpass human speed and creativity.
ZMO.AI’s founder Zhang Shiying said: “For example, in today’s short video platforms, recommendations present limited content created by creators to you, while generation means every consumer of content is a creator. Consumer feedback on content can help AI better understand what you want, what you like, and AI will generate what you seek. This will be real-time updated and infinite.”
ZMO.AI is one of the earliest AIGC companies founded in China. Unlike products such as Stable Diffusion that excel at artistic image generation, ZMO.AI has chosen the track of realistic, real-world image generation, such as design. It focused on overseas markets first, and its product imgcreator.ai has grown rapidly to 320,000 monthly active users.
They believe that AIGC is not only a production tool for digital entertainment content but also has a significant positive impact on many actual industries. This track is large enough, and there are many things that researchers and entrepreneurs can do. (Stability.AI’s founder Emad Mostaque has also said similar things, believing that the AIGC track is larger than new energy.)
In images alone, today’s materials rely mainly on photo shoots, which are inefficient and expensive. On e-commerce platforms, for example, launching new clothing currently requires offline shoots involving makeup artists, stylists, photographers, and models. In the AIGC world they envision, AI will directly generate images of models showcasing the clothing. Their AIGC mini-program “YUAN” has already achieved striking results in image editing:
Compared to artistic style generation, generating real, photographic-style images is more challenging but has a tremendous impact on actual production and life. For example, in the design industry, from posters, PPTs, and web pages to all product packaging and illustrations that require high originality, there is a place for AIGC.
Even setting replacement aside, many designers already use AIGC products in the preliminary draft stage to streamline their process. Zhang Shiying shared an example from architectural design: they worked with an architect on a concert hall roughly 25 meters high.
Before AIGC, the architect would first sketch in pencil; once satisfied, move on to a colored-pencil version; then produce a 3D rendering for the client; and only after the client approved would the building’s internal engineering structure be designed. With AIGC, time is saved from the very first step: the architect’s ideas can be generated quickly with AI and sent straight to the client.
“When AI writes a few sentences or edits a few images, you might think it’s nothing. But if one day, AI starts designing buildings, you’ll have to rethink its value.”
It’s Just a Matter of Time
Liu Cixin expressed the truth of technological development through the words of the alien “Risk Manager” in his novel “The Wandering Earth”:
“The starting point for humanity to gain the ultimate secrets of the universe began with the first ape gazing at the starry sky.”
Just as humanity explores the universe, AI keeps exploring humanity. Today’s AI apes (AIGC) have glimpsed the vast starry sky. More and more researchers are joining the exploration of AIGC, and AIGC is edging closer to higher-level creative thinking. Conquest seems to be only a matter of time.
The past decade has been a decade of soaring for AI. In its whirlwind, interesting technologies have emerged one after another; some became new tracks (like recognition in the security industry), while others “died in the womb” on the way to commercialization.
Amidst the sands of time, people are both eager and cautious about AIGC.
For example, regarding whether AIGC can yield results in domestic commercial landing, some investors are worried.
Take text generation: AIGC’s commercialization depends heavily on user demand. But today’s domestic Chinese large models lack high-quality open-source corpus data, so the Chinese-language AI’s writing quality varies from topic to topic. At the same time, writers in China generally cost less than their counterparts in Europe and America, so the cost savings AIGC offers in content generation are markedly smaller than in overseas markets.
For AI to be accepted in scenarios where it competes with humans, the cost of its service must hold a clear advantage over labor costs; this is almost an unwritten law. Industrial quality inspection is a good reference: traditional factories put cost first. When an inspection worker’s average monthly salary is 6-7k RMB, an AI vision solution that can neither match that cost nor deliver high accuracy will struggle to persuade the industry.
Xincheng Technology admitted to Lei Feng Network that its text generation tool is currently priced at one-tenth of Jasper.ai’s, yet domestic user acceptance is still climbing, and Chinese large models still need continuous improvement (GPT-3, PaLM, and the like are all English large models).
However, more people believe that AIGC will change various aspects of modern production and life, because AIGC addresses existing, rather than hypothetical, problems. These problems are very specific, and in most scenarios, it can partially or completely replace heavy human labor, not only reducing costs and increasing efficiency but also lowering the creation threshold for content and inspiring people’s creativity and imagination.
Take painting: skills that once demanded over a decade of training can now be wielded by beginners, who use AI to create works of quality comparable to professionals’. This lays bare the essence of creation: thoughts and viewpoints, not methods and tools, have always been its soul.
Although many companies have recently attached the AIGC label to their products and positioning, according to Lei Feng Network’s understanding, there are still technical barriers in both text generation and image creation.
Moreover, the choice of algorithms and data also determines how each company will later perform in different scenarios. In commercialization, picking landing scenarios with high technical barriers and defensible moats has become an urgent task for AIGC practitioners.
AIGC entrepreneurs have told Lei Feng Network that they believe future AI technology may rewrite the outcome of the platform power struggle, and a fully generation-based content consumption platform is very likely to emerge. AIGC will become a key technology in the metaverse and Web 3.0, but before rushing into that future, they still have one hill after another to cross.
But at least, they already know the location of the hills.
In the next article, we will discuss the challenges and opportunities of AIGC entrepreneurship in the Chinese market. If you are an AIGC entrepreneur or are interested in AIGC, feel free to add WeChat (Fiona190913) for communication.
Reference Links:
- “Artificial Intelligence Generated Content (AIGC) White Paper (2022)”
- https://multimodal.art/news/1-week-of-stable-diffusion