Editor|Shang En
Recently, the AI video generation space has been heating up, with Pika stealing the spotlight by launching version 1.0 and announcing tens of millions of dollars in funding.
As an early leader in AI video generation and the developer of the phenomenal Gen-1 and Gen-2 products, Runway has now announced that it is forming a team to build General World Models (GWMs): AI systems that simulate the real world, distinct from large language models.
After Runway announced its plan to build GWMs, skepticism from netizens followed almost immediately.
Some argued:
This is just a multimodal large model that folds in video, audio, text, and images.
Others simply commented, “This is a nice video; Ruben (the dog in the video) is also very cute,” while ignoring the new model entirely.
△Image source: Twitter
What kind of world model does Runway want to create? Why choose to develop a world model at this time?
Using World Models to Simulate the World
For most users, the development speed of artificial intelligence in the past year has indeed exceeded our expectations and imagination. However, while we marvel at how large language models can engage in smooth conversations, the hallucination problem often causes these models to “speak nonsense” or “answer off-topic,” significantly diminishing the user experience.
This issue is not limited to large language models; it is also frequently seen in AI image generation and AI video generation, such as the classic six-finger problem in AI-generated images:
△Image source: Twitter
Even Runway’s own Gen-2 is not immune to this issue. In a newly released three-minute video, Runway tries to explain the root of the problem: existing large models lack a comprehensive understanding of the real world.
Take the most familiar example, the LLM (large language model): although it can generate poetry, articles, and even movie scripts, an LLM only understands the rules of language. When it encounters unfamiliar questions, it therefore tends to fabricate answers in all seriousness.
The underlying paradigm here is: bigger models + more data = more knowledge about the world. That paradigm also produces the widespread hallucination problem, which appears in AI video generation tools as well.
In fact, the General World Model concept proposed by Runway is aimed squarely at this problem. Runway defines a “world model” as an AI system that builds an internal representation of an environment and uses it to simulate future events within that environment.
In short, Runway hopes the new model can closely resemble our real world, simulating various situations and interactions.
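The core loop behind any world model can be sketched in a few lines: observe transitions in an environment, fit an internal dynamics model, then roll that model forward to simulate (“imagine”) futures without touching the environment again. The toy environment and function names below are illustrative assumptions for exposition only, not anything from Runway’s GWM:

```python
# Toy sketch of the world-model idea: learn an internal model of the
# environment's dynamics, then use it to simulate future events.
# Illustrative only -- not Runway's GWM architecture.

def collect_transitions(n=100):
    """Interact with a tiny deterministic environment where
    next_state = state + action, recording (s, a, s') tuples."""
    data, state = [], 0.0
    for i in range(n):
        action = 1.0 if i % 2 == 0 else -0.5
        next_state = state + action          # the "real world" dynamics
        data.append((state, action, next_state))
        state = next_state
    return data

def fit_world_model(data):
    """Fit the one-parameter model next_state = state + w * action
    by averaging the observed effect of each action."""
    return sum((ns - s) / a for s, a, ns in data) / len(data)

def rollout(w, state, actions):
    """Simulate a future inside the learned model, without ever
    touching the real environment -- the model 'imagines' outcomes."""
    trajectory = []
    for a in actions:
        state = state + w * a
        trajectory.append(state)
    return trajectory
```

In this one-dimensional toy, the learned parameter `w` recovers the true dynamics exactly, so imagined rollouts match reality; in Runway’s framing, the hard part is learning such a model for the full visual world rather than a single scalar.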
LeCun Supports, But Runway Wants to Do Something Different
The concept of a “world model” is not original to Runway. Turing Award winner Yann LeCun proposed this concept last year to describe his ideal AI that is closer to human-level intelligence.
He has publicly criticized GPT large models, arguing that autoregressive models generated based on probability cannot overcome the hallucination problem, even asserting that GPT models will not last five years.
LeCun hopes to build an internal model that learns how the world works. To that end, he and his team released the “human-like” AI model I-JEPA in June this year, which lets the model learn commonsense knowledge about the world the way humans do.
△Image source: Twitter
However, despite the applause and expectations when it was announced, LeCun’s world model has yet to find a clear path to real-world deployment six months on. This may be one reason for the public’s reserved attitude toward Runway’s plan.
So, what kind of world model does Runway want to create?
Regarding how to develop the new model, Runway revealed some ideas in the video, stating that GWM aims to establish a mental map that allows the model to better understand the “why” and “how” of the world.
Achieving this idea seems to face several challenges, which the Runway team has already recognized. They mentioned that the two current problems that need to be solved for GWM are:
1. The models need to generate consistent maps of the environment, and gain the ability to navigate and interact within those environments.
2. The models must capture not only the dynamics of the world but also the dynamics of its inhabitants, including building realistic models of human behavior.
△Image source: Twitter
Despite the lukewarm response from the outside world, Runway has clearly made up its mind to form a team and start recruiting. The company’s official website has opened a batch of new job postings, covering various fields including machine learning, application research, and data infrastructure.
△Image source: Runway official website
One More Thing
Looking back at the AI video generation space, the enthusiasm ignited by Pika 1.0 has only grown. Feedback from the earliest users granted access to Pika’s test shows polarized assessments of Pika 1.0’s actual performance and technical level.
Some users praise Pika 1.0 as the best AI video generation tool they have used, while some Discord users found that the results were not significantly different from other similar tools after testing.
Domestic giants are also entering the AI-generated animation space, and the competition between Alibaba and ByteDance has turned head-to-head: Alibaba recently launched an AI project called “Animate Anyone,” claiming it can create a video of anyone from just a single picture and a skeletal animation sequence. ByteDance quickly followed with “MagicAnimate” and open-sourced it. For now the exchange has paused with Alibaba swiftly releasing “DreaMoving” in response.
Interestingly, one motivation behind the founding of Pika Labs was that the two co-founders’ entry was eliminated from the first AI Film Festival held by Runway. In a recent interview, founder Chenlin Meng also noted that the video quality of Runway, Genmo, Imagen Video, and others is at a similar level, all exhibiting various “artifacts,” which suggests there is still plenty of room for technical innovation and breakthroughs in this field.
Chenlin Meng compared the current video generation technology to the “GPT-2 era,” suggesting that there are still many variables in the future competitive landscape. Whether GWM can help Runway overtake competitors remains to be seen.


AI Public Account under 36Kr


