
At the start of 2024, nothing in tech circles has been more exciting than the arrival of Sora.
Just as ChatGPT set off a wave of LLM startups in early 2023, the release of Sora has pushed video generation models to the forefront.
Tech giants are aggressively pushing products, while startups ride the wave.
On March 13, AI video model company Aishi Technology completed a Series A1 financing round of over 100 million RMB; on March 12, Shengshu Technology completed a Series A round worth hundreds of millions; on March 1, AI video generation SaaS provider “Bool Vector” raised nearly 10 million…
Sora was the first to put the DiT architecture into practice, merging diffusion models and large models, which had previously developed independently, and opening a new chapter in the history of video generation models.
Undoubtedly, a new technological storm is approaching. Overnight, large video generation models across China began vying for the label of “Chinese version of Sora.”
To find out who actually deserves that label, “Zixiang” tried out the video generation products currently available in China and combined this hands-on experience with public information and third-party test data to evaluate today's mainstream video generation models.
We look at three aspects, product design, hands-on test results, and industry analysis, to see who can become the “Chinese version of Sora.”
DiT's Innovation: Who Can Replicate It?
The wind of Sora has just blown from across the ocean to China, but video generation is not a new topic.
This field had already gone through several revolutionary waves, with Runway’s Gen-2, Pika 1.0, and Google’s VideoPoet, before finally arriving at the “Sora” moment: better generation quality, longer durations, stronger logic, and greater stability.
“Zixiang” has compiled an overview of the companies and products behind large video generation models.

▲Image: Overview of domestic and foreign companies with large video generation models; visit volume as of February 2024
Abroad, companies like Google and Microsoft have long invested in research on multimodal video generation. Last year, Google released the multimodal large model Gemini and the VideoPoet video model, whose results gave people a direct look at what multimodal video generation can do.
In China, the multimodal technology path shows even more variety: it includes not only large companies with deep technical accumulation such as Baidu, but also large model unicorns like Zhipu and startups like Shengshu Technology and Zhixiang Future that focus on multimodal large models.
The diffusion model route is the mainstream route for text-to-video and plays an important role in ensuring generation quality. Even the astonishing Sora is a change at the level of the underlying architecture rather than a complete overhaul.
Both in China and abroad, this path is the most crowded: first came Stability AI, the company behind the open-source Stable Diffusion model, followed closely by Runway and Pika, and then by giants like OpenAI, Meta, and Nvidia.
Back in China, Tencent, Alibaba, and ByteDance almost monopolized early research in video generation, occasionally releasing a demo that amazed audiences. When it comes to shipping products, however, startups are clearly a step ahead: companies like Aishi Technology, Morph Studio, and Youbrain Technology have already opened their products to users.
Known as the “Sora route,” DiT, short for Diffusion Transformer, essentially brings the methodology of training large models into the diffusion model. According to the results presented in the Sora technical report, scaling up this approach may yield results akin to a simulator of the physical world.
By now, Sora's underlying architecture has been thoroughly analyzed, and its training components and techniques are on the way to being open-sourced. This does not mean that a Sora for everyone is just around the corner, though: technology, data, computing power, and training scale are all hurdles.
Recently, a core team leader of Sora revealed in an interview: “Sora is still in the feedback collection stage and is not yet a product; it will not be open to the public in the short term.”
In terms of technical route, Aishi Technology is one of the few companies that has followed the DiT route from the beginning. Its founder, Wang Changhu, said in a public interview that Sora’s emergence validates the direction of Aishi’s large video generation model. Aishi Technology has therefore set a goal of “catching up with Sora in 3-6 months,” seizing the opportunity to close the gap.
Product Testing, User