
Picking up from the previous article, we continue testing the Tongyi Wanxiang 2.1 model, this time on how it presents large-scale motion.
[Test 5] Large-Scale Action Presentation
Below is the news media’s introduction to this capability. We ran similar tests to see how the new model performs in large-scale motion scenes, borrowing the media’s prompts and using the diving and balance beam events from a sports competition.
(The above image is taken from Beijing News)
Prompt: The video showcases the exciting moments of a young Asian female athlete in a women’s diving competition. She is dressed in professional gear, leaping from a high platform, completing complex flips in the air before elegantly entering the water. At the beginning of the video, the female athlete stands at the edge of the diving board, with a bustling audience in the background. She then gracefully leaps, spins in the air, executing precise and flawless movements, creating small splashes upon entering the water, showcasing her exceptional diving skills. The camera captures every technical move in detail, as well as the moment of contact between the athlete and the water, immersing the viewer in the competition atmosphere.
Prompt: The video showcases the tense moments of a balance beam competition at the sports meet, featuring a young Asian female athlete in professional gear, standing on the narrow balance beam, demonstrating extraordinary courage and skill. She first spreads her arms to maintain balance, then jumps, executing an impressive front flip, and lands steadily back on the beam. In the background, the blurred audience erupts in cheers, enhancing the competitive atmosphere. The camera meticulously captures every difficult move of the athlete, as well as the subtle vibrations of the balance beam during her jump, creating an extremely detailed scene that makes one feel as if they are at the competition, experiencing every breath and heartbeat of the athlete.
PS: Did the opening of the video give you the feeling of watching Channel 5?
Clearly, both large-motion videos have flaws; the 2.1 model did not generate content that accurately and plausibly matches the prompts. Even so, compared with Keling’s performance, it is a decisive win (PS: the same set of prompts was used on both platforms, which may not accurately reflect each AI platform’s capabilities, so please be discerning; the comparison results here are for reference only).
I had also tested the media’s prompts earlier, and none of the results met expectations. After several attempts some were slightly better, but none achieved what the prompts describe.
Prompt: A panoramic shot of a female figure skater performing on the ice. She is wearing a purple skating outfit and white skates, executing a spinning move. Her arms are outstretched, and her body is leaning back, showcasing her skill and elegance.
That said, the new model does show significant improvement in large-scale motion. The fluidity of the main character’s movements and the handling of the corresponding physical feedback are quite accurate. It is regrettable that the generated videos did not fully meet expectations, but the content is partially correct, which still makes it useful as material for editing.
Notably, the audience seating in the sports venue is rendered quite accurately, the audience’s movements are fairly plausible, and the perspective proportions are correct.

I also have to complain about Tongyi’s review mechanism: simply because the athlete’s outfit was a bit revealing, it frequently triggered the backend result check, which deleted the newly generated video, as shown below:

For more tests on the 2.1 model, please refer back to the previous article. Both articles will be published simultaneously.
(PS: The following content is the same as in the testing article (Part 1).)
Let’s briefly summarize and add a few notes, starting with the changes in settings:
(1) The APP and PC versions remain independent, and data does not synchronize;
(2) In the APP version, text-to-video no longer lets you choose the model version; only version 2.1 is available (judging by the generation time, it appears to be the 2.1 professional version);
(3) The video duration remains unchanged, 6 seconds for PC and 5 seconds for APP;
(4) Generating a video with the 2.1 ultra-fast version costs only 3 credits.
For specifics, please refer to the table below.
In terms of capability, as the media reported, the upgrade to 2.1 significantly improves video generation. Our comparative testing suggests it is a tremendous leap forward, consistent with the improvements we listed at the beginning of the article:
(1) Support for Chinese and English text generation: In testing, English text was generated correctly, while Chinese text still has shortcomings, and not every piece of text can be generated.
(2) Enhanced effects for large-scale complex movements: The ability to present large movements of the main character has significantly improved.
(3) Improved adherence to physical laws in video: The physical feedback of the main character’s movements is more realistic.
(4) Enhanced artistic expression: The visuals are more realistic and vivid, with richer details, and the camera movements are more natural, achieving a cinematic level of shot quality.
After completing the tests, especially the comparison with Keling, my personal impression is that Tongyi has indeed become very strong. Given that Tongyi is currently completely free, such impressive performance should attract more users in the future.
Alright, that’s it for today’s article. If it’s too long to read in one sitting, save it and go through it slowly when you have time. Remember to like, share, and revisit it, and feel free to follow us.
Statement: This article only describes the results observed when testing specific capabilities of the relevant AI creation platforms. Because of the author’s own limitations and differences in how each platform expects prompts to be written, the comparative tests mentioned do not represent the platforms’ actual capabilities. The content is for reference only; please let us know if you find any inaccuracies.
