Four Domestic Sora AI Video Generators Reviewed

Source: Quantum Bit | Public Account QbitAI

Folks, the domestic Sora contenders have been multiplying at an astonishing rate. July alone brought us:

KeLing, PixVerse V2, QingYing, Vidu...

Faced with so many AI video generation tools, you probably feel as overwhelmed as I do.

Which led to an idea:

Can we bring them together for a comparative review to see which domestic Sora excels?

Let’s get started. First, we will briefly introduce the contestants:

Contestant 1: KeLing, produced by the Kuaishou AI team, officially launched on the web on July 6.
Contestant 2: PixVerse V2, officially released by Aishi Technology on July 24.
Contestant 3: QingYing, developed by Zhipu AI, launched on web, app, and mini-program on July 26.
Contestant 4: Vidu, a startup from Tsinghua University, released on July 30.

This battle covers two major categories, image-to-video and text-to-video, with tests that include generating landscapes, animals, characters, and even meme images.

For each contestant, we assess not only the final quality of the generated video but also its visual consistency and how closely it follows the given prompt.

Next, we present the hands-on tests conducted by Quantum Bit.

Round 1: Image-to-Video

For the first round, we will test each contestant’s image-to-video capabilities.

As per usual, we start with just one image and a prompt, without any additional operations, to ensure authenticity~

Classic Meme Image

Speaking of memes, we must mention “Zhen Huan Zhuan”~

We first “feed” each contestant the following image:

Then we provide a simple prompt:

Zhen Huan slowly put on her sunglasses.

Looking at the key detail, the sunglasses themselves, only KeLing generated them as a complete, solid object.

QingYing and Vidu generated transparent glasses, which are not what we usually think of as sunglasses.

As for PixVerse V2... it generated no sunglasses at all.

In terms of naturalness, KeLing’s Zhen Huan lowers her head to put on the sunglasses, which is the most logical and natural motion of the four.

In QingYing and Vidu, Zhen Huan’s sunglasses were positioned a bit too high; additionally, QingYing generated a hand with six fingers.

Thus, in the meme image contest, KeLing clearly wins!

Imaginative Sci-Fi Blockbuster

Next, let’s test the AI’s imagination.

The procedure is the same. We first “feed” each contestant an end-of-the-world scene image:

This time the prompt involves a slightly more complex “plot”, for example:

The last human spaceship escapes, and through the porthole, a rainbow-colored balloon floats by in slow motion.

Let’s see the generation results from the four contestants:

This round’s results are quite clear.

The one most aligned with the prompt is PixVerse V2, which not only continued the explosive scene but also featured a rainbow balloon floating by.

Next is KeLing, but its balloon appeared abruptly and oddly, and it was a single shade of purple rather than rainbow-colored.

QingYing inexplicably generated a rainbow cloud, without producing a balloon.

Vidu fared worst: it generated no balloon at all, though a faint rainbow tint seemed to show in the distant explosion clouds.

This round, PixVerse V2 wins!

Bringing Old Photos to Life

The last test for image-to-video involves reviving old photos:

The prompt is as follows:

A child turns around while clapping.

Let’s check the results:

Hmm, collective failure.

Some contestants never produced the clapping at all, and where a child did clap, the hands were mostly distorted.

Comparatively, though, Vidu’s result was slightly stronger. It managed to complete the key “clapping” action (though some frames still had errors) and even added a rain effect, creating a lively scene of “children playing in the rain”~

It seems that AI has a tough time handling hands.

In this round, Vidu performed relatively better!

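To summarize Round 1: KeLing won the meme test, PixVerse V2 won the sci-fi test, and Vidu did relatively best on the old photo.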

Round 2: Text-to-Video

After image-to-video, we move on to the second major category: text-to-video.

In this segment, we have also set multiple sub-projects to test the capabilities of each AI contestant from various dimensions.

Same Prompt as Sora

First, we will compare the results using the same prompt presented on the official site of the text-to-video “ancestor” Sora.

For instance, the classic “Sora Girl”:

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

None of the results matched Sora’s realism, but each has its own stylistic strengths.

For example, KeLing and Vidu’s characters appeared more natural in their walking postures; PixVerse V2 had better consistency in facial features; while QingYing showcased a richer color palette overall.

One-Shot

Next, to test the AI contestants’ ability to generate consistent visuals, we have a complex prompt for a one-shot video:

Generate a video themed “Morning in the City Park”. Please use one-shot filming techniques, starting from the park entrance, slowly moving the camera to capture the scenes in the morning sunlight. The camera should move smoothly, sequentially showcasing the following elements: 1. The park entrance sign, sunlight filtering through the leaves creating dappled light and shadow. 2. A crowd of morning joggers, their faces radiating vitality and energy. 3. The children’s play area, where kids are playing on swings and slides, laughter fills the scene. 4. Finally, the camera returns to the exit on the other side of the park, ending the video.

From the results, all contestants performed well in generating the large scene, bringing the park and trees to life.

But!

All contestants made errors with the people in the scene: distorted figures, missing figures, and other bizarre glitches...

Close-Up

After consistency, we test close-up shots to see whether the AIs can handle them:

Animated scene showcasing a pink fluffy monster holding a large piece of cheese, eating it in a 3D style, focusing on the details of the scene, the monster’s expression full of joy, displaying a mischievous and innocent demeanor. Warm colors and ambient lighting.

In this round of testing, aside from the distortions from PixVerse V2, the other three contestants performed excellently in both consistency and richness of visuals.

Multiple Subjects

Finally, we will test whether the contestants can simultaneously handle multiple subjects in one video, for example:

In winter, a family of three, dad, mom, and daughter, sit on the sofa, with a cat sleeping beside them, a fireplace providing warmth, creating a cozy scene.

In terms of style, KeLing alone rendered the “cozy” scene in black and white, while the other three contestants stayed closer to the prompt.

Vidu was the only one to generate a Chinese family.

However, no contestant fully generated all four subjects in the prompt (three humans and a cat). Every result was missing at least one of them.

Similarly, we will summarize again:

This concludes all evaluation content for AI video generation.

Now, the next question is:

Which Domestic AI Video Generator Is the Best?

Beyond the output quality covered above, there is another dimension to compare: generation speed.

We have recorded the generation times for each contestant and each case for both image-to-video and text-to-video categories:

Surprisingly, the newcomer Vidu took less than a minute in every case, making it the only contestant in the “generated in seconds” club.

Among the remaining three, QingYing is faster than the other two. Note also that QingYing’s videos are 6 seconds long, while those of the other two are 5 seconds.

As for KeLing and PixVerse V2, overall, PixVerse V2 is faster.

To summarize, the overall generation speed ranking is as follows:

Vidu > QingYing > PixVerse V2 > KeLing

However, in terms of functionality, there are some details worth mentioning.

For instance, PixVerse V2 allows free video length extension to 8 seconds; KeLing can extend to 10 seconds but does not support high-performance mode; QingYing and Vidu have fixed durations.

As for limits on the number of generations, QingYing is quite generous: it has none!

The other three contestants all use a credit system:

KeLing: 66 free inspiration points for logging in each day; a single 5-second video costs 10 inspiration points.
PixVerse V2: 100 credits upon registration, plus 50 more credits each day; a single 5-second video costs 15 credits.
Vidu: 80 points upon registration; more points require a subscription.
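
To put those quotas in perspective, here is a rough back-of-the-envelope sketch of how many 5-second videos the free credits cover. It uses only the figures listed above and assumes that unused credits do not roll over from day to day, which the platforms may or may not do. Vidu is left out because its per-video cost is not given here, and QingYing because it has no limit.

```python
# Free-tier arithmetic using only the figures quoted in the list above.
# Assumption (not confirmed by any platform): unused credits do not roll over.

def videos_affordable(credits: int, cost_per_video: int) -> int:
    """Number of whole 5-second videos that a credit balance covers."""
    return credits // cost_per_video

# KeLing: 66 inspiration points per daily login, 10 points per video.
keling_per_day = videos_affordable(66, 10)

# PixVerse V2: 100 credits at registration, 50 more each day, 15 per video.
pixverse_from_signup = videos_affordable(100, 15)
pixverse_per_day = videos_affordable(50, 15)

print(f"KeLing: {keling_per_day} videos per day")
print(f"PixVerse V2: {pixverse_from_signup} videos from the sign-up credits, "
      f"then {pixverse_per_day} per day")
```

Under that assumption, KeLing’s daily allowance covers 6 videos, while PixVerse V2 covers 6 videos from the sign-up credits and 3 per day after that.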

……

Finally, weighing generation quality, generation speed, and functionality together, it is currently impossible to say which domestic Sora is the strongest. We can only say that each has its strengths.

If you want to try them out and judge for yourself, the links for the four contestants are below. Feel free to come back and discuss after testing~

KeLing: https://klingai.kuaishou.com/text-to-video/new

PixVerse V2: https://app.pixverse.ai/home

QingYing: https://chatglm.cn/video

Vidu: https://www.vidu.studio/