How AI Tool Sora Generates Videos from Text

On February 16, 2024, OpenAI announced on X (formerly Twitter) the launch of its new text-to-video model, Sora.

This model can generate videos up to 60 seconds long, during which it can switch camera angles and even cut to close-ups. Below are translations of two video prompts, along with the “works” that Sora generated from the original English prompts.

A fashionable woman walks down the neon-lit streets of Tokyo. She wears a black leather jacket, a red long skirt, and black boots, carrying a black handbag. She wears sunglasses and red lipstick. She walks with both confidence and casualness. The streets are wet, and the water on the ground reflects the colorful lights like a mirror, with many pedestrians coming and going.

Video source: OpenAI official website

A 3D animation shows a small, round, furry creature exploring a vibrant, magical forest. This creature is a mix of a rabbit and a squirrel, with soft blue fur and a fluffy striped tail. It hops along a sparkling stream, its eyes filled with curiosity. The forest is full of magical elements: glowing flowers that change color, trees with purple and silver leaves, and floating lights resembling fireflies. The creature eventually stops to play with a group of fairies dancing around a mushroom. It looks up in awe at a giant glowing tree that seems to be the heart of the forest.

Video source: OpenAI official website

At first glance, these videos could pass for the work of a professional film crew or an animation studio. Many comments in the OpenAI community express concern that Sora might take jobs away from animators.

The image was machine-translated and taken from: community.openai.com

Some people also worry that this technology could be used to fabricate videos, and that such fabricated videos might even be presented as evidence in court.

The image was machine-translated and taken from: X

So how does Sora generate such videos? Is it really omnipotent and will it take away human jobs?

How Does Sora Generate Videos?

Since the second half of 2022, applications like Midjourney and Stable Diffusion have been able to generate images from text prompts. In September 2023, the combination of GPT-4 and DALL·E 3 also made it possible to generate and modify images conversationally.

AI-generated videos are no longer a novelty either. Before the release of Sora, there were already several video-generating AIs, such as Pika, Stable Video, and RunwayML. Compared with Sora, however, these models produce shorter videos and are much weaker at camera movement and scene transitions.

Video source: Gabor Cselle’s post on X

So how does Sora generate videos?

OpenAI released a technical report on Sora, which states that “Sora is a diffusion model”.

Sora is a diffusion model, image source: OpenAI official website

Diffusion models are complex, and we won’t go into specific details, but we can understand the general idea through a simple example.

Suppose we have a photo of a dog. We can gradually add noise to this photo so that it becomes increasingly blurry, until it eventually turns into a mass of chaotic noise.

Adding noise and removing noise, image source: Reference [3]

If we reverse this process, we can start from a mass of chaotic noise and gradually remove the noise to recover the target image. The key to diffusion models is learning to reverse the noise-adding process, that is, learning how to remove the noise step by step.
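The adding and removing of noise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a standard DDPM-style linear noise schedule (the article does not say which schedule Sora uses), a random 8×8 array stands in for the dog photo, and the function names `make_alpha_bars`, `add_noise`, and `estimate_clean` are hypothetical. In a real diffusion model a trained neural network predicts the noise; here we "cheat" and pass in the true noise to show that a perfect prediction would recover the original image.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: jump directly from the clean image x0 to noise level t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def estimate_clean(x_t, t, alpha_bars, predicted_noise):
    """Reverse direction: given a noise prediction, solve for the clean image."""
    return (x_t - np.sqrt(1.0 - alpha_bars[t]) * predicted_noise) / np.sqrt(alpha_bars[t])

rng = np.random.default_rng(0)
image = rng.random((8, 8))        # stand-in for the dog photo (assumption)
alpha_bars = make_alpha_bars()

noisy, true_noise = add_noise(image, 500, alpha_bars, rng)

# A trained model would supply predicted_noise; we pass the true noise
# to show what a perfect prediction would achieve.
recovered = estimate_clean(noisy, 500, alpha_bars, true_noise)
print(np.allclose(recovered, image))
```

The later the step `t`, the smaller `alpha_bars[t]` becomes and the less of the original image survives in `x_t`, which is why the final steps look like pure noise.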

Of course, diffusion models can be used to generate videos as well as images. Sora’s technical report explains that OpenAI compresses videos into a lower-dimensional representation and then splits that representation into small “spacetime patches”, so that video data can be used directly to train the model. This is what allows Sora to generate videos from prompts.

Sora processes video data, image source: OpenAI official website
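OpenAI’s technical report describes turning videos into “spacetime patches” before training. The sketch below shows the general idea of cutting a video into such patches with NumPy. It is a simplified illustration, not Sora’s actual pipeline: the report says the patches are taken from a compressed latent representation, whereas this example cuts up raw pixel values directly, and the function name `to_spacetime_patches` and the patch sizes are hypothetical choices.

```python
import numpy as np

def to_spacetime_patches(video, pt=2, ph=4, pw=4):
    """Split a video of shape (frames, height, width, channels) into
    flattened patches that each span pt frames and a ph x pw pixel area."""
    t, h, w, c = video.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch sizes")
    # Separate each axis into (number of patches, size within a patch).
    x = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # Bring the three patch-index axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch, with the patch contents flattened into a vector.
    return x.reshape(-1, pt * ph * pw * c)

# A tiny synthetic "video": 8 frames of 16x16 RGB pixels.
video = np.arange(8 * 16 * 16 * 3, dtype=np.float32).reshape(8, 16, 16, 3)
patches = to_spacetime_patches(video)
print(patches.shape)  # (64, 96)
```

Each row of `patches` is then a fixed-size unit that a model can process in sequence, much as a language model processes tokens.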

Sora’s Powerful Video Creation Capabilities

According to OpenAI, Sora “inherits” OpenAI’s understanding of text. It can generate high-quality images and videos from prompts, and it can extend a video either forward or backward in time. For example, it can take the same opening and continue it toward different endings, or it can start from different openings that all converge on the same ending.

These three video beginnings will eventually lead to the same ending, image taken from: OpenAI official website

Additionally, Sora can not only generate videos from text but can also take existing images or videos as input and edit or adjust them.

For instance, it can make a car driving on a regular road appear more “cyberpunk”.

Image taken from: OpenAI official website

Moreover, Sora has shown some previously unimagined abilities: it can follow objects as the camera moves, and it keeps the surrounding scene plausible and complete when the camera angle changes.

Video taken from: OpenAI official website

The “Powerful Sora” Still Has Some Flaws

Although Sora has demonstrated powerful capabilities, it is still not perfect at this stage.

Sora does not produce a satisfactory video every time. Will Douglas Heaven, a writer for MIT Technology Review, stated: “The videos released by Sora are already the cream of the crop selected from a large number of results.” However, even these hand-picked examples are not perfect.

Sora’s technical report also acknowledges that the generated videos have flaws. For example, in the video clip of “archaeologists digging up a plastic chair”, the plastic chair clearly does not obey the laws of physics.

Likewise, the way a glass cup breaks is not very “scientific”: the liquid in the cup flows out before the cup breaks.

Thus, Sora still has many areas to improve. However, there is no doubt that the capabilities demonstrated by Sora indicate that this is a very promising avenue.

Is Sora Safe? Will It Replace Humans?

Recently, the videos generated by Sora have been making waves on many people’s social media. People not only marvel at Sora’s capabilities but also express concerns, focusing on two main areas.

The first concern is this: Sora’s ability to generate videos is so powerful that it would be alarming if the technology were used to fabricate footage. How will we know whether the videos we see in the future are real or fake?

The other concern primarily comes from video industry professionals; if models like Sora become widespread, will video industry workers lose their jobs?

First, let’s talk about safety. OpenAI has in fact considered the potential safety issues that Sora might bring. Currently, Sora is open to only a few people, and it will not be made available to the public until it is ensured that it won’t be used for malicious purposes.

So, will Sora replace human video workers?

The emergence of Sora may well threaten some creators of animated materials.

For instance, in January 2024, The Hollywood Reporter reported on a survey of 300 entertainment industry leaders, in which three-quarters of respondents believed AI would reduce the number of jobs, with about 200,000 jobs affected over the next three years. Sora’s excellent performance could exacerbate this impact.

However, from another perspective, every new technology brings new opportunities alongside threats.

Video-generating AIs, including Sora, are just tools; the creative ideas behind a video still need to come from humans. Sora may help people produce videos more efficiently while also giving every ordinary person the chance to make creative videos of their own.

References

[1] https://openai.com/research/video-generation-models-as-world-simulators

[2] https://openai.com/Sora

[3] https://scholar.harvard.edu/binxuw/classes/machine-learning-scratch/materials/foundation-diffusion-generative-models

[4] https://www.hollywoodreporter.com/business/business-news/ai-hollywood-workers-job-cuts-1235811009/

Production Planning

This article is a work of the Science Popularization China – Starry Sky Project

Produced by | Science Popularization Department of China Association for Science and Technology

Supervised by | China Science and Technology Publishing House Co., Ltd., Beijing Zhongke Xinghe Cultural Media Co., Ltd.

Author | Xiaowei, Science Popularization Creator

Reviewed by | Qin Zengchang, Associate Professor, School of Automation Science and Electrical Engineering, Beihang University

Planned by | Xu Lai

Edited by | He Tong

The cover image and images in this article are from copyright stock images.