How AI Tool Sora Generates Videos from Text

On February 16, 2024, OpenAI announced Sora, its new text-to-video model, on X (formerly Twitter).
Sora can generate videos up to 60 seconds long, switching camera angles on its own and even cutting to close-ups within a single clip. Below are two of the prompts (translated) and the videos Sora generated directly from the original English prompts.

A fashionable lady walks down the neon-lit streets of Tokyo, wearing a black leather jacket, a red long skirt, and black boots, carrying a black handbag. She wears sunglasses and has red lipstick on. She walks confidently and casually. The street is wet, and the water on the ground reflects the colorful lights like a mirror, with many pedestrians coming and going.

Video source: OpenAI official website

A 3D animation shows a small, round, fluffy creature exploring a vibrant, magical forest. This creature is a mix between a rabbit and a squirrel, with soft blue fur and a fluffy striped tail. It hops along a sparkling stream, its eyes filled with curiosity. The forest is filled with magical elements: glowing flowers that change colors, trees with purple and silver leaves, and floating lights resembling fireflies. The creature eventually stops to play with a group of fairies dancing around a mushroom. The creature looks up in awe at a giant glowing tree that seems to be the heart of the forest.


Video source: OpenAI official website

At first glance, these clips could pass for short films made by a professional film crew or animation studio. In the OpenAI community forum, many users have voiced the fear that Sora could take jobs away from animators.
Image source: community.openai.com (machine-translated)
Others worry that the technology could be used to fake videos, or even to fabricate evidence in court.
Image source: X (machine-translated)
So how does Sora generate such videos? Is it really omnipotent, and will it take away human jobs?

How Does Sora Generate Videos?

Since the second half of 2022, applications such as Midjourney and Stable Diffusion have been able to generate images from text prompts. In September 2023, OpenAI combined GPT-4 with DALL-E 3, letting users generate and modify images through a conversational interface.
AI-generated video is no longer a novelty either. Before Sora was released, there were already video generation tools such as Pika, Stable Video, and RunwayML. Compared with Sora, however, these models produce much shorter videos and are far weaker at camera movement and scene transitions.
Video source: Gabor Cselle’s post on X
So how does Sora generate videos?
OpenAI's technical report on Sora [1] states plainly: “Sora is a diffusion model.”
Sora is a diffusion model, image source: OpenAI official website
Diffusion models are complex, so we won't go into the details here; a simple example is enough to convey the basic idea.
Suppose we have a photo of a dog. We can add noise to it a little at a time, so that the picture grows noisier and harder to make out, until nothing is left but a mass of random noise.
Adding noise and removing noise, image source: Reference [3]
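For readers who want to see the noise-adding step concretely, here is a minimal Python sketch of the forward process in the standard DDPM formulation. It assumes a linear noise schedule from 1e-4 to 0.02 over 1,000 steps, a common textbook choice that the Sora report does not specify, and it treats a toy list of eight pixel values as the "photo of a dog". The helper names (linear_beta_schedule, cumulative_alphas, add_noise) are made up for this illustration. The formula it implements is x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t shrinks toward zero as t grows, so the original picture fades until almost only the noise ε remains.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, spaced linearly (an assumed, DDPM-style default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s <= t: how much signal survives."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # fresh Gaussian noise, one value per pixel
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return x_t, eps


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(1000)
    alpha_bars = cumulative_alphas(betas)
    # Hypothetical toy "photo of a dog": 8 pixel values scaled to [-1, 1].
    dog = [0.9, 0.8, -0.2, -0.5, 0.1, 0.7, -0.9, 0.3]
    for t in (0, 100, 500, 999):
        x_t, _ = add_noise(dog, t, alpha_bars, rng)
        print(f"t={t:4d}  signal kept={math.sqrt(alpha_bars[t]):.3f}", [round(v, 2) for v in x_t])
```

Running it shows the "signal kept" factor starting near 1.0 at the first step and falling below 0.01 by the last one, which is the "mass of random noise" stage described above.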
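If we run this process in reverse, we can start from random noise and remove it step by step until the target image emerges. The key to a diffusion model is learning this reverse process: at each step, the model estimates which part of the current image is noise, so that the noise can be taken away.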
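The reverse direction can also be sketched in a few lines. In DDPM-style models, a network is trained to predict the noise ε contained in a noisy image x_t; each reverse step subtracts a scaled version of that prediction and then adds back a little fresh noise. Training a real network is beyond a short example, so the sketch below swaps in a hypothetical "oracle" predictor (oracle_noise_predictor) that already knows the target image. This is deliberately a cheat, meant only to show the mechanics of the denoising loop; it uses the same assumed linear schedule as before, and nothing in it reflects Sora's actual architecture or settings.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear betas, plus alphas and cumulative alpha_bars (DDPM-style)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_noise_predictor(target):
    """Hypothetical stand-in for a trained network.

    It returns the noise that exactly explains x_t given the known target image.
    A real diffusion model has to *learn* this prediction from data.
    """
    def predict(x_t, t, alpha_bars):
        a_bar = alpha_bars[t]
        return [(x - math.sqrt(a_bar) * x0) / math.sqrt(1.0 - a_bar) for x, x0 in zip(x_t, target)]
    return predict


def sample(predict_noise, size, num_steps=1000, seed=0):
    """DDPM-style reverse loop: start from pure noise and remove a little noise per step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # x_T: pure random noise
    for t in reversed(range(num_steps)):
        eps_hat = predict_noise(x, t, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps_hat)]
        if t > 0:
            # Re-inject a small amount of fresh noise (variance beta_t) on every step but the last.
            x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    dog = [0.9, 0.8, -0.2, -0.5, 0.1, 0.7, -0.9, 0.3]
    restored = sample(oracle_noise_predictor(dog), size=len(dog))
    print([round(v, 3) for v in restored])
```

With the oracle, the loop starts from pure random noise and ends on the target pixel values. Replacing the oracle with a learned network, trained to minimize the squared error between its predicted noise and the noise that was actually added, is what turns this loop into a genuine generator that can produce images it has never seen.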
Diffusion models can generate not only images but also videos. Sora's technical report explains that OpenAI transformed video data into a form that can be used to train the model directly: videos are first compressed into a lower-dimensional latent space and then cut into “spacetime patches,” which play the role that text tokens play in a language model. This is what allows Sora to generate videos directly from prompts.
Sora transforms video data, image source: OpenAI official website
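The patch-cutting step can be illustrated with a small Python sketch. It covers only that step and works on raw pixel values instead of the learned latent representation the report describes, which is a simplification. The function name to_spacetime_patches and the patch size (2 frames by 2×3 pixels) are hypothetical; OpenAI has not published the values Sora uses.

```python
def to_spacetime_patches(video, pt, ph, pw):
    """Cut a video (frames x height x width, as nested lists) into flattened spacetime patches.

    Each patch covers pt consecutive frames and a ph x pw pixel window,
    and is flattened into a single vector, i.e. one "token".
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):          # step through time
        for y0 in range(0, height, ph):      # then rows
            for x0 in range(0, width, pw):   # then columns
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


if __name__ == "__main__":
    # Hypothetical tiny grayscale clip: 4 frames of 4x6 pixels, value = frame*100 + row*10 + col.
    clip = [[[t * 100 + y * 10 + x for x in range(6)] for y in range(4)] for t in range(4)]
    tokens = to_spacetime_patches(clip, pt=2, ph=2, pw=3)
    print(len(tokens), "tokens of length", len(tokens[0]))  # prints: 8 tokens of length 12
    print(tokens[0])  # prints: [0, 1, 2, 10, 11, 12, 100, 101, 102, 110, 111, 112]
```

Each token mixes values from neighboring frames and pixels, so a sequence of tokens describes both what appears in the video and how it changes over time. Because the function only requires each dimension to divide evenly by the patch size, the same code handles clips of different lengths and resolutions, which echoes the report's point that patches let Sora train on videos of varying durations, resolutions, and aspect ratios.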

Sora’s Powerful Video Creation Capabilities

According to OpenAI, Sora builds on the company's earlier work on language understanding: it can generate high-quality images and videos from prompts and can extend a video forward or backward in time. For example, starting from the same opening it can produce different endings, or it can generate several different openings that all converge on the same ending.
These three videos begin differently but all lead to the same ending. Image source: OpenAI official website
Besides generating videos from text, Sora can take images or videos as input and edit or adjust them directly.
For example, it can make a car driving on a regular road look more “cyberpunk”.
Image source: OpenAI official website
Sora has also shown capabilities few people expected: it can keep following a subject while the camera moves, and the surrounding scene stays coherent and complete even as the viewing angle changes.
Video sourced from: OpenAI official website

The “Powerful Sora” Still Has Some Flaws

Although Sora has shown powerful capabilities, it is not yet perfect at this stage.
Sora does not produce a satisfactory video every time. Will Douglas Heaven, a writer for MIT Technology Review, noted: “The videos released by Sora are already the cream of the crop selected from a vast amount of output.” Yet even these hand-picked clips are not flawless.
Sora's technical report also acknowledges flaws in the videos it currently generates. In the clip of “archaeologists digging up a plastic chair,” for example, the chair clearly does not obey the laws of physics.
Likewise, the way a glass breaks is not very “scientific”: the liquid spills out of the glass before the glass has actually shattered.
Sora, then, still has plenty of room for improvement. Even so, what it has demonstrated so far leaves little doubt that this is a very promising path.

Is Sora Safe? Will It Replace Humans?

Sora's videos have recently gone viral on social media. While marveling at its capabilities, people have also raised concerns centered on two issues.
The first concern is: Sora’s ability to generate videos is so impressive that if such technology is used for forgery, wouldn’t that be terrifying? How will we know if the videos we see in the future are real or fake?
The second concern comes mainly from professionals in the video industry: if models like Sora become widespread, will they lose their jobs?
First, safety. OpenAI has in fact considered the safety risks Sora may pose. For now, Sora is available only to a limited group of people, and OpenAI says it will take further safety measures against misuse before opening it to the public.
So will Sora replace human video workers?
Sora's emergence may well threaten some creators of animation assets.
In January 2024, for example, The Hollywood Reporter reported on a survey of 300 entertainment industry leaders [4]: three-quarters of respondents indicated that AI would reduce the number of jobs, and about 200,000 jobs are likely to be affected over the next three years. Sora's strong performance could intensify this impact.
Seen from another angle, however, every new technology brings new opportunities along with its threats.
Video generation AIs, Sora included, are merely tools; the creative ideas behind a video still have to come from people. Sora may help people produce videos more efficiently, and it may also give ordinary people the chance to bring their own creative ideas to the screen.

References

[1] https://openai.com/research/video-generation-models-as-world-simulators

[2] https://openai.com/Sora

[3] https://scholar.harvard.edu/binxuw/classes/machine-learning-scratch/materials/foundation-diffusion-generative-models

[4] https://www.hollywoodreporter.com/business/business-news/ai-hollywood-workers-job-cuts-1235811009/

Produced by

This article is a product of the Science Popularization China – Starry Sky Project

Produced by | Science Popularization Department of China Association for Science and Technology

Supervised by | China Science and Technology Publishing House Co., Ltd., Beijing Zhongke Xinghe Cultural Media Co., Ltd.

Author | Xiao Wei, Science Popularization Creator

Reviewed by | Qin Zengchang, Associate Professor, School of Automation Science and Electrical Engineering, Beihang University

Planned by | Xu Lai

Edited by | He Tong
