Intelligent video lip-sync models and tools achieve innovative upgrades
The generative AI integration platform LTX Studio enables intelligent film previsualization
The AI tool LAVE uses large language models (LLMs) for intelligent video editing
【Highlight】
In 2021, the production team of the globally popular Hollywood film “Free Guy” decided to change an actor’s dialogue after filming had wrapped. Using deep neural network rendering combined with traditional visual effects techniques, the creative team modified the character’s facial movements to match the new dialogue in just 5 days. Today, with generative audio and video AI large models launching both in China and abroad, a matching video can be generated from a static image and an audio clip in just a few seconds.
Applying existing AI models and tools to film production effectively, sensibly, and practically is key to raising the industry’s level of intelligence. The current global trend is to build integrated application platforms on open-source models and production tools, rely on AI agents to complete film production tasks, and explore vertical applications of existing language, vision, audio, and other large AI models in previsualization, editing, visual effects, sound production, and other stages of the pipeline, seeking points of convergence between AI technology and the needs of filmmaking and thereby forming a new productive force for the film industry.
01
Intelligent video lip-sync models and tools achieve innovative upgrades
Alibaba recently launched EMO (Emote Portrait Alive), a video generation framework that can intelligently generate a video of a character speaking or singing from a single static image and an audio clip. It captures the full range of the character’s head motion and produces natural, audio-matched transitions in expression, facial detail, and pose. The length of the video follows the length of the audio, and the character’s appearance remains consistent throughout; the longest published example so far is about 1 minute 30 seconds.
▲A character image generated by Sora used as the input to create a video
EMO adopts a UNet architecture similar to that of the Stable Diffusion image generation model, with training divided into three stages: image pre-training, video training, and speed-layer training. In the image pre-training stage, the network is trained on single frames; in the video training stage, a temporal module and audio layers are introduced to process consecutive frames; speed-layer training focuses on controlling the speed and frequency of the character’s head movements. The training data comes from widely available talking-head videos, cropped and resized to a resolution of 512×512.
▲Character: Audrey Kathleen Hepburn-Ruston, voice source: interview clips
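EMO’s code and weights have not been released. Purely as an illustration of the three-stage schedule described above, the sketch below freezes and unfreezes different parts of a stand-in UNet at each stage; the module names (backbone, audio_layer, temporal_module, speed_layer) and shapes are placeholders, not Alibaba’s actual implementation.

```python
# Minimal sketch of a three-stage training schedule like the one described
# above. All modules here are tiny placeholders, not EMO's real architecture.
import torch
import torch.nn as nn


class DenoisingUNet(nn.Module):
    """Stand-in for the Stable-Diffusion-style UNet backbone."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.backbone = nn.Conv2d(channels, channels, 3, padding=1)   # stage 1: image pre-training
        self.audio_layer = nn.Conv2d(channels, channels, 1)           # stage 2: audio conditioning
        self.temporal_module = nn.Conv2d(channels, channels, 1)       # stage 2: consecutive frames
        self.speed_layer = nn.Conv2d(channels, channels, 1)           # stage 3: head-motion speed

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        x = self.backbone(latents)
        x = x + self.audio_layer(latents) + self.temporal_module(latents)
        return x + self.speed_layer(latents)


def set_trainable(model: nn.Module, stage: str) -> None:
    """Freeze or unfreeze parameters according to the current training stage."""
    prefixes = {
        "image": ("backbone",),                       # single-frame pre-training
        "video": ("audio_layer", "temporal_module"),  # temporal module + audio layers
        "speed": ("speed_layer",),                    # speed/frequency of head movement
    }[stage]
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)


if __name__ == "__main__":
    model = DenoisingUNet()
    latents = torch.randn(1, 4, 64, 64)  # e.g. latents of a 512x512 training crop
    for stage in ("image", "video", "speed"):
        set_trainable(model, stage)
        out = model(latents)
        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        print(f"stage={stage}: output {tuple(out.shape)}, trainable params: {trainable}")
```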
Meanwhile, the AI video generation tool Pika has launched a lip-sync feature that synchronizes mouth animation with audio in generated videos, with audio technology supplied by the AI voice cloning company ElevenLabs. Users can either type text and have the audio generated or upload their own audio, deciding both what the on-screen character says and what the voice sounds like.
For now, the feature can only generate lip-synced clips of up to 3 seconds in length and only animates the mouth.
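Pika exposes this feature through its web interface rather than a public API. As a rough sketch of the two input paths just described, the helper below accepts either dialogue text or an uploaded audio file; generate_speech() and lip_sync() are hypothetical stubs standing in for the text-to-speech and video generation steps, not real Pika or ElevenLabs calls.

```python
# Hypothetical sketch of the two audio paths described above. The stubs below
# are placeholders, not real Pika or ElevenLabs endpoints.
from pathlib import Path
from typing import Optional


def generate_speech(text: str, voice_style: str) -> Path:
    """Placeholder for the ElevenLabs-backed text-to-speech step."""
    print(f"[stub] synthesizing '{text}' in voice style '{voice_style}'")
    return Path("generated_audio.mp3")


def lip_sync(video_prompt: str, audio: Path) -> Path:
    """Placeholder for generating a clip whose mouth motion follows the audio."""
    print(f"[stub] generating <=3 s clip for '{video_prompt}' driven by {audio}")
    return Path("lip_synced_clip.mp4")


def make_lip_synced_clip(video_prompt: str,
                         script_text: Optional[str] = None,
                         audio_file: Optional[Path] = None,
                         voice_style: str = "default") -> Path:
    """Users supply either dialogue text or their own audio, never both."""
    if (script_text is None) == (audio_file is None):
        raise ValueError("Provide exactly one of script_text or audio_file")
    if script_text is not None:
        audio_file = generate_speech(script_text, voice_style)  # path A: typed dialogue
    return lip_sync(video_prompt, audio_file)                   # path B: supplied audio


if __name__ == "__main__":
    make_lip_synced_clip("a newsreader at a desk", script_text="Good evening.")
```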
02
The generative AI integration platform LTX Studio enables intelligent film previsualization
Recently, AI technology company Lightricks announced the launch of the generative AI film production platform LTX Studio, aimed at helping creators quickly visualize stories.
LTX Studio is a content generation platform that brings together a series of open-source generative AI models and tools and can create video, music, sound effects, and dialogue from text prompts. These functions are combined into a single interface, allowing users to complete the entire audio and video creation process in one place.
▲LTX Studio interface
Users enter their creative intent as text, and LTX Studio first generates a set of scenes (Scene) from the prompt, each with photos, styles, names, and sounds. Each scene contains multiple shots (Shot). Users can customize the style, weather, and location of each scene; adjust angles, characters, scene consistency, camera movement, and lighting; and change elements within a scene, such as swapping a character or altering the color of a vehicle. After refining the storyline and editing the shots, users can preview and export the video for sharing and feedback.
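LTX Studio has not published an API, so the following is only an illustrative data model for the scene-and-shot hierarchy described above; every class and field name is an assumption made for the example.

```python
# Illustrative Scene/Shot hierarchy for the workflow described above.
# All names here are assumptions for the sketch, not LTX Studio's API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    camera_angle: str = "eye level"
    camera_move: str = "pan"          # the demo clips mostly use simple pans
    lighting: str = "natural"
    characters: List[str] = field(default_factory=list)


@dataclass
class Scene:
    name: str
    style: str = "cinematic"
    weather: str = "clear"
    location: str = ""
    sound: str = ""
    shots: List[Shot] = field(default_factory=list)

    def swap_character(self, old: str, new: str) -> None:
        """Element-level edit, e.g. replacing a character across the scene's shots."""
        for shot in self.shots:
            shot.characters = [new if c == old else c for c in shot.characters]


# A text prompt would first be expanded into a list of scenes like this one:
opening = Scene(name="Opening chase", style="neo-noir", weather="rain",
                location="downtown street",
                shots=[Shot(camera_move="pan", characters=["courier"])])
opening.swap_character("courier", "detective")
```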
LTX Studio’s published demo is approximately 25 seconds long, with relatively simple camera work consisting mostly of pans; character expressions appear stiff, with limited movement and minimal interaction with the surrounding environment. Compared with the Sora videos previously released by OpenAI there is still a significant gap, but the output meets the requirements of simple virtual previsualization.
Given its current functionality and output quality, LTX Studio lets users flexibly adjust and preview the look of a film, so filmmakers can use it to quickly build concept stories, reducing production costs and improving efficiency.
Lightricks previously developed the image editing software Photoleap, the video editing software Videoleap, and the portrait retouching software Facetune; the launch of LTX Studio further extends its existing creative tools with generative AI technology.
03
The AI tool LAVE uses large language models (LLMs) for intelligent video editing
Researchers from the University of Toronto, Meta (Reality Labs Research), and the University of California San Diego recently proposed LAVE, a video editing tool that harnesses the language capabilities of large language models (LLMs) for video editing.
LAVE introduces an LLM-based planning and execution agent that can interpret user language commands, plan, and execute related operations to achieve editing goals. The agent can provide conceptual assistance, such as creative brainstorming and video material overviews, as well as operational help, including semantic video retrieval, storyboarding, and editing modifications.
To make this possible, LAVE uses a visual language model (VLM) to automatically generate language descriptions of each video’s visual content, enabling the LLM to understand the footage and use its language capabilities to assist users with editing. In addition, LAVE offers two video editing interaction modes: agent assistance and direct manipulation. This dual mode gives users flexibility while allowing them to refine the agent’s operations as needed.
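The article describes the pipeline only at this level of detail; the sketch below shows the general shape of such a VLM-then-LLM agent loop, with describe_clip(), llm_plan(), and the ACTIONS table as hypothetical stand-ins rather than LAVE’s actual code.

```python
# Rough sketch of the VLM -> LLM agent flow described above. All helpers are
# hypothetical placeholders, not LAVE's implementation.
from typing import Callable, Dict, List


def describe_clip(clip_path: str) -> str:
    """Placeholder for the VLM captioning step that gives the LLM 'eyes'."""
    return f"a short language description of {clip_path}"


def llm_plan(user_request: str, clip_descriptions: List[str]) -> List[dict]:
    """Placeholder for the LLM planner: turns a request plus clip captions
    into a list of editing actions, e.g. retrieve / storyboard / trim."""
    return [{"action": "retrieve", "query": user_request}]


ACTIONS: Dict[str, Callable[..., None]] = {
    "retrieve": lambda query, **kw: print(f"searching clips for '{query}'"),
    "storyboard": lambda order, **kw: print(f"arranging clips: {order}"),
    "trim": lambda clip, start, end, **kw: print(f"trimming {clip} {start}-{end}s"),
}


def run_agent(user_request: str, clips: List[str]) -> None:
    captions = [describe_clip(c) for c in clips]   # VLM: video -> language
    plan = llm_plan(user_request, captions)        # LLM: language -> action plan
    for step in plan:                              # agent executes; the user can
        ACTIONS[step.pop("action")](**step)        # still edit directly afterwards


run_agent("find beach shots for the intro", ["clip_001.mp4", "clip_002.mp4"])
```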
The LAVE user interface consists of three main components: a language-enhanced video library that displays video clips with automatically generated language descriptions; a video editing timeline that holds the main timeline being edited; and a video editing agent that users converse with for assistance.
When users interact with the agent, the conversation is shown in a chat interface. When the agent performs operations, it makes the corresponding changes to the video library and the editing timeline. Users can also interact with the video library and timeline directly with the cursor, as in a traditional editing interface.
(All images in this issue are sourced from the internet)
Edited by丨Zhang Xue
Proofread by丨Wang Jian
Reviewed by丨Wang Cui
Final review丨Liu Da

