The Application and Impact of AIGC in the Audio Industry

Introduction


From ChatGPT to Sora, generative artificial intelligence can now switch between and mutually generate text, images, audio, and video, a multimodal capability widely regarded as the closest approximation of how humans acquire knowledge. Before AIGC applications took off, the audio industry developed mainly by using intelligent voice interaction to reach companion scenarios such as smart speakers, in-car systems, smartwatches, and smart homes. Today the audio field is rapidly integrating AIGC technology, which is being widely applied in digital humans, content creation and distribution, the music industry, children’s education, and entertainment.

1. Promoting the Integration and Innovation of Audio Content and Services

AIGC technology is driving the innovative integration of content and business in the audio industry. Through the combination of smart terminals and AI large models, the audio industry can provide high-quality and more personalized content services.

Empowering the entire chain of the digital audio field. The iteration of AI technology and the metaverse boom are accelerating the upgrading of the virtual digital human industry. According to relevant institutions, the market scale of China’s digital virtual humans was 186.61 billion yuan in 2022, with a core market scale of 12.08 billion yuan; the two figures are expected to reach 640.27 billion yuan and 48.06 billion yuan respectively by 2025. Content production by audio anchors is becoming vertically segmented, covering news broadcasting, companionship, audiobook reading, storytelling, online PIA drama, karaoke live streaming, and more. Major audio platforms organize their content areas along these lines, build flagship content that fits their positioning, and analyze audience preferences to drive recommendations. AI voice technology converts audiobook text to speech, improving the production efficiency of audiobooks. The audiobook market continues to expand, and its long-tail effect is gaining attention.
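The market figures quoted above imply a growth rate that can be worked out directly. The following is a minimal sketch of that arithmetic, assuming steady compound growth between 2022 and 2025; the function name implied_cagr and the variable names are illustrative and do not come from the cited institutions.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the text, in billions of yuan (2022 reported, 2025 projected).
overall_2022, overall_2025 = 186.61, 640.27
core_2022, core_2025 = 12.08, 48.06
years = 2025 - 2022

# Assumption: growth compounds at a constant rate over the three years.
overall_growth = implied_cagr(overall_2022, overall_2025, years)
core_growth = implied_cagr(core_2022, core_2025, years)

print(f"Overall market implied CAGR: {overall_growth:.1%}")
print(f"Core market implied CAGR: {core_growth:.1%}")
```

Under this assumption, both projections correspond to growth of more than 50% per year, with the core market growing somewhat faster than the overall market.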

Multi-scenario applications achieve more natural and intelligent virtual-real interaction. With voice recognition and natural language processing, users can interact with audio platforms more smoothly by voice, for example through voice search and voice-controlled playback. AIGC can also create new sound effects and voice roles, such as virtual voice assistants and character voiceovers in audiobooks, which adds diversity to audio content. Digital human technology can be combined with smart hardware, such as smart speakers, in-car systems, and smartwatches, to provide companion-scenario services like voice assistants and personalized content recommendations. The application of AIGC in the audio field has also prompted the exploration of new carriers. For instance, music works that combine physical and virtual elements offer both the collectible value of a physical item and the convenience of a virtual one; music cards that interact with smartphones via NFC are one example. AIGC is also bringing innovation to music publishing: some institutions are developing models in which musicians upload their music and generate sales tools through a one-stop self-publishing solution, gaining autonomy over their copyrights and a direct connection with fans. In smart cockpit scenarios, AIGC technology can provide a more natural and fluid voice interaction experience. As large language models develop, vehicles can better understand user commands and needs, such as seat adjustment, air conditioning control, and ambient lighting changes, and respond more precisely and intelligently.

Content creation and distribution efficiency greatly enhanced. AIGC technology can automatically generate music, stories, news, and other content, significantly improving the production efficiency of audio content. For example, through AIGC technology, the creative process of children’s content can be accelerated. A story that originally took 3-4 days to complete can now be done in 8 hours. This increase in efficiency makes content production faster and more economical. Through AI composition and arrangement technology, users can quickly create personalized music works. Additionally, through AI technology, automated content creation can be realized, such as AI-generated news briefs and co-creation story platforms for parents and children, providing users with richer and more diverse content choices.

Service model innovation enhances personalized service experience. AIGC technology can recommend customized audio content based on users’ listening history and preferences, for example by generating personalized playlists and customized news briefs, which improves the user experience. It can also generate personalized advertising content according to audience characteristics and preferences, making ads better targeted and more effective. In music content, gamified, interactive, and community-based approaches can draw listeners into the music experience; interactive games, for instance, can help spread music works and make listening more fun. In educational content, AIGC technology can be used to create interactive learning materials for subjects such as language learning and historical storytelling. For children’s content, AIGC technology can help create audio better suited to children, such as stories, general-knowledge programs, and habit-building content, and can work with smart hardware to provide a more interactive and engaging learning experience.
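The idea of recommending content from listening history can be illustrated with a very small sketch. It is not how any particular audio platform works: it assumes history is available as (category, seconds listened) pairs, and the function names, categories, and catalog entries are all hypothetical. A simple category-weighting heuristic stands in for the AIGC models the text describes.

```python
from collections import Counter

def preference_profile(history):
    """Turn (category, seconds_listened) pairs into normalized category weights."""
    totals = Counter()
    for category, seconds in history:
        totals[category] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: seconds / grand_total for category, seconds in totals.items()}

def recommend(history, catalog, k=3):
    """Rank catalog items by how much listening time their category received."""
    profile = preference_profile(history)
    # sorted() is stable, so items with equal weight keep their catalog order.
    ranked = sorted(
        catalog,
        key=lambda item: profile.get(item["category"], 0.0),
        reverse=True,
    )
    return [item["title"] for item in ranked[:k]]

# Hypothetical listening history and catalog, for illustration only.
history = [("audiobook", 3600), ("news", 600), ("audiobook", 1800), ("music", 1200)]
catalog = [
    {"title": "Morning News Brief", "category": "news"},
    {"title": "Bedtime Story", "category": "audiobook"},
    {"title": "Piano Mix", "category": "music"},
    {"title": "Comedy Hour", "category": "comedy"},
]

print(recommend(history, catalog, k=2))
```

A production recommender would draw on far richer signals (skips, time of day, device, generated summaries); the sketch only shows the basic step of turning listening history into a ranking.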

2. Challenges for the Audio Industry

While AIGC brings the industry opportunities for innovative development, the industry also faces challenges in technology, law, safety, ethics, and other areas. Addressing them requires joint efforts from government, industry, and society, as well as cross-domain cooperation, to promote the healthy development and rational application of the technology.

Technical aspects. The emergence and application of AIGC bring convenience and efficiency but also introduce many new risks and challenges. Some stem from inherent technical limitations, such as the inability to guarantee the authenticity of generated content and the potential for harmful statements. Others arise from improper use by users, such as the misuse of ChatGPT-generated text in education, audiovisual content, and scientific research. Digital human technology illustrates the technical limitations: although it is developing rapidly, expression capture is still not accurate enough and synthesized voices still do not sound fully natural. Providing a high-quality user experience therefore remains a challenge for digital human technology, requiring continuous optimization of interaction design and content creation to meet user expectations and needs.

Legal aspects. The application of AIGC involves collecting and processing large amounts of user data, and ensuring data security and user privacy is a significant challenge. Norms and standards need to be established so that the technology develops in a regulated and healthy way. The use of AIGC technology in content creation may also raise intellectual property issues, making it necessary to clarify copyright ownership and protect creators’ rights. In addition, voices in the industry are calling for greater emphasis on the safety and privacy protection of children’s content. Strict content review mechanisms and data security measures should be established so that children can enjoy the convenience and fun of AIGC technology while their safety and privacy are fully protected.

Ethical and social responsibility aspects. The wave of AIGC technology has sparked discussion of the ethics and social responsibility of digital human technology, including the authenticity and transparency of digital humans and their impact on real society. For instance, reuniting with deceased loved ones through “digital life” has become a reality and has even grown into a business of its own. Beyond “reviving” the deceased, the use of AI digital human technology to create digital avatars for public figures and online influencers has matured and can generate significant revenue. AI “one-click resurrection” raises a series of ethical issues and legal risks: its production process involves extracting private information, and the generated content may be deceptive and may interfere with people’s real memories. Digital products made from the audio and video of the deceased only resemble them in appearance and voice. The industry of AI “resurrection” of the deceased urgently needs clear standards and norms, and the technology needs to be regulated scientifically, to keep the disorder from spreading.

Balancing technology and humanity. Many industry insiders argue that, while enjoying the conveniences of AIGC technology, the industry should also pay attention to balancing technology and humanity. For example, audio practitioners should pay more attention to children’s ability to take part in offline activities, encourage children to interact with real people, and reduce excessive reliance on online activities. In content creation and service delivery, attention should be paid to protecting children’s safety and privacy, and children should be guided toward a balanced understanding of technology and humanity.

3. Strengthening Deep Cooperation Between the Audio Industry and Related Fields

Audio industry professionals believe that the current application of AIGC technology is still in its infancy, and its application in the audio field is far below expectations. However, AIGC has already shown tremendous potential in audio creation and dissemination, as well as human-machine collaboration, with sound becoming an important bridge for digital connections between people and between people and machines.

Bringing more opportunities for digital audio. The forms of digital audio product services are diverse, covering innovative fields such as “audio + film and television,” “audio + live streaming,” “audio + publishing,” and “audio + smart devices.” AIGC technology will further empower audio platforms, audio anchors, audiobooks, and intelligent vehicle networks in terms of production and dissemination value, helping to develop the economic value of digital audio across all media and dimensions.

Promoting deep cooperation in smart hardware development. AIGC technology is promoting deep cooperation between the audio industry and makers of smart hardware, such as smart speakers, in-car systems, and smartwatches, providing users with more convenient companion-scenario services.

Supporting closed-loop business models and intelligent marketing. Through AI technology, audio platforms can integrate more closely with intelligent marketing to achieve closed-loop services across the entire chain, such as virtual product demonstrations and sales, virtual customer service, and after-sales support. Users can ask questions, give feedback, and make suggestions in the metaverse, and platforms can in turn offer them more comprehensive products and services.

(Author’s affiliation: National Radio and Television Administration Development Research Center)

Supervised by: Yang Mingpin

Rotating Chief Editor: Zhang Miaomiao

Post-editing: Jiang Hui