
Since the 18th National Congress of the Communist Party of China, China has attached great importance to developing the digital economy and has actively promoted the construction of a digital China. The vigorous growth of the digital audio industry has benefited from a series of supportive policies. In 2017, the Ministry of Culture issued the “Guiding Opinions on Promoting the Innovative Development of the Digital Cultural Industry” to support the development of the digital cultural industry. In 2019, the National Radio and Television Administration, the National Internet Information Office, and the Ministry of Culture and Tourism jointly issued the “Regulations on the Management of Online Audio and Video Information Services” to further standardize the online audio industry environment. In 2021, the “14th Five-Year Plan for the Development of Radio, Television, and Online Audio-Visual” was introduced, setting out the tasks and directions for the high-quality development of the digital audio industry. In 2023, the Central Committee of the Communist Party of China and the State Council issued the “Overall Layout Plan for Building a Digital China,” calling for vigorous development of online culture and a stronger supply of high-quality online cultural products. The National Radio and Television Administration and the Ministry of Culture and Tourism have also successively issued normative documents to combat digital audio piracy, protect intellectual property rights, standardize industry order, foster fair competition, and strengthen the supervision of online platforms to protect audience rights. With the development of the digital economy, the ecosystem of the digital audio industry is gradually taking shape, promoting a virtuous cycle in the “ear economy.”
Digital audio platforms connect individuals in different spaces through sound, achieving a “pan-connection” between individuals, and listeners with similar interests form tightly connected circles. Digital audio is also strongly companionable, personalized, and immersive, making it suitable for listening in private places such as homes as well as on the move. Even in noisy public places such as subways and stations, listeners can create a relatively independent listening space by wearing headphones. The return of listening to the field of audiovisual communication breaks the hegemony of visual symbols, marking a shift from “eyes” to “ears” and from the “visual” to the “auditory,” and highlighting the communication value of digital audio. Under the combined influence of policy, technology, the market, and user demand, the media forms of digital audio are becoming more diverse and the industry model continues to expand, forming a new audio communication landscape characterized by the simultaneous development of multiple tracks, flexible application across scenarios, and deep media integration.
1. The Landscape of China’s Digital Audio Industry
In recent years, the number of digital audio platforms has increased and the market has expanded. The “ear economy” has become a new consumption trend in the digital audio market. According to statistics from Cailixin, the number of online audio users in China reached 692 million in 2022, with over 300 million monthly active users. The arrival of the AIGC era has ushered digital audio production and dissemination into a new, intelligent stage. With the comprehensive advancement of media convergence strategies, radio stations at all levels are prioritizing mobile, actively building intelligent digital audio platforms and expanding their audio dissemination systems, giving rise to large-scale broadcast audio platforms represented by Cloud Listening, Archimedes, and Big Blue Whale. These platforms are exploring new technologies such as virtual digital humans and intelligent language recognition models to open up new tracks in digital audio.
First, digital audio forms a unique industrial chain. The digital audio content industry is characterized by intelligence, scenario orientation, interactivity, service orientation, and diversification. Digital audio products such as digital music, audiobooks, voice dating, and AI voice interaction continue to innovate, gradually forming a mature industrial chain. The chain is divided into upstream, midstream, and downstream segments. Upstream refers to creative production and content creation, which provide the original materials for content production, such as copyright resources; midstream refers to dissemination and distribution channels, such as the online audio platforms Himalaya and QQ Music; downstream refers to users and terminals, such as mobile phones, computers, smart wearable devices, and smart home products. With its distinctive sense of companionship and presence, the “ear economy” effectively fills the fragmented time in the audience’s daily life, expanding the added value of digital audio beyond listening to include connection and companionship.
Second, digital audio platforms develop both economic and social dissemination value. On the one hand, digital audio dissemination has a public and philanthropic character: platforms actively promote mainstream values, spread positive social energy, and carry out public welfare activities, enhancing their social and cultural value. For example, some music platforms run public welfare activities around themes of musical care, music education, and cultural heritage, such as recording audiobooks for blind children. On the other hand, as users become more aware of copyright and more willing to pay for knowledge, digital audio is gradually opening up paths to monetization. Methods for monetizing audio traffic include user payments, advertising, the community economy, and the IP economy. Digital audio products and services take diverse forms, covering innovative fields such as “audio + film and television,” “audio + live streaming,” “audio + publishing,” and “audio + smart devices,” and developing the economic value of digital audio across all media and dimensions.
2. AIGC Empowers the Dissemination Value of Digital Audio
Digital music is music that is recorded, produced, transmitted, and played back in digital form and that carries music intellectual property rights. The digital transformation of China’s music industry has lowered the entry barriers and production costs of music creation, and is characterized by intelligent production, service-oriented products, and expansion into lower-tier markets. AIGC, or Artificial Intelligence Generated Content, is a technology that generates content through algorithms powered by artificial intelligence. The industry referred to 2022 as the “AIGC Year,” as large language models represented by ChatGPT were put into use. With its multimodal and super-media approach to content production, AIGC brings disruptive changes to digital audio dissemination, and digital music has become one of the typical fields of AIGC creation. In May 2023, MyVocal.AI, which can clone human voices, became popular online, triggering a wave of creation across the internet. In addition to singing, AI has also performed well in the composing, arranging, and mixing stages of digital music production. AI music production software such as Amper Music, Soundful, and OpenAI’s MuseNet has also attracted the attention of users worldwide.
AIGC in digital music is not an entirely recent phenomenon. In 2007, Yamaha developed a voice synthesis software called VOCALOID that allows users to synthesize human-like singing by inputting pitch and lyrics; in July 2012, China’s first independently created virtual singer, Luo Tianyi, officially launched her voice library and gradually became active in the public eye; in 2019, the second season of CCTV’s “Classic Recitation” launched an online interactive tool capable of turning poetry into songs. Because these music production tools had high production costs and entry barriers, the content they produced was relatively niche and was not widely adopted at the time. As AI music production technology continues to advance, more and more creators are beginning to use AI to assist in music production and complete their creative output. Rich sound source libraries, natural-sounding simulation, and easy-to-learn operation have propelled the intelligent development of digital music production.
For online audio platforms, the key to achieving good returns is optimizing platform services to improve user acquisition and stickiness while ensuring audio quality. Musical works carry the creator’s mood, thoughts, and aesthetics, evoking emotional resonance in listeners. Platforms allow listeners to find like-minded friends through features such as comment sections, listening together, private messaging, following, liking, and sharing, recalling the idiom of “high mountains and flowing water,” in which music leads one to a kindred spirit. According to iMedia Consulting data, over 70% of users are willing to meet friends with similar interests through online audio, which makes highly social online audio communities popular among listeners. In addition to providing social services, many platforms plan distinctive activities, such as “Test Your Musical Temperament,” “My Annual Listening Report,” and “My Annual Musician,” which analyze user data to push audio content accurately, making the platforms more fun and comfortable to use.
The prevalence of short video media has also pushed the music market toward lower-tier cities. Relevant data shows that consumers in third-tier and lower-tier cities account for 54.4% of consumers on Kuaishou and 46% on Douyin. As an audiovisual medium, short videos need suitable background music to enhance their dissemination. The rise of short video platforms has significantly influenced how digital music is evaluated in China: whether a melody is catchy enough, whether it has a strong enough rhythm, and whether it resonates have become important considerations in the creative process. Douyin and Kuaishou have gradually become breeding grounds for popular songs, with a number of internet singers emerging through short video media and injecting new vitality into the market. The broad prospects of the lower-tier market have led major music platforms and record companies to shift their focus from competing for top musicians to paying attention to grassroots musicians with viral hits. The prosperity of the lower-tier market is quietly reshaping the landscape of digital music, creating a decentralized industry atmosphere that opens upward channels for musicians, gives listeners more music choices, and promotes the prosperous development of the digital music market.
(1) Audio Hosts Create Value in the Live Broadcast Economy
Digital audio platforms integrate online economic resources, achieve traffic monetization and product promotion, giving rise to a large number of content producers—audio hosts. Compared to the booming video live streaming sales, voice live streaming has certain personalized and emotional dissemination characteristics, making it more suitable for professional and customized content, which determines its niche and vertical development path. Currently, audio hosts’ content production shows a trend of vertical segmentation, including news broadcasting, companionship and chatting, audiobooks, storytelling, online PIA plays, karaoke live streaming, etc. Major audio platforms have established a series of content zones based on this, creating top content according to their positioning and analyzing audience preferences for push recommendations. For example, NetEase Cloud Music recommends audio live streaming rooms based on listener preferences in the comments section of songs. Additionally, to align with audience media usage habits, many digital audio companies have developed dedicated audio apps that focus on specific content, such as Xijiing for online PIA plays, TT Voice for voice companionship and chatting, and Xiaoyuzhou for podcasts. Niche audio hosts can better leverage the social attributes of audio platforms, deepening their engagement in segmented markets and exploring the infinite possibilities of the “voice +” economy.
Moreover, the iteration of AI technology and the metaverse craze are accelerating the upgrading of China’s virtual digital human industry. According to iMedia Consulting statistics, in 2022 the market driven by China’s virtual digital humans reached 186.61 billion yuan and the core market reached 12.08 billion yuan; the two figures are expected to reach 640.27 billion yuan and 48.06 billion yuan, respectively, by 2025. Common virtual digital humans on the market today include virtual employees, virtual hosts, and virtual idols. By content production field, virtual hosts can be divided into media-type virtual hosts, social (entertainment) virtual hosts, and e-commerce virtual hosts. Media-type virtual hosts serve news media and can replace real hosts in news broadcasting. Their core advantages are all-weather, all-scenario, and multi-language broadcasting, and their self-learning and communication abilities continue to improve as the technology advances. During the 2022 Winter Olympics, CCTV launched the virtual digital human “Dongdong,” AI sign language hosts, and the weather host “Feng Xiaoshu,” showcasing China’s voice to the world through technology. Social (entertainment) virtual hosts mainly engage in entertainment activities, enhancing the user experience through exquisite appearances and real-time interaction. E-commerce virtual hosts can communicate with customers around the clock, saving labor costs.
(2) Audiobooks Highlight Cultural Dissemination Value
Audiobooks are media products and services that use spoken language as the main dissemination symbol, supplemented by music, sound effects, images, and text. AI voice technology enables text-to-speech conversion for audiobooks, improving production efficiency. The audiobook market continues to expand, and its long-tail effect is attracting attention. According to statistics from the China Audio-Video and Digital Publishing Association, the overall scale of China’s digital reading industry reached 35.16 billion yuan, 41.57 billion yuan, and 46.35 billion yuan in 2020, 2021, and 2022, respectively. Currently, China’s audiobook market is primarily platform-driven and is divided into comprehensive reading platforms and vertical reading platforms. The popularity of the audiobook market has prompted media organizations and enterprises to enter the field, and intense competition among market players is further driving the high-quality development of the audiobook industry.
Mobile media is the primary channel for audiobook dissemination. Audiobooks serve functions such as knowledge dissemination, online entertainment, and emotional companionship, meeting the audience’s needs for knowledge and information. According to a survey by the China Audio-Video and Digital Publishing Association, in 2022, 45.8% of audiobook users preferred to listen during leisure time at home; 45.1% preferred to listen while using public transport; over 40% chose to listen while driving; and nearly 40% chose to listen while exercising or working out. The convenience of listening terminals gives users a high degree of freedom when listening to audiobooks. Audiobooks can be listened to in a “fragmented” manner, whether on waking in the morning, while commuting, or while relaxing before sleep, allowing for seamless listening. Audiobooks rely primarily on sound signals, which act directly on the brain’s perception once received by the ears. Excellent audiobooks not only convey knowledge but also carry emotional power and strong appeal.
The knowledge value of audiobooks is also being recognized, as news, literary classics, historical stories, and health-related audiobooks are favored by listeners. According to Cailixin’s “2022 Insights into User Behavior and Demand in China’s ‘Ear Economy’” report, knowledge and workplace-related content generates greater consumer interest, and users show a high willingness to pay. Data from the 2021 CCdata national audio user special survey shows that users who enjoy historical and literary audiobooks account for 40% and 38.1% of the total, respectively, and that 59.92% of audiobook users listen to audio broadcasts in order to learn.
(3) Intelligent Vehicle Networking Enhances Mobile Audio Value
With the comprehensive application of 5G technology and the growing market share of new energy vehicles, the era of intelligent connected vehicles has arrived. According to iMedia Consulting statistics, China’s in-car music market reached 16.98 billion yuan in 2022 and is expected to exceed 35 billion yuan by 2025. The number of audio applications on in-car terminals continues to grow, driving the transformation of in-car audio dissemination.
Intelligent Connected Vehicles (ICV) refer to new-generation cars equipped with advanced in-car sensors, controllers, and actuators that integrate modern communication and network technologies, possessing functions such as complex environmental perception, intelligent decision-making, and cooperative control. They achieve information exchange and sharing among vehicles, roads, pedestrians, and the cloud, ensuring safe, efficient, comfortable, and energy-saving driving, ultimately replacing human operation. The Internet of Vehicles, leveraging next-generation information communication technology, integrates in-vehicle networks, inter-vehicle networks, and in-vehicle mobile internet into one, providing drivers with real-time voice interconnectivity, voice recognition, traffic analysis, and audiovisual entertainment services. Intelligent connected vehicles transform from mere transportation tools into new digital media terminals. The human-machine relationship in intelligent connected vehicles becomes closer and integrated, emphasizing collaborative operations between humans and vehicles, forming intelligent decision-making through real-time data management. Intelligent connected vehicles establish a new media environment for people, vehicles, and the environment, constituting a mediated living space.
Intelligent vehicle networking is one of the main tracks for the new development of broadcasting integration. In a technological environment of 5G and artificial intelligence, it can deliver ultra-high-speed, massive-connection, low-latency information services, generating radio programs such as news, information, traffic, weather, and music through intelligent capture, intelligent arrangement, intelligent broadcasting, intelligent monitoring, and cloud distribution. Broadcasting has inherent advantages in audio dissemination. In content production, it has cultivated a large number of professional audio workers, focuses on high-quality content, and holds rich audio content resources. In the AIGC era, broadcasting urgently needs to seize the opportunity, cultivate the audio industry of intelligent vehicle networking, and open up new paths for its transformation. CCTV has launched the mobile audio platform “Cloud Listening” in cooperation with domestic automobile manufacturers, integrating all broadcasting frequency content resources into the intelligent vehicle networking system. Anhui Broadcasting and Television Station has leveraged local industrial advantages to launch the in-car mobile platform “Eight Directions Radio Station,” which has reached cooperation agreements with automakers such as Jianghuai Automobile and Chery Automobile, marking the beginning of a new journey for mobile audio dissemination.
(4) Super Media IP Produces Hit Audio
IP (Intellectual Property) originally refers to intellectual property rights, but in the internet content industry it refers to cultural-industry copyright carriers with core creativity and a broad audience. China’s IP industry has gradually developed IP entertainment products in film and television, audio, games, animation, theme parks, and other fields. Major audio platforms compete and cooperate around IP. Professor Henry Jenkins of MIT proposed the theory of transmedia storytelling, which holds that in today’s fragmented communication environment, narrative content is broken up and published across multiple media, with each medium playing a unique role in disseminating information. Based on this concept, audio dissemination creates hit content using the “OSMU” (One Source Multi Use) model: it maximizes IP value by extending the industrial chain through the fusion of audio, film and television, animation, games, and other media, continuously exploring new directions and paths for IP development through multi-party collaboration, and ultimately forming a super IP content matrix across platforms and dimensions.
A typical case of building a pan-entertainment industry chain through transmedia storytelling is the well-known media and entertainment company Walt Disney. Disney fully leverages the advantages of multi-industry collaborative development, not only producing many high-quality film and television programs but also creating a number of influential IP music pieces, such as the song “Let it Go” from the movie “Frozen,” which has gained global popularity. Major digital media in China are also gradually recognizing the importance of transmedia storytelling, extending the added value of their popular IP through audio creation. For example, the songs “Goddess Splitting the View” and “Water Dragon Chant” launched by miHoYo’s “Genshin Impact” and “Honkai: Star Rail” have ignited the internet, greatly increasing the recognition and appeal of the games. New media platforms such as the Communist Youth League Central Committee have also attempted to promote mainstream values and advocate for positive social energy through theme songs.
3. Development Trends of Digital Audio in the AIGC Era
With the rapid development of digital technology, AIGC shows great potential in audio creation, dissemination, and human-machine collaboration. Sound will become an important bridge for digital connections between people and between people and machines. Looking ahead, several issues are worth pondering. What impact will the efficient, standardized production of AIGC have on the emotional value of digital audio? Will AIGC squeeze the living space of producers in the digital audio industry and challenge the position of the human subject, blurring the creativity and perception unique to humans through repeated data processing? These questions remain to be explored. At present, AIGC is still in its infancy, and some infringements arising from its use, such as “AI face-swapping” and “AI sound imitation,” are still not effectively regulated. The governance of AIGC needs to be further improved in terms of regulation, technology, and ethics. AIGC is the crystallization of human wisdom and will bring significant changes to various industries, serving as an important driving force for the development of digital audio. The digital audio industry will inevitably go through a process of sifting, in which high-quality audio content and services gradually settle out and benefit the public.
Author: Tong Yun (Associate Professor, School of Journalism and Communication, Anhui University), Li Ruomei (Graduate Student, School of Journalism and Communication, Anhui University)
Source: Modern Audiovisual
Editor: Zhong Qihua, Chen Gang, Chen Dan
