Artificial intelligence technologies represented by voice recognition and voice synthesis have been widely applied in various fields such as telecommunications, media, and government services.
01
Methods
Voice Synthesis Technology
Voice synthesis technology, also known as Text-to-Speech (TTS), converts text into speech so that written content can be read aloud fluently.
A voice synthesis system comprises two main functional modules: a front-end module and a back-end module. The front-end module extracts linguistic information from the input text and passes it to the back-end module, which produces the final synthesized output. In Chinese voice synthesis, for example, front-end sub-modules for text normalization, part-of-speech prediction, and polyphone disambiguation analyze the text, and the back-end module processes the analysis results to generate speech waveforms. The back-end module supports two types of voice synthesis methods: statistical parametric modeling and waveform concatenation.
Parametric synthesis builds context-dependent models of acoustic features and duration information during the training phase; during the synthesis phase, it uses the duration and acoustic models to predict the acoustic feature parameters from which the waveform is generated.
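To make the front-end/back-end split concrete, the following is a minimal Python sketch of a statistical parametric pipeline. The class names, the toy duration and acoustic models, and the stand-in vocoder are all illustrative assumptions, not components of any real system described here.

```python
# Minimal sketch of the front-end / back-end split described above.
# All classes and models are toy stand-ins, not a real TTS system.

import numpy as np


class FrontEnd:
    """Turns raw text into linguistic features (normalization, POS, polyphones)."""

    def analyze(self, text: str) -> list[dict]:
        tokens = text.split()
        return [
            {
                "token": tok,
                "normalized": tok.lower(),    # stand-in for text normalization
                "pos": "NOUN",                # stand-in for part-of-speech prediction
                "pronunciation": tok.lower()  # stand-in for polyphone disambiguation
            }
            for tok in tokens
        ]


class BackEnd:
    """Statistical parametric back end: duration model -> acoustic model -> vocoder."""

    def predict_durations(self, feats):
        # Toy duration model: a fixed number of frames per token.
        return [10 for _ in feats]

    def predict_acoustics(self, feats, durations):
        # Toy acoustic model: random spectral parameters for every frame.
        n_frames = sum(durations)
        return np.random.randn(n_frames, 80)  # e.g. 80-dim mel-spectrogram frames

    def vocode(self, acoustic_params, hop=200):
        # Toy vocoder: map each frame to a chunk of waveform samples.
        return np.repeat(acoustic_params[:, 0], hop)


def synthesize(text: str) -> np.ndarray:
    front, back = FrontEnd(), BackEnd()
    feats = front.analyze(text)
    durations = back.predict_durations(feats)
    acoustics = back.predict_acoustics(feats, durations)
    return back.vocode(acoustics)


if __name__ == "__main__":
    wav = synthesize("voice synthesis turns text into speech")
    print(f"synthesized {wav.size} samples")
```

In a real system each stand-in would be a trained model, but the data flow (text analysis, duration prediction, acoustic prediction, waveform generation) follows the same order.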
Voice Recognition Technology
Voice recognition is a branch of pattern recognition, closely related to disciplines such as mathematical statistics, linguistics, and phonetics. Its aim is to enable machines to "understand" human language.
On one hand, it can convert spoken language into written text; on the other hand, it can understand the meaning of what is said and respond to it correctly.
Voice recognition technology is based on three fundamental principles:
1. The encoding of speech signals: linguistic information is encoded in the time-varying pattern of the short-term amplitude spectrum;
2. The readability of speech;
3. The inseparability of speech interaction.
Building a voice recognition system requires both a training component and a recognition component. The front end of voice recognition focuses primarily on endpoint detection, removing unnecessary silence and non-speech sounds.
The back end extracts feature vectors from the speech passed on by the front end and performs statistical recognition using acoustic and language models. An adaptive feedback module in the back end enables self-learning: it corrects errors in the voice model and steadily improves recognition accuracy.
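The recognition flow above can be summarized in a short sketch: endpoint detection in the front end, then feature extraction and combined acoustic/language-model scoring in the back end. Everything here is a toy stand-in; no real recognition library or model is assumed.

```python
# Toy sketch of the front-end / back-end recognition flow described above.

import numpy as np


def endpoint_detect(signal: np.ndarray, threshold: float = 0.02, frame: int = 160):
    """Front end: keep only frames whose energy exceeds a silence threshold."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    return [f for f in frames if np.mean(f ** 2) > threshold]


def extract_features(frames):
    """Back end, step 1: turn each speech frame into a feature vector (toy log-spectrum)."""
    return [np.log(np.abs(np.fft.rfft(f)) + 1e-6) for f in frames]


def decode(features, vocabulary=("hello", "news", "voice")):
    """Back end, step 2: combine toy acoustic and language scores and pick the best word."""
    lm_scores = {"hello": 0.2, "news": 0.5, "voice": 0.3}   # toy language model priors
    energy = float(np.mean([f.mean() for f in features])) if features else 0.0
    # Toy "acoustic model": score each word by closeness of its length to the mean feature energy.
    am_scores = {w: 1.0 / (1.0 + abs(energy - len(w))) for w in vocabulary}
    return max(vocabulary, key=lambda w: am_scores[w] * lm_scores[w])


if __name__ == "__main__":
    audio = np.random.randn(16000) * 0.2   # one second of fake audio at 16 kHz
    speech_frames = endpoint_detect(audio)
    feats = extract_features(speech_frames)
    print("recognized:", decode(feats))
```

A production system would replace the toy scorers with trained acoustic and language models and add the adaptive feedback loop, but the front-end/back-end division stays the same.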
02
Design
Applying machine learning and artificial intelligence in the field of news communication can support voice-based reading of news, interview transcription, and related research on voice models.
Scenarios
Voice Recognition Application Scenarios. The voice recognition scenario enables rapid organization of voice files, with a minimalist design that puts user experience first. It supports uploading multiple file formats and lets users pre-set dictionaries of uncommon names, place names, and technical terms that appear in the uploaded files to improve recognition accuracy.
For scenarios such as interviews and meetings, it supports filtering of filler words to increase the proportion of useful information in the transcribed text. It also supports exporting transcription results and viewing and re-editing past results in the history records.
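The two post-processing steps mentioned above, applying a user dictionary for uncommon terms and filtering filler words, can be illustrated with a short sketch. The word lists and the correction entry below are made up for the example and do not reflect the product's actual rules.

```python
# Illustrative transcript post-processing: user dictionary first, then filler removal.

FILLERS = {"um", "uh", "er"}                              # example filler words
USER_DICTIONARY = {"zhong ke wen ge": "Zhongke Wenge"}    # hypothetical correction entry


def apply_user_dictionary(text: str) -> str:
    for wrong, right in USER_DICTIONARY.items():
        text = text.replace(wrong, right)
    return text


def remove_fillers(text: str) -> str:
    return " ".join(w for w in text.split() if w.lower() not in FILLERS)


if __name__ == "__main__":
    raw = "um the interview with zhong ke wen ge covered uh voice recognition"
    print(remove_fillers(apply_user_dictionary(raw)))
    # -> "the interview with Zhongke Wenge covered voice recognition"
```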
Voice Synthesis Application Scenarios. The voice synthesis scenario enables reports to be edited and read aloud. Users can customize the configuration: choosing a Chinese or English voice and tone to match the language of the text, and setting the speech rate, pauses, and how numbers and values are read. Pronunciations the user is unsatisfied with can be reconfigured and re-synthesized, and the content can be exported in mp3 or wav format.
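A possible shape for such a synthesis configuration is sketched below. The field names (voice, speed, pause_ms, number_reading, output_format) are assumptions chosen for illustration, not the product's actual parameter names.

```python
# Hypothetical synthesis request covering the options described above.
synthesis_request = {
    "text": "The index rose by 3.2% today.",
    "language": "en",                    # or "zh" for Chinese text
    "voice": "female_news",              # hypothetical voice/tone identifier
    "speed": 1.0,                        # 1.0 = normal speaking rate
    "pause_ms": {",": 150, ".": 300},    # extra pauses at punctuation
    "number_reading": "value",           # read "3.2%" as a value, not digit by digit
    "output_format": "mp3",              # mp3 or wav export
}

print(synthesis_request)
```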
Architecture
The architecture design is divided into four layers: the application access layer, application service layer, core capability layer, and basic support layer (a sketch of how a request flows through them follows the list below).
Application access layer: interfaces with connecting services, supporting network interaction adaptation and voice data processing, and providing voice recognition and synthesis services;
Application service layer: Integrates the latest voice recognition and synthesis capabilities, providing scalable services and unified management and maintenance;
Core capability layer: Deploys the AI core engine for self-optimizing resource management, providing core service capabilities for voice recognition and transcription;
Basic support layer: Based on cloud computing architecture, it unifies the scheduling and management of distributed hardware and storage resources.
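To show how the four layers might fit together, here is a minimal sketch of a request flowing from the access layer down to the basic support layer. The class and method names are illustrative assumptions and do not describe the actual implementation.

```python
# Toy top-down request flow through the four layers named above.

class BasicSupportLayer:
    """Cloud layer: schedules distributed compute and storage resources."""
    def allocate(self, task):
        return f"resources-for-{task}"


class CoreCapabilityLayer:
    """Hosts the AI core engines for recognition and synthesis."""
    def __init__(self, support: BasicSupportLayer):
        self.support = support

    def run(self, task, payload):
        self.support.allocate(task)
        return f"{task}-result({payload})"


class ApplicationServiceLayer:
    """Wraps the core engines as managed, scalable services."""
    def __init__(self, core: CoreCapabilityLayer):
        self.core = core

    def transcribe(self, audio):
        return self.core.run("asr", audio)

    def synthesize(self, text):
        return self.core.run("tts", text)


class ApplicationAccessLayer:
    """Entry point that adapts client requests to the services below."""
    def __init__(self, services: ApplicationServiceLayer):
        self.services = services

    def handle(self, kind, payload):
        return (self.services.transcribe(payload) if kind == "asr"
                else self.services.synthesize(payload))


if __name__ == "__main__":
    stack = ApplicationAccessLayer(
        ApplicationServiceLayer(CoreCapabilityLayer(BasicSupportLayer())))
    print(stack.handle("asr", "interview.wav"))
    print(stack.handle("tts", "Tonight's top story"))
```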
03
Applications
As the technology develops and is put into practice, intelligent voice technology will not be confined to existing application models. Through voice synthesis, voice recognition, semantic understanding, and image processing, machines can also make emotional judgments about news, allowing users to feel the warmth within it.
In live video broadcasts, real-time voice transcription can be integrated to overlay subtitles during the broadcast and expand the audience reach. In recent years, artificial intelligence has made breakthroughs in advanced text analysis, personalized recommendation, and prediction. Text analysis technology based on natural language processing can empower journalists, giving machines a degree of "creativity" and producing templates that can "create" content, thereby enhancing journalists' writing capabilities through human-machine collaboration.
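As an illustration of the live-captioning idea, here is a small sketch that breaks a stream into fixed-length audio chunks and emits timed subtitle cues. The transcribe_chunk() function is a hypothetical placeholder for whatever recognition service is actually used.

```python
# Toy sketch of real-time captioning: chunked audio in, timed subtitle cues out.

def transcribe_chunk(chunk: bytes) -> str:
    # Placeholder: a real system would call a streaming recognition service here.
    return f"[caption for {len(chunk)} bytes of audio]"


def caption_stream(chunks, chunk_seconds=2.0):
    """Yield (start, end, text) subtitle cues for consecutive audio chunks."""
    t = 0.0
    for chunk in chunks:
        text = transcribe_chunk(chunk)
        yield (t, t + chunk_seconds, text)
        t += chunk_seconds


if __name__ == "__main__":
    fake_stream = [b"\x00" * 32000 for _ in range(3)]   # three 2-second chunks of fake audio
    for start, end, text in caption_stream(fake_stream):
        print(f"{start:05.1f} --> {end:05.1f}  {text}")
```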
As a national team in artificial intelligence and big data research, Zhongke Wenge is the team with the strongest accumulation in the direction of media intelligence. It has more than a decade of research in media intelligence technology, with cutting-edge theories, core technologies, and big data applications at the forefront both domestically and internationally. Its media-oriented intelligent Q&A robots, intelligent writing assistants, voice broadcasting, and image/video analysis technologies are industry-leading and are widely used in the State Council's government affairs client and the China Daily client. Media intelligence rests on foundational algorithms and components: the company has independently developed more than 60 intelligent analysis and mining components and more than 3,000 algorithms, and in the Ministry of Industry and Information Technology's 2017 national center business selection evaluation, its data processing performance ranked first among all technology providers.
Core technologies of Zhongke Wenge for intelligent audio and video processing
During the media industry's period of integrated transformation, as big data and artificial intelligence advance and industry applications bear results, related applications will continue to deepen.
Mutual Promotion Zone
A strong national media think tank, positioned to provide decision-making reference services for government and enterprise leaders at all levels.
Since its establishment, relying on the technology of the Institute of Automation, Chinese Academy of Sciences, and on global media database resources, it has continuously strengthened its data mining and information analysis capabilities to provide users with a range of consulting services; some of its reports are delivered to relevant decision-making departments through the internal publication "News Public Opinion Reference".
If you have consulting needs in the fields of media integration, intelligent communication, public opinion monitoring, reputation management, policy assessment, and the 14th Five-Year Plan, please contact us. We will provide you with customized services combining “technology + experts”.
We welcome media experts and senior researchers to contribute to the internal publication “News Public Opinion Reference”. We will provide generous remuneration at our discretion and submit relevant reports to decision-making departments.

