Personalized Voice Recognition: Solving Homophone Issues in Speech Input

Because it frees the hands, voice input has been popular with the public since its debut. Homophones, however, trip it up. Say the name “Cheng Zhi” and the voice input method often writes a different, identically pronounced “Cheng Zhi”; the name of your crush, “Fan Tong”, comes out as a same-sounding but unintended “Fan Tong”; and whether you mean “Zi Xuan”, “Zi Xuan” or “Zi Xuan”, three names that are written differently but pronounced alike, earlier voice input was thoroughly “confusing”.

Voice input that “does not understand me” not only reduces communication efficiency but can also easily lead to awkward situations. In the information age, an input method that can “understand” our speech and truly “know me” is particularly important. Recently, Sogou Input Method launched a new “Personalized Voice Recognition” feature intended to solve this problem.

As long as you download the latest version of Sogou Input Method and log in to your account, the system automatically creates a personalized vocabulary just for you, significantly improving the accuracy of voice recognition and reducing how often you have to correct the text manually during voice input.

For example, suppose a user wants to send the message “Cheng Zhi has arrived in Beijing” by voice. If they have logged in to Sogou Input Method beforehand, “Personalized Voice Recognition” draws on their previous input data to recognize “Cheng Zhi” in the written form specific to that user, rather than defaulting to the more common homophone “Cheng Zhi”.
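The idea in this example can be illustrated with a minimal sketch. It is not Sogou's implementation, which the article does not describe; it only assumes that a generic recognizer returns several same-sounding written forms with scores, and that a per-user vocabulary records how often the user has typed each form. The function name `rerank_homophones`, the `boost` parameter, the placeholder written forms, and all scores are hypothetical.

```python
import math
from collections import Counter


def rerank_homophones(candidates, personal_counts, boost=1.0):
    """Re-order homophone candidates using a user's personal vocabulary.

    candidates: list of (written_form, base_score) pairs from a generic
        recognizer; higher base_score means more likely in general.
    personal_counts: mapping from written form to how often this user
        has entered it before (hypothetical personal vocabulary).
    boost: weight of the personal evidence (an assumed tuning knob).
    """
    def score(item):
        form, base_score = item
        # log1p keeps a heavily used word from growing without bound
        # and leaves unseen words (count 0) with no bonus at all.
        return base_score + boost * math.log1p(personal_counts.get(form, 0))

    return sorted(candidates, key=score, reverse=True)


# Two written forms that sound identical; the placeholders stand in for
# the different characters that the romanized names cannot show.
candidates = [
    ("Cheng Zhi <common form>", -1.0),
    ("Cheng Zhi <user's form>", -2.0),
]

# Without personal data the common form wins.
generic_best = rerank_homophones(candidates, Counter())[0][0]

# After the user has typed their own form five times, it wins instead.
personal = Counter({"Cheng Zhi <user's form>": 5})
personal_best = rerank_homophones(candidates, personal)[0][0]
```

In this sketch the generic score still matters: a form the user has never typed gets no bonus, so general recognition is left unchanged for everything outside the personal vocabulary.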

The “Personalized Voice Recognition” is based on Sogou Input Method’s vast user data and Sogou’s leading voice technology, which greatly enhances the accuracy of voice recognition.

At the same time, “Personalized Voice Recognition” has achieved almost real-time recognition conversion speed, allowing users to enjoy a natural conversation-like fluency during voice input, greatly optimizing the user experience and liberating users’ hands.

With today's mainstream voice recognition technology, accuracy in everyday scenarios is generally good enough to “understand” what the user says; however, once accuracy reaches a certain level, every additional 1% of improvement poses a tremendous technical challenge.

Sogou Voice draws on the big data that Sogou Input Method has accumulated over the past decade, using data mining and processing to significantly enhance voice recognition accuracy to a level that other companies and products find difficult to match.

While improving recognition accuracy, Sogou Voice’s technical pipeline also keeps the cloud system fast at processing each user’s personalized characteristics, completing the entire learning of personalized features automatically within “milliseconds”.

At the same time, Sogou Voice Recognition has taken the lead in the industry by fully utilizing cutting-edge deep learning technology, including end-to-end acoustic models based on DTSS (Deep Transformer-based Sequence to Sequence model), neural network language models, and intelligent punctuation prediction, which are the cornerstones of Sogou Voice Recognition’s industry-leading status.

“Personalized Voice Recognition” optimizes specifically for each user’s voice input habits, reducing the character error rate for that user’s commonly used words by nearly 40% while maintaining general recognition accuracy. This greatly reduces the cost of corrections and is a crucial step in overcoming the “technical barrier” of Chinese voice recognition.
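For readers unfamiliar with the metric, character error rate is conventionally computed as the edit distance between the recognized text and the reference text, divided by the reference length. The sketch below shows that standard definition and how a relative reduction is calculated. The article does not say how Sogou measured its figure, so the function names and the before and after rates here are hypothetical and chosen only to illustrate what a 40% relative reduction means.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(before, after):
    """Fraction by which an error rate dropped relative to its old value."""
    return (before - after) / before


# One wrong character out of four gives a character error rate of 0.25.
example_cer = character_error_rate("abcd", "abxd")

# Hypothetical rates: going from 10% to 6% is a 40% relative reduction.
example_reduction = relative_reduction(0.10, 0.06)
```

Note that a relative reduction of 40% is not the same as an absolute drop of 40 percentage points; in the hypothetical numbers above the absolute drop is only 4 points.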

Voice recognition has always been the first step in human-computer interaction. In various industries closely related to users’ daily lives, such as smart home, smart education, and smart healthcare, enabling smart devices to “understand” our speech is a prerequisite for achieving natural interaction.

It can be said that mastering voice recognition is equivalent to having a golden key to enter the AI world, which even directly influences the future development process of intelligent society.

This time, Sogou Voice’s pioneering “Personalized Voice Recognition” can be said to have once again broken through the technical bottleneck of voice recognition, increasing the industry’s confidence in achieving “one person, one face” in human-computer interaction.

With continued breakthroughs in voice recognition technology and the steady enrichment of users’ personalized content, Sogou may build a “consumer-grade” ecosystem of personalized voice resources, fully realizing customized voice input.

This allows every user to utilize a voice recognition technology that “understands them better”, significantly enhancing human-computer communication efficiency in life, travel, and work, helping people express and obtain information more easily.
