No one knows exactly when it started, but voice command red packets have quietly become a craze on QQ and WeChat. You often see classmates on the street, in the restroom, or in their dorms repeating the same phrases into their phones.
Examples include:
Blackening black fertilizer gray will evaporate gray black taboo for black gray flower

Another example:
There are big west several scratches in the zoo, and angry little brain axe.

And such:……@#$%&^*!~
Rare characters and tongue twisters of every kind keep appearing. Voice command red packets bring us a lot of fun. While everyone is busy sending and grabbing them, have you ever wondered how they work? Today I will introduce the secret behind voice command red packets: increasingly mature speech recognition technology.
Speech recognition technology converts the words in human speech into a form that computers can read, such as keystrokes, binary codes, or character sequences. Put simply, it turns human language into a language that computers can understand.
So the problem speech recognition has to solve is how to make computers “understand” what we say. As we know, sound is essentially a wave. A computer first decodes the audio file into a raw waveform, then cuts the sound into small pieces, each called a frame, and transforms each frame’s waveform into a multidimensional vector. Next, using the parameters of an existing acoustic model, it estimates which state each vector most probably belongs to, combines the states into phonemes, and combines the phonemes into words. At this point, the recognition process is basically complete. Converting the result into text or whatever other output is needed is comparatively simple work.
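The pipeline described above can be sketched in a few lines of Python. This is only a toy illustration, not how QQ or WeChat actually implement it: the sample rate, frame sizes, the two-number feature vector, the hand-written per-frame labels (standing in for the acoustic model's output), and the tiny lexicon are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 16000   # samples per second (assumed)
FRAME_LEN = 400       # 25 ms per frame (assumed)
HOP_LEN = 160         # a new frame starts every 10 ms (assumed)

def make_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Synthesize a sine wave as a stand-in for a recorded waveform."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def split_into_frames(samples, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Cut the waveform into overlapping, fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]

def frame_features(frame):
    """Turn one frame into a tiny vector: [log energy, zero-crossing rate]."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]

def collapse_runs(labels):
    """Merge consecutive identical per-frame labels into one unit each."""
    merged = []
    for label in labels:
        if not merged or merged[-1] != label:
            merged.append(label)
    return merged

def phonemes_to_text(phonemes, lexicon):
    """Greedily match phoneme groups against the lexicon, longest first."""
    text, i = "", 0
    while i < len(phonemes):
        for length in (2, 1):
            key = tuple(phonemes[i:i + length])
            if key in lexicon:
                text += lexicon[key]
                i += len(key)
                break
        else:
            text += "?"   # nothing in the lexicon matches this phoneme
            i += 1
    return text

# Steps 1 and 2: waveform -> frames -> one feature vector per frame.
samples = make_tone(440.0, 0.1)
frames = split_into_frames(samples)
features = [frame_features(f) for f in frames]
print(len(frames), len(features[0]))   # 8 2

# Steps 3 and 4: per-frame labels (hand-written here, a real system would
# get them from the acoustic model) -> phonemes -> words.
frame_labels = ["n", "n", "n", "i", "i", "h", "ao", "ao"]
TOY_LEXICON = {("n", "i"): "你", ("h", "ao"): "好"}   # hypothetical lexicon
phonemes = collapse_runs(frame_labels)
print(phonemes_to_text(phonemes, TOY_LEXICON))        # 你好
```

Real systems use richer features and a probabilistic search over states rather than simply merging repeated labels, but the order of the steps is the same.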
The principle sounds simple, but many difficulties have to be overcome in practice, so speech recognition technology is still far from perfect. For example, if recognition fails when you try to claim a voice command red packet, it does not necessarily mean that your pronunciation was not standard; the technology, which is not yet fully mature, may simply have failed to identify the command.
Moreover, QQ's red packet voice recognition may have bugs with some rare characters, possibly because its model database is not rich enough. For instance, with the phrase “砯崖转石万壑雷” (a line from Li Bai's poem "The Road to Shu Is Hard"), no matter how standard your pronunciation of “砯” is, you cannot receive the red packet.
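One possible way a thin model database could produce this failure is sketched below. It is purely an illustration under assumptions: the lexicon, the numbered-pinyin keys, and the "pick the first candidate" rule are all hypothetical and say nothing about how QQ's recognizer is really built.

```python
# Hypothetical toy lexicon: pinyin syllable -> characters the recognizer can output.
# The rare character 砯 (also read ping1) is deliberately left out.
TOY_LEXICON = {
    "ping1": ["乒"],
    "ya2": ["崖"],
    "zhuan3": ["转"],
    "shi2": ["石"],
    "wan4": ["万"],
    "he4": ["壑"],
    "lei2": ["雷"],
}

def recognize(syllables, lexicon):
    """Pick the first candidate character for each syllable (no language model)."""
    return "".join(lexicon[s][0] for s in syllables)

def missing_characters(command, lexicon):
    """List characters of the command that the lexicon can never output."""
    known = {ch for chars in lexicon.values() for ch in chars}
    return [ch for ch in command if ch not in known]

COMMAND = "砯崖转石万壑雷"
spoken = ["ping1", "ya2", "zhuan3", "shi2", "wan4", "he4", "lei2"]

heard = recognize(spoken, TOY_LEXICON)
print(heard == COMMAND)                           # False
print(missing_characters(COMMAND, TOY_LEXICON))   # ['砯']

# With a richer lexicon that knows the rare character, the same speech matches.
richer = dict(TOY_LEXICON, ping1=["砯", "乒"])
print(recognize(spoken, richer) == COMMAND)       # True
```

In this toy setup even perfect pronunciation fails, because the character is simply not among the outputs the recognizer can produce.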
Speech recognition technology has only begun to mature in recent years, but it is already widely used in daily life. Besides voice command red packets, WeChat’s speech-to-text feature, Apple’s Siri assistant, and translation devices all rely on it.
Technological breakthroughs give rise to new technologies, and new technologies bring changes to our lives. Technology changes life. We can never predict what surprises and changes the next step will bring, and perhaps that is the charm of technology itself.
Image source: Internet
Text source: School New Media Studio – Cai Xincheng
Layout design: School New Media Studio – Cong Zuoyu
Editor: Song Jing