How Siri Understands Your Voice Commands

Source: AI Light and Shadow Society

Many smartphones now come with a voice assistant, such as Apple’s Siri or Huawei’s HiAssistant. These programs act as electronic assistants: they can hold conversations with their users and help them with simple tasks such as checking the weather or making phone calls.

So, how do voice assistants understand user commands?


Here, we need to discuss two technologies: speech recognition and natural language understanding. The former allows machines to “hear” commands, while the latter enables machines to “understand” them.

First, let’s look at speech recognition. A machine captures human speech through a microphone, producing a sound wave like the one shown in the image below, with time on the horizontal axis and sound intensity on the vertical axis.

[Figure: speech waveform, with time on the horizontal axis and sound intensity on the vertical axis]
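In code, such a recording is simply a list of amplitude values sampled at a fixed rate. The sketch below uses a synthetic 220 Hz tone as a stand-in for real speech and cuts it into short, overlapping frames, the kind of short segments that are analyzed next. The 16 kHz sampling rate and the 25 ms frame / 10 ms hop sizes are assumptions (common choices in speech processing), not values given in this article, and `synth_tone` and `frame_signal` are hypothetical helpers.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed; a common rate for speech)

def synth_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Return amplitude samples of a pure tone, standing in for a recorded waveform."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def frame_signal(samples, frame_len, hop):
    """Cut a waveform into overlapping frames of frame_len samples, starting a new frame every hop samples."""
    # If the signal is shorter than one frame, the range is empty and no frames are returned.
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

wave = synth_tone(220, 0.1)             # 0.1 s of audio -> 1600 samples
frames = frame_signal(wave, 400, 160)   # 25 ms frames with a 10 ms hop at 16 kHz
print(len(wave), len(frames))           # -> 1600 8
```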

This time-domain vibration signal is hard to analyze directly, so scientists transform it into the frequency domain, where different speech sounds show different patterns. As shown in the image below, the spectral features stay roughly stable within a short segment of speech but change from one moment to the next. In general, vowels show distinct, coarse horizontal stripes, while consonants lack these horizontal structures. Using such spectral characteristics, a system can identify individual phonemes such as a, o, and e, and by combining phonemes it can recognize words and sentences.

Modern speech recognition systems are generally built on complex statistical models that cope with variation in pronunciation and with the dependencies between neighboring sounds. Linguistic knowledge is also needed to constrain the results. For instance, when two candidates sound nearly the same, as “I was choked by a fishbone” and “I was choked by shark fin” do in the original Chinese, the former is far more plausible, so the recognition system tends to output it.
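A minimal sketch of the frequency-domain view, under strongly simplified assumptions: a “vowel-like” frame built from a 200 Hz fundamental plus two harmonics concentrates its energy in a few frequency bins (consecutive frames like this form the horizontal stripes of a spectrogram), while a “consonant-like” frame of white noise spreads its energy across all bins. Real vowels and consonants are far richer than this. The naive DFT keeps the sketch dependency-free; practical front ends use an FFT (for example `numpy.fft.rfft`) plus further processing. The helper names and the peak-to-mean measure `peakiness` are illustrative assumptions.

```python
import cmath
import math
import random

SAMPLE_RATE = 16_000  # assumed sampling rate
FRAME_LEN = 400       # 25 ms at 16 kHz, so each DFT bin spans 40 Hz

def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin from 0 Hz up to the Nyquist frequency."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def dominant_freqs(frame, count=3):
    """Frequencies (Hz) of the `count` strongest bins, in ascending order."""
    spec = magnitude_spectrum(frame)
    top = sorted(range(len(spec)), key=lambda k: spec[k], reverse=True)[:count]
    return sorted(k * SAMPLE_RATE / len(frame) for k in top)

def peakiness(frame):
    """Strongest bin divided by the average bin: high for stripe-like spectra, low for noise."""
    spec = magnitude_spectrum(frame)
    return max(spec) / (sum(spec) / len(spec))

# "Vowel-like" frame: a 200 Hz fundamental plus harmonics at 400 and 600 Hz.
vowel = [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
             for f, a in [(200, 1.0), (400, 0.6), (600, 0.3)])
         for t in range(FRAME_LEN)]
# "Consonant-like" frame: white noise, loosely resembling a hiss such as "s".
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.3) for _ in range(FRAME_LEN)]

print(dominant_freqs(vowel))  # -> [200.0, 400.0, 600.0]
print(round(peakiness(vowel)), round(peakiness(noise)))
```

The vowel-like frame scores far higher on `peakiness` than the noise frame, which is the contrast between striped and unstructured regions of a spectrogram.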

[Figure: spectrogram of a speech signal]
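The linguistic knowledge used to choose between similar-sounding candidates is commonly supplied by a language model. The sketch below trains a tiny bigram model with add-one smoothing on a hypothetical four-sentence corpus and assumes the acoustic scores of the two candidates are tied, so the language model alone decides. Real recognizers combine acoustic and language-model scores and train on far larger corpora; `train_bigram` and `log_prob` are hypothetical helpers.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, padding each sentence with <s> and </s>."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)

def log_prob(sentence, model):
    """Log-probability of a sentence under an add-one (Laplace) smoothed bigram model."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
               for prev, word in zip(tokens, tokens[1:]))

# Hypothetical toy training text; a real model is trained on a very large corpus.
corpus = [
    "he was choked by a fishbone at dinner",
    "a fishbone got stuck in her throat",
    "they ordered shark fin soup",
    "shark fin soup is expensive",
]
model = train_bigram(corpus)

# Two transcripts assumed to be equally plausible acoustically.
candidates = ["I was choked by a fishbone", "I was choked by shark fin"]
best = max(candidates, key=lambda s: log_prob(s, model))
print(best)  # -> I was choked by a fishbone
```

With this toy corpus, the two candidates differ in probability only at “by a” (seen once) versus “by shark” (never seen), so the first is twice as probable under this model.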

Speech recognition turns speech into a sentence of text, but it does not grasp what that sentence means, so the machine cannot yet be said to have “understood” anything. Natural language understanding addresses this gap by drawing on extensive linguistic knowledge.

Taking voice assistants as an example, language understanding mainly consists of two aspects: the user’s intent and the key information needed to realize that intent. For example, when a user says, “Please tell me tomorrow’s weather,” the intent of this sentence is “weather inquiry,” and the key information included in this intent is “tomorrow” (not “please” or “tell”).
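The two parts of an understanding result can be represented as a small data structure. In this sketch, `Understanding`, `INTENT_SLOTS`, and the intent names are hypothetical, chosen to mirror the weather example; `missing_slots` shows how a system could check whether every piece of key information the intent needs has been found.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the key information ("slots") each intent needs.
INTENT_SLOTS = {
    "weather_inquiry": ["time"],
    "make_call": ["contact"],
}

@dataclass
class Understanding:
    """Result of language understanding: an intent plus the key information found for it."""
    intent: str
    slots: dict = field(default_factory=dict)

    def missing_slots(self):
        """Required slots for this intent that are still unfilled."""
        return [name for name in INTENT_SLOTS.get(self.intent, []) if name not in self.slots]

request = Understanding("weather_inquiry", {"time": "tomorrow"})
print(request)                  # -> Understanding(intent='weather_inquiry', slots={'time': 'tomorrow'})
print(request.missing_slots())  # -> []
```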

Understanding the user’s intent is generally treated as intent classification: a set of intents is defined in advance, and a model then estimates the probability that the input sentence belongs to each intent category. The intent with the highest probability is taken as the user’s intent. If no intent is likely enough, the voice assistant sweetly tells you, “Master, I didn’t understand what you said.”
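A toy version of intent classification with rejection: hand-set keyword weights (hypothetical stand-ins for parameters a real model would learn from data) give each intent a score, a softmax turns the scores into probabilities, and an assumed threshold of 0.6 decides when to give up. With three intents and no matching keywords, every intent gets probability 1/3, which falls below the threshold.

```python
import math
import re

# Hypothetical keyword weights standing in for parameters a real model would learn.
INTENT_WEIGHTS = {
    "weather_inquiry": {"weather": 2.0, "rain": 1.5, "temperature": 1.5},
    "make_call": {"call": 2.0, "phone": 1.5, "dial": 1.5},
    "set_alarm": {"alarm": 2.0, "wake": 1.5},
}
THRESHOLD = 0.6  # assumed minimum probability for accepting the top intent

def intent_probabilities(sentence):
    """Score each intent by summing keyword weights, then normalize with a softmax."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = {intent: sum(weights.get(word, 0.0) for word in words)
              for intent, weights in INTENT_WEIGHTS.items()}
    top = max(scores.values())  # subtract the max before exp() for numerical stability
    exps = {intent: math.exp(s - top) for intent, s in scores.items()}
    total = sum(exps.values())
    return {intent: e / total for intent, e in exps.items()}

def classify(sentence):
    """Return the most probable intent, or None if its probability is below THRESHOLD."""
    probs = intent_probabilities(sentence)
    best = max(probs, key=probs.get)
    return best if probs[best] >= THRESHOLD else None

for text in ["Please tell me tomorrow's weather", "Call my mom", "Sing me a song"]:
    print(text, "->", classify(text) or "Master, I didn't understand what you said.")
```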

Once the intent is identified, the system looks for the key information that the intent requires. A simple method is to search the sentence for words that can fill each required role. For example, the “weather inquiry” intent needs a “time,” and “tomorrow” happens to be a word that denotes time, so the system understands that the user wants the weather for “tomorrow.”
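The keyword search described above can be sketched as a lookup in per-slot word lists, assuming the intent has already been classified. The lexicons, slot names, and `fill_slots` helper are hypothetical. This simple method ignores context; many practical systems instead treat slot filling as sequence labeling, tagging each word with a slot label.

```python
import re

# Hypothetical lexicons: words that can fill each kind of slot.
SLOT_LEXICON = {
    "time": {"today", "tonight", "tomorrow", "weekend"},
    "contact": {"mom", "dad", "office"},
}
# Hypothetical schema: the slots each intent needs.
INTENT_SLOTS = {"weather_inquiry": ["time"], "make_call": ["contact"]}

def fill_slots(sentence, intent):
    """For each slot the intent needs, take the first word in the sentence found in that slot's lexicon."""
    words = re.findall(r"[a-z]+", sentence.lower())
    slots = {}
    for slot in INTENT_SLOTS.get(intent, []):
        match = next((w for w in words if w in SLOT_LEXICON[slot]), None)
        if match is not None:
            slots[slot] = match
    return slots

print(fill_slots("Please tell me tomorrow's weather", "weather_inquiry"))  # -> {'time': 'tomorrow'}
```

Words such as “please” and “tell” appear in no lexicon, so they are skipped, matching the article’s point that they are not key information.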

This article only scratches the surface of speech recognition and natural language understanding; the technologies used in practice are far more complex. In fact, enabling machines to understand human language has been a goal of scientists since the birth of artificial intelligence. More than half a century later, this goal is gradually becoming reality. Even today, however, machines can understand only a very limited portion of human language. To realize the grand vision of chatting pleasantly with machines, countless scientists continue to work day and night.

By Wang Dong, Tsinghua University
