BMW has always adhered to a principle of “no distraction” in the design of its iDrive system. In 2011, Bernhard Niedermaier, then head of BMW’s human-machine interface, mentioned in a discussion that the designers kept a horizontal reference line in mind: display-related controls sit above the line, control-related ones below it. The two are deliberately separated to minimize the time spent looking down at the screen and to ensure that drivers can operate buttons without leaning forward or raising their arms.
▲This design, which separates display from control, has been consistently used.
Another way to help drivers avoid distraction is to introduce voice interaction into the car. After all, speech is the closest thing to natural human communication; if the system you are talking to is smart enough, a single command can point clearly to a function, saving the time otherwise spent repeatedly checking the screen and operating controls by hand.
At the 2016 CES (Consumer Electronics Show), BMW introduced a natural language understanding (NLU) system developed for Chinese users. This technology was first applied in the 3 Series and 7 Series and later extended to more models under the brand. Over the past few days, Cheyun has experienced the natural voice recognition of a BMW 320i, gaining a more intuitive understanding of the performance of this feature. In addition to the regular performance of this voice system, we also set a few slightly challenging “additional questions” for it, so everyone can see how the system performs.

First, let’s get to know BMW’s natural voice system
Natural voice technology was not achieved overnight; BMW has offered in-car voice features for many years. Since voice recognition is part of the iDrive system, the voice control experience is closely tied to the entire in-car infotainment system and to some vehicle functions (such as air conditioning).
According to available data, the first generation of iDrive could control navigation and music via voice. Later, BMW added voice dialing in 2007 and text-to-speech email and SMS reading functions in 2011.
At this stage, BMW was still using “item-based voice”, where the control mode was quite similar to clicking through system pages with a mouse, and every command had to follow the prescribed phrases exactly. If you wanted to listen to Coldplay’s Viva la Vida on your phone, you would have to say, in order: “USB – by artist – Coldplay – by song name – Viva la Vida”.
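The rigidity of item-based voice can be illustrated with a minimal sketch: each utterance must match exactly one node at the current menu level, like walking a tree. The menu names and structure below are hypothetical, for illustration only, not BMW’s actual grammar.

```python
# Illustrative sketch of "item-based" voice control: each spoken phrase
# must match one node at the current menu level, so a song lookup walks
# the menu tree step by step. Menu names here are hypothetical.

MENU = {
    "usb": {
        "by artist": {
            "coldplay": {
                "by song name": {
                    "viva la vida": "PLAY:Viva la Vida",
                },
            },
        },
    },
}

def item_based_dialog(commands):
    """Consume one fixed phrase per step; fail if a phrase is off-script."""
    node = MENU
    for phrase in commands:
        key = phrase.lower()
        if not isinstance(node, dict) or key not in node:
            return f"ERROR: '{phrase}' not recognized at this level"
        node = node[key]
    return node

print(item_based_dialog(
    ["USB", "by artist", "Coldplay", "by song name", "Viva la Vida"]))
# -> PLAY:Viva la Vida
print(item_based_dialog(["play Viva la Vida"]))
# -> ERROR: 'play Viva la Vida' not recognized at this level
```

A single natural-language request fails outright here, which is exactly why the strict menu grammar felt mechanical to users.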
In 2012, the voice function received a significant update: BMW announced it would adopt Nuance’s Dragon Drive, a “local + cloud” hybrid system. With the computational power of the cloud, voice interaction was no longer limited by local storage and processing, bringing significant improvements in recognition rates and speeds.
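A “local + cloud” hybrid typically means the system prefers the cloud recognizer for its larger vocabulary and falls back to an on-device model when the network is unavailable. The sketch below shows that fallback pattern under our own assumptions; all function names and the timeout value are hypothetical, not Dragon Drive’s real API.

```python
# Illustrative sketch of a "local + cloud" hybrid recognizer: try the
# cloud service first for better accuracy, then degrade gracefully to a
# small on-device model when the network is slow or unavailable.
# All names and values here are hypothetical.

def cloud_recognize(audio, timeout_s):
    # Stand-in for a large-vocabulary cloud recognizer with NLU.
    return {"text": audio, "source": "cloud"}

def local_recognize(audio):
    # Stand-in for a limited-grammar on-device recognizer.
    return {"text": audio, "source": "local"}

def recognize(audio, cloud_available):
    if cloud_available:
        try:
            return cloud_recognize(audio, timeout_s=2.0)
        except TimeoutError:
            pass  # fall through to the local engine
    return local_recognize(audio)

print(recognize("navigate to Tiananmen", cloud_available=True)["source"])
# -> cloud
print(recognize("navigate to Tiananmen", cloud_available=False)["source"])
# -> local
```

The same pattern explains the offline behavior observed later in this article: the local path still works, just with a smaller vocabulary and longer processing time.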
The voice interaction demonstrated in the video at that time had already begun to simplify; although it still required saying the command category “navigation” first, users could now input the full destination address in one go on the address page.
▲The iDrive system of the 2018 BMW 320i allows users to choose whether to enhance voice interaction experience through server-based voice recognition.
More importantly, cloud services made semantic understanding much easier. When BMW unveiled the latest generation of natural voice systems in 2016, the interaction already strove to approach conversation between people. During navigation, the system could provide a closely matched list of addresses from a vague command such as “nearby gas stations”.
Moreover, in the interaction logic, completing a task across multiple rounds of voice communication no longer requires repeated wake-ups. As long as the voice indicator in the upper left corner of the interface stays lit, you can keep following the prompts and hand the whole task over to the system. In terms of user experience, the mechanical feel of BMW’s early voice interaction has therefore been greatly diluted.
The natural voice recognition released in 2016 can be awakened using the voice button on the steering wheel and can be used for navigation, searching for POIs, opening music broadcasts, making phone calls, sending messages, querying vehicle and life information, etc. Within each category, voice can also control more detailed sub-functions.
The video above allows you to experience the entire voice interaction process intuitively. The system’s responses and reminders are read by a pleasant-sounding female voice, and the success rate of understanding tasks is relatively high. When connected to the server, the delay in voice recognition and semantic understanding is acceptable, and the system’s thinking time is almost imperceptible.

Four Voice Test “Additional Questions”
In addition to the regular experience, we prepared a few additional questions for this voice system, which are more like interesting extreme challenges, allowing everyone to discover more about the system.
1. Strong Noise
For in-car environments, noise reduction is a very important topic. The unique sounds of the engine while driving, wind noise when the windows are open, and the conversations of other passengers can all interfere with the final voice interaction effect.
The conventional test for the noise reduction capability of in-car voice systems is to test with the windows open while driving at high speeds. We chose a more stringent condition—using the in-car voice function while parked with the air conditioning at full blast. The noise produced by the air conditioning at maximum airflow is very loud, and the car’s system and air conditioning vents are in close proximity, which can cause significant interference.
The second half of the full-function video above (starting at 05:20) shows the experience with the air conditioning at full blast, filmed in one take with cloud recognition turned on. Note that turning the air conditioning on and off and filtering POI results by distance are functions the in-car voice system never supported, so those failed results were not caused by noise. Overall, BMW’s in-car voice performs well even in very adverse environments.
2. Interrupting and Modifying
After getting used to talking to the machine, the system’s voice replies start to feel overly long, and you may instinctively interrupt before it finishes speaking. BMW’s natural voice recognition supports interruption in some scenarios, balancing the need for explanation against concise communication.
At the same time, errors in voice recognition and semantic understanding are inevitable, and correcting them by hand is frustrating: deleting and re-entering input is cumbersome, whereas correcting the input by voice, as BMW’s natural voice recognition allows, improves communication efficiency.
In the task of making phone calls, we randomly tested the performance of continuous interruptions and modifications, with the test conducted while parked with the windows closed and cloud recognition enabled.
3. No Internet
This in-car voice product uses a hybrid architecture. Currently, its navigation POI (point of interest) data comes primarily from the cloud. To avoid a poor experience in weak-signal areas such as tunnels or parking lots, the vehicle also stores some navigation data locally, although results generally take longer to return.
▲The performance of POI searches in a non-networked state is shown in the image above, which only displays the search steps and does not represent system response time (the entire process takes about 20s).
We turned off the network and tested POI search in an offline state. The results showed that local processing time increased significantly, with each step of the system’s thinking displayed on the screen. On one hand, the offline delay is unlikely to cause much trouble when searching while parked, but during driving it could mean missing an intersection and having to re-plan the route. On the other, displaying every step to the user, in our view, helps users understand the system: in local mode, a drawn-out recognition process could otherwise easily be mistaken for “function unavailable”.
4. Multi-intent Commands
For single-intent voice commands, BMW’s system already shows good recognition performance. However, that did not satisfy our curiosity. Out of habit, we often use multi-intent voice commands, such as “remind me to call Cheyun tomorrow afternoon”; the challenge for the system is to decide whether the intent is to “call Cheyun” now or to “add a reminder to the memo”.
In the video below, we tried a set of commands using different combinations of place names, prompting the system to discern our true intent. For this question, the system failed to provide logically correct results. Semantic understanding is a key breakthrough area for current voice technology; to enable the voice system to truly recognize the driver’s intent, more data and more powerful neural network architectures are needed.
– Navigate to Tiananmen
– Navigate to Xidan
– I want to go to Tiananmen, not Xidan
– I want to go to Xidan, not Tiananmen
– I don’t want to go to Tiananmen, I want to go to Xidan
– I don’t want to go to Xidan, I want to go to Tiananmen
– I don’t want to go to Tiananmen or Xidan
– I want to go to both Tiananmen and Xidan
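The commands above show why negation is hard: a naive keyword spotter latches onto whichever place name it knows, regardless of which clause it sits in. The toy sketch below contrasts that with a crude negation rule; it is entirely hypothetical, and production systems use trained NLU models rather than hand-written rules like these.

```python
# Toy sketch of the disambiguation problem: a naive keyword matcher
# returns the first known place it sees, while even a simple negation
# rule must track which clause each place name falls in.
# Entirely hypothetical; real systems use trained NLU models.
import re

PLACES = ["Tiananmen", "Xidan"]

def naive_intent(utterance):
    """Keyword spotting: return the first known place mentioned anywhere."""
    for place in PLACES:
        if place in utterance:
            return place
    return None

def negation_aware_intent(utterance):
    """Split into clauses and keep place names outside negated clauses."""
    wanted = []
    for clause in re.split(r"[,;]", utterance):
        negated = "not" in clause or "don't" in clause
        for place in PLACES:
            if place in clause and not negated:
                wanted.append(place)
    return wanted

print(naive_intent("I want to go to Xidan, not Tiananmen"))
# -> Tiananmen  (wrong: ignores the negation entirely)
print(negation_aware_intent("I want to go to Xidan, not Tiananmen"))
# -> ['Xidan']
```

Even this crude rule breaks on phrasings like “I don’t want to go to Tiananmen or Xidan”, where the negation spans two names in one clause, which hints at why the production system failed our test and why robust intent recognition needs learned models rather than rules.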

Cheyun Summary
BMW is one of the first car manufacturers to use voice interaction in mass-produced vehicles, and this feature has been extended to many models under the brand. The continuous development of voice technology creates a more intelligent, situational, and proactive experience for drivers. As the role of smart cars evolves, the role of voice will undoubtedly become increasingly significant.