In-Car Intelligent Voice Systems

“Hi, XXX, help me turn on the music,” “Hi, XXX, help me turn on cruise control”… Intelligent voice interaction systems are now widely used in work and daily life, and cars, as an important means of transportation, are no exception. Through voice assistants or integrated voice recognition systems, drivers can perform tasks such as navigation, adjusting audio, and making phone calls by voice command alone. This is especially valuable while driving, as the driver’s eyes and hands can stay focused on the road and on safe driving operations, enhancing both the safety and the convenience of the vehicle.

1. How Is Voice Interaction Achieved?

In-car voice interaction is achieved through an in-car voice recognition system and voice processing technology. The following steps outline the basic implementation logic of in-car voice interaction and the key applications of voice technology in vehicles.

Voice Capture: Vehicles are equipped with microphones to capture voice input from passengers inside. The microphones are typically placed in suitable positions within the car to effectively capture voice signals.

Voice Recognition (ASR): The in-car voice recognition system converts the captured voice signals into text. This involves voice recognition technology that uses deep learning and other methods to analyze and decode the voice, transforming it into understandable text.

Command Interpretation (NLP): The text generated from voice recognition is interpreted as specific commands or requests. This step usually requires Natural Language Processing (NLP) technology to understand the user’s intent and map it to the corresponding actions or functions.

Executing Actions: The interpreted commands are sent to the in-car system to execute the corresponding actions. This can include navigation instructions, adjusting audio, making calls, sending messages, etc.

Feedback and Confirmation: The system typically provides feedback to the user via voice or screen display, ensuring that the user’s commands are correctly understood and executed. This helps improve user experience and the reliability of interactions.

Voice Synthesis (TTS): For situations requiring voice feedback to the user, the in-car system can also use voice synthesis technology to convert text into natural, fluent voice output.

To implement these steps, in-car voice interaction systems typically rely on both hardware and software components, including high-quality audio capture hardware, audio processors, voice recognition engines, and natural language processing models. In addition, given the variability, uniqueness, and complexity of the vehicle environment, the system needs a degree of noise robustness and environmental adaptability so that it can accurately recognize user voice commands while driving and capture sound from different positions (such as the driver’s seat and the passenger seats).
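The capture → recognition → interpretation → execution → feedback loop described above can be sketched in code. The following is a minimal, hypothetical Python illustration: the intent table, command names, and the `interpret`/`handle_utterance` helpers are invented for this example, and a real in-car system would use an ASR engine and an NLP model rather than regular expressions.

```python
import re

# Hypothetical rule-based intent table; a production system would use an
# NLP model for intent classification instead of regular expressions.
INTENTS = {
    r"\b(play|turn on).*(music|radio)\b": "media.play",
    r"\bnavigate to (?P<dest>.+)": "nav.start",
    r"\bcall (?P<contact>.+)": "phone.dial",
    r"\b(air conditioning|ac)\b": "climate.toggle",
}

def interpret(text):
    """The NLP step: map recognized text to a command and its slots."""
    for pattern, intent in INTENTS.items():
        match = re.search(pattern, text.lower())
        if match:
            return intent, match.groupdict()
    return "unknown", {}

def handle_utterance(text):
    """ASR output -> intent -> action -> text for the TTS engine to speak."""
    intent, slots = interpret(text)
    if intent == "unknown":
        return "Sorry, I didn't catch that."
    # Executing the action would dispatch to the vehicle systems here;
    # we only return the confirmation that TTS would read back.
    return f"OK, executing {intent}" + (f" ({slots})" if slots else "")
```

For example, `interpret("Help me turn on the music")` resolves to the `media.play` intent, and the returned confirmation string stands in for the feedback-and-confirmation step.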

Implementing voice interaction may involve multiple suppliers and roles, including providers of large models, voice recognition vendors, cloud service providers, and the automakers who face consumers directly. Our practical experience in helping automakers bring voice functionality into compliance shows that the products and services of these different parties have begun to intertwine. From the perspective of the vehicle consumer, it is sometimes hard to tell these functionalities apart, and the relationship among the parties has become more cooperative and co-constructive.

2. Executors? Or Decision Makers?

(Image Source: Baidu Images)

Looking back at the long history of automobile development, the world’s first automobile, a three-wheeled vehicle, was built by Karl Benz in 1885. Cars equipped with voice functions, by contrast, are a recent development occupying only a small slice of automotive history, yet they have progressed rapidly. The 2000 Jaguar S-Type can be considered the first car to allow voice control of the radio, CD player, mobile phone, and air conditioning system. Although not every driver will converse with their car, the basic components of this technology enable safer communication between drivers and the new generation of in-car electronic devices. Initially, voice functionality required users to input commands in a detailed, fixed form, and the voice assistant was essentially a modular playback of pre-programmed output, able to serve only specific purposes (such as spoken navigation prompts) with very limited functionality; this cannot truly be called voice interaction.

In 2011, the first intelligent automotive voice assistant concept emerged in China, performing voice interaction by matching against cloud databases and significantly improving the accuracy and reliability of voice systems. Although big-data algorithms were not yet widely deployed and the system still had clear limitations in language understanding, this marked the beginning of intelligent voice interaction in the automotive industry.

If earlier “voice assistants” were merely executors of voice commands from drivers and passengers, then as in-car systems adopt cloud databases, and as algorithms and computing power continue to improve, could intelligent voice interaction systems develop self-learning capabilities, and even something like self-awareness, becoming decision-makers for the vehicle? For instance, when the voice assistant detects an unavoidable accident, can it make judgments and decisions on its own? Or, after long-term learning, can it infer your preferences and automatically play suitable music when you enter the car?

In August 2021, the Ministry of Industry and Information Technology of China issued the “Opinions on Strengthening the Access Management of Intelligent Connected Vehicle Production Enterprises and Products.” This document proposed a series of safety requirements for autonomous driving systems, including but not limited to design operating conditions, backup measures to minimize risk states, human-computer interaction functions, data recording, functional safety, and network security.

When formulating and implementing compliance for autonomous driving algorithms, beyond the compliance requirements that apply to artificial intelligence algorithms in general, careful thought must be given to the safety thresholds set for those algorithms. From the perspective of protecting users and the public, determining the level of safety an autonomous driving system must reach is crucial, and doing so involves testing, verification, and validation. Because autonomous driving algorithms suffer from limited explainability, uncertainty, and variable technical quality, their outputs are probabilistic and not exactly repeatable, making it difficult to assess the accuracy and reliability of test results. Such assessment may require simulation testing, closed-scenario testing, real-road driving tests, metrics derived from on-road driving behavior, and post-accident analysis based on accident outcomes.

On the other hand, human-computer interaction carries technical safety risks. For Level 3 autonomous vehicles, and for those transitioning from Level 3 to Level 4, the question of vehicle takeover when the autonomous driving function and the voice interaction system act as decision-makers is crucial to driving safety. When the voice interaction system, the autonomous driving system, and the human driver share driving responsibilities, timely control of the vehicle remains essential: the mode of voice communication and the driver’s reaction time can affect the safety of handing control back from the autonomous driving system to the human driver, as well as other in-car activities. Furthermore, autonomous driving algorithms face cybersecurity challenges, such as network attacks, and ethical safety challenges, such as how an algorithm should decide and act when an accident is unavoidable, leading to trolley-problem-style dilemmas like choosing between harming one person or several.

Therefore, under the existing regulatory framework, in-car intelligent voice interaction systems are unlikely to become the true decision-makers of vehicles in the field of autonomous driving, but the future remains open.

3. Voice Interaction Systems of Leading Automotive Players

NIO – NOMI

(Image Source: Baidu Images)

NOMI is billed as smarter and more fun: built on powerful in-car computing and a cloud computing platform, it integrates a voice interaction system with an intelligent emotion engine, allowing it to keep learning and growing. According to the ES8 user manual, the “NOMI In-Car Intelligent Partner” covers basic functions, media, phone, entertainment, navigation, air conditioning, windows, seats, steering wheel heating, lights, and central control screen control. Little detail is given about the basic functions, however, beyond the note that there are “functionalities to be realized (Easter eggs to be discovered…)”. In addition, when children are present in the car, a child-oriented intelligent scenario dialogue feature is available.

Xiaopeng Motors – Xiao P

(Image Source: Baidu Images)

The Xiaopeng G9’s integrated voice interaction solution deploys a relatively complete local voice dialogue system, capable of performing voice interactions even in weak or no network conditions. The local dialogue service covers nearly all functionalities except online resource retrieval commands, aiming to achieve over 600 functional points, with a very high degree of freedom in supported expressions. By deeply exploring the potential of the Qualcomm 8155 chip and pairing it with a fully optimized in-house voice engine, Xiaopeng is able to leverage the chip’s design performance for faster computation speeds in voice recognition and understanding while consuming less computing power and resources.
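The local-first design described above can be illustrated with a short sketch. This is a much-simplified, hypothetical Python example of the idea of resolving commands on-device and falling back to the cloud only when necessary; the command table, command names, and `cloud_lookup`/`dispatch` helpers are all invented and bear no relation to Xiaopeng's actual implementation.

```python
# Hypothetical on-device command table; a real local dialogue system is
# far larger and model-driven, this is only an illustration of the idea.
LOCAL_COMMANDS = {
    "open the sunroof": "body.sunroof.open",
    "turn on seat heating": "seat.heat.on",
    "navigate home": "nav.home",
}

def cloud_lookup(text):
    """Placeholder for an online resource query (music search, POI lookup).
    Returns None here to simulate a weak- or no-network condition."""
    return None

def dispatch(text):
    """Local-first dispatch: resolve on-device when possible, and fall back
    to the cloud only for commands the local system cannot handle."""
    command = LOCAL_COMMANDS.get(text.lower().strip())
    if command is not None:
        return command  # resolved offline; works with no network at all
    result = cloud_lookup(text)
    return result if result is not None else "error.offline"
```

The design point is that commands in the local table succeed even when `cloud_lookup` cannot reach the network, which mirrors the claim that local dialogue covers nearly everything except online resource retrieval.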

Li Auto – Li Xiang

(Image Source: Baidu Images)

The Li Auto L9’s intelligent cockpit is based on five-screen linkage and full-car voice interaction. The screen design considers every passenger’s position, while the car is equipped with six digital silicon microphones, employing a fully self-developed spatial positioning algorithm to achieve independent recognition across six sound zones, providing the best voice interaction experience.
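Li Auto's spatial positioning algorithm is proprietary, but the core idea of per-zone recognition can be illustrated with a deliberately crude stand-in: compare the signal energy arriving at each microphone and attribute the utterance to the loudest zone. The zone names, the `rms` helper, and `active_zone` below are all assumptions made for this sketch; a real system would use beamforming and cross-microphone correlation rather than raw energy comparison.

```python
import math

# Hypothetical seating zones, one per microphone.
ZONES = ["driver", "front-passenger", "rear-left",
         "rear-middle", "rear-right", "third-row"]

def rms(samples):
    """Root-mean-square energy of one microphone's audio frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def active_zone(mic_frames):
    """mic_frames: one buffer of audio samples per microphone/zone.
    Attributes the utterance to the zone whose microphone carries the
    most energy -- a crude stand-in for real spatial positioning."""
    energies = [rms(frame) for frame in mic_frames]
    return ZONES[energies.index(max(energies))]
```

With per-zone attribution in place, the dialogue system can route a command like “open my window” to the correct seat rather than treating the cabin as a single sound source.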

4. Regulations on Generative Artificial Intelligence Services Domestically and Internationally

From the introduction of voice interaction systems by leading automotive companies, it is evident that in-car voice interaction systems are evolving towards more “intelligent,” “algorithmic,” “human-like,” and “self-learning” directions. These services, represented by large language models like ChatGPT, exhibit astonishing capabilities in understanding human language, human-computer interaction, text writing, code generation, and logical reasoning, with their generative results often rivaling or even surpassing human levels.

However, the application of generative artificial intelligence has also raised a series of potential risks, including privacy infringement, leakage of trade secrets, dissemination of false information, the formation of information cocoons, and potential misuse in cybercrime. These issues have drawn widespread attention from regulators worldwide. For instance, the Italian Data Protection Authority (Garante) temporarily banned ChatGPT in Italy over privacy violations. Meanwhile, in 2021 the EU first proposed the Artificial Intelligence Act, which applies to any product or service using an artificial intelligence system and categorizes applications by risk level, from minimal to unacceptable; applications in fields such as aviation and automotive are classified as high-risk under the act.

In this context, in July 2023 the Cyberspace Administration of China released the “Interim Measures for the Administration of Generative Artificial Intelligence Services.” This regulation will, to some extent, standardize the development of intelligent in-car systems in China.

Conclusion

Ultimately, however advanced in-car intelligent voice systems become, whether they remain executors of human commands or evolve to possess independent learning and decision-making capabilities is still uncertain. Already, the efficiency, accuracy, and logical rigor of some generative artificial intelligence services rival, and in some respects surpass, human abilities. In the future, such artificial intelligence may break through existing regulatory constraints, driving a new generation of algorithmic revolutions and further propelling the development of in-car intelligent voice systems.

Author: Chen Meiyu, Full-time Lawyer at Zhejiang Kenting Law Firm
