Voice Recognition Technology: A New Era of Medical Informatization

e-Medical Pang Tao

　　Driven by the continued deepening of informatization and strong policy support, the medical industry is gradually evolving from informatization toward intelligent systems. As a technology that makes producing information more effective and efficient, voice recognition is undoubtedly the foundation of that intelligence.

　　In fact, voice recognition technology has long been applied in the medical industry abroad, playing an important role in scenarios that require a lot of writing—whether in admission and discharge records, nursing reports, surgical records, or in the work settings of medical technology departments such as radiology and pathology. The development of voice recognition technology is driving the medical industry’s informatization towards intelligent development.

　　In China, a number of enterprises holding independent intellectual property rights in voice recognition technology have keenly recognized the vast prospects of the medical industry. They provide customized models and professional corpora built specifically for the medical field, which improves recognition performance in medical environments. Voice recognition technology is becoming deeply integrated into the various nodes of medical informatization, and as these companies delve further into the medical industry, more application scenarios tied to medical workflows will emerge.

Starting Point of Voice Recognition Technology in the Medical Industry

　　Voice recognition technology was first applied in healthcare by Philips Electronics in 1994, mainly for voice-dialed appointment booking, voice command control, and report entry.

　　Philips Electronics’ voice recognition system terminal SpeechMagic, aimed at the medical field, has now been seamlessly integrated by over 250 medical solution providers, achieving voice-to-text conversion in the medical field. Siemens Medical Solutions Group provided a personalized solution for Vienna Hospital, successfully optimizing the hospital’s medical text entry process with the voice recognition system, reducing the time for medical staff to submit reports from 10.5 hours to 6.5 hours, saving 38% of the workload.

　　In 2010, IBM deployed the Watson system in doctors’ offices. Watson can “understand” doctors’ natural language questions while quickly analyzing a mountain of medical research data to provide answers.

　　It is worth mentioning Nuance, the largest company specializing in the research and sales of voice recognition software, image processing software, and input method software, holding nearly 80% of the global voice technology market and over 1000 related patents. Nuance provides medical voice transcription services for the U.S. healthcare industry, utilizing patented internet voice and data distribution technology, customer base technology, and medical language experts to convert doctors’ voice recordings into electronic medical record files. Unfortunately, the company currently does not have Chinese voice recognition products in the medical field.

　　According to statistics, the proportion of voice recognition entry used in American clinics has reached 40%~60%, mainly in departments such as radiology, pathology, and emergency rooms, significantly improving work efficiency. Traditionally, American doctors used recording devices to record verbal medical orders and condition records, and secretaries manually entered the data into computers based on the recordings, which were then modified and signed by the doctors.

Recent Research on Voice Recognition Technology in China

　　With advancements in computer performance and improvements in voice recognition technology, recent years have seen research and applications of voice recognition technology in the medical field in China.

　　Central South University researched the application of isolated word voice recognition algorithms in medical instruments, making the use of medical instruments (such as portable pulmonary function testers and home cardiovascular devices) more convenient and natural for the elderly and disabled. However, this research only recognizes isolated words with independent meanings, reducing the difficulty of voice recognition and making it suitable for simple application scenarios.

　　Huazhong University of Science and Technology conducted research on voice processing in electronic medical records, focusing on voice compression and storage with electronic medical records. The research only required identifying which voices needed compression and which did not, as its main purpose was to retain voice recordings as legal evidence in case of medical disputes, without recognizing the content of the voice.

　　Zhoushan Third People’s Hospital explored the application of voice recognition in medical equipment, running experiments in which high-pressure injection pumps were controlled by voice in order to study voice operation of medical devices. This kind of application demands high recognition accuracy; otherwise, an incorrect device action could endanger a patient’s life. However, the recognizer only needs to handle a fixed set of commands, each corresponding to a device action.

　　Shanghai Medical Instrumentation College researched and designed an imaging department diagnostic report generation system based on voice recognition technology. The design combines a voice recognition engine and diagnostic report system, utilizing Microsoft Speech SDK for development. However, this is only a design, and we have not yet seen the final implementation and application of the design.

　　It was not until 2015 that Peking Union Medical College Hospital collaborated with Beijing Yunzhisheng Information Technology Co., Ltd. (hereinafter “Yunzhisheng”) to sort out the business of the hospital’s various departments. The project organized over 40GB of medical text data and processed it for classification and retrieval, so that the customized voice model covered key information used across departments, such as common diseases, drug names, and operational steps. The resulting voice recognition accuracy exceeded 95%, which opens the way for voice recognition technology to be widely implemented in the hospital’s departments.

Expert Opinions

Director of the Computer Technology and Information Management Department at Wenzhou Medical University First Affiliated Hospital, Vice Dean and Chief Technology Officer of Wenzhou Medical University Lenovo Smart Medical Research Institute

Pang Chuan Di

Voice recognition technology was proposed and applied in industry long ago, but because the technology was still immature at the time, it was not adopted in the medical industry.

Recently, with the development of voice recognition technology and related technologies, as well as the needs of hospital informatization construction, this technology is showing new application trends in the medical industry. However, facing the highly specialized medical industry, the widespread application of voice recognition technology faces three major challenges: recognition speed, recognition accuracy, and intelligent recognition of special symbols brought by medical professionalism. Since the informatization construction of hospitals, significant progress has been made in terms of humanization and convenience, allowing medical staff to input and combine text through selection. If the voice recognition speed is not high, its actual effect may be worse than manual operation.

Director of the Information Center at Yunnan Cancer Hospital

Lu Jian

Currently, voice recognition technology serves hospitals as a new input method. When Yunnan Cancer Hospital trialed the voice recognition system, there were communication and cooperation issues with the electronic medical record vendors.

To avoid homogeneous medical records, electronic medical record systems usually prohibit copy-pasting. During its trial at our hospital, the voice recognition system from Yunzhisheng used a copy-paste mechanism in the background: the recognized content was copied from the corpus and then pasted into the medical record. As a result, the recognized text could not be entered into the electronic medical record system. After much effort, it was finally ensured that voice recognition input works normally even though the electronic medical record system blocks copy-paste. We therefore need to integrate the various systems with the voice recognition system, freeing doctors’ hands as much as possible to achieve efficient and accurate input.

Director of the Information Center at Jiangsu Cancer Hospital

Gu Hui

Jiangsu Cancer Hospital’s collaboration with Yunzhisheng originated from the dean’s hope that the voice input function could be applied to some scenarios in the hospital to improve efficiency. After understanding and trialing, we ultimately chose to cooperate with Yunzhisheng.

From the results, there are indeed advantages to using voice input for medical records. First, consider surgical records. These are special records written by the surgeon that reflect the course of the surgery, intraoperative findings, and how situations were handled, and they are an important part of medical record data. According to regulations, surgical records should be written by the surgeon, but sometimes the first assistant writes them and the surgeon only signs without carefully reviewing the content. This can leave the record of the surgery’s course, findings, and handling insufficiently detailed. Voice input allows the surgeon to dictate the entire surgical process of the previous patient during breaks in surgery and store it temporarily. Later, the assistant can edit this draft into a document that meets writing norms and standards, effectively avoiding the phenomenon of “the surgeon not recording, and the recorder not performing surgery,” and greatly improving the quality of surgical records.

Second, in medical technology departments, such as endoscopy, ultrasound, and CT rooms, the voice recognition system can be used to complete the writing of report-related content while reading images. Third, during the clinical writing process, the voice recognition system effectively improves efficiency. Fourth, it is used for voice transcription records of speakers at ethics committees in drug trial institutions, etc.

In the next development steps, we need to strengthen the automatic recognition of special units and symbols in medical measurements to further improve efficiency and accuracy.

Customized Support for the Application of Voice Recognition Technology

Dr. Liu Shengping

Senior AI Technology Expert at Beijing Yunzhisheng Information Technology Co., Ltd. Dr. Liu Shengping: The special nature of the medical environment must be given great attention; general voice recognition systems cannot meet the application requirements in medical institutions.

　　In the face of voice recognition technology, the key emphasis for hospitals is: accuracy, speed, solving accent issues, and effective noise reduction. Whether voice recognition technology can have deeper applications and promotion in the medical industry depends on its satisfactory performance in these areas.

　　Regarding how voice recognition technology can better serve the medical industry, Dr. Liu Shengping, a senior AI technology expert at Yunzhisheng, stated that due to the special nature of the medical industry, a lot of customization and optimization work needs to be done.

　　“Yunzhisheng has made customized optimizations in four aspects for the medical industry: language model, acoustic model, hardware noise reduction, and post-processing of recognition results.

　　First, in the language model, we have captured about 40GB of medical record-related corpus and over 300,000 medical professional terms from various data sources, specifically training a language model for the medical field;

　　Second, in the acoustic model, we specifically collected various noises in the hospital’s specific usage environment and added these noises to the voice training data. Additionally, we used a dataset of over 5,000 hours of Mandarin data with heavy accents from various regions to train our acoustic model, allowing it to recognize Mandarin with various accents.

　　Third, we adopted professional microphones with noise reduction features, which are widely used abroad. They are made of medical-grade antibacterial materials, effectively eliminate environmental noise, suppress background noise, and support multiple people speaking simultaneously without interference.

　　Fourth, to ensure high accuracy, Yunzhisheng has begun to implement backward error correction. Currently, Yunzhisheng’s voice recognition system has achieved an accuracy rate of over 95%. When doctors write medical records, they may use colloquial expressions like “uh,” “ah,” “that,” and “then” during voice input; Yunzhisheng uses some natural language processing techniques to achieve post-processing capabilities, converting these colloquial expressions into written language through backward error correction and combining medical professional knowledge to reach an accuracy rate of 99%.”
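
The filler-word part of the post-processing step described above can be illustrated with a minimal Python sketch. This is not Yunzhisheng’s implementation, which the article does not describe. The filler list, the function name `strip_fillers`, and the regex approach are all assumptions made for illustration. Only unambiguous English fillers are listed, because words such as “that” and “then” are fillers only in some contexts and would need language-aware handling.

```python
import re

# Hypothetical filler list for illustration only. A real system would use a
# language-specific list plus context, since words like "that" or "then"
# are fillers only in some positions.
FILLERS = ("uh", "um", "ah")

# Match a whole filler word, an optional trailing comma or period,
# and any whitespace that follows it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(transcript: str) -> str:
    """Remove standalone filler words and tidy the leftover whitespace."""
    cleaned = _FILLER_RE.sub("", transcript)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()


print(strip_fillers("uh the patient reports um chest pain"))
# prints: the patient reports chest pain
```

Because the pattern uses word boundaries, a word that merely contains a filler (for example “Utah”) is left untouched. Converting colloquial phrasing into written language, as the quote describes, would need considerably more than this kind of deletion.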

　　Dr. Liu Shengping emphasized: “The special nature of the medical environment must be given great attention; general voice recognition systems cannot meet the application requirements in medical institutions.” At the same time, Dr. Liu Shengping also acknowledged that the continuous promotion of applications and the development of technology are mutually reinforcing. Taking Yunzhisheng as an example, the accuracy of the voice recognition engine continues to improve through the accumulation of user data on the cloud platform.

　　Looking at the development of voice recognition technology, earlier technologies were based on PC, while now they are cloud-based. According to Dr. Liu Shengping, response speed is a very important consideration. Through powerful cloud computing and algorithm optimization capabilities, the response time for voice recognition results can be controlled within about 100 milliseconds, effectively solving the speed issue.

　　When training voice models, Yunzhisheng uses the accents from various regions in China as the basis, gathering accent data from thousands of people. “By training a large model with this data, we can basically solve the adaptability to accents.”
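
The acoustic-model step quoted earlier, adding recorded hospital noise to the voice training data, is commonly done by mixing a noise clip into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below shows that general technique in plain Python. Whether Yunzhisheng mixes noise at fixed SNRs is not stated in the article, and the function name `mix_at_snr` and the list-of-samples representation are assumptions.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    `speech` and `noise` are sequences of float samples (an assumed
    representation for this sketch).
    """
    if not speech:
        raise ValueError("speech is empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise clip is silent")

    # Target noise power = speech power / 10^(SNR/10); scale amplitudes
    # by the square root of the required power ratio.
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Training on the same utterances mixed at several SNR values is one way to expose a model to both quiet consulting rooms and noisier wards.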

　　Although the information systems in hospitals are becoming increasingly automated, improving the quality of documents and input efficiency through templates, when facing the data of each patient, the system needs to reflect as much personalized content as possible, and this is where voice recognition technology will play a huge role.

　　Voice recognition is an interdisciplinary field involving signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and more. It is not new, but it has only entered practical application scenarios in the last twenty years. While there have been achievements, it cannot yet be given a “perfect score,” especially in the medical industry.

The medical industry is undergoing a transformation: using informatization to enhance productivity. There is a market and a demand, which makes this a great opportunity for voice recognition technology to enter the medical industry. However, given how specialized this industry is and how high the threshold of voice recognition technology is, companies with a “nail spirit” (the persistence to keep driving into a problem) that are willing to delve into industry needs have a better chance of growing alongside the industry.

e-Medical August 2016 original article, please indicate the source when reprinting.
