Hong Kong Tests Medical Multimodal Large Model

Better integration and exploration of industry data is expected to provide new possibilities for the development of multimodal large models in vertical fields. How can the traditional research advantages of Hong Kong connect with industrial opportunities, and how can international channels collaborate with local resources?

By | Jiao Jian, 《财经》 special correspondent in Hong Kong
Editor | Su Qi
Industry-specific multimodal large models are gaining momentum, and in the Hong Kong Special Administrative Region, which is actively seeking new economic drivers, the related industrial ecosystem is also trying to seize the opportunity.
A notable recent example comes from the biomedicine industry, which the SAR government has identified as a key breakthrough direction. On March 11, the Artificial Intelligence and Robotics Innovation Center of the Chinese Academy of Sciences Hong Kong Innovation Institute (hereinafter referred to as the “AI Center”) released CARES Copilot 1.0, its AI multimodal large model for the medical vertical field.
According to the international medical large model evaluation rankings, it currently ranks first in several metrics, indicating the potential for Hong Kong to seize new opportunities in vertical fields.
Compared with unimodal models, multimodal large models can process several types of information at once, including text, images, audio, and video. Because they integrate more closely with the real world and align better with the way humans receive, process, and express information, they are considered capable of interacting more flexibly with humans and executing a broader range of tasks. For this reason, multimodal large models are believed to help advance technology towards the goal of artificial general intelligence (AGI).
From a commercial perspective, the emphasis on integrating technology with business to drive digital transformation and intelligent upgrading raises the business value of multimodal large models, and the industry’s expectations and room for imagination keep expanding.
Like their overseas counterparts, domestic large model developers are striving to strengthen their multimodal models to achieve “multiple specialties and multiple abilities” in diverse information environments, while seeking application scenarios and market value in vertical fields. In the medical field, for example, what industrial foundation and specific advantages does Hong Kong have?
On a macro level, cultivating emerging and future industries such as life sciences, healthcare, and new materials is an important part of the Hong Kong SAR government’s recent development of “new quality productivity.” However, there are also multiple bottlenecks in planning and micro-level implementation, making breakthroughs challenging. As the first national-level new R&D institution established by the Chinese Academy of Sciences in the Hong Kong SAR in 2019, the aforementioned AI Center is looking for new paths rooted in Hong Kong’s innovation and industrial soil.
For example, its structure aims to strengthen research cooperation between Hong Kong and mainland China, attract global talent, and cultivate young researchers and students in Hong Kong. Specifically, the development of the CARES Copilot 1.0 large model involves several medical institutions on both sides of the border, including Peking Union Medical College Hospital, the First Affiliated Hospital of Sun Yat-sen University, and Prince of Wales Hospital in Hong Kong.
“Although there are various voices, such as R&D not aligning with industry and being difficult to translate, Hong Kong’s traditional research advantages still exist. What can be felt is the strong originality of its research system and its continued international status,” said Liu Hongbin, Executive Director of the AI Center, in an exclusive interview with 《财经》 in Hong Kong. The following are his core views on the technology and challenges of vertical industry large models and on how Hong Kong can seize the opportunity:
《财经》: Can you briefly introduce CARES Copilot and the fields it covers?
Liu Hongbin: CARES Copilot is a large model system designed specifically for the medical field that integrates closely with smart medical devices. It supports functions such as real-time intelligent image recognition during surgery, multimodal registration of MRI/CT/ultrasound, scene understanding under endoscopy, instrument and anatomical structure segmentation, instrument detection and counting, and monitoring of doctor behavior through operating room cameras.
In short, it goes beyond general teaching needs and can be applied in clinical, surgical, and research settings, directly assisting frontline medical staff in handling emergencies and in monitoring, warning of, and preventing hazards at each surgical step. In addition, the surgical large model combined with surgical navigation can provide real-time anatomical location information to improve surgical safety.
《财经》: The medical field involves multiple links and diverse information forms. Does this necessitate the use of a multimodal approach to develop large models?
Liu Hongbin: That’s right. The data in the medical field exists in multimodal forms. In addition to text-based medical records, there are images, such as EEG and ECG. The data forms are diverse, and the sources are vast. Previous technical explorations have made some progress in certain areas. However, to meet the demand for extensive auxiliary diagnosis and even interventional treatment, all information must be integrated and processed to draw conclusions. Even doctors do not rely on a single source of information to make surgical decisions; they will certainly consider all information comprehensively.
《财经》: What do you mean by previous technical explorations? What attempts or technical reserves have been made in the industry before the concept of large models became popular?
Liu Hongbin: Medical technology has always been in a state of continuous experimentation and evolution. For instance, digitalization and intelligence in the medical field have been ongoing. A well-known “failure” case is IBM’s Watson, which was launched over a decade ago. Although it did not bring a so-called “revolution” to the medical community, it was still a beneficial attempt.
To improve efficiency in the healthcare industry, digital innovation is essential. For example, in surgery, to enhance a doctor’s abilities, the intervention of technology requires standardization. Compared to a production line, surgery is undoubtedly more complex, with more uncontrollable factors, requiring the wisdom and problem-solving abilities of doctors. However, with the continuous iteration of information and technology, the involvement of artificial intelligence will increase.
《财经》: What role will large models play in the digital enhancement and transformation of this field?
Liu Hongbin: Large models provide breakthroughs for problems that previously seemed unsolvable. For instance, they can combine multimodal data and generate conclusions. At least in terms of data connectivity and output, there is hope for digitization.
We are not starting from scratch. Many team members have years of in-depth experience in areas such as proprietary image algorithms and human recognition technologies. Because of this vertical domain expertise, we can better integrate algorithms and large models.
Taking CARES Copilot as an example, the current version 1.0 primarily focuses on recognition and assistance for doctors. By ensuring accurate recognition, it becomes possible to determine whether the judgments made during surgery are correct. Based on this, we are also leveraging the abstract understanding capabilities of large models to assess the surgical stage. Additionally, we can identify key anatomical structures, improving the recognition rate of certain critical anatomical structures to over 80%.
《财经》: A recognized issue with large models is that they sometimes exhibit what is called “hallucination.” Given the high accuracy requirements in the medical field, how do you ensure accuracy?
Liu Hongbin: What is referred to as large model “hallucination” is essentially a matter of probability. Fundamentally, the current framework of large models operates on statistical probabilities, and once probability is involved, there is always a possibility of error.
At the application level, accuracy must be measured along different dimensions. For example, regarding the responses of large models, we have achieved an accuracy rate of over 95% for related questions by incorporating enhanced knowledge retrieval techniques and integrating them with expert knowledge bases. This is already a basically usable and acceptable level. After all, clinical doctors are also human and cannot achieve 100% accuracy.
《财经》: How should we understand this added technique and its combination with expert knowledge bases?
Liu Hongbin: It can be likened to creating an artificial “tightening spell.” For instance, one of the foundations of CARES Copilot is Llama 2, the large language model developed by Meta, combined with technologies from various large model products in mainland China. These foundational layers build a “novice” that can “understand language,” and to train it into a vertical field expert, relevant knowledge and techniques must be infused into it.
In this sense, large models can be understood as a tool for compressing vast amounts of information. Relevant knowledge and different types of information are compressed, categorized, and organized before being stored in a relatively controllable model. When needed, the large model acts as a bridge covering different data types, outputting the required information to doctors.
On this basis, to avoid the “black box” of the large model generation process and the state of not knowing the reasons behind it, we aim to incorporate a new logical framework. Its processing of information is from simple to complex and is traceable, so this framework itself is explainable. Based on this, we have also established an expert knowledge database, combined with an enhanced retrieval technique, to ensure accuracy through a multi-faceted approach.
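The combination of an expert knowledge database with retrieval that Liu describes can be illustrated with a toy example. The sketch below is not the AI Center's implementation: the knowledge base entries, the function names `retrieve` and `build_prompt`, and the bag-of-words similarity score are all assumptions chosen for illustration. It only shows the general idea that an answer is grounded in retrieved expert passages, each passage carries an identifier so the answer stays traceable, and the system declines when nothing relevant is found.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical entries standing in for an expert knowledge base.
KNOWLEDGE_BASE = [
    {"id": "kb-001", "text": "Count all surgical instruments before closing the incision."},
    {"id": "kb-002", "text": "Confirm patient identity and surgical site before anesthesia."},
    {"id": "kb-003", "text": "Register preoperative MRI to intraoperative ultrasound for navigation."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, kb=KNOWLEDGE_BASE, top_k=2, min_score=0.2):
    """Return up to top_k (score, entry) pairs scoring at least min_score."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(e["text"]))), e) for e in kb]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, hits):
    """Build a grounded prompt that cites passage ids, or None if nothing matched."""
    if not hits:
        return None  # decline instead of letting the model guess
    sources = "\n".join(f"[{e['id']}] {e['text']}" for _, e in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{sources}\nQuestion: {question}"
    )


if __name__ == "__main__":
    question = "When should surgical instruments be counted?"
    print(build_prompt(question, retrieve(question)))
```

In a real system the similarity score would come from a learned embedding model and the prompt would be passed to the language model; the cited identifiers are what make the final answer checkable against the knowledge base.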
《财经》: What difficulties exist in this process of inputting, compressing, organizing, and accurately outputting data? For instance, is computing power a constraint?
Liu Hongbin: Computing power is indeed a common issue faced by all companies developing large models, especially as competition for computing power intensifies. Even in the Hong Kong SAR, we cannot collaborate with companies like NVIDIA; we are currently using Huawei’s computing infrastructure. It is important to note that the domestic industry is rapidly iterating and upgrading.
Moreover, a core issue in developing vertical field large models is how to effectively integrate multimodal data and make judgments through multimodal fusion. There are many research attempts now, but none is yet mature.
Another issue is data sourcing; vertical fields cannot provide as much data as the internet does. How to train a large model with a relatively small amount of data requires combining specific domain knowledge with general large model training methods, in which the role of experts is crucial. However, compared to ChatGPT and Sora, the performance may not be as impressive due to the smaller data volume.
Whether to intervene structurally (for example, consciously injecting standardized knowledge and formulas from the medical field) rather than relying on vast amounts of data for self-evolution is a current divergence in technical evolution routes. It can be purely data-driven or combined with human knowledge to make the model interpretable. Our team’s experience is that effective injection can significantly enhance model performance.
《财经》: The development and testing of CARES Copilot involves multiple hospitals in both mainland China and Hong Kong. Does this present certain obstacles to data acquisition?
Liu Hongbin: Actually, it’s manageable. Our approach is to finish training the model at one hospital without taking the data out, then move the model to another hospital for further training. This way, we can use data from multiple hospitals while preventing data leakage. It increases data volume while protecting privacy.
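The hospital-by-hospital approach described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the team's actual pipeline: the one-variable linear model, the synthetic data points, and the names `Model`, `train_on_site`, and `sequential_training` are all hypothetical. The point it shows is that only the model parameters travel between sites, while each site's records are read only inside that site's training step.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Toy linear model y = w * x + b; only these two numbers travel."""
    w: float = 0.0
    b: float = 0.0


def train_on_site(model, local_data, lr=0.05, epochs=200):
    """Run plain SGD on one site's data and return the updated parameters."""
    w, b = model.w, model.b
    for _ in range(epochs):
        for x, y in local_data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return Model(w, b)


# Synthetic stand-ins for three hospitals; every point lies on y = 2x + 1.
SITES = {
    "hospital_a": [(0.0, 1.0), (1.0, 3.0)],
    "hospital_b": [(2.0, 5.0), (3.0, 7.0)],
    "hospital_c": [(-1.0, -1.0), (0.5, 2.0)],
}


def sequential_training(sites, rounds=3):
    """Carry the model from site to site; the data never leaves its site."""
    model = Model()
    for _ in range(rounds):
        for _name, local_data in sites.items():
            model = train_on_site(model, local_data)
    return model


if __name__ == "__main__":
    final = sequential_training(SITES)
    print(f"w={final.w:.3f}, b={final.b:.3f}")
```

Visiting the sites over several rounds is a design choice in this sketch: a single pass tends to favor whichever site was trained on last, while repeated passes let the shared parameters settle on values that fit every site's data.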
《财经》: Returning to the development of CARES Copilot, why choose to establish a center in Hong Kong? What support does Hong Kong’s industrial environment provide for the development of large models?
Liu Hongbin: The Hong Kong SAR is an international city. The Chinese Academy of Sciences established the AI Center here with the goal of internationalizing scientific research and development. Prior to this, the external perception of Hong Kong’s research was that professors had advantages in original innovation and the research system was relatively internationalized, but a bridge for converting researchers’ achievements into industry was often lacking. Having truly established ourselves here, we hope to serve as that bridge, so we began building connections with professors and resources in the Greater Bay Area. The ideas of professors can be described as “wildly imaginative,” and indeed there are many differences from the mainland, some of which can be considered quite “crazy,” but they are also very inspiring.

Editor | Tian Jie

Cover Image | 《财经》 reporter Jiao Jian
