Revolutionary Applications of Google’s Gemini in Medical AI

Paper Title:
Capabilities of Gemini Models in Medicine
Compiled by: Sienna
Reviewed by: Los


Healthcare is undoubtedly one of the fields that urgently needs disruption and innovation today.
Healthcare spending accounts for nearly 10% of gross domestic product (GDP) on average worldwide, and in some countries, such as the United States, the share rises above 17% (based on 2022 and 2023 data). Yet despite this enormous investment, public satisfaction with healthcare systems has declined year after year.
For a long time, artificial intelligence has been seen as a crucial breakthrough to alleviate this dilemma and optimize the quality of healthcare services. Unfortunately, previous attempts and expectations have rarely produced substantial results to support this view.
Now, Google has launched Med-Gemini, a foundational model meticulously fine-tuned for the healthcare industry. This innovative technology has achieved remarkable results in multiple key tasks, not only breaking industry records but also setting a high benchmark for the future development of healthcare.
However, in the face of technological innovations like Med-Gemini, we can’t help but ask: Does it aim to replace the professional skills of doctors, or will it become an indispensable, powerful tool in the hands of doctors that can save more lives?
Next, let us explore the answer to this question together.


Google's Med-Gemini is a leader in the current convergence of healthcare and technology. It builds on Google's most advanced foundation model, Gemini, carefully fine-tuned to handle a variety of medical tasks. Med-Gemini not only enhances the capabilities of healthcare systems but also delivers significant additional value.
Specifically, Google optimized Med-Gemini from three aspects: prior knowledge, self-training, and uncertainty management during inference.

■2.1 The Importance of Pre-training

In the new era of artificial intelligence, one of the most striking phenomena is the power of pre-training.
Researchers from the University of Cambridge, Flatiron Institute, and Princeton University have confirmed that extensive pre-training is key to achieving outstanding performance for advanced models like ChatGPT or Gemini.
In simple terms, rather than training a randomly initialized model directly on data for a specific task, it is wiser to first pre-train on broad, general-purpose data, even if that data's relevance to the target task is limited.
Currently, the best practice for model training is to first provide the model with massive amounts of data, regardless of its background, and then fine-tune it for specific tasks. This approach allows the model to build a broad understanding of the world before facing actual learning tasks, thereby forming a kind of prior knowledge.
For example, in the case of a text model, if you want to build a machine translation model, you should first show the model vast amounts of general text so it masters the language's grammatical rules, and only then train it on the translation task itself.
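The two-stage recipe can be sketched with a deliberately tiny toy model. Everything below is illustrative and not from the paper: a one-parameter regressor is "pre-trained" on plentiful generic data, then fine-tuned on a small task-specific set, and compared against training from scratch on the small set alone.

```python
import random

def sgd_step(w, x, y, lr=0.1):
    # One gradient step for a one-parameter model y ~ w * x (squared loss).
    grad = 2 * (w * x - y) * x
    return w - lr * grad

def train(w, data, steps):
    for _ in range(steps):
        x, y = random.choice(data)
        w = sgd_step(w, x, y)
    return w

random.seed(0)
# Plentiful "broad" pre-training data and a tiny task-specific set, both
# generated from the same underlying rule y = 3x (purely illustrative).
pretrain_data = [(0.1 * i, 0.3 * i) for i in range(1, 20)]
task_data = [(1.0, 3.0), (2.0, 6.0)]

w_scratch = train(0.0, task_data, steps=5)             # trained from scratch
w_pretrained = train(0.0, pretrain_data, steps=200)    # stage 1: pre-train
w_finetuned = train(w_pretrained, task_data, steps=5)  # stage 2: fine-tune

print(abs(w_scratch - 3.0), abs(w_finetuned - 3.0))
```

With the same small fine-tuning budget, the model that starts from a pre-trained parameter lands far closer to the true rule than the one trained from scratch; this toy, of course, does not capture the surprising "unrelated data still helps" effect discussed next.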
To validate the importance of pre-training, the team trained a group of models on scientific tasks and compared those trained from scratch with pre-trained ones. Surprisingly, even models whose pre-training data seemed unrelated to the scientific task, such as models pre-trained on cat videos (shown in orange in the figure), outperformed models trained from scratch for the task (shown in green). All pre-trained models performed well, with those pre-trained on scientific data unsurprisingly performing best.


Figure 1|The lower the score, the better ©️【Deep Blue AI】 Compilation

In summary, regardless of the specific nature of the pre-training data, this step always significantly enhances model performance. It helps the model learn and understand broad world concepts, allowing it to be more effectively applied to downstream tasks.
Although researchers cannot pinpoint the main reasons why the pre-training effect is so significant, they believe that even seemingly unrelated cat videos contain world information that is extremely valuable for any specific scientific field, such as the physics of motion, object permanence, shapes, and basic concepts of time.
Given these findings, although the Gemini series was not designed specifically for medicine, choosing a pre-trained model like Gemini, with its broad world knowledge, strong reasoning, multimodality, and long-context processing, is undoubtedly a wise decision.
However, the most captivating part of the training process is undoubtedly the self-training and uncertainty management heuristic methods employed. These innovative techniques provide a new perspective for model optimization, allowing the model to continually improve its performance through ongoing iterations.

■2.2 Improved Training Strategies to Enhance the Accuracy of Med-Gemini

For a model to remain state of the art and continue functioning effectively, continuous updates and iteration are crucial. For LLMs (Large Language Models), however, an inherent challenge is the knowledge cutoff: the model cannot learn in real time during inference (at least not by compressing new data into weight updates).
“The lack of lifelong learning capabilities is one of the most significant limitations of LLMs, leading many to question whether they can serve as a solid foundation for AGI.”
In contrast, Bayesian AI, as another branch of deep learning, aims to create models with “active inference” capabilities. These models, based on a Bayesian inference framework, can continuously learn and are supported by outstanding researchers like Karl Friston. They largely simulate biological neural networks by continuously updating their beliefs about the world to adapt to new observations (sensory inputs), achieving “lifelong learning”.
Fortunately, for LLMs, we can effectively update their information by training the model to use search APIs. During the fine-tuning phase, utilizing search for self-training has become a primary heuristic method. By integrating tools like the Google Search API, the model can generate responses to questions as needed, regardless of whether these questions appeared in the previous training data.
This strategy endows the model with two core capabilities:
1. Efficiently using search APIs to obtain up-to-date information;
2. Deciding when a call is actually needed, for example by learning to recognize whether a question involves knowledge beyond its training cutoff.
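As a rough sketch of how these two capabilities fit together, consider the snippet below. All names here (`web_search`, `model_answer`, `needs_search`) are hypothetical stand-ins for illustration; they are not the actual Med-Gemini or Google Search APIs.

```python
# Hypothetical stubs, not real APIs.

def web_search(query):
    # Stub for a search API call; a real tool would return web snippets.
    return f"[search results for: {query}]"

def model_answer(question, context=""):
    # Stub for the base model; a real model would condition on the retrieved context.
    return f"answer({question!r}, used_context={bool(context)})"

def needs_search(question):
    # The learned heuristic, caricatured here as keyword matching: does the
    # question likely involve knowledge beyond the training cutoff?
    recent_markers = ("latest", "current", "today", "2024")
    return any(marker in question.lower() for marker in recent_markers)

def answer(question):
    # Capability 2 decides whether to invoke capability 1.
    context = web_search(question) if needs_search(question) else ""
    return model_answer(question, context)

print(answer("What are the latest dosing guidelines?"))
print(answer("What is the mechanism of aspirin?"))
```

In the real system the "should I search?" decision is learned during fine-tuning rather than hard-coded, but the control flow is the same: retrieve only when the model judges its own knowledge insufficient.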


Figure 2|The self-training and tool usage process ©️【Deep Blue AI】 Compilation

When the model realizes it lacks critical information, it will browse the internet via the API, update its context, and provide more relevant and less ambiguous responses. This is particularly important in the medical field, as inaccurate answers can lead to serious consequences at critical moments.
Therefore, during the inference process (i.e., runtime), the model enters an uncertainty-guided search loop. In this loop, the model continuously evaluates the credibility of its current answer, and if confidence is low, it will add additional search steps to gather more information.
To quantify this uncertainty, the researchers used Shannon's formula to measure the entropy of the answer probability distribution. Intuitively, the more probability the model assigns to a single answer, the lower the entropy (i.e., the lower the uncertainty) and the more confident the model is in that answer.
It is worth noting that a close relative of Shannon's formula, cross-entropy, lies at the core of how LLMs are trained. It measures the divergence between the model's predictions and the true distribution (the empirical distribution of the training data), and thus the model's "bias" when predicting. In simple terms, it reflects how closely the model's statistical distribution over language matches the true distribution, and hence how well the model can reproduce its training data.
“For example, if you ask the model what the capital of France is, the output distribution might look like this: [{Paris: 0.30}, {London: 0.25}, {Washington: 0.25}, {Madrid: 0.20}]. The model would lean toward ‘Paris’, but without high certainty. In other words, the entropy is high.”
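The distribution in this example can be plugged straight into Shannon's formula. The snippet below is a self-contained illustration, not Med-Gemini's code; it also shows the related cross-entropy term.

```python
import math

def entropy_bits(dist):
    # Shannon entropy H(q) = -sum_a q(a) * log2(q(a)), in bits.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# The answer distribution from the example above.
answer_dist = {"Paris": 0.30, "London": 0.25, "Washington": 0.25, "Madrid": 0.20}
h = entropy_bits(answer_dist)      # close to the 2-bit maximum for four options

# A sharply peaked distribution has far lower entropy: near-certainty.
confident = {"Paris": 0.97, "London": 0.01, "Washington": 0.01, "Madrid": 0.01}
h_conf = entropy_bits(confident)

# Cross-entropy against the one-hot true answer ("Paris") reduces to
# -log2(q(Paris)), the per-token quantity that LLM training minimizes.
ce = -math.log2(answer_dist["Paris"])

print(h, h_conf, ce)
```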
In summary, the Med-Gemini model does not respond to questions immediately; instead, it evaluates and iterates its answers multiple times until it reaches a preset “certainty threshold”. By adopting the above two heuristic methods, Med-Gemini has significantly improved its performance in multiple areas.
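The uncertainty-guided loop described above can be sketched as follows. This is a toy version under stated assumptions: the model and search tool are stubs, and the entropy threshold is an arbitrary illustrative value, not the system's actual "certainty threshold".

```python
import math

def entropy(probs):
    # Shannon entropy in bits; below the threshold we treat the answer as certain.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uncertainty_guided_answer(answer_dist_fn, search_fn, threshold=1.0, max_rounds=3):
    # Toy loop: if the answer distribution is too uncertain, perform an extra
    # search step, extend the context, and re-evaluate the answer.
    context = []
    dist = answer_dist_fn(context)
    for _ in range(max_rounds):
        if entropy(dist.values()) < threshold:
            break
        context.append(search_fn())
        dist = answer_dist_fn(context)
    return max(dist, key=dist.get)

# Hypothetical stubs standing in for the model and the search tool:
def fake_dist(context):
    if not context:  # no evidence yet: spread-out, high-entropy distribution
        return {"Paris": 0.30, "London": 0.25, "Washington": 0.25, "Madrid": 0.20}
    return {"Paris": 0.95, "London": 0.02, "Washington": 0.02, "Madrid": 0.01}

result = uncertainty_guided_answer(fake_dist, lambda: "evidence about France")
print(result)
```

The key design choice is that search is triggered by the model's own measured uncertainty rather than on every query, which keeps retrieval focused on the cases where it actually changes the answer.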


Figure 3|The impact of self-training and uncertainty-guided search on the accuracy of Med-Gemini-L 1.0 on MedQA. It can be seen that both self-training and each round of search contribute significantly to performance improvement ©️【Deep Blue AI】 Compilation

However, we still need to explore how Med-Gemini can further improve to better adapt to the ever-changing demands of the healthcare field.


With the continuous advancement of technology, we are witnessing the birth of a brand-new kind of medical assistant: an intelligent assistant that can update itself and provide help at any time. This is precisely what companies like Google are striving to create in order to improve healthcare services, and Med-Gemini is the most advanced realization of that vision to date.

■3.1 Cutting-edge Models in the Medical Field

The Med-Gemini model series, derived from Google DeepMind's multimodal large language model (MLLM) Gemini, has been meticulously fine-tuned for the medical field. It builds on the foundation model's advanced reasoning, multimodal, and long-context capabilities and has been trained in depth on healthcare-related data and use cases.
Med-Gemini's performance is remarkable: it achieves leading results on numerous industry-standard benchmarks, significantly surpassing models such as OpenAI/Microsoft's GPT-4 and GPT-4V.
It is worth mentioning that ChatGPT's context-window limitations make direct comparison impossible on some benchmarks, which in itself highlights an advantage of its competitors.
More strikingly, in certain scenarios Med-Gemini's performance even surpasses that of human experts, for instance on the NEJM CPC dataset of complex diagnostic case challenges.
Additionally, in tasks such as medical summarization, referral letter generation, and simplifying medical text, Med-Gemini's responses consistently outperform those of human experts.
In light of Med-Gemini’s exceptional performance, some technology enthusiasts may question: Will artificial intelligence replace doctors? However, this question is far more complex than it appears at first glance.

■3.2 A Valuable Assistant in the Medical Field

Med-Gemini is not intended to replace doctors or nurses but is a valuable assistant meticulously crafted for healthcare professionals.
Comparing AI models like Med-Gemini directly with healthcare personnel is inaccurate, as these models are not tangible entities and cannot directly provide diagnostic services to patients. However, their practicality for healthcare practitioners far exceeds what simple data charts can measure, especially when dealing with long-context tasks.
The Med-Gemini models have a context window of up to one million tokens, which, combined with their multimodal capabilities, lets them handle hundreds of thousands of words of text or hours of video and thus provide comprehensive support for clinicians. This makes Med-Gemini particularly strong in video analysis, offering critical support to surgeons and helping them make more informed decisions and assess complex surgical situations in real time.
Moreover, Med-Gemini can help doctors review vast amounts of patient electronic health records (EHRs), quickly identifying the potential causes behind symptoms, conditions, or surgeries and greatly improving efficiency.
The above is just the tip of the iceberg of the many transformations Med-Gemini is about to bring to the medical field. As an intelligent assistant, it aims to make the work of doctors and nurses easier and more efficient while alleviating their mental burden.
It is worth emphasizing that Google explicitly positions Med-Gemini as a beneficial complement in the medical field, rather than a replacement. This concept not only reflects respect for healthcare professionals but also showcases the vast potential of artificial intelligence in the medical sector.
However, we should not limit the impact of artificial intelligence to the role of a medical assistant. In fact, AI has shown immense potential in various fields such as drug development and worker training. Although we are still in the early stages of AI development, it has already become a powerful tool for addressing major societal issues.

[References]

https://research.google/blog/advancing-medical-ai-with-med-gemini/

https://arxiv.org/html/2404.18416v2

https://medium.com/@ignacio.de.gregorio.noblejas/med-gemini-googles-new-ai-powerhouse-for-medicine-2e789c2e81cb

https://newatlas.com/technology/google-med-gemini-ai/


✨New Reports in AGI Section🙋

Discussions about AGI have heated up in 2024: industry insiders are pondering how AGI might be realized, and other sectors are also turning their attention to the concept. Since the emergence of Sora, with OpenAI, Google, and Microsoft competing over whose multimodal large model is "smarter" and "more useful," we can see AI gradually moving toward the "general."

Therefore, 【Deep Blue AI】 has launched the AGI section, aiming to chat with readers from all walks of life about AI applications in various fields, drawing inspiration and helping everyone gain a deeper understanding of "AI applications."

Whether it’s “AI + Healthcare”, “Large Model Accelerated Deployment”, “AI for Science”, “AI + Education”, “AI + Finance”, or “AI Ethics”, “Strong vs. Weak AI”, they could all be topics of our discussions.

We look forward to readers telling us in the 【comment section】 or by 【private message】 which topics they would like to discuss and why, and to your insights during each column's Q&A session. (Perhaps there will be occasional surprises)😋

Readers can also scan the code to join the 【Deep Blue AI】 AGI discussion group, where everyone is welcome to chat about AI applications, and perhaps you can find opportunities for cooperation and communication👇 (No advertising allowed ⚠️)


This Issue’s Q&A:

Will surgical robots replace clinical doctors? Feel free to leave your answer in the comments section.


【Deep Blue AI】's original content is created through the dedicated effort of the author team. We hope everyone respects the rules of original work and the authors' hard work. For reprints, please message the backend for authorization, and be sure to indicate that the content comes from the 【Deep Blue AI】 WeChat official account; otherwise, we will pursue copyright infringement claims.
