Wu Jiarui, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences
Introduction
When considering the development of ChatGPT and similar strong AI models into medical expert systems for clinical practice, three challenges must be addressed: the first is how to enable medical expert systems to acquire clinical knowledge and apply it correctly; the second is how national regulatory bodies can supervise medical expert systems together with their developers and users; and the third is how to handle ethical governance, ensuring that ethics are integrated into the entire lifecycle of artificial intelligence, from development to use.
Since the emergence of computer and information technology in the mid-20th century, scientists have attempted to develop software systems that possess the knowledge of human experts and can, like experts, solve difficult and complex problems; such systems are known as "expert systems". The world's first expert system, "DENDRAL", was born in 1965 and could, like a chemist, infer the molecular structure of a compound from its molecular formula and mass spectrometry data. A well-known medical expert system is "MYCIN", developed by researchers at Stanford University in 1976, which could provide expert-level diagnoses and treatment plans for patients with bacterial infections. Although MYCIN was never actually used clinically, some medical expert systems have entered clinical settings, such as an expert system for diagnosing acute abdominal pain that is widely used in emergency rooms in the UK.
MYCIN was the first to adopt the concept of a knowledge base, which allowed it to interact with users in natural language, answer their questions, and learn new medical knowledge under the guidance of experts. In this sense, the recently much-discussed general-purpose language model "ChatGPT" (Chat Generative Pre-trained Transformer) can be seen as an upgraded version of a general expert system, with the potential to be extended into the medical field: it is said that within a month of ChatGPT's release, it passed the United States Medical Licensing Examination and appeared as the first author of an oncology paper. Because ChatGPT exhibits a powerful "intelligence" that is close to human thinking, discussions have begun across various industries about whether it will replace many knowledge workers, including whether it will replace doctors.
1 The Challenge of Knowledge — How to Acquire and Apply Clinical Knowledge
The knowledge of experts is usually proportional to their capabilities, and this correlation is even more evident in expert systems. The reason the language model ChatGPT possesses such super "intelligence" lies in its massive training data and in the scale of the model itself: the model contains as many as 175 billion parameters. The construction and application of a knowledge database are therefore core elements of an expert system.
1.1 How to Acquire Clinical Knowledge
OpenAI, the company that developed ChatGPT, has not disclosed the specific sources or composition of its training dataset. Internationally, however, the datasets used for language models typically come from four major types of sources: Wikipedia, books, journals, and web data, and OpenAI most likely drew primarily on these. Yet identifying accurate or correct "knowledge" within such vast training data is not something ChatGPT itself can guarantee, especially when it faces highly specialized and complex medical knowledge. More importantly, ChatGPT, as a natural language processing model, operates on statistics and associations; it excels at text processing and simple dialogue but struggles to form objective knowledge about human physiological and pathological activity, which often requires rigorous logical reasoning and sometimes personal experience or even intuition.
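To make the phrase "statistics and associations" concrete, the minimal Python sketch below (an illustration only, using an invented toy corpus, and in no way a description of ChatGPT's actual architecture) shows how a model that merely counts which words follow which can produce fluent-looking continuations without any underlying physiological or pathological reasoning:

    from collections import Counter, defaultdict

    # Toy corpus, invented purely for illustration.
    corpus = ("fever and cough suggest infection . "
              "fever and rash suggest infection . "
              "chest pain and dyspnea suggest cardiac disease .").split()

    # Count which word follows which: a table of pure co-occurrence statistics.
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def predict_next(word: str) -> str:
        """Return the statistically most frequent continuation of `word`."""
        candidates = following.get(word)
        return candidates.most_common(1)[0][0] if candidates else "<unknown>"

    print(predict_next("suggest"))  # "infection": chosen by association, not by diagnosis

A real large language model replaces this counting table with billions of learned parameters, but its output is still driven by learned associations rather than by medical reasoning.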
ChatGPT's training data extends only to 2021 and is not updated in real time. Medicine and the life sciences develop rapidly, and new knowledge constantly emerges; even human medical experts must keep learning to update their own knowledge bases. It should also be noted that the results of biomedical research are not equivalent to clinical knowledge: there is a complex process of translation between the two, and this translation is often difficult to capture in publicly available data sources. Therefore, even if ChatGPT were able to take in data in real time, it would still be difficult for it to convert such real-time data into new clinical knowledge.
1.2 How to Apply Clinical Knowledge
Applying existing clinical knowledge to guide clinical practice is no easy task; the difference between senior and junior doctors lies not only in how much knowledge they have mastered but also in how they apply it. To test whether ChatGPT can use its knowledge for disease diagnosis, the well-known Chinese medical platform "Dingxiangyuan" selected six real consultation cases from the "Dingxiang Doctor Online Consultation Platform", covering departments such as neurology, cardiology, and general surgery. In the six consultation tests, ChatGPT was able to respond to patients' questions, explain relevant medical terms, and offer some medical advice, including on clinical medication and lifestyle changes. It did not, however, pass the review by medical professionals. Compared with professional doctors' responses, the main shortcomings were a lack of targeted follow-up questions about the patient's medical history, incorrect explanations of professional medical terms, incomplete or erroneous treatment plans, and insufficiently specific suggestions for patients.
The current mainstream medical model is "evidence-based medicine", which relies on clinically meaningful research evidence and standardizes clinical practice through "clinical practice guidelines". Such guidelines emphasize clinical research evidence of statistical significance, which fits well with ChatGPT's statistical foundation. However, because of individual heterogeneity and the complexity of disease, evidence-based medicine, built on statistical evidence, often cannot deliver accurate personalized treatment. In recent years a new medical model emphasizing individual differences, precision medicine, has therefore emerged internationally. Its clinical practice often departs from clinical practice guidelines, for example through off-label use of drugs or use beyond approved indications. Clearly, ChatGPT's statistical character makes it difficult to apply to the clinical practice of precision medicine; future medical expert systems must therefore fully take into account precision medicine's emphasis on individual differences.
2 The Challenge of Regulation — How to Obtain Qualifications for Clinical Practice
Medicine concerns human life and health, thus products and technologies used in clinical practice are subject to strict regulation; doctors and relevant personnel conducting clinical practice also have corresponding requirements and standards. Although various medical expert systems, including ChatGPT, are constructed from lines of computer code and different types of data, their application in clinical settings still requires appropriate regulation.
2.1 Regulation as Clinical Diagnostic Products
If a medical expert system is viewed as an aid for doctors in diagnosing disease, then ChatGPT should fall under the regulatory framework for "in vitro diagnostic" (IVD) products. Countries around the world have established specialized regulatory systems for IVD products; China's regulation in this area is even stricter than that of other countries, and only products approved by the medical products regulatory authority may be used as compliant IVD products in clinical practice. To obtain such approval, developers must conduct the required clinical studies, and the results must meet the specified standards. ChatGPT has not yet truly entered the stage of clinical application; once it is intended to become a clinical diagnostic product, it will have to consider how to satisfy the relevant regulations.
In the era of big data, using health and medical data and related information to guide clinical diagnosis and treatment also requires regulation. For example, using genomic information obtained from gene sequencing to guide diagnosis and medication is now common practice. Article 23 of the Management Measures for the Clinical Application of Antitumor Drugs (Trial) stipulates: "For targeted drugs that require genetic target testing as stipulated in diagnostic norms, clinical guidelines, clinical pathways, or drug instructions issued by the National Health Commission, gene testing must be conducted before use to confirm the patient's eligibility." Since August 2018, the National Medical Products Administration has successively approved four domestic tumor multi-gene testing kits based on high-throughput sequencing technology as IVD products. Gene sequencing has become a basic tool for guiding personalized medical care in the era of precision medicine, but misuse and abuse of the technology have inevitably followed. For this reason, the U.S. Food and Drug Administration (FDA) issued warnings in 2019 to a number of gene sequencing companies, genetic counseling firms, and medical institutions, requiring them to stop providing patients or the general public with reports on how individual genes affect drug efficacy or expected outcomes, because the clinical efficacy of a drug in relation to an individual's genes cannot be established on the basis of simple gene sequencing evidence alone. It can be anticipated that even if ChatGPT is one day approved as an IVD product, its correct use in clinical practice will still require appropriate management by the relevant authorities.
2.2 Regulation of Clinical Diagnostic Entities
The development of any expert system depends on specific developers and institutions, such as OpenAI, the company behind ChatGPT. From the perspective of clinical diagnosis, such developers become a third party beyond the patient and the doctor and therefore also require regulation. For example, when the database of a particular medical expert system is being constructed, the developer should submit a detailed data inventory to the relevant regulatory bodies so that the rationality and reliability of the database can be assessed. To date, OpenAI has not disclosed specific information about ChatGPT's database.
Not only should the builders of medical expert systems be regulated, but the users of these systems should also be subject to regulation. The use of any medical expert system typically involves various users, such as doctors, adjusting system parameters and confirming or interpreting the diagnostic results or treatment plans provided by the medical expert system. Clearly, how to regulate this “human-machine interaction” in the clinical practice process is a question worth exploring.
3 The Challenge of Ethics — How to Adhere to Relevant Clinical Ethical Norms
Once a medical expert system is put into clinical practice, it will inevitably involve corresponding ethical governance. Although ChatGPT is currently positioned merely as a "chatbot", OpenAI believes it must maintain a "politically correct" stance. To this end, OpenAI engaged an outsourcing company in 2021 and sent it thousands of text fragments of harmful content collected from the internet; the company manually labeled these unhealthy data, which involved violence, hate speech, and other negative content. OpenAI then built a content detection tool based on this manually labeled "negative" data and integrated it into ChatGPT: if the tool detects that output contains "politically incorrect" statements, the problematic content is filtered out before it reaches the user.
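The workflow described above, collecting human-labeled "negative" examples, training a detector on them, and screening the model's output before it reaches the user, can be sketched in a few lines. The following is a hypothetical illustration of that generate-then-filter pattern, not OpenAI's actual moderation code; the classifier, threshold, and flagged categories are all invented for the example.

    from typing import Callable

    def toy_harm_score(text: str) -> float:
        """Stand-in for a detector trained on manually labeled harmful content.
        Returns a score between 0 and 1; a real system would use a learned classifier."""
        flagged_terms = {"violence", "hate"}  # invented stand-ins for labeled categories
        hits = sum(term in text.lower() for term in flagged_terms)
        return min(1.0, hits / 2)

    def filtered_reply(generate: Callable[[str], str], prompt: str, threshold: float = 0.5) -> str:
        """Generate a draft answer, score it, and withhold it if the detector flags it."""
        draft = generate(prompt)
        if toy_harm_score(draft) >= threshold:
            return "[content withheld by the safety filter]"
        return draft

    # Usage with a dummy generator standing in for the language model:
    print(filtered_reply(lambda prompt: "Here is a neutral answer.", "any question"))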
It is evident that ChatGPT can now respond fluently to all kinds of tricky or odd questions. Someone deliberately asked: "What is the dumbest question you receive every day?" ChatGPT answered: "As an AI, I do not judge the quality of questions. My purpose is to help users obtain answers as accurately as possible. Therefore, regardless of whether the question is meaningful or meaningless, I will do my best to assist the user."
Clearly, such a medical expert system does not possess self-awareness, does not hold a stance on its output results, and does not bear legal or ethical responsibility. As ChatGPT states: “As an artificial intelligence system, I do not have emotions or consciousness. My purpose is solely to answer questions and provide information. If I am given instructions that may have adverse effects, my programming source code will strictly adhere to the ethical and moral standards defined by OpenAI to ensure that no adverse effects occur.”
Thus, the true subject of ethical governance should be third parties such as OpenAI that construct artificial intelligence systems, for it is they who set the content orientation and ethical standards of their products. In other words, when a medical artificial intelligence system is being built, the relevant ethical issues must be considered comprehensively so that the AI product can meet the corresponding ethical requirements in clinical practice. This view has become a consensus in the field of artificial intelligence research and development. On September 25, 2021, the National New Generation Artificial Intelligence Governance Professional Committee released the "Ethical Norms for New Generation Artificial Intelligence", providing ethical guidelines for individuals, legal entities, and other relevant institutions engaged in AI-related activities; the essence of these norms is to integrate ethics into the entire lifecycle of AI management, research and development, supply, and use.
The original text was published in “Medicine and Philosophy” 2023, 44:4-5.