
Source: Zhiyuan Community
This article is approximately 3558 words long and is recommended to be read in 7 minutes.
This article presents a talk by Yuan Yang, Assistant Professor at Tsinghua University's Institute for Interdisciplinary Information Sciences, on the interpretability of AI in healthcare.
Interpretability is a longstanding challenge for deep learning researchers.
Unexpectedly, traditional Chinese medicine (TCM) practitioners throughout history have faced similar issues. Lacking clinical data grounded in modern statistics, TCM doctors have had to diagnose in a black-box manner: findings from observation, listening, inquiry, and pulse-taking serve as inputs, the final treatment plan is the output, and the mapping between them is fitted from actual clinical experience.
In this sense, its black-box nature seems to resonate with that of neural networks.
Yuan Yang is an Assistant Professor at Tsinghua University's Institute for Interdisciplinary Information Sciences. He graduated from Peking University with a degree in Computer Science in 2012 and obtained his PhD in Computer Science from Cornell University in 2018 under the supervision of Professor Robert Kleinberg, an expert in online learning and reinforcement learning theory. From 2018 to 2019, he was a postdoctoral fellow at MIT's Institute for Data, Systems, and Society (IDSS), working with Professor Piotr Indyk, a co-creator of the locality-sensitive hashing algorithm, and Professor Aleksander Madry, an expert in machine learning robustness. Dr. Yuan Yang's main research areas include smart healthcare, AI interpretability, and large AI systems.
01. Interpretability Issues in Diagnostic Assistance
Currently, artificial intelligence technologies are widely applied in scenarios such as surgical robots, medical image analysis, medical decision-making, and personal consultation assistants. Achieving more intelligent computer-aided diagnosis (CAD) and providing more accurate decision-making support for doctors is the dream of many practitioners in the “AI + healthcare” field. However, since healthcare applications are related to user safety, the requirements for model interpretability are very high. People expect AI applications not only to make decisions but also to provide corresponding evidence and explanations, which poses a significant challenge to current deep neural network models.
Compared with medical image analysis (segmentation, registration, classification, etc.), diagnostic assistance and decision-making place higher demands on interpretability. Intelligent medical image analysis typically aims to boost doctors' efficiency, for example helping them quickly process CT scans with dozens or hundreds of slices; such tasks are mostly about reading the images, where interpretability is comparatively easy to provide.
However, in the context of diagnostic assistance and decision-making, many aspects involve interpretability issues, such as “which treatment plan to adopt,” “which medication to prescribe,” and “what dosage to use.” Unfortunately, providing interpretability for this type of multimodal data remains a significant challenge in both academia and industry.
In recent years, attribution analysis has become a popular direction in deep-learning interpretability research. These techniques explain a prediction in terms of the input dimensions. For example, a model that recognizes a cat in an image can point to the pixels of its "ears" and "tail"; such pixel-level evidence makes the decision relatively easy to explain.
However, for diagnostic decisions about patients, interpretability at the level of individual input dimensions is not enough; understanding the interactions between dimensions is just as important. Input-level explanations are often limited, and we frequently need to consider more abstract correlations. For instance, after medication, blood sugar may rise while another indicator falls, and the clinically meaningful explanation lies in that joint behavior rather than in either input alone.
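To make this concrete, here is a minimal sketch of perturbation-based (occlusion) attribution on a toy model containing an interaction term. The model, its coefficients, and the indicator names are invented for illustration; this is not a clinical model.

```python
# Toy model over two hypothetical indicators; the product term encodes
# an interaction between them.
def model(x):
    blood_sugar, other = x
    return 0.5 * blood_sugar - 0.3 * other + 2.0 * blood_sugar * other

def occlusion_attribution(f, x, baseline=0.0):
    """Score each feature by how much the output changes when that
    feature is replaced with a baseline value (simple perturbation
    attribution)."""
    full = f(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(full - f(occluded))
    return scores

scores = occlusion_attribution(model, [1.0, 2.0])
print(scores)  # [4.5, 3.4]
```

Note that the two scores sum to 7.9, while the model's total effect relative to the all-zero baseline is only 3.9: the interaction term gets counted twice. This is one reason per-feature attribution covers only part of the explanation, and interactions between dimensions need separate treatment.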
Thus, the interpretability of AI systems becomes a limiting factor in diagnostic assistance, involving difficulties in data collection, algorithm design bottlenecks, and more. Researchers are trying various methods to provide interpretability to AI systems, such as explaining the high-level semantics of neural networks and constructing interpretable concepts. Associate Professor Zhang Quanshi from Shanghai Jiao Tong University and others have also attempted to explore the interactions between dimensions. However, there are few technologies that can be practically applied.
Some of our experiments on clinical data suggest that in medical decision-making scenarios, roughly 30% of a decision can be explained by attribution methods, while the remaining 70% depends on higher-dimensional factors, so explanations must account for higher-order correlations.
02. Starting from Traditional Chinese Medicine: The Treatise on Febrile and Miscellaneous Diseases
The diagnostic thinking of TCM and Western medicine differs to some extent. We choose to start from TCM because, compared to Western medicine, TCM lacks tools like microscopes for pharmacological analysis at the cellular and molecular levels, and it does not have clinical data supported by modern statistics, resulting in a lack of interpretability in clinical diagnosis. These problems sound quite severe and have been criticized for a long time, so why do we think the data it produces is a good starting point?
Because most data obtained through modern statistical techniques is modeled with linear functions, which offer very intuitive interpretability and verifiability but are not the complex data that deep learning algorithms excel at processing. If the underlying problem really is just regression on linear data, then no learning algorithm, however powerful, can outperform linear regression itself. Precisely because TCM historically lacked tools like microscopes, doctors had to diagnose in a black-box manner, using findings from observation, listening, inquiry, and pulse-taking as inputs and fitting the final treatment plan to actual clinical outcomes. This thought process closely resembles the framework of machine learning.
Of course, we cannot say that because TCM diagnoses are black boxes and neural networks are also black boxes, only magic can defeat magic, so TCM is very suitable. We find TCM interesting because, due to its black-box nature, historical TCM practitioners have faced many of the same issues that deep learning researchers encounter today, such as interpretability.
An experienced TCM practitioner may be highly skilled, but all of their treatment methods live in their head. How can these methods be passed on to apprentices? Because TCM is highly nonlinear, it cannot produce statistical conclusions the way modern, statistics-based Western medicine does; and because the knowledge must move from one mind to another, it cannot simply copy parameters to another machine the way a neural network can.
So what should be done? TCM does not have a perfect solution, but practitioners have attempted to propose many abstract concepts, such as the often-criticized concepts of Yin-Yang and the Five Elements, as well as kidney deficiency, dampness, phlegm stagnation, etc. These abstract concepts are essential for the nonlinear black-box experience inheritance. If we use the language of neural networks, these abstract concepts are equivalent to the intermediate layer nodes within the network. Thus, TCM practitioners open up their neural networks, passing not only the final answers to their apprentices but also the “problem-solving approach.” The labels of such intermediate layer nodes are very meaningful for the learning of neural networks, as they can reduce data requirements and improve training effectiveness.
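In neural-network terms, supervising such intermediate "concept" nodes means adding an auxiliary loss on concept predictions alongside the final-task loss. A minimal sketch, with the concept label and the weighting invented for illustration:

```python
# Auxiliary concept supervision: the total loss combines the final task
# loss with a loss on an intermediate "concept" prediction (e.g. a
# hypothetical "dampness" score labeled by a practitioner).
def squared_error(pred, target):
    return (pred - target) ** 2

def combined_loss(final_pred, final_label,
                  concept_pred, concept_label, alpha=0.5):
    """Task loss plus an alpha-weighted concept-supervision loss."""
    return (squared_error(final_pred, final_label)
            + alpha * squared_error(concept_pred, concept_label))

# A network that predicts the final answer perfectly but gets the
# concept wrong is still penalized, steering its intermediate layer
# toward the practitioner's labels.
print(combined_loss(1.0, 1.0, 0.2, 0.9))  # 0.5 * 0.49 = 0.245
```

The extra supervision constrains the hidden representation, which is exactly the sense in which labeled intermediate concepts can reduce data requirements and improve training.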
Currently, we are working on constructing a TCM knowledge engine. There is a classic work in the field of TCM called the "Treatise on Febrile and Miscellaneous Diseases," which rigorously defines many TCM concepts, making them well suited to serve directly as inputs for learning models. In fact, Zhang Zhongjing was attempting this kind of formalization roughly 1,800 years ago. The book is concise, and doctors can achieve good results simply by following its prescriptions; approximately 60% of the prescriptions in TCM textbooks come from this work.
“Medical Sage” Zhang Zhongjing
03. Expert Systems: Evidence-Based and Broadly Inclusive
Currently, the “Qianfang Medical” team is attempting to construct an intelligent clinic. Compared to hospitals, clinics are much smaller in scale. TCM is comprehensive and does not require dozens of departments, aligning with the business scenario of our clinic. Our clinic mainly addresses common diseases that are relatively troublesome for Western medicine, such as menstrual disorders, insomnia, and nausea.
In the past year, we have mainly completed the front-end and back-end development and framework construction of the clinic system, launching 40 microservices and building 200 code repositories, and we are continuously refining the system. Based on this, we will ask doctors for experiential feedback to help us collect data. Currently, we have streamlined the basic process from patient registration to doctor communication and prescription issuance.
Now, we have developed a prototype of an expert system with a basic recommendation function built on books like the "Treatise on Febrile and Miscellaneous Diseases." Next, we will focus on improving the expert system and developing a more advanced recommendation system. The difference between the two is that the expert system acquires its knowledge from books, while the recommendation system learns from the clinical diagnosis data of many doctors.
Additionally, we hope to establish a more standardized follow-up process. After treatment, patients can report their condition before and after treatment, letting them participate in improving the system. At present, AI services in both TCM and Western medicine perform poorly at follow-up: in most cases the data records only the prescriptions doctors issued, without the actual outcomes after patients took the medication. Follow-up data is therefore crucial.
Currently, our team consists of 16 members, and we have also recruited some interns with backgrounds in TCM. The TCM students help us formalize the concepts in the "Treatise on Febrile and Miscellaneous Diseases," transforming the original 300 pages of classical text into machine-readable labels, using a domain-specific language to describe the book's content, which the expert system can then consume as data.
The expert system's guiding principle is "evidence-based and broadly inclusive." "Evidence-based" means every suggestion shown to doctors can be traced to its source. "Broadly inclusive" means the final results cover the cases discussed in the book as well as the modifications suggested by experts.
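The "evidence-based" requirement can be sketched as a rule base in which every rule carries its source passage, so each recommendation comes with a citation. The rule contents, formula names, and clause numbers below are placeholders, not actual entries from the book:

```python
# Each rule pairs a symptom pattern with a formula AND the passage it
# came from, so every recommendation is traceable. All contents below
# are illustrative placeholders.
RULES = [
    {"pattern": {"fever", "sweating", "aversion to wind"},
     "formula": "Formula A",
     "source": "Treatise, clause 12 (placeholder)"},
    {"pattern": {"fever", "no sweating", "body aches"},
     "formula": "Formula B",
     "source": "Treatise, clause 35 (placeholder)"},
]

def recommend(symptoms):
    """Return (formula, source) pairs ranked by symptom overlap."""
    scored = []
    for rule in RULES:
        overlap = len(rule["pattern"] & symptoms)
        if overlap:
            scored.append((overlap, rule["formula"], rule["source"]))
    scored.sort(reverse=True)
    return [(formula, source) for _, formula, source in scored]

print(recommend({"fever", "sweating", "aversion to wind"}))
```

A real system would need far richer matching (dosage modifications, contraindications, expert amendments), but the core design choice, carrying the citation through to the output, is what makes the result "evidence-based."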
TCM prescriptions are often quite complex, but they can draw on and reference the foundational work of predecessors. TCM often addresses combinatorial optimization problems, focusing on the relationships between medicines. In TCM, the effects of taking two medicines separately and taking them together often differ.
Combinatorial optimization is, in general, NP-hard, but there is a crucial asymmetry: verifying or adapting a known good solution is far easier than producing one from scratch, which is exactly the intuition behind the P vs. NP question. Copying homework is certainly easier than solving the problems independently; likewise, adapting existing, proven prescriptions is much simpler than generating recommendations from nothing.
04. For AI + X, How to Build Planet-Scale Systems
Previously, Michael Jordan mentioned that AI applications are the trend of the future, but what may be truly important is super-large AI systems (planet-scale systems) capable of processing planet-scale data.
I strongly agree with his viewpoint. I believe to effectively integrate AI with interdisciplinary applications, it is essential to build large-scale, type-safe systems that align with specific business logic. Many people hear about large-scale systems and immediately think of supporting large volumes of data processing. While this is indeed very important, I believe another even more crucial goal is to support the collection of high-quality data that can easily be processed and utilized by AI algorithms. To achieve this goal, type safety becomes very important.
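As a small illustration of the idea in a data-collection setting, a typed record can reject malformed inputs at the system boundary, so downstream AI pipelines only ever see well-formed data. The record and field names here are hypothetical, not the actual schema of any of the systems above:

```python
# A typed follow-up record: invalid data fails fast at construction
# time instead of silently polluting the training set. Field names and
# ranges are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class FollowUpRecord:
    patient_id: str
    days_after_treatment: int
    symptom_score: float  # 0.0 (resolved) to 10.0 (severe)

    def __post_init__(self):
        if self.days_after_treatment < 0:
            raise ValueError("days_after_treatment must be non-negative")
        if not 0.0 <= self.symptom_score <= 10.0:
            raise ValueError("symptom_score out of range")

record = FollowUpRecord("p001", 7, 3.5)   # well-formed: accepted
# FollowUpRecord("p002", -1, 3.5)        # malformed: raises ValueError
```

In a full system the same discipline would be enforced end to end, for example by sharing types between the front end and the back end, which is the theme of type-safe system design discussed here.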
Thus, this summer, I plan to offer a course at the academy tentatively titled “Type-Safe Front-End and Back-End System Practice.” Taking Health Treasure as an example, if we can build a better type-safe large system for epidemic management, it would not only support larger business scenarios but also allow for more comprehensive and in-depth analysis of the epidemic based on type-safe big data.