Ethics and Governance of AI in Health: A Guide to Multi-Modal Large Models

The following content is reproduced from Chinese Medical Ethics.

Editor’s Note


With the rapid development of artificial intelligence (AI) technology, the health sector is continually seeking to improve medical quality and work efficiency by introducing AI. Because large model technology can process complex tasks and large-scale data, significantly enhancing the generalizability, versatility, and practicality of AI, it has broad application prospects in disease prediction, diagnosis, treatment, and drug development; it also brings numerous ethical challenges and risks that urgently need to be addressed.

On January 18, 2024, the World Health Organization (WHO) released the English version of the Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models, aiming to assist countries in planning for the benefits and challenges associated with multi-modal large models in the health sector and to provide policy and practical guidance for the appropriate development, provision, and use of multi-modal large models.

Given the significant strategic importance of multi-modal large model AI for China in gaining new advantages in future strategic competition and promoting public health, this journal organized experts engaged in related research to translate the guide into Chinese for researchers’ reference, in order to promote research on and guidance for the ethical governance of medical large models in China and to achieve a virtuous cycle between high-quality innovative development and high-level safety.

This article was first published on CNKI; the recommended citation is as follows:

Wang Yue, Song Yaxin, Wang Yifei, et al. Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models [J/OL]. Chinese Medical Ethics: 1-58 [2024-03-14]. http://kns.cnki.net/kcms/detail/61.1203.R.20240304.1833.002.html.


Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models

Ethics and governance of artificial intelligence for health. Guidance on large multi-modal models. Geneva: World Health Organization; 2024. Licence: CC BY-NC-SA 3.0 IGO.

This translation is not derived from the World Health Organization (WHO), and the WHO is not responsible for the content or accuracy of the translation. The English original should be regarded as the authoritative text.

Original version: ISBN 978-92-4-008475-9 (electronic version); ISBN 978-92-4-008476-6 (print version)


Translators


Wang Yue1, Song Yaxin1, Wang Yifei1, translated; Yu Lian2, Wang Jing3, reviewed

(1 School of Law, Xi’an Jiaotong University, Xi’an, Shaanxi 710049; 2 School of Public Health, Xi’an Jiaotong University, Xi’an, Shaanxi 710061; 3 Beijing Traditional Chinese Medicine Hospital, Capital Medical University, Beijing 100010)


Abstract

Artificial intelligence (AI) refers to the ability of algorithms integrated into systems and tools to learn from data, enabling them to perform automated tasks without explicit programming of each step. Generative AI is a type of AI in which algorithms are trained on datasets in order to generate new content (such as text, images, or videos). This guide focuses on one type of generative AI, namely large multi-modal models (LMMs), which can accept one or more types of data input and produce diverse outputs that are not limited to the type of data fed into the algorithm. It is predicted that multi-modal large models will be widely used in healthcare, scientific research, public health, and drug development. Multi-modal large models are also referred to as general-purpose foundation models, although it is not yet confirmed that they can accomplish a wide range of tasks and purposes.

The rapid adoption of multi-modal large models exceeds that of any previous consumer application. They are notable for facilitating human-machine interaction, mimicking human communication, and responding to queries or data inputs in a seemingly human-like and authoritative manner. With rapid consumer adoption and acceptance, and considering their potential to disrupt core social services and economic sectors, many large tech companies, startups, and governments are investing in and competing to guide the development of generative AI.

In 2021, the World Health Organization (WHO) published a comprehensive guide on the ethics and governance of AI in health. The WHO consulted 20 leading experts in the field of AI, who identified the potential benefits and risks of using AI in health and, through a consultative process, arrived at six principles for governments, developers, and providers to consider when formulating policies and practices for AI. These principles should guide a wide range of stakeholders, including governments, public institutions, researchers, businesses, and implementers, in developing and deploying AI in health. The six principles are: (1) protect human autonomy; (2) promote human well-being, safety, and the public interest; (3) ensure transparency, explainability, and intelligibility; (4) foster responsibility and accountability; (5) ensure inclusiveness and equity; (6) promote AI that is responsive and sustainable (Figure 1).


Figure 1: WHO Consensus on Ethical Principles for AI in Health

The purpose of the WHO in releasing this guide is to assist member states in planning for the benefits and challenges associated with multi-modal large models in health and to provide policy and practical guidance for their appropriate development, provision, and use. The guide offers governance recommendations for corporate, governmental, and international cooperation that align with the guiding principles, and it is grounded in the distinct ways in which humans use generative AI in health.

Applications, Challenges, and Risks of Multi-Modal Large Models

The potential applications of multi-modal large models in health are similar to those of other forms of AI; however, the access and usage of multi-modal large models are novel, presenting both new benefits and new risks that social and health systems and end users are not yet prepared to handle. Table 1 summarizes the main applications of multi-modal large models and their potential benefits and risks.

Table 1: Main Applications of Multi-Modal Large Models and Their Potential Benefits and Risks

The use of multi-modal large models also carries systemic risks that may affect healthcare systems (Table 2).

Table 2: Systemic Risks of Multi-Modal Large Models to Healthcare Systems

The use of multi-modal large models may also introduce broader regulatory and systemic risks. One concern is whether multi-modal large models comply with existing legal and regulatory frameworks, including international human rights obligations and national data protection regulations (some data protection agencies are investigating this). Because of the ways in which training data for multi-modal large models are collected, the way collected data (or data input by end users) are managed and processed, the limited transparency and accountability of multi-modal large model developers, and the possibility that multi-modal large models produce “hallucinations”, these algorithms may not comply with current laws. Multi-modal large models may also violate consumer protection laws.

As the use of multi-modal large models continues to grow, developing these models requires ever greater computational power, data, human resources, and financial resources. A broader social risk of using such algorithms in health is that multi-modal large models are predominantly developed and deployed by large tech companies, which may entrench these tech giants’ dominance over the development and use of AI relative to smaller businesses and governments, including their ability to set the AI research priorities of the public and private sectors. Other concerns about the potential dominance of large tech companies include insufficient corporate commitments to ethics and transparency. New voluntary commitments among companies and between companies and governments may reduce some risks in the short term but cannot replace the government oversight that may ultimately be required.

Another social risk is the carbon and water footprint of multi-modal large models. Like other forms of AI, multi-modal large models require significant energy and consume growing quantities of water. While multi-modal large models and other forms of AI can bring significant social benefits, their growing carbon emissions may become a meaningful contributor to climate change, and their rising water consumption may further harm communities already facing water scarcity. A further social risk accompanying the emergence of multi-modal large models is that, because they provide seemingly plausible responses, they are increasingly treated as sources of knowledge, which may ultimately undermine the authority of human knowledge, including in healthcare, scientific, and medical research fields.

Ethics and Governance of Multi-Modal Large Models in Healthcare and Pharmaceuticals

Multi-modal large models can be seen as the product of a series (or chain) of decisions made by one or more actors regarding programming and product development (Figure 2). Decisions made at each stage of the AI value chain can have direct or indirect impacts on downstream entities involved in the development, deployment, and use of multi-modal large models. Governments can influence and regulate these decisions by enacting and enforcing laws and policies at national, regional, and global levels.


Figure 2: The Value Chain for Developing, Providing, and Deploying Multi-Modal Large Models

The AI value chain typically begins with a large tech company, referred to as the “developer” in this guide. Developers can also be universities, smaller tech firms, national health systems, public-private partnerships, or other entities with the resources and capabilities to utilize several inputs. These inputs constitute the “AI infrastructure” (a term used by governments in legislation and regulation to describe multi-modal large models), such as data, computational power, and AI expertise used to develop general-purpose foundation models. These models can be directly used to perform various tasks, often unforeseen, including those related to healthcare. Several general-purpose foundation models are specifically trained for use in healthcare and pharmaceuticals.

Third parties (“providers”) can use general-purpose foundation models for specific purposes or applications through application programming interfaces (APIs). This includes: (i) fine-tuning new multi-modal large models, which may require additional training of the foundation model; (ii) integrating multi-modal large models into applications or larger software systems to provide services to users; or (iii) integrating components known as “plugins” that guide, filter, and present a multi-modal large model’s output in a formal or standardized format so that the results are “digestible”.

Subsequently, providers can sell products or services based on multi-modal large models to clients (or “deployers”), such as health departments, healthcare systems, hospitals, pharmaceutical companies, or even individuals, such as healthcare providers. Clients purchasing or obtaining licenses to use products or applications can directly use them for patients, other entities in healthcare systems, non-professionals, or their own businesses. The value chain can be “vertically integrated”, meaning that companies (or other entities, such as national health systems) that collect data and train general-purpose foundation models can modify multi-modal large models for specific uses and provide applications directly to users.

Governance is a means of embodying ethical principles and human rights obligations through existing laws and policies, newly drafted or revised laws, guidelines, internal codes of conduct, and developer programs, as well as international agreements and frameworks.

One approach to building a governance framework for multi-modal large models is to integrate it into the three stages of the AI value chain: (i) designing and developing general-purpose foundation models or multi-modal large models; (ii) providing services, applications, or products based on general-purpose foundation models; and (iii) deploying healthcare services or applications. This guide reviews each stage from three aspects:

1. What risks should be addressed at each stage of the value chain (as described above)? Which actors are best positioned to address these risks?

2. What can relevant actors do to address the risks? What ethical principles must be adhered to?

3. What is the role of government, including relevant laws, policies, and regulations?

Some risks can be addressed at several stages of the AI value chain, and certain actors may play a greater role than others in mitigating particular risks and upholding ethical values. Although the allocation of responsibilities among developers, providers, and deployers may be contested, in some areas it is clear which actor is best placed, or is the only one able, to address a potential or actual risk.

Design and Development of General-Purpose Foundation Models (Multi-Modal Large Models)

In the process of designing and developing general-purpose foundation models, the responsibility lies with the developers. Governments are responsible for establishing laws and standards that require certain practices to be adopted or prohibited. Chapter 4 of this guide provides suggestions to help address risks and maximize benefits in the development of multi-modal large models.

Provision of General-Purpose Foundation Models (Multi-Modal Large Models)

In the process of providing services or applications, governments have a responsibility to define the requirements and obligations for developers and providers to address specific risks associated with AI-based systems used in healthcare settings. Chapter 5 of this guide provides suggestions for addressing risks and maximizing benefits when using multi-modal large models to provide healthcare services and applications.

Deployment of General-Purpose Foundation Models (Multi-Modal Large Models)

Even when relevant laws, policies, and ethical practices are applied in the development and provision of multi-modal large models, risks may still arise during use. This is partly because multi-modal large models and their responses are unpredictable: users may apply general-purpose foundation models in ways that developers and providers did not anticipate, and the outputs of multi-modal large models may change over time. Chapter 6 of this guide offers recommendations on the risks and challenges to be addressed when using multi-modal large models and applications.

Accountability for General-Purpose Foundation Models (Multi-Modal Large Models)

With the widespread use of multi-modal large models in healthcare and pharmaceuticals, errors, misuse, and, ultimately, harm to individuals are inevitable. Accountability mechanisms can ensure that users harmed by multi-modal large models receive adequate compensation or other forms of redress, reduce the burden of proof on harmed users, and ensure that they receive full and fair compensation.

Governments can achieve this by introducing a presumption of causation. Governments may also consider introducing strict liability standards to address harm caused by the deployment of multi-modal large models. While strict liability can ensure compensation for those harmed, it may also discourage the use of increasingly complex multi-modal large models. Governments may also consider establishing no-fault compensation funds.

International Governance of General-Purpose Foundation Models (Multi-Modal Large Models)

Governments must work together to establish new institutional structures and rules to ensure that international governance keeps pace with the globalization of technology. Governments should also ensure enhanced cooperation and collaboration within the United Nations system to address the opportunities and challenges of broader deployment of AI applications in health and social and economic fields.

To ensure that governments are accountable for their investments and participation in the development and deployment of AI-based systems, and to ensure that governments enact appropriate regulations that uphold ethical principles, human rights, and international law, international governance is essential. International governance can also ensure that multi-modal large models developed and deployed by enterprises comply with appropriate international safety and efficacy standards and adhere to ethical principles and human rights obligations. Governments should also avoid enacting regulations that confer competitive advantages or disadvantages on enterprises or on governments themselves.

To make international governance meaningful, these rules must be developed collectively by all countries, not just by high-income countries (and the tech companies that collaborate with their governments). As the UN Secretary-General proposed in 2019, international governance of AI may require all stakeholders to collaborate through networked multilateralism, allowing the UN family, international financial institutions, regional organizations, trade groups, and others, including civil society, cities, businesses, local authorities, and young people, to work together more closely, effectively, and inclusively.

1 Introduction

This guide addresses emerging uses of multi-modal large models in health-related applications. It covers the potential benefits and risks of using multi-modal large models in healthcare and pharmaceuticals, as well as the governance approaches best able to ensure compliance with ethical, human rights, and safety standards and obligations. It builds on the WHO’s June 2021 guidance, Ethics and Governance of Artificial Intelligence for Health, which discusses the ethical challenges and risks of AI in health, identifies six principles to ensure that all countries benefit from the use of AI in health, and proposes recommendations to strengthen the governance of AI in health so as to maximize the technology’s promise.

Artificial intelligence refers to the ability of algorithms integrated into systems and tools to learn from data in order to perform automated tasks without explicit programming of each step. Generative AI is a type of AI in which machine learning models are trained on datasets to generate new outputs, such as text, images, videos, and music. Generative AI models learn patterns and structures from their training data, enabling them to predict and generate new data based on the learned patterns. Generative AI models can also be improved through reinforcement learning from human feedback, in which trainers rank the responses a model provides so that the algorithm learns to give the responses humans deem most valuable. Generative AI can be applied across various fields, including design, content generation, simulation, and scientific discovery.
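To make the ranking step concrete, the sketch below shows one common way human preference rankings are turned into a training signal: a small reward model is trained so that responses trainers ranked higher receive higher scores than responses they ranked lower. This is a minimal illustration in PyTorch, not taken from the WHO guidance; the embedding size, model structure, and random data are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response embedding; a higher score means the response
    is judged more valuable by human trainers."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def ranking_loss(model: RewardModel,
                 preferred: torch.Tensor,
                 rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the score of the
    trainer-preferred response above the score of the rejected one."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()

# Toy usage: random vectors stand in for encoded model responses.
model = RewardModel()
preferred = torch.randn(4, 768)  # batch of responses ranked higher
rejected = torch.randn(4, 768)   # the responses ranked lower
loss = ranking_loss(model, preferred, rejected)
loss.backward()  # gradients would then update the reward model
```

The trained reward model is then used to score new outputs, and the generative model is tuned to produce responses that score highly.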

Large language models (LLMs) are a special type of generative AI that receives text input and responds in kind with text, and they have drawn significant attention. LLMs are exemplars of large unimodal models and serve as the operational foundation for the early versions of chatbots built on them. Although LLMs can carry on a conversation, the models themselves do not understand what they are generating: they merely predict the next word from the preceding words and the patterns or combinations of words they have learned.
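The following toy example makes that point concrete under deliberately simple assumptions: a hand-written table of word-to-word probabilities stands in for a trained model, and “generation” is nothing more than repeatedly choosing a likely next word. Real LLMs condition on far longer contexts and billions of parameters, but the mechanism, predicting the next token without understanding it, is the same.

```python
# Invented bigram probabilities; a real model learns these from data.
bigram_probs = {
    "the": {"patient": 0.6, "doctor": 0.4},
    "patient": {"reported": 0.7, "was": 0.3},
    "reported": {"pain": 0.5, "improvement": 0.5},
}

def generate(start: str, max_words: int = 4) -> str:
    words = [start]
    for _ in range(max_words):
        options = bigram_probs.get(words[-1])
        if not options:
            break  # no continuation learned for this word
        # Greedy decoding: always take the most probable next word.
        words.append(max(options, key=options.get))
    return " ".join(words)

print(generate("the"))  # -> "the patient reported pain"
```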

This guide explores the increasingly widespread uses of multi-modal large models (including LLMs), which are trained on highly diverse datasets that include not only text but also biosensor, genomic, epigenomic, proteomic, imaging, clinical, social, and environmental data. As such, multi-modal large models can accept multiple types of input and produce outputs that are not limited to the types of data they were fed. Multi-modal large models can be widely applied in healthcare and drug development.
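As a rough illustration of what “multiple types of inputs” can mean in practice, the sketch below packs free text, an image, and structured clinical values into a single request. The payload shape, field names, and model name are all hypothetical; real multi-modal APIs differ in their details, but the pattern of combining modalities in one input is the point being illustrated.

```python
import base64

def build_request(text=None, image_png=None, labs=None):
    """Packs heterogeneous inputs into one payload for a hypothetical
    multi-modal model endpoint."""
    content = []
    if text is not None:
        content.append({"type": "text", "text": text})
    if image_png is not None:  # e.g., the raw bytes of a chest X-ray
        content.append({"type": "image",
                        "data": base64.b64encode(image_png).decode("ascii")})
    if labs is not None:       # e.g., structured values from a health record
        content.append({"type": "json", "data": labs})
    return {"model": "example-medical-lmm", "input": content}

request = build_request(
    text="Summarize the findings and suggest differential diagnoses.",
    labs={"CRP_mg_per_L": 48, "WBC_10e9_per_L": 13.2},
)
print(len(request["input"]))  # 2 modalities in a single request
```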

Multi-modal large models differ from previous AI and machine learning approaches. While AI has been widely integrated into many consumer applications, most algorithms’ outputs neither require nor invite customer or user participation, apart from the early forms of AI embedded in social media platforms, which capture attention by curating user-generated content. Another feature distinguishing multi-modal large models from other types of AI is their versatility. Previous and existing AI models, including those for medical purposes, are designed for specific tasks and therefore lack flexibility: they can perform only the tasks defined by their training set and labels and cannot adapt or perform other functions without being retrained on different datasets. Thus, although the U.S. Food and Drug Administration has approved over 500 AI models for clinical medicine, most have been approved for only one or two narrowly defined tasks. In contrast, multi-modal large models are trained on diverse datasets and can be used for multiple tasks, including some for which they were not explicitly trained.

Multi-modal large models typically feature interfaces and formats that facilitate human-machine interaction, mimicking human communication in ways that lead users to attribute human-like qualities to the algorithms. Thus, unlike other forms of AI, both the way multi-modal large models are used and the responses they generate appear “human-like”, which is one reason for their unprecedented public adoption. Additionally, because their responses appear authoritative, many users uncritically treat them as correct, even though multi-modal large models cannot guarantee the correctness of their responses and cannot incorporate ethical norms or moral reasoning into the responses they generate. Multi-modal large models have been used in numerous fields, including education, finance, communication, and computer science; this guide illustrates the different ways multi-modal large models are used (or envisioned to be used) in healthcare and pharmaceuticals.

Multi-modal large models can be seen as the product of a series (or chain) of decisions made by one or more actors regarding programming and product development. Decisions made at each stage of the AI value chain can have direct or indirect impacts on downstream entities involved in the development, deployment, and use of multi-modal large models. These decisions may be influenced and regulated by governments enacting and enforcing laws and policies at national, regional, and global levels.

The AI value chain typically begins with a large tech company that develops the models. Developers can also be universities, smaller tech firms, national health systems, public-private partnerships, or other entities with the resources and capabilities to utilize several inputs. These inputs constitute the “AI infrastructure”, such as data, computational power, and AI expertise used to develop general-purpose foundation models. These models can be directly used to perform various tasks, often unforeseen, including those related to healthcare. Several general-purpose foundation models are specifically trained for use in healthcare and pharmaceuticals.

Third parties (“providers”) can use general-purpose foundation models for specific purposes or applications through application programming interfaces (APIs). This includes: (i) fine-tuning new multi-modal large models, which may require additional training of the foundation model; (ii) integrating multi-modal large models into applications or larger software systems to provide services to users; or (iii) integrating components known as “plugins” that guide, filter, and present a multi-modal large model’s output in a formal or standardized format so that the results are “digestible”.
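As a minimal sketch of option (ii) above, the code below shows how a provider might wrap a general-purpose model in a health-specific application layer: a fixed system prompt, a crude input limit, and a place for output filtering. The foundation_model() function is a placeholder for a vendor API call; its name and signature are assumptions, not a documented interface.

```python
SYSTEM_PROMPT = (
    "You are a clinical documentation assistant. Answer only from the "
    "supplied record, and say 'insufficient information' rather than guess."
)

def foundation_model(system: str, user: str) -> str:
    """Placeholder for a call to a hosted general-purpose foundation model."""
    raise NotImplementedError("wire this to the vendor's API")

def summarize_encounter(record_text: str) -> str:
    """The provider's application layer: domain guardrails wrap the
    general-purpose model."""
    if len(record_text) > 20_000:
        record_text = record_text[:20_000]  # crude input-size limit
    answer = foundation_model(SYSTEM_PROMPT, record_text)
    # A real provider would post-filter `answer` here (e.g., redact
    # identifiers, flag uncertain claims) before returning it to users.
    return answer
```

Fine-tuning (option (i)) and plugins (option (iii)) would sit on either side of this layer: the former changes the model’s weights, the latter shapes its inputs and outputs.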

Subsequently, providers can sell products or services based on multi-modal large models to clients (or “deployers”), such as health departments, healthcare systems, hospitals, pharmaceutical companies, or even individuals, such as healthcare providers. Clients purchasing or obtaining licenses to use products or applications can directly use them for patients, other entities in healthcare systems, non-professionals, or their own businesses. The value chain can be “vertically integrated”, meaning that companies (or other entities, such as national health systems) that collect data and train general-purpose foundation models can modify multi-modal large models for specific uses and provide applications directly to users.

The WHO recognizes that AI can bring significant benefits to healthcare systems, including improving public health and advancing universal health coverage. However, as stated in the WHO Ethics and Governance of Artificial Intelligence for Health guide, AI also poses significant risks that can harm public health and threaten individual dignity, privacy, and human rights. Although multi-modal large models are relatively new, the speed of their acceptance and dissemination has prompted the WHO to provide this guide to help ensure their successful and sustainable use worldwide. The WHO acknowledges that, at the time of this guide’s release, there are many competing views on the potential benefits and risks of AI, on the ethical principles applicable to its design and use, and on governance and regulatory approaches. Because this guide was released shortly after the first applications of multi-modal large models in health and before more powerful models became available, the WHO will update it to keep pace with rapid technological advances, with how society handles their use, and with the health effects of multi-modal large models used outside healthcare and pharmaceuticals.

1.1 Importance of General-Purpose Foundation Models (Multi-Modal Large Models)

Although multi-modal large models are relatively new and untested, they have already had a significant impact on society across various fields, including healthcare and pharmaceuticals. ChatGPT is a large language model developed by a U.S. tech company, which has released multiple versions. It is estimated that in January 2023, just two months after its launch, the model reached 100 million monthly active users. This made it the fastest-growing consumer application in history.

Currently, many companies are developing multi-modal large models or integrating them into consumer applications, such as internet search engines. Large tech companies are also rapidly integrating multi-modal large models into most of their application software or developing new applications. Backed by millions of dollars in private investment, startups are also racing to develop multi-modal large models, and the availability of open-source platforms allows them to do so faster and more cheaply than the giant companies did.

The emergence of multi-modal large models has facilitated new investments and the continuous launch of new products in the technology sector, but some companies also acknowledge that they do not fully understand why multi-modal large models generate certain responses. Despite reinforcement learning based on human feedback, the content generated by multi-modal large models remains unpredictable and uncontrollable, potentially generating responses that make users uncomfortable or producing incorrect yet highly convincing content during “conversations”. Nevertheless, support for multi-modal large models often stems not only from enthusiasm for their functionality but also from unconditional and uncritical claims about their performance in unpublished literature.

The datasets used to train multi-modal large models have not been made public, yet the models have been rapidly adopted. It is therefore difficult or impossible to know whether the training data are biased, were legally obtained, and comply with data protection rules and principles, or whether a model’s performance on a task or query merely reflects that it was trained on the same or similar problems rather than a genuine problem-solving capability. Other issues concerning the data used to train multi-modal large models, such as compliance with data protection laws, are discussed below.

Individuals and governments were not prepared for the release of multi-modal large models. Without training, individuals may not know how to use multi-modal large models effectively, and even though multi-modal large model chatbots give an impression of accuracy and reliability, their responses are not always accurate or reliable. One study found that while the large language model GPT-3 “can generate accurate information that is easier to understand than that produced by humans, it can also generate more convincing false information”, and that humans cannot distinguish content generated by multi-modal large models from content generated by humans.

Governments are also largely unprepared. Laws and regulations designed to govern the use of AI may not address the challenges or opportunities associated with multi-modal large models. The EU has reached agreement on an AI Act applicable across the Union, but it had to modify the legislative framework at the final drafting stage to take account of multi-modal large models. Governments elsewhere are also rapidly drafting new laws or regulations or imposing temporary bans (some of which have since been lifted). In the coming months, companies are expected to launch ever more powerful multi-modal large models, which may bring new benefits but also new regulatory challenges. In this dynamic environment, this guide builds on previous guidelines, including ethical guidelines, to offer opinions and recommendations on the use of multi-modal large models in healthcare and pharmaceuticals.

1.2 WHO Guidelines on Ethics and Governance of AI in Health

The first edition of the WHO guidelines on the ethics and governance of AI in health reviewed the main approaches to machine learning and the range of AI applications in health, but it did not specifically address generative AI or multi-modal large models. At the time the guidelines were being formulated, and when they were published in 2021, there was no evidence that generative AI and multi-modal large models would soon be widely applied in clinical care, medical research, and public health.

However, the fundamental ethical challenges, core ethical principles, and recommendations proposed in those guidelines (see Box 1) remain relevant for assessing and for effectively and safely using multi-modal large models, even as this new technology exposes emerging and persistent governance gaps. These challenges, principles, and recommendations also form the basis of the expert group’s approach to multi-modal large models in this guide.

