Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Editor’s Note

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

With the rapid development of artificial intelligence technology, the health sector is continuously attempting to improve medical quality and enhance work efficiency by introducing AI. Due to the capability of large model technology to process complex tasks and large-scale data, significantly enhancing the generalization, versatility, and practicality of AI, this technology has broad prospects and potential in areas such as disease prediction, diagnosis, treatment, and drug development, but it also brings many ethical challenges and risks, necessitating strengthened governance.

On January 18, 2024, the World Health Organization (WHO) released the English version of the “Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models,” aimed at assisting countries in planning the benefits and challenges related to multi-modal large models in the health sector, and providing policy and practical guidance for the appropriate development, provision, and use of multi-modal large models.

Given the significant strategic importance of multi-modal large model AI for our country to gain new advantages in strategic competition and promote the health of the general public, this publication has organized experts engaged in related research fields to translate the guide into Chinese for reference by researchers, hoping to promote the research and guidance of ethical governance of medical large models in our country, achieving a positive interaction of high-quality innovative development and high-level safety.

This article was first published on CNKI, reference format as follows:

Wang Yue, Song Yaxin, Wang Yifei, et al. Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models [J/OL]. Chinese Medical Ethics: 1-58 [2024-03-14]. http://kns.cnki.net/kcms/detail/61.1203.R.20240304.1833.002.html.

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models

Ethics and governance of artificial intelligence for health. Guidance on large multi-modal models. Geneva: World Health Organization; 2024. Licence: CC BY-NC-SA 3.0 IGO.

This translation is not derived from the World Health Organization (WHO), and the WHO is not responsible for the content or accuracy of the translation. The English original should be regarded as the authoritative text.

Original Version Number: ISBN 978-92-4-008475-9 (electronic version); ISBN 978-92-4-008476-6 (print version)

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Translators

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Wang Yue1, Song Yaxin1, Wang Yifei1, translators; Yu Lian2, Wang Jing3, reviewers

(1 School of Law, Xi’an Jiaotong University, Xi’an, Shaanxi 710049; 2 School of Public Health, Xi’an Jiaotong University, Xi’an, Shaanxi 710061; 3 Beijing University of Chinese Medicine Affiliated Hospital, Beijing 100010)

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Continued from previous article

Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models

Series | Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models (Part Two)

I. General Foundation Models (Multi-Modal Large Models) Applications, Challenges, and Risks

3 Risks of General Foundation Models (Multi-Modal Large Models) to Health Systems and Social and Ethical Issues

Many risks and issues associated with multi-modal large models will affect individual users, such as healthcare professionals, patients, researchers, or caregivers, but they may also pose systemic risks. Emerging or anticipated risks associated with multi-modal large models in the health sector and other AI-based technologies include: (i) risks that may affect national health systems, (ii) regulatory and governance risks, (iii) concerns from the international community.

3.1 Health Systems

The healthcare system is based on six components: providing healthcare services, healthcare human resources, healthcare information systems, access to essential medicines, finance, and leadership and governance. All six pillars may be directly or indirectly influenced by multi-modal large models. The risks associated with the use of multi-modal large models that may affect the healthcare system are described as follows:Overestimating the Benefits of Multi-Modal Large Models while Underestimating RisksSome individuals tend to exaggerate and overestimate the role of artificial intelligence, which may lead to the adoption of products and services that have not undergone rigorous safety and efficacy evaluations. This is largely due to the persistent allure of “technological solutionism,” where technologies such as AI and multi-modal large models are treated as “magic bullets” to eliminate deeper social, structural, economic, and institutional barriers before they are proven useful, safe, and effective.Multi-modal large models are novel yet untested; as mentioned, they do not produce facts but rather information similar to facts, which may be inaccurate. Consumers, policymakers, and the public are quite interested in multi-modal large models, but this may lead policymakers, healthcare providers, and patients to overestimate their benefits while neglecting the challenges and issues that multi-modal large models may bring. For policymakers, necessary evidence to determine the scope of multi-modal large model usage may not be obtainable before they are developed and used. The choice to use multi-modal large models should not take precedence over existing AI-based technologies already in use, nor should it take precedence over non-AI or non-digital solutions that may be underfunded and underutilized but have been proven to have therapeutic or public health benefits. Imbalanced healthcare policies and misleading investments may divert attention and resources away from interventions that have been proven effective and exacerbate the pressure on health departments to compress public healthcare spending, particularly disadvantaging resource-limited low- and middle-income countries.Accessibility and AffordabilityMany factors can affect equitable access to multi-modal large models that benefit healthcare providers and patients. One of them is the digital divide, which limits the use of digital tools in certain countries, regions, or among certain populations. The digital divide also leads to other disparities, many of which affect the use of AI, and AI itself may reinforce and exacerbate these gaps. Another factor that may affect people’s use of multi-modal large models is that, unlike the internet, many multi-modal large models can only be accessed after payment or subscription, as developing and operating multi-modal large models can be expensive. It is estimated that ChatGPT’s daily operating cost is $700,000. Some businesses are charging subscription fees for new versions of multi-modal large models, which will not only make them unaffordable for low- and middle-income countries but also for individuals, healthcare systems, or local governments in resource-scarce environments in high-income countries. In contrast, the poor in all countries can only use “cost-effective solutions” of multi-modal large models, while only the wealthier individuals can access the services of “real” healthcare professionals. A third factor is that currently most multi-modal large models only operate in English. Therefore, while they can receive input and provide output in other languages, they are more likely to produce misinformation or errors.Systemic BiasAs mentioned, the datasets used to train AI models are biased because many datasets do not include girls and women, minorities, the elderly, rural areas, and vulnerable groups. Generally, AI tends to favor the most represented populations in the data, so in unequal societies, AI may disadvantage minority groups. A particular issue with multi-modal large models is that bias may increase as the scale of the model expands, even if supposedly smaller multi-modal large models are being developed; however, the increasing amount of data used to train continuous models leads to an increase in bias. Bias may trigger discrimination throughout the healthcare system, affecting people’s access to essential services, including healthcare and quality care. However, at the same time, multi-modal large models are likely to contain some data that can “counteract” various forms of bias and stereotypes. Researchers have found that prompting models not to rely on stereotypes can have a tremendous positive impact on the algorithm’s responses.Impact on Workforce and EmploymentAccording to an investment bank, multi-modal large models are expected to lead to the loss (or “degradation”) of at least 300 million jobs. A report from the Organization for Economic Cooperation and Development indicates that the occupations most affected by AI-driven automation among its member countries are high-skilled jobs, partly because after the use of multi-modal large models, particularly in “finance, healthcare, and legal occupations… may suddenly find themselves at risk of automation by AI.” However, for many countries, the healthcare sector is not an industry but a core function of government, and healthcare professionals may not be replaced by technology. In addition, many countries still face shortages of healthcare professionals, including after the COVID-19 pandemic. The WHO estimates that by 2030, the shortage of healthcare professionals will reach 10 million, primarily concentrated in low- and middle-income countries. Therefore, validated safe and effective multi-modal large models could be used to bridge the gap between the workforce needed to provide healthcare services and the existing workforce. Another concern is the impact of the market introduction of multi-modal large models on the current and future number of healthcare professionals. A large technology company estimates that up to 80% of jobs will be affected by the arrival of AI. Consulting firm Accenture predicts that 40% of human work time may be impacted by multi-modal large models and optimistically points out that “the positive impact on human creativity and productivity will be immense.” However, as mentioned, the introduction of multi-modal large models may pose significant challenges for many healthcare professionals who need to be trained and adapt to multi-modal large models. Health systems must consider the challenges it poses to healthcare providers and the risks to patients and caregivers.The third concerning issue is the mental and psychological toll on those responsible for reviewing content, labeling training data, and removing content containing abusive, violent, or psychologically distressing material. Those responsible for filtering such content often work in low- and middle-income countries, earn low wages, and may suffer psychological distress from reviewing such content without access to counseling or other forms of healthcare.Healthcare Systems’ Dependency on Non-Adapted Multi-Modal Large ModelsWhile multi-modal large models can address the ongoing shortage of healthcare professionals and expand the coverage of healthcare systems, they may also lead these systems to become overly dependent on multi-modal large models, particularly those technologies developed by the industry. Therefore, if multi-modal large models in the healthcare and public health sectors are not maintained, reduced, or designed and updated only for high-income environments, healthcare systems relying on these multi-modal large models will have to adjust and may need to provide healthcare services without multi-modal large models. At that point, if healthcare professionals have already become “de-skilled,” outsourcing certain responsibilities to AI, or if patients expect to use AI, this could make the situation quite difficult. The associated risk is that if multi-modal large models cannot protect patients’ privacy and confidentiality, over-reliance on multi-modal large models may undermine individual and societal trust in the healthcare system, as people will no longer have confidence in obtaining healthcare services without compromising privacy.Cybersecurity RisksAs healthcare systems increasingly rely on AI, these technologies may become targets for malicious attacks and hacking, with some systems potentially being shut down, training data manipulated to alter their performance and responses, and data may also be “kidnapped” for ransom. As mentioned, a unique risk that jeopardizes security is that sensitive data is inputted into multi-modal large models that are not disclosed, authorized for use, or otherwise protected. Multi-modal large models themselves may also be affected by cybersecurity risks, such as “prompt injection” attacks. This type of attack refers to third parties inputting data into multi-modal large models, causing them not to operate in the manner intended by the developers. For example, a “prompt injection” attack could instruct a multi-modal large model designed to answer database queries to delete or modify information from the database. Currently, there is no solution to this vulnerability. Prompt injection attacks are currently being used by security researchers to illustrate the challenges faced by multi-modal large models, but they may also be exploited by malicious actors to steal data or defraud users.

3.2 Legal and Regulatory Compliance

While new laws can be enacted to regulate the use of AI, certain existing laws, especially those related to data protection and international human rights obligations, also apply to the development, provision, and deployment of multi-modal large models. Some multi-modal large models that have been developed and made available to the public may violate several major data protection laws, such as the EU’s General Data Protection Regulation (GDPR). This regulation covers various rights, such as the right to be free from the effects of automated decision-making. These rights, protections, and requirements must guide the development of AI.In EU member states and places like Canada, some of these violations have triggered investigations into multi-modal large models. Violations include: (i) multi-modal large models collecting and using personal data from the internet without obtaining individuals’ consent (nor collecting these data based on “legitimate interests”), (ii) not informing the public that their data is being used, and failing to grant users the right to correct errors, delete data (“the right to be forgotten”), or refuse the use of their data, (iii) multi-modal large models that are not sufficiently transparent when using sensitive data provided by users to chatbots or other consumer interfaces (despite legal requirements for users to be able to delete chat history data), (iv) multi-modal large models that do not have appropriate “age gate” systems to filter users under 13 years of age and those aged 13 to 18 without parental consent; (v) multi-modal large models that cannot prevent the leakage of personal information; (vi) multi-modal large models that release inaccurate personal information partly due to hallucinations. Other behaviors that may violate the GDPR include “explanation rights” requirements, where entities using personal data for automated processing need to explain how that system (such as a multi-modal large model) makes decisions. As mentioned, while some companies are researching ways to meet “explainability” requirements, it is currently impossible to explain how multi-modal large models make decisions.Many violations of data protection laws are also quite serious, relating to how multi-modal large models are trained, used, and managed by data controllers. Multi-modal large models may never be able to comply with the GDPR or other data protection laws. In 2023, a complaint submitted to a data protection agency in an EU member state claimed that a company’s large language model and its development and current operation systematically violate the GDPR.Many violations of data protection laws may also violate consumer protection laws. More broadly, if these issues cannot be resolved, they also directly violate WHO’s guiding principles on artificial intelligence for health, including the principle of protecting human autonomy and ensuring transparency, explainability, and understandability.The inability of companies to comply with existing laws may be a reason for some companies’ keen interest in upcoming AI regulations. In response to the EU’s planned AI Act, a leader from a large company stated that the company might not be able to provide its major multi-modal large model products in Europe because it may not be able to comply with relevant regulations. This ultimatum could lead to the erosion of privacy rights and other protections and may result in the provision of healthcare services depending on whether countries are willing to forgo certain human rights.

3.3 Societal Concerns and Risks

Like other AI technologies, multi-modal large models are expected to have broader societal impacts beyond the scope of the health system, which cannot be addressed by a single law or policy. These impacts include: multi-modal large models may reinforce the power and authority of a small number of tech companies (and their executives) that are leading in the commercialization of multi-modal large models. Multi-modal large models may also have negative effects on the environment and climate, as training and using multi-modal large models consume carbon and water. Until humans can ensure that AI does not replace human epistemic authority with information, evidence, and advice, this technology, which provides inaccurate, erroneous, biased information and lacks moral or contextual reasoning, will quickly become “entangled in the lives of billions at a pace that civilization cannot safely absorb,” including in healthcare and pharmaceuticals. Additionally, there are serious concerns that multi-modal large models may enhance technology-driven gender violence, including cyberbullying, hate speech, and the non-consensual use of images and videos, such as “deep fakes.” The latter risk is not addressed in this guide but warrants broader attention from the WHO, as it has serious negative impacts on the health and well-being of targeted populations, particularly underage and adult women.Challenges Associated with Large Tech CompaniesWith the expansion of the parameters and scale of multi-modal large models, the emergence of multi-modal large models has strengthened the dominance and centrality of a few large tech companies in the development and deployment of AI. Few companies and governments have sufficient manpower and financial resources, expertise, data, and computing power to develop increasingly complex multi-modal large models. The computing power and investment for multi-modal large models are increasing, and as demand for AI grows, the cost of recruiting “AI talent” is also very high. Multi-modal large models with the most powerful chips require multiple computers to work continuously for weeks or even months and thousands of chips to work together to complete training.As the costs of training, deploying, and maintaining multi-modal large models continue to rise, a few companies may “capture” many products and services (including in the health sector) as potential components, thereby marginalizing academia, startups, and even governments. In AI research, there is already compelling evidence that the largest companies are crowding out academia and government. The number of AI PhD graduates choosing jobs in companies is one example. Currently, the number of graduates choosing to work in companies is “unprecedented.” In 2004, only about 20% of graduates entered corporate jobs, while by 2020, nearly 70% of graduates entered corporate jobs. In the U.S. and other countries, the number of faculty specializing in AI research being hired by companies has increased eightfold since 2006. In terms of computing power and the use of large datasets, the industry has also become the dominant force over government and academia. In 2021, industrial models were 29 times larger than academic models.Furthermore, original spending, especially the original spending of government in high-income countries, lags far behind that of the industry. One study noted, “In 2021, U.S. non-defense government agencies allocated $1.5 billion for AI. In the same year, the European Commission planned to invest €1 billion ($1.2 billion). In contrast, by 2021, global industry investment in AI was expected to exceed $340 billion, far outpacing public investment.”The dominance of “AI investment” means that large tech companies now also dominate outputs and outcomes. The industry share of the largest AI models increased from 11% in 2010 to 96% in 2021, while the number of research reports co-authored by one or more industries increased by 16% between 2000 and 2020.The dominance of large tech companies not only determines the applications and uses of AI but increasingly also determines the priorities of early research. The dominance of the industry and the lack of government investment also mean that important AI technologies that serve the public interest, including healthcare and pharmaceuticals, may become increasingly scarce. This is different from the pharmaceutical sector, where significant investments have been made by governments, non-profits, and charities in research and development, especially in the early critical stages of drug development and the later stages of certain treatment methods. Therefore, companies will increasingly supervise and underpin the operation of the economy and social sectors, including the healthcare system, raising concerns about citizens’ and governments’ control over their own lives.In the absence of alternatives and regulation (even if laws are enacted in 2023, it may take years to fully implement), how large tech companies make internal decisions and how they engage with society and government become increasingly important. Companies may engage with, for example, frontier model forums or collaborate with governments of high-income countries to address various issues, including several voluntary commitments made with the U.S. government and upcoming commitments with the EU.Another concerning issue is that companies may not adhere to ethical social responsibilities. For example, some large tech companies establish ethics teams to ensure that the design and development of AI models comply with internal ethical principles, by introducing “friction” to require companies to slow down or stop certain development activities, but large tech companies often either sideline or dismantle these ethics teams. Dismantling an entire team responsible for AI ethics-related issues means that ethical principles are not “closely tied to product design” but are sidelined.Some large tech companies commit through frontier model forums to ensure that “cutting-edge AI models are developed responsibly and safely,” including multi-modal large models, “to determine best practices for responsible development and deployment of cutting-edge models,” and “to collaborate with policymakers, academia, civil society, and businesses to share knowledge about trust and safety risks.” In voluntary commitments with the U.S. government, tech companies guarantee to avoid harmful biases and discrimination and protect privacy. However, it remains unclear whether voluntary commitments or partnerships are sufficient to replace strong commitments to ethical standards. For example, an ethics team from one company had previously suggested halting the release of new multi-modal large models, but later they modified the document, downplaying previously recorded risks.Large tech companies have little history or expertise in developing healthcare products and services. Therefore, they may be insensitive to the needs of healthcare systems, healthcare providers, and patients, and may not address issues such as privacy or quality assurance that traditional healthcare enterprises and public health institutions are familiar with. Over time, their sensitivity may increase, as has been the case with other companies that have provided healthcare products and services for decades.Many companies developing multi-modal large models are not transparent with governments, regulators, or businesses that may use their models, which may (i) require evidence, data, performance, and other information to assess the risks and benefits of multi-modal large models, (ii) need the number of parameters in the model to measure how powerful the model is. Companies developing products and services using such models also do not disclose how they assess ethical challenges and risks, the safeguards taken, the responses of multi-modal large models to these safeguards, and when the use of the technology should be restricted or halted. Researchers assessed the transparency index of foundational models of 10 leading large language model developers based on 100 indicators and found that “not a single major foundational model developer provided close to sufficient transparency, indicating a fundamental lack of transparency in the AI industry.” The voluntary agreements made by the U.S. federal government with several large tech companies include two transparency commitments. These companies commit to: (i) sharing risk management information with industry, government, civil society, and academia; (ii) publicly reporting the capabilities, limitations, and areas of adaptation and non-adaptation of their AI systems. Although these commitments may represent an improvement over the status quo, they are all voluntary and open to interpretation; without specific regulatory requirements, full disclosure may not be achieved.Due to internal commercial pressures or external competition, companies are eager to bring new multi-modal large models to market as quickly as possible, even if appropriate testing, safeguards, and ethical risks and concerns have not been identified and addressed. An executive from one company stated, “Worrying about issues that can be resolved later is absolutely a fatal mistake.” Companies seek first-mover advantages, as the market share of multi-modal large models in certain areas (such as internet search) can generate revenue. According to one company, every 1% increase in market share for search engines corresponds to an increase of $2 billion in revenue. An executive from a large tech company stated that their multi-modal large model “is not perfect,” but because “the market demands,” it will be released. Companies that release multi-modal large models without fully identifying, validating, explaining, and mitigating risks will accumulate “moral debt,” the ultimate consequences of which will not be borne by the companies but by those most vulnerable to the negative impacts of such technologies. Members of the frontier model forum are committed to “advancing AI safety research” and “determining best practices,” and their voluntary commitments to the U.S. government include conducting internal and external testing of AI systems before their release.Commercial pressures may not only lead companies to hastily push multi-modal large models to market but may also lead companies to cancel or abandon products and services with significant public health benefits in favor of services that can generate revenue. In 2023, a large tech company “cut” the team developing ESMFold, a protein language model that can predict complete atomic-level protein structures from a single sequence, and also generated a database containing over 600 million protein structures. There are concerns that the company may be unwilling to “bear the costs of maintaining the database and allowing scientists to run the ESM algorithm on new protein sequences.”Carbon Footprint and Water Footprint of Multi-Modal Large ModelsThe expanding scale of multi-modal large models also has significant environmental implications. Multi-modal large models require vast amounts of data, and training data consumes a lot of energy. At a large enterprise, training a new multi-modal large model requires about 3.4 gigawatt-hours (2 months), equivalent to the annual energy consumption of 300 American households. Although some multi-modal large models are trained in data centers using renewable or carbon-free energy, most AI models are trained in power grids powered by fossil fuels. As more companies introduce multi-modal large models, electricity consumption will continue to rise, potentially having a significant impact on climate change.The WHO considers climate change an urgent global health challenge that requires prioritized action now and in the coming decades. Between 2030 and 2050, climate change is expected to lead to an increase of about 250,000 deaths annually due to malnutrition, malaria, diarrhea, and heat stress. By 2030, the direct health damage costs are estimated to be between $20 billion and $40 billion annually. In areas with weak healthcare infrastructure, primarily low- and middle-income countries, their capacity to respond will be the worst without proper preparedness and response aid.Multi-modal large models also have a significant impact on water use. The early training of a multi-modal large model at a large tech company consumed 700,000 liters of fresh water, while other data centers may use even more water. Although many developers are increasingly aware of their carbon footprint, many are unaware of their water footprint. A brief conversation with a multi-modal large model (20-50 questions and answers) requires water equivalent to a 500-milliliter bottle of water. The total water footprint of training multi-modal large models, including the manufacturing, transportation, and chip production of AI servers, may be much larger. Data centers can put pressure on local water supplies. For example, a company’s data center used more than 25% of the total water supply of a city in Oregon, USA. Another large tech company is planning to build a data center in a severely drought-stricken country, forcing local residents to drink saline water. Tracking water footprints is challenging because, while there is more awareness, measurement, and understanding of carbon footprints, companies lack the same understanding of water footprints or do not measure them.The “Danger” of Algorithms Replacing Human Epistemic AuthorityA more general societal risk associated with the emergence of multi-modal large models is that, despite providing seemingly plausible responses, they are gradually being regarded as sources of knowledge, which may ultimately undermine human epistemic authority, including in healthcare, science, and medical research. In fact, multi-modal large models do not produce knowledge and do not understand what they are “saying,” and they have no moral or contextual reasoning when answering questions. If this situation continues, society may not be prepared for the consequences of computer-generated reasoning. Early AI has provided information through social media algorithms, spread misinformation, negatively impacted mental health, and exacerbated polarization and division. Even as tech companies repeatedly warn of the dangers of multi-modal large models, they continue to release multi-modal large models directly to society without safeguards or regulation, which not only risks replacing human control over knowledge production but may also weaken human capacity to safely use knowledge in healthcare, medicine, and other systems that society relies on. This harm will particularly affect communities and populations in resource-scarce environments, as their data is less likely to be used to train AI systems, reducing the accuracy of system responses. However, these groups may heed the advice of AI systems, especially when there are no professional healthcare personnel or providers to contextualize or correct the errors or inaccuracies generated by multi-modal large models.Multi-modal large models are releasing increasingly imperfect information or misinformation into the public domain and knowledge base, which may ultimately lead to “model collapse,” where multi-modal large models trained on inaccurate or false information will also contaminate public information sources such as the internet. To avoid this situation while maximizing the role of multi-modal large models in healthcare and other important social domains, governments, civil society, and the private sector must guide these technologies to serve the common good.Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

[To Be Continued, Stay Tuned]

Ethics and Governance of AI in Health: Multi-Modal Model Guide (Part Three)

Editor: Shang Dan

Reviewer: Ji Pengcheng

Leave a Comment