Introduction
Artificial intelligence (AI) is rapidly becoming part of drug research and development. In recent years, AI has achieved remarkable results in small molecule drug development, showing strong potential in molecular innovation, target discovery, and clinical outcome prediction, and attracting pharmaceutical companies and investors worldwide. In the first quarter of 2022, there were 27 investments in the AI drug development field, totaling more than 10 billion RMB according to incomplete statistics.
This article reviews the research and application of AI in drug development, covering the core value of AI in each application segment and the strategic positioning and development trends of domestic and foreign pharmaceutical companies in the AI pharmaceutical field.
1. Artificial Intelligence
AI has been developing for more than 60 years since the term was coined in 1956. With continuous improvements in computing power, algorithms, and data, AI is gradually entering drug research and development. As in other application scenarios, the implementation path of AI + new drug development consists of five main steps: 1) acquiring training datasets for the target task; 2) building a model with a learning algorithm; 3) training repeatedly to optimize the model; 4) evaluating model performance on a test set; 5) using the model to achieve the intended goal, such as molecular screening, prediction, or analysis. The three essential elements are algorithms, datasets, and models, of which algorithms and data are the key to practical application.
The development history of AI technology in drug research and development
The implementation path of AI technology in new drug development
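The five steps above can be sketched in a few dozen lines. This is only a minimal illustration under stated assumptions: the "molecules" are synthetic two-number descriptors generated at random, the activity label is an invented rule, and logistic regression stands in for whatever learning algorithm a real project would use.

```python
import math
import random

random.seed(0)

# 1) Acquire a dataset. Synthetic stand-in (an assumption): each "molecule" is
#    two made-up descriptor values, labelled active (1) when their sum is positive.
def make_molecule():
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    return x, 1 if x[0] + x[1] > 0 else 0

data = [make_molecule() for _ in range(400)]
train, test = data[:300], data[300:]

# 2) Choose a learning algorithm: logistic regression, one weight per descriptor.
def predict_proba(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# 3) Train repeatedly (epochs of stochastic gradient descent) to optimize the model.
weights, bias, learning_rate = [0.0, 0.0], 0.0, 0.5
for epoch in range(50):
    for x, y in train:
        error = predict_proba(weights, bias, x) - y
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * error

# 4) Evaluate the trained model on the held-out test set.
correct = sum((predict_proba(weights, bias, x) >= 0.5) == bool(y) for x, y in test)
accuracy = correct / len(test)

# 5) Apply the model: rank unseen candidates by predicted probability of activity.
candidates = [[0.9, 0.8], [-0.7, -0.6], [0.2, -0.1]]
ranked = sorted(candidates, key=lambda x: predict_proba(weights, bias, x), reverse=True)

print(f"test accuracy: {accuracy:.2f}")
print("best candidate:", ranked[0])
```

Keeping the test set separate from the training data in step 4 is what makes the reported accuracy a fair estimate of how the model will behave on the unseen candidates in step 5.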
1.1 Data – Structured Data and Open Access Promote AI Development
For AI in pharmaceuticals, structured and openly accessible literature data aids development, while the current scarcity of clinical data shapes the competitive landscape. AI depends on learning from vast amounts of data, so resolving data sourcing is the primary prerequisite for AI to play a significant role in new drug development. The three main data sources for AI pharmaceuticals are medical literature data, drug chemistry data, and clinical trial data.
PubMed is a representative medical literature database. It is a free search engine for biomedical papers and abstracts, with MEDLINE as its primary data source, and is particularly suited to the target discovery phase of AI pharmaceuticals. In terms of data availability, PubMed covers some 30 million academic papers across biology, medicine, pharmacy, and related fields, and offers a friendly search interface with structured display of results. However, it is an abstract database and does not itself provide full text, whereas AI pharmaceuticals need NLP to extract and analyze the full text of articles, so PubMed alone is insufficient. Only some records offer the full text, through either a Full text link or PMC Full text.
Databases suited to compound synthesis include ChEMBL and PubChem. ChEMBL is a large, open-access drug discovery database that collects drug chemistry data and knowledge generated during drug research and development. Its information on small molecules and their biological activities is drawn from full-text articles in several core drug chemistry journals and combined with data on approved drugs and clinical development candidates, such as mechanisms of action and therapeutic indications. ChEMBL currently includes 1,961,462 distinct compounds and 13,382 targets. In terms of data availability, ChEMBL is very user-friendly: it supports direct download from the web as well as API calls from Python. All labels are stored in independent fields, so the data is very well structured.
PubChem is a database of chemical molecules maintained by the National Institutes of Health, providing biological activity data for over 10,000 organic small molecules. PubChem aims to promote public use of small molecule data and allows complete chemical datasets to be downloaded free of charge via FTP. It consists of three sub-databases, PubChem BioAssay, PubChem Compound, and PubChem Substance, which store biochemical experimental data, curated compound structure information, and raw compound data uploaded by institutions and individuals, respectively. In terms of data availability, PubChem offers three acquisition methods: direct download, API calls from Python scripts, and FTP download. It is therefore similarly user-friendly, and its labels are likewise stored in independent fields as structured data.
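As an illustration of the Python API access to ChEMBL mentioned above, the sketch below builds a request URL for a single molecule record and extracts a few of its independently labelled fields. It assumes the public ChEMBL REST endpoint at www.ebi.ac.uk and the field names molecule_chembl_id, pref_name, and molecule_structures.canonical_smiles; these reflect the web service as commonly documented and should be checked against the current ChEMBL documentation. The helper names and the sample record are hypothetical and hand-written so that the parsing step can be tried without network access.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the public ChEMBL web services.
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"

def molecule_url(chembl_id):
    """Build the REST URL for one molecule record in JSON form."""
    return f"{CHEMBL_API}/molecule/{chembl_id}.json"

def summarize(record):
    """Pull a few separately labelled fields out of a molecule record."""
    structures = record.get("molecule_structures") or {}
    return {
        "id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
    }

def fetch_molecule(chembl_id, timeout=30):
    """Download and summarize one record (needs network access; not called here)."""
    with urlopen(molecule_url(chembl_id), timeout=timeout) as response:
        return summarize(json.load(response))

# Hand-written record in the assumed response shape, for offline illustration only.
sample = {
    "molecule_chembl_id": "CHEMBL25",
    "pref_name": "ASPIRIN",
    "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"},
}

print(molecule_url("CHEMBL25"))
print(summarize(sample))
```

Because each label sits in its own field, downstream code can select exactly the attributes a model needs without parsing free text.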
TDC (Therapeutics Data Commons), developed by researchers from Harvard, MIT, Stanford, and other institutions, is the first large-scale dataset collection for machine learning (ML) in biomedicine. Turning raw biomedical data into ML-ready data requires considerable expertise, which makes it hard for machine learning researchers to process such data quickly and accurately. Biomedical datasets have also been scattered across many sources, with no centralized platform for organizing and accessing them. TDC may solve this problem: it is open-source, large-scale, and can be called with just three lines of code. It covers more than 20 meaningful ML tasks in biomedicine and more than 70 datasets, spanning target protein discovery, pharmacokinetics, safety, and drug production, and including small molecules, antibodies, vaccines, miRNA, and more.
Clinical trial data currently comes mainly from the data that pharmaceutical companies accumulate over the long course of drug development. Many domestic pharmaceutical companies have not yet fully digitalized and lack sufficient accumulated data, which makes it difficult for AI to play a role. For large pharmaceutical companies, research and development data is a non-public core asset that they rarely share with other companies. This closed data creates a high barrier to entry, so AI drug development companies that are first to establish cooperation with large pharmaceutical companies enjoy a competitive advantage.
The databases of AI in the new drug development process
1.2 Algorithms – Deep Learning and Open Source Applications Continuously Iterating and Optimizing
AI spans multiple methodological areas, such as reasoning, knowledge representation, and solution search, as well as the basic paradigm of machine learning (ML). Deep learning (DL), a subfield of ML, is the most commonly used AI technology in the pharmaceutical field. It is loosely modeled on the structure of the human brain and learns by mining existing knowledge and finding correlations in vast amounts of data, which makes it particularly suitable for discovering relationships between diseases and targets, and between diseases and genes, in new drug development. Deep learning is built on artificial neural networks (ANNs), which come in various types, such as multilayer perceptron (MLP) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs). For more information on AI algorithms, see Jingwei Research | Phoenix Platform: The Application of AI Technology in Small Molecule Drug Screening.
The application domains and methodological domains of AI technology
2. Applications of Artificial Intelligence in Drug Development
A survey of nearly 80 AI companies at home and abroad found that AI is mainly applied in new drug development to target discovery, compound screening and synthesis, crystal form prediction, protein structure prediction, patient recruitment, clinical trial design optimization, and drug repositioning. Of these, target discovery, compound screening and synthesis, and protein structure prediction are considered the most transformative research areas in global AI + new drug development.
Statistics on the Application of Artificial Intelligence in Drug Development (as of April 2020)
2.1 Target Discovery
Targets are the foundation of new drug development. Although many drug targets have been discovered, they are still just the tip of the iceberg. Traditional target research infers the relationship between the structure and activity of physiologically active substances qualitatively and intuitively, and from that identifies the targets through which drugs act on the body’s cells. According to statistics, it takes pharmacologists over 20 years to infer targets from the research literature and personal experience, and the probability of discovering a target is extremely low. AI can use natural language processing (NLP) to learn from vast amounts of medical literature and related data, and deep learning to uncover the relationships between drugs and diseases, thereby finding effective targets and shortening the target discovery cycle. For example, Insilico Medicine used its self-developed AI drug target discovery platform, PandaOmics, and its AI molecular generation and design platform, Chemistry42, to obtain the world’s first fully AI-driven disease target for idiopathic pulmonary fibrosis (IPF).
2.2 Compound Screening
AI drug discovery is relatively mature in the drug target discovery and drug screening stages. A drug’s action on its target proteins and receptors is not perfectly specific; if it also acts on non-target proteins and receptors, side effects may occur. For new drugs that have not yet entered animal and human trials, safety and side effects need to be assessed in advance so that safer drugs can be selected. Compound screening currently relies mainly on high-throughput screening, in which robots run millions of experiments in parallel, with screening costs reaching hundreds of billions of dollars. AI can enter compound screening in two ways: first, by using deep learning and computing power to develop virtual screening that replaces high-throughput screening; second, by using image recognition to optimize the high-throughput screening process. This is expected to save approximately $26 billion in compound screening costs each year. BenevolentBio used JACS technology to identify 100 potential compounds for treating ALS and successfully screened these down to five; BergHealth screened up to 250,000 disease tissue samples to find new biological indicators and biomarkers for early cancer detection.
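To make the idea of virtual screening concrete, the sketch below ranks a small compound library against a known active compound entirely in software. It deliberately swaps in a much simpler technique than the deep learning described above: Tanimoto similarity over sets of structural features. The compound names, feature labels, and threshold are all hypothetical, and a production system would use learned models or real molecular fingerprints instead.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity: shared features / all distinct features."""
    union = len(features_a | features_b)
    return len(features_a & features_b) / union if union else 0.0

def screen(reference, library, threshold):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(reference, feats)) for name, feats in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical known active compound, described by coarse structural features.
reference = {"aromatic_ring", "carboxylic_acid", "ester"}

# Hypothetical screening library.
library = {
    "cmpd_A": {"aromatic_ring", "carboxylic_acid", "ester", "hydroxyl"},
    "cmpd_B": {"aromatic_ring", "amine"},
    "cmpd_C": {"alkyl_chain", "hydroxyl"},
}

for name, score in screen(reference, library, threshold=0.2):
    print(f"{name}: {score:.2f}")
```

Only the compounds that pass the in-silico filter would go on to physical assays, which is where the cost saving over exhaustive high-throughput screening would come from.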
2.3 Compound Synthesis
Compound synthesis primarily analyzes the drug properties of small molecules, including target binding, pharmacokinetics, and drug metabolism, to find compounds with good activity and efficacy, and then designs their synthesis along specific pathways. In this phase, pharmacologists and chemists need to run computer simulations on tens of millions of compounds one after another; finding active compounds to synthesize can take several years and typically costs hundreds of millions of dollars. AI can use ML and DL to simulate the drug properties of small molecules, select the best candidate compounds for synthesis experiments within weeks, and hold the testing cost of each compound to 0.01 cents, significantly reducing the cost of compound synthesis.
2.4 Crystal Form Prediction
Crystal form prediction combines deep learning with cognitive computing. For small molecule drugs, different crystal forms differ in stability and solubility, so a stable crystal form bears directly on drug quality. Relying solely on manual methods to find a stable, soluble crystal form takes a great deal of experimental time and has a low probability of success. AI can significantly improve crystal form prediction, using deep learning and cognitive computing to process large amounts of clinical trial data and find the most effective crystal form within hours or even minutes.
AbbVie combined crystal prediction technologies from CrystalWise Technology to design a new integrated model that predicts the thermodynamic solubility of molecules from two-dimensional structures. Compared with traditional crystal form development, pharmaceutical companies using AI are better placed to face crystal form patent challenges from generic drug companies. Crystal form prediction also shortens crystal form development and makes the selection of a suitable form more efficient, thereby shortening the research and development cycle and controlling costs.
2.5 Protein Structure Prediction
Protein misfolding is common in many diseases, including type II diabetes and neurodegenerative diseases such as Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, and amyotrophic lateral sclerosis. Methods that accurately predict the three-dimensional structure of proteins are therefore of significant value for new drug discovery and for understanding protein folding diseases. AlphaFold, developed by DeepMind, is an AI network that predicts the 3D structure of a protein from its amino acid sequence. CloudDeep Intelligent Medicine used a “de novo folding” protein structure prediction method to help resolve the SRD5A2 crystal structure; its self-developed AI tool “tFold” effectively improved the accuracy of protein structure prediction and played a core role in this scientific breakthrough.
2.6 Clinical Trials
Clinical trials are the longest and most costly phase of new drug research. Because of problems such as poor patient cohort selection and inadequate patient monitoring during trials, the success rate of clinical trials is currently low. Among failed Phase III trials, 57% fail because of insufficient efficacy, mainly owing to incorrect dosing and failure to identify the appropriate target patient population. AI-assisted clinical trial design mainly uses natural language processing to rapidly process similar studies, clinical data, and regulatory information, and to read clinical trial data. For example, Trials.ai uses AI to optimize clinical trial design, making it easier for patients to participate and eliminating unnecessary operational burdens.
AI patient recruitment mainly uses natural language processing, ML, and related technologies to identify subject information from various sources and match it against the inclusion/exclusion criteria of clinical trial protocols. This involves digitizing medical data, understanding its content, linking datasets, and recognizing patterns, with the aim of widening the pool of subjects and giving patients simpler tools for finding clinical trials. The Mayo Clinic worked with IBM Watson to scan clinical trial databases using natural language processing and find suitable patients for trials. In their pilot study, the IBM Watson clinical trial matching system increased average monthly enrollment in breast cancer trials by 80%. Zero-Knowledge Technology used big data to integrate patient information, speeding up patient recruitment for clinical trials.
AiCure is an AI company whose clinically validated platform visually confirms medication intake using a smartphone. AbbVie worked with AiCure on a Phase II clinical trial of a schizophrenia drug. The results, presented at the International Society for CNS Clinical Trials and Methodology (ISCTM) conference, showed that subjects monitored with the AiCure platform had a cumulative adherence of 89.7% over 24 weeks, compared with 71.9% for subjects monitored using mDOT (modified directly observed therapy). The study adds to the growing scientific evidence that using AI in clinical trials can increase statistical power and reduce sample size, thereby lowering costs and accelerating drug development.
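The matching step described above, checking subject information against inclusion/exclusion criteria, can be sketched as follows. This assumes the NLP stage has already turned free-text records into structured fields; the class names, field names, and example patients are hypothetical and far simpler than a real trial protocol.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)

@dataclass
class TrialCriteria:
    min_age: int
    max_age: int
    required_diagnoses: set    # inclusion: the patient must have all of these
    excluded_medications: set  # exclusion: the patient must take none of these

def is_eligible(patient, criteria):
    """Apply the inclusion criteria first, then the exclusion criteria."""
    if not (criteria.min_age <= patient.age <= criteria.max_age):
        return False
    if not criteria.required_diagnoses <= patient.diagnoses:
        return False
    if criteria.excluded_medications & patient.medications:
        return False
    return True

def match(patients, criteria):
    """Return the IDs of all patients who satisfy the protocol."""
    return [p.patient_id for p in patients if is_eligible(p, criteria)]

# Hypothetical protocol and patient records, for illustration only.
criteria = TrialCriteria(18, 70, {"breast cancer"}, {"warfarin"})
patients = [
    Patient("P001", 54, {"breast cancer"}, {"metformin"}),
    Patient("P002", 71, {"breast cancer"}, set()),
    Patient("P003", 48, {"breast cancer"}, {"warfarin"}),
]
print(match(patients, criteria))
```

In practice the hard part is the step assumed away here: reliably extracting ages, diagnoses, and medications from heterogeneous medical records.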
2.7 Drug Repositioning
In drug repositioning, AI’s DL and cognitive computing capabilities can match drugs that are already on the market or in the development pipeline with diseases, discovering new targets and expanding the therapeutic uses of drugs. In addition, large public datasets can be used to train predictive ML models of cross-target activity, which are then applied to existing drugs to identify new indications. AI can also discover new uses for drugs by simulating randomized clinical trials.
During the COVID-19 pandemic, in the race to find a treatment, the UK company BenevolentAI published a paper in The Lancet stating that its AI platform, by searching through vast scientific literature, had identified Baricitinib as a potential treatment for COVID-19. Eli Lilly launched a Phase III clinical trial (ACTT-2) on May 8 combining Baricitinib with Remdesivir, in which the combination reduced mortality by 60% and 43% in two patient subgroups compared with Remdesivir alone.
3. Dynamics of AI Industry Development
In 2021, the global AI + pharmaceutical industry saw 77 financing transactions totaling $4.564 billion. Domestically, total financing in the first half of 2021 exceeded 1 billion RMB. The AI + new drug market currently has three main types of players: start-ups, tech giants, and traditional large pharmaceutical companies. Start-ups focus on compound screening and target discovery; tech giants leverage their technological strengths to enter the market aggressively, building data and technology platforms; and traditional large pharmaceutical companies mostly build their position in AI drug research and development through mergers and acquisitions or partnerships with start-ups.
Companies Applying AI Technology in Drug Development
Domestic Companies Applying AI Technology in Drug Development
3.1 Domestic Representative Companies
3.1.1 CrystalWise Technology
CrystalWise Technology, established in 2015, is a drug research and development company powered by quantum physics and artificial intelligence. It aims to transform drug development by improving its speed, scale, novelty, and success rate. Its intelligent drug development platform integrates cloud-based supercomputing tools for digital research and development with advanced experimental capabilities, forming a system in which high-precision predictions and targeted experiments verify and guide each other. As one of the world’s pioneering AI drug development companies, CrystalWise Technology has established a complete iterative research and development process that closely couples quantum physics dry laboratories with advanced wet laboratories, tackling the efficiency bottlenecks of traditional research and development and enabling breakthroughs in the speed and scale of innovation. The company has established partnerships with NewGe Yuan, East China Pharmaceutical, SLRD Pharmaceutical, PhoreMost, Boteng Co., Zhongsheng Pharmaceutical, Pfizer, DQ Medical, and others. In August 2021, CrystalWise Technology completed a $400 million Series D financing round, becoming the first unicorn in the domestic AI drug development field with a post-money valuation exceeding 13 billion RMB.
CrystalWise Technology’s Business System
3.1.2 Insilico Medicine
Insilico Medicine is a global leader in end-to-end artificial intelligence for target discovery, small molecule chemistry, and clinical research. It develops AI systems that use deep generative models, reinforcement learning, transformers, and other modern machine learning techniques to generate new molecular structures with specified properties. It also develops software for generating synthetic biological data, identifying targets, and predicting clinical trial outcomes. The company combines two business models: providing AI-driven drug discovery services and software through its self-developed Pharma.AI platform, and developing its own preclinical pipeline on that platform. Insilico Medicine has demonstrated that, within 18 months, it can identify a new target for a major disease, generate and validate new molecules with the desired properties for that target, and design a research and development process for producing new preclinical candidate compounds.
In February 2021, Insilico Medicine announced that, in a global first, it had used AI to discover a new mechanism for treating idiopathic pulmonary fibrosis (IPF) and to develop a preclinical candidate compound. The project shortened the new drug development cycle to 18 months and reduced costs to $2.6 million, far better than the average of 2-5 years and $10.98 million for traditional new drug development.
In December 2021, Insilico Medicine announced that within 12 months it had discovered two preclinical candidate compounds targeting PHD2, ISM012-077 and ISM012-042, for the treatment of renal anemia and inflammatory bowel disease, respectively.
Insilico Medicine’s Business Structure
3.1.3 Accutar Biotechnology
Accutar Biotechnology, established in 2015, has AI computing, biology, structural, and chemistry laboratories in the US and Shanghai. Since its founding, the company has completed four rounds of financing; notable shareholders include Zhenge Fund, IDG Capital, Yitu Technology, Chuanghua Venture Capital, Dinghui Investment, Yunfeng Fund, and Coatue Management. Accutar Biotechnology has built a full-chain AI drug development platform covering every stage of preclinical development for small molecule drugs, including virtual screening, drug property prediction, chemical retrosynthesis, drug optimization, and repurposing of existing drugs. It has established a pipeline of more than 10 new biological drugs, covering first-in-class and best-in-class drug targets. In September 2021, AC0682, a new drug developed using AI, received FDA approval to begin clinical trials.
Accutar Biotechnology’s Research Pipeline
3.2 Foreign Representative Companies
When AI was first being adopted, the United States was the pioneer, with the largest number of companies, research centers, and institutions using AI to strengthen research and development. Over time, activity in the UK and EU has grown, both among companies using AI to reshape drug discovery and in government initiatives. UK and EU activity in the pharmaceutical AI race is driven primarily by Novartis, which announced significant steps toward reimagining medicine by establishing the Novartis AI Innovation Lab and selecting Microsoft as its strategic partner for AI and data science.
3.2.1 Exscientia
Exscientia was founded in 2012 and is headquartered in Oxford, UK, providing an AI-driven platform for drug design, discovery, and development, focusing on small molecule generation, drug activity, ADMET prediction, and virtual screening in drug development. The company went public on NASDAQ in October 2021. Exscientia combines the latest AI technologies with experimental innovations to design a new drug discovery process that significantly improves the efficiency and quality of drug discovery, successfully advancing the world’s first fully AI-designed small molecule drug candidate into clinical trials. The company has collaborated with well-known enterprises such as Bayer, Sanofi, Sumitomo Pharma, and Bristol-Myers Squibb.
In February 2020, the long-acting 5-HT1A receptor agonist DSP-1181, developed in collaboration with Sumitomo Pharma, initiated Phase I clinical trials in Japan for the treatment of obsessive-compulsive disorder (OCD). This candidate drug molecule is the world’s first AI-designed candidate to enter clinical trials, with the entire project taking less than a year from concept to clinical entry.
Subsequently, in April and May 2021, Exscientia announced that its second AI drug (EXS-21546; an A2a receptor antagonist; indications include pancreatic cancer, lung cancer, etc.) and its third AI drug (DSP-0038; a dual-targeting 5-HT1A agonist and 5-HT2A antagonist; indication is Alzheimer’s disease) both entered clinical development stages.
Exscientia’s Research Pipeline
3.2.2 BenevolentAI
BenevolentAI is a clinical-stage AI drug discovery company headquartered in London, UK, listed on the Amsterdam Stock Exchange. The company developed the Benevolent Platform—a leading computational and experimental discovery platform focused on three key areas: target identification, molecular design, and precision medicine. The platform powers a growing internal pipeline of over 20 drug projects, from target discovery to clinical research, and maintains successful collaborations with AstraZeneca and leading research and charitable institutions. In early 2020, during the COVID-19 pandemic, BenevolentAI identified a potential drug, Baricitinib, that could inhibit COVID-19 infection and reduce inflammatory damage through its biomedical knowledge graph, publishing the research results in The Lancet on February 4 of that year. In June 2021, the FDA and regulatory agencies in India and Japan approved Baricitinib for inpatient treatment of COVID-19 patients.
The company’s website lists several products in development (the following image shows only some of them), only one of which is in clinical stages: BEN2293. BEN2293 is a small molecule pan-Trk inhibitor currently in a Phase I/II clinical study (NCT04737304) for the topical treatment of mild to moderate atopic dermatitis (AD).
Some of BenevolentAI’s Research Pipeline
3.2.3 Recursion Pharmaceuticals
Recursion Pharmaceuticals was founded on November 5, 2013, and is a clinical-stage biotechnology company headquartered in Salt Lake City, Utah, USA, which went public on NASDAQ in April 2021. The company uses computer vision technology to process cellular images, evaluating drug response results based on cellular characteristics. They employ advanced imaging and AI technologies to conduct high-throughput cellular model experiments, measuring thousands of features of a single cell, such as the size and shape of the nucleus or distances between different intracellular regions, aimed at screening thousands of candidate drugs across hundreds of disease cellular models, including genetic diseases, inflammation, immunology, and infectious diseases, ultimately identifying new drugs for different diseases. On October 5, 2015, the clinical project REC-994 received FDA orphan drug designation, primarily for the treatment of cerebral cavernous malformations, and entered Phase II clinical trials in 2019.
Some of Recursion’s Research Pipeline
3.2.4 Relay Therapeutics
Relay Therapeutics is a clinical-stage precision medicine company founded in Massachusetts in 2016 that went public on NASDAQ in 2020. Many companies in the pharmaceutical industry design and develop drugs based on static protein structures. In real biological environments, however, protein structures are not static but change continuously. Relay’s Dynamo platform was built on this idea. Using techniques such as “room temperature crystallography” and cryo-electron microscopy, it can simulate molecular dynamics over extended periods. Coupled with machine learning, the platform provides a deeper understanding of protein dynamics, helping to develop and optimize new drugs. The company is advancing a precision oncology pipeline, including its lead candidates RLY-4008 and RLY-1971 and its PI3Kα mutant-selective project, RLY-PI3K1047. In the first quarter of 2020, it initiated a Phase I clinical trial of RLY-1971 in patients with advanced solid tumors, and in the third quarter of 2020 it initiated the first-in-human clinical trial of RLY-4008, enriched for patients with oncogenic FGFR2 alterations.
Relay’s Research Pipeline
4. Opportunities and Challenges in AI Industry Development
The years 2014-2015 were the initial stage of AI pharmaceuticals, when industry recognition was low. Early AI pharmaceutical companies lacked funding and new drug development capabilities and still had a great deal of groundwork to do in accumulating technology, so AI pharmaceutical start-ups at that time primarily provided technical services. The following two years (2016-2017) were relatively “quiet” for AI pharmaceuticals. During this period, some start-ups began to extend their chain of technical services, no longer just improving efficiency at a single point or phase of new drug development but pursuing more end-to-end solutions, such as directly delivering a molecular compound. In 2018, AI pharmaceuticals finally saw a first wave of breakthroughs. The earliest batch of AI pharmaceutical companies, including Schrödinger, Relay, Recursion, Exscientia, and Insilico Medicine, began to obtain verifiable results in the form of clinical candidate molecules. More and more people came to believe that AI could be used for drug development, and more newcomers joined the AI pharmaceutical race, especially in China. The popularity of AI pharmaceuticals is now undeniable, but the field still faces many challenges.
4.1 Lack of Model Interpretability
AI currently performs well in clearly defined, rule-based fields such as Go and medical image recognition. Many aspects of drug development, however, are not clearly defined, which poses a significant challenge for AI pharmaceuticals. In drug toxicity research, for example, a toxicity prediction model can be built from toxicity data using existing assessment indicators, but such a model cannot effectively determine the toxicity of a candidate molecule. Factors such as species differences, dosage, and in vivo exposure must all be considered, and the complexity of biological systems makes it difficult to clearly define the factors that affect toxicity.
4.2 Data Dilemma
AI pharmaceuticals require medical data in large quantities and of high quality. AI pharmaceutical companies typically obtain data through collaborations with pharmaceutical companies, from their own experimental platforms, or by outsourcing to CROs (Contract Research Organizations). However, medical industry data is generally insufficient, privately held, and unstandardized, creating a “data wall” that blocks the transition of AI pharmaceuticals from theory to practice.
Inconsistent data formats are a major cause of insufficient data volume. Much activity data comes from the literature or from drug patents. It originates in laboratories around the world, each with its own experimental habits and data standards. For example, some laboratories habitually report drug activity as IC50, while others use Ki. Aggregating such data raises problems of standardization and quality.
Privacy concerns around data sharing are also a significant challenge. Much of the data related to drug development is held by major pharmaceutical companies, and its immense commercial value makes them reluctant to share it with AI pharmaceutical companies. As a result, pharmaceutical companies become isolated data silos, each working alone, which makes the shortage of data even worse.
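A small sketch of what such standardization can involve: converting activities reported in different concentration units onto a common negative-log molar scale (pIC50 or pKi), and, where assay conditions are known, relating IC50 to Ki with the Cheng-Prusoff equation for competitive inhibition, Ki = IC50 / (1 + [S]/Km). The record layout and compound names are hypothetical. Note that the measurement type is kept alongside the normalized value, since IC50 and Ki are not interchangeable without the substrate concentration and Km.

```python
import math

# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def to_molar(value, unit):
    """Convert a concentration to mol/L."""
    return value * UNIT_TO_MOLAR[unit]

def p_activity(value, unit):
    """Negative log10 of the molar concentration (pIC50 or pKi)."""
    return -math.log10(to_molar(value, unit))

def ki_from_ic50(ic50_molar, substrate_molar, km_molar):
    """Cheng-Prusoff relation for a competitive inhibitor."""
    return ic50_molar / (1.0 + substrate_molar / km_molar)

# Hypothetical records as they might arrive from two different laboratories.
records = [
    {"compound": "cmpd_1", "type": "IC50", "value": 100.0, "unit": "nM"},
    {"compound": "cmpd_2", "type": "Ki", "value": 0.5, "unit": "uM"},
]

for record in records:
    record["p_value"] = round(p_activity(record["value"], record["unit"]), 2)
    print(record["compound"], record["type"], record["p_value"])

# With known assay conditions, an IC50 can be converted to an estimated Ki.
ki = ki_from_ic50(to_molar(100.0, "nM"), substrate_molar=1e-5, km_molar=1e-5)
print(f"estimated Ki: {ki:.1e} M")
```

Putting every value on the same scale removes the unit mismatch, but it does not remove differences in assay design between laboratories, which is why data quality remains a separate problem.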
To address data issues, data labeling for AI and some distributed machine learning technologies have received increasing attention in drug development in recent years.
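One commonly discussed form of distributed machine learning is federated averaging, in which each data holder trains on its own data and only model parameters are shared. The article does not name a specific method, so the toy example below is an assumption chosen for illustration: two hypothetical companies fit a one-feature linear model on private data, and a coordinator averages their weights without ever seeing the raw records. All names and data are made up.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Train y = w * x + b by gradient descent on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train_federated(client_datasets, rounds=50):
    """Run several rounds; only weights leave each client, never the data."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in client_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in client_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Toy private datasets, both drawn from the same relation y = 2x + 1.
company_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
company_b = [(0.5, 2.0), (1.5, 4.0)]
w, b = train_federated([company_a, company_b])
```

In practice the shared parameters can still leak information, so real deployments typically add further protections; this sketch shows only the basic averaging step.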
4.3Talent and Policy
The introduction of new technologies has changed the traditional drug development model, which requires regulatory expertise and policy guidelines to be updated in step, yet no targeted policy guidelines have been issued so far. On the talent side, a shortage of high-end interdisciplinary talent also limits the field. Although many universities now offer AI-related majors and courses, there remains a significant gap in people proficient in both AI and pharmaceuticals. Many team members at AI pharmaceutical companies understand AI but not pharmaceuticals, or pharmaceuticals but not AI. The resulting obstacles to integrating multidisciplinary knowledge will be a key factor affecting the development of the AI pharmaceutical industry.
4.4Unclear Business Models
Three major business models can be identified among the AI pharmaceutical companies that have already gone public. They are also the most typical and most clear-cut models in the current AI pharmaceutical industry.
4.4.1 SaaS Providers Focused on Providing Software Platform Services
These companies are dedicated to providing the most advanced computational software/hardware tools, accumulating more data through extensive collaborations to support algorithm iteration, thereby helping pharmaceutical companies complete research and development tasks better and faster. A typical representative of this business model is Schrödinger.
4.4.2 Biotech Companies Focused on Developing Internal R&D Pipelines, Empowered by AI
These companies do not provide external software services and collaborate less with external enterprises. Instead, they primarily advance their own pipelines in order to validate the capabilities of their algorithm platforms more quickly. A typical representative of this business model is Relay Therapeutics.
4.4.3 AI CRO Companies Providing Outsourcing Services to Relevant Pharmaceutical Companies and CROs
These companies primarily promote pipeline development in collaboration with numerous external enterprises, utilizing extensive cooperation to accumulate more data to support the optimization and iteration of their algorithm models. A typical representative of this business model is Exscientia.
Companies need to position themselves appropriately and choose a suitable business model. The actual output of AI drug development is currently relatively low. In April 2019, IBM decided to stop developing and selling its drug development tools (Watson AI Suite) because of poor financial performance. Most companies currently rely on financing, and each AI + drug development technology company must decide, based on its own stage of development, whether to develop drugs itself or to adopt a CRO model.
Although the current outlook for AI + pharmaceutical development is not especially optimistic and the field faces numerous challenges, the combination of AI and drug development is clearly set to become the trend in the pharmaceutical industry and to bring a revolutionary transformation to the medical field over the next ten or even twenty years, ushering in a new era.