Artificial intelligence (AI) has many definitions. One of them reads: “Artificial intelligence is a science and a set of computational techniques inspired by, but often fundamentally different from, the way humans use their nervous systems and bodily perceptions to learn, reason, and take action.” AI can be built as software or tools that mimic human intelligence in some situations and even surpass it in others.
Machine learning (ML) and deep learning (DL) are research fields frequently mentioned alongside artificial intelligence. Both are subfields of AI, and deep learning is itself a subset of machine learning. Machine learning is a process through which machines learn from a given dataset without being explicitly programmed with what to learn.
Machines typically learn in either a supervised or an unsupervised way. In supervised learning, scientists give the machine labeled examples, usually split into separate training and testing datasets. In unsupervised learning, the machine receives unlabeled data and must find structure in it on its own. Deep learning is not a synonym for unsupervised learning; it is a relatively modern technique for achieving machine learning and can be used in either setting. Deep learning algorithms take datasets and find patterns and key information in a way loosely modeled on the interactions between neurons in the human brain. These algorithms are artificial neural networks: computational systems whose adjustable weights and biases determine how much importance each piece of input data receives.
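To make the idea of weights and biases concrete, here is a minimal Python sketch of a single artificial neuron trained by supervised learning. It is an illustration only: the toy data (a logical AND of two inputs), the function names neuron and train, and the learning rate are assumptions made for this example and do not come from any program described in this article.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, squashed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Supervised learning: nudge weights and bias to reduce the error on labeled examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Difference between the prediction and the known label
            # (the gradient of the log loss with respect to the weighted sum).
            error = neuron(x, weights, bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Invented toy data: the label is 1 only when both inputs are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
predictions = [round(neuron(x, weights, bias)) for x in samples]
print(predictions)
```

After training, the rounded predictions should match the labels. A deep learning model stacks many such neurons in layers, but the principle is the same: the learned weights decide how much each input matters.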
As of 2024, it has been 23 years since the milestone draft of the human genome sequence was completed. That milestone has led to the generation of vast amounts of genomic data. It is estimated that genomic research will produce 2 to 40 exabytes (EB) of data in the next decade, and DNA sequencing and other biotechnologies will continue to increase the number and complexity of such datasets. This is why genomic researchers need computational tools based on AI or machine learning that can process, extract, and interpret the valuable information hidden in this vast data repository.
Although the use of AI and machine learning tools in genomics is still in its early stages, researchers have already developed many AI and machine learning-based programs to assist scientific research and medical practice. Key applications include analyzing facial images with AI programs to identify genetic diseases accurately [1], using machine learning techniques to identify primary cancers from liquid biopsies [2], and using machine learning to distinguish pathogenic genomic variants from benign ones [3].
The laboratory at the Third Affiliated Hospital of Zhengzhou University uses machine learning to predict the association between genotype and phenotype in phenylketonuria (PKU), an autosomal recessive hereditary metabolic disease. The model, named PPML, predicts the severity of the disease from the genotype of the alleles. Trained on a case database, PPML classifies PKU phenotypes into three categories: classic PKU (CPKU), mild PKU (MPKU), and hyperphenylalaninemia (HPA). PPML provides a powerful analytical tool for the clinical analysis and interpretation of identified mutations in the phenylalanine hydroxylase (PAH) gene. The predictive model can be accessed at http://www.bioinfogenetics.info/PPML/. In this study, Assistant Researcher Fang Yang from the Third Affiliated Hospital of Zhengzhou University was the first author, and Professor Zhang Linlin was the corresponding author of the paper [4].
(PPML: Using Machine Learning to Predict the Association Between Genotype and Phenotype in Phenylketonuria)
Machine learning models can also predict the occurrence of specific diseases, for example the risk of type 2 diabetes through big data analysis [5]. The laboratory at the Third Affiliated Hospital of Zhengzhou University developed the machine learning-based “GDMPredictor” algorithm, primarily used to predict the risk of gestational diabetes mellitus (GDM) in pregnant women (http://www.bioinfogenetics.info/GDM/). The model uses the random forest algorithm to assess GDM risk from clinical and biochemical factors. Users can enter pregnancy history and endocrine disease status, including early pregnancy reactions, intrahepatic cholestasis, thyroid disease, eclampsia, twin pregnancy, gestational days, age, and body mass index. They can also enter biochemical test indicators such as α1-microglobulin, β2-microglobulin, cystatin C, bicarbonate binding capacity, fasting blood glucose, and serum creatinine. “GDMPredictor” aims to improve pregnancy outcomes through early intervention and to provide personalized risk assessment. In this study, Chief Technician Xing Jinfang from the Third Affiliated Hospital of Zhengzhou University was the first author, and Professors Yuan Enwu and Zhang Linlin and Assistant Researcher Fang Yang were co-corresponding authors of the paper [6].
(GDMPredictor: Gene Machine Learning for Predicting the Risk of Gestational Diabetes in Early Pregnancy)
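The general idea behind a random forest can be sketched in a few lines of Python. This is a toy illustration and not the GDMPredictor implementation: the patient values and labels are invented, the three features (age, body mass index, fasting blood glucose) are a small subset chosen for the example, each tree is reduced to a single split to keep the code short, and all function names are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels, default):
    """Most common label (ties count as 1); fall back to default for an empty group."""
    if not labels:
        return default
    return int(2 * sum(labels) >= len(labels))

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    overall = majority(labels, 0)
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, overall), majority(right, overall))
    _, f, t, left_label, right_label = best
    return f, t, left_label, right_label

def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each one-split tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample patients with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest

def predict_risk(forest, row):
    """Fraction of trees voting for the positive class, used here as a risk score."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)

# Invented example data: (age in years, BMI, fasting blood glucose in mmol/L).
rows = [
    (24, 20.1, 4.3), (27, 21.5, 4.5), (29, 22.0, 4.4), (31, 23.4, 4.6),
    (33, 27.8, 5.2), (35, 29.1, 5.4), (38, 30.5, 5.6), (36, 28.3, 5.3),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = developed GDM (made-up labels)

forest = fit_forest(rows, labels)
low_risk = predict_risk(forest, (22, 19.0, 4.0))
high_risk = predict_risk(forest, (40, 32.0, 6.0))
print(low_risk, high_risk)
```

Because every tree sees a different resample of the patients and a different subset of features, the averaged vote is more stable than any single tree. A production model would use full-depth trees from an established library and far more data and features than this sketch.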
References:
[1] Gurovich, Y., Hanani, Y., Bar, O. et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med 25, 60–64 (2019). https://doi.org/10.1038/s41591-018-0279-0
[2] Cristiano, S., Leal, A., Phallen, J. et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570, 385–389 (2019). https://doi.org/10.1038/s41586-019-1272-6
[3] Sundaram, L., Gao, H., Padigepati, S.R. et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50, 1161–1170 (2018). https://doi.org/10.1038/s41588-018-0167-z
[4] Fang, Y. et al. Allelic phenotype prediction of phenylketonuria based on the machine learning method. Hum Genomics 17, 34 (2023).
[5] Brinati, D., Ronzio, L., Cabitza, F., Banfi, G. (2021). Artificial Intelligence in Laboratory Medicine. In: Lidströmer, N., Ashrafian, H. (eds) Artificial Intelligence in Medicine. Springer, Cham. https://doi.org/10.1007/978-3-030-58080-3_312-1
[6] Xing, J. et al. Enhancing gestational diabetes mellitus risk assessment and treatment through GDMPredictor: a machine learning approach. J Endocrinol Invest, 1–10 (2024).
Author: Fang Yang
Editor: Song Liying
Initial Review: Xu Hao
Final Review: Zhang Linlin, Yuan Enwu