Source: HyperAI
This article introduces the work of Master Xianchao of Longquan Temple, often called the temple with the strongest research capability. In recent years he has been researching how artificial intelligence can be combined with ancient literature. He leads the “Tripitaka” team, which has put AI to practical use in automatic punctuation, translation between classical and modern Chinese, and text recognition of ancient books.
Located at the foot of Phoenix Ridge in the suburbs of Beijing, Longquan Temple is arguably the strongest Buddhist temple for scientific research in China, and perhaps in the world. Master Xuecheng’s saying, “Buddhism is ancient, but Buddhists are modern,” has encouraged the monks at Longquan Temple to engage in scientific research, write code, combine Buddhist studies with new technologies, and popularize and internationalize their projects. The results have been remarkable: the temple’s work frequently trends online and attracts continuous attention from the outside world. Recently, Master Xianchao took part in a domestic technology conference, where he shared the technical practice of using artificial intelligence to organize and proofread the “Tripitaka.”
The Birth of Buddhist AI: Making Buddhist Texts Easier to Read
Master Xianchao studied condensed matter physics at Peking University, graduating in 2007. He took refuge at Longquan Temple in 2008 and has since devoted himself to compiling the Longquan Tripitaka and studying Buddhist theory. In 2016, AlphaGo’s historic victory over Lee Sedol drew his attention to AI, and he began to explore combining AI with the OCR and automatic punctuation techniques he was already researching.
(Image: Master Xianchao presenting his research results at the Techo Park Developer Conference)
AI Originating from Buddhism: Solving the Pain Points of Ancient Texts
The “Tripitaka” that Longquan Temple is organizing and proofreading is a collection of Buddhist classics, also known as the collection of all scriptures. Over the two thousand years of Chinese Buddhism, successive dynasties have translated, supplemented, and revised the “Tripitaka.” Dozens of versions are in circulation today; the shortest has over five thousand characters and the longest over one hundred and twenty million characters.
(Image: a woodblock of the Qianlong edition of the Tripitaka. More than 60 officials, scholars, and monks took part in revising that edition, and over 860 craftsmen worked on engraving, printing, and binding it; the work took six years.)
In 2012, Longquan Temple began organizing the “Tripitaka,” planning to spend a full ten years on the task. Traditional methods of organizing ancient texts mainly involve version comparison, proofreading, and punctuation, which help contemporary readers understand obscure and rare scriptures as far as possible.
Three years later, Longquan Temple published “The Eight Great Sections of Nanshan.” The following year, the Tripitaka office was established to explore artificial intelligence and develop a deep learning-based character recognition engine. In 2017, the temple established the Artificial Intelligence and Information Technology Center, which developed an engine capable of recognizing various versions of the Tripitaka and successfully digitized the Tripitaka version of “Sixty Huayan.” Master Xianchao currently serves as director of the Tripitaka office and is responsible for organizing the “Tripitaka.”
Automatic Punctuation: OCR + Deep Learning
To lower the threshold for reading ancient texts and make scholars more efficient, Master Xianchao’s team has recently used deep learning, OCR, and related technologies to change the traditional way of interpreting the “Tripitaka,” with impressive results.
Modern Chinese has nearly ten commonly used punctuation marks, whereas ancient Chinese texts have very few, which makes them hard to read. Master Xianchao explained that automatic punctuation is a technology that uses algorithms to add modern Chinese punctuation to ancient texts without human intervention, primarily to make them easier for modern readers to read.
There had been earlier research on using AI to punctuate ancient texts, but Master Xianchao said it mostly just added periods, an approach he considered “conservative and academic.” His team applied deep learning to automatic punctuation, which made it possible to add periods, commas, question marks, exclamation marks, colons, semicolons, and pause marks to ancient texts with higher accuracy. After verification, the annotations produced by the Transformer model they developed are “almost indistinguishable from human annotations.”
RNN+LSTM+ResNet: Comprehensive Improvement
In NLP, automatic punctuation is a simple sequence labeling problem. The standard method for such problems is the Recurrent Neural Network (RNN). To improve RNN performance, the bidirectional RNN was developed, in which the output at each time step depends not only on all previous inputs but also on the subsequent ones. Master Xianchao’s team then introduced LSTM. However, automatic punctuation built on these techniques alone was still not very satisfactory. The team achieved unexpectedly good results because they added a residual network (ResNet) on top of the earlier methods.
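To make the sequence labeling framing concrete, here is a minimal Python sketch of how punctuated text can be turned into one label per character and back again. It only illustrates the problem setup; the model that predicts the labels is left out. The label scheme, the "O" label, and the function names are assumptions made for illustration and are not taken from the team's implementation.

```python
from typing import List, Tuple

# The seven marks the article lists: period, comma, question mark,
# exclamation mark, colon, semicolon, and the pause mark.
PUNCTUATION = set("。，？！：；、")
NO_MARK = "O"  # hypothetical label: no punctuation follows this character


def to_labels(punctuated: str) -> Tuple[List[str], List[str]]:
    """Split punctuated text into characters plus one label per character.

    Each label is the mark that follows the character, or NO_MARK.
    """
    chars: List[str] = []
    labels: List[str] = []
    for ch in punctuated:
        if ch in PUNCTUATION:
            if labels:
                # Attach the mark to the preceding character. If two marks
                # occur in a row, the later one overwrites the earlier one.
                labels[-1] = ch
            # A mark at the very start has no character to attach to
            # and is dropped.
        else:
            chars.append(ch)
            labels.append(NO_MARK)
    return chars, labels


def apply_labels(chars: List[str], labels: List[str]) -> str:
    """Rebuild punctuated text from characters and predicted labels."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    pieces: List[str] = []
    for ch, label in zip(chars, labels):
        pieces.append(ch)
        if label != NO_MARK:
            pieces.append(label)
    return "".join(pieces)


# Training pairs come from already punctuated text ...
chars, labels = to_labels("學而時習之，不亦說乎？")
# ... and at prediction time a model would supply `labels` for raw `chars`.
print(apply_labels(chars, labels))
```

Under this framing, a model that only adds periods needs two labels, while the seven marks listed above need eight, which is one way to see why the richer task is harder.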
In 2019 the team published a paper, “Compilation of the Tripitaka: When AI Meets Buddhism,” introducing their automatic punctuation technology. Master Xianchao explained that earlier neural networks had at most ten to twenty layers; with more layers, training struggled to converge. Residual networks, in contrast, can have hundreds or even thousands of layers. Deeper networks help capture deeper semantic information, which is key to the team’s success. The team also tried Convolutional Neural Networks (CNN), and the final results showed that the punctuation accuracy of residual networks was on average 20-30% higher than that of convolutional neural networks.
How efficient is the AI automatic punctuation tool? Master Xianchao completed the punctuation of about 20,000 characters of ancient text in one day. At a typical remuneration of 15 yuan per thousand characters, that is an economic value of 300 yuan. Even if the accuracy of automatic punctuation is counted at only 60%, it still generates 180 yuan of value per day.
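The value estimate above is plain arithmetic, and a small Python sketch makes the assumptions explicit. The function name and its parameters are hypothetical; the figures (20,000 characters, 15 yuan per thousand characters, 60% accuracy) are the ones quoted in the article, and scaling the value linearly by accuracy follows the article's own reasoning.

```python
def punctuation_value(characters: int,
                      yuan_per_thousand: float,
                      accuracy: float = 1.0) -> float:
    """Estimate the daily value of punctuation work in yuan.

    The value is the character count priced per thousand characters,
    scaled linearly by the fraction assumed to be punctuated correctly.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return characters / 1000 * yuan_per_thousand * accuracy


full_value = punctuation_value(20_000, 15)          # all output counted
discounted = punctuation_value(20_000, 15, 0.6)     # only 60% counted
print(f"{full_value:.0f} yuan, {discounted:.0f} yuan")
```

With the latest reported accuracy of the tool plugged in instead of 60%, the same function gives the corresponding estimate.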
The team keeps upgrading the automatic punctuation tool, and the latest generation has reached an accuracy of 93.3%. Because the training data comes mainly from Buddhist texts, the tool is currently best suited to punctuating Buddhist classics. However, Master Xianchao said that in the future the technology will also be applied to ancient texts in other fields such as history and literature, freeing scholars from mechanical, repetitive labor. The future workflow for proofreading ancient texts is expected to become: AI first segments the text and adds punctuation, then professional scholars proofread and revise. Master Xianchao’s team opened the automatic punctuation service online in 2018; visit Guji Cool (http://gj.cool) to try it out and apply for free API access.
Recognition and Translation: AI as a Treasure Chest for Sinicizing Buddhist Texts
In addition to automatic punctuation, Master Xianchao has applied AI to several other aspects of ancient text research.
Classical and Modern Alignment: Alignment & Translation
Classical and modern alignment refers to aligning classical Chinese with its modern Chinese translation. To achieve this with AI, Master Xianchao first constructed an aligned corpus and then designed an alignment algorithm, with good results. Using two independent indicators, similarity and difference, misaligned sentences are very easy to locate.
(Image: the Tripitaka translated and split into aligned sentences, which facilitates later retrieval and human proofreading)
The “Tripitaka” contains many specialized terms, and its corpus of translations accumulated over the ages is complex, so it is hard to handle without expertise in classical literature. Its total length runs to hundreds of millions of characters; relying on a few experts alone would mean an enormous workload. AI has therefore relieved the experts of a significant amount of work.
OCR Based on Deep Learning: Recognizing Ancient Text Characters
Most current OCR software is designed for printed text, so it cannot effectively recognize the fonts in ancient literature. Master Xianchao and his collaborators developed a new OCR engine based on the CNN+LSTM+CTC framework. They trained it on a dataset of over 70,000 complete images and 1.68 million text line images from the “Tripitaka (Goryeo version).”
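In a CNN+LSTM+CTC recognizer, the network emits one score vector per image slice along the text line, and the CTC step turns that frame sequence into characters. The sketch below shows greedy CTC decoding, the simplest standard way to do this: take the best class per frame, merge consecutive repeats, and drop the blank class. It is an illustrative assumption, not the team's decoder; the function name, the choice of index 0 for the blank, and the toy scores are all hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Decode per-frame class scores into text with greedy CTC decoding.

    frame_scores[t][k] is the score of class k at frame t. Class 0 is the
    blank; class k > 0 stands for alphabet[k - 1].
    """
    decoded: List[str] = []
    previous = BLANK
    for scores in frame_scores:
        # Best class for this frame (first index wins on ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when the class changes and is not blank,
        # so "AA" collapses to "A" while "A, blank, A" stays "AA".
        if best != previous and best != BLANK:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


def one_hot(index: int, size: int) -> List[float]:
    """Toy score vector with all weight on a single class."""
    return [1.0 if k == index else 0.0 for k in range(size)]


alphabet = "如是我聞"
best_path = [1, 1, 0, 2, 3, 3, 0, 4]  # repeats and blanks, as a net might emit
frames = [one_hot(k, len(alphabet) + 1) for k in best_path]
print(ctc_greedy_decode(frames, alphabet))
```

The blank class is what lets a line image be labeled only with its text, with no per-character positions, which suits training on text line images like those described above.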
(Image: precise character segmentation based on weakly supervised learning)
Ultimately, the OCR method they developed can perform single-character recognition, single-column recognition, and semi-automatic multi-column recognition, effectively completing the digitization of various ancient texts.
(Image: OCR software recognizing ancient texts and digitizing them)
Technology and Buddhism: Different Externalizations with Compassion at Their Core
Buddhism and technology are not far apart. We previously reported on the trend of integrating the two in the article “In This Century, Buddha Sent Robots to Promote Buddhism.” With the emergence of robots like Xianer, Machine Guanyin, and smart prayer beads, technology has already been deeply and harmoniously integrated into Buddhism.
(Image: outstanding works emerging from the integration of technology and Buddhism have attracted attention)
Another well-known monk from Longquan Temple, Master Xianxin, founder of the IT Zen Camp, was asked in an interview about the relationship between Buddhism and technology. He replied: “Technology seeks the truth of the material world. Buddhism seeks the truth of the inner world.” Many who explore science and technology initially do so with the intention of contributing to humanity, which aligns with the most compassionate pursuits proposed by Buddhism; this is the commonality between technology and Buddhism.
References:
Master Xianchao’s WeChat Official Account: “The Collision and Integration of Artificial Intelligence and Chinese Civilization”
2050 Yunqi Conference: “Master Xian: Technological Practices of Longquan Temple”
Longquan Temple Automatic Punctuation Tool: http://gj.cool/gjcool/index