On May 10, 2024, a hybrid online and offline sharing event titled “AI’s Innovative Applications in Multimodal Database Construction” was held at the Tsinghua University MEM Center on Manufacturing Street in Zhongguancun. Despite an approaching heavy rainstorm, the event drew more than twenty experts, scholars, and students from China and abroad in person, with nearly three hundred people watching online. Participants engaged in in-depth discussions on the necessity of multimodal databases for linguistic research and the development of artificial intelligence, the construction of multimodal database platforms, and their innovative applications in simulation.


Multimodal expression in human communication encompasses elements such as speech, gesture, gaze, facial expression, and emotion. Professor Li Ruofan of Beijing Language and Culture University opened by explaining, from the perspectives of cognitive states and the communicative nature of language, the advantages of multimodal corpora for uncovering fine-grained linguistic patterns. Multimodal corpora make it possible to explore and verify linguistic rules, and those rules in turn help improve the corpora themselves.

Next, guest speaker Chen Liangyu, CEO of Infinite Evolution (Beijing) Technology Co., Ltd., pointed out that linguistic theories grounded in multimodal data can improve the interpretability of large language models and pre-set processing preferences for small language models. On this basis, he introduced the AI-driven automatic multimodal data analysis platform his team is building. The platform not only performs detailed transcription and annotation of audio data, but also combines real-time and offline data extraction to analyze video data in depth and identify key multimodal information, greatly improving the quality and efficiency of multimodal data processing. Chen Liangyu further argued that the subtle multimodal cues in human communication may mark the limits of what machine simulation of human communicative ability can reach, and that the linguistic rules verified on the multimodal data platform could help build models for identifying and testing AIGC content.

During the Q&A session, attendees showed great interest in the platform’s information processing and recognition capabilities, asking questions actively and offering constructive feedback. Chen Liangyu addressed the questions one by one and engaged in deeper exchanges with the guests.

This MEM Think Tank event was distinctive and memorable, and attendees left inspired and enriched by the learning and exchange.

Multimodal databases have emerged alongside the rise of new industries represented by artificial intelligence. They support research in linguistics, language teaching, translation, and many other fields, and are finding applications in simulation recognition, human-computer interaction, video game design, robotics development, virtual customer service, lie detection, identity authentication, medical diagnosis, and rehabilitation.
Text: Liu Zhiqiang
Photography: Zhang Henan
Review: Zhang Wei