How Large Models Understand Video? A Review of MM-LLMs in Long Video Comprehension

How Large Models Understand Video? A Review of MM-LLMs in Long Video Comprehension

MLNLP community is a well-known machine learning and natural language processing community in China and abroad, covering NLP master’s and doctoral students, university teachers, and corporate researchers. The Vision of the Community is to promote communication and progress between the academic and industrial fields of natural language processing and machine learning, especially for beginners. Reprinted … Read more

Research Progress on Multimodal Large Language Models

Research Progress on Multimodal Large Language Models

About 3800 words, recommended reading time is 7 minutes. This article provides a comprehensive overview of MM-LLMs. 1. Introduction Multimodal large language models (MM-LLMs) have made significant progress over the past year by optimizing modality alignment and human intent alignment, enhancing existing unimodal foundational models (LLMs) to support various MM tasks. This article provides a … Read more