Enhancing Multi-Modal Data: MixGen from Amazon’s Li Mu Team

This article shares the paper "MixGen: A New Multi-Modal Data Augmentation," which asks: how can data augmentation be performed on multi-modal data? Amazon's Li Mu team proposes MixGen, a simple and effective method that significantly improves performance across multiple multi-modal tasks. Details are as follows: Paper link: https://arxiv.org/abs/2206.08358 Code … Read more

Latest Review on Multi-Modal 3D Object Detection in Autonomous Driving

Source | Public Account: Heart of Autonomous Driving. Autonomous vehicles require continuous environmental perception to understand the distribution of obstacles and drive safely. In particular, 3D object detection is a crucial functional module, as it simultaneously predicts the category, location, and size of surrounding objects. Autonomous cars are generally equipped with multiple sensors, including cameras and LiDAR. … Read more

Domestic Sora: Generate 16-Second High-Resolution Videos

The domestic Sora, the video large model Vidu, is an innovative technology product jointly released by Tsinghua University and the startup Shenshu Technology. Built on the self-developed U-ViT architecture, it can generate high-definition video content up to 16 seconds long … Read more

Fudan University: Latest Survey on Multi-Modal Knowledge Graphs

This article is approximately 2500 words long and is recommended as a 5-minute read. It summarizes a knowledge-based paper from Fudan University on integrating multi-modal knowledge into multi-modal knowledge graphs. Title: Multi-Modal Knowledge Graph Construction and … Read more

A Study on the Application of Multi-modal Teaching Mode in Junior High School English Grammar Teaching

Abstract: Based on the concept and characteristics of the multi-modal teaching mode, this paper takes a grammar review class as an example. It explores the practical path of integrating the multi-modal teaching mode into grammar teaching through five stages: communicative introduction, grammar presentation, expanded practice, cultural connection, and transfer output. It elaborates on the CPPCO multi-modal … Read more

An Overview of Multi-Modal Summarization

The MLNLP (Machine Learning Algorithms and Natural Language Processing) community is a well-known natural language processing community both domestically and internationally, covering NLP master's and Ph.D. students, university professors, and industry researchers. The community's vision is to promote communication between the academic and industrial communities of natural language processing and machine … Read more

ACL 2024: Cambridge Team Open Sources Pre-trained Multi-modal Retriever

This article shares the ACL 2024 paper PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers, open-sourced by the Cambridge University team. PreFLMR empowers multi-modal large model RAG applications and is the first pre-trained, general-purpose multi-modal late-interaction knowledge retriever. Paper link: https://arxiv.org/abs/2402.08327 Project homepage: https://preflmr.github.io/ Introduction The … Read more

Transformer Advances Towards Dynamic Routing: TRAR for VQA and REC SOTA

1 Introduction Due to their superior capability for modeling global dependencies, the Transformer and its variants have become the primary architecture for many vision and language tasks. However, tasks like Visual Question Answering (VQA) and Referring Expression Comprehension (REC) often require multi-modal predictions that … Read more

Embodied Intelligence and Multi-modal Language Models: Is GPT-4 Vision the Strongest Agent?

Author: PCA-EVAL Team. Affiliation: Peking University & Tencent. Abstract: Researchers from Peking University and Tencent propose PCA-EVAL, an evaluation suite for multi-modal embodied decision-making intelligence. By comparing end-to-end decision-making methods based on multi-modal models with tool-invocation methods based on LLMs, they observe that GPT-4 Vision demonstrates outstanding end-to-end decision-making capabilities from multi-modal … Read more

Overview of 26 SOTA Multi-Modal Large Language Models

Machine Heart Report, Machine Heart Editorial Team. What is the progress of multi-modal large language models? Here are 26 of the current best multi-modal large language models. The focus of the AI field is shifting from large language models (LLMs) to multi-modal capabilities. Thus, multi-modal large language models (MM-LLMs) that give LLMs multi-modal … Read more