Visual-Language (VL) Intelligence: Tasks, Representation Learning, and Large Models

Originally from AI Technology Review. Compiled by Jocelyn; edited by Chen Caixian. This article provides a comprehensive chronological survey of visual-language (VL) intelligence, dividing the field's development into three stages: the first stage, from 2014 to 2018, during which specialized models were designed for different tasks; the second stage, from … Read more

Enhancing Multi-Modal Data: MixGen from Amazon’s Li Mu Team

This article shares the paper "MixGen: A New Multi-Modal Data Augmentation". How should data augmentation be performed on multi-modal data? Amazon's Li Mu team proposes MixGen, a simple and effective method that significantly improves performance across multiple multi-modal tasks. Details are as follows: Paper link: https://arxiv.org/abs/2206.08358 Code … Read more
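A minimal sketch of the MixGen idea (linearly interpolate two images and concatenate their captions); the reverse-order batch pairing and the `lam` value here are illustrative assumptions, not necessarily the paper's exact settings:

```python
import torch

def mixgen(images: torch.Tensor, texts: list, lam: float = 0.5):
    """MixGen-style augmentation sketch: blend image pairs pixel-wise
    and concatenate the corresponding captions."""
    partner_images = torch.flip(images, dims=[0])          # pair sample i with sample B-1-i
    mixed_images = lam * images + (1.0 - lam) * partner_images
    mixed_texts = [a + " " + b for a, b in zip(texts, reversed(texts))]
    return mixed_images, mixed_texts

# Usage: augment a batch of four image-text pairs.
images = torch.rand(4, 3, 224, 224)
texts = ["a dog on grass", "a red car", "two cats", "a city at night"]
aug_images, aug_texts = mixgen(images, texts)
```

Since text has no natural notion of interpolation, the captions are kept intact and joined rather than mixed, so the semantics of both original pairs survive in the augmented sample.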

Qwen2.5-VL: Alibaba’s Latest Open Source Visual Language Model

🚀 Quick Read Model Introduction: Qwen2.5-VL is the flagship open-source visual language model from Alibaba's Tongyi Qianwen team, available in three sizes: 3B, 7B, and 72B. Main Features: supports visual understanding, long-video processing, structured output, and device operation. Technical Principles: chains a ViT visual encoder in series with the Qwen2 language model, and supports multi-modal rotary position encoding … Read more
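To make the "ViT in series with Qwen2" structure concrete, here is a toy sketch of a vision encoder whose patch features are projected into a language model's embedding space. All module names and sizes are hypothetical placeholders, not Qwen2.5-VL's actual implementation (which adds further machinery such as multi-modal rotary position encoding):

```python
import torch
import torch.nn as nn

class VLSeriesModel(nn.Module):
    """Sketch of a ViT -> projector -> LLM series structure."""
    def __init__(self, vit, llm, vit_dim: int, llm_dim: int):
        super().__init__()
        self.vit = vit                             # yields (B, N_img, vit_dim) patch features
        self.proj = nn.Linear(vit_dim, llm_dim)    # maps vision features into LLM space
        self.llm = llm                             # consumes (B, N, llm_dim) embeddings

    def forward(self, pixel_values, text_embeds):
        img_tokens = self.proj(self.vit(pixel_values))        # (B, N_img, llm_dim)
        inputs = torch.cat([img_tokens, text_embeds], dim=1)  # prepend image "tokens"
        return self.llm(inputs)

# Toy stand-ins (shapes only, not real models).
class ToyViT(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.patch = nn.Conv2d(3, dim, kernel_size=16, stride=16)
    def forward(self, x):                                     # (B, 3, H, W)
        return self.patch(x).flatten(2).transpose(1, 2)       # (B, N_img, dim)

class ToyLLM(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
    def forward(self, embeds):
        return self.block(embeds)

model = VLSeriesModel(ToyViT(64), ToyLLM(128), vit_dim=64, llm_dim=128)
out = model(torch.rand(2, 3, 224, 224), torch.rand(2, 10, 128))
```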

Embodied Intelligence and Multi-modal Language Models: Is GPT-4 Vision the Strongest Agent?

Author: PCA-EVAL Team. Affiliation: Peking University & Tencent. Abstract: Researchers from Peking University and Tencent propose PCA-EVAL, an evaluation suite for multi-modal embodied decision-making intelligence. By comparing end-to-end decision-making methods based on multi-modal models with tool-invocation methods based on LLMs, they observe that GPT-4 Vision demonstrates outstanding end-to-end decision-making capabilities from multi-modal … Read more
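The two paradigms compared in the teaser can be sketched as follows; every function and signature here is a hypothetical placeholder, not PCA-EVAL's actual code:

```python
def end_to_end_decision(image, instruction, vlm):
    """End-to-end paradigm: a multi-modal model (e.g. GPT-4 Vision)
    maps raw pixels plus the instruction directly to an action."""
    return vlm.generate(image=image, prompt=instruction)

def tool_invocation_decision(image, instruction, llm, tools):
    """Tool-invocation paradigm: vision tools first turn the image into
    text (captions, detections), then a text-only LLM decides."""
    observations = [tool(image) for tool in tools]   # e.g. captioner, detector
    context = instruction + "\nObservations:\n" + "\n".join(observations)
    return llm.generate(prompt=context)
```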

HuggingFace’s Experiments on Effective Tricks for Multimodal Models

From Xi Xiaoyao Technology Says. Original author: Xie Nian Nian. When building multimodal large models there are many effective tricks, such as using cross-attention to integrate image information into the language model, or directly concatenating the image hidden-state sequence with the text embedding sequence as input to the language model (both are sketched below). However, the reasons why these tricks … Read more
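A minimal sketch of the two fusion tricks named above, with illustrative shapes and layer sizes (real models wrap these in full transformer stacks):

```python
import torch
import torch.nn as nn

dim = 128  # shared hidden size; all names and values here are illustrative

# Trick 1: cross-attention fusion. Text tokens attend to image features
# inside the language model (Flamingo-style).
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
text_hidden = torch.rand(2, 10, dim)    # (B, N_text, dim) from the LM
image_feats = torch.rand(2, 49, dim)    # (B, N_img, dim) from a vision encoder
fused, _ = cross_attn(query=text_hidden, key=image_feats, value=image_feats)

# Trick 2: sequence concatenation. Image hidden states are projected and
# prepended to the text embeddings as ordinary input "tokens" (LLaVA-style).
proj = nn.Linear(dim, dim)
inputs_embeds = torch.cat([proj(image_feats), text_hidden], dim=1)  # (B, 59, dim)
```

Cross-attention keeps the language model's input sequence purely textual and injects vision at selected layers, while concatenation treats image features as extra tokens and lets ordinary self-attention do the fusion.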