Gemini 2.0: A New AI Model for the Era of Intelligent Agents

Overview: In an era of rapid information iteration, artificial intelligence (AI) is changing our lives at an astonishing pace. From search engines to multimodal technologies, AI's reach continues to extend, pushing the boundaries of human technology. As a pioneer in the AI field, Google DeepMind recently released its latest AI model, Gemini 2.0, heralding the … Read more

Phidata Multimodal Multi-Agent Framework Overview

This open-source agent series introduces the open-source agent frameworks currently available on the market, such as CrewAI, AutoGen, LangChain, phidata, and Swarm, discussing their strengths, weaknesses, features, results, and usage. Interested readers can follow the official account "XiaozhiAGI" for continuous updates on cutting-edge AI technologies and products, such as RAG, agents, agentic workflows, and AGI. … Read more

VideoLLaMA3: Advanced Multimodal Foundation Model

Paper: https://arxiv.org/abs/2412.09262 Code: https://github.com/DAMO-NLP-SG/VideoLLaMA3 Introduction: VideoLLaMA3 is a more advanced multimodal foundation model for image and video understanding. Its core design philosophy is vision-centric, reflected in both a vision-centric training paradigm and a vision-centric framework design. The key point of the vision-centric training paradigm is that high-quality image-text data is crucial for understanding both … Read more

WindSurf Update Testing & Open Source Multimodal AI Creation App

Hello everyone, I'm Kate. Do you remember the English version of the AI creation app I shared yesterday? A user asked whether there is a Chinese voice version. Now it's finally here, and this time it's still open source! In this video, I will take you on a deep dive into the … Read more

Performance of 2B Parameters Surpasses Mistral-7B: Wall Intelligence Multimodal Edge Model Open Source

Machine Heart reports (Editor: Zenan). Low-cost devices can run it locally. As large models continue to evolve toward larger scales, recent progress has also been made in optimization and deployment. On February 1, Wall Intelligence, in collaboration with the Tsinghua NLP Laboratory, officially launched its flagship edge large model "Wall MiniCPM" in Beijing. The new generation large … Read more

Alibaba’s 7B Multimodal Document Understanding Model Achieves New SOTA

Contributed by the mPLUG team, via the QbitAI WeChat official account. A new SOTA in multimodal document understanding! Alibaba's mPLUG team has released its latest open-source work, mPLUG-DocOwl 1.5, proposing a series of solutions to four major challenges: high-resolution image text recognition, general document structure understanding, instruction following, and external knowledge incorporation. Without further ado, let's take a … Read more

MM-Interleaved: The Ultimate Open-Source Multimodal Generation Model

A Machine Heart column by the Machine Heart editorial team. In the past few months, with the successive releases of major works such as GPT-4V, DALL-E 3, and Gemini, multimodal generative large models, widely seen as "the next step toward AGI," have rapidly become the focus of scholars worldwide. Imagine: an AI that not only chats but also has "eyes" that can understand images, and … Read more

Handling Noisy Imbalanced Multimodal Data: A Review

Multimodal fusion aims to integrate information from various modalities to achieve more accurate predictions. Significant progress has been made in multimodal fusion across a wide range of scenarios including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion in low-quality data environments remains largely unexplored. This paper reviews the common challenges and recent … Read more
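The idea of integrating predictions from several modalities can be sketched with a simple weighted late-fusion scheme. This is an illustration only, not the method of any paper surveyed above; the function name, the toy probabilities, and the reliability weights are all assumptions chosen for the example. Down-weighting a noisy modality is one minimal way to make fusion more robust to low-quality inputs.

```python
import numpy as np

def late_fusion(probs_by_modality, weights=None):
    """Combine per-modality class probabilities by weighted averaging.

    probs_by_modality: list of 1-D arrays, each a probability
    distribution over the same classes (one per modality).
    weights: optional per-modality reliability weights; uniform if None.
    """
    probs = np.stack(probs_by_modality)          # (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(probs_by_modality))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize weights
    fused = (w[:, None] * probs).sum(axis=0)     # weighted average
    return fused / fused.sum()                   # renormalize

# A noisy audio modality is down-weighted relative to a cleaner
# vision modality, so the fused prediction follows vision.
vision = np.array([0.7, 0.2, 0.1])
audio  = np.array([0.3, 0.4, 0.3])
fused  = late_fusion([vision, audio], weights=[0.8, 0.2])
```

With weights 0.8/0.2 the fused distribution is 0.8·vision + 0.2·audio, so the top class stays the one vision is confident about.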

Multimodal Opportunities in the Post-GPT Era

Author: Wang Yonggang, Founder/CEO of SeedV Lab, Executive Dean of AI Academy at Innovation Works The advent of ChatGPT/GPT-4 has completely transformed the research landscape in the NLP field and ignited the first spark towards AGI with its multimodal potential. Thus, the era of AI 2.0 has arrived. But where will the technological train of … Read more

Ant Group’s Technical Exploration in Video Multimodal Retrieval

Introduction: This article shares the research achievements of Ant Group's multimodal cognition team in video multimodal retrieval over the past year, focusing on how to improve the effectiveness of video-text semantic retrieval and how to perform video-source retrieval efficiently. Main sections include: 1. Overview; 2. Video-Text Semantic Retrieval; 3. Video-Video … Read more
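The core of video-text semantic retrieval is ranking videos by the similarity between a text query embedding and per-video embeddings. The sketch below is a generic dual-encoder retrieval step, not Ant Group's actual system; the function name and the toy embeddings are assumptions, standing in for the outputs of trained text and video encoders (e.g. CLIP-style towers).

```python
import numpy as np

def retrieve(text_emb, video_embs, k=2):
    """Rank videos by cosine similarity to a text query embedding.

    text_emb: 1-D query embedding; video_embs: (n_videos, dim) matrix.
    Returns the indices of the top-k most similar videos, best first.
    """
    t = text_emb / np.linalg.norm(text_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ t                      # cosine similarity per video
    return np.argsort(-sims)[:k]     # indices of the k best matches

# Toy example: video 1 points nearly the same direction as the query.
query  = np.array([1.0, 0.0])
videos = np.array([[0.0, 1.0],    # orthogonal to the query
                   [0.9, 0.1],    # closest to the query
                   [0.5, 0.5]])   # in between
ranked = retrieve(query, videos, k=2)
```

At production scale the same dot-product ranking is usually delegated to an approximate nearest-neighbor index rather than computed exhaustively.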