Embodied Intelligence and Multi-modal Language Models: Is GPT-4 Vision the Strongest Agent?
Author: PCA-EVAL Team
Affiliation: Peking University & Tencent
Abstract: Researchers from Peking University and Tencent have proposed PCA-EVAL, an evaluation set for multi-modal embodied decision-making. They compared two approaches: end-to-end decision-making with multi-modal models, and tool invocation driven by LLMs. In this comparison, GPT-4 Vision showed outstanding end-to-end decision-making capabilities from multi-modal …