Overview of Best AI Agent Papers for 2024

This is an article published in the “Heroic Journey” column by the Hero.

2025 Issue No. 15 Total Issue No. 98

This article has a total of 6191 words and takes about 10 minutes to read.

Hello, everyone!

Since the emergence of GenAI (Generative AI) driven by ChatGPT, the topic of artificial intelligence seems to be closer to our work and life than ever before.

Humans both create and develop AI while being influenced and changed by it. In this unprecedented era of great change, the best way for humans to survive in a world of co-creation and mutual influence with AI is to engage in high-dimensional thinking and listen to multiple perspectives.

Thus, the Hero will take you on a global tour to see what kind of digital intelligence stories are unfolding around the world.

Join the Hero in AI and keep pace with the Hero!

The Hero’s Golden Sayings

For questions about digital transformation, consult the Hero with confidence.

(Like and share this article to receive a copy of the latest industry digital transformation report, limited to the first 10 people.)

Original Link:

Hero Original + Juteq Official Website

Main Text:

AI agents have become one of the most popular technology trends of 2024, but they are still new and require significant improvements. This year, we have seen outstanding research and advancements in agents that even prompt us to rethink the overall concept.

Because top companies and universities published so many novel research papers, we have grouped this year’s list into categories, from frameworks to surveys.

Let’s take a look at this year’s collection of research papers that make AI agents more interesting:

Framework Papers

1. Microsoft’s Magentic-One

Magentic-One is a generalist multi-agent system from Microsoft, built on the AutoGen framework, for solving open-ended web and file-based tasks across various domains.

Main Features:

Multi-agent architecture for handling complex tasks
Capable of processing web-based and file-based inputs
Universal approach for cross-domain versatility

The system is organized around a lead Orchestrator agent that plans the task, tracks progress, and re-plans when a step fails. It directs four specialized agents: WebSurfer (web browsing), FileSurfer (reading local files), Coder (writing code), and ComputerTerminal (executing code).
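A coordinator that routes sub-tasks to specialized agents, as described above, can be sketched in a few lines. This is only an illustration under our own assumptions: the `Coordinator` class, the three toy agents, and the hand-written plan are hypothetical and are not Magentic-One's actual API.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical specialist agents: each takes a sub-task and returns a result string.
def retrieve(subtask: str) -> str:
    return f"[retrieved] {subtask}"


def reason(subtask: str) -> str:
    return f"[reasoned] {subtask}"


def write(subtask: str) -> str:
    return f"[written] {subtask}"


class Coordinator:
    """Central coordinator that sends each planned step to a named specialist."""

    def __init__(self, agents: Dict[str, Callable[[str], str]]) -> None:
        self.agents = agents
        self.log: List[Tuple[str, str]] = []  # (agent name, result) per finished step

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        results = []
        for agent_name, subtask in plan:
            if agent_name not in self.agents:
                raise KeyError(f"no agent registered as {agent_name!r}")
            result = self.agents[agent_name](subtask)
            self.log.append((agent_name, result))  # keep a simple progress record
            results.append(result)
        return results


coordinator = Coordinator({"retrieval": retrieve, "reasoning": reason, "output": write})
plan = [
    ("retrieval", "find the paper's abstract"),
    ("reasoning", "summarize the main claim"),
    ("output", "draft a two-line summary"),
]
results = coordinator.run(plan)
for line in results:
    print(line)
```

In a real system the plan would come from an LLM and be revised as steps succeed or fail; here it is fixed so the routing logic stays visible.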

2. Agent-oriented Planning in Multi-Agent Systems

This framework introduces a new method for planning in multi-agent systems using a meta-agent architecture. The system aims to enhance coordination and decision-making among multiple AI agents.

Main Features:

Meta-agent architecture for supervised planning
Improved coordination among multiple agents
Enhanced decision-making capabilities

We can envision a system where the meta-agent supervises and coordinates the planning activities of individual agents. This meta-agent has a global view of tasks and can optimize the overall planning strategy by considering the strengths and limitations of each agent in the system.
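To make the idea of a meta-agent weighing each agent's strengths more concrete, here is a minimal sketch. The `Agent` dataclass, the skill scores, and the `assign_subtasks` function are all hypothetical names chosen for illustration; the paper's own planning method is more involved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Agent:
    name: str
    skills: Dict[str, float]  # skill -> proficiency in [0, 1] (made-up numbers)


def assign_subtasks(subtasks: Dict[str, str], agents: List[Agent]) -> Dict[str, str]:
    """Meta-agent step: give each sub-task to the agent scoring highest on its skill."""
    assignment = {}
    for subtask, skill in subtasks.items():
        best = max(agents, key=lambda a: a.skills.get(skill, 0.0))
        if best.skills.get(skill, 0.0) == 0.0:
            raise ValueError(f"no agent can handle skill {skill!r}")
        assignment[subtask] = best.name
    return assignment


agents = [
    Agent("searcher", {"search": 0.9, "math": 0.2}),
    Agent("solver", {"math": 0.95}),
    Agent("coder", {"code": 0.8, "math": 0.5}),
]
subtasks = {
    "look up population data": "search",
    "compute growth rate": "math",
    "plot the result": "code",
}
assignment = assign_subtasks(subtasks, agents)
print(assignment)
```

The point of the sketch is the global view: the assignment is made by something that sees every agent's profile, not by the agents themselves.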

3. Amazon’s KGLA

Amazon’s KGLA (Knowledge Graph-enhanced Agent) framework aims to improve knowledge retrieval across various domains. The system utilizes knowledge graphs to enhance the functionality of AI agents.

Main Features:

Integration of knowledge graphs with AI agents
Improved knowledge retrieval capabilities
Applicability across multiple domains

The KGLA architecture may consist of several key components:

Knowledge Graph: Structured representation of domain knowledge
Agent Interface: Allows agents to query and interact with the knowledge graph
Retrieval Mechanism: Efficiently extracts relevant information from the graph
Reasoning Module: Combines retrieved knowledge with agent functionality

This integration enables agents to access and utilize structured knowledge more effectively, potentially improving their performance in complex tasks requiring extensive domain knowledge.
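The retrieval step described above can be pictured with a toy knowledge graph of (subject, relation, object) triples. This is a sketch under our own assumptions: `KnowledgeGraph`, `retrieve`, and `build_context` are hypothetical names, and the triples are invented; it is not KGLA's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class KnowledgeGraph:
    """Tiny in-memory knowledge graph indexed by subject."""

    def __init__(self, triples: List[Triple]) -> None:
        self.by_subject: Dict[str, List[Triple]] = defaultdict(list)
        for subject, relation, obj in triples:
            self.by_subject[subject].append((subject, relation, obj))

    def retrieve(self, entity: str, hops: int = 1) -> List[Triple]:
        """Return triples reachable from `entity` within `hops` steps (breadth-first)."""
        frontier, seen, found = [entity], {entity}, []
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for triple in self.by_subject.get(node, []):
                    found.append(triple)
                    if triple[2] not in seen:
                        seen.add(triple[2])
                        next_frontier.append(triple[2])
            frontier = next_frontier
        return found


def build_context(kg: KnowledgeGraph, entity: str, hops: int = 1) -> str:
    """Turn retrieved triples into plain-text facts an agent could put in its prompt."""
    return "\n".join(f"{s} {r} {o}" for s, r, o in kg.retrieve(entity, hops))


kg = KnowledgeGraph([
    ("key", "opens", "chest"),
    ("chest", "contains", "sword"),
    ("sword", "defeats", "troll"),
])
context = build_context(kg, "key", hops=2)
print(context)
```

Limiting the number of hops keeps the retrieved context small, which matters when the facts are fed into an LLM prompt.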

4. Harvard University’s FINCON

FINCON is an LLM-based multi-agent framework developed by researchers at Harvard University, designed specifically for financial tasks. It uses conceptual verbal reinforcement to improve agent performance.

Main Features:

Specialized in financial domain tasks
Multi-agent architecture
Conceptual verbal reinforcement

The FINCON architecture may include:

1. Multiple specialized agents: each focusing on different aspects of financial tasks

2. Conversational interface: allows agents to communicate and learn from each other

3. Verbal reinforcement module: provides textual feedback to improve agent performance

4. Task coordinator: manages the distribution and integration of sub-tasks

The framework’s focus on financial tasks and its use of verbal reinforcement make it particularly suitable for complex financial decision-making and analysis scenarios.

5. OmniParser for GUI-based AI Agents

OmniParser is a screen-parsing method from Microsoft for vision-based GUI agents. It converts UI screenshots into structured elements that an agent can reason about and act on, improving how AI agents interact with graphical user interfaces.

Main Features:

Specialized for GUI navigation
Parsing of screenshots into structured, interactable elements
Enhanced interaction of AI agents with graphical interfaces

The OmniParser architecture may include:

Visual Element Detector: identifies UI components in images
Semantic Interpreter: understands the function and context of detected elements
Navigation Planner: determines the optimal path for GUI interactions
Action Executor: performs planned interactions with the GUI

This system focuses on visual parsing and GUI navigation, making it particularly valuable for tasks involving automated software testing, user interface analysis, and the development of more intuitive AI assistants.

6. IBM’s AutoRestTest

AutoRestTest is a framework developed by IBM for testing REST APIs using multi-agent and semantic graph approaches. The system aims to improve the efficiency and effectiveness of the API testing process.

Main Features:

Specialized for REST API testing
Utilizes a multi-agent approach
Combines semantic graphs for improved understanding

The AutoRestTest architecture may include:

API Parser: extracts API specifications and structures
Semantic Graph Generator: creates graphical representations of API relationships
Test Case Generator: designs test scenarios based on the semantic graph
Multi-agent Executor: runs tests using multiple dedicated agents
Results Analyzer: interprets test results and identifies issues

This framework employs semantic graphs and multi-agent execution, allowing for more comprehensive and intelligent API testing, potentially uncovering issues that traditional testing methods may overlook.
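One way a graph of API relationships can guide testing is by ordering calls so that each operation runs only after the operations it depends on. The sketch below assumes a hypothetical four-endpoint API and a hand-written dependency map; it uses Python's standard `graphlib` module (Python 3.9+) and is not AutoRestTest's own algorithm.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical API: each operation maps to the operations whose output it needs,
# e.g. an order can only be created once a user exists.
dependencies: Dict[str, Set[str]] = {
    "POST /users": set(),
    "POST /orders": {"POST /users"},
    "GET /orders/{id}": {"POST /orders"},
    "DELETE /users/{id}": {"POST /users", "GET /orders/{id}"},
}


def plan_test_sequence(deps: Dict[str, Set[str]]) -> List[str]:
    """Order operations so every prerequisite is called before its dependants."""
    return list(TopologicalSorter(deps).static_order())


order = plan_test_sequence(dependencies)
for step, operation in enumerate(order, start=1):
    print(step, operation)
```

A test generator could then walk this sequence, feeding IDs returned by earlier calls into later ones.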

7. Microsoft’s AIOpsLab

AIOpsLab is a comprehensive framework developed by Microsoft for designing, developing, and evaluating autonomous AIOps (AI for IT Operations) agents. The system aims to advance the field of AI-driven IT operations.

Main Features:

Holistic approach to AIOps agent development
Supports design, implementation, and evaluation phases
Focus on autonomous operations in IT environments

The AIOpsLab architecture may include:

Agent Development Environment: tools for creating and training AIOps agents
Simulated IT Infrastructure: replicates real IT environments for testing
Scenario Generator: creates diverse operational challenges
Performance Evaluation Module: assesses agent effectiveness
Feedback Loop: allows for iterative improvements to agents

This framework provides a standardized platform for advancing the AIOps field, potentially achieving more efficient and reliable IT operations management.

8. Alibaba’s GraphReader

GraphReader is a framework proposed by Alibaba that enhances the long-context capabilities of LLMs using graph-based agents, improving their performance on tasks that require understanding a large amount of context.

Main Features:

Graph-based long text representation
Agent-driven graph exploration
Enhanced long-context understanding for LLMs

The GraphReader architecture includes:

Text-to-Graph Converter: transforms long texts into graphical structures
Graph Explorer Agent: navigates the graph to extract relevant information
LLM Interface: integrates graph-based knowledge with LLM functionality
Response Generator: generates responses based on graph exploration and LLM processing

This innovative approach enables LLMs to effectively handle longer contexts by leveraging the structure of graphs and targeted exploration, potentially overcoming the limitations of traditional sequential processing.
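The text-to-graph and exploration steps can be illustrated with a toy example in which chunks of a long document become nodes, and two chunks are linked when they share a key term. Everything here is an assumption made for illustration: the chunk IDs, the key terms, and the `build_graph` and `explore` functions are hypothetical and far simpler than the method in the paper.

```python
from collections import deque
from typing import Dict, List, Set


def build_graph(chunks: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Link two chunks whenever they share at least one key term."""
    graph: Dict[str, Set[str]] = {cid: set() for cid in chunks}
    ids = list(chunks)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if chunks[a] & chunks[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def explore(graph: Dict[str, Set[str]], chunks: Dict[str, Set[str]],
            start: str, target_term: str, max_visits: int = 10) -> List[str]:
    """Breadth-first walk from `start`; stop at the first chunk mentioning the target."""
    visited: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_visits:
        cid = queue.popleft()
        visited.append(cid)
        if target_term in chunks[cid]:
            break
        for neighbor in sorted(graph[cid]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return visited


# Key terms per chunk (in practice an LLM would extract these from the text).
chunks = {
    "c1": {"alice", "paris"},
    "c2": {"paris", "museum"},
    "c3": {"museum", "painting"},
    "c4": {"weather"},
}
graph = build_graph(chunks)
path = explore(graph, chunks, start="c1", target_term="painting")
print(path)
```

Note that the unrelated chunk `c4` is never read: the agent only follows links that connect to where it started, which is the intuition behind handling long texts without processing them sequentially.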

9. DynaSaur by Adobe and the University of Maryland

DynaSaur is a framework in which LLM agents dynamically create and compose actions online, rather than choosing from a fixed, predefined set. The system aims to enhance the flexibility and adaptability of AI agents in various tasks.

Main Features:

Dynamic action creation and combination
Online learning and adaptation
Higher flexibility for LLM agents

While specific architecture details are not provided, we can infer that DynaSaur may include:

Action Generator: creates new operations based on task requirements
Combination Module: combines actions to form complex behaviors
Online Learning Component: adjusts agent behavior in real-time
Task Analyzer: determines appropriate actions for given situations

This framework allows for the dynamic generation and combination of actions, enabling the use of more general and adaptive AI agents, potentially improving performance across various tasks and environments.
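Dynamic action creation and combination can be sketched as an action library that grows at runtime. In the sketch below, the source strings stand in for code an LLM would generate; `ActionLibrary` and both actions are hypothetical names, and calling `exec` on generated code is unsafe outside a sandbox, so treat this purely as an illustration rather than DynaSaur's implementation.

```python
from typing import Any, Callable, Dict


class ActionLibrary:
    """Stores actions created at runtime so later actions can reuse earlier ones."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[..., Any]] = {}

    def add_from_source(self, name: str, source: str) -> None:
        # Earlier actions are placed in the namespace so new code can call them.
        # WARNING: exec on untrusted code is unsafe; a real system needs a sandbox.
        namespace: Dict[str, Any] = dict(self.actions)
        exec(source, namespace)
        if name not in namespace or not callable(namespace[name]):
            raise ValueError(f"source did not define a function named {name!r}")
        self.actions[name] = namespace[name]

    def call(self, name: str, *args: Any) -> Any:
        return self.actions[name](*args)


# Stand-ins for code an LLM agent might write while solving a task.
WORD_COUNT_SRC = '''
def word_count(text):
    return len(text.split())
'''

AVG_WORDS_SRC = '''
def avg_words(texts):
    # Composes the previously created word_count action.
    return sum(word_count(t) for t in texts) / len(texts)
'''

library = ActionLibrary()
library.add_from_source("word_count", WORD_COUNT_SRC)
library.add_from_source("avg_words", AVG_WORDS_SRC)
average = library.call("avg_words", ["a b", "c d e f"])
print(average)
```

The second action is defined in terms of the first, which is the "combination" half of the idea: each new action enlarges the vocabulary available to the next one.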

10. ShowUI: A Vision-Language-Action Model for GUI Visual Agents (Microsoft)

ShowUI is a vision-language-action model developed by Microsoft researchers to improve UI element recognition in GUI-based AI agents. It aims to enhance how AI agents interact with graphical user interfaces.

Main Features:

Specialized for GUI element recognition
Integration of visual, language, and action modeling
Improved performance of GUI-based AI agents

The ShowUI architecture may include:

Visual Encoder: processes GUI images to recognize elements
Language Model: interprets textual information in the GUI
Action Predictor: determines appropriate interactions with GUI elements
Multi-modal Fusion Module: integrates visual, textual, and action information

This model focuses on GUI element recognition and interaction, potentially significantly improving the performance of AI agents in tasks involving software testing, user interface analysis, and automated GUI navigation.

11. Automated Design of Agentic Systems (University of British Columbia)

This framework focuses on the automated invention of novel components (building blocks) and their combination to create innovative agents. It aims to advance the field of AI agent design by introducing automation and creativity into the process.

Main Features:

Automated invention of agent components
Innovative combination of components
Creation of innovative AI agents

Although specific architecture details are not provided, we can envision a system that includes:

1. Component Generator: creates new agent building blocks

2. Compatibility Analyzer: determines how components can be combined

3. Agent Assembly: builds agents from compatible components

4. Performance Evaluator: assesses the effectiveness of created agents

5. Evolutionary Optimizer: iteratively improves agent design

This automated approach to agent design may lead to the discovery of novel and efficient AI architectures, potentially pushing the field beyond human-designed systems.
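The generate, evaluate, and improve loop envisioned above can be shown with a deliberately tiny search over combinations of building blocks. All component names, scores, and the cost value are invented for illustration, and the toy `evaluate` function stands in for running a candidate agent on real tasks; this is not the search procedure used in the paper.

```python
import random
from typing import Dict, FrozenSet, Tuple

# Hypothetical building blocks with made-up benefit scores.
COMPONENTS: Dict[str, float] = {
    "chain_of_thought": 0.30,
    "self_reflection": 0.25,
    "tool_use": 0.20,
    "debate": 0.10,
    "retry": 0.05,
}
COST_PER_COMPONENT = 0.08  # made-up penalty for added complexity


def evaluate(design: FrozenSet[str]) -> float:
    """Toy stand-in for benchmarking an agent built from `design`."""
    return sum(COMPONENTS[c] for c in design) - COST_PER_COMPONENT * len(design)


def mutate(design: FrozenSet[str], rng: random.Random) -> FrozenSet[str]:
    """Add or remove one randomly chosen component."""
    component = rng.choice(sorted(COMPONENTS))
    return design ^ {component}


def search(generations: int = 200, seed: int = 0) -> Tuple[FrozenSet[str], float]:
    rng = random.Random(seed)
    start: FrozenSet[str] = frozenset()
    archive = {start: evaluate(start)}  # every design evaluated so far
    for _ in range(generations):
        parent = max(archive, key=archive.get)  # build on the best design so far
        child = mutate(parent, rng)
        archive.setdefault(child, evaluate(child))
    best = max(archive, key=archive.get)
    return best, archive[best]


best_design, best_score = search()
print(sorted(best_design), round(best_score, 2))
```

The search keeps the four components whose benefit exceeds their cost and leaves out `retry`, without being told which combination to prefer.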

Experimentation and Analysis Papers

1. Can Graph Learning Improve Planning for LLM-based Agents?

Microsoft’s research demonstrates how graph learning can enhance the planning capabilities of LLM-based agents, especially when using GPT-4 as the core model. This groundbreaking study provides empirical evidence for integrating graph structures into agent planning systems.

Main Features:

High-level graph learning integration with LLMs
GPT-4 core model optimization
Enhanced planning capabilities for AI agents

The architecture includes:

Graph Learning Module: processes and analyzes graph structures
Planning Optimizer: enhances decision-making
GPT-4 Integration Layer: connects graph learning with language models
Performance Analysis System: measures and validates improvements

This research significantly advances the field of AI planning systems by showcasing the practical advantages of graph learning integration.

2. Thousand-Individual Generative Agent Simulation – Stanford University and Google DeepMind

In this collaboration, Stanford University and Google DeepMind simulated the attitudes and behaviors of 1,000 real individuals, building each agent from a two-hour audio interview with that person.

Main Features:

Large-scale behavior simulation
Efficient audio data processing
Advanced generative modeling

The architecture includes:

Audio Processing Engine: analyzes and extracts behavior patterns
Simulation Generator: creates individual behavior models
Scaling Module: manages large-scale simulations
Validation System: ensures the accuracy of simulated behaviors

This breakthrough opens new possibilities for large-scale behavioral modeling and simulation.

3. ByteDance’s Bug Fix Analysis

ByteDance conducted comprehensive testing to identify the most effective LLMs for automated bug fixing, providing valuable insights for implementing agent-based code repair systems.

Main Features:

Automated bug detection and analysis
LLM performance comparison framework
Real-time code fixing capabilities

The architecture includes:

Defect Detection Engine: identifies and categorizes code issues
LLM Integration Layer: connects multiple language models
Code Analysis Module: evaluates repair suggestions
Performance Monitoring System: tracks repair success rates

This research significantly advances the processes of automated code maintenance and quality assurance.

4. Google DeepMind’s Improved Multi-Agent Debate System with Sparse Communication Topology

Research on a multi-agent debate system with sparse communication topology shows improved performance despite limited information sharing.

Main Features:

Sparse communication optimization
Enhanced agent debate protocols
Efficient information sharing mechanisms

The architecture includes:

Communication Topology Manager: optimizes information flow
Debate Protocol Engine: manages agent interactions
Information Sharing Module: controls data exchange
Performance Analysis System: measures communication efficiency

This breakthrough enhances our understanding of efficient multi-agent communication systems.
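The difference between a fully connected and a sparse communication topology is easy to see by counting messages. The sketch below compares a ring (each agent hears only its two neighbors) with an all-to-all layout, using a simple majority-vote update as a stand-in for an LLM revising its answer. The function names and the voting rule are our own assumptions, not the protocol from the paper.

```python
from collections import Counter
from typing import Dict, List

Topology = Dict[int, List[int]]  # agent index -> indices of agents it listens to


def fully_connected(n: int) -> Topology:
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n: int) -> Topology:
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def messages_per_round(topology: Topology) -> int:
    """Each listened-to neighbor costs one message per round."""
    return sum(len(neighbors) for neighbors in topology.values())


def debate_round(answers: List[str], topology: Topology) -> List[str]:
    """Each agent adopts the majority among itself and its neighbors (ties keep its own)."""
    updated = []
    for i, own in enumerate(answers):
        votes = Counter([own] + [answers[j] for j in topology[i]])
        top, count = votes.most_common(1)[0]
        updated.append(top if count > votes[own] else own)
    return updated


answers = ["A", "A", "B", "A", "A", "B"]
dense, sparse = fully_connected(6), ring(6)
dense_result = debate_round(answers, dense)
sparse_result = debate_round(answers, sparse)
print(messages_per_round(dense), dense_result)
print(messages_per_round(sparse), sparse_result)
```

In this toy case both layouts reach the same consensus in one round, but the ring uses 12 messages instead of 30. Whether sparse layouts also help answer quality is the empirical question the paper studies; the sketch only shows the cost side.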

5. Improving AI Agents through Symbolic Learning

A study of agent symbolic learning, a framework that lets language agents improve themselves from experience. It treats an agent pipeline as a symbolic network whose learnable “weights” are its prompts, its tools, and the way they are stacked together, and it optimizes them using natural-language counterparts of loss, gradients, and back-propagation.

Main Features:

Agent pipeline treated as a symbolic network
Prompts and tools as learnable components
Natural-language analogues of loss and gradients

The architecture may include:

Symbolic Network: the agent pipeline of prompts and tools
Language Loss: a textual assessment of how well a run went
Language Gradients: textual feedback propagated back through each step
Symbolic Optimizer: rewrites prompts and tools based on that feedback

This approach points toward agents that learn from experience rather than relying solely on hand-tuned prompts.

Survey Papers

1. Survey on LLM-based Multi-Agent Systems

A comprehensive study of the progress and challenges of LLM-MA systems, focusing on applications in problem-solving and world simulation scenarios.

Main Features:

Comprehensive analysis of LLM-MA systems
Progress tracking and challenge identification
Application-centered evaluation framework

The architecture includes:

System Analysis Framework: evaluates current LLM-MA implementations
Challenge Mapping Module: identifies key obstacles and limitations
Application Evaluation Engine: reviews practical applications
Future Direction Predictor: outlines projected development trajectories

This survey provides important insights for advancing the development of LLM-based multi-agent systems.

2. Survey of LLM-brained GUI Agents

A broad analysis of the evolution and complexity of GUI-based agents across various domains, highlighting key developments and challenges.

Main Features:

Historical evolution analysis
Cross-domain complexity assessment
Implementation pattern evaluation

The architecture includes:

1. Evolution Tracker: maps development progress

2. Complexity Analysis Engine: assesses implementation challenges

3. Domain Comparison Module: evaluates cross-domain applications

4. Pattern Recognition System: identifies successful implementations

This survey greatly aids in understanding the development patterns and future directions of GUI agents.

3. Dawn of GUI Agents: Case Study of Sonnet 3.5

A comprehensive analysis of Anthropic’s Computer Use capability (Claude 3.5 Sonnet) across multiple domains, providing practical insights into real-world applications.

Main Features:

Multi-domain usability testing
Performance metric analysis
Real application evaluation

The architecture includes:

1. Usability Testing Framework: evaluates interface interactions

2. Performance Measurement System: tracks success metrics

3. Domain Adaptation Module: assesses cross-domain capabilities

4. Implementation Guidelines: provides practical deployment insights

This case study provides valuable insights for practical GUI agent implementation.

4. CSIRO’s AgentOps Taxonomy

A systematic classification of AI Agent operations, providing standardized terminology and operational frameworks.

Main Features:

Standardized terminology framework
Operational classification system
Implementation guidelines

The architecture includes:

1. Terminology Database: maintains standardized definitions

2. Classification Engine: organizes operation categories

3. Relationship Mapper: links related concepts

4. Implementation Framework: guides practical applications

This taxonomy establishes critical standards for the development and deployment of AI agents.

5. OpenAI’s Practices for Governing Agentic AI Systems

A white paper outlining seven key practices for implementing safe and accountable AI agent systems in business environments.

Main Features:

Safety-first approach
Accountability framework
Business implementation guidelines

The architecture includes:

1. Safety Assessment Module: evaluates potential risks

2. Accountability Framework: ensures responsible deployment

3. Implementation Guidelines: provides practical steps

4. Monitoring System: tracks compliance and performance

These guidelines establish fundamental standards for the responsible deployment of AI agents.

Benchmarks for AI Agents

1. PARTNR: A Benchmark for Planning and Reasoning in Multi-Agent Tasks

A comprehensive evaluation framework for assessing planning and reasoning capabilities in multi-agent systems, focusing on human-agent coordination.

Main Features:

Human-agent coordination metrics
Planning capability assessment
Household activity simulations

The architecture includes:

1. Task Generation Engine: creates testing scenarios

2. Coordination Assessment Module: evaluates interaction quality

3. Performance Metrics System: measures success rates

4. Analysis Framework: provides detailed insights

This benchmark provides valuable metrics for improving human-agent coordination systems.

2. Salesforce’s CRMArena

A benchmark for evaluating AI agents in customer support scenarios inside a realistic sandbox environment.

Main Features:

Real scenario simulation
Customer support focus
Sandbox testing environment

The architecture includes:

1. Scenario Generator: creates real customer cases

2. Response Evaluation System: assesses agent performance

3. Metrics Analysis Engine: measures effectiveness

4. Integration Testing Module: verifies CRM compatibility

This benchmark advances the evaluation of AI agents in customer service applications.
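At its core, a benchmark of this kind runs an agent over a set of tasks and reports how many it gets right. The sketch below shows that loop with a toy order-status agent; the task format, `evaluate_agent`, `toy_agent`, and the order data are all hypothetical and much simpler than the real benchmark.

```python
from typing import Callable, Dict, List


def evaluate_agent(agent: Callable[[str], str],
                   tasks: List[Dict[str, str]]) -> Dict[str, float]:
    """Run the agent on every task and report the exact-match success rate."""
    passed = 0
    for task in tasks:
        answer = agent(task["query"])
        if answer.strip().lower() == task["expected"].strip().lower():
            passed += 1
    total = len(tasks)
    return {
        "total": total,
        "passed": passed,
        "success_rate": passed / total if total else 0.0,
    }


# Toy stand-in for a CRM record store and an agent that looks up order status.
ORDERS = {"1001": "shipped", "1002": "refunded"}


def toy_agent(query: str) -> str:
    for order_id, status in ORDERS.items():
        if order_id in query:
            return status
    return "unknown"


tasks = [
    {"query": "What is the status of order 1001?", "expected": "shipped"},
    {"query": "What is the status of order 1002?", "expected": "refunded"},
    {"query": "What is the status of order 1003?", "expected": "cancelled"},
    {"query": "Has order 1001 left the warehouse?", "expected": "shipped"},
]
report = evaluate_agent(toy_agent, tasks)
print(report)
```

Real benchmarks replace the exact-match check with task-specific scoring, but the overall shape (scenario, agent response, metric) stays the same.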

The above AI agent papers represent significant advancements in addressing challenges related to long-context processing, multi-agent systems, and automated AI design. By leveraging techniques such as graph-based representations, dynamic action combinations, and automated component generation, these methods are pushing the boundaries of AI agents and large language models.

2024 is just the beginning of the future landscape of AI agents. As agents become more complex and versatile, they will require less human intervention, and we believe that what will happen in 2025 will be very exciting.

(Due to space limitations, the article has been abridged. Please refer to the end of the article for access to the full text link.)

Hero’s Note:

The future is an era where humans dance with agents.

The future is an era of achieving a “human-centered” production and lifestyle that strives for excellence.

The future is on its way.

If the viewpoints in this article interest you and you would like to apply them in your enterprise, continue learning along the path from digitalization to intelligent transformation, establish a foundational consensus for individuals and enterprises in the digital intelligence era, and build business leadership for individuals and organizations, please scan the QR code below for information about the “Digital Transformation and Innovation Management VeriSM” international certification course and other related courses.

The Hero holds the AI instructor certification jointly issued by EXIN (International Information Science Examination Association), a well-known examination service organization in the IT field, and BCS (British Computer Society), an internationally leading organization in AI. Drawing on comprehensive industry experience in digital transformation training and consulting, the Hero has collaborated with EXIN to create a newly upgraded AI course, “EXIN BCS AI Essentials”. If you wish to better understand the essence of AI’s business value and study the competitive strategies and innovative business practices of top AI companies worldwide, you are welcome to enroll: scan the QR code at the beginning of the article and leave the Hero a message on WeChat!

This course is particularly suitable for business management, project management, and functional management personnel with a non-IT technical background. If you are not interested in the market’s tool-based courses aimed at the general public that use GenAI to create images and write PPTs, or algorithm-based courses aimed at technical personnel, but wish to truly grasp the business value of AI and understand how to leverage AI to empower business outcomes from the perspective of enterprises, organizations, and business, then this course is very suitable for you, and it may be the only course on the market positioned this way. We hope you become a learning pioneer in this direction!

We welcome you to like, appreciate, and share this article with one click, and you will receive the collection of AI agent papers mentioned in the text. The Hero has specially organized it for you! Limited spots available, act fast!

Alright! That’s all for this issue! See you next time!

Heroic Journey Column: Professional perspective, popular language, interpreting the technological dynamics in the Jianghu.
