QAV: Principles and Prospects of Agent Collaboration Based on Large Language Models


Author: Zhang Jialin





This article is approximately 6000 words long, with a suggested reading time of 12 minutes.
Human beings need to weigh the pros and cons and adopt appropriate regulatory and control measures to ensure the continuous, safe, and sustainable development of artificial intelligence.



About seventy thousand years ago, a genetic mutation endowed humans with advanced language abilities, allowing for richer and more specific communication. Based on language, humans can create a shared imagination through fictional stories, enabling greater trust and cooperation across larger groups to achieve larger goals. This capability is unique to humans. However, the astonishing “language abilities” exhibited by agents based on large language models (LLMs), such as GPT-4, signify that this unique ability is no longer exclusive to humans.

Sam Altman, the “father of ChatGPT,” stated on social media that a new version of Moore’s Law may soon take hold: the amount of intelligence in the universe doubles every 18 months. If each independent agent possesses language abilities at or above the level of GPT-4, and can autonomously engage in prompt learning, communication, and the exchange of specific instructions, achieving human-like language communication, then a natural inference is that language-based agent collaboration will become inevitable.

As the number of agents continues to grow, evolving into large-scale agent collaboration, it could revolutionize many fields such as scientific research, engineering design, and economic activities. Undoubtedly, this transformation will profoundly change people’s work and lifestyles. However, just as human civilization was gradually established through fictional stories, whether large-scale agent collaboration will create a kind of “agent civilization” and what impacts this new type of civilization will have on human society are urgent research topics worth exploring.

This article first constructs a simple analytical model QAV for agent collaboration based on large language models and then briefly explains the basic principles of agent collaboration using this model, analyzing the main characteristics and performance of agent collaboration. It also provides an example using GPT-4 and LLaMA as large language models to explore their application prospects.

1. Basic Principles of Agent Collaboration Based on Large Language Models

An agent is an entity with perception, reasoning, decision-making, and action capabilities, able to autonomously collect information, analyze data, and take action to achieve goals. Agents can include software programs, robots, and AI systems.

Agent collaboration refers to the work and interaction process among multiple agents to achieve common goals. During collaboration, agents need to share data, resources, and knowledge, coordinating actions to efficiently complete tasks. Agent collaboration has applications in many fields, such as autonomous vehicles, drone formations, smart home systems, and complex AI systems. Effective agent collaboration requires addressing key issues such as communication, resource allocation, decision-making, and collaborative control.

Collaboration based on large language models is a new paradigm that differs from earlier protocol (rule)-based agent collaboration: agents loaded with large language models can autonomously engage in language-based communication with other agents. The meta-operations (mo) for communication between agents fall into three types: prompts, instructions, and completions.

Prompt (P: prompt) is the text provided as input to the agent to trigger the agent to produce output. Typically, a prompt can be a question, request, or other forms of text information aimed at guiding the agent to understand the user’s needs and respond accordingly.

Instruction (I: instruction) is a specific type of prompt used to explicitly tell the agent what task it is expected to perform. Instructions are usually descriptive and contain detailed information about how to answer questions or complete tasks. By using clear instructions in prompts, agents can generate more accurate and relevant outputs.

Completion (C: completion) refers to the output text the agent generates in response to a given prompt. These completions aim to meet user needs, such as answering questions, writing articles, or producing code. The quality and relevance of a completion depend on how well the agent’s loaded model understands and processes the prompt, and on the knowledge learned during training.
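To make the three meta-operations concrete, here is a minimal sketch of them as plain Python data types. The field names are illustrative assumptions, not part of any published specification:

```python
# A minimal sketch of the three meta-operations (P, I, C) as plain data
# types. Field names are illustrative, not a published spec.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prompt:
    """Text submitted to an agent to trigger output (P)."""
    text: str


@dataclass
class Instruction(Prompt):
    """A specific kind of prompt that states the expected task (I)."""
    task_description: str = ""


@dataclass
class Completion:
    """Output text an agent generates in response to a prompt (C)."""
    text: str
    source_prompt: Optional[Prompt] = None
```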

Using only these three meta-operations, agents still cannot build collaboration. Trust is the premise and foundation of collaboration; agent collaboration requires mutual trust. A triangular computational trust model for agents can be constructed:

In this triangular relationship, we have three agents: Q (questioner), A (answerer), and V (validator). Q is responsible for generating prompts, A is responsible for completions, and V is responsible for verifying whether A’s completions are accurate, meet predetermined rules, or are verifiable (e.g., whether the program can execute in a sandbox). Each agent possesses the three meta-operations (P, I, C). We call this model the QAV model.

[Figure: The QAV triangular computational trust model, showing Q (questioner), A (answerer), and V (validator) exchanging prompts, instructions, and completions]

First, Q submits instructions and prompts to A. Then, A generates a completion based on the received prompt and instruction. A submits the generated completion to V for verification, rather than directly sending the completion to Q.

V’s task is to verify the completion provided by A to ensure it meets certain rules, standards, or verifiability. V may refer to knowledge bases or other reliable external resources to verify A’s completion. If V deems A’s completion acceptable, V will send the completion to Q. If the completion is found to be inaccurate or inappropriate, V can instruct A to provide a modified completion or offer alternative completions until the requirements are met. Upon receiving A’s completion confirmed by V, Q can continue to submit new prompts based on the obtained completion. This process continues in a loop, forming a cycle of ongoing interaction among the three agents.
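To make this flow concrete, here is a minimal sketch of the Q→A→V cycle in Python. The `Verdict` type, the `answer_fn`/`verify_fn` callables, and the retry policy are illustrative assumptions rather than a published specification; any LLM backend could sit behind the two callables:

```python
# A minimal sketch of the QAV interaction loop described above.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


def qav_round(prompt: str,
              answer_fn: Callable[[str], str],           # A: produces a completion
              verify_fn: Callable[[str, str], Verdict],  # V: checks the completion
              max_retries: int = 3) -> str:
    """Run one Q -> A -> V cycle, looping A <-> V until V accepts."""
    current_prompt = prompt
    for _ in range(max_retries):
        completion = answer_fn(current_prompt)      # A generates from the prompt
        verdict = verify_fn(prompt, completion)     # V validates, not Q
        if verdict.accepted:
            return completion                       # V forwards the result to Q
        # V instructs A to revise; fold the feedback into the next prompt
        current_prompt = f"{prompt}\n\nRevise your answer: {verdict.feedback}"
    raise RuntimeError("V rejected all completions within the retry budget")
```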

A simple example of how the QAV model operates is as follows:

First, Q (the questioner) poses a task question about “building a website.” A (the answerer) and V (the validator) then perform the corresponding operations required by the model.

Q: How to build a website?

A: To build a website, you need to follow these steps:

  • Select a domain name
  • Register the domain name
  • Purchase website hosting services
  • Design the website layout and structure
  • Create website content
  • Optimize the website’s SEO
  • Publish the website and maintain it

(A submits the completion to V for verification.)

V: The response includes the basic steps needed to build a website. The response is acceptable. Send the response to Q.

After receiving the response, Q may continue with more specific questions:

Q: How to choose a good domain name?

A: Choosing a good domain name requires considering the following points:

  • Short and concise: the domain name should be short and easy to remember, facilitating user input and sharing.
  • Easy to spell: avoid words that are easily misspelled.
  • Relevance: the domain name should relate to the website’s theme or brand.
  • Appropriate suffix: choose a suffix (.com, .org, .net, etc.) based on the nature of the website.

(A submits the completion to V for verification.)

V: The response covers the main aspects of choosing a good domain name. The response is acceptable. Send the response to Q.
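Reusing `qav_round` and `Verdict` from the sketch above, this exchange can be driven programmatically. The stub functions below stand in for real GPT-4 or LLaMA calls and are entirely hypothetical:

```python
# Driving the earlier sketch with the website example; the stubs stand in
# for real LLM backends.
def answer_stub(prompt: str) -> str:
    return ("To build a website: select a domain name, register it, "
            "purchase hosting, design the layout, create content, "
            "optimize SEO, then publish and maintain the site.")


def verify_stub(prompt: str, completion: str) -> Verdict:
    # A real V might consult a knowledge base; here we check coverage crudely.
    required = ("domain", "hosting", "content")
    ok = all(word in completion.lower() for word in required)
    return Verdict(accepted=ok, feedback="" if ok else "Missing basic steps")


print(qav_round("How to build a website?", answer_stub, verify_stub))
```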

Since large language models can generate almost unlimited output, even this simple collaborative network can continue, thus achieving specific tasks. The factors limiting these agents’ collaboration are mainly computation power, storage resources, and energy.

Based on the triangular computational trust model, since each completion has been verified, the overall stability and reliability are improved. As a result, agents can achieve good collaboration capabilities.

If the QAV model is not adopted, but rather allows two agents, one as Q (the questioner) and the other as A (the answerer) to communicate directly without human intervention, the following situations may occur:

Autonomous Interaction: The two agents will autonomously interact with each other, with Q posing questions or instructions to A, and A responding to questions or executing instructions based on its training and knowledge.

Content Diffusion: Without human intervention, the interactive content between the two agents may continuously diffuse, involving various topics and fields. This interaction may lead to interesting and unexpected results, but it may also result in irrelevant or meaningless conversations.

Possible Incorrect Answers: Without human intervention, A may provide incorrect answers or inaccurate information. This is because agents respond solely based on their pre-trained knowledge bases and patterns, which may have biases or limitations.

Lack of Goal Orientation: Without human intervention, the interaction between the two agents may lack clear goals. This means their conversations may have no practical application value and may not solve real-world problems.

Possible Unethical or Unsafe Content: The interaction between the two agents may involve unethical, unsafe, or illegal content. This is because there is no human intervention to supervise and limit their behavior, leading them to potentially produce outputs that do not meet ethical and safety standards.

The main features of the QAV collaboration model are:

1) Clear Division of Labor: By assigning questions, answers, and verification to different agents, this model allows each agent to focus on its tasks, thereby improving efficiency.

2) Cyclical Interaction: A continuous interactive cycle is formed among the three agents. This cyclical process allows agents to gradually optimize solutions until they meet predetermined rules and standards.

3) Scalability: This model can be applied to various scenarios, such as automated programming, intelligent question-and-answer systems, and knowledge graph construction. Additionally, the model can be expanded to include more agents to solve more complex problems.

The main performance analysis of this collaborative model includes:

1) Effectiveness: Achieved through the continuous interaction and verification of the three agents {Q-A-V}.

2) Adaptability: Can adapt to different scenarios and tasks by adjusting the interaction strategies among the three agents. This flexibility enables the model to handle problems of varying complexity.

3) Maintainability: By separating questions, answers, and verification, this model reduces the complexity of individual agents to some extent. This makes it easier to update, upgrade, or maintain the agents.

Simulations conducted using the above model indicate that language-based agent collaboration networks can be applied to a wide range of real-world scenarios and tasks. However, despite the many advantages of this model, it also faces challenges, such as considering computation power, storage resources, and energy constraints in practical applications.

Expanding this basic collaboration model forms an agent collaboration network. Since different agents may load different large language models, collaboration networks can be divided into homogeneous and heterogeneous types; for example, agent Q may use the GPT-4 model while A uses the LLaMA model. Typically, the homogeneous mode is used within a cluster of agents, while the heterogeneous mode is adopted between clusters.

Although large-scale experiments have not yet been conducted, it is foreseeable that the main challenge for collaboration across homogeneous and heterogeneous networks will be the substantial interaction required, similar to communication between humans of different languages and cultures.

2. A New Paradigm of Agent Collaboration

Based on whether a large language model is loaded, agents can be divided into two categories: agents loaded with large language models (llmAI) and agents not loaded with large language models (nllmAI). There are significant differences in the collaboration modes of these two categories of agents.

The collaboration modes of nllmAI agents can be classified based on task type, communication methods, and collaboration levels. The main collaboration modes include:

1) Centralized Collaboration: In this mode, a central agent controls and coordinates the behaviors of other agents (a minimal code sketch of this mode follows the list). The central agent typically has global information and decision-making capabilities, while other agents execute tasks according to the central agent’s instructions.

2) Distributed Collaboration: In the distributed collaboration mode, each agent independently executes tasks and collaborates when necessary. In this mode, there is no explicit central control structure among agents, but collaboration is achieved through local information and mutual negotiation.

3) Hierarchical Collaboration: The hierarchical collaboration model combines the advantages of centralized and distributed collaboration, with different levels of agents collaborating. In this mode, high-level agents are responsible for global planning and decision-making, while low-level agents execute specific tasks. Collaboration is achieved through information transmission and negotiation between high-level and low-level agents.

4) Competitive Collaboration: In the competitive collaboration mode, there is competition among agents, but they also need to collaborate in certain situations to achieve common goals. In this mode, agents need to balance competition and collaboration, flexibly adjusting strategies based on task requirements.
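As a minimal illustration of the centralized mode (item 1 above), the sketch below shows a coordinator that holds the global task list and dispatches work to registered worker agents; the class and method names are assumptions made for illustration:

```python
# A minimal sketch of centralized collaboration: one coordinator decides
# all routing; workers only execute what they are given.
from typing import Callable, Dict, List


class CentralCoordinator:
    def __init__(self) -> None:
        self.workers: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, worker: Callable[[str], str]) -> None:
        self.workers[name] = worker

    def dispatch(self, tasks: List[str]) -> Dict[str, str]:
        """Assign tasks round-robin; the coordinator alone decides routing."""
        results: Dict[str, str] = {}
        names = list(self.workers)
        for i, task in enumerate(tasks):
            name = names[i % len(names)]
            results[task] = self.workers[name](task)
        return results
```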

The basic collaboration mode for llmAI agents is the QAV model. In the QAV model, when the completion generated by answerer A is submitted to validator V for verification, V often needs the collaboration of other, non-language-model agents to complete the verification task. For example, V may issue commands to a software sandbox to verify the effect of code submitted by A for building a website. Collaboration between nllmAI and llmAI agents thus represents a new paradigm of agent collaboration. A schematic diagram of the basic principle follows:

[Figure: Schematic of the collaboration paradigm between llmAI and nllmAI agents]

Under the new paradigm, the specific implementations of collaboration between llmAI and nllmAI agents are diverse. For example, in the QAV model, the V agent may rely on other nllmAI agents (such as software sandboxes or simulation devices) to complete verification tasks. In this paradigm, various types of agents can flexibly combine and collaborate based on their strengths and task requirements, further expanding the application potential of AI across fields.
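As one hedged illustration of this delegation, the sketch below has V hand a code completion to a crude “software sandbox”: an isolated Python subprocess with a timeout. A production system would use a real sandbox (containers, seccomp profiles, and so on); the `sandbox_verify` interface is an assumption, not part of any published QAV specification:

```python
# A minimal sketch of V delegating verification to an nllmAI agent,
# here an isolated Python subprocess acting as a crude sandbox.
import os
import subprocess
import sys
import tempfile


def sandbox_verify(code: str, timeout_s: int = 5) -> bool:
    """Return True if the code runs to completion without error."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # -I runs the interpreter in isolated mode (no site packages, no env)
        result = subprocess.run([sys.executable, "-I", path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```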

Using GitHub Copilot as an example, we illustrate a specific process of this collaboration paradigm:

First, the llmAI is responsible for understanding requirements, generating code drafts, and resolving conceptual issues during development. The requester (Q) can submit a detailed programming requirement to llmAI, such as “developing a Python-based web application that implements user login and data storage functions.” The llmAI will parse the requirements and generate an initial code draft.

Next, nllmAI agents will participate in the actual development and verification of the code. These agents may include compilers, testing frameworks, and software sandboxes. During this process, the code draft generated by the llmAI will be passed to nllmAI, which will compile, test, and debug the code to ensure it runs correctly.

Throughout the code development process, frequent communication and collaboration are required between llmAI and nllmAI. For example, nllmAI may discover errors in the code or areas that do not meet requirements and provide feedback to llmAI. Then, llmAI will modify the code draft based on the feedback and resubmit it to nllmAI for verification. This process will continue until the generated code meets the requirements and runs correctly.

In the entire collaboration process, llmAI is primarily responsible for understanding requirements, resolving conceptual issues, and generating code drafts, while nllmAI is responsible for specific code development, testing, and verification tasks. This collaborative approach can fully leverage the advantages of llmAI and nllmAI in automated programming, achieving more efficient and accurate code generation.
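A minimal sketch of this generate-test-revise loop follows, reusing `sandbox_verify` from the previous sketch as a stand-in for the nllmAI side (compilers, test frameworks); `draft_code` is an assumed wrapper around an LLM call, not a real API:

```python
# A minimal sketch of the llmAI/nllmAI development loop described above.
from typing import Callable


def develop(requirement: str,
            draft_code: Callable[[str], str],   # llmAI: prompt -> source code
            max_rounds: int = 5) -> str:
    prompt = requirement
    for _ in range(max_rounds):
        code = draft_code(prompt)               # llmAI generates a draft
        if sandbox_verify(code):                # nllmAI compiles/tests it
            return code
        # feed the failure back to the llmAI as a revision prompt
        prompt = f"{requirement}\nThe previous draft failed to run; fix it."
    raise RuntimeError("No runnable draft within the round budget")
```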

The main advantages of the new paradigm lie in its flexibility and collaborative effects. llmAI and nllmAI can flexibly combine and collaborate based on task requirements to solve various complex tasks. Furthermore, cross-category collaboration helps address problems that a single agent may struggle to solve, improving the overall system performance.

Through the analysis, experiments, and practices described above, it is evident that collaboration between llmAI and nllmAI agents under the new paradigm can already accomplish many complex tasks, and will continue to tackle even more complex challenges.

3. Governance and Security Challenges of the New Paradigm

In the new paradigm, governance and security are both important topics that need to be studied and explored from multiple levels, including technology, ethics, and law.

First, due to the large-scale and continuous collaboration among agents, while humans gain significant benefits, they inevitably lose much control. To ensure transparency and traceability in the agent collaboration process, real-time monitoring and auditing mechanisms need to be implemented. This includes monitoring agent behaviors, the content of communications among them, and task execution statuses, as well as timely intervention and auditing of potential anomalies, data breaches, and security risks.
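One way to realize such monitoring, sketched under the assumption of a simple append-only log, is to route every inter-agent message through an auditor before delivery; the `Auditor` interface below is illustrative:

```python
# A minimal sketch of an audit hook: every inter-agent message is
# timestamped and recorded before it is delivered.
import json
import time


class Auditor:
    def __init__(self, log_path: str = "agent_audit.log") -> None:
        self.log_path = log_path

    def record(self, sender: str, receiver: str, content: str) -> None:
        entry = {"ts": time.time(), "from": sender,
                 "to": receiver, "content": content}
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```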

Agent collaboration involves a large amount of data exchange, so strict data privacy and security measures must be implemented. This includes encrypting sensitive data, access controls, anonymization, etc., to ensure the safety of data during transmission and processing.
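As a minimal sketch of the encryption measure, the snippet below protects an agent-to-agent message with a symmetric Fernet key from the third-party `cryptography` package; how the two agents share the key (key distribution) is out of scope here and assumed:

```python
# Encrypting an agent-to-agent message with a shared symmetric key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # shared secret between the two agents
channel = Fernet(key)

token = channel.encrypt(b"completion: deploy draft #3")    # sender side
plaintext = channel.decrypt(token)                         # receiver side
assert plaintext == b"completion: deploy draft #3"
```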

Large-scale agent collaboration also raises complex ethical and accountability issues. Therefore, AI ethical guidelines and accountability regulations need to be established to ensure that agents adhere to ethical and legal standards during collaboration. It is also essential to clarify how responsibility is attributed among agents, so that the relevant parties can be held accountable when issues arise.

Large-scale agent collaboration also involves cross-industry and cross-border cooperation, necessitating the development of corresponding laws and policies. This includes regulations regarding data, intellectual property, privacy rights, and oversight policies for international agent collaboration.

Moreover, to better govern large-scale agent collaboration, education and training are needed to enhance people’s understanding and skills regarding agent collaboration, AI ethics, and security. Through these measures, effective governance of large-scale agent collaboration can be achieved, ensuring that the benefits of such collaboration for humanity are realized alongside safety, compliance, and efficiency. Consequently, a series of specific operational recommendations are needed:

1) Design a Unified Collaboration Protocol: Develop a unified collaboration protocol applicable to different types of agents (such as llmAI and nllmAI), clarifying role divisions, communication mechanisms, and task assignments in the collaboration process to achieve efficient collaboration.

2) Introduce Regulatory Mechanisms: Establish regulatory agencies for agents to monitor and audit information exchange, task assignments, and resource usage during collaboration, ensuring compliance and safety throughout the process.

3) Establish a Security Protection System: Strengthen security standards for communication among agents, drafting protocols that adopt encryption communication, access controls, and data privacy protection technologies to prevent information leakage and malicious attacks.

4) Introduce an Agent Security Level Assessment System: Design detection algorithms that assess agent behaviors and collaboration strategies against task requirements and resource conditions, and test agents’ security levels.

5) Promote Multi-Party Participation: Encourage all stakeholders to participate in the governance of agent collaboration, including developers, users, regulatory bodies, etc., to jointly establish norms, assess risks, and solve problems.

Conclusion:

As the number of agents grows, their collaboration will become increasingly close, bringing challenges and issues such as data privacy, security, ethics, and morality. Therefore, when developing and applying these agents, humans need to weigh the pros and cons and adopt appropriate regulatory and control measures to ensure the continuous, safe, and sustainable development of artificial intelligence.

References:

1. Yuval Noah Harari, “Sapiens: A Brief History of Humankind”
2. QAV repository: https://github.com/people-art/QAV
3. Stuart Russell, “Artificial Intelligence: A Modern Approach”
4. Zhang Jialin, “A Brief Analysis of Computational Trust”
5. Aivanda Laboratory tests and experiments (generated with GPT-4): https://github.com/people-art/QAV/blob/6a1c48cb96eab676a1fb04dbf12204bf288b75d7/GPT-4-test
6. GPT-4 demonstration on codepen.io
7. “GPTs are GPTs”: https://openai.com/research/gpts-are-gpts

Editor: Yu Tengkai
Proofreader: Cheng Anle
