War or Peace? Multi-Agent System Simulation of World War Outbreak Based on Large Language Models


Introduction

Can we avoid war at the crossroads of history? Individuals, scholars, policymakers, and organizations have pursued this question throughout human history. In complex systems research, simulating war through techniques such as wargaming and multi-agent simulation has long been a topic of study. Driven in particular by the transformative power of artificial intelligence technologies such as large language models, modeling methods are shifting from traditional rule-driven multi-agent simulations toward multi-agent simulations capable of autonomous perception, decision-making, self-learning, and adaptation. Recently, a research team from the University of Michigan and Rutgers University developed a multi-agent artificial intelligence system named WarAgent to simulate the decisions of countries involved in historical international conflicts (such as World War I) and the consequences of those decisions. By assessing the accuracy of the simulations, the researchers tested the capabilities and limitations of advanced AI systems in studying complex collective human behavior such as international conflict. In these simulations, the emergent interactions among agents offer new perspectives on the factors and conditions that trigger wars. The results provide data-driven, AI-enhanced insights that may help develop new methods for resolving conflicts and new strategies for maintaining peace.
This Saturday, the Intelligence Club “Large Language Model and Multi-Agent System Reading Group” has invited Wenyue Hua from Rutgers University, an author of the paper, to discuss how large language model-driven agents can simulate the outbreak and course of war from a wargaming perspective. Tian Liang from Tsinghua University will also present “Multi-Agent Debate and Interactive Exploration.” Interested readers are welcome to join the discussion; details can be found at the end of the article.
Research Areas: Artificial Intelligence, Multi-Agent Simulation, Large Language Models
Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang | Authors
Peiyuan Liu | Translator

Paper Title:

War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars

Paper Address:

https://arxiv.org/abs/2311.17227

Table of Contents

1. Introduction

2. Related Background Work

3. WarAgent Simulation Setup

4. WarAgent Architecture

5. Agent Interaction Design

6. Experimental Design

7. Simulation Effect Evaluation

8. Assessment of War Triggers

9. Discussion of War Inevitability

10. Discussion and Outlook

1. Introduction

With the rapid development of artificial intelligence, especially the breakthroughs in large language models, computational social science research is at a critical turning point. In this study, we propose WarAgent, an innovative framework built as a large language model-based multi-agent system (LLM-based Multi-Agent System) and designed specifically to simulate historical events. Our system constructs a dynamic environment containing agents that represent different countries. These agents simulate the characteristics and decision-making processes of historical actors. They engage in conflict or cooperation, helping us explore the various possibilities that shape the evolution of international conflicts. Our simulations offer a new way to examine, from multiple perspectives, historical dilemmas closely tied to human security and war.
The delicate balance between conflict and cooperation is often difficult to predict, influenced by different motivations, strategies, and decisions of nations. Traditional historical analysis methods, while insightful in studying conflicts, are limited in their application due to their static nature and post hoc biases. The application of simulations in the social sciences has a long history; however, with the continuous development of computational power and models, the realism and scope of these simulations have changed dramatically.
Early attempts were often limited by computational power and simplified models. In contrast, recent simulation methods use large language models to simulate complex behaviors and interactions, for example simulations of human behavior in virtual towns, werewolf games, auction markets, and complex task solving. These methods lay the foundation for using artificial intelligence to simulate more complex systems such as international relations and conflicts. This study aims to construct the first large language model-based multi-agent simulation of historical events.
We separately studied the simulations of World War I, World War II, and the Warring States period in ancient China to gain in-depth insights into the dynamics of international conflicts. The paper primarily analyzes and discusses World War I as an example, with less detailed analysis of World War II and the Warring States period.
In understanding traditional historical conflicts, we specifically investigated three key questions:
Research Question 1, Simulation Effect: Can the large language model-based multi-agent system effectively replicate the evolution of historical strategic planning and decision-making processes?
Research Question 2, War Triggers: Are certain triggering factors for wars more critical than others? Can we identify these factors through simulations using a large language model-based multi-agent system?
Research Question 3, War Inevitability: Are historically “inevitable” wars truly unavoidable? We attempt to reveal the conditions that lead to war (or peace) through simulations using a large language model-based multi-agent system.
Figure 1 Simulation Diagram of World War I

2. Related Background Work

1. Multi-Agent Simulation
Recent developments in multi-agent systems have opened new avenues for artificial intelligence research. These systems coordinate multiple communicating agents, making it possible to study the social communication that emerges among them.
The current field of multi-agent systems can be roughly divided into three types: enhanced reasoning systems, NPC (non-player character) multi-agent systems, and production-enhanced systems.
Among enhanced reasoning systems, several new works stand out. The LLM-Debate model introduces debate: each agent receives its peers' responses, and when those responses conflict with its own judgment, an internal, “mental” debate follows that leads to more refined solutions. The ChatEval model builds a role-playing multi-agent referee team whose agents evaluate the quality of text generated by large models through spontaneous debate, reaching levels comparable to human evaluators. The Corex model combines several collaborative modes, including debate, review, and retrieval, which together improve the accuracy, credibility, and reliability of the reasoning process. These paradigms offer a “task-agnostic,” general-purpose approach that lets large models “think outside the box,” mitigating hallucinations and producing better solutions. In the MAD (Multi-Agent Debate) framework, for example, multiple agents engage in “tit-for-tat” debates supervised by a large model acting as judge, steering the discussion toward a final resolution.
NPC multi-agent systems have also made significant progress. Generative Agents are believable simulations of human behavior for interactive applications. The accompanying research demonstrates their capabilities by placing 25 agents in a sandbox environment resembling the game The Sims, where users can observe and intervene as agents plan their days, share news, form relationships, and coordinate group activities. Humanoid Agents bring generative agents closer to human behavior by adding three elements of System 1 processing (the fast, intuitive mode of thinking described in psychology): basic needs such as hunger, health, and energy; emotions; and closeness of relationships. The GPT-Bargaining framework studies whether large models can autonomously improve their negotiation skills by playing bargaining games against each other and receiving natural language feedback from an AI critic.
Examples of production-enhanced systems include the following. MetaGPT is a specialized large language model application built on a multi-agent conversation framework for automatic software development; it assigns different roles to GPT so that they collaborate on building software applications. BOLAA uses a controller module to select among multiple collaborating agents and manage communication between them, improving overall system performance. OpenAGI combines large language models with a range of other tools to solve complex tasks. ChatDev is a software development framework in which agents take on the various roles of the software development process, improving the efficiency of collaboration among those roles.
Multi-agent systems already in use demonstrate the practical value of this approach. The BabyAGI framework uses several large language model-based agents for task management: one agent creates new tasks based on the goals and results of previous tasks, one prioritizes the task list, and one completes tasks or subtasks. AgentVerse is a versatile framework that helps researchers quickly build customized large language model-based multi-agent simulations. Camel is an agent dialogue framework showing how role-playing can help chat agents communicate to accomplish tasks. It records dialogues between agents for analyzing their behavior and understanding their capabilities, and uses heuristic prompting techniques to achieve autonomous cooperation among agents.
2. Traditional Historical Simulation Tools
Academic research on historical simulation has gone through several stages: human simulation, human-program hybrid simulation, and computer simulation.
Dickson simulated the process of the United States’ involvement in World War I in educational settings. This teaching method allowed students to play the roles of representatives from various states, considering factors such as the economic conditions, social status, and political ecology of each state. Through this role-playing exercise, students gained a deeper understanding of the events and circumstances leading to the United States’ participation in World War I.
In the 1960s, human-program hybrid systems were developed. The Inter-Nation Simulation model has been applied in various studies and plays a key role in simulating international conflicts. This method combines human decision-making with computer simulation to create a dynamic hybrid simulation environment. Typically, the simulation covers five or more countries, with participants playing different decision-making roles to represent each country’s government. The simulation is divided into time periods, each lasting 50 to 70 minutes, during which these decision-makers strategically allocate their countries’ military, consumption, and natural industrial resources. These resources have different uses in domestic and international contexts. Participants must make key decisions on internal affairs while effectively managing resources, including economic growth, government stability, defense strategy, and R&D plans. On the international stage, simulated countries engage in various activities such as forming alliances, negotiating trade agreements or aid, conducting various forms of foreign actions, and participating in international organizations.
In the early 21st century, rapid advances in computing power drove a new generation of historical simulation tools. The U.S. Army's One Semi-Automated Forces (OneSAF) Objective System is an advanced computer-generated forces simulation used to model operations, systems, and control processes from the individual soldier up to the brigade level; it is designed specifically to simulate tactical-level combined-arms ground combat. Kelly et al. used OneSAF to simulate historical land campaigns, modeling the weapons, protection, and mobility of vehicles and infantry from different eras. Hill et al. proposed a Java-based simulation system that combines agent-based modeling with game theory to analyze submarine warfare in the Bay of Biscay during World War II. Because it is written in Java, the system can use the language's multithreading, a key capability for multi-agent models, which process many agents in parallel.
This research is the first attempt to use large language model-based multi-agent modeling to simulate the trajectories of historical events.

3. WarAgent Simulation Setup

In this section, we first introduce three important historical events: World War I, World War II, and the Warring States period of ancient China. These events provide the background for our multi-agent simulation research. Next, we will introduce the basic setup of the simulation system, including the definition of national agent roles, detailing their characteristic dimensions and possible action spaces, as well as the inputs required for executing these actions and the potential outcomes.
1. Historical Events Simulated
World War I (WWI) was a global conflict that lasted from 1914 to 1918. It was fought primarily in Europe but involved countries from around the world. The war began with the assassination of Archduke Franz Ferdinand of Austria-Hungary, which set off a chain reaction through the existing web of political and military alliances.
The major powers involved were divided into two main alliances: the Allies (initially composed of France, Russia, and the United Kingdom, later joined by Italy, Japan, and the United States) and the Central Powers (mainly Germany, Austria-Hungary, the Ottoman Empire, and Bulgaria).
Figure 2 Map of World War I
The analytical part of this study primarily focuses on the multi-agent, multi-scenario simulation of World War I.
World War II (WWII) was a global conflict that lasted from 1939 to 1945, with its origins related to unresolved issues from WWI and the rise of fascist regimes in Germany, Italy, and Japan.
Figure 3 Map of World War II
The Warring States period (WSP) of ancient China lasted from 475 BC to 221 BC, spanning the last centuries of the Zhou Dynasty. It was an era of war and political turmoil that followed the Spring and Autumn period and ended with the unification of China under the Qin Dynasty.
Figure 4 Map of the Warring States Period
2. National Agent Profiles (National Conditions)
In the model, a basic and comprehensive profile needs to be drawn for each national agent. For the agents, this profile should be built around six fundamental dimensions: leadership, military capability, resources, historical background, key policies, and public morale. Each aspect contributes to a comprehensive understanding of the potential behaviors and decision-making processes of agents in the simulation.
Leadership: Covers the political institutions involved in a country's decision-making process, in the context of the specific historical period. For example, pre-WWI Britain was a constitutional monarchy with democratic structures and a pragmatic, resolute leadership style, while Prussia was an authoritarian imperial regime oriented toward aggression and military expansion.
Military Capability: Includes quantitative data such as army size and navy tonnage, as well as qualitative assessments of the nation's overall military strength, including dominance in specific branches (such as the navy or air force). Countries with strong military capabilities are generally more willing to engage in military conflicts.
Resources: Covers key factors such as geographic location, population, Gross Domestic Product (GDP), terrain, and climate. Population size and GDP are important indicators of national strength and are central to the agents' strategic decision-making. Together these factors measure a nation's economic and logistical capacity to pursue its goals, and they affect its ability to project power and sustain military and political efforts.
Historical Background: Includes historical legacies of conflicts and unresolved issues between nations, which can significantly influence current policies. Historical animosities and territorial disputes often have profound impacts on a nation’s current attitudes and potential alliances on the global stage. For example, the loss of the iron-rich Alsace-Lorraine region by France after the Franco-Prussian War fueled a desire for revenge against Prussia. These factors shape a nation’s strategic alliances and influence its broader diplomatic and military engagements.
Key Policies: Outlines the main objectives pursued by the nation. For example, in historical context, unified Prussia (pre-WWI Germany) desired to become the dominant power in Europe, attempting to surpass Britain’s territorial and colonial hegemony. In contrast, Britain aimed to maintain its status as the “Empire on which the sun never sets.” This led to strategic policies such as the “Arms Race Act,” which stipulated that for every battleship built by Prussia, Britain would build two, reflecting the competitive mechanism and the central role of naval power in its geopolitical strategy.
Public Morale: Reflects the sentiment of the populace, which may directly or indirectly influence national actions. Serbia, despite its smaller national power and limited military strength, exhibited a bellicose attitude due to the rise of nationalism. In contrast, despite America’s wealth and industrial development, it displayed isolationist characteristics, unwilling to become embroiled in war.
Using Britain as an example, the model describes the national profile:

## Overview of Britain

# British Leadership

(1) Britain is a constitutional monarchy with significant democratic institutions, characterized by pragmatism and calmness in governance.

# British Military Capability

(1) Active military size: 530,000 soldiers.

(2) Navy tonnage: 2.7 million tons; the world’s most powerful navy, with total tonnage exceeding that of the second and third strongest navies combined.

# British Resources

(1) Location: An island located west of France, Germany, Austria-Hungary, and Russia, with numerous colonies.

(2) Population: 46 million.

(3) Gross Domestic Product: 110 billion, accounting for 13.6% of the global total.

(4) Geography: Characterized by rolling hills, green fields, and rugged coastlines, influenced by a maritime climate that keeps it moist year-round.

(5) Climate: Mild maritime climate, often cloudy, rainy, and cool.

# British Historical Background

(1) Britain is one of the most influential countries in the world, with the most colonies.

# Key Policies

(1) As the most powerful country at the time, Britain’s goal was to maintain its leading position and weaken any potential challengers, such as Germany.

(2) Britain established a policy: whenever Germany built a battleship, Britain would build two.

# Public Morale

(1) High morale, filled with patriotism and confidence in swift victory.
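To make the six-dimension profile concrete, the sketch below stores one country's profile in a small dataclass and renders it back into the heading-plus-numbered-items text shown above, which could then be placed in an agent's prompt. The class name `NationalProfile`, its field names, and the `to_prompt` method are hypothetical illustrations, not the paper's code, and the Britain values are abridged from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NationalProfile:
    """Hypothetical container for the six profile dimensions of one national agent."""
    name: str
    leadership: List[str]
    military_capability: List[str]
    resources: List[str]
    historical_background: List[str]
    key_policies: List[str]
    public_morale: List[str]

    def to_prompt(self) -> str:
        """Render the profile as headed, numbered items, mirroring the Britain example."""
        sections = [
            ("Leadership", self.leadership),
            ("Military Capability", self.military_capability),
            ("Resources", self.resources),
            ("Historical Background", self.historical_background),
            ("Key Policies", self.key_policies),
            ("Public Morale", self.public_morale),
        ]
        lines = [f"## Overview of {self.name}"]
        for title, items in sections:
            lines.append(f"# {title}")
            # Number items from (1) within each section, as in the example profile.
            lines.extend(f"({i}) {item}" for i, item in enumerate(items, start=1))
        return "\n".join(lines)


# Abridged Britain profile; values are taken from the example above.
britain = NationalProfile(
    name="Britain",
    leadership=["Constitutional monarchy with significant democratic institutions."],
    military_capability=["Active military size: 530,000 soldiers.",
                         "Navy tonnage: 2.7 million."],
    resources=["Island west of France, Germany, Austria-Hungary, and Russia.",
               "Population: 46 million."],
    historical_background=["One of the most influential countries, with the most colonies."],
    key_policies=["Maintain the leading position and weaken challengers such as Germany.",
                  "Build two battleships for every one Germany builds."],
    public_morale=["High morale and confidence in swift victory."],
)

print(britain.to_prompt())
```

Keeping the profile as structured fields rather than free text makes it easy to vary a single dimension (for example, public morale) between simulation runs while leaving the rest unchanged.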
3. Action Space
Our simulation aims to study the outbreak of war, categorizing a series of actions that shape international relations into seven types:
Waiting Action: Agents can choose to remain passive in a given round, observing other agents' actions and overall changes in the situation. Agents generally choose this action when their country's interests are not directly affected, as with the United States watching developments after the assassination of Franz Ferdinand.
Total Mobilization: This action involves preparing the nation’s military forces for potential conflicts, which is a necessary precursor to engaging in war.
Declaration of War: This signifies a formal initiation of hostile/military actions by one country against another.
Military Alliance: A formal agreement between two or more countries to support each other in the event of conflict, sharing the responsibilities of collective defense and strategic cooperation. Agents can request a mutual defense alliance, which the target country can accept or reject. Alliances can be announced publicly or kept secret, and members may betray them.
Non-Interference Treaty: This is a diplomatic agreement in which signatory countries commit to not interfering in each other’s internal affairs, including the obligation to respect sovereignty and political independence. The procedure for signing a non-interference treaty is similar to establishing a military alliance: agents first make a request, and the target agent can accept or reject. Treaties can be publicly announced or handled confidentially, with the possibility of betrayal.
Peace Agreement: A negotiated settlement between conflicting parties that formally ends hostilities and establishes a framework for future relations, with the parties committed to upholding its terms in pursuit of lasting stability and reconciliation. A party to a conflict can propose a peace agreement, and the other side can accept or reject it; agreements can be announced publicly and may later be betrayed or terminated.
The seventh type is informal communication: in addition to the formal actions above, agents can send each other messages to discuss any matter.
Each action has three attributes: visibility, input type, and required response.
Visibility: Determines how publicly known each diplomatic action is. This attribute takes one of two values: public or secret. Public actions (such as announcing a military alliance) are disclosed to the entire international community. Secret actions (such as requesting a military alliance) are communicated only to the target country, reflecting covert international relations or behind-the-scenes diplomacy.
Input Type: Specifies the information or resources required to initiate or execute an action. For example, “Declaration of War” only requires the name of the target country, while “Proposing a Peace Agreement” requires the name of the target country and the agreement information expressed in natural language.
Required Response: Specifies whether an action requires a response from other relevant entities. For example, actions such as “Requesting a Military Alliance,” “Sending a Message,” and “Proposing a Peace Agreement” require responses, while actions such as “Declaration of War,” “Issuing a Non-Interference Treaty,” and “Total Mobilization” do not.
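The three attributes above can be captured as a small lookup table. The sketch below is one possible encoding, assuming snake_case action names and input keys invented for illustration; the public/secret attribute is called `visibility` here. Declaring war needs only a target, proposing peace needs a target plus text, and requests and messages require a response, as the text states; the visibility chosen for waiting, mobilization, and war declarations is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Visibility(Enum):
    PUBLIC = "public"  # announced to every country
    SECRET = "secret"  # delivered only to the target country


@dataclass(frozen=True)
class ActionSpec:
    name: str
    visibility: Visibility
    inputs: Tuple[str, ...]   # arguments the agent must supply
    requires_response: bool   # whether the target must accept, reject, or reply


_SPECS = [
    # Hypothetical names; visibility of the first three entries is assumed.
    ActionSpec("wait", Visibility.PUBLIC, (), False),
    ActionSpec("total_mobilization", Visibility.PUBLIC, (), False),
    ActionSpec("declare_war", Visibility.PUBLIC, ("target_country",), False),
    ActionSpec("announce_military_alliance", Visibility.PUBLIC, ("target_country",), False),
    ActionSpec("request_military_alliance", Visibility.SECRET, ("target_country",), True),
    ActionSpec("request_non_interference_treaty", Visibility.SECRET, ("target_country",), True),
    ActionSpec("propose_peace_agreement", Visibility.SECRET,
               ("target_country", "agreement_text"), True),
    ActionSpec("send_message", Visibility.SECRET, ("target_country", "message_text"), True),
]
ACTION_SPACE: Dict[str, ActionSpec] = {spec.name: spec for spec in _SPECS}


def missing_inputs(action_name: str, supplied: Dict[str, str]) -> Tuple[str, ...]:
    """Return the required inputs that the agent failed to supply."""
    return tuple(k for k in ACTION_SPACE[action_name].inputs if k not in supplied)
```

A table like this gives the rest of the system a single source of truth: the routing logic can read `visibility`, and a validity check can call `missing_inputs` before an action is executed.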

4. WarAgent Architecture

Below is a comprehensive introduction to the architecture of the WarAgent multi-agent system, detailing its core components and the flow of information between agents. WarAgent is built on four foundational modules: (1) National Agents, (2) Secretary Agents, (3) Bulletin Board, (4) Notepad. In the context of WarAgent, strategic and timely decision-making is crucial, and understanding these interaction mechanisms is key to revealing how the system operates and responds to various situations.
1. National Agents
Each national agent is defined by its national profile. In each round, guided by carefully designed prompts, an agent chooses its actions based on the current situation. These natural-language prompts steer agents through complex international relations so that their actions and decisions are well considered: they lead the agent step by step to analyze alliances and adversarial relationships, weigh interests, and then decide.
Figure 5(a) shows the key framework used for prompt design in the study, while Figure 5(b) illustrates an interaction example with the GPT-4 model, specifically for the French national agent. The prompt design includes four steps:
• The first step identifies potential allies. In the given scenario, France recognizes Britain as a potential ally because of Britain’s opposition to Germany. France also views the United States as a potential strategic ally because of its geographical position and strong economy.
• The second step identifies potential adversaries. In this scenario, France sees Germany as its main opponent. France also regards Austria-Hungary as a potential enemy, both because of historical animosity and because Austria-Hungary is allied with Germany.
• The third step outlines the proposed actions. In this scenario, France proposes three action suggestions: ally with Britain, engage in dialogue with Austria-Hungary, and consider signing a non-interference treaty with the United States.
• The fourth step analyzes the situation based on the answers to steps one through three. In this scenario, France concludes that the assassination of Archduke Franz Ferdinand of Austria-Hungary gives it an opportunity to ally with Austria-Hungary against Serbia, but that it should act cautiously to avoid provoking Germany or Russia. Meanwhile, it recommends seeking an alliance with Britain and signing a non-interference treaty with the United States.

Figure 5 Guiding Prompts. (a) Multi-step guiding prompt design diagram. (b) Example of interaction between the French agent and the customized GPT-4 model.
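The four-step design described above can be sketched as a prompt template that places the country profile and current situation before the ordered steps. The step wording and the `build_prompt` function are hypothetical; the paper's actual prompt text is the one shown in Figure 5, and the assembled string would be sent to a model such as GPT-4.

```python
from typing import Tuple

# Hypothetical wording of the four guiding steps.
STEPS: Tuple[str, ...] = (
    "Step 1: List the countries that could be your allies and explain why.",
    "Step 2: List the countries that could be your adversaries and explain why.",
    "Step 3: Propose concrete actions chosen from the permitted action space.",
    "Step 4: Using your answers to the previous three steps, analyze the situation "
    "and state your final recommendation.",
)


def build_prompt(country: str, profile: str, situation: str) -> str:
    """Assemble one round's prompt for a national agent."""
    parts = [
        f"You are the national agent for {country}.",
        "Country profile:",
        profile,
        "Current situation:",
        situation,
        "Answer the following steps in order:",
        *STEPS,
    ]
    return "\n".join(parts)


prompt = build_prompt(
    "France",
    "(France profile text)",
    "Serbia sends assassins to kill Archduke Franz Ferdinand of Austria-Hungary.",
)
print(prompt)
```

Putting the final analysis last means the model's recommendation can draw on its own answers about allies, adversaries, and candidate actions, which is the point of the step ordering.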

2. Secretary Agents
Although large language models are powerful tools for building multi-agent systems, they are not without flaws. In complex, long-horizon contexts they tend to hallucinate and make logical errors. Secretary agents therefore serve as a necessary safeguard that checks for such deficiencies.
Each national agent is assigned a dedicated “secretary agent” that verifies the appropriateness and basic logical consistency of its actions. First, the secretary agent ensures that each action taken by the national agent conforms to the specifications of the action space, including the correct action name and the correct input format for the action’s attributes. Second, it verifies the basic logical consistency of those actions. For example, if Britain never sent Austria-Hungary a “Request Military Alliance” action, it would be illogical, and therefore unacceptable, for Austria-Hungary to accept a military alliance request from Britain.
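In the paper the secretary is itself a language model, but the two checks it performs can be illustrated with plain rules. The sketch below is a rule-based stand-in, not the paper's implementation; the action names, the `target`/`text` input keys, and the `(requester, recipient, kind)` representation of pending requests are all hypothetical. It rejects unknown action names, flags missing inputs, and refuses an alliance acceptance that answers a request nobody sent, which is the Britain and Austria-Hungary case above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical action names mapped to the inputs each one needs.
REQUIRED_INPUTS: Dict[str, Set[str]] = {
    "wait": set(),
    "total_mobilization": set(),
    "declare_war": {"target"},
    "request_military_alliance": {"target"},
    "accept_military_alliance": {"target"},
    "reject_military_alliance": {"target"},
    "send_message": {"target", "text"},
}

# A pending request is recorded as (requester, recipient, kind).
Request = Tuple[str, str, str]


def secretary_check(actor: str, action: Dict[str, str], pending: Set[Request]) -> List[str]:
    """Return a list of problems; an empty list means the action passes."""
    name = action.get("name")
    # 1. Format: the action must exist in the action space.
    if name not in REQUIRED_INPUTS:
        return [f"unknown action name: {name!r}"]
    problems = []
    # 2. Format: every required input must be present.
    missing = REQUIRED_INPUTS[name] - set(action)
    if missing:
        problems.append(f"{name} is missing inputs: {sorted(missing)}")
    # 3. Logic: a country can only answer a request that was actually sent to it.
    if name in ("accept_military_alliance", "reject_military_alliance"):
        requester = action.get("target")
        if (requester, actor, "military_alliance") not in pending:
            problems.append(f"{requester} never requested an alliance with {actor}")
    return problems
```

Returning a list of problems rather than a single boolean gives the national agent concrete feedback to revise against.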
3. Bulletin Board
The bulletin board is designed to manage international relations. It serves as a dynamic recording platform that collects and presents the relational dynamics of ongoing situations in each round of simulation. It further ensures that agents’ decisions are based on the most up-to-date available information. The bulletin board can help agents initialize states, update relationships, and display these relationships in both visual and textual formats. As shown in Figure 6(a), the bulletin board can track and manage the following four types of international relations among different countries:
Declarations of War (W): Indicate conflicts or wars between countries, represented by the symbol “×” and marked in red in Figure 6(a). For example, Germany declares war on Britain.
Military Alliances (M): Indicate formal military partnerships between countries, represented by the symbol “&” and marked in green in Figure 6(a). For example, Serbia and Russia sign a military alliance.
Non-Interference Treaties (T): Represent agreements not to interfere in each other’s affairs, indicated by the symbol “o” and marked in blue. In Figure 6(a), Austria-Hungary and France sign a non-interference treaty.
Peace Agreements (P): Represent formal agreements between countries to cease hostilities and maintain peace, indicated by the symbol “~” and marked in yellow. In Figure 6(a), the United States and the Ottoman Empire reach a peace agreement.
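A minimal bulletin board can be modeled as a map from a pair of countries to one of the four relation codes, with the symbols above used for the textual view. This is a sketch under two stated assumptions that the text does not specify: each unordered pair holds a single current relation (so a later peace agreement overwrites an earlier war), and the direction of a war declaration is not recorded. `BulletinBoard`, `update`, and `as_text` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

# Relation codes and display symbols from the bulletin-board description.
SYMBOLS: Dict[str, str] = {"W": "×", "M": "&", "T": "o", "P": "~"}


@dataclass
class BulletinBoard:
    # Assumption: one current relation per unordered pair of countries.
    relations: Dict[FrozenSet[str], str] = field(default_factory=dict)

    def update(self, a: str, b: str, code: str) -> None:
        if code not in SYMBOLS:
            raise ValueError(f"unknown relation code: {code}")
        if a == b:
            raise ValueError("a relation needs two different countries")
        self.relations[frozenset((a, b))] = code

    def as_text(self) -> List[str]:
        """Textual view such as 'Britain × Germany', sorted for stable output."""
        lines = []
        for pair, code in self.relations.items():
            first, second = sorted(pair)
            lines.append(f"{first} {SYMBOLS[code]} {second}")
        return sorted(lines)


board = BulletinBoard()
board.update("Germany", "Britain", "W")
board.update("Serbia", "Russia", "M")
print(board.as_text())
```

Because every agent reads the same board at the start of a round, all decisions in that round are based on one consistent snapshot of international relations.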
4. Notepad
The notepad is an internal record kept by each country that reflects its domestic conditions, helping ensure that the national agent’s actions comply with domestic requirements. As shown in Figure 6(b), the notepad tracks key indicators for a nation’s decision-making: mobilization, internal stability, and war readiness prediction. This paper focuses on mobilization; internal stability and war readiness prediction will be addressed in future research (shown in the dashed box in Figure 6(b)).
Mobilization (MO): A binary indicator indicating whether a country has mobilized for potential conflict, such as “Yes” or “No.”
Internal Stability (IN): An indicator measuring the level of internal stability of a country, such as “Low,” “Medium,” and “High.”
War Readiness Prediction (WR): An indicator predicting a country’s readiness for war, such as “Low,” “Medium,” and “High.”

Figure 6 Bulletin Board and Notepad Design. (a) Bulletin board design. (b) Notepad design (internal stability and war readiness prediction will be discussed in future work). (c) Bulletin board and notepad methods during experiments.
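Since total mobilization is described as a necessary precursor to war, one natural domestic rule for the notepad to enforce is that a country cannot declare war before it has mobilized. The sketch below encodes that rule; it is an assumed example of a "domestic requirement," not a rule quoted from the paper. The `Notepad` class and its methods are hypothetical, and the internal-stability and war-readiness fields are placeholders, matching the paper's deferral of those indicators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Notepad:
    """Per-country internal record; only mobilization is enforced in this sketch."""
    mobilized: bool = False                    # MO: "Yes" / "No"
    internal_stability: Optional[str] = None   # IN: "Low" / "Medium" / "High" (unused here)
    war_readiness: Optional[str] = None        # WR: "Low" / "Medium" / "High" (unused here)

    def record(self, action_name: str) -> None:
        """Update the record after an action has been carried out."""
        if action_name == "total_mobilization":
            self.mobilized = True

    def permits(self, action_name: str) -> bool:
        """Assumed domestic rule: war can only be declared after total mobilization."""
        return action_name != "declare_war" or self.mobilized

    def mo(self) -> str:
        """Report the MO indicator in the Yes/No form used above."""
        return "Yes" if self.mobilized else "No"
```

Checking `permits` before executing an action, and calling `record` after it, keeps the notepad and the agent's behavior in step from round to round.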

5. Agent Interaction Design

As shown in Figure 7(a), the interactions of the system’s agents are divided into two main parts: (1) internally, each national agent interacts with its secretary agent in each round, and (2) externally, each national agent interacts with other national agents over multiple rounds.

Figure 7 Agent Interaction Design. (a) WarAgent architecture. (b) Interaction between national agents and secretary agents. (c) Interaction between national agents.

Figure 7(b) illustrates the interaction between national agents and secretary agents. The national agent drafts an action plan, and the secretary agent evaluates it for format, content, and logic. If the secretary agent identifies inconsistencies or areas for improvement, it will suggest modifications and engage in dialogue with the national agent for revisions. This iterative process allows for up to four rounds of communication. If consensus is not reached during the communication, the secretary agent will proactively modify the proposal. This “national agent-secretary agent” internal interaction mechanism applies to all national agents.
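The draft, review, and revise cycle between a national agent and its secretary can be sketched as a bounded loop. Here `check`, `revise`, and `force_fix` stand in for LLM calls and are passed in as plain functions; reading "up to four rounds" as four feedback-and-revision exchanges is an interpretation. The toy stand-ins in the usage part (an agent that never fixes its draft, and a secretary that falls back to waiting) are hypothetical.

```python
from typing import Callable, Dict, List

Draft = Dict[str, str]
MAX_ROUNDS = 4  # "up to four rounds of communication"


def review_loop(
    draft: Draft,
    check: Callable[[Draft], List[str]],
    revise: Callable[[Draft, List[str]], Draft],
    force_fix: Callable[[Draft, List[str]], Draft],
) -> Draft:
    """National agent drafts, secretary checks; fall back to a secretary fix."""
    for _ in range(MAX_ROUNDS):
        problems = check(draft)
        if not problems:
            return draft
        # Secretary returns feedback; the national agent submits a revision.
        draft = revise(draft, problems)
    problems = check(draft)
    # No consensus within the round limit: the secretary edits the draft itself.
    return draft if not problems else force_fix(draft, problems)


# Toy stand-ins for the LLM calls.
def needs_target(d: Draft) -> List[str]:
    return [] if d["name"] == "wait" or "target" in d else ["missing target"]


feedback_log: List[List[str]] = []


def stubborn_agent(d: Draft, problems: List[str]) -> Draft:
    feedback_log.append(problems)  # records the feedback but never fixes the draft
    return d


def secretary_fallback(d: Draft, problems: List[str]) -> Draft:
    return {"name": "wait"}


result = review_loop({"name": "declare_war"}, needs_target, stubborn_agent, secretary_fallback)
```

With the stubborn agent, the loop exhausts its four exchanges and the secretary's fallback replaces the invalid draft, mirroring the rule that the secretary modifies the proposal itself when no consensus is reached.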
Figure 7(c) shows the interactions between national agents. It is important to note that secretary agents do not participate in interactions between national agents. In this context, agents generally refer to national agents.
In our framework, each agent’s first action is prompted by a triggering event (shown in the blue box in Figure 7(c)). Historically, a triggering event is one that provokes a chain of reactions among countries, ultimately leading to major historical change. In our simulations, the triggering event is the initial scenario to which all agents respond. For example, in the World War I scenario, the assassination of Archduke Franz Ferdinand of Austria-Hungary is widely regarded as the triggering event. In World War II, the triggering event is usually taken to be Germany’s invasion of Poland. The partition of the State of Jin among the Han, Zhao, and Wei families is commonly viewed as the triggering event marking the beginning of the Warring States period.
In the first round of the simulation, every participating agent responds to the triggering event. These responses launch a range of actions, including total mobilization, and reach all other agents through messages and requests. The following is an example of Britain's response to the triggering event:

## Trigger: Serbia sends assassins to kill Archduke Franz Ferdinand of Austria-Hungary.

# British Response to the Triggering Event:

To France: Britain decides to propose a military alliance to France.

To Russia: Britain decides to propose a military alliance to Russia.

To the United States: Britain decides to send the following message to the United States: Given that the global balance of power is at risk, we would like to understand your views on the current situation and how we can cooperate to maintain peace and stability.
In subsequent rounds, each agent takes in the messages from the previous round, especially those addressed to it or made public. Each agent then carries out two activities in parallel: first, responding to messages from other agents, particularly requests; second, initiating new actions.

# Collected Requests to Britain:

From France: France chooses to send a message to Britain stating: Given the recent developments and our mutual interests in countering German aggression, we propose a dialogue to explore potential cooperation opportunities.

From the Ottoman Empire: The Ottoman Empire chooses to propose a non-interference treaty to Britain.

# Britain’s Responses to the Requests:

To France: Britain chooses to send a message to France stating: We welcome the dialogue to explore potential cooperation against a common threat.

To the Ottoman Empire: Britain chooses to accept the non-interference treaty proposed by the Ottoman Empire.

# New Actions Initiated by Britain:

Britain chooses to implement total mobilization.
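The round structure described above, in which each agent reads the messages addressed to it (or made public) in the previous round and then both answers requests and initiates new actions, can be sketched as a synchronous message-passing loop. This is an illustrative sketch rather than WarAgent's code: `Message`, `run_round`, the `"ALL"` broadcast address, and the `policy` callable (standing in for the LLM-driven national agent) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

BROADCAST = "ALL"  # hypothetical address for public announcements


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str  # a country name, or BROADCAST
    content: str


Policy = Callable[[str, List[Message]], List[Message]]


def run_round(agents: Iterable[str], previous: List[Message], policy: Policy) -> List[Message]:
    """Run one synchronous round.

    Every agent sees only the previous round's messages that are addressed to
    it or broadcast by someone else. Messages produced in this round are not
    visible until the next round, so all agents act in parallel.
    """
    outgoing: List[Message] = []
    for agent in agents:
        inbox = [
            m for m in previous
            if m.receiver == agent or (m.receiver == BROADCAST and m.sender != agent)
        ]
        # The policy returns replies to requests plus any newly initiated actions.
        outgoing.extend(policy(agent, inbox))
    return outgoing
```

Under this sketch, the triggering event can be modeled as a broadcast from a pseudo-sender such as `"EVENT"`, so that the first round consists of every agent's response to it.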
Note that in each round all national agents interact simultaneously, so a large volume of new communication and new situations emerges. Ideally, the interactions from earlier rounds would be fed into the prompts for subsequent rounds. However, simultaneous interactions among many agents over many rounds can produce more text than fits in an agent's memory, causing contextual background information to be lost. To address this challenge, we adopt a “bulletin board and notepad” approach, as shown in Figure 6(c). Actions that change a country's external relations, such as “accepting a military alliance,” are recorded on the bulletin board, while actions that concern a country's internal status, such as “total mobilization,” are recorded in the notepad. Communications about international relations are therefore kept on the bulletin board and remain available to the agents throughout the simulation. Each agent has its own bulletin board and notepad, so in any given round different agents hold different bulletin boards and notepads. This diversity mirrors reality: agents, like nations, acquire only partial knowledge, and no single agent has an omniscient view of all events or data.
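A per-agent bulletin board and notepad could be represented as below. This is a minimal sketch under stated assumptions: the action names, the `AgentMemory` and `record` names, and the visibility rule (only the two countries party to an external action see it on their boards) are illustrative choices, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed split of action types; the text gives "accepting a military
# alliance" as external and "total mobilization" as internal.
EXTERNAL = {"military alliance", "non-interference treaty", "peace agreement", "declare war"}
INTERNAL = {"total mobilization"}

Entry = Tuple[str, str, str]  # (actor, relation, target)


@dataclass
class AgentMemory:
    # Bulletin board: one current relation per pair of countries.
    board: Dict[Tuple[str, str], Entry] = field(default_factory=dict)
    # Notepad: this country's internal-status actions, in order.
    notepad: List[str] = field(default_factory=list)


def record(memories: Dict[str, AgentMemory], actor: str, action: str,
           target: Optional[str] = None) -> None:
    """Write an action into the memories of the agents assumed to observe it."""
    if action in EXTERNAL:
        if target is None:
            raise ValueError("external actions need a target country")
        pair = tuple(sorted((actor, target)))
        for country in pair:
            # A newer relation between the same pair replaces the older one,
            # e.g. a declaration of war overwrites a non-interference treaty.
            memories[country].board[pair] = (actor, action, target)
    elif action in INTERNAL:
        memories[actor].notepad.append(action)
    else:
        raise ValueError(f"unknown action: {action}")
```

Because each country holds only the entries it observed, two agents in the same round can see different boards, which is the partial-knowledge property described above.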
In each generation step, a concise rule-based transformation function converts the current state of the bulletin board and notepad into a plain-text passage, which is added to the input prompt. For example, the bulletin board in Figure 6(a) would be rendered as:
France has signed a non-interference treaty with Austria-Hungary. The Ottoman Empire has reached a peace agreement with the United States. Germany declares war on Britain. Russia forms a military alliance with Serbia.
This passage describes the new situation to which the national agents must respond. In this way, the framework reduces the information from previous rounds of dialogue to the information from the most recent round, rather than carrying the full dialogue history forward. In addition, each agent's own history of actions is included in the prompt as a record of its previous activities.
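A rule-based transformation of this kind can be as simple as one sentence template per relation type. The sketch below is an assumption about how such a function might look: the templates, the `board_to_text` and `build_prompt` names, and the prompt layout are all hypothetical. Applied to the four relations in the example paragraph, it reproduces that paragraph word for word.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (actor, relation, target)

# Hypothetical sentence templates, one per relation type.
TEMPLATES = {
    "non-interference treaty": "{a} has signed a non-interference treaty with {b}.",
    "peace agreement": "{a} has reached a peace agreement with {b}.",
    "declare war": "{a} declares war on {b}.",
    "military alliance": "{a} forms a military alliance with {b}.",
}
NEEDS_ARTICLE = {"Ottoman Empire", "United States"}


def noun(country: str, sentence_start: bool = False) -> str:
    """Add 'the'/'The' to country names that take an article in English."""
    if country in NEEDS_ARTICLE:
        return ("The " if sentence_start else "the ") + country
    return country


def board_to_text(entries: List[Entry]) -> str:
    """Render bulletin-board entries as a plain-text scenario description."""
    return " ".join(
        TEMPLATES[rel].format(a=noun(a, sentence_start=True), b=noun(b))
        for a, rel, b in entries
    )


def build_prompt(profile: str, board: List[Entry], notepad: List[str],
                 history: List[str]) -> str:
    """Assemble one round's prompt: the current board state only, plus the
    agent's internal status and its own previous actions."""
    return "\n".join([
        profile,
        "Current situation: " + board_to_text(board),
        "Internal status: " + ("; ".join(notepad) or "none"),
        "Your previous actions: " + ("; ".join(history) or "none"),
        "Decide your actions for this round.",
    ])
```

Keeping the conversion rule-based, rather than asking an LLM to summarize, makes the prompt text deterministic for a given board state.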

6. Experimental Design

Simulation Effect: The first study examines simulation results under historically accurate conditions, to verify that the WarAgent system can produce credible simulations of specified scenarios.
War Triggers: The second study turns to “what if” scenarios, focusing on the role of triggering events in history. By constructing counterfactual triggering events of varying conflict intensity (a counterfactual is a hypothetical scenario that differs from what actually happened), we explore their potential impact on the outbreak of war and test the view that a specific war trigger is neither unique nor a necessary precondition for war.
War Inevitability: Finally, the study examines various scenarios characterized by different initial conditions (profiles) and decision-making paths of countries. By altering the decision-making processes or national conditions in the prompts, alternative historical narratives are constructed, and the impact on historical trajectories is analyzed.
| Evaluation Method | Research Questions | Description |
|---|---|---|
| Extensive Connectivity | Simulation Effect | Understanding the dynamics of alliances and conflicts through network analysis. The connectivity diagram illustrates the system as a whole. |
| Human Evaluation | Simulation Effect | Expert evaluation of strategic decisions, the evolution of events, and outcomes, assessing how faithfully history is reproduced. |
| Counterfactuals | War Triggers, War Inevitability | Analyzing the impact of different variables on war, peace, and diplomatic interactions. |
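One simple form of the connectivity analysis listed above is to treat alliances as edges of an undirected graph and read off the blocs as its connected components. This is an illustrative choice, not necessarily the network analysis used in the study; `alliance_blocs` is a hypothetical helper built only on the Python standard library.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def alliance_blocs(alliances: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group countries into blocs: connected components of the alliance graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in alliances:
        graph[a].add(b)  # alliances are symmetric, so add both directions
        graph[b].add(a)
    seen: Set[str] = set()
    blocs: List[List[str]] = []
    for start in sorted(graph):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        bloc, queue = [], deque([start])
        while queue:  # breadth-first search over allied countries
            country = queue.popleft()
            bloc.append(country)
            for ally in graph[country]:
                if ally not in seen:
                    seen.add(ally)
                    queue.append(ally)
        blocs.append(sorted(bloc))
    return blocs
```

With the alliances between Britain and France, Germany and Austria-Hungary, and Serbia and Russia as input, it returns three separate blocs; adding an alliance between France and Russia would merge the first and third into a single bloc.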

7. Simulation Effect Evaluation

We ran seven simulations and present the human evaluation results below.
Military Alliances: In every simulation, stable military alliances form between Britain and France, Germany and Austria-Hungary, and Serbia and Russia. From the perspective of Britain and France, Germany's aggressive expansionist policies make it a potential enemy. Germany chooses to ally with Austria-Hungary, a decision shaped by linguistic and ethnic ties as well as a range of strategic and political considerations, the most important being their shared aim of isolating France diplomatically and forming a united front against a potential threat from Russia. Meanwhile, the alliance between Serbia and Russia rests on their common ethnic origins and is further reinforced by Russia's strategic interests in the Balkans. These simulated alliances closely match the historical alliances of the time.
Declarations of War: In every simulation, Austria-Hungary declares war on Serbia and on Russia, and Germany declares war on Russia. By contrast, France declares war on Germany in 71.4% of the simulations (5 of 7), and Britain declares war on Germany in only 14.3% (1 of 7). To analyze in depth whether most countries' actions are reasonable, we selected the simulation with the most declarations of war. In this simulation, the conflict begins with Austria-Hungary's declaration of war on Serbia. Other declarations follow in this order: Germany → Serbia, Russia → Austria-Hungary, France → Germany, Russia → Germany, Britain → Germany, where the country on the left of each arrow declares war on the country on the right. For Austria-Hungary, Serbia is the direct opponent, chiefly because of the assassination of the Archduke, which directly triggers the declaration of war. The subsequent declarations follow the alliances and enmities that existed at the time.
Non-Interference Treaties: In every simulation, the United States is party to at least one non-interference treaty, and the Ottoman Empire is party to such treaties in 85.7% of the simulations (6 of 7). The United States tends to protect its wealth and avoid unnecessary conflicts, so it seeks non-interference treaties with other countries to keep its distance from potential conflicts. It gathers intelligence and signals its intentions through diplomatic communication, consistent with its isolationist strategy. Likewise, the Ottoman Empire seeks to avoid direct involvement in conflicts, preferring to remain neutral or to form defensive alliances, so pursuing non-interference treaties and diplomatic communication with neighboring countries is a prudent choice for it. These diplomatic strategies reflect the two countries' broader policies for preserving their positions and help explain why neither was directly involved when the conflict first broke out.
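The percentages in these results are counts out of the seven simulation runs. A minimal sketch of that bookkeeping, assuming each run's log is reduced to a set of event labels (the `share_of_runs` and `share_from_count` helpers and the label format are hypothetical):

```python
from typing import Iterable, Set


def share_of_runs(runs: Iterable[Set[str]], event: str) -> float:
    """Percentage of runs whose event set contains `event`, to one decimal."""
    run_list = list(runs)
    if not run_list:
        raise ValueError("need at least one run")
    hits = sum(event in run for run in run_list)
    return round(100 * hits / len(run_list), 1)


def share_from_count(count: int, total: int = 7) -> float:
    """The same percentage when only the count of matching runs is known."""
    return round(100 * count / total, 1)
```

Here `share_from_count(5)`, `share_from_count(1)`, and `share_from_count(6)` give 71.4, 14.3, and 85.7, matching the reported frequencies of France declaring war on Germany, Britain declaring war on Germany, and Ottoman non-interference treaties.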
We also observed two seemingly unconventional special cases during the evolution of the situation.
Special Case 1: Providing Support but Not Forming Alliances: In some instances, nations only verbally express support without taking actual actions. For example, France initially expresses support to Austria-Hungary but does not take further action. The United States expresses dissatisfaction with Serbia to Austria-Hungary and hints at providing support but only proposes a non-interference treaty rather than a military alliance. This reflects France’s consideration of the risks associated with allying with potential enemies, while the United States seeks to avoid conflict and protect its wealth.

France # Round 1

To Austria-Hungary: France chooses to send a message to Austria-Hungary stating: France expresses deep condolences for the unfortunate passing of your monarch and offers support during this difficult time.

United States # Round 1

To Austria-Hungary: The United States proposes a non-interference treaty with Austria-Hungary.

To Austria-Hungary: The United States chooses to send a message to Austria-Hungary stating: We express our condolences for the passing of your king and condemn Serbia’s actions. We are committed to maintaining peace and stability in the region and offer support in diplomatic matters.
Special Case 2: Alliance Betrayal: Another special case is that Britain accepts, in Round 2, a non-interference treaty proposed by Austria-Hungary in Round 1, but declares war on Austria-Hungary in Round 5. This reflects changes in Britain's political environment and strategic considerations. At first, Britain avoids conflict with Germany and accepts Austria-Hungary's non-interference treaty, taking a neutral stance. However, as the situation escalates, with new alliances forming and more countries declaring war, Britain faces new strategic challenges. Because Austria-Hungary is Germany's ally and is at war with France, Britain abandons the non-interference treaty in order to support France, changing its diplomatic strategy. This shift illustrates how Britain adapts and adjusts its strategy in international relations.
Austria-Hungary # Round 1
To Britain: Austria-Hungary proposes a non-interference treaty to Britain.
Britain # Round 2
To Austria-Hungary: Britain decides to accept Austria-Hungary’s non-interference treaty.
Britain # Round 5
To Austria-Hungary: Britain decides to violate the non-interference treaty with Austria-Hungary.
To Austria-Hungary: Britain decides to declare war on Austria-Hungary.
In summary, the simulated behaviors are highly credible and consistent with real history. This indicates that when the assassination is set as the triggering event for World War I (the default setting), the multi-agent system can effectively reproduce the historical scenario.

8. Assessment of War Triggers

The assessment of war triggers examines how different potential triggering events affect whether war occurs. To this end, we designed three alternative triggering events for the World War I scenario, each with a different level of intensity, and ran three simulations for each, so that the results could be compared across scenarios.
1. No Event Trigger:
Today is sunny, and nothing special happens.
2. Conflict between the British Navy and the German Navy:
The British and German navies become embroiled in a serious naval incident. A British vessel is sunk, resulting in 10 deaths. Britain insists that the sunken vessel was a civilian ship and demands an apology from Germany. However, Germany strongly rebuts this, claiming that the ship was a military vessel with no right to enter German waters and blaming the tragedy on Britain itself.
3. Conflict between Austria-Hungary and Russia in the Dardanelles:
A military conflict breaks out between Austria-Hungary and Russia in the strategically significant Dardanelles. The strait is an important port and export hub. Russia attempts to dominate the ports in the region to boost its exports, leading to intense confrontations with the Austro-Hungarian army. Austria-Hungary refuses to concede and does not recognize Russia's dominance, which directly threatens Austria-Hungary's export capabilities. During the conflict, Russian forces kill hundreds of Austro-Hungarian soldiers, further inflaming Austria-Hungary's anger.
The first scenario, “No Event Trigger,” features a complete absence of conflict and serves as a comparative baseline. The second event, the “Anglo-German Naval Conflict,” represents a medium-intensity conflict involving important but non-decisive diplomatic or military actions. The third event, the most intense trigger, the “Austro-Russian Conflict in the Dardanelles,” depicts a high-intensity conflict scenario directly involving two major European powers. Through this grading of conflict intensity, we can assess the impact of varying geopolitical tensions on the potential outbreak of World War I.
In the absence of a clear triggering event, three patterns were observed:
Formation of Military Alliances: In all simulations, two major military alliances were observed to form, one comprising France, Britain, Russia, and Serbia, and the other comprising Germany and Austria-Hungary.
Gradual Military Mobilization: From the second or third round onward, Austria-Hungary and Germany began to mobilize their military forces. This trend continued into the fourth round, with Russia and France also beginning to mobilize their forces. By the fifth round, Britain, Serbia, the United States, and the Ottoman Empire also joined the mobilization.
Cold War State: Despite nations being prepared for war, no actual “hot war” broke out. This indicates that in the absence of specific triggering events, major powers maintained a power balance, teetering on the edge of war without transitioning into open conflict, thus creating a cold war-like scenario. This finding suggests that potential tensions and alliances are sufficient to create an atmosphere of war preparedness, but without specific catalysts, the situation does not escalate.
We next consider the Anglo-German naval incident, in which a British ship is sunk with 10 deaths and the two governments dispute whether it was a civilian merchant vessel or a military vessel that had no right to enter German waters.
Across the three simulations, the Anglo-German naval incident led to war (a declaration of war) only once. In that run, events unfolded as follows. Germany and Britain first dispute the naval incident, and both sides mobilize troops. As tensions escalate, Germany unilaterally declares war on Britain. Countries then form alliances in response: Britain allies with France, Germany with Austria-Hungary, and Russia with Serbia. The United States adopts a non-interference stance, signing such treaties with almost all of the countries involved. France then responds to Germany's declaration by declaring war on Germany and Austria-Hungary, and Austria-Hungary, honoring its alliance obligations, declares war on Britain.
In the other two simulations, although some countries mobilized their troops, no war was declared. This matches the cold war-like situation observed in the no-trigger simulations: heightened military preparedness did not escalate into conflict, and issues were mainly resolved by peaceful means. This indicates that a trigger such as the Anglo-German naval incident does not necessarily lead to war, and it highlights the complexity of international relations and the potential for diplomatic solutions.
Across the three simulations, the conflict between Austria-Hungary and Russia in the Dardanelles triggered a global war twice.
In every simulation, Austria-Hungary, Germany, and Russia mobilized their forces immediately, and this swift response set the stage for further escalation. In one simulation, Germany acted aggressively and declared war on Russia; in another, Russia started the conflict by declaring war on Austria-Hungary. These declarations set off a domino effect, drawing allies into the conflict and escalating the situation into a full-scale global war. In the remaining simulation, no war broke out even though the countries involved had mobilized their troops. In all simulations, the United States remained isolationist, taking part neither in military mobilization nor in the subsequent conflicts.
The experiments suggest that each trigger carries a particular level of intensity, and that a trigger intense enough can precipitate an immediate outbreak of war. Interestingly, even with no triggering event, a cold war-like standoff emerged, which implies that tensions were already high enough for even a minor event to escalate them significantly. Given that minor triggering events are practically inevitable, this suggests that a major conflict like World War I was ultimately bound to happen.
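The outcome counts reported in this section can be tabulated with a small harness. In the sketch below, `simulate` stands in for a full WarAgent run and is an assumption, as are the `run_experiment` and `war_rates` names. The recorded outcomes encode only the counts stated above (war in 0 of 3 no-trigger runs, 1 of 3 naval-incident runs, and 2 of 3 Dardanelles runs); the order of runs within each list is arbitrary.

```python
from typing import Callable, Dict, List

# Did the run contain at least one declaration of war? Counts follow the
# text; the order of runs within each list is arbitrary.
REPORTED: Dict[str, List[bool]] = {
    "No event trigger": [False, False, False],
    "Anglo-German naval incident": [True, False, False],
    "Austro-Russian conflict in the Dardanelles": [True, True, False],
}


def run_experiment(triggers: List[str], simulate: Callable[[str, int], bool],
                   runs: int = 3) -> Dict[str, List[bool]]:
    """Call the (hypothetical) `simulate(trigger, seed)` several times per trigger."""
    return {t: [simulate(t, seed) for seed in range(runs)] for t in triggers}


def war_rates(outcomes: Dict[str, List[bool]]) -> Dict[str, str]:
    """Summarize each trigger as 'runs with war / total runs'."""
    return {t: f"{sum(r)}/{len(r)}" for t, r in outcomes.items()}
```

Calling `war_rates(REPORTED)` yields "0/3", "1/3", and "2/3" for the three triggers, the ordering by intensity that the section describes.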

9. Discussion of War Inevitability

We primarily examine the inevitability of war from two main perspectives: the decision-making processes of agents and the key parameters of nations. The aim of the experiments is to explore the impact of aggressiveness in national decision-making and the conditions of national situations on the likelihood of war.
We study agents' decision-making in three settings, default, aggressive, and conservative, by altering the system settings of the national agents. The aim is to assess how an agent's aggressiveness, that is, its inclination toward aggression, affects the inevitability of war. For both the aggressive and the conservative settings, we ran three experiments of 10 rounds each.
The analysis shows that when the system and action-analysis settings are more aggressive, the likelihood of war rises markedly. Under the default settings, several rounds pass before the first declaration of war; in the aggressive setting, war is declared in the first round; and in the conservative setting, only military alliances, non-interference treaties, and peace agreements emerge after 10 rounds. This indicates that agent aggressiveness substantially increases the potential for tension and conflict.
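The three decision-making settings can be sketched as variations of the national agents' system prompts. The disposition wording, the `Disposition`, `system_prompt`, and `experiment_grid` names, and the prompt layout are hypothetical, since the article does not publish its prompts; only the design of three 10-round experiments for each of the aggressive and conservative settings comes from the text.

```python
from enum import Enum
from typing import Dict, Iterator, List, Tuple


class Disposition(Enum):
    DEFAULT = "default"
    AGGRESSIVE = "aggressive"
    CONSERVATIVE = "conservative"


# Hypothetical wording appended to each national agent's system prompt.
DISPOSITION_TEXT = {
    Disposition.DEFAULT: "",
    Disposition.AGGRESSIVE: "You favour assertive responses and are willing to use military force.",
    Disposition.CONSERVATIVE: "You favour caution and diplomacy and avoid military escalation.",
}


def system_prompt(country: str, profile: str, disposition: Disposition) -> str:
    """Build a national agent's system prompt for one disposition setting."""
    base = f"You are the leader of {country}. Country profile: {profile}"
    return f"{base} {DISPOSITION_TEXT[disposition]}".strip()


def experiment_grid(countries: List[str], profiles: Dict[str, str], runs: int = 3,
                    rounds: int = 10) -> Iterator[Tuple[Disposition, int, int, Dict[str, str]]]:
    """Yield (disposition, run index, rounds, prompts) for each planned experiment."""
    for disposition in (Disposition.AGGRESSIVE, Disposition.CONSERVATIVE):
        for run in range(runs):
            prompts = {c: system_prompt(c, profiles[c], disposition) for c in countries}
            yield disposition, run, rounds, prompts
```

Applying the same disposition to every country in a run, as here, is one possible design; varying it per country would be another.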
Turning to the key national parameters, the findings suggest that historical background, key policies, and public morale play decisive roles in whether a country is inclined to start a war. For France and Germany, historical animosity and nationalist sentiment, rooted in past conflicts and territorial disputes, strongly shaped their military actions. For instance, the Franco-Prussian War of 1870-1871 led to the unification of Germany and France's loss of Alsace-Lorraine, leaving lasting animosity and a desire for revenge. This background laid the groundwork for future conflict as France sought to reclaim its lost territory and prestige. For the United States, the effects of key policies and public morale were immediate: in all simulation experiments, adjusting these two parameters led the United States to actively seek alliances with Britain and France. Forming these alliances marked a significant shift in the United States' international position and led to its active involvement in World War I. This scenario shows the potential consequences of a realignment of U.S. foreign policy, highlighting how such adjustments can dramatically change a nation's role and actions in global conflicts.
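Such counterfactual profiles can be produced by copying a country profile and changing only the parameters under study. The sketch assumes a profile made of the parameters named in this discussion (military capability, resources, historical background, key policy, public morale); the `CountryProfile` and `counterfactual` names and all field values are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class CountryProfile:
    name: str
    military_capability: str
    resources: str
    historical_background: str
    key_policy: str
    public_morale: str

    def to_prompt(self) -> str:
        """Render the profile as 'Field Name: value' lines for the agent prompt."""
        return "\n".join(
            f"{key.replace('_', ' ').title()}: {value}"
            for key, value in asdict(self).items()
        )


def counterfactual(profile: CountryProfile, **changes: str) -> CountryProfile:
    """Return a copy of the profile with only the selected parameters altered."""
    return replace(profile, **changes)
```

Because the dataclass is frozen, the baseline profile stays unchanged, so the default and counterfactual runs can be compared directly.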
In summary, while military capability and resources are key factors in a nation’s decision to initiate war, historical backgrounds, including past conflicts, nationalist sentiments, and long-standing enmities, often serve as catalysts for the final decision. This underscores the importance of understanding historical context in grasping the dynamics of international conflicts.

10. Discussion and Outlook

As a multi-agent system based on large language models, the WarAgent simulation system has proven reliable for understanding the mechanisms of international conflict and has demonstrated its value for prototyping and analyzing complex human behavior. By comparing different war-trigger settings, our experiments show that even minimal triggers, or none at all, can lead to a cold war-like situation, indicating that war is often unavoidable. The war-inevitability experiments further support this point. These findings identify the decisive factors that lead to conflict in specific contexts, while also indicating that seemingly predetermined outcomes may be altered by adjusting national policies or international relations strategies. However, we also recognize that the current framework cannot fully capture the complexity of international relations, which suggests the following directions for future research:
1. The correlation between diplomatic communication and the likelihood of conflict: An interesting question is whether there is a correlation between increased diplomatic communication and reduced likelihood of conflict. Simulations can analyze communication patterns, the tone and content of diplomatic exchanges, and their impact on mitigating potential conflicts.
2. The influence of non-state actors in geopolitics: The impact of non-state actors (such as multinational corporations or terrorist organizations) on geopolitics can be an important area of research, especially in the context of modern history.
3. The effectiveness of international treaties and agreements in resolving long-standing disputes: Simulations can also be used to evaluate the effectiveness of various international treaties and agreements in resolving disputes, as well as under what conditions these agreements can endure or fail.
These questions can be studied quantitatively with multi-agent systems based on large language models. Such studies would greatly aid our understanding of how the factors that shape historical events interact, and would provide more comprehensive tools for analyzing past and potential future scenarios.

This week’s live broadcast of the Large Language Model and Multi-Agent System Reading Group:

War or Peace? Multi-Agent System Simulation of World War Outbreak Based on Large Language Models

Large Language Model and Multi-Agent System Reading Group

The Intelligence Club, in collaboration with Zhao Shiyu, a specially appointed researcher at the School of Engineering at Westlake University, Professor Ren Qinyuan from Zhejiang University, and Senior Engineer Cui Jinqiang from Pengcheng Laboratory, jointly initiated the “Large Language Model and Multi-Agent System” reading group to explore the new ideas and values brought to the robotics field by large language models.
For more details, please see:
Large Language Model and Multi-Agent System Reading Group: Large Models Empowering Emergent Collective Intelligence in Robotics
