The following article is sourced from the Science and Technology Herald, authored by Sun Changyin, Mu Zhaoxu, et al., and reprinted from the Armed Police Research Institute.
Autonomous unmanned systems are a type of intelligent system that possesses autonomous cognition, motion planning, decision-making, and reasoning capabilities. Their goal is to complete general tasks in complex, open, dynamic scenarios with little to no human involvement. This article addresses a series of issues faced by autonomous unmanned systems in cross-domain collaborative tasks, such as low efficiency in collaborative perception, poor reliability of self-organized network communication, slow resource scheduling processes, and conflicts in task allocation. It explores the integration of large models and generative artificial intelligence technologies to construct an “intelligent cognitive architecture” that combines “large models + autonomous unmanned systems + AI-generated content” to promote the application of embodied cognitive intelligence in autonomous unmanned systems.
The concept of embodied intelligence can be traced back to Norbert Wiener’s 1948 book “Cybernetics: Or Control and Communication in the Animal and the Machine.” In it, Wiener elaborated the core ideas of control, feedback, and human-machine interaction, proposing that “an automatic control system must adjust its movements according to changes in the surrounding environment” and emphasizing that machines generate and develop behavioral intelligence through interaction with their environment (Figure 1). Wiener’s ideas marked the beginning of the exploration of how machines can mimic biological adaptive behavior: the development of machine intelligence relies not only on advances in algorithms and computing power but also on a deep understanding and practice of perceiving the environment and adapting or responding to its changes.
Figure 1: Logical framework of embodied cognition in behavioral artificial intelligence
Behavioral intelligence has long been the goal of intelligent-machine research, and the original definition of artificial intelligence is “the science and engineering of making intelligent machines.” At its core, behavioral intelligence requires machines not only to process information but also to interact effectively with the physical world. In 1991, Varela, Thompson, and Rosch published “The Embodied Mind: Cognitive Science and Human Experience” (MIT Press), marking the formal introduction of embodied cognition theory, widely regarded as a revolutionary paradigm shift in cognitive science. Embodied cognition theory posits that agents whose morphology is well adapted to their environment can quickly learn intelligent behaviors, and it emphasizes that cognition is embodied, contextual, and generative.

Embodiment requires the system to perceive its physical environment and act within it; this goes beyond simple sensor data processing to include the control of physical behaviors and feedback mechanisms. For instance, an embodied intelligent robot can not only perceive a doorknob but also understand how to use it to open a door. Contextuality requires the system to understand and adapt to the specific context of its operating environment; this understanding transcends single-task execution and demands dynamic adaptation to environmental change. Generativity emphasizes that the system continuously generates new data, knowledge, and solutions through interaction with the environment. This characteristic is key to distinguishing embodied intelligence from traditional computational models: the system operates not just under preset rules but also solves problems creatively when faced with unknown challenges. The development of generative capability means that intelligent systems can learn from experience and improve their behavior, thereby better adapting to a complex and ever-changing real world. Together, embodiment, contextuality, and generativity form the foundation of embodied cognitive intelligence, transforming intelligent machines from passive information-processing tools into dynamic participants that actively understand and influence their environment.

With the rapid development of artificial intelligence technology, multimodal large models have become important interfaces through which machines acquire knowledge and interact with the network and the physical world. By integrating multi-source perception data such as vision, hearing, and touch, large models can approximate human integrated perception, giving machines a more comprehensive, unified understanding of the world. This integration of environmental information not only raises the dimensionality and accuracy of the machine’s perception but also brings machines closer to human patterns of receiving and processing information. Furthermore, multimodal large models can continuously update a robot’s internal knowledge base from open-source and social media information, enabling ongoing learning and evolution. This dynamic updating mechanism lets robots adapt and respond quickly when faced with new situations and challenges.
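As a minimal illustration of the multi-source fusion just described, the following Python sketch encodes each modality into a fixed-size embedding and averages them into one perception vector. The encoders are random-projection stand-ins for real vision, audio, and tactile models, and all names are illustrative assumptions, not part of the authors’ system.

```python
import numpy as np

# Minimal late-fusion sketch: each modality is encoded into a fixed-size
# embedding, then combined into a single perception vector. The encoders
# here are stand-ins (random projections) for real vision/audio/tactile models.

EMBED_DIM = 64

def encode(raw: np.ndarray, seed: int) -> np.ndarray:
    """Stand-in encoder: projects raw sensor data to EMBED_DIM."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((EMBED_DIM, raw.size))
    v = proj @ raw.ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def fuse(modalities: dict[str, np.ndarray]) -> np.ndarray:
    """Average the available modality embeddings into one fused vector."""
    seeds = {"vision": 0, "audio": 1, "touch": 2}
    embs = [encode(x, seeds[name]) for name, x in modalities.items()]
    return np.mean(embs, axis=0)

fused = fuse({
    "vision": np.random.rand(32, 32),   # e.g. a small image patch
    "audio": np.random.rand(128),       # e.g. a spectrogram slice
    "touch": np.random.rand(16),        # e.g. tactile pressure readings
})
print(fused.shape)  # (64,)
```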
For example, by accessing real-time traffic information and social media data, autonomous vehicles can gain a global view of the road network, instantly adjust route planning, and avoid sudden congestion and accidents, ensuring driving efficiency and safety. These multimodal large models do more than process information one way; more importantly, they let machines display human-like problem-solving strategies. Machines can learn to draw lessons from past experience through memory and to apply that knowledge creatively in new situations through association, thereby solving problems effectively. This enhancement enables machines to perform more complex tasks autonomously, demonstrating greater intelligence and adaptability.

As the main application carrier and validation platform of information technology, autonomous unmanned systems have made breakthrough progress in recent years, supported by new-generation information technologies such as artificial intelligence, the Internet of Things, and 5G communication. However, because of their large scale, heterogeneous equipment, and complex operating mechanisms, autonomous unmanned systems face multiple challenges in cross-domain collaborative tasks. First, collaborative perception is fundamental; it requires different systems to share and interpret environmental data accurately. In complex rescue operations, for example, ground robots, drones, and other sensing devices must exchange precise geographic and environmental information in real time to search and rescue collaboratively. Second, communication efficiency is key to the timely execution of collaborative tasks: communication delays or errors can cause an entire operation to fail, especially in time-sensitive tasks requiring precise synchronization. Third, resource allocation concerns how to distribute limited computation, energy, and time effectively among multiple tasks and systems; reasonable allocation maximizes overall efficiency, avoids waste, and ensures priority for critical tasks. Fourth, conflict resolution is a challenge in cross-domain collaborative tasks, especially when multiple systems or algorithms have different or conflicting objectives. For instance, multiple autonomous unmanned systems may need the same resource simultaneously or have overlapping task paths in space; designing algorithms that efficiently avoid such potential conflicts is key to smooth collaboration. Finally, collaborative decision-making is at the core of efficient team operation: each individual system must decide efficiently, and those decisions must be shared and coordinated across the group.
An effective collaborative decision-making mechanism should ensure that all systems operate under common goals and strategies, maintaining decision consistency and adaptability even as environmental or internal states change.

To address these challenges, it is especially important to integrate multimodal large models into an embodied cognitive intelligence framework. By fusing multimodal information from various sensors, such as visual, auditory, and tactile data, the framework gives autonomous unmanned systems a comprehensive view for environmental perception and scene cognition. By connecting in real time to data from the Internet and other trusted sources, the system can continuously update its internal knowledge base, improving the accuracy of scene cognition and the response speed of task allocation. The key to this framework lies in its support for heterogeneous devices of many types to generate and interactively learn in a digital environment in real time. The system does not only train in digital scenarios; it also keeps learning and adapting during actual operations, optimizing its behavior patterns and decision-making strategies with new data and experience. For example, by autonomously combining different machine learning algorithms, autonomous unmanned systems can identify which collaborative strategies are most effective for specific tasks and then apply those strategies to similar situations, improving overall operational efficiency (a bandit-style sketch of such strategy selection follows below). Virtual testing environments allow systems to refine their strategies through extensive pre-training before executing tasks, adapting to complex, changing environmental conditions and task requirements. This will markedly improve the allocation of limited resources and the resolution of conflicts, enabling optimal decisions aligned with overall system goals when resources are contested or objectives clash. Ultimately, the embodied cognitive intelligence framework will strengthen the cross-domain collaborative capabilities of autonomous unmanned systems, allowing them not only to complete tasks independently but also to collaborate with other manned and unmanned systems, sharing information and decisions to handle environmental uncertainty and task complexity. Equipped with such a framework, these systems will be better able to meet the challenges of cross-domain collaborative tasks and achieve a higher level of autonomy and adaptability.
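One simple way to frame the strategy-identification idea above is as a multi-armed bandit over candidate collaborative strategies. The sketch below uses UCB1 with invented strategy names and a simulated reward model; it is an illustrative framing under those assumptions, not the authors’ method.

```python
import math
import random

# Bandit-style selection over candidate collaborative strategies (UCB1).
# Strategy names and the simulated reward model are illustrative only.
strategies = ["leader-follower", "auction-based", "consensus"]
counts = {s: 0 for s in strategies}
totals = {s: 0.0 for s in strategies}

def ucb_pick(t: int) -> str:
    for s in strategies:                      # try each strategy once first
        if counts[s] == 0:
            return s
    return max(strategies, key=lambda s: totals[s] / counts[s]
               + math.sqrt(2 * math.log(t) / counts[s]))

def run_mission(strategy: str) -> float:
    """Stand-in for a real mission: returns a noisy task-completion score."""
    base = {"leader-follower": 0.6, "auction-based": 0.8, "consensus": 0.7}
    return base[strategy] + random.gauss(0, 0.1)

for t in range(1, 201):
    s = ucb_pick(t)
    counts[s] += 1
    totals[s] += run_mission(s)

print(max(strategies, key=lambda s: totals[s] / counts[s]))  # likely "auction-based"
```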
1 From Embodied Intelligence to General Artificial Intelligence

The idea of embodied intelligence can be traced back to mid-20th-century behavioral artificial intelligence research, which focused on how machines acquire and develop intelligence through direct interaction with the environment. This early research was strongly influenced by the behavioral theories of psychologists such as Bruner and Skinner, whose core idea was that learning occurs primarily through the results of actions and feedback rather than through mere observation or instruction. Behavioral AI research of this period emphasized “learning through behavioral interaction”: agents must learn through actual operation rather than acquire skills by observation alone, since true intelligence arises from dynamic interaction with the physical world and continuous practice. Guided by this idea, early roboticists and AI researchers sought to build machines that could explore their environment autonomously and learn through trial and error. Among the most representative is Shakey, the autonomous mobile robot developed by Nils Nilsson and colleagues at the Stanford Research Institute between 1966 and 1972. In that project, the robot had to learn to find the shortest path through repeated attempts, a capability that could not be acquired by passively watching videos or imitating other walkers. Through this practice, the robot not only learned to complete specific tasks but also developed strategies for solving new problems, allowing it to apply learned skills in broader environments.
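The trial-and-error path learning the authors describe can be illustrated with tabular Q-learning on a toy grid, where a per-step penalty makes shorter paths score better. This is a modern stand-in for “learning by acting” only; Shakey itself relied on symbolic planning (STRIPS and A*-style search) rather than reinforcement learning.

```python
import random

# Toy trial-and-error path learning: tabular Q-learning on a 4x4 grid.
# Illustrative of "learning by acting" only; not Shakey's actual stack.
N, GOAL = 4, (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in range(4)}
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(state, a):
    r, c = state
    dr, dc = ACTIONS[a]
    ns = (max(0, min(N - 1, r + dr)), max(0, min(N - 1, c + dc)))
    return ns, (10.0 if ns == GOAL else -1.0)   # -1 per move favors short paths

for _ in range(500):                             # repeated attempts (episodes)
    s = (0, 0)
    while s != GOAL:
        a = random.randrange(4) if random.random() < eps \
            else max(range(4), key=lambda a: Q[(s, a)])
        ns, rwd = step(s, a)
        Q[(s, a)] += alpha * (rwd + gamma * max(Q[(ns, b)] for b in range(4)) - Q[(s, a)])
        s = ns
```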
Compared with “behaviorist” artificial intelligence, “symbolist” artificial intelligence emphasizes abstract, symbolic information processing, while “connectionist” artificial intelligence mimics the brain’s neural connection mechanisms to build audiovisual cognitive computing methods. As an extension of behavioral artificial intelligence, embodied intelligence relies more on the actual operations and behavioral interactions of agents in concrete scenes, emphasizing that agents must be capable of direct interaction with the physical world. This includes not only perceiving the environment but also performing physical operations within it: an embodied intelligent robot, for example, perceives surrounding obstacles and physically navigates around them to complete a task, rather than merely simulating that behavior in a computational model. Contextuality means that an agent’s behavior is inseparable from the specific environment and context in which it operates. Such behavioral intelligence goes beyond simple environmental perception and cognition, extending to adaptively adjusting behavior under different environmental conditions; the agent must make effective decisions and take action not only in static or tightly controlled environments but in a continually changing real-world context. Generativity describes the agent’s ability to generate new knowledge and solutions through interaction with the environment, allowing embodied intelligence to adapt creatively and solve problems in unknown or unprogrammed situations. Agents learn continuously through practice and experience, enabling them to generate solutions beyond their existing knowledge base.

Large models (LMs), autonomous unmanned systems (UASs), and AI-generated content (AIGC) technologies are expected to jointly shape a new embodied cognitive intelligence architecture (Figure 2). By combining a “mind” (large models), a “body” (autonomous unmanned systems), and an “environment” (AI-generated content), autonomous unmanned systems can gain the capacity for autonomous evolution through continuous learning and adaptation to environmental change. Large models such as GPT and PaLM learn models of language and the cognitive world and their regularities by processing large-scale datasets, providing autonomous unmanned systems with a cognitive core akin to a human “mind”: they can understand human language and emotion naturally, improving communication efficiency for social interaction and task execution. The rapid development of unmanned-system hardware provides the “execution carrier” for these “mind models.” Equipped with advanced sensors and actuators, unmanned systems of various types can collect environmental data in real time, feeding their “mind/decision system” with live environmental input; through access to large models they can also strengthen their analysis and understanding of the situation, coordinating and organizing complex physical tasks efficiently through natural-language interaction.
AI-generated content technology provides a rich digital “virtual training ground” for the cognitive abilities of the “mind” and “body” of autonomous unmanned systems, allowing them to carry out complex social interactions and task simulations they have never experienced in the physical environment. AIGC tools not only give autonomous unmanned systems a virtual proving ground with effectively unlimited tasks, but also let humans interact with robots in natural language, testing and improving the systems’ environmental cognition and embodied execution while preserving physical safety. This article therefore proposes an integrated “computing-control-testing” embodied cognitive intelligence framework that combines large models, autonomous unmanned systems, and AI-generated content, forming an embodied cyclical cognitive process (Figure 2). The architecture not only strengthens the systems’ learning and adaptability in unknown, changing environments but also expands their cognitive boundaries of the physical world through task training in virtual grounds, enabling them to better understand and respond to complex environments and task requirements. Compared with existing intelligent frameworks, the embodied cognitive intelligence framework proposed here focuses on the many challenges autonomous unmanned systems face in cross-domain collaborative tasks, improving the quality and efficiency of those tasks and enhancing system safety.
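The cyclical process just described can be summarized as a perceive-plan-act-evaluate loop. The skeleton below shows the shape of that loop with stub functions; in a real system `plan` would be backed by a large model, `execute` by robot actuation, and `generate_scenario` by an AIGC pipeline. Everything here is schematic and assumed, not the authors’ implementation.

```python
import random

# Skeleton of the "computing-control-testing" loop described above.
# All three components are stubs standing in for real subsystems.

def generate_scenario() -> dict:
    """AIGC stand-in: emit a synthetic task scenario for training."""
    return {"task": random.choice(["search", "transport", "inspect"]),
            "obstacles": random.randint(0, 5)}

def plan(scenario: dict, knowledge: list) -> str:
    """LM stand-in: choose an action plan given scenario and past lessons."""
    return f"plan_for_{scenario['task']}_avoiding_{scenario['obstacles']}_obstacles"

def execute(plan_str: str) -> float:
    """Body stand-in: run the plan, return a task-completion score."""
    return random.random()

knowledge = []                               # accumulated experience across the loop
for episode in range(3):
    scenario = generate_scenario()           # testing: virtual environment
    p = plan(scenario, knowledge)            # computing: cognitive core
    score = execute(p)                       # control: embodied execution
    knowledge.append((scenario, p, score))   # close the cognitive cycle
```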
Figure 2: Embodied Cyclical Cognitive Process of Autonomous Unmanned Systems

By integrating multi-source sensor data across the social-physical-information domain with large models, autonomous unmanned systems can perceive changes in the physical environment and in social interaction, responding to them according to task needs, i.e., adaptively executing specific tasks and behaviors. In practice, combining embodied intelligence with large models and unmanned-system technology is unleashing enormous production and labor potential. For example, Google’s PaLM-E integrates the 540B-parameter PaLM with a 22B-parameter Vision Transformer (ViT), merging text with multimodal data from the robot’s onboard sensors, including images, robot state, and scene data. The robot can thereby understand the specific layout of its surroundings and adjust itself to human needs and physical obstacles as they change in real time, executing picking and handling tasks flexibly. As general artificial intelligence technology develops further, embodied intelligence will let autonomous unmanned systems adapt better to human work and living environments, becoming a new kind of laborer working alongside humans. This trend will reshape the productivity ecology of society, the economy, and industry.
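To make the PaLM-E-style input concrete, the sketch below interleaves text with sensor embeddings into a single model input sequence. It is a schematic mock-up of the interleaving idea under assumed names, not the actual PaLM-E interface.

```python
# Illustrative of the PaLM-E idea described above: sensor readings are
# embedded and interleaved with text tokens into a single model input.
# This is a schematic mock-up, not the actual PaLM-E API.
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str        # "text" or "sensor"
    payload: object  # raw string, or an embedding vector for sensor data

def build_input(instruction: str, image_emb, state_emb) -> list[Segment]:
    """Interleave language and sensor embeddings, as multimodal LMs do."""
    return [
        Segment("text", "Robot observation:"),
        Segment("sensor", image_emb),        # camera frame embedding
        Segment("text", "Robot state:"),
        Segment("sensor", state_emb),        # joint angles, pose, etc.
        Segment("text", f"Instruction: {instruction}"),
    ]

prompt = build_input("pick up the red block", [0.1] * 8, [0.0] * 4)
```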
2 Challenges of Cross-Domain Collaborative Tasks for Autonomous Unmanned Systems

Driven by national strategies for technological strength, autonomous unmanned systems are becoming markedly more intelligent and networked, their applications increasingly widespread, and cross-domain collaborative operations across air, land, and sea are becoming routine. Taking the Russia-Ukraine conflict as an example, both sides have invested heavily in new air, sea, and land unmanned equipment, collaboratively executing complex military tasks that greatly reduce personnel casualties and shorten military decision cycles. At the start of the war, Ukraine was first to use drones for battlefield reconnaissance and surveillance, real-time intelligence collection, assessment of each side’s military capabilities, artillery fire guidance, and strikes at close to long range. In October 2022 the first unmanned-boat operation of the war took place, when the Ukrainian military used 7 unmanned boats and 9 drones to raid the port of Sevastopol, attacking the Russian Black Sea Fleet stationed there. Russia, for its part, deployed its self-developed “Marker” unmanned combat vehicle for the first time in this war, capable of various ground combat tasks including anti-armor operations. At present, however, collaborative tasks of autonomous unmanned systems focus mainly on air-sea and air-land cooperation, and cross-domain collaborative combat capability still needs improvement.
Autonomous unmanned systems face a series of severe challenges when executing cross-domain collaborative tasks. First, full-space cross-domain systems are typically large in scale and highly heterogeneous, involving many types of robots and equipment, each with its own operating mechanisms and operational standards. This heterogeneity means multiple operating rules coexist and interact within the system, complicating prediction and control of overall behavior. These systems also contain numerous parameters and variables, generally with strong coupling and nonlinear relationships; together with environmental uncertainty and multi-level decision requirements, this makes it exceptionally difficult to establish a universal mathematical model. Frequent sensor failures, meanwhile, cause discontinuous data acquisition or make data quality hard to check by traditional means, increasing operational uncertainty and risk. In practice this complexity shows up as high demands on environmental adaptability and decision-making, since systems must keep operating efficiently under continuously changing or even harsh conditions. These challenges are the current technical bottlenecks limiting the application of autonomous unmanned systems.

Collaborative perception is the foundation of cross-domain collaborative tasks for multi-source heterogeneous autonomous unmanned systems. Harsh conditions often destabilize device perception and cause data or modality loss: in some situations, necessary perception information (such as visual or auditory data) may be incomplete or entirely absent due to occlusion, shielding, or other environmental factors. In fog or smoke, for example, the effectiveness of visual sensors drops sharply, and with it the availability of visual information. Dynamic environmental change, such as seasonal or weather variation or other dynamic factors in the scene, can also alter the modal correlations of perception data; a task performed in summer may face lighting and temperature conditions completely different from winter’s, affecting the stability of perception devices and the efficiency of environment cognition and data processing. Intense environmental noise is another significant problem: under high wind or mechanical vibration, sensor data may be corrupted, degrading accuracy and reliability. This is especially serious in collaborative operations, where different devices or robots may rely on one another’s perception data for precise positioning and task coordination. Cross-domain collaborative perception therefore needs high adaptability and robustness, so that autonomous unmanned systems remain efficient and accurate even in extreme or adverse conditions; a sketch of modality-loss-tolerant fusion follows below.
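The sketch below illustrates one way to tolerate the modality loss described above: missing or low-confidence channels are dropped and the remainder reweighted. The confidence scores are placeholders for real quality estimates (e.g., visibility in fog, acoustic signal-to-noise ratio), and the function names are assumptions for illustration.

```python
import numpy as np

# Fusion that tolerates modality loss: missing or degraded channels are
# skipped, and the rest are reweighted by a per-modality confidence.

def robust_fuse(embeddings: dict[str, np.ndarray | None],
                confidence: dict[str, float]) -> np.ndarray | None:
    avail = {m: e for m, e in embeddings.items()
             if e is not None and confidence.get(m, 0.0) > 0.1}
    if not avail:
        return None            # nothing usable: caller must degrade gracefully
    w = np.array([confidence[m] for m in avail])
    w = w / w.sum()
    return sum(wi * e for wi, e in zip(w, avail.values()))

fused = robust_fuse(
    {"vision": None,           # lost to fog/smoke
     "lidar": np.ones(64) * 0.5,
     "audio": np.ones(64) * 0.2},
    {"vision": 0.0, "lidar": 0.9, "audio": 0.4},
)
```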
Cross-domain networked communication is key to effective collaborative operation, but its performance in complex interference environments needs improvement. In particular, many unmanned systems still rely primarily on remote control, which places higher demands on networked communication. Military operational areas are often strong-interference environments that severely affect link stability: in mountains, deserts, dense urban areas, or industrial settings with heavy interference, communication equipment must cope with extreme temperature swings, physical obstruction, and electromagnetic interference to keep data flowing. The physical characteristics of the propagation medium also strongly affect transmission range and reliability; wireless signals attenuate readily through thick walls or metal structures, reducing communication range and efficiency. Underwater and underground communication is more complex still: conventional radio is hard to apply, and acoustic or other special technologies are needed to accommodate the medium’s constraints on signal propagation. Communication systems must further handle compatibility across different standards and protocols, since devices in cross-domain settings may use different communication technologies, requiring efficient interoperability and data integration for smooth collaboration. As task complexity grows, so do demands on bandwidth and data-processing capacity: transmitting and processing large volumes of data in real time, particularly with high-resolution sensors and complex decision algorithms, places stringent requirements on bandwidth and latency. The complex coupling among subsystems, with mutual influences that are unknown and hard to quantify, currently poses significant challenges for cross-domain collaborative operation.

Efficient resource scheduling, reliable task allocation, and conflict resolution are further challenges in cross-domain collaborative tasks. First, the system’s many heterogeneous perception units pose significant challenges to resource utilization: radar, laser scanners, cameras, and other units differ in their demands for processing power, energy, and bandwidth, and in resource-limited environments the key is to optimize allocation so that each unit operates effectively without interfering with the others. In large-scale collaborative operations, tasks must be allocated swiftly and accurately to suitable units; any delay can reduce overall efficiency, so the system needs not only efficient scheduling algorithms but the capacity to digest large amounts of information and decide rapidly. Task complexity also challenges the optimality of scheduling results: in multi-task, multi-objective settings, meeting each task’s specific needs while achieving an overall optimal or near-optimal outcome is highly complex, involving trade-offs between short-term and long-term priorities, shifting resources between urgent and routine tasks, and balancing load across the system. Furthermore, conflict resolution cannot be ignored in resource scheduling and task allocation. Resource competition or objective conflicts may arise between tasks or units, such as multiple drones contending for the same communication channel or airspace at the same time; effective strategies must identify such potential conflicts in real time and propose solutions that avoid interference and efficiency loss during execution, as in the greedy de-confliction sketch below.
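As a toy version of the channel-contention example, the following sketch greedily reassigns requests with overlapping time windows to the lowest free channel. Field names and the greedy policy are illustrative assumptions; a real scheduler would also weigh task priority and switching cost.

```python
# Greedy de-confliction sketch: drones requesting overlapping time windows
# on the same channel are reassigned to the lowest free channel.
from dataclasses import dataclass

@dataclass
class Request:
    drone: str
    start: float   # window start (s)
    end: float     # window end (s)
    channel: int   # requested channel

def overlaps(a: Request, b: Request) -> bool:
    return a.start < b.end and b.start < a.end

def deconflict(requests: list[Request], n_channels: int) -> None:
    granted: list[Request] = []
    for req in sorted(requests, key=lambda r: r.start):
        taken = {g.channel for g in granted if overlaps(req, g)}
        for ch in range(n_channels):         # lowest free channel wins
            if ch not in taken:
                req.channel = ch
                break
        granted.append(req)

reqs = [Request("uav1", 0, 10, 0), Request("uav2", 5, 15, 0), Request("uav3", 8, 12, 0)]
deconflict(reqs, n_channels=3)
print([(r.drone, r.channel) for r in reqs])  # uav2/uav3 moved off channel 0
```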
In collaborative tasks of multi-type heterogeneous autonomous unmanned systems, effectively combining and integrating diverse perception information to make efficient decisions is crucial. Perception devices such as visual sensors, radar, and infrared sensors are numerous, and the types and formats of the data they capture differ greatly. Extracting decision-relevant information from this vast, diverse data, quantifying its utility, and evaluating its effectiveness is the foundation of effective collaboration; this requires not only efficient data-processing algorithms but sophisticated data-fusion technology to preserve the integrity and timeliness of information during decision-making. In practice, communication interruptions, the limits of perception devices, or external conditions can make collected data temporally discontinuous or incomplete, and such discontinuity and fragmentation further complicate collaborative decision-making: decisions must be made from incomplete or interrupted data streams, increasing the computational and analytical burden. At the same time, different units in a heterogeneous system may adopt different decision logics and processing algorithms, adding yet another dimension of complexity and demanding highly optimized collaborative decision frameworks and algorithms that accommodate units of various types and capabilities. In summary, the challenges of collaborative decision-making in heterogeneous autonomous unmanned systems are multifaceted, spanning data processing, information integration, and decision complexity; solving them requires advanced technology and algorithms as well as overall optimization of system design, to ensure rapid and accurate decisions in complex, variable environments.
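A minimal instance of combining heterogeneous perception into a single decision is confidence-weighted voting, sketched below. It stands in for the far richer fusion the text calls for and ignores correlations between sources; all source names are invented.

```python
# Confidence-weighted vote over heterogeneous perception sources: each
# unit reports a label and a self-assessed confidence; the fused decision
# is the label with the highest total weight.
from collections import defaultdict

def fuse_decisions(reports: list[tuple[str, str, float]]) -> str:
    """reports: (source, label, confidence) triples."""
    scores: dict[str, float] = defaultdict(float)
    for _source, label, conf in reports:
        scores[label] += conf
    return max(scores, key=scores.get)

decision = fuse_decisions([
    ("uav_camera", "survivor", 0.7),
    ("ugv_lidar", "debris", 0.4),
    ("uav_infrared", "survivor", 0.6),
])
print(decision)  # "survivor" (0.7 + 0.6 > 0.4)
```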
3 Embodied Cognitive Intelligence Framework for Cross-Domain Collaborative Tasks of Autonomous Unmanned Systems

Humans expand their cognitive boundaries by exploring the unknown, and their ability to understand things is limited by their own experience, knowledge, memory, and computational capacity. Under the constraints of bounded rationality, humans optimize individual behavioral decisions through continuous trial and error, appropriate fault tolerance, and timely correction, organizing, coordinating, or executing actions within a given workspace to achieve desired goals. As an extension of behavioral artificial intelligence, the research goal of embodied intelligence is to enable machines to recognize and act as humans do. In the “computing-control-testing” collaborative embodied cognitive intelligence framework, any intelligent entity capable of autonomous action consists of at least a cognitive decision carrier (mind) and an action execution carrier (body), executing tasks in a specific operational space (environment).

In recent years, with the rapid development of generative artificial intelligence, a variety of large-model tools have emerged and drawn wide attention from leading technology companies and academic institutions worldwide. On the one hand, large models endow unmanned systems with powerful single-platform decision-making, and their natural interaction capability allows members of an unmanned system to communicate, coordinate resources, and share strategies more efficiently through “dialogue-style group chats”; on the other hand, the multi-source sensor data that unmanned systems collect gives large models the evidential basis for understanding the environment and executing tasks, laying the foundation for real-time, data-driven control of system behavior. AI-generated content supplies abundant “training data” for embodied cognition, letting systems explore the boundaries of their cognition at lower cost and optimize their behavior free of the physical limits and risks of the real world. In a virtual testing environment, training tasks can be described through a set of semantic definitions, and each semantic entity can create new tasks through search, combination, and similar operations in the task semantic space. By permuting and combining properties such as the number, type, and state of semantic objects, the unmanned systems involved can be trained in a specific semantic space (a virtual environment composed of AIGC-generated data) until they reach the capability level required for the corresponding tasks in real physical space (a combinatorial sketch of such a task space follows below). This setup lets autonomous unmanned systems “experience” and “learn” a wide range of scenarios in simulation, accumulating experience and probing the limits of perception accuracy, computational resources, communication networking, and task execution, thereby cutting the cost of physical experiments while transferring knowledge and rules learned from generated content to specific real-world tasks.

This article therefore proposes an integrated “computing-control-testing” embodied cognitive intelligence framework combining “large models + autonomous unmanned systems + AI-generated content” (Figure 3). The architecture aims to bring together the data-processing and natural-language-interaction power of large-scale pre-trained models, the behavioral operation capability of unmanned-system mechanical bodies, and the scenario-generation capability of AIGC, yielding autonomous unmanned systems with adaptability to complex environmental change, verifiable capability boundaries and failure conditions, and quantifiable task-completion evaluation.
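The combinatorial task space mentioned above can be illustrated by enumerating semantic properties, as in this sketch. The property lists are invented placeholders; real task generation would search this space rather than exhaustively enumerate it.

```python
# Sketch of a "task semantic space": new training tasks are enumerated as
# combinations of semantic properties (platform type, count, state, terrain).
from itertools import product

objects  = ["uav", "ugv", "usv"]
counts   = [1, 2, 4]
states   = ["nominal", "degraded_comms", "sensor_fault"]
terrains = ["urban", "mountain", "littoral"]

def task_space():
    for obj, n, state, terrain in product(objects, counts, states, terrains):
        yield {"platform": obj, "count": n, "state": state, "terrain": terrain}

tasks = list(task_space())
print(len(tasks))        # 3 * 3 * 3 * 3 = 81 distinct training scenarios
print(tasks[0])
```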
In this architecture, large models serve not only as the “computational brain” of collaborative cognitive intelligence, handling complex organizational, coordination, and decision-making tasks, but also continuously optimize their cognitive patterns through ongoing trial and error and learning. The mechanical body interacts directly with the physical world, executing specific tasks such as transporting, maintaining, or navigating, and verifying whether strategies tested on AI-generated content achieve the expected effects in reality.
Figure 3: The “Computing-Control-Testing” Embodied Cognitive Intelligence Framework of Autonomous Unmanned Systems

In particular, to address the risk that AI-generated content violates the basic laws of the real world, producing insecurity, instability, and wasted time and resources in embodied intelligent systems, this framework builds human-machine hybrid intelligence through a human-in-the-loop approach, incorporating human expert knowledge and experience so that systems can distinguish reality from illusion, identify truth from falsehood, and engage in counterfactual thinking. First, distinguishing reality from illusion is a basic requirement during actual task execution, since systems under this framework must switch seamlessly between AI-generated virtual content and the physically perceived real environment. This ability lets a system interpret and respond correctly to simulated data in virtual training while applying learned skills and strategies effectively in the real world. Second, identifying truth from falsehood underpins the reliability and safety of information exchange and operation. Systems must not only process information from different sources but assess its reliability and authenticity; in practice they may face cyber attacks, misleading information, or erroneous data from faulty sensors, and only a system that can discern the truth will avoid being misled and keep its decisions accurate. Finally, counterfactual thinking, a higher cognitive function, lets a system weigh alternative options and scenarios when deciding: during decision-making it can use large models to integrate multi-source data, infer the potential consequences of different choices, and quantify expected effects and losses, making optimal decisions consistent with real physical scenes and resource limits. These capabilities enhance operational safety and effectiveness, ensuring stable, reliable work both in the complex, variable real world and in strictly controlled virtual environments.
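Counterfactual evaluation as described can be caricatured as rolling each candidate decision forward in a world model and comparing expected loss, as below. The `simulate` stub stands in for an AIGC-backed simulator; the decision names and loss values are invented for illustration.

```python
import random

# Counterfactual evaluation sketch: before committing, roll each candidate
# decision forward in a simulator several times and compare expected loss.

def simulate(decision: str) -> float:
    """Stub world model: returns a sampled loss for taking `decision`."""
    base = {"reroute": 2.0, "hold_position": 3.5, "proceed": 1.5}
    return base[decision] + random.gauss(0, 0.5)

def choose(decisions: list[str], rollouts: int = 50) -> str:
    expected = {d: sum(simulate(d) for _ in range(rollouts)) / rollouts
                for d in decisions}
    return min(expected, key=expected.get)   # minimize expected loss

print(choose(["reroute", "hold_position", "proceed"]))  # likely "proceed"
```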
4 Conclusion

This article has discussed the immense challenges autonomous unmanned systems face in collaborative tasks within complex, dynamic, time-varying environments and, building on research into behavioral artificial intelligence, has proposed an integrated “computing-control-testing” embodied cognitive intelligence framework combining “large models + autonomous unmanned systems + AI-generated content.” The architecture not only improves real-time interaction, collaborative decision-making, and autonomous execution, but also markedly enhances adaptability to changing environments and tasks. By simulating and training complex tasks on directed generated content (images, voice, video, etc.), systems can test and refine their strategies without risk of physical damage, which is key to improving the precision and efficiency of their behavior.

Looking ahead, this combination of technologies is expected to drive autonomous unmanned systems toward more universal, general-purpose intelligent machines. As the technology matures and applications broaden, the framework should demonstrate unprecedented flexibility and effectiveness across more fields, such as automated driving, telemedicine, disaster response, and intelligent manufacturing. An integrated architecture of this kind can better understand and respond to complex, changing human needs and environmental challenges, delivering more personalized services and interactions while improving operational safety and efficiency. The framework is not only a demonstration of systems-level innovation; it also points to the great potential of combining artificial intelligence with autonomous unmanned systems to raise industrial efficiency and create new quality productive forces. It will profoundly change how technological innovation happens and reverberate across society, the economy, and culture, enabling future autonomous unmanned systems to serve human development and national security in a more prepared, humane, and intelligent way.