| Abstract | Human digital twin (HDT) technology can present human physiological characteristics in digital form, but in social communication and marketing it lacks consciousness and autonomous linguistic and behavioral processing, and therefore cannot play an effective role in improving user experience. Large language models (LLMs), represented by ChatGPT, have proven to be of strong practical value. Agents enabled by AIGC can be deeply integrated with digital twins to form a new kind of digital avatar with personalized self-awareness and behavior: the agent human digital twin (AHDT). Supported by personalized model construction and generative AI, the physical and behavioral similarity of the agent human digital twin to human beings will improve significantly. It will not only exhibit a range of internal and external human characteristics, but will also optimize the user experience across the whole process of communication with users, from the sensory and interactive to the emotional, building a new social relationship between humans and agents.
| Keywords | Agent human digital twin · Physical similarity · Behavioral similarity · User experience
1 Research Background of the Problem
Even as we continue to discuss the technical characteristics and application value of digital twins, the emergence and rapid development of AIGC (Artificial Intelligence Generated Content), cognitive computing, and affective computing have prompted digital twins to evolve into “intelligent agent digital twins.” Compared with digital twins that have achieved external simulation and anthropomorphic images, intelligent agent digital twins represent a significant leap in internal thinking and emotional capabilities. In JD Live, the AI digital human modeled on Liu Qiangdong, JD’s “No. 1 Procurement” Dong Ge, can not only introduce products fluently, direct the live broadcast, and prompt netizens to comment, but also reply to comments impromptu, interacting with users through cognitive and reasoning abilities. Today, generative artificial intelligence (GAI) can creatively generate, control, and improve valuable multi-dimensional data, producing AI-generated content. This technology, combined with digital twins, has become the latest solution in scenarios such as e-commerce live streaming and intelligent assistants.
Previous research has found that avatar representation (i.e., real-time controlled digital representation of users) can influence how users interact with others and affect behavior in immersive virtual environments. Intelligent agent digital twins, with their superior anthropomorphic characteristics in appearance and behavior, may achieve deep interaction with physically real human beings in future agent social interactions, which is reflected not only in sensory aspects but also in emotional aspects.
With the deepening of user-centered thinking, the importance of user experience in product and service strategy has become a social consensus, serving as an important indicator of the utility of a product or service. Providing a higher-quality user experience has also become part of differentiation strategies, helping to win new users, increase the market share of products or services, and ease pricing pressure. Yet many industries face bottlenecks in enhancing user experience. Where services are delivered by people, companies must, first, increase their investment in staff such as live-stream hosts and human assistants in order to establish smoother linguistic and emotional connections with users, which compresses profit margins; second, the user experience achieved through human labor is uncertain: physiological limits on service duration, public-relations risks arising from moral judgment, and difficulties in standardizing image management all significantly affect user experience, and thereby customer satisfaction and even the fate of the enterprise. Digital virtual humans and digital twins emerged to address these issues. In the application of digital virtual humans, the expansion of the industry combined with insufficient technological development has made the user experience they provide increasingly rigid and seriously homogeneous, which is not conducive to the sustainable development of enterprises and brands. Digital twins can partially achieve anthropomorphic, personalized sensory experiences, but their internal cognition is limited to outputting database information and programmatically executing preset algorithms; this pronounced tool-like character restricts further enhancement of user experience.
Clearly targeting differentiated operation of user experience and the associated end-user value propositions requires a comprehensive understanding of how people interact with technology. Intelligent agent digital twins, having broken through the physiological bottlenecks of real people, achieve a deep evolution of anthropomorphic similarity in appearance and behavior, combining affective computing and cognitive computing with AIGC-generated interactive content. Through multimodal sensory and emotional pathways, users will experience interactions with intelligent agent digital twins quite differently from those with traditional digital twins and digital virtual humans, which may significantly enhance user experience and satisfaction with products and services.
2 Literature Review
2.1 Digital Twins
Digital Twin is a virtual replica of a physical object, process, or system: a computerized model that simulates the behavior, performance, and characteristics of physical objects in real time. Many internationally renowned companies have begun to explore the application of digital twin technology in product design, manufacturing, and services. In the healthcare field, many studies have explored combining digital twin technology with human physiology and psychology, leading to a new concept, the Human Digital Twin (HDT). These studies generally hold that digital twins can achieve personalized medical services, making the technology both necessary and promising. As more industries adopt digital twins to compensate for the existing shortcomings of virtual digital humans, the definition of digital twins has been enriched and refined. Wang and Zhou regard digital twins as counterparts of humans reflecting multi-dimensional information, thereby achieving two-way interaction between the physical world and the cyber world: a model or database of a human that digitally depicts specific objects (such as physical and physiological models) using key information. Hafez holds that the role of digital twins is to provide a complete record of the interactions across the entire human-artificial intelligence space and to identify all possible patterns that emerge in these interactions. Wei defines digital twins as replicas or counterparts, existing in cyberspace, of real people in the real world. Much of the literature defines digital twins from a technical perspective according to the application categories of the respective field, where the twin may represent a patient, a doctor, a worker, and so on. However, Xu Ruiping et al. offer a simple yet accurate definition: applying digital twin technology to a real person yields a “digital twin,” which can liberate people physically, mentally, and in terms of boundaries.
2.2 Intelligent Agent Digital Twins
As artificial intelligence has developed toward embodiment, the concept of the agent, which once remained in philosophical thought, has been transformed in both form and meaning. In artificial intelligence research, agents are defined as entities that display intelligent behavior and possess qualities such as autonomy, reactivity, proactivity, and social ability. Agents aim to enable computers to understand users’ interests and act autonomously on their behalf; they can be biological agents or software and hardware entities with certain cognitive abilities. The cognition of agents can be divided into abstract and embodied levels; intelligent agent digital twins, though abstract digital forms, aim to give users embodied perception and experience. The widespread application of large language models, especially ChatGPT, has shown artificial intelligence researchers their value for agents and prompted their integration. LLM-based agents use a large language model as the main component of their brain or controller, expand their perception and action space through multimodal perception and tool-use strategies, and achieve reasoning and planning through techniques such as chain of thought (CoT) and task decomposition. Agents, as a brain-like information hub, can continuously power the intelligence of digital twins.
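The brain-controller architecture described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function names, prompt wording, and the stubbed-in “brain” are all invented here, not drawn from any real agent framework): a chain-of-thought prompt is assembled from the agent’s observations, and one perceive-reason-act cycle hands that prompt to the LLM, whose output is taken as the action.

```python
# Minimal sketch of an LLM-centred agent loop (all names hypothetical).
# The "brain" is any callable mapping a chat transcript to a reply; here it
# is stubbed out so the control flow can run without a model backend.

def build_cot_prompt(task: str, observations: list[str]) -> list[dict]:
    """Assemble a chain-of-thought (CoT) style prompt: the agent is asked to
    reason step by step over its observations before choosing an action."""
    context = "\n".join(f"- {o}" for o in observations)
    return [
        {"role": "system",
         "content": "You are the controller of a digital-twin agent. "
                    "Think step by step, then state one action."},
        {"role": "user",
         "content": f"Task: {task}\nObservations:\n{context}\n"
                    "Let's think step by step."},
    ]

def agent_step(brain, task: str, observations: list[str]) -> str:
    """One perceive-reason-act cycle: the LLM output is treated as the action."""
    return brain(build_cot_prompt(task, observations))

# Stub "brain" standing in for a real LLM call.
demo_brain = lambda messages: "ACTION: greet the viewer"
print(agent_step(demo_brain, "run a live-stream segment", ["viewer joined"]))
```

In a real system the stub would be replaced by a call to a hosted or local model, and the returned action would be parsed and dispatched to animation, speech, or tool-use components.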
In both concept and practical application, digital twin and human digital twin technologies focus on the digital presentation of a material subject: their interaction with users stops at the anthropomorphization of sensory experience and the goal of problem-solving, so tool attributes outweigh social attributes. With the continuous advance of artificial intelligence, the empowering role of generative artificial intelligence for digital twins has become more evident. Generative artificial intelligence (GAI) refers to models trained on large amounts of data in a specific domain (such as images or text) to generate similar data. It is driven by a family of underlying technologies represented by large language models (LLMs), with deep learning at their core, which have produced promising results in natural language understanding tasks such as topic classification, sentiment analysis, question answering, and language translation. Thanks to this empowerment, digital twins, previously primarily tools, have developed into a new form: intelligent agent digital twins.
Intelligent agent digital twins are a new type of anthropomorphic digital twin formed by the organic integration of “agent” and “digital twin,” achieving qualitative leaps in both internal and external aspects. Beyond providing intelligent agent digital twins with multimodal output capabilities, generative AI also enables intelligent agent digital twins to possess internal perceptual intelligence, merging external anthropomorphic simulation with internal cognitive computing, emotional computing, and artificial intelligence empowerment to create social intelligent agents. Compared to traditional digital twins, intelligent agent digital twins exhibit different levels of simulation development in thinking (cognition), emotion, and behavior.
2.2.1 Human Thinking (Cognition)
Data from physical and virtual spaces (collected data, data generated by AIGC models, simulated data, historical data, and so on) are large and complex, so efficient and reliable data management is essential. Intelligent agent digital twins can use diffusion models and generative adversarial networks (GANs) for preprocessing, supporting further training of thinking and expansion of cognition. In addition, intelligent agent digital twins can perform logical, rule-based reasoning over input content using trained AIGC models, ultimately completing output tasks. In fact, the reinforcement learning from human feedback (RLHF) used in the globally popular chat model ChatGPT implicitly incorporates human experience and knowledge. This cognitive expansion enables intelligent agent digital twins to deepen their understanding of the real world and to output content that conforms to the laws and paradigms of the material world, presenting a highly realistic, anthropomorphic thought process.
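The RLHF idea mentioned above can be made concrete with a toy sketch (every name here is invented for illustration): a reward model, trained from human preference data, scores candidate outputs, and the generation policy is steered toward higher-scoring ones. Here the “reward model” is a trivial keyword scorer and the steering step is reduced to best-of-n selection, just to show the shape of the pipeline.

```python
# Toy illustration of the RLHF pattern: score candidates with a stand-in
# "reward model", then prefer the highest-scoring one (best-of-n selection
# is a crude proxy for the real policy-optimisation step).

def toy_reward(text: str, preferred_terms=("helpful", "polite")) -> int:
    """Stand-in for a learned reward model: counts preferred traits.
    A real reward model would be a neural network trained on human
    preference comparisons, not a keyword count."""
    return sum(term in text for term in preferred_terms)

def best_of_n(candidates: list[str]) -> str:
    """Pick the candidate the reward model ranks highest."""
    return max(candidates, key=toy_reward)

samples = ["a helpful and polite reply", "a curt reply", "a helpful reply"]
print(best_of_n(samples))  # -> "a helpful and polite reply"
```

The point is only the division of labor: human judgment enters through the reward signal, and the generator is pushed toward outputs that humans would rank higher.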
2.2.2 Human Emotion
Affective computing and sentiment analysis are both terms related to the computational interpretation and generation of human emotions or feelings. The former primarily concerns instantaneous emotional expression, usually related to voice or image/video processing, while the latter mainly pertains to long-term opinions or attitudes, typically related to natural language processing. End-to-end deep learning frameworks and deep convolutional neural networks (DCNN) have been proven to be more effective for emotion recognition and sentiment analysis. Research shows that in terms of accuracy in emotion classification, large language models (LLMs) represented by ChatGPT can not only compete with traditional transfer learning methods but can also exceed them in certain cases. Intelligent agent digital twins can utilize the “emotional twin digital person” architecture to achieve emotional modeling, emotional perception, emotional encoding, and emotional expression, bridging the emotional gap between humans and machines, truly realizing “agent socialization.”
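To make the sentiment-analysis side of the distinction concrete, here is a deliberately minimal lexicon-based scorer (the word lists and labels are invented for illustration; as the text notes, production systems use DCNNs or LLMs rather than lexicons):

```python
# Minimal lexicon-based sentiment classifier: count positive and negative
# cue words and map the balance to a label. Lexicons here are toy examples.

POS = {"love", "great", "happy", "excellent"}
NEG = {"hate", "bad", "sad", "terrible"}

def sentiment(text: str) -> str:
    """Classify text as positive/negative/neutral by lexicon hit balance."""
    tokens = text.lower().split()
    score = sum(t in POS for t in tokens) - sum(t in NEG for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))  # -> positive
```

Affective computing, by contrast, would operate on instantaneous signals such as facial action units or prosody in a voice stream; this text-level classifier illustrates only the long-term opinion/attitude side.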
2.2.3 Human Behavior
Human behavior includes core intrinsic values and the external expressions triggered by them, where external expressions are multimodal. The external expressions of intelligent agent digital twins in interactions mainly include visual (facial actions and body movements) and auditory (voice forms and language content) presentations. In fact, regarding the core intrinsic values, intelligent agent digital twins, empowered by large language models, can achieve proactive learning of external data content. However, the urgency of the value alignment problem is becoming increasingly prominent with the accelerated iteration of generative intelligence. The value alignment problem arises from the disconnection between what artificial intelligence does and what we want it to do, especially in aspects such as universally recognized values. On the other hand, large models serve as technological intermediaries for intelligent agent digital twins to simulate cross-cultural dialogue, providing possibilities for “value simulation,” including value integration and value conflict, allowing humans to evaluate and find the best state, thus enhancing social experiences.
In terms of multimodal external expression, intelligent agent digital twins make full use of the audio-text-image generation capabilities of large language models, directly mapping input sequences of acoustic features to sequences of visual features. A basic method for generating facial images, the 3D Morphable Model (3DMM), is built on convolutional neural networks (CNNs) and generative adversarial networks; it combines an average face with shape and expression coefficients to compute and reconstruct a three-dimensional face whose features change with variations in speech and text. Researchers have also created voice-operated character animation (VOCA), which achieves 3D facial animation by capturing a speaker’s style and generating emotional facial features, and the recently proposed EMAGE, which generates full-body human gestures, including facial, local body, hand, and global movements, from audio and masked gestures. For voice, various voice-cloning technologies train models on real human voice datasets to generate digital voices that closely resemble the voice characteristics of real people in the physical world; the recently popular “AI Sun Yanzi” and “AI Jay Chou” videos on Bilibili are typical applications of this technology. The combination of intrinsic values with external visual and auditory simulation thus greatly enhances the effectiveness of intelligent agent digital twins in presenting human behavior, providing a technological foundation for human-machine socialization, or even a future “carbon-based/silicon-based symbiosis.”
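The 3DMM recipe described above is, at heart, a linear model: a reconstructed face is the mean face plus weighted shape and expression basis offsets, face = mean + α·shape_basis + β·expr_basis. The sketch below works on a toy three-coordinate “face” with a single shape and a single expression direction (real models use thousands of vertices and dozens of coefficients per basis).

```python
# Linear 3DMM reconstruction on a toy example:
#   face = mean_face + alpha * shape_basis + beta * expression_basis

def reconstruct_face(mean, shape_basis, expr_basis, alpha, beta):
    """Combine the average face with weighted shape/expression offsets."""
    return [m + alpha * s + beta * e
            for m, s, e in zip(mean, shape_basis, expr_basis)]

mean_face = [0.0, 1.0, 0.0]   # toy vertex coordinates of the average face
shape_dir = [1.0, 0.0, 0.0]   # one shape principal component
expr_dir  = [0.0, 0.0, 1.0]   # one expression component (e.g. mouth opening)

# Driving beta from the audio stream is what lets the expression track
# speech, as described in the text.
print(reconstruct_face(mean_face, shape_dir, expr_dir, alpha=0.5, beta=2.0))
# -> [0.5, 1.0, 2.0]
```

Audio-driven talking-head systems essentially learn the mapping from acoustic features to the expression coefficients (the β values) frame by frame.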
2.3 Image-Behavior Similarity
Visual appearance plays a significant role in impression formation, whether online or offline. The effectiveness of human-machine interaction largely depends on the similarity in appearance and behavior between the machine and real humans, as this can affect users’ self-expression and emotional experience. Anthropomorphic visual similarity significantly influences the social organization of digital avatars, a key issue in understanding the cognitive processes by which social interactions are formed in virtual worlds. Research shows that in the healthcare field, a visually similar appearance may strengthen social bonds among peers sharing various health and wellness issues. Beyond cognition, the visual similarity of virtual avatars can also trigger actual intrinsic behaviors in users, as avatars with higher similarity may gain relatively higher levels of trust. In conditions using self-avatars and face-to-face interaction, participants completed tasks faster and achieved better, more efficient outcomes when cooperating than when competing.
Schultze, in studying the sense of presence of self-avatars, used the criterion that “people’s behavior in virtual environments is the same as their behavior in similar real-world environments” as one of the metrics, thus using behavioral similarity as one of the measurement standards for human-machine interaction effects. The design elements of behavioral similarity include control ability, interaction objects, and interaction methods. Studies have shown that higher avatar-user similarity generally leads to higher task engagement. In fact, if the behavior of digital twins presents mechanized and programmatic characteristics, users will have a stronger tool demand for them; when digital twins output content with more human-like and realistic wording and tone, the social effects of agents will increase significantly, greatly enhancing users’ social experiences.
2.4 User Experience
The concept of user experience (UX) has been widely proposed in recent years with the development of the internet and the rapid evolution of interactive experience platforms, and its definition and content have shown diversified characteristics. User experience tends to encompass a broader range of human experience dimensions (such as pleasure, fun, and other emotions), and may also have temporal or longitudinal components. It includes the connotations of emotions and perceptions, involving a continuous feedback loop that recurs throughout the entire usage lifecycle. Efforts to enhance user experience can improve user retention and brand loyalty. Chen Juan et al. believe that user experience refers to the different experiences generated when individuals interact with products, including the extent to which users feel their needs are met, the meanings they attribute to products, and the feelings and emotions generated in the process. Significant influencing factors include usefulness, emotion, and user value. Additionally, the APEC model of user experience includes aesthetic, practical, emotional, and cognitive dimensions; the user experience honeycomb model includes six dimensions: usefulness, usability, satisfaction, findability, credibility, and accessibility; Yu Guoming believes that user experience emphasizes the total experience established by users during the use of products or services, encompassing cognition, emotion, and attitude aspects. From the perspective of cognitive neuroscience, the perceived factors affecting media user experiences can be divided into three dimensions: usefulness, ease of use, and satisfaction, and explained in terms of sensory experience, interaction experience, and emotional experience. 
Based on this study’s focus on the interaction effects of the image-behavior similarity of intelligent agent digital twins, image similarity and behavior similarity are closer to users’ sensory and interaction experiences during usage, and these two experiences ultimately lead to enhanced emotional experiences and satisfaction. Therefore, this study will analyze the impact of the image-behavior similarity of intelligent agent digital twins on user experience using sensory experience, interaction experience, and emotional experience.
3 The Transformative Effects of Image-Behavior Similarity in Intelligent Agent Digital Twins
3.1 Image Similarity: Internal and External Image Multimodal Deep Simulation Fusion
As with real humans, when we speak of the image of an intelligent agent digital twin, we mean both its internal persona and its external presentation. Many use cases of large language models create a convincing sense of being with a human-like conversational partner by displaying a variety of behaviors and dialogues. By pre-training and configuring large language models, we can intervene in the machine entity’s language-generation context, which constitutes its internal image; its external image is then essentially a continuation of content generated in line with the training data. Intelligent agent digital twins are essentially avatars of real people in the real world, and all their data is a digital extraction of real human performance. When constructing an intelligent agent digital twin, we need to extract and encode the internal and external images of the real person, achieving mirror transfer through the logic of data and language; this is the foundation for deep simulation fusion.
The technology for achieving external image similarity has already matured, and the restoration degree of digital twins in terms of visual and auditory aspects compared to real humans is relatively high. It is sufficient to record one’s body, face, and voice in a well-lit and quiet environment, and through the backend system’s capture of sample features and element fusion, a relatively consistent external image with a real person can be generated. However, practical experience shows that due to differences in environmental quality during real person recording and the quality of recorded video or audio, generated virtual images can exhibit modeling flaws such as edge blurriness, insufficient detail, facial feature deformation, and inadequate voice texture. The rapid development of generative artificial intelligence’s text-to-image and video capabilities in the past six months has significantly improved the quality of generated effects, especially in realistic works that can achieve a high degree of simulation. Empowered by generative artificial intelligence, intelligent agent digital twins can automate the supplementation and optimization of flaws during the external image modeling phase, improving the quality of image generation and enhancing the realism of generated works.
The internal image is invisible, but it possesses rich external expressions. This provides a feasible path for intelligent agent digital twins to simulate the internal images of real people, namely, constructing their internal persona characteristics through logical reasoning from the outside in, and ultimately generating external image presentations with distinct personal characteristics based on the constructed persona characteristics as the background for text generation. Researchers studying agents have almost all created their model representations, and each of them must link these models with animation and voice synthesis components, indicating a strong mapping and functional relationship between the internal and external images of agents. With the popularization of large language models like ChatGPT and the convenience of operation, people can set and modify the image construction of intelligent agent digital twins in textual form. Although the empowerment of agents gives them more autonomous reflective abilities and goal achievement capabilities, it is undeniable that the foundational image behind post-training machine learning is essential. Therefore, we can foresee that intelligent agent digital twins empowered by large language models will achieve deep simulation fusion in the pre-intervention and divergent generation stages of internal and external images, improving the alignment of the internal and external images of digital twins, responding to each other, and achieving a more logical and humanized overall image.
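The idea of “setting the internal image in textual form” can be sketched as compiling a persona profile into a system prompt that conditions all subsequent generation. The field names and wording below are illustrative assumptions, not a real product API:

```python
# Sketch: compile an internal-image (persona) description into a chat
# system message, so that every later generation is conditioned on it.

def persona_prompt(name: str, traits: list[str], speech_style: str) -> dict:
    """Turn an internal-image description into a system message dict."""
    return {
        "role": "system",
        "content": (f"You are the digital twin of {name}. "
                    f"Personality traits: {', '.join(traits)}. "
                    f"Always speak in a {speech_style} tone, and keep the "
                    "outer presentation consistent with this persona."),
    }

msg = persona_prompt("an e-commerce host", ["warm", "witty"], "conversational")
print(msg["content"])
```

This is the “pre-intervention” stage described above: the persona text fixes the inner image, and the model’s divergent generation then produces the matching outer expressions.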
3.2 Behavioral Similarity: Personalized Models + Emotional Chain Reinstating Subject Presence
Behavioral similarity here is not merely about the similarity of actions and postures, but rather the triggering mechanisms of any behavior of intelligent agent digital twins that are similar to the actual behavioral logic mechanisms of the real humans they represent. This mechanism is expressed in the form of generative logic presented after autonomous learning by the machine, but at the same time, it raises a question: how can each intelligent agent digital twin achieve autonomous learning and behavior generation in a way that more closely restores the real subject? The customization of personalized models and emotional thinking chains will correspond to the foundational background setting and generative logic setting of intelligent agent digital twins, endowing them with autonomous thinking and social empathy abilities while achieving a highly realistic avatar presentation throughout the entire process, which is of transformative significance for human-machine interaction or agent interaction.
Personalized models give an intelligent agent digital twin an identity, with each digital twin containing a different, unique model. The application of large language models has created convenient methods for building personalized models. Existing research shows that large language models can be used to explore actual user preferences and generate personal recommendations that users are more likely to accept, which is of fundamental significance for future character modeling. The language habits, behavioral habits, and personal preferences of a real person can be trained into the model of an intelligent agent digital twin. Moreover, once the model is set, it continues to evolve through interaction and communication with the external world, a process akin to that of the real person it represents, whose thinking changes continuously through communication with others; this constitutes the full-process operation and subject role of the personalized intelligent twin.
If personalized models make intelligent agent digital twins avatars with identities, then emotional thinking endows intelligent agent digital twins with true humanity, making them necessary subjects for achieving agent interactions or even human/silicon-based symbiosis. Numerous studies in the field of emotion recognition have proven that large language models can help machines discover and identify user emotion types and generate targeted replies through certain pathways and computational logic. However, intelligent agent digital twins can increasingly demonstrate the potential of large language models to simulate human cognitive processes based on accurately recognizing user emotions. The generative logic of the emotional chain-of-thought (ECoT) can produce emotionally resonant behaviors, enhancing the performance of large language models in various emotional generation tasks by aligning with human emotional intelligence criteria. Moreover, the emotional chain can not only recognize and simulate human emotions but also control the performance of multimodal behaviors, embedding more text generation patterns to achieve more forms of emotional output. In this way, a vibrant intelligent agent digital twin can approach humanization, improving the fluidity and anthropomorphism of the interaction process with users, thereby digitally restoring the presence of real human subjects and achieving realistic simulations of interaction fields, realizing a full-factor digital twin.
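The emotional chain-of-thought (ECoT) generation pattern described above stages the reply: first recognize the user’s emotion, then reason about an appropriate empathetic stance, then generate. The staging and wording below are an assumed illustration, not the concrete template of any published ECoT method:

```python
# Sketch of an emotional chain-of-thought (ECoT) prompt: recognise the
# emotion, choose an empathetic stance, then generate the reply.

def ecot_prompt(user_utterance: str) -> str:
    """Build a three-stage ECoT-style prompt for an empathetic reply."""
    return (
        "Step 1 - Recognise: what emotion is the user expressing in: "
        f"\"{user_utterance}\"?\n"
        "Step 2 - Empathise: how should a caring partner respond to that "
        "emotion?\n"
        "Step 3 - Generate: write the reply, matching tone, wording, and any "
        "multimodal cues (expression, gesture) to the chosen emotional stance."
    )

prompt = ecot_prompt("I failed my exam again today")
print(prompt.splitlines()[0])
```

The same staged output can then drive multimodal behavior, with the chosen emotional stance also parameterizing facial expression and gesture generation, as the paragraph above suggests.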
4 Empowering the Innovative Development of User Experience with Intelligent Agent Digital Twins
4.1 Sensory Experience: Technology Acceptance from the Distal to the Proximal Senses
Sensory marketing is defined as marketing that engages consumers’ senses and influences their perceptions, judgments, and behaviors. In mediated settings, however, only some senses can detect distant stimuli, so some scholars divide the senses into proximal and distal: touch, smell, and taste are proximal senses, while vision and hearing are distal senses. Because of the scene limitations of intelligent agent digital twins, their sensory output in digital form lacks the direct engagement of the proximal senses, yet their distal sensory expression technology is advancing rapidly. In modeling, improvements in hyper-realistic appearance portrayal and voiceprint simulation accuracy bring audiovisual distal-sense twin experiences; these are typically the first order of user experience and the first elements users perceive, laying a solid foundation for the whole experience. Douyin blogger @Yan Bojun applied digital human simulation modeling, using “AI Yan Bojun” to appear and narrate content; most viewers could not instinctively detect the use of AI and could distinguish the real person from the AI only through subtle differences in movement. With the technological integration of intelligent agent digital twins empowered by AIGC, future distal sensory experiences will approach those of real humans, especially with the support of the metaverse and extended reality (XR), greatly narrowing the distance between interacting subjects and the whole interaction scene and achieving immersive experiences in the digital world.
The enhancement of distal sensory experiences will lead to indirect optimization of proximal sensory experiences. Although consumers cannot physically touch and perceive objects in videos, through intuitive audiovisual experiences, audiences can have an indirect and imaginative understanding of their tactile, taste, and olfactory experiences. For example, in the future, extremely realistic intelligent agent digital twins tasting food will allow users in front of the screen to feel the deliciousness of the food, making them drool and successfully arousing their desire to taste. In fact, the optimization of distal sensory experiences and the enhancement of proximal sensory experiences will effectively bring about smooth sensations and psychological effects in users during interactions, and this usefulness and ease of use will greatly enhance users’ acceptance of the technology behind intelligent agent digital twins.
4.2 Interaction Experience: From Assistants to Partners in Intelligent Agent Social Experiences
Even though traditional digital twins possess strong imitative abilities and information processing capabilities, we still regard them as effective assistants in work, as they primarily fulfill our safety needs, which are the stable and secure work requirements mentioned in Maslow’s hierarchy of needs. As the level of demand satisfaction rises, these digital avatars lacking autonomy and behavioral language processing capabilities have not reached a level that allows for free interaction with users. With the advancement of machine learning and large language models, human-machine interaction (HMI) has been elevated to a new level, where machines can exhibit autonomous behavior, prompting human-machine interaction to evolve into a form of “cooperation” rather than merely being a tool application. Intelligent agents empowered by generative artificial intelligence not only provide a more refined technological foundation for the simulation images of digital twins but can also achieve cognitive exchanges with real human users in terms of value alignment and behavioral decision-making. Therefore, the interaction behaviors between humans and intelligent agent digital twins have entered the realm of human-agent interaction (HAI).
Human-agent interaction is an important premise of intelligent agent socialization, encompassing both social situations between real human users and intelligent agents and virtual social situations among multiple agents. Research has found that AI-empowered intelligent agents possess strong interactivity: they can interact with their environment and with other agents, directly or indirectly, cooperatively or competitively. At present, applied development still focuses on social interaction between real human users and intelligent agents. As the digital subjects most capable of establishing genuine social relationships with real human users, intelligent agent digital twins play a crucial role in optimizing human-machine collaboration, enhancing user experience, and collecting learning data for the future construction of multi-agent social networks.
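To make the notion of an agent that interacts, remembers, and responds in the style of its human prototype more concrete, here is a minimal, purely illustrative Python sketch. The `AgentDigitalTwin` class, its persona fields, and the keyword-based mood appraisal are all hypothetical stand-ins invented for this example; a real system would delegate reply generation to a large language model and use genuine affective computing rather than word lists.

```python
from dataclasses import dataclass, field

@dataclass
class AgentDigitalTwin:
    """Toy sketch of an 'agent human digital twin': a persona profile,
    a running interaction memory, and an emotional state that shifts
    with each exchange. All names and heuristics here are illustrative."""
    persona: dict                      # traits copied from the human prototype
    memory: list = field(default_factory=list)
    mood: float = 0.0                  # -1.0 (negative) .. 1.0 (positive)

    def appraise(self, message: str) -> float:
        # Crude stand-in for affective computing: score the message
        # by hypothetical positive/negative cue words.
        positive = {"great", "love", "thanks"}
        negative = {"bad", "hate", "angry"}
        words = set(message.lower().split())
        return len(words & positive) * 0.5 - len(words & negative) * 0.5

    def respond(self, message: str) -> str:
        # Update emotional state, remember the exchange, and reply
        # in a tone reflecting the current mood.
        self.mood = max(-1.0, min(1.0, self.mood + self.appraise(message)))
        self.memory.append(message)
        tone = "warmly" if self.mood >= 0 else "cautiously"
        return f"[{self.persona['name']} replies {tone}] I hear you: {message}"

twin = AgentDigitalTwin(persona={"name": "Dong Ge", "style": "salesy"})
print(twin.respond("I love this product, thanks!"))
print(round(twin.mood, 1))
```

The design point the sketch illustrates is that the twin's response depends not only on the current message but on accumulated state (memory and mood), which is what distinguishes a social partner from a stateless question-and-answer tool.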
4.3 Emotional Experience: From Traditional Bonds to New Relationship Frameworks
Whether for platforms, brands, or products, the ultimate goal of engaging users is to deliver exceptional emotional experiences that raise user satisfaction and affection, even to the point of dependency. Siri can offer a human-like voice and responsive content, but without a complete human image and behavioral background it can only play the question-and-answer assistant in a master-servant relationship; ChatGPT, although currently the large language model whose cognition and logical reasoning most closely resemble a human's, lacks a human-like image and actual behavior, so human-agent interaction with it has not transcended traditional forms of social interaction. Intelligent agent digital twins are not only avatars of physical real humans; they also carry the emotional relationships between users and those humans. Emotional interaction with an intelligent agent digital twin is grounded in the emotional relationship with the real human and changes in real time with the actual interaction situation, a pattern entirely different from purely human-to-human relations, and one that in essence constitutes the construction of a new social relationship framework for human-agent interaction.
Bilibili blogger @Wu Wuliu used AI technology to reconstruct the facial image and voice of his deceased grandmother for virtual conversations; musician Bao Xiaobai used AI to "revive" his daughter… These cases show that the emotional relationships between users and intelligent agent digital twins rest on real emotional relationships and are significantly amplified in virtual digital environments, so that users' emotional experiences, shaped by both the real and the virtual, become even more intense. The subtle differences between these two kinds of relationship give rise to a new relationship framework; the changes in emotional relations brought about by the emergence and evolution of AIGC-empowered intelligent agent digital twins in their interactions with real human users will become a hot research topic in computational social science.
Conclusion
This article combines AIGC-empowered intelligent agents with digital twin technology, proposes the new digital avatar form of the "intelligent agent digital twin," and explores its similarity in image and behavior to the physical real human. On that basis, it analyzes how the image and behavior similarity of intelligent agent digital twins innovatively optimizes user experience along the sensory, interaction, and emotional dimensions. Through this analysis, we conclude that intelligent agent digital twins are a new type of anthropomorphic digital twin formed by the organic integration of "agents" and "digital twins." Compared with traditional digital twins, they possess human-like thinking (cognition) and can use deep learning to achieve a high degree of anthropomorphism in consciousness; they possess human-like emotions and can bridge the emotional gap between humans and machines to realize "agent socialization"; they possess human-like behaviors and can achieve value alignment with humans to realize a future of "human-machine symbiosis."
Intelligent agent digital twins not only achieve deep simulation and integration of internal and external images with the physical real human but also autonomously generate behaviors based on personalized models and emotional chains, effectively restoring the presence of the subject. In communication with intelligent agent digital twins, the optimization of distal sensory experiences heightens users' psychological sensitivity to proximal sensory experiences, gradually increasing their acceptance of this realistic digital human form; machines will no longer serve merely as assistants that raise users' productivity, but will, through innovative modes of interaction, turn user experience into a social experience akin to friendship between humans and intelligent agents.
Ultimately, diverse emotional connections will arise between humans and intelligent agent digital twins, fostering the innovative development of new relationship frameworks and emerging forms of computational society. New interaction forms based on intelligent agent digital twin technology, such as human-agent interaction and multi-agent interaction, will provide more contained and secure simulation scenarios for solving a range of social communication problems, supporting the sustainable and stable development of diverse social relationships.
(Author Information: Duan Chunlin, Professor and PhD Supervisor at the School of Journalism and Communication, South China University of Technology; Yao Haowen, Master’s Student of 2023 at the School of Journalism and Communication, South China University of Technology)
[References & Annotations]
[1]Zha Qinjun, 20 million people watched AI Liu Qiangdong, Interface News[EB/OL], https://baijiahao.baidu.com/s?id=1796540693928513816&wfr=spider&for=pc, accessed May 7, 2024.
[2]J. Chen et al., A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC, IEEE Network, 2024: 1-1.
[3]Yee N., Bailenson J. N., Ducheneaut N., The Proteus Effect: Implications of Transformed Digital Self-Representation on Online and Offline Behavior, Communication Research, 2009, 36(2): 285-312.
[4]Beauregard R., Corriveau P., User Experience Quality: A Conceptual Framework for Goal Setting and Measurement, 1st International Conference on Digital Human Modeling, ICDHM 2007, Springer, Berlin, Heidelberg, 2007.
[5]Beauregard R., Younkin A., Corriveau P., Doherty R., Salskov E., Assessing the Quality of User Experience, Intel Technology Journal, 2007, 11(1): 77–87.
[6]Guo J., Digital twins are shaping future virtual worlds, Service Oriented Computing and Applications, 2021, 15: 93–95.
[7]Tao Fei, Liu Weiran, Liu Jianhua et al., Digital Twin and Its Application Exploration[J], Computer Integrated Manufacturing Systems, 2018, 24(1): 1-18.
[8]Loveys Kate, Sagar Mark, Antoni Michael, Broadbent Elizabeth, The Impact of Virtual Humans on Psychosomatic Medicine, Psychosomatic Medicine, 2023, 85(7): 619-626.
[9]Baicun Wang, Pai Zheng, Yue Yin, Albert Shih, Lihui Wang, Toward human-centric smart manufacturing: A human-cyber-physical systems (HCPS) perspective, Journal of Manufacturing Systems, 2022, 63: 471-490.
[10]S. D. Okegbile, J. Cai, D. Niyato, C. Yi, Human Digital Twin for Personalized Healthcare: Vision, Architecture and Future Directions, IEEE Network, 2023, 37(2): 262-269.
[11]Baicun Wang, Huiying Zhou, Xingyu Li, Geng Yang, Pai Zheng, Ci Song, Yixiu Yuan, Thorsten Wuest, Huayong Yang, Lihui Wang, Human Digital Twin in the context of Industry 5.0, Robotics and Computer-Integrated Manufacturing, 2024, 85: 102626.
[12]Hafez W., Human Digital Twin: Enabling Human-Multi Smart Machines Collaboration, IntelliSys 2019, London, 2019.
[13]Wei Shengli, Is Human Digital Twin possible?, Computer Methods and Programs in Biomedicine Update, 2021, 1: 100014.
[14]Xu Ruiping et al., Digital Twins and Human Liberation[J], Journal of Foshan University of Science and Technology (Social Science Edition), 2024, 42(02): 30-36.
[15]Wang Jiwei, A Brief History of AI Agent Development: From Philosophical Enlightenment to the Realization of Artificial Intelligence Entities, The Era of Big Data, 2023(12): 6-19.
[16]Sun Yifeng, Liao Shufan, Wu Jiang et al., Situation Awareness Intelligent Agents Based on Large Models, Command Control and Simulation, 2024, 46(2): 1-7.
[17][U.S.] David Vernon, Zhou Yufeng, Wei Shuxia, Introduction to Artificial Cognitive Systems[M], Beijing: Peking University Press, 2021.
[18]Huang Y., Levels of AI Agents: from Rules to Large Language Models, arXiv preprint, 2024, arXiv:2405.06643.
[19]Andrej Karpathy et al., “Generative models”, Retrieved May 2nd, 2024 from https://openai.com/research/generative-models.
[20]LeCun Y, Bengio Y, Hinton G, Deep learning, Nature, 2015, 521(7553): 436-444.
[21]J. Chen et al., A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC, IEEE Network, 2024.
[22]Manal Alamir, Manal Alghamdi, The Role of Generative Adversarial Network in Medical Image Analysis: An In-depth Survey, ACM Computing Surveys, 2022, 55(5): 36.
[23]X. Wang, Guest Editorial Special Issue on Social Computing and Societies 5.0: Toward Social Intelligence via Cyber Movement Organizations, IEEE Transactions on Computational Social Systems, 2023, 10(4): 1810-1812.
[24]J. Han, Z. Zhang, N. Cummins, B. Schuller, Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives, IEEE Computational Intelligence Magazine, 2019, 14(2): 68-81.
[25]P. Tzirakis et al., End-to-End Speech Emotion Recognition Using Deep Neural Networks, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 2018.
[26]Cícero dos Santos et al., Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts, the 25th International Conference on Computational Linguistics, Dublin, 2014.
[27]Krugmann, J.O., Hartmann, J., Sentiment Analysis in the Age of Generative AI. Customer Needs and Solutions, 2024, 11: no.3.
[28]Lu F., Liu B., Affective Digital Twins for Digital Human: Bridging the Gap in Human-Machine Affective Interaction, arXiv preprint, 2023, arXiv:2308.10207.
[29]Hu Zhengrong, Yan Jiaqi, Comparative Study on Value Alignment of Generative Artificial Intelligence—Based on Experiments of the Top Ten International News Generating Comments from 2012 to 2023, Journalism University, 2024(03): 1-17+117.
[30]Wenger E., THE ALIGNMENT PROBLEM: Machine Learning and Human Values, Perspectives on Science & Christian Faith, 2021, 73(4): 245–247.
[31]Cudeiro D. et al., Capture, Learning, and Synthesis of 3D Speaking Styles, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 2019.
[32]Liu H. et al., EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling, arXiv preprint, 2024, arXiv:2401.00374.
[33]David Westerman et al., The effects of static avatars on impression formation across different contexts on social networking sites, Computers in Human Behavior, 2015, 53: 111-117.
[34]Zhang Huichuan et al., Research on the Mechanism of Self-avatar Similarity on Social Media Users’ Willingness to Self-disclose, Information Science, 2023, 41(11): 176-184.
[35]Lortie C. L., Guitton M. J., Looking similar promotes group stability in a game-based virtual community, GAMES FOR HEALTH: Research, Development, and Clinical Applications, 2012, 1(4): 274-278.
[36]Tang L, Bashir M., Effects of Self-avatar Similarity on User Trusting Behavior in Virtual Reality Environment, International Conference on Human-Computer Interaction, Cham: Springer Nature Switzerland, 2023: 313-316.
[37]Pan Y., Steed A., The impact of self-avatars on trust and collaboration in shared virtual environments. PLOS ONE, 2017, 12(12): e0189078.
[38]Schultze, U., Embodiment and Presence in Virtual Worlds: A Review, Journal of Information Technology, 2010, 25(4): 434-449.
[39]Wang Haizhong, Li Binglian, Xie Tao, Theoretical Construction of Self-Avatar in the Digital World, Management Science, 2022, 35(03): 116-130.
[40]Zhu R., Yi C., Avatar design in Metaverse: the effect of avatar-user similarity in procedural and creative tasks, Internet Research, 2024, 34(1): 39-57.
[41]Beauregard R., Younkin A., Corriveau P., Doherty R., Salskov E., Assessing the Quality of User Experience, Intel Technology Journal, 2007, 11(1): 77-87.
[42]Anon, User experience (UX), Independent Banker, 2023, 73(5): 13.
[43]Chen Juan et al., Empirical Analysis of Factors Influencing User Experience on Social Q&A Platforms—Taking Zhihu as an Example, Library and Information Work, 2015, 59(24): 102-108.
[44]Gerrit van der Veer, Dhaval Vyas, “APEC: A Framework for Designing Experience”, Retrieved May 11th, 2024 from https://www.academia.edu/282319/APEC%20A%20Framework%20for%20Designing%20Experience.
[45]Peter Morville, “User Experience Design”, Retrieved May 11th, 2024 from http://semanticstudios.com/user_experience_design/.
[46]Yu Guoming, On the Model and Quantitative Research of Media User Experience—A Logical Framework of Cognitive Neuroscience Study, Journal of Xinjiang Normal University (Philosophy and Social Sciences Edition), 2018, 39(06): 53-60+2.
[47]Liang Shuang, Yu Guoming, The Impact of Media Usage Motivation and Scenarios on User Experience—Based on Cognitive Neuroscience Effects Measurement, Journalism University, 2021(01): 89-102+121.
[48]Shanahan M., McDonell K., Reynolds L., Role play with large language models, Nature, 2023, 623(7987): 493–498.
[49]Allbeck, J., Badler, N., Toward representing agent behaviors modified by personality and emotion, Embodied conversational agents at AAMAS,2002: 2(6), 15-19.
[50]Lu J., Pan B., Chen J., Feng Y., Hu J., Peng Y., Chen W., AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous Systems, IEEE Transactions on Visualization and Computer Graphics, 2024: 1-17.
[51]Joko H., Chatterjee S., Ramsay A., et al., Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search, arXiv preprint, 2024, arXiv:2405.03480.
[52]Hou Y., Tamoto H., Miyashita H., “My agent understands me better”: Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents, arXiv preprint, 2024, arXiv:2404.00573.
[53]Regan C., Iwahashi N., Tanaka S., Oka M., Can Generative Agents Predict Emotion?, arXiv preprint, 2024, arXiv:2402.04232.
[54]Pico A., Taverner J., Vivancos E., Botti V., García-Fornes A., Towards an Affective Intelligent Agent Model for Extrinsic Emotion Regulation, Systems, 2024, 12(3): 77.
[55]Binz M., Schulz E., Using cognitive psychology to understand GPT-3, Proceedings of the National Academy of Sciences, 2023, 120(6): e2218523120.
[56]Li Z., Chen G., Shao R., et al., Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought, arXiv preprint, 2024, arXiv:2401.06836.
[57]Croissant M., Frister M., Schofield G., McCall C., An appraisal-based chain-of-emotion architecture for affective language model game agents, PLoS ONE, 2024, 19(5): e0301033.
[58]Krishna, Aradhna, An Integrative Review of Sensory Marketing: Engaging the Senses to Affect Perception, Judgment and Behavior, Journal of Consumer Psychology, 2012, 22(3): 332–51.
[59]Elder Ryan S., Anne E. Schlosser, Morgan Poor, Lidan Xu, So Close I Can Almost Sense It: The Interplay Between Sensory Imagery and Psychological Distance, Journal of Consumer Research, 2017, 44(4): 877–94.
[60]Marks, Laura U, Thinking Multisensory Culture, Paragraph, 2008, 31(2): 123–37.
[61]Zu Yang, From Similarity in Form to Similarity in Spirit, AI Digital Humans Welcome a New Business Model | Dialogue with Silicon-based Intelligence, Deep Echo[EB/OL], https://new.qq.com/rain/a/20240124A07HOG00, accessed May 16, 2024.
[62]Xu Dingyi, Sensory Marketing: Content Expression Research of Food Evaluation Short Videos, Science and Technology Communication, 2024, 16(05): 100-104.
[63]Davis F.D., Bagozzi R.P., Warshaw P.R., User acceptance of computer technology: a comparison of two theoretical models, Management Science, 1989, 35(8): 982-1003.
[64]Schmidt P., Loidolt S., Interacting with Machines: Can an Artificially Intelligent Agent Be a Partner?, Philosophy & Technology, 2023, 36(3): 55.
[65]W. Hafez, Human Digital Twin—Enabling Human-Agents Collaboration, 2021 4th International Conference on Intelligent Robotics and Control Engineering (IRCE), Lanzhou, 2021.
[66]Chen Changfeng, The Rise of Intelligent Platforms and the Emergence of Intelligent Agents: Large Models Will Transform Society and Civilization, Journalism, 2024(02): 15-24+48.
[67]Jonathan Gratch, The Social Psychology of Human-agent Interaction, the 7th International Conference on Human-Agent Interaction, New York, 2019.
[68]Xiao Yan, Using AI Technology to “Revive” Relatives, Is It a New Opportunity or an Ethical Challenge?, The Paper[EB/OL], https://www.thepaper.cn/newsDetail_forward_26663812, accessed May 16, 2024.