Research on the Construction and Application of Teaching Intelligent Agents Based on Large Models

Lu Yu, Yu Jinglei, Chen Penghe

Abstract: With the rapid development of generative artificial intelligence, intelligent agents based on large models have gradually acquired capabilities such as multimodal perception, retrieval-augmented generation, reasoning and planning, interaction, and evolution. This research proposes the basic concept and framework of teaching intelligent agents based on large models. With the "large model" as its technological core, the framework comprises functional modules for "educational task setting," "educational task planning," "educational capability realization and expansion," "educational content memory and reflection," and "interactive collaboration and dynamic evolution," and supports interaction with various types of objects as well as dynamic evolution, covering human-computer interaction, multi-agent interaction, and environmental interaction. Based on the proposed framework, the research takes project-based learning tasks as the application scenario and elaborates the roles of teaching intelligent agents as "teaching assistant agents" and "peer agents" in learner-driven personalized problem posing, collaborative design of project plans, collaborative completion of project works, and multi-role evaluation of project works, along with the relevant supporting technologies. Finally, the research discusses the development directions and future prospects of teaching intelligent agents.

Keywords: Teaching Intelligent Agents; Large Models; Generative Artificial Intelligence; Project-Based Learning

CLC Classification Number: G434    Document Code: A

* This article is a research achievement of the key project of the “14th Five-Year Plan” of Beijing Education Science in 2021, titled “Research on the Construction of a New Generation Intelligent Guidance System Driven by Artificial Intelligence” (Project Number: CHAA21036).

Please cite the following information:

Lu Yu, Yu Jinglei, Chen Penghe. Research on the Construction and Application of Teaching Intelligent Agents Based on Large Models [J]. China Educational Technology, 2024, (7): 99-108.

1. Introduction

As generative artificial intelligence evolves rapidly, multimodal large models increasingly demonstrate their advantages in multimodal content understanding and generation. A multimodal large model (hereinafter "large model") is an artificial intelligence model capable of processing and understanding data in multiple modalities, such as text, images, and audio-visual content; models represented by GPT-4 belong to this category. Large models typically have ultra-large-scale parameters, support reasoning and decision-making through methods such as prompt engineering and fine-tuning, and perform strongly in tasks such as natural language processing and audio-visual analysis.

To further unleash the application potential of large models, researchers in artificial intelligence have begun to construct intelligent agents based on them. Intelligent agents, also known as autonomous agents, are adaptive systems that can perceive the environment and react to it to achieve their own goals. Since the 20th century, designing and implementing intelligent agents has been one of the main goals of artificial intelligence research, but progress was long limited by the intelligence level of the core models. The emergence of large models provides a feasible technical path for constructing intelligent agents, and an agent's ability to interact with the external environment can in turn facilitate the adaptation of large models to downstream tasks. The construction of intelligent agents based on large models is therefore receiving increasing attention both in artificial intelligence and in various vertical domains, yet the education field still lacks systematic design and comprehensive discussion.

In education, research on intelligent agents is still in its infancy. Early studies focused on agents for optimizing knowledge sharing and organizational management, and some studies designed educational agents that interact with learners to achieve student-centered personalized online learning. In recent years, researchers have studied intelligent agents from the perspectives of cognitive learning mechanisms and social cue design, emphasizing high-level meaning processing and interactive innovation by learning subjects. As generative artificial intelligence technologies such as large models mature, researchers have attempted to build general intelligent agents based on large language models (LLMs), for example as user interaction assistants for mathematical formula formatting. Researchers have also used multiple general agents to simulate classroom interactions between teachers and students, with experimental results indicating high similarity to real classrooms in dimensions such as teaching methods, curriculum, and student performance. Current research in education, however, often stops at applying general intelligent agents to educational settings and lacks clear definitions and systematic architectural designs.
Therefore, this research proposes the concept of "teaching intelligent agents," aiming to fully leverage the autonomous adaptation capabilities of generative artificial intelligence in environmental perception, reasoning and planning, learning and improvement, and action decision-making, so as to provide intelligent teaching and learning services to all stakeholders in education across scenarios such as classroom teaching, after-school tutoring, teacher training, and home-school cooperation. Drawing on the latest technological advances in generative artificial intelligence and the actual needs of the education sector, this research studies the design and implementation paths of teaching intelligent agents.

2. Intelligent Agents Based on Large Models

(1) Basic Concept

In the field of artificial intelligence, an intelligent agent must be able to perceive its external environment and take actions that affect it. Typically, perception and action cycle continuously between the agent and the environment, forming a close interaction that accomplishes specific task goals. Figure 1 shows the basic framework of intelligent agents based on large models, in which the large model serves as the foundational support and core of the agent. For a given task, the agent first collects information through its environmental perception capabilities to understand the dynamically changing external environment. It then applies reasoning and planning to solve the task's specific problems step by step through logical thinking. During this process, a memory retrieval mechanism stores and retrieves past experience, improving the quality of reasoning and planning. The agent can also exploit massive data to discover objective patterns, enhancing its own performance and storing what it learns in memory. On this basis, the agent weighs the pros and cons of candidate actions, makes reasonable decisions, selects execution tools, and turns decisions into actual effects on the external environment to achieve the task goals. Compared with traditional intelligent agents, intelligent agents based on large models possess several core capabilities.

(2) Core Capabilities

1. **Multimodal Perception and Generation**

Multimodal perception and generation integrate data channels such as vision, speech, and text, individually or in combination, to acquire external information, endowing the agent with the ability to understand and analyze its environment. Relying on visual and linguistic understanding and generation, agents can perform tasks such as Visual Question Answering (VQA) and text-to-image generation. The visual question answering task, for example, requires the agent to answer scene-understanding questions such as counting, attribute recognition, and object detection from image information. Building on this, the outside-knowledge visual question answering task requires the agent to answer questions that cannot be resolved from the image alone, encouraging it to retrieve external knowledge and combine it with image understanding to generate answers.
Currently, the multimodal perception and generation capabilities of agents have expanded to the video domain, supporting the interpretation and construction of continuous image frames, so that agents can accurately answer questions about video content or generate logically coherent, high-quality videos. In addition, for embodied entities such as physical robots or virtual-environment characters, multimodal perception supports real-time information updates for environmental interaction, assisting agents in complex work such as task planning, path planning, reasoning, and decision-making.

2. **Retrieval-Augmented Generation**

To alleviate problems such as large-model "hallucination" (generating contradictory information) and the timeliness of data updates, retrieval-augmented generation (RAG) has gradually come into wide use. RAG supplies the large model with reliable external knowledge related to the problem at hand through knowledge-base retrieval. The technique consists of three basic steps: index establishment, question retrieval, and content generation. Index establishment builds a knowledge base from the specified task information, which can include documents, web pages, and knowledge graphs, and uses methods such as language-model word embeddings to extract textual semantic features of the resources, storing them in the knowledge base online or offline. Question retrieval takes the user's question as a retrieval request, extracts its textual semantic features, and retrieves matching content from the knowledge base as auxiliary information for the question. In content generation, the large model produces an answer from the user's question and the retrieved auxiliary information, through data augmentation or constructed attention mechanisms. Following the RAG idea, a Carnegie Mellon University team provided agents with an interface-document query function: by querying the interface document of a liquid handler, the agent could precisely control it from users' textual commands, using ultraviolet-visible spectroscopy measurement tools and Python computational tools to automatically design, plan, and execute scientific experiments. A minimal code sketch of the three RAG steps follows.
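To make the three steps concrete, here is a minimal, self-contained Python sketch. The toy corpus and the bag-of-words `embed` function are illustrative stand-ins only; a production system would use a language-model embedding, a vector database, and would send the assembled prompt to the large model for the final answer.

```python
import math
from collections import Counter

# --- Step 1: index establishment ----------------------------------------
# Toy knowledge base; a real system would load documents, web pages, or
# knowledge-graph fragments and embed them with a language model.
documents = [
    "Recyclable waste includes paper, plastic bottles, glass, and metal.",
    "Hazardous waste such as batteries must be placed in red containers.",
    "Kitchen waste is collected separately and composted where possible.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

index = [(doc, embed(doc)) for doc in documents]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# --- Step 2: question retrieval ------------------------------------------
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# --- Step 3: content generation ------------------------------------------
def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

print(build_prompt("Which container should used batteries go into?"))
# The assembled prompt would then be sent to the large model for the answer.
```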
3. **Reasoning and Planning**

Task planning by intelligent agents aims to decompose complex tasks into multiple executable sub-tasks, relying primarily on the reasoning capabilities of large models. By constructing prompts that elicit these reasoning capabilities, agents can autonomously perform multi-step reasoning, break complex tasks into executable sub-tasks, and plan a sequence of concrete actions to achieve the task goals. Various effective prompting methods have been validated, including single-path reasoning methods typified by Chain of Thought (CoT), multi-path reasoning methods such as Self-Consistency (CoT-SC), Tree of Thoughts (ToT), which integrates reflection on behavior, and planning methods such as ReAct. Chain of Thought, one of the earliest proposed reasoning methods, stimulates the large model to think in multiple steps during reasoning, guiding it to execute a specific task along a single path. Building on this, Self-Consistency accounts for the randomness of large-model generation: it constructs multiple parallel chains of thought by repeatedly running single-path reasoning and finally selects the most consistent result as the answer. Tree of Thoughts further refines the reasoning process and combines the advantages of both, constructing a tree-structured problem-solving scheme in which each path represents a solution and each node an intermediate step. ToT decomposes the intermediate steps of task reasoning according to the attributes of the problem, so that each step advances in relatively reliable small increments, generating the next candidate solutions at the current node. By setting an independent quality assessment for each node, or a voting mechanism across nodes, ToT provides heuristic selection criteria for the subsequent search; finally, breadth-first or depth-first search over the tree selects the optimal problem-solving path. The ReAct planning method further integrates feedback from task execution into the reasoning process, allowing agents to interact autonomously with external resources, update task plans, and handle anomalies. Concretely, ReAct cycles through three basic actions, Thought, Action, and Observation of the action's results, generating a final solution that contains multiple rounds of reasoning steps. ReAct has performed well across language and decision-making tasks such as question answering, fact verification, text games, and web navigation, and the task trajectories it generates are interpretable and credible. A minimal sketch of the ReAct loop appears below.
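As an illustration of the Thought-Action-Observation cycle, here is a toy ReAct loop. The `llm` function is a canned stand-in for a large-model call and `TOOLS` contains two illustrative tools; a real agent would prompt an actual large model with few-shot ReAct exemplars and wrap genuine external tools.

```python
import re

# Toy tools the agent may call; real agents would wrap search engines,
# calculators, or laboratory equipment interfaces.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "lookup": lambda term: {"CO2": "carbon dioxide"}.get(term, "unknown"),
}

def llm(transcript: str) -> str:
    # Placeholder: a real implementation sends the transcript plus few-shot
    # ReAct examples to a large model. Canned output for demonstration only.
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: calculator[3*(7+5)]"
    return "Thought: I have the result.\nFinal Answer: 36"

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model emits Thought + Action
        transcript += step + "\n"
        if "Final Answer:" in step:       # reasoning finished
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:                         # execute the action, record result
            tool, arg = match.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return "no answer within step budget"

print(react_loop("What is 3 times the sum of 7 and 5?"))  # -> 36
```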
4. **Interaction and Evolution**

Agents can collaborate on complex tasks through interaction with the external environment, with humans, and with other agents, thereby achieving autonomous evolution. In interaction with the external environment, agents can autonomously set tasks and explore the external world based on their data, available tools, inventory resources, and environmental descriptions. During exploration, agents can build skill libraries that accumulate skills via a machine-language self-check mechanism, realizing complex and meaningful autonomous evolutionary behavior; Voyager, a large-model-driven agent, achieved autonomous world exploration and skill evolution in a sandbox survival game in this way. In interaction with humans, agents let users flexibly combine various large language models and tools, engaging in autonomous planning and human-computer collaboration to solve tasks across programming, mathematics, experimental operations, online decision-making, and question answering. For instance, an agent can attempt a complex number-theory problem automatically and, when the answer is wrong, obtain human feedback through collaboration to improve and correct it. Agents have also completed tasks such as organic synthesis, drug discovery, and materials design from user instructions, in tests automatically synthesizing a mosquito repellent and three organic catalysts, and guiding humans to discover new chromophores through human-computer collaboration.

Different agents can also interact and collaborate to advance complex tasks and self-evolution. For instance, a "Plan, Execute, Inspect, and Learn" (PEIL) guiding mechanism enables multi-agent task planning and tool execution, visual perception and memory management, and active learning and scheme optimization, with outstanding performance on tasks such as visual question answering and reasoning-based question answering. Debate-style interaction among multiple agents can likewise enhance their ability to solve complex reasoning problems, proving effective on tasks such as commonsense machine translation and counterintuitive arithmetic reasoning. Agents can furthermore simulate specific business processes and their task objectives through role division and collaboration among multiple agents: the MetaGPT multi-agent collaboration framework assigns roles (such as product manager and software development engineer) to multiple agents, sets workflows, and automates the software development process by introducing human job mechanisms.

The memory mechanism of agents also plays a crucial role in interaction and evolution, supporting review of interaction histories, knowledge acquisition, experience reflection, skill accumulation, and self-evolution. The generative agents proposed by Stanford University construct a virtual town in a sandbox game engine, enabling dynamic behavior planning for virtual individuals and simulating credible human behavior. These agents build a memory stream that stores perceived environmental information and individual experiences. An agent makes behavioral decisions from its individual memories while also forming long-term behavioral plans and higher-level reflections, which in turn become memory reserves for later decisions. In the decision "whether to attend a party," for example, the agent first retrieves relevant records from the memory stream, scores each record by a weighted sum of its recency, relevance, and importance to the decision task, and incorporates the top-ranked memories into the prompt to assist the behavioral decision. A toy version of this scoring appears below.
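The following toy Python version of this retrieval scoring assumes equal weights and an exponential recency decay; the record fields and weight values are illustrative, not the original system's parameters, and relevance is precomputed here whereas the original computes it from embedding similarity at query time.

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    text: str
    age_hours: float     # time since the record was last accessed
    importance: float    # 0..1, rated once when the record is stored
    relevance: float     # 0..1, similarity to the current decision query

def score(m: MemoryRecord, w=(1.0, 1.0, 1.0), decay=0.995) -> float:
    """Weighted sum of recency, importance, and relevance, following the
    memory-stream design described above; weights are illustrative."""
    recency = decay ** m.age_hours        # exponential time decay
    return w[0] * recency + w[1] * m.importance + w[2] * m.relevance

memories = [
    MemoryRecord("Isabella invited me to the party on Feb 14.", 5.0, 0.8, 0.9),
    MemoryRecord("I watered the plants this morning.", 2.0, 0.2, 0.1),
]

# The top-ranked memories would be inserted into the prompt as decision context.
for m in sorted(memories, key=score, reverse=True):
    print(round(score(m), 3), m.text)
```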
(3) Implementation Methods

Intelligent agents use large models as core controllers, emphasizing dynamic interaction with information, integrated reasoning and planning, memory and reflection mechanisms, tool use and task execution, and continuous evolution of abilities through external interaction. Together these characteristics give agents high-level information understanding and processing, decision-making closer to that of humans, and the capacity to understand and handle complex situations. To support practical implementation, several engineering frameworks have been developed and open-sourced, such as LangChain, which supports single-agent implementation, and Auto-GPT, AutoGen, BabyAGI, and CAMEL, which support multi-agent collaboration. These frameworks give researchers and developers essential building blocks for developing and testing multi-scenario agent applications.

Among these frameworks, LangChain and AutoGen are the most widely used for single-agent and multi-agent implementation, respectively. LangChain provides a structured application pipeline for large models that eases the engineering of agents; its technical components include model I/O, retrieval, agents, chains, and memory. LangChain supports a rich variety of tool and toolkit calls and can realize several core agent capabilities, such as retrieval-augmented generation and ReAct planning. AutoGen lets users flexibly define interaction modes and human-computer collaboration modes among multiple agents as needed, such as a dynamic group discussion led by one agent with human participation, or a collaborative coding mode in which two agents handle coding and debugging respectively. AutoGen supports reading and writing interactive memory among agents, uses third-party Python toolkits for tool calls (for example, calling the Matplotlib library for mathematical plotting), and supports converting tasks into machine-language solutions, executing them step by step as code, and ensuring programs run correctly through inter-agent code execution and debugging. Both frameworks offer viable paths to implementing intelligent agents, and combining them leverages the advantages of each: AutoGen can flexibly build the agents' interaction framework and machine-language task execution, while LangChain connects rich external tool libraries (such as ArXiv, Office365, and Wolfram Alpha) and custom tools (via user-provided tool descriptions, implementation code, and input-output formats), expanding the agents' capability boundaries. A minimal AutoGen sketch of the coding-and-debugging pattern follows.
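For concreteness, here is a minimal sketch of the two-agent coding-and-debugging pattern using the pyautogen v0.2-era API; the model name, API key, and task message are placeholders, and the API may differ in newer releases.

```python
import autogen

# Placeholder model configuration; supply real credentials before running.
config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]

# The assistant agent plans and writes code with the large model.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)

# The user proxy executes generated code locally and feeds results back,
# driving automatic debugging rounds without human input.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Plot y = x^2 for x in [-5, 5] with Matplotlib and save it as plot.png.",
)
```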
3. Construction of Teaching Intelligent Agents Based on Large Models

Given the rapid evolution and development of large models in the current education sector, this research proposes constructing teaching intelligent agents based on large models. As shown in Figure 2, a teaching intelligent agent centers on the "large model," with main functional modules for "educational task setting," "educational task planning," "educational capability realization and expansion," and "educational content memory and reflection." It also supports interaction with various types of objects and dynamic evolution, covering human-computer interaction, multi-agent interaction, and environmental interaction.

(1) Educational Task Setting

The "educational task setting" module supplies key information: the educational scenario setting, the educational demand setting, and the educational role setting. The educational scenario setting provides background information about the educational task, such as student-centered project-based learning, online self-directed learning, or traditional classroom teaching. The educational demand setting gives concrete goal descriptions for the task, such as providing strategic scaffolding for project-based learning, evaluating learners' problem-solving abilities, or coordinating group collaborative learning. The educational role setting assigns the agent the specific role it plays in the task, such as teaching assistant, student companion, research assistant, or family helper. Role setting helps the agent interact with educational users more effectively, providing personalized interaction experiences and better support. Multiple teaching intelligent agents can also collaborate in different educational roles, meeting key teaching needs in specific scenarios through division of labor, collaborative debate, and human-computer cooperation.

(2) Educational Task Planning

Based on the established task information, a teaching intelligent agent can plan tasks autonomously, following the basic steps "task scheme consideration," "scheme decomposition planning," and "execution result perception," in that order. First, "task scheme consideration" reasons out an overall scheme from the established scenario, demands, and role, together with relevant educational standards or frameworks, educational resources, and auxiliary tools. "Scheme decomposition planning" then breaks the overall scheme into executable, manageable sub-tasks, including planning concrete teaching activities, teaching resources, teaching tools, and evaluation methods; the agent can also adjust each sub-task in real time based on feedback from teachers or learners to keep the planning adaptive and effective. After the planned sub-tasks are executed, "execution result perception" obtains the execution results and multidimensional interaction information. Through an evaluation feedback mechanism, the agent judges the quality of sub-task completion from the execution results, either by autonomous reasoning or by manual evaluation; if problems are found or the planning goals are not achieved, "task scheme consideration" restarts, and the loop exits only once the goal is reached. With this planning process, agents can iteratively optimize the execution of educational tasks and their strategies, meeting the demand for efficient personalized education. A minimal sketch of this loop follows.
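The following minimal sketch shows the consider-decompose-execute-perceive loop in plain Python; `llm` and `execute_subtask` are placeholder stand-ins for a large-model call and for tool or agent execution, returning canned values here so the sketch runs end to end.

```python
def llm(prompt: str) -> str:
    """Placeholder for a large-model call; replace with a real client."""
    return "yes" if "yes/no" in prompt else "1. sub-task placeholder"

def execute_subtask(subtask: str) -> str:
    """Placeholder for tool invocation or agent execution."""
    return f"executed: {subtask}"

def plan_educational_task(scenario: str, demand: str, role: str, max_iterations: int = 3):
    for _ in range(max_iterations):
        # Task scheme consideration: reason out an overall scheme.
        scheme = llm(f"Scenario: {scenario}\nDemand: {demand}\nRole: {role}\n"
                     "Propose an overall teaching scheme.")
        # Scheme decomposition planning: break into executable sub-tasks.
        subtasks = llm(f"Decompose into numbered sub-tasks:\n{scheme}").split("\n")
        # Execution result perception: run sub-tasks and gather feedback.
        results = [execute_subtask(s) for s in subtasks]
        verdict = llm(f"Were the planning goals achieved? Answer yes/no.\nResults: {results}")
        if verdict.strip().lower().startswith("yes"):
            return scheme, results        # goals met: exit the loop
    return scheme, results                # budget exhausted; best effort

print(plan_educational_task("project-based learning", "strategic scaffolding", "teaching assistant"))
```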
(3) Educational Capability Realization and Expansion

Teaching intelligent agents can realize and expand multiple basic capabilities to execute the planned educational tasks. First, they can call external professional teaching tools and their operating environments, including but not limited to mathematical computing tools, educational software, and collaborative learning tools. For example, a teaching intelligent agent can call the mathematical computing tool Wolfram Alpha, interacting through natural language or mathematical formulas to solve the precise calculation problems required across disciplines. Such external tools provide professional capabilities that large models do not possess, helping the agent solve specialized problems within the planned educational tasks. Teaching intelligent agents can also avoid outputting incorrect educational information through retrieval-augmented generation, expanding their knowledge and capability boundaries. Educational services typically demand high accuracy and interpretability, so agents need reliable, continuously updated information sources: for instance, they can obtain and integrate the latest educational resources and real-time educational data from national educational resource public service platforms, professional educational academic journals, and educational news websites, achieving retrieval-augmented generation of educational content while clearly explaining the basis for what they provide, ensuring the timeliness and accuracy of the educational services offered. Furthermore, after perceiving and understanding the elements of an educational scenario, teaching intelligent agents can automatically generate educational content and products such as teaching dialogues and audio-visual teaching resources, supporting the established educational role throughout the process. For example, when task execution involves programming and logical reasoning, the agent can use the code generation and debugging capabilities of large models to translate tasks into machine languages such as Python and help learners complete programming tasks. For tasks requiring embodied operation, the agent can generate operating procedures from its perception of the physical environment and control hardware and software in real time according to user instructions.
(4) Educational Content Memory and Reflection

The educational content memory of a teaching intelligent agent stores and retrieves important data generated while planning and executing educational tasks, supporting the agent's self-reflection. Specifically, it can store foundational data from every step of planning and execution, such as task solution data, question-and-answer interactions between the agent and learners, external tool invocation processes and results, and hardware-software control and operation data. From this stored data, the agent can reflect, using the large model's self-questioning or summarization methods, to distill educational knowledge and experience into higher-level information. For example, a teaching intelligent agent can reflect on the personalized characteristics of the learners it serves and the effectiveness of its teaching interactions. Combining trial-and-error mechanisms or interactive feedback, it can summarize failed or inefficient teaching experiences and use them to autonomously optimize and improve its strategies when similar tasks recur. The rich memories and reflections an agent stores also serve as important reference knowledge and resources for expanding its teaching capabilities.

By permission, educational content memory divides into public memory and private memory. Public memory is the educational knowledge and teaching resources the agent has accumulated, including subject knowledge graphs, teaching methods, curriculum standards, teaching materials, and auxiliary materials. Private memory is individual information closely tied to specific educational users and their roles, such as a learner's historical interactions and learning evaluation data, or a teacher's teaching videos, lesson plans, and evaluation data. Teaching intelligent agents must use memory data of different permission levels appropriately, respect the privacy of educational users, and establish corresponding norms for educational data use. A minimal sketch of such permission-scoped memory follows.
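The sketch below assumes a simple two-level public/private split keyed by user identity; a real agent would back this with a vector store and proper access control rather than in-memory lists.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    # Public memory: curriculum standards, subject knowledge, teaching materials.
    public: list[str] = field(default_factory=list)
    # Private memory: per-user records (interaction history, evaluations).
    private: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, text: str, user: str | None = None) -> None:
        if user is None:
            self.public.append(text)
        else:
            self.private.setdefault(user, []).append(text)

    def recall(self, requester: str) -> list[str]:
        # A requester sees all public memory but only their own private memory.
        return self.public + self.private.get(requester, [])

store = MemoryStore()
store.remember("Curriculum standard: fractions are introduced in grade 3.")
store.remember("Struggled with fraction addition on 2024-05-01.", user="student_42")
print(store.recall("student_42"))   # public + student_42's own records
print(store.recall("student_07"))   # public only
```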
(5) Interactive Collaboration and Dynamic Evolution

Teaching intelligent agents can plan and execute educational tasks collaboratively through interaction with educational users in different roles, with other agents, and with educational environments, promoting their own dynamic evolution. In interaction with educational users, a teaching intelligent agent can fully understand the intentions of users in different roles and provide human-computer interaction services in various forms and modalities. In an online self-directed learning scenario, for instance, it can provide scaffolded intelligent tutoring of various types, supporting multimodal teaching content including recommended texts, videos, and audio resources, along with real-time progress evaluation and feedback. In interaction with other agents, agents can realize supervisory guidance, discussion and exchange, division of labor and collaboration, and even orderly debate, based on the roles different agents play and the educational task; debate-style interaction among multiple agents, for example, can achieve scientific decomposition and reasonable planning of complex educational tasks. Educational subjects can also be brought into the multi-agent interaction process to reach educational goals under human-computer collaborative modes. In collaborative exam-paper construction, for instance, given the subject, knowledge points, and discrimination requirements provided by a teacher, different agents can act as question setter, test taker, and grader to build the paper, which is finally submitted to the teacher for quality review. In interaction with educational environments, agents can fully use external hardware and software tools and their human-computer interaction capabilities, based on comprehensive perception and understanding of the scenario and its elements, to achieve embodied operation and human-computer collaboration; for example, by perceiving the state of experimental instruments in real time and precisely controlling robotic arms, an agent can work with learners to complete complex experiments or practical procedures in subjects such as physics and chemistry. By continuously collecting and analyzing process data, outcome data, and feedback from interactions with educational users, other agents, and environments, teaching intelligent agents gradually form educational experience and reflective knowledge. Stored in and retrieved from memory, this experience and knowledge informs future task planning and capability execution, dynamically improving and evolving the agent's problem-solving abilities. For example, by reflecting on human-computer interaction traces and summarizing scientific-instrument control processes, a teaching intelligent agent can plan experimental steps more efficiently and provide real-time scaffolding for experimental operations and collaborative support for scientific exploration.

4. Applications of Teaching Intelligent Agents Based on Large Models

Based on the framework proposed above, this research illustrates the application of teaching intelligent agents with project-based learning as the example scenario. Project-based learning is an effective teaching model for cultivating students' core competencies and higher-order abilities; in a typical process, learners need continuous support from teachers and peers to complete project works. Teaching intelligent agents can set project-based learning tasks, plan the concrete work of completing project artifacts, support memory and reflection on project content, provide multimodal project resource generation, retrieval-augmented scaffolding for learning, and high-quality code generation and feedback, and expand various other capabilities, while supporting both human-computer and multi-agent interaction modes. As shown in Figure 3, teaching intelligent agents can take the roles of "teaching assistant agent" and "peer agent," each with different task settings, expansion capabilities, and individual memories at different project stages, and thus with different capabilities and functions, providing learners with varied interactive support. Taking the interdisciplinary theme of "garbage classification," common in information technology and artificial intelligence courses, as an example, we elaborate the agents' roles in each stage of project-based learning.

(1) Personalized Problem Posing Driven by Learners

Project-based learning requires driving questions grounded in real situations, so that learners genuinely feel the urgency and feasibility of solving the problem, stimulating their intrinsic motivation for in-depth exploration and project completion. In the problem-posing stage, the "teaching assistant agent" can therefore first establish a driving-question guidance framework from the preset learning context. On this basis, it engages learners in multimodal online discussion and, according to each learner's characteristics and learning intentions, adopts personalized dialogue paths and interaction strategies, ultimately guiding learners to pose the project's driving questions themselves. The "teaching assistant agent" can implement this core function with the Agents Module of the aforementioned LangChain open-source framework, by treating the preset guidance framework as the target question of each dialogue round and the learner as a required consulting tool in each round's task planning, actively posing questions to learners and discussing with them (a sketch of this pattern appears at the end of this subsection). For example, on the environmental theme of "garbage classification," the "teaching assistant agent" can create realistic scenarios for learners, using the image generation capability of the external ERNIE-ViLG multimodal large model to depict scenes such as an "ocean garbage vortex" and "non-degradable plastic waste." It can then discuss the urgency of waste management online, building on learners' feedback and suggesting possible steps and methods, thereby guiding students to think independently and settle on concrete project activities, such as "how to promote environmental awareness of garbage classification" or "how to create a smart garbage bin for garbage classification."
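A hedged sketch of this "learner as a consulting tool" pattern using the classic LangChain agent API follows; `initialize_agent` is deprecated in recent LangChain releases, and the model choice, tool wiring, and framing prompt are illustrative assumptions, not the authors' implementation.

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

# The learner is exposed to the agent as a tool: calling it asks the human
# a guiding question and returns their reply (console input stands in for
# the real chat interface here).
ask_learner = Tool(
    name="AskLearner",
    func=lambda question: input(f"[Agent asks] {question}\n> "),
    description="Ask the human learner a guiding question and return their reply.",
)

agent = initialize_agent(
    tools=[ask_learner],
    llm=ChatOpenAI(model="gpt-4"),          # placeholder model
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

# The preset guidance framework becomes each round's target question.
agent.run(
    "Following the driving-question guidance framework, discuss marine litter "
    "with the learner via AskLearner until they state their own driving question; "
    "then output that question."
)
```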
(2) Collaborative Design of Project Plans

To address the personalized driving questions learners pose, teaching intelligent agents can establish dynamic discussion groups of learners and agents, using their educational task planning capabilities to help learners determine concrete solutions and decompose the plans. Group discussion can run in an "agent-led" or a "learner-led" mode, depending on the project goals and learner styles. In the "agent-led" mode, teaching intelligent agents can use the aforementioned AutoGen open-source framework to construct multiple "peer agents" that simulate and play the different roles of human group members in project-based learning, enabling multi-role interaction between human learners and several "peer agents." In this process, the "teaching assistant agent" mainly selects each round's speaker (a human learner or a "peer agent") from the group's dialogue history and the project goals, and broadcasts the utterance to all group members, so that the project implementation plan is designed collaboratively through multiple rounds of speaking and information dissemination (a minimal group-chat sketch follows). In the "learner-led" mode, learners directly choose which "peer agents" to talk with.
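Here is a minimal sketch of the agent-led group chat with the pyautogen v0.2-era API; the role prompts and model configuration are illustrative placeholders. The GroupChatManager plays the speaker-selection and broadcasting role of the "teaching assistant agent" described above.

```python
import autogen

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]  # placeholder credentials
llm_config = {"config_list": config_list}

# The human learner participates directly, typing each of their turns.
learner = autogen.UserProxyAgent(
    name="learner",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)

# Several "peer agents" with illustrative role prompts.
peers = [
    autogen.AssistantAgent(
        name=f"peer_{role}",
        llm_config=llm_config,
        system_message=f"You are a peer student in a project group, focused on {role}.",
    )
    for role in ("classification_rules", "example_collection", "promotion_media")
]

# The manager selects each round's speaker from the dialogue history and
# broadcasts every utterance to all group members.
groupchat = autogen.GroupChat(agents=[learner, *peers], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

learner.initiate_chat(manager, message="How should we promote garbage-classification awareness?")
```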
Specifically, in the "agent-led" mode, to solve the driving question "how to promote environmental awareness of garbage classification," the teaching intelligent agent can first use its task planning capabilities to decompose the solution into executable sub-tasks such as "understanding garbage classification rules," "collecting examples of typical garbage," and "creating promotional materials and carriers." Based on these planned sub-tasks, it constructs multiple "peer agents" to discuss the concrete project plan with learners, providing strategic scaffolding, tracking learners' opinions in real time, and giving timely feedback, gradually guiding learners to complete the design of the project plan collaboratively. On the sub-task "creating promotional materials and carriers," for example, the "peer agents" can propose different promotional formats in group discussion, such as posters, web pages, or WeChat mini-programs. If learners, drawing on their interests and expertise, propose a webpage format, the "teaching assistant agent" can select the "peer agent" with the relevant capability to speak and elaborate the design of the "garbage classification promotional website." It can then broadcast the resulting plan within the group and select other "peer agents" to refine it, for instance suggesting that the rules of "garbage classification" be clarified first and displayed prominently on the website.

(3) Collaborative Completion of Project Works

Based on the designed project plan, teaching intelligent agents can construct corresponding "peer agents" to produce the project work together with learners. Producing the work first requires collecting relevant materials and information. In the sub-task "understanding garbage classification rules," for example, learners need to collect the latest classification standards for their locality. Since garbage classification standards vary across regions and change over time, the "teaching assistant agent" can provide accurate content generation with the RAG method. As shown in Figure 4, the "peer agents" use functions provided by the LangChain framework to implement the RAG process efficiently. First, in "index establishment," the agents automatically crawl or manually screen resources from the official websites of government environmental departments, collect reliable information with LangChain's Document Loaders, and segment long texts into semantically coherent short passages with the Text Splitter. In "question retrieval," text feature vectors extracted by the large model are stored in the Chroma vector database, building a feature-retrieval knowledge base of "garbage classification standards"; the retrieval question answering method (RetrievalQA) then extracts the text features of a user question and retrieves the most relevant information from the vector database by feature similarity. Finally, in "content generation," the retrieved information and the user's question are filled into a prompt template to form the complete prompt, from which the large model generates the latest, correct garbage classification rules for each region. A hedged sketch of this pipeline follows.
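The sketch below assembles the Figure 4 pipeline from v0.1/0.2-era LangChain components; the source URL, model, and chunking parameters are illustrative assumptions, and LangChain's APIs change quickly across releases.

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Index establishment: load reliable pages (placeholder URL for a government
# environmental-department site) and split them into short, coherent chunks.
docs = WebBaseLoader("https://example.gov/garbage-classification").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Question retrieval: embed the chunks and store them in a Chroma vector DB.
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Content generation: retrieve the most similar chunks and let the large
# model answer from them via the prompt template assembled by RetrievalQA.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4"),
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
)
print(qa.invoke({"query": "Which bin do used batteries belong to in our city?"}))
```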
After the relevant materials are collected, the "teaching assistant agent" can further help learners create the "garbage classification promotional website." In this process, learners can interact with the agents multimodally, showing hand-drawn design drafts of the website front end or describing back-end design ideas in text. The "teaching assistant agent" can call multiple external web-design scripting-language libraries to generate the corresponding webpage code automatically. Meanwhile, teaching intelligent agents can use the machine-language execution environment embedded in the AutoGen framework to run the generated code directly, feeding execution results and error messages back to the "teaching assistant agent" to guide further automatic modification and improvement of the code. Learners can also give feedback on the generated pages through screenshots or natural language, and the "teaching assistant agent" adjusts and optimizes the website accordingly.

(4) Multi-Role Evaluation of Project Works

In the presentation and evaluation stage, the "teaching assistant agent" and "peer agents" can each evaluate the project works from their own perspectives, providing teacher-style and peer-style evaluations. The agents generate process and outcome evaluation rubrics in advance from the personalized driving questions and project plans. During the presentation of works produced collaboratively by human and machine, the "teaching assistant agent" and "peer agents" conduct process evaluations of the presented content from the perspectives of teacher and external peer, based on the different process information stored in their respective memory modules and the corresponding rubrics. For the "garbage classification promotional website," for instance, agents can evaluate objectively from learners' contributions during group discussion and website production. Agents can also use their environmental interaction capabilities to test the webpage interactively by clicking through it, compiling quantitative statistics on its design, such as the number of elements, color choices, and multimedia usage, to reach a result-oriented evaluation of the work. Furthermore, agents can use their multimodal perception to take the learners' presentation as video input, evaluating the fluency of the learners' language, the logical coherence of the content, and the completeness of the presentation. Based on the evaluation information from this round of project works, the "teaching assistant agent" and "peer agents" can combine their stored memories to question learners reflectively about knowledge mastery, skill acquisition, and interaction effectiveness, improving their own educational task planning and teaching interaction capabilities in parallel. In the next round of project-based learning, the agents can then conduct the same theme more effectively for new groups of learners, realizing the evolution of their educational capabilities.

5. Conclusion and Prospects

Teaching intelligent agents based on large models are among the significant future research directions and application forms of generative artificial intelligence in education, and a core technical path toward human-computer collaborative models of education. The teaching intelligent agent architecture proposed in this research, centered on large models and their capabilities and combined with the diverse scenario demands and multi-role service characteristics of education, aims to inspire and assist the design and realization of future highly intelligent education systems. On this basis, the roles, functions, and collaborative practice paths of teaching intelligent agents in project-based learning have been elaborated in detail. Research on teaching intelligent agents is still in its infancy; this article proposes the following prospects for its future development:
1. The design and development of teaching intelligent agents urgently deserve emphasis, so that the education sector can fully leverage cutting-edge technologies such as generative artificial intelligence to rapidly raise the intelligence and interactivity of educational products and services. The application of multi-agent technology should be stressed, using agents to simulate and play key educational roles and realize efficient teaching interaction processes such as "discussion-practice-reflection." At the same time, the respective strengths of machine and human intelligence should be fully exploited to achieve a more reasonable and effective human-computer collaborative education model.

2. Compared with general-purpose agents or agents in other vertical domains, the construction of educational intelligent agents has demands and characteristics unique to education: it must carefully consider the complexity of educational scenarios and teaching subjects, and design proprietary educational large models and their core educational capabilities. Educational large models need to deeply understand educational resources, teaching subjects, and teaching processes, supported by relevant educational theories and learning-science theories.

3. The design of teaching intelligent agents must fully consider its impact on learners' values and ethics, ensuring that agent behavior conforms to social moral standards and educational objectives. During task execution, teaching intelligent agents need continuous learning and self-optimization capabilities, accumulating experience through interaction with educational stakeholders to enhance the reliability and credibility of their educational services, providing inclusive educational resources and teaching strategies while avoiding bias and discrimination.

References:

[1] Franklin S, Graesser A. Is It an Agent, or Just a Program?: A Taxonomy for Autonomous Agents [A]. International Workshop on Agent Theories, Architectures, and Languages [C]. Berlin, Heidelberg: Springer, 1996. 21-35.
[2] Xi Z, Chen W, Guo X, et al. The Rise and Potential of Large Language Model Based Agents: A Survey [DB/OL]. https://arxiv.org/abs/2309.07864, 2023-09-19.
[3] Soller A, Busetta P. An Intelligent Agent Architecture for Facilitating Knowledge Sharing Communication [A]. Rosenschein S J, Wooldridge M. Proceedings of the Workshop on Humans and Multi-Agent Systems at the 2nd International Joint Conference on Autonomous Agents and Multi-Agent Systems [C]. New York: Association for Computing Machinery, 2003. 94-100.
[4] Woolf B P. Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning [M]. Burlington: Morgan Kaufmann, 2010.
[5] Liu Qingtang, Ba Shen, et al. A Review of the Role Mechanism of Educational Intelligent Agents in Cognitive Learning [J]. Journal of Distance Education, 2019, 37(5): 35-44.
[6] Liu Qingtang, Ba Shen, et al. Research on the Design of Social Cues for Educational Intelligent Agents in Video Courses [J]. Research on Educational Technology, 2020, 41(9): 55-60.
[7] Liu Sannvya, Peng Zhan, et al. Intelligent Education from the Perspective of New Data Elements: Models, Paths, and Challenges [J]. Research on Educational Technology, 2021, 42(9): 5-11+19.
[8] Swan M, Kido T, Roland E, Santos R P D. Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics [DB/OL]. https://arxiv.org/abs/2307.02502, 2023-07-04.
[9] Jinxin S, Jiabao Z, et al. CGMI: Configurable General Multi-agent Interaction Framework [DB/OL]. https://arxiv.org/abs/2308.12503, 2023-08-28.
[10] Durante Z, Huang Q, et al. Agent AI: Surveying the Horizons of Multimodal Interaction [DB/OL]. https://arxiv.org/abs/2401.03568, 2024-01-25.
[11] Marino K, Rastegari M, et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge [A]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition [C]. Piscataway: IEEE Computer Society, 2019. 3195-3204.
[12] Gao Y, Xiong Y, et al. Retrieval-Augmented Generation for Large Language Models: A Survey [DB/OL]. https://arxiv.org/abs/2312.10997, 2024-01-05.
[13] Li H, Su Y, et al. A Survey on Retrieval-Augmented Text Generation [DB/OL]. https://arxiv.org/abs/2202.01110, 2022-02-13.
[14] Boiko D A, MacKnight R, Gomes G. Emergent Autonomous Scientific Research Capabilities of Large Language Models [DB/OL]. https://arxiv.org/abs/2304.05332, 2023-04-11.
[15] Wei J, Wang X, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [J]. Advances in Neural Information Processing Systems, 2022, 35: 24824-24837.
[16] Wang X, Wei J, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models [DB/OL]. https://arxiv.org/abs/2203.11171, 2023-03-07.
[17] Yao S, Yu D, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models [J]. Advances in Neural Information Processing Systems, 2024, 36: 1-11.
[18] Yao S, Zhao J, et al. ReAct: Synergizing Reasoning and Acting in Language Models [DB/OL]. https://arxiv.org/abs/2210.03629, 2023-03-10.
[19] Wang G, Xie Y, et al. Voyager: An Open-Ended Embodied Agent with Large Language Models [A]. Intrinsically-Motivated and Open-Ended Learning Workshop @ NeurIPS 2023 [C]. Cambridge, MA: MIT Press, 2023.
[20] Wu Q, Bansal G, et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework [DB/OL]. https://arxiv.org/abs/2308.08155, 2023-10-03.
[21] Bran A M, Cox S, et al. ChemCrow: Augmenting Large-Language Models with Chemistry Tools [DB/OL]. https://arxiv.org/abs/2304.05376, 2023-10-02.
[22] Gao D, Ji L, et al. AssistGPT: A General Multi-modal Assistant That Can Plan, Execute, Inspect, and Learn [DB/OL]. https://arxiv.org/abs/2306.08640, 2023-06-28.
[23] Liang T, He Z, et al. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [DB/OL]. https://arxiv.org/abs/2305.19118, 2023-05-30.
[24] Hong S, Zheng X, et al. MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework [DB/OL]. https://arxiv.org/abs/2308.00352, 2023-11-06.
[25] Park J S, O'Brien J, et al. Generative Agents: Interactive Simulacra of Human Behavior [A]. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology [C]. New York: Association for Computing Machinery, 2023. 1-22.
[26] Wang L, Ma C, et al. A Survey on Large Language Model Based Autonomous Agents [J]. Frontiers of Computer Science, 2024, 18(6): 1-26.
[27] LangChain [EB/OL]. https://python.langchain.com/docs/get_started/introduction, 2023-11-12.
[28] Auto-GPT [EB/OL]. https://docs.agpt.co/, 2023-12-29.
[29] AutoGen [EB/OL]. https://microsoft.github.io/autogen/, 2023-12-28.
[30] BabyAGI [EB/OL]. https://github.com/yoheinakajima/babyagi, 2023-12-28.
[31] Li G, Hammoud H, et al. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society [J]. Advances in Neural Information Processing Systems, 2024, 36: 1-34.
[32] Lu Yu, Yu Jinglei, et al. Research and Prospects of Educational Applications of Multimodal Large Models [J]. Research on Educational Technology, 2023, 44(6): 38-44.
[33] WolframAlpha [EB/OL]. https://www.wolframalpha.com/, 2023-11-11.
[34] Ma Ning, Guo Jiahui, et al. Evidence-Based Project-Based Learning Model and System in the Context of Big Data [J]. China Educational Technology, 2022, (2): 75-82.
[35] Zhang Z, Han X, et al. ERNIE: Enhanced Language Representation with Informative Entities [A]. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics [C]. Stroudsburg: Association for Computational Linguistics, 2019. 1441-1451.

Author Information:
Lu Yu: Associate Professor, doctoral supervisor; research interest: artificial intelligence and its educational applications.
Yu Jinglei: Doctoral student; research interest: artificial intelligence and its educational applications.
Chen Penghe: Lecturer, PhD; research interest: artificial intelligence and its educational applications.


