Prompt Learning Models for Collaborative Problem Solving

Results Introduction

💡 Paper Title:

Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task.

📚 Journal:

Computer-Supported Cooperative Work and Social Computing (CSCW), 2024

🔗 Paper Link:

http://arxiv.org/abs/2407.12487

1. Background and Motivation

Collaboration, innovation, communication, and critical thinking are recognized as core skills for the 21st century. Collaborative Problem Solving (CPS), as a composite skill, draws on practical ability, critical thinking, collaboration, communication, and innovation, which makes it an essential component of core competencies. Because CPS ability matters for academic and professional achievement, it has attracted significant attention from researchers. They have proposed a series of theoretical frameworks that make the abstract concept of CPS concrete, and have designed and developed assessment tools to measure students’ CPS abilities.

Traditional measurement methods based on multiple-choice and open-ended questions capture only students’ answers and offer no insight into their problem-solving processes. Researchers have therefore adopted scenario simulation methods, which evaluate students’ CPS abilities in simulated, realistic task environments. This kind of assessment typically involves interactions between students and computers, as well as interactions among students. The server can record detailed, timestamped behavior data (such as keystrokes and communication between teammates) during task execution. This process data provides external evidence of students’ cognition and thinking, allowing researchers to analyze CPS abilities at a finer granularity.

However, the collected process data is diverse and lacks a fixed structure, which complicates subsequent analysis. The first step in data processing is therefore usually to code the raw data into a structured form. Coding thousands of recorded behaviors according to a specific theoretical framework is challenging for three reasons: manual coding is time-consuming and labor-intensive and cannot support real-time analysis; existing automated approaches based on machine learning or deep learning rely on large amounts of training data, whereas the data collected in experimental settings is limited; and the accuracy of existing automated coding models is relatively low. To address these issues, this research leverages the strengths of prompt learning in few-shot and zero-shot settings, using prompt-based pre-trained models to automate the coding of CPS process data (mainly the dialogue among team members).

2. Data

This study collected process behavior data from students during CPS activities using a three-resistor task. The task requires three students to collaborate, and Figure 1 shows the operation interface one student sees on the computer. Each student is responsible for adjusting one resistor so that the voltage across that resistor meets the required value. Because the resistors are connected in series, any one student’s change to a resistor value also affects the voltages across the other two students’ resistors. To meet the task requirements, the three students can discuss and communicate through the dialogue box on the interface. The dataset for this study includes 50,817 behavior records generated by 378 students in 126 groups, of which 15,950 are communication records and 34,867 are records of operations on the computer.

Figure 1: Operation Interface of the Three-Resistor Task
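The coupling between the three students' resistors follows from the standard voltage-divider rule for a series circuit. The sketch below is only an illustration under simplifying assumptions: it models an ideal circuit containing just the three student-controlled resistors, and the supply voltage and resistance values are hypothetical rather than taken from the actual task.

```python
def series_voltages(supply_voltage, resistances):
    """Return the voltage across each resistor in an ideal series circuit.

    Voltage-divider rule: V_i = V_supply * R_i / (R_1 + ... + R_n).
    """
    total = sum(resistances)
    if total <= 0:
        raise ValueError("total resistance must be positive")
    return [supply_voltage * r / total for r in resistances]


# Hypothetical values: a 12 V supply and three 100-ohm resistors.
before = series_voltages(12.0, [100.0, 100.0, 100.0])

# The first student doubles their own resistor. Their voltage rises,
# and the voltages across the other two resistors drop as a side effect.
after = series_voltages(12.0, [200.0, 100.0, 100.0])

print(before)
print(after)
```

Because every voltage depends on the sum of all resistances, no student can reach a target voltage alone, which is why the task forces the group to communicate.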

Next, the collected behavior data was manually coded according to the CPS cognitive framework. This framework includes two dimensions: cognitive and social. The cognitive dimension consists of seven skills: exploring and understanding (CEU), representing and formulating (CRF), planning (CP), executing actions (CE), executing chats (CEC), monitoring actions (CM), and monitoring chats (CMC). The social dimension comprises four skills: maintaining communication (SMC), sharing information (SSI), establishing shared understanding (SESU), and negotiating (SN). Table 1 lists some examples of the collected process data and their coding results.

Table 1: Collected Process Data and Coding Examples
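For readers who want to work with the coding scheme programmatically, the two dimensions and eleven skill codes listed above can be held in a small lookup structure. This is a minimal sketch: the codes and skill names come from the framework description, while the names `CPS_FRAMEWORK` and `dimension_of` are hypothetical helpers introduced here for illustration.

```python
# Skill codes grouped by dimension, as listed in the framework description.
CPS_FRAMEWORK = {
    "cognitive": {
        "CEU": "exploring and understanding",
        "CRF": "representing and formulating",
        "CP": "planning",
        "CE": "executing actions",
        "CEC": "executing chats",
        "CM": "monitoring actions",
        "CMC": "monitoring chats",
    },
    "social": {
        "SMC": "maintaining communication",
        "SSI": "sharing information",
        "SESU": "establishing shared understanding",
        "SN": "negotiating",
    },
}


def dimension_of(code):
    """Return the dimension ('cognitive' or 'social') that a skill code belongs to."""
    for dimension, skills in CPS_FRAMEWORK.items():
        if code in skills:
            return dimension
    raise KeyError(f"unknown CPS skill code: {code}")


print(dimension_of("SSI"))
print(CPS_FRAMEWORK["cognitive"]["CP"])
```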

Since operational data is enumerable and most of it can be mapped directly to the theoretical framework, this study focuses primarily on the automated coding of communication data among students. Table 2 presents the distribution of the coded dialogue data across CPS skills.

Table 2: Distribution of Dialogue Data in CPS Skills

3. Model Method

The model structure used in this study is shown in Figure 2, which mainly consists of three parts:

Template Construction: Each piece of communication data is taken as the model input and concatenated with a prompt template to form the concatenated template T;
Prediction with the PLM: The concatenated template T is fed into the pre-trained language model (PLM), which predicts the probability distribution over words generated at the [MASK] position;
Label Word Mapping: The predicted word distribution is mapped to label words, where each label word is associated with the set of generated words that can be mapped to it, and the label with the highest mapped probability is taken as the final coding result.
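The three parts above can be traced end to end with a toy example. This is a rough sketch under stated assumptions, not the paper's implementation: the prompt pattern, the `toy_plm` stand-in (which returns a fixed, made-up distribution instead of calling a real pre-trained model), the word-to-label sets in `VERBALIZER`, and the choice to sum word probabilities per label are all hypothetical.

```python
MASK = "[MASK]"


def build_template(utterance, pattern="{text} This message is about " + MASK + "."):
    """Part 1: concatenate the utterance with a prompt template (hypothetical wording)."""
    return pattern.format(text=utterance)


def toy_plm(template):
    """Part 2 stand-in: pretend to predict P(word | [MASK] position).

    A real system would query a pre-trained language model here; this stub
    returns a fixed, made-up distribution so the sketch stays self-contained.
    """
    if MASK not in template:
        raise ValueError("template must contain a [MASK] position")
    return {
        "sharing": 0.40,
        "telling": 0.15,
        "planning": 0.20,
        "agreeing": 0.15,
        "greeting": 0.10,
    }


# Part 3: hypothetical mapping from each label to the words that vote for it.
VERBALIZER = {
    "SSI": ["sharing", "telling"],
    "CP": ["planning"],
    "SN": ["agreeing"],
    "SMC": ["greeting"],
}


def map_to_label(word_probs, verbalizer):
    """Sum the probabilities of each label's words and pick the best label."""
    scores = {
        label: sum(word_probs.get(word, 0.0) for word in words)
        for label, words in verbalizer.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


template = build_template("my voltage is 3.2 now")
label, scores = map_to_label(toy_plm(template), VERBALIZER)
print(template)
print(label, scores)
```

Summing over a label's words is only one possible aggregation; averaging or taking the maximum would be equally valid design choices, and the source does not say which one the paper uses.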

Figure 2: Structure of Prompt Pre-trained Model

4. Experimental Setup and Results

This study conducted three experiments. The first explored how the prompt generation strategy and the choice of pre-trained model affect coding results, as shown in Table 3. The results indicate that coding performance is best when manually designed prompts are used with T5 as the pre-trained model.

Table 3: Prediction Results with Different Prompt Generation Strategies and Pre-trained Models

The second experiment compared the coding performance of the prompt-based pre-trained model used in this study with other classification models based on n-grams, deep learning, and fine-tuning. The prediction results of each model are shown in Table 4. The models rank as follows by prediction performance: prompt > fine-tuning > deep learning > n-grams.

Table 4: Automated Coding Results of Different Classification Models

The third experiment compared the coding performance of the prompt-based pre-trained model with other models on training sets of different sizes, as shown in Figure 3. The results indicate that the model used in this study performed well on all three evaluation metrics (accuracy, macro F1 score, and kappa), achieving usable results even with small training sets (less than 10% of the total sample size).

Figure 3: Prediction Performance of Different Models on Training Sets of Varying Sizes
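For reference, the three evaluation metrics named above can be computed from a list of human codes and a list of model predictions. The sketch below uses the textbook definitions of accuracy, macro-averaged F1, and Cohen's kappa; it assumes "kappa" refers to Cohen's kappa, and the example labels are invented for illustration rather than drawn from the study's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items where the prediction matches the human code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(true_counts[l] * pred_counts[l] for l in true_counts) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both lists contain one identical label
    return (p_observed - p_expected) / (1 - p_expected)


# Invented example: six utterances with human codes and model predictions.
human = ["SSI", "SSI", "SN", "CP", "SN", "SSI"]
model = ["SSI", "SN", "SN", "CP", "SN", "SSI"]

print(accuracy(human, model))
print(macro_f1(human, model))
print(cohen_kappa(human, model))
```

Macro F1 weights every skill equally regardless of how often it occurs, so it is more sensitive than accuracy to errors on rare codes, while kappa discounts agreement that would be expected by chance.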

5. Contributions of This Paper

This study proposes using a prompt-based pre-trained model for the automated coding of process data generated in CPS tasks, achieving better prediction results than existing models on all three metrics on the task dataset;
This study validates the advantage of the prompt-based pre-trained model for automated coding with small training sets, which helps address the low prediction accuracy associated with small datasets in CPS tasks;
The results make process data analysis easier for researchers in the CPS field and provide data support for real-time feedback. The findings may also inform data coding in other areas of the humanities and social sciences, such as coding text, audio, and even video streams from interview records.
