A Comprehensive Review of Multimodal Learning Analytics

Abstract: Multimodal learning analytics provides a new perspective for measuring and evaluating learning in complex environments. Drawing on a database covering a wide range of academic organizations and papers on multimodal learning analytics, this article uses several analytical methods to examine the field's research community, themes, fields, evolution, and hotspots. The results show that multimodal learning analytics research has begun to form several academic groups, though these remain scattered; that the major research themes include modeling student programming, convolutional and symmetrical deep neural networks, sensor data collection, human action recognition, collaborative construction, and automated evaluation; that the three main research fields are multimodal perception and emotion analysis, multimodal data fusion and mining, and multimodal representation and object recognition; that the field has evolved from knowledge-model and framework design, through multimodal feature extraction and emotion analysis, to multimodal representation learning and deep learning; and that the research hotspots are multimodal learning data collection and system design, multimodal learning cognition and behavior analysis, and the development of multimodal learning analytics methods and tools.

Keywords: Multimodal Learning Analytics; Multimodal Data; Artificial Intelligence; Data Fusion

In recent years, artificial intelligence has become a focus across industries, and its development has been elevated to the level of national strategy in many countries. Powered by machine learning, deep learning, and high-performance computing, human-computer interaction is gradually shifting from traditional single-mode interaction to integrated interaction spanning visual, auditory, and tactile modalities. Technologies such as speech recognition, semantic analysis, emotion analysis, motion capture, and image recognition are widely used in intelligent human-computer interaction, and interaction technologies that integrate multiple information modalities are driving the development of multimodal fusion. Multimodal artificial intelligence can enhance intelligent human-computer interaction by perceiving information across different dimensions and sources. Learning, moreover, is often multimodal: individuals interact with learning objects through physical or digital tools. Applying multimodal fusion technology can therefore help perceive and understand learners from multiple data sources, supporting them in thinking and learning more effectively. As learning analytics intersects with multimodal fusion, biometric recognition, semantic analysis, and related technologies, multimodal learning analytics has emerged as a new research direction attracting significant attention. Research in this area has made preliminary progress: researchers collect learners' trace data through EEG, eye tracking, human-computer interaction, and other channels to analyze learning performance in complex environments. To clarify the development trajectory and research hotspots of this field, this study analyzes existing research quantitatively and qualitatively, exploring research patterns and uncovering potential research value, with the aim of informing future work on multimodal learning analytics.

1. Sources of Literature and Research Design for Multimodal Learning Analytics

1. Sources of Literature

As multimodal learning analytics is an interdisciplinary research direction, many research organizations have taken up work in this area. The main venues involved include the International Conference on Multimodal Interaction, the International Conference on Learning Analytics and Knowledge, the International Conference on Technological Ecosystems for Enhancing Multiculturality, the International Conference on Knowledge Discovery and Data Mining, the International Conference on Cloud Computing and Internet of Things, the ACM Special Interest Group on Computer Science Education, the Conference on Human Factors in Computing Systems, the International Conference of the Learning Sciences, the International Conference on Computer-Supported Collaborative Learning, and the ACM International Conference on Multimedia. Among them, the International Conference on Multimodal Interaction and the International Conference on Learning Analytics and Knowledge have hosted many discussions on multimodal learning analytics.

Regarding database selection, this study uses Web of Science, Elsevier ScienceDirect, SpringerLink, and the ACM Digital Library as analysis sources, which together cover a broad range of educational journals and conference papers. For keyword selection, the study filters on "Multimodal Learning Analytics," "Multimodal Data," "Multimodal Behavioral Analytics," "Eye-Tracking Learning Analytics," "EEG Learning Analytics," "Multimodal Information Learning and Analytics," "Multimodal Deep Learning," "Multi-Sensor Analytics," and "Wearable Motion Sensor Analytics," using EndNote to extract the literature directly related to multimodal learning analytics, ultimately yielding a sample of 174 papers.

2. Research Methods and Process

This study employs co-occurrence analysis, cluster analysis, multidimensional scaling analysis, and time-series analysis: co-occurrence analysis identifies core keywords from their co-occurrence relationships; cluster analysis groups keywords into categories to focus on the field's core research content; multidimensional scaling analysis identifies the distribution of research fields; and time-series analysis determines the direction and trends of the research. The analysis proceeds as follows: (1) collect the literature on multimodal learning analytics published since 2000 and filter the literature data; (2) extract key information such as authors, research institutions, article titles, keywords, and publication years, using the Bicomb bibliographic co-occurrence system to filter and clean keywords, merging synonyms and singular/plural forms (see the sketch below); (3) visualize the sample data with the knowledge-mapping software CiteSpace 5.5.2 and the statistical software SPSS, generating the corresponding graphs; (4) combine the analysis results with the literature to map out the current academic community, research themes, research fields, and evolution of multimodal learning analytics, revealing its research hotspots.
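To make the keyword-cleaning step concrete, here is a minimal Python sketch of the kind of normalization Bicomb performs: merging synonyms and folding spelling variants before counting frequencies. The synonym map, function names, and sample records are hypothetical, invented for illustration rather than drawn from the study's actual pipeline.

```python
import re
from collections import Counter

# Hypothetical synonym map, built by hand while inspecting the exported
# keyword list; these entries are invented for illustration.
SYNONYMS = {
    "multi-modal data": "multimodal data",
    "neural networks": "neural network",
    "wearable sensors": "wearable sensor",
}

def normalize(keyword: str) -> str:
    """Lowercase, collapse whitespace, and merge synonym/plural variants."""
    kw = re.sub(r"\s+", " ", keyword.strip().lower())
    return SYNONYMS.get(kw, kw)

def keyword_frequencies(records):
    """Count normalized keywords across per-paper keyword lists."""
    counts = Counter()
    for paper_keywords in records:
        counts.update(normalize(k) for k in paper_keywords)
    return counts

papers = [
    ["Multimodal Data", "neural networks"],
    ["multi-modal data", "wearable sensors"],
]
print(keyword_frequencies(papers))  # "multimodal data" is counted twice
```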

2. Academic Community of Multimodal Learning Analytics

1. Analysis of Research Institutions

In terms of total publications, Stanford University ranks first with 11 papers, followed by the University of Southern California with 6; the University of British Columbia and the University of Valparaiso tie for third with 5 each. In terms of institutional collaboration, several research groups have formed around these universities; for example, Stanford University researchers collaborate with researchers from the University of California, the University of Southern California, Carnegie Mellon University, and many other institutions. Notably, industrial research laboratories also hold a place among these institutions and maintain strong collaborative ties with universities, such as Incaa Designs with Stanford University, Microsoft Research with Carnegie Mellon University, Tencent Lab with Wuhan University, and Nokia Bell Labs with the University of Edinburgh. Such school-enterprise collaboration helps solve technical challenges and facilitates the translation of research outcomes.

2. Analysis of Academic Communities

To further understand how academic groups have formed, this study analyzes collaboration among researchers, generating a researcher collaboration graph with 318 nodes and 620 connections; to highlight the main academic groups, researchers with a frequency > 2 were extracted, producing the graph shown in Figure 1. As the figure shows, multimodal learning analytics research has begun to form several academic groups, including those led by Marcelo Worsley and Mutlu Cukurova, Rodolfo Villarroel and Roberto Munoz, Marcus Specht and Hendrik Drachsler, and Laura Allen and Caitlin Mills. These groups remain scattered, however, with little communication and collaboration among them, which reflects both the developmental stage of this research direction and the inherent character of interdisciplinary studies. Comparing the academic groups with the research institutions above shows that different institutions participate in different academic groups, reflecting strong intersectionality. In terms of team influence, the groups emphasize different directions: Marcelo Worsley focuses on multimodal learning analysis methods and their practical applications, while Rodolfo Villarroel focuses on multimodal data analysis tools and visualization. Through their theories, methods, technologies, and practices, these groups continue to deepen multimodal learning analytics.

Figure 1: Analysis of Collaborative Relationships in Academic Groups of Multimodal Learning Analytics (Frequency > 2)

3. Research Themes of Multimodal Learning Analytics

1. Types of Multimodal Data and Analysis Methods

Understanding the data characteristics and analysis methods of multimodal learning analytics helps clarify its distinctive research attributes. From the perspective of the data analysis process, the primary feature of multimodal learning analytics research is the diversity of data types: studies analyze two or more data types, for example inferring individuals' emotional states and their polarity from comments, images, videos, blogs, and human-computer interactions. Summarizing and categorizing the data used in existing research shows that the data sources mainly include physiological data (expressions, heartbeat, EEG, gaze, arousal), behavioral data (speech, writing, gestures, posture, steps), learning trajectory data (platform logins, learning records, learning artifacts), and multimedia data (audio and video, digital documents, digital pens). Data fusion and mining is another important feature of this research. For data aggregation and fusion, researchers develop multimodal learning hubs to collect and integrate system data from different learning scenarios; for data mining and processing, the methods include machine learning, random forests, linear mixed-effects models, and traditional statistical methods such as descriptive, inferential, and multivariate analysis.
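To illustrate the feature-level fusion and random-forest modeling mentioned above, the following is a minimal scikit-learn sketch on synthetic data; the modality blocks, feature dimensions, and labels are invented placeholders, not data from any cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200  # synthetic learners

# Hypothetical per-modality feature blocks (names and dimensions invented):
physio = rng.normal(size=(n, 8))     # e.g., EEG band power, heart rate
behavior = rng.normal(size=(n, 5))   # e.g., gesture/posture descriptors
logs = rng.normal(size=(n, 4))       # e.g., platform login/activity counts
labels = rng.integers(0, 2, size=n)  # e.g., high vs. low performance

# Feature-layer fusion: concatenate the modality blocks into one matrix.
X = np.hstack([physio, behavior, logs])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f}")
```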

2. Keyword Co-occurrence Analysis of Multimodal Learning Analytics

This study finds that the top ten keywords are: Deep Learning, Learning Analytics, Machine Learning, Classification, Fusion, Neural Network, Data Mining, Wireless Sensor Network, Emotion Recognition, and Wearable Sensor. The keyword co-occurrence analysis shows that multimodal learning analytics is closely integrated with artificial intelligence, represented by deep learning and machine learning; wearable devices and perception technologies supply the data sources for collecting learners' multimodal data; and neural networks, pattern recognition, and predictive classification underpin its algorithms. Data fusion is another important direction of multimodal learning analytics research: learning data obtained from different sensors are processed and then fused (at the data layer, feature layer, or decision layer) to form a unified dataset. During fusion, dimensionality reduction is also needed to remove weakly relevant features, speeding up machine learning and model construction.
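As a concrete illustration of how keyword co-occurrence counts of this kind can be computed, the sketch below tallies keyword pairs across papers; the sample records are invented for demonstration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count keyword pairs that appear together in the same paper."""
    pairs = Counter()
    for keywords in records:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented sample records for demonstration.
papers = [
    ["deep learning", "learning analytics", "fusion"],
    ["deep learning", "emotion recognition"],
    ["learning analytics", "fusion", "wearable sensor"],
]
for (a, b), count in cooccurrence(papers).most_common(3):
    print(f"{a} + {b}: {count}")
```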

3. Theme Clustering Analysis of Multimodal Learning Analytics

Theme clustering is used here to identify and examine current research hotspots and themes. The clustering results, shown in Figure 2, indicate that the themes of multimodal learning analytics research fall roughly into 15 categories. Cluster #0 (Modeling students programming) and Cluster #9 (Understanding regulation) focus on modeling students' program design and learning regulation. Cluster #1 (Convolutional neural network), Cluster #2 (Symmetrical deep neural network), and Cluster #11 (Rule-based learning) cover commonly used analysis methods: convolutional neural networks are used to process sequential data and recognize text in images, symmetrical deep neural networks are used to analyze and recognize human movements, and rule-based learning is mainly used to analyze eye-movement data. Cluster #3 (Scandent decision forest) and Cluster #8 (Fog network) concern multimodal data analysis algorithms, mainly used to compare the effectiveness of different algorithms. Cluster #4 (Multivariate machine) and Cluster #12 (Effective digital learning) emphasize the importance of multimodal analysis, providing scientific evidence for effective digital learning through multidimensional data analysis. Cluster #5 (Sensor data acquisition) and Cluster #14 (Human action recognition) focus on collecting EEG, action, and location data and on behavior recognition. Cluster #6 (Pair programming) and Cluster #13 (Collaborative construction) revolve around collaborative programming and collaborative learning. Cluster #7 (Automated writing evaluation system) provides immediate learning feedback and grade tracking through text analysis in writing systems. Cluster #10 (Video classification) studies multimodal performance learning in video classification, a form of machine learning. Overall, the clustering results indicate that multimodal learning analytics attends both to specific technical issues, such as data recognition and acquisition and analysis algorithms, and to how analysis can enhance learning, demonstrating its practical application value.

Figure 2: Theme Clustering Map of Multimodal Learning Analytics Research

4. Research Fields and Evolution of Multimodal Learning Analytics

1. Distribution of Research Fields in Multimodal Learning Analytics

To grasp the overall state of multimodal learning analytics, this study explores the distribution of its research fields through multidimensional scaling analysis. Specifically, the data were converted with Bicomb 2.0 into a format supported by SPSS 25.0; multidimensional scaling (ALSCAL) was then run with "data as distance" selected under the "distance" option and "asymmetric square" as the measurement method, generating the distribution of research fields shown in Figure 3 (a minimal computational sketch of this scaling step appears at the end of this subsection). Dimension 1 and Dimension 2 are the horizontal and vertical coordinates of the figure, and the values on the axes represent each keyword's distance from the origin: the closer a keyword lies to the origin, the higher its centrality and the more important its position in the field of multimodal learning analytics. The proximity of fields is judged from the coordinates of the more concentrated clusters of keywords. As Figure 3 shows, multimodal learning analytics can be divided into three major research fields:

Figure 3: Distribution of Research Fields in Multimodal Learning Analytics

The first field is multimodal perception and emotion analysis, mainly distributed in the first and fourth quadrants (upper right, lower right), with a relatively low centrality. This field primarily relies on sensor networks, tactile sensors, semantic features, data mining, natural language processing, image retrieval, human movement measurement, and EEG for multimodal perception and emotion analysis.

The second field is multimodal data fusion and mining, mainly concentrated in the second quadrant (upper left) and relatively close to the origin, indicating high centrality; it is the more mature research area in multimodal learning analytics. This field primarily relies on online learning platforms and, starting from learning emotion and learning performance, carries out interaction and fusion across modalities such as vision and hearing, with analysis techniques including multimodal neuroimaging, mixed-signal compression, self-organizing feature maps, human-computer interaction interfaces, and decision forests.

The third field is multimodal representation and object recognition, mainly distributed in the third quadrant (lower left), with a relatively low centrality and scattered keywords, indicating that researchers have adopted various representation methods when conducting multimodal learning analytics. This field processes multimodal data through deep neural networks, deep recurrent neural networks, etc., ultimately forming visual representations, specifically including multimodal representation learning, activity and object recognition, image retrieval based on long texts, and face recognition.

From the distribution of these three research fields, it can be seen that multimodal learning analytics research covers cross-modal matching and generation technologies based on heterogeneous data from multiple sources, multimodal fusion architectures and object recognition based on deep learning, and learning state analysis based on information processing and visual representation.
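For readers who want to reproduce the scaling step outside SPSS, the sketch below applies scikit-learn's metric MDS to a small, invented keyword dissimilarity matrix as a stand-in for the ALSCAL procedure described above; the keywords and distances are illustrative only.

```python
import numpy as np
from sklearn.manifold import MDS

# Invented keyword dissimilarity matrix (symmetric, zero diagonal),
# e.g., derived as 1 - normalized co-occurrence between keyword pairs.
keywords = ["deep learning", "fusion", "emotion recognition", "eeg"]
D = np.array([
    [0.0, 0.3, 0.6, 0.7],
    [0.3, 0.0, 0.5, 0.6],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.6, 0.2, 0.0],
])

# scikit-learn's metric MDS stands in here for SPSS's ALSCAL procedure.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
for kw, (x, y) in zip(keywords, coords):
    print(f"{kw}: dim1={x:+.2f}, dim2={y:+.2f}")
```

Keywords with small mutual distances land near each other in the two-dimensional plot, which is how the quadrant-based field groupings above are read off the map.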

2. Evolution of Multimodal Learning Analytics

To further understand the overall trajectory of multimodal learning analytics research, this study analyzes how its focus has shifted over the years. In the CiteSpace workspace, the node type under "Node Types" was set to "Keyword" and the visualization type to "Timezone View," producing the results shown in Figure 4. In 2002, the only term present was "algorithm," originating from Barbara Hammer et al.'s paper "Learning Vector Quantization for Multimodal Data"; at that time the concept of "multimodal learning analytics" had not yet emerged, and only a few researchers were examining the processing of multimodal data. By 2010, research primarily focused on data frameworks and knowledge models. Since 2014, with the development of blended learning, learning analytics, and big data technologies, large amounts of online and offline learning data have become available for collection and use, and researchers have extracted and analyzed multimodal features to identify learning emotions and related states. After 2018, artificial intelligence technology has been widely applied in multimodal learning analytics; regarding the direction of educational artificial intelligence, Cukurova proposed using multimodal machine learning to create and replicate human cognition, designing AI-based artifacts through multimodal learning analytics that ultimately support, augment, and extend human cognitive abilities. Overall, multimodal learning analytics started relatively late but has developed rapidly, attracting widespread attention from academic groups across disciplines in recent years. Its evolution reflects a progression from knowledge-model and framework design, through multimodal feature extraction and emotion analysis, to multimodal representation learning and deep learning, enriching the ideas and outcomes of learning analytics and the learning sciences.

Figure 4: Evolution of Multimodal Learning Analytics

5. Research Hotspots in Multimodal Learning Analytics

1. Multimodal Learning Data Collection and System Design

This hotspot focuses on data acquisition and organization, centering on the design of data collection and analysis systems based on multiple sensors. (1) The development and application of low-cost sensors have, to some extent, advanced multimodal learning analytics: researchers use low-cost sensors to assess students' attention levels in classrooms, skill levels in workshops, and posture classifications in oral presentations. (2) From a software-architecture perspective, researchers have comparatively analyzed current architectural solutions for multimodal learning analytics and found that existing architectures do not effectively support the data value chain (DVC) for analyzing different learning activities; further exploration is therefore needed in the distribution, flexibility, and scalability of multimodal learning analytics system architectures, with greater attention to data organization and decision-making.

In the future, the design of multimodal learning analytics systems should further integrate the specific learning structures and descriptions of learning activities in each scenario to deepen the analysis of learning context. Many devices still use proprietary data formats that do not support the relevant configurations, so, architecturally, the flexibility and scalability of systems should be enhanced to adapt to different learning systems. In addition, the complexity of multimodal data demands a high level of data literacy from stakeholders; human-computer interaction design therefore needs simpler, more user-friendly interfaces that present analysis results in ways that are grounded in educational theory, contextualized, and meaningful.
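One way to approach the unified-format problem raised above is a modality-agnostic record envelope. The sketch below is a hypothetical design rather than an existing standard: device-specific fields stay inside a payload dictionary, so new sensors can be added without schema changes, and a shared timestamp supports cross-modal alignment.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MultimodalRecord:
    """One timestamped observation from any sensor or learning system.

    A hypothetical unified envelope: device-specific fields stay in
    `payload`, so new modalities can be added without schema changes.
    """
    learner_id: str
    modality: str          # e.g., "eeg", "gaze", "clickstream"
    timestamp_ms: int      # shared clock for cross-modal alignment
    payload: dict[str, Any] = field(default_factory=dict)

# Example: heterogeneous sources normalized into one stream.
stream = [
    MultimodalRecord("s01", "gaze", 1000, {"x": 0.41, "y": 0.77}),
    MultimodalRecord("s01", "clickstream", 1020, {"event": "video_pause"}),
]
by_time = sorted(stream, key=lambda r: r.timestamp_ms)
```

Keeping the envelope thin makes it easier to route records from heterogeneous devices into a single analysis pipeline, which is one way to read the flexibility and scalability requirements discussed above.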

2. Multimodal Learning Cognition and Behavior Analysis

This hotspot is a key area of multimodal learning analytics research and includes three aspects. (1) Analysis of learners' psychological states based on multimodal data, that is, analyzing learners' cognitive processes, reasoning, and emotional states through multimodal data and optimizing learning environment design accordingly so that learners can learn more effectively and deeply. (2) Assessment of collaborative learning and problem solving. For instance, Kyllonen et al. designed a computational model framework for assessing learning states through multimodal measurement: the framework captures, analyzes, and measures complex human behaviors, handling noisy, unstructured multimodal data such as audio, video, and activity log files, and constructs a hierarchical analysis method to model the temporal dynamics of human behavior. Riquelme et al. collected students' voice data using responders to analyze the continuity and motivational aspects of group collaboration. (3) Assessment of learning performance and diagnosis of problems. Existing learning analytics and educational data mining mainly address behavior in computer-mediated interactions such as online courses and cognitive tutoring, whereas multimodal learning analytics offers new ways to diagnose learning performance in more complex and open environments. Current applications of multimodal perception and assessment fall mainly into three areas: assessing student knowledge, assessing student emotions and physiological states, and assessing student intentions and beliefs.
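To make the idea of modeling the temporal dynamics of behavior more concrete, here is a minimal sliding-window sketch (not Kyllonen et al.'s actual framework): each window of a behavioral signal is summarized by simple statistics, and the resulting per-window vectors could then feed a sequence model.

```python
import numpy as np

def windowed_features(signal: np.ndarray, fs: float,
                      win_s: float = 5.0, hop_s: float = 2.5):
    """Slide a window over a 1-D behavioral signal and summarize each
    window with simple statistics (mean, std, range)."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    feats = []
    for start in range(0, len(signal) - win + 1, hop):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), w.max() - w.min()])
    return np.array(feats)

# Example: 60 s of synthetic arousal data sampled at 10 Hz.
arousal = np.random.default_rng(1).normal(size=600)
print(windowed_features(arousal, fs=10.0).shape)  # (23, 3)
```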

In the future, multimodal learning cognition and behavior analysis will explore learning habits, learning patterns, cognitive characteristics, and social preferences in different learning environments based on the integration of comprehensive learning process data, revealing cognitive and social processes in learning to design effective learning activities and the learning environments and resources that support these activities. By integrating multimodal neuroimaging, educational neuroscience, brain science, and other disciplines, it will help uncover the mechanisms of learning occurring in digital environments, forming new learning metaphors.

3. Development of Multimodal Learning Analytics Methods and Tools

This hotspot focuses on the development of analytical methods and tools, mainly including multimodal learning analytics frameworks, artificial intelligence analysis methods, and the design of multimodal learning analytics tools. A core component of multimodal learning analytics is the development of models and frameworks that establish standards and norms for data analysis in different contexts. To address the growing need for multimodal data, it is necessary to construct an intelligent multimodal analysis framework capable of effectively extracting information from multiple modalities. Poria et al. adopted integrated feature extraction methods to develop a new multimodal information extraction agent that infers and aggregates semantic and affective information from user-generated multimodal data using text, audio, and video features. In terms of analysis methods, Scherer used multimodal sequence classifiers to analyze laughter-related expressions during multi-party dialogues in natural environments, extracting frequency and spectral features from audio streams and motion-related behavioral features from video streams, and achieved a highly accurate classification model for recognizing paralinguistic behaviors in communication using hidden Markov models and echo state networks. Overall, the multimodal learning analytics tools developed by early researchers are relatively limited in functionality and analytical scope and rely on custom scripts, so there is an urgent need for new tools that are general-purpose and robust.
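As a simple illustration of the decision-layer fusion idea underlying such multimodal classifiers (and not Scherer's actual implementation), the sketch below trains one classifier per modality on synthetic audio and video features and averages their predicted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 150
audio = rng.normal(size=(n, 6))   # e.g., pitch and spectral statistics
video = rng.normal(size=(n, 6))   # e.g., motion-related descriptors
y = rng.integers(0, 2, size=n)    # e.g., laughter vs. no laughter

# Decision-layer fusion: one classifier per modality, probabilities averaged.
audio_clf = LogisticRegression().fit(audio, y)
video_clf = LogisticRegression().fit(video, y)
fused = (audio_clf.predict_proba(audio)[:, 1]
         + video_clf.predict_proba(video)[:, 1]) / 2
pred = (fused > 0.5).astype(int)

# Evaluated on training data only; a real study would hold out a test set.
print("training accuracy:", (pred == y).mean())
```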

In the future, multimodal learning analytics methods should, on one hand, draw on mature and cutting-edge analysis algorithms from computer science for application in educational contexts; on the other hand, educational researchers should further optimize algorithms for the complexity of specific contexts, the dimensions of analysis, time and space complexity, and robustness, forming methods tailored to educational settings. In terms of tool development, general-purpose analysis frameworks should be designed with unified data standards and interface specifications, and pluggable analysis modules should be developed, creating a plugin library for different analytical contents to enhance the reusability of tools.

6. Conclusion

As a research direction formed at the intersection of disciplines, multimodal learning analytics explores cognitive activities and their neural mechanisms in human learning through data mining and artificial intelligence technologies, analyzing the processes and essence of learning, cognition, and development. Through a review and analysis of the relevant literature, this study clarifies the main academic communities, outlines the main research themes, sketches the main research fields, surveys the evolution of the research, and analyzes its hotspots. In the future, multimodal learning analytics needs further development in multi-sensor tracking and measurement, cross-space data aggregation and fusion, cross-modal analysis and modeling, and the optimization of artificial intelligence technologies and analytical methods, so as to improve learners' experiences and enable them to learn more effectively and deeply.

References

[1] Soleymani M, Garcia D, Jou B, et al. A survey of multimodal sentiment analysis. Image and Vision Computing, 2017, (9): 3-14.

[2] Di Mitri D, Scheffel M, Drachsler H, et al. Learning pulse: A machine learning approach for predicting performance in self-regulated learning using multimodal data. In: Proceedings of the Seventh International Learning Analytics & Knowledge Conference. New York: ACM, 2017: 188-197.

[3] Berg A M, Mol S T, Kismihók G, et al. The role of a reference synthetic data generator within the field of learning analytics. Journal of Learning Analytics, 2016, (1): 107-128.

[4] Ruffaldi E, Dabisias G, Landolfi L, et al. Data collection and processing for a multimodal learning analytic system. In: Proceedings of the Fourth Science and Information Computing Conference. New York: IEEE, 2016: 858-863.

[5] Schneider J, Di Mitri D, Limbu B, et al. Multimodal learning hub: A tool for capturing customizable multimodal learning experiences. In: Lifelong Technology-Enhanced Learning. Berlin: Springer, 2018: 45-58.

[6] Segal A, Hindi S, Prusak N, et al. Keeping the teacher in the loop: Technologies for monitoring group learning in real-time. In: Proceedings of the International Conference on Artificial Intelligence in Education. Berlin: Springer, 2017: 64-76.

[7] Echeverría V, Domínguez F, Chiluiza K. Towards a distributed framework to analyze multimodal data. In: Proceedings of the Cross-LAK Workshop held at LAK'16. New York: ACM, 2019: 52-57.

[8] Cukurova M. Learning analytics as AI extenders in education: Multimodal machine learning versus multimodal learning analytics. In: Proceedings of the International Conference on Artificial Intelligence and Adaptive Education. New York: ACM, 2019: 1-3.

[9] Raca M, Tormey R, Dillenbourg P. Sleepers' lag: Study on motion and attention. In: Proceedings of the Fourth International Conference on Learning Analytics and Knowledge. New York: ACM, 2014: 36-43.

[10] Echeverría V, Avendaño A, Chiluiza K, et al. Presentation skills estimation based on video and Kinect data analysis. In: Proceedings of the Multimodal Learning Analytics Workshop and Grand Challenge. New York: ACM, 2014: 53-60.

[11] Munoz R, Villarroel R, Barcelos T S, et al. Development of a software that supports multimodal learning analytics: A case study on oral presentations. Journal of Universal Computer Science, 2018, (2): 149-170.

[12] Shankar S K, Prieto L P, Rodríguez-Triana M J, et al. A review of multimodal learning analytics architectures. In: Proceedings of the IEEE 18th International Conference on Advanced Learning Technologies. New York: IEEE, 2018: 212-214.

[13] Worsley M, Abrahamson D, Blikstein P, et al. Situating multimodal learning analytics. In: Proceedings of the Twelfth International Conference of the Learning Sciences: Transforming Learning, Empowering Learners. Berlin: Springer, 2016: 1346-1349.

[14] Kyllonen P C, Zhu M, Von Davier A A. Introduction: Innovative assessment of collaboration. Berlin: Springer, 2017: 1-18.

[15] Riquelme F, Munoz R, Mac Lean R, et al. Using multimodal learning analytics to study collaboration on discussion groups. Universal Access in the Information Society, 2019, (3): 633-643.

[16] Blikstein P, Worsley M. Multimodal learning analytics and education data mining: Using computational technologies to measure complex learning tasks. Journal of Learning Analytics, 2016, (2): 220-238.

[17] Prieto L P, Rodriguez Triana M J, Kusmin M, et al. Smart school multimodal dataset and challenges. In: Proceedings of the Sixth Multimodal Learning Analytics Workshop. Aachen: CEUR, 2017: 53-59.

[18] Poria S, Cambria E, Hussain A, et al. Towards an intelligent framework for multimodal affective data analysis. Neural Networks, 2015, (3): 104-116.

[19] Scherer S. Multimodal behavior analytics for interactive technologies. Artificial Intelligence, 2016, (1): 91-92.
