Research Progress on Multimodal Named Entity Recognition Methods

Wang Hairong¹,², Xu Xi¹, Wang Tong¹, Jing Boxiang¹

1. School of Computer Science and Engineering, Northern Minzu University; 2. Key Laboratory of Intelligent Processing of Image and Graphics, Northern Minzu University

Click “Read the Original” at the end of this article to view the full paper!

Table of Contents

01  Abstract

02  Chart Appreciation

03  Cite This Article

04  Author Introduction

Abstract

To address the problems of insufficient semantic representation of text features, missing semantic representation of visual features, and the difficulty of fusing text and image features in multimodal named entity recognition (MNER), a series of MNER methods have been proposed. This paper first summarizes the overall framework of MNER methods and the techniques commonly used in each component. The methods are then divided into BiLSTM-based and Transformer-based MNER methods and, according to their architectures, further classified into four model structures: the pre-fusion model, the post-fusion model, the Transformer single-task model, and the Transformer multi-task model. Experiments with both types of methods on the Twitter-2015 and Twitter-2017 datasets show that the collaborative representation of multiple features can enhance the semantics of each modality, and that multi-task learning can promote the fusion of modal features or of prediction results, thereby improving the accuracy of MNER. Accordingly, future MNER research is suggested to focus on enhancing modal semantics through the collaborative representation of multiple features and on promoting feature-level or result-level fusion through multi-task learning.
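To make the framework described in the abstract concrete, below is a minimal PyTorch sketch of a generic Transformer-era MNER model: token features from a pretrained text encoder attend to regional image features, a gate controls how much visual context each token absorbs, and a token-level classifier emits entity labels. The dimensions, the gating mechanism, and the omission of the usual CRF decoding layer are assumptions made purely for illustration; this is not the authors' implementation or any specific surveyed model.

```python
# Illustrative sketch of the generic MNER pipeline (text encoder + visual
# encoder + cross-modal fusion + sequence labeling). NOT the authors' model;
# encoders, dimensions and the fusion/gating choices are assumptions.
import torch
import torch.nn as nn


class CrossModalMNER(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=768,
                 num_heads=8, num_labels=9):
        super().__init__()
        # Project regional image features (e.g. from a CNN backbone) and
        # token embeddings into a shared space.
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Image-aware text representation: tokens attend to image regions.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                                batch_first=True)
        # Gate controlling how much visual context each token absorbs,
        # a common device for suppressing irrelevant images.
        self.gate = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                                  nn.Sigmoid())
        # Token-level classifier; a CRF layer is typically used on top,
        # omitted here to keep the sketch self-contained.
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_feats, image_feats):
        # token_feats: (batch, seq_len, text_dim)  from a text encoder (e.g. BERT)
        # image_feats: (batch, regions, image_dim) from a visual encoder (e.g. ResNet)
        t = self.text_proj(token_feats)
        v = self.image_proj(image_feats)
        visual_ctx, _ = self.cross_attn(query=t, key=v, value=v)
        g = self.gate(torch.cat([t, visual_ctx], dim=-1))
        fused = t + g * visual_ctx          # gated multimodal fusion
        return self.classifier(fused)       # per-token label logits


# Minimal usage with random tensors standing in for real encoder outputs.
if __name__ == "__main__":
    model = CrossModalMNER()
    tokens = torch.randn(2, 16, 768)    # 2 sentences, 16 tokens each
    regions = torch.randn(2, 49, 2048)  # 7x7 regional features per image
    print(model(tokens, regions).shape)  # torch.Size([2, 16, 9])
```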

Chart Appreciation

Figure 1 – Basic framework of multimodal named entity recognition

Figure 2 – Pre-fusion model

Figure 3 – Post-fusion model

Figure 4 – Transformer single-task model

Figure 5 – Transformer multi-task model
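As a rough illustration of the structural difference between the pre-fusion model (Figure 2) and the post-fusion model (Figure 3), the toy sketch below fuses features before a single labeling pass in the first function, and merges per-modality predictions after labeling in the second. All encoders and the labeler are random stand-ins invented for this example, not components of any surveyed model.

```python
# Toy, runnable contrast of feature-level (pre) vs result-level (post) fusion.
# Every function here is a hypothetical placeholder, not code from the paper.
import torch

def encode_text(n_tokens, dim=8):        # stand-in for a BiLSTM/BERT text encoder
    return torch.randn(n_tokens, dim)

def encode_image(n_regions=4, dim=8):    # stand-in for a CNN regional image encoder
    return torch.randn(n_regions, dim)

labeler = torch.nn.Linear(8, 3)          # stand-in sequence labeler over 3 tags

def pre_fusion(n_tokens):
    # Pre-fusion: merge modal features first, then run one labeling pass.
    text = encode_text(n_tokens)
    visual = encode_image().mean(dim=0)            # pooled visual context
    fused = text + visual                          # simplistic feature-level fusion
    return labeler(fused).argmax(dim=-1)

def post_fusion(n_tokens):
    # Post-fusion: label each modality separately, then merge the results.
    text_logits = labeler(encode_text(n_tokens))
    visual_logits = labeler(encode_image().mean(dim=0)).expand(n_tokens, -1)
    return (text_logits + visual_logits).argmax(dim=-1)  # result-level fusion

print(pre_fusion(5))   # e.g. tensor([2, 0, 1, 1, 0]) — random toy output
print(post_fusion(5))
```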


Cite This Article

Wang Hairong, Xu Xi, Wang Tong, et al. Research progress of multimodal named entity recognition [J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(2):60-71.

WANG H R, XU X, WANG T, et al. Research progress of multimodal named entity recognition [J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(2):60-71.

Author Introduction

1. Wang Hairong

PhD, Professor, and Master’s Supervisor; Associate Dean of the School of Computer Science and Engineering, Northern Minzu University.

Main Research and Teaching Experience: Visiting scholar at Northeastern University in 2010; exchange studies at Shanghai University and Fudan University in 2019. Associate Professor at Northern Minzu University since July 2015, mainly responsible for courses such as “Natural Language Processing” and “Software Testing”.

Research Achievements: Published more than 40 academic papers in international journals and conferences and in domestic core journals, led and completed more than 10 research projects, and applied for 8 invention patents. Representative papers: “Multimodal Named Entity Recognition Method with Enhanced Text and Image Semantics”, “Research of Vertical Domain Entity Linking Method Fusing Bert-Binary”, “Research on the Construction Method of Rice Knowledge Graph”, “Knowledge Inference Model of OCR Conversion Error Rules Based on Chinese Character Construction Attributes Knowledge Graph”, “Multi-level Relationship Analysis and Mining Method of Text and Image Data”, etc.

Recruitment Directions and Examination Requirements: Recruiting master’s students in directions such as artificial intelligence and big data processing, and multimodal knowledge mining.

Contact Information: Email: [email protected]

Contact Information for Journal of Zhengzhou University (Engineering Science):

Submission Website: http://gxb.zzu.edu.cn

WeChat Public Account: zdxbgxb

Contact Email: [email protected]

Contact Number: 0371-67781276

Contact Number: 0371-67781277

For more exciting content, please follow us


Copyright Statement:

This article is original content from the editorial office of the Journal of Zhengzhou University (Engineering Science). Reprints are welcome!

Click “Read the Original” below to access the journal content.
