With the development of artificial intelligence research, knowledge graphs (KGs) have attracted wide attention from both academia and industry. As a representation of semantic relations between entities, knowledge graphs play an important role in natural language processing (NLP) and have been rapidly and widely adopted in recent years. Given the growing volume of research in this area, several KG-related approaches have already been surveyed within the NLP research community. However, a comprehensive study that classifies established topics and reviews the maturity of individual research streams is still missing. To bridge this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey covers multiple aspects: tasks, research types, and contributions. On this basis, we present a structured overview of the research landscape, provide a classification of tasks, summarize our findings, and highlight directions for future work.
https://www.zhuanzhi.ai/paper/d3a164b388877b723eec8789fd081c3d
The acquisition and application of knowledge are inherent characteristics of natural language. Humans use language as a means to communicate facts, argue about decisions, or question beliefs. Therefore, computational linguists began studying how to represent knowledge as relationships between concepts in semantic networks as early as the 1950s and 1960s (Richens, 1956; Quillian, 1963; Collins and Quillian, 1969). More recently, knowledge graphs (KGs) have become a method for semantically representing knowledge about real-world entities in a machine-readable format. They originated from research on semantic networks, domain-specific ontologies, and linked data, so they are not a completely new concept (Hitzler, 2021). Despite the growing popularity of KGs, there is still no universal understanding of what KGs are and what tasks they are suitable for. Although previous works have attempted to define KGs (Pujara et al., 2013; Ehrlinger and Wöß, 2016; Paulheim, 2017; Färber et al., 2018), the term has not been used uniformly by researchers. Most studies implicitly adopt a broad definition of KGs, understanding them as “data graphs aimed at accumulating and conveying knowledge about the real world, where nodes represent entities of interest and edges represent relationships between these entities” (Hogan et al., 2022).
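To make the quoted definition concrete, the following minimal sketch (illustrative only; the entities, relations, and helper function are assumptions, not taken from the survey) represents a tiny KG as a set of subject-predicate-object triples and reads the edges attached to one node:

    # A toy knowledge graph as subject-predicate-object triples.
    # Entity and relation names are illustrative, not drawn from the survey.
    triples = {
        ("Berlin", "capital_of", "Germany"),
        ("Germany", "member_of", "European Union"),
        ("Berlin", "located_in", "Germany"),
    }

    def neighbors(entity):
        """Return the (relation, object) edges leaving a given entity node."""
        return [(r, o) for (s, r, o) in triples if s == entity]

    print(neighbors("Berlin"))
    # e.g. [('capital_of', 'Germany'), ('located_in', 'Germany')] (set order may vary)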
Since the introduction of Google’s KG in 2012 (Singhal, 2012), KGs have attracted significant research attention in both academia and industry. Particularly in NLP research, the use of KGs has become increasingly popular over the past five years, and this trend seems to be accelerating. The fundamental paradigm is that combining structured and unstructured knowledge can benefit various NLP tasks. For instance, structured knowledge from knowledge graphs can be injected into the contextual knowledge captured by language models, improving performance on downstream tasks (Colon-Hernandez et al., 2021). Moreover, as the importance of knowledge graphs continues to grow, efforts to construct new knowledge graphs from unstructured text are also expanding.
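As a purely illustrative example of this paradigm (a generic sketch under assumed names, not the specific method of any paper covered by the survey), retrieved KG triples can be verbalized into text and prepended to a language model's input as additional context:

    # Generic knowledge-injection sketch: verbalize retrieved triples and
    # prepend them to the model input. The retrieval result is assumed/stubbed.
    def verbalize(triples):
        return " ".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples)

    retrieved = [("Berlin", "capital_of", "Germany")]  # assumed retrieval output
    question = "Which country is Berlin the capital of?"
    lm_input = verbalize(retrieved) + " " + question
    print(lm_input)
    # Berlin capital of Germany. Which country is Berlin the capital of?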
Google coined the term knowledge graph in 2012, and in the decade since, researchers have proposed numerous novel methods. It is therefore important to gather insights, consolidate existing results, and provide a structured overview. To our knowledge, however, no study has yet comprehensively mapped the entire research landscape of knowledge graphs in NLP. To bridge this gap, we conducted a comprehensive survey that classifies established topics, identifies trends, and outlines areas for future research based on an analysis of the studies published in this field. Our three main contributions are as follows:
Overview of the Research Landscape (RQ1)
Overview of KG Tasks (RQ2)
Based on the tasks identified in the literature on KGs in NLP, we developed an empirical classification, shown in Figure 1. Its two top-level categories are knowledge acquisition and knowledge application. Knowledge acquisition covers NLP tasks that either construct knowledge graphs from unstructured text (knowledge graph construction) or reason over already constructed knowledge graphs (knowledge graph reasoning). Knowledge graph construction tasks are further divided into two subclasses: knowledge extraction, which populates knowledge graphs with entities, relations, or attributes, and knowledge integration, which updates existing knowledge graphs. Knowledge application, the second top-level category, comprises common NLP tasks that are enhanced with structured knowledge from knowledge graphs.
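Purely as an illustration of the structure just described (not an artifact released with the survey), the taxonomy can be written down as a nested dictionary:

    # Nested encoding of the task taxonomy; descriptions paraphrase the text above.
    taxonomy = {
        "knowledge acquisition": {
            "knowledge graph construction": {
                "knowledge extraction": "populate KGs with entities, relations, or attributes",
                "knowledge integration": "update existing KGs",
            },
            "knowledge graph reasoning": "reason over already constructed KGs",
        },
        "knowledge application": "NLP tasks enhanced with structured knowledge from KGs",
    }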
As expected, the frequency of tasks in our classification varies greatly in the literature. Table 2 summarizes the most popular tasks, and Figure 5 compares their popularity over time. Figure 4 shows the number of domains detected for the most prominent tasks. It indicates that certain tasks are more suitable for domain-specific contexts than others.
Research Types and Contributions (RQ3)
Table 3 shows the distribution of papers across the research and contribution types defined in Appendix Tables 4 and 5. It indicates that most papers conduct validation research, investigating novel techniques that have not yet been implemented in practice. A considerable, though noticeably smaller, number of papers propose solutions and demonstrate their merits and applicability with a small example or argument; however, these papers often lack an in-depth empirical evaluation.


