With the development of artificial intelligence research, knowledge graphs (KGs) have attracted wide attention from both academia and industry. As a representation of semantic relations between entities, knowledge graphs play an important role in natural language processing (NLP) and have been rapidly and widely adopted in recent years. Given the growing volume of research in this area, several KG-related approaches have already been surveyed within the NLP research community. However, a comprehensive study that classifies established topics and reviews the maturity of individual research streams is still missing. To bridge this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey covers multiple aspects: tasks, research types, and contributions. On this basis, we present a structured overview of the research landscape, provide a classification of tasks, summarize our findings, and highlight directions for future work.
https://www.zhuanzhi.ai/paper/d3a164b388877b723eec8789fd081c3d
The acquisition and application of knowledge are inherent characteristics of natural language. Humans use language as a means to communicate facts, argue about decisions, or question beliefs. Therefore, computational linguists began studying how to represent knowledge as relationships between concepts in semantic networks as early as the 1950s and 1960s (Richens, 1956; Quillian, 1963; Collins and Quillian, 1969). More recently, knowledge graphs (KGs) have become a method for semantically representing knowledge about real-world entities in a machine-readable format. They originated from research on semantic networks, domain-specific ontologies, and linked data, so they are not a completely new concept (Hitzler, 2021). Despite the growing popularity of KGs, there is still no universal understanding of what KGs are and what tasks they are suitable for. Although previous works have attempted to define KGs (Pujara et al., 2013; Ehrlinger and Wöß, 2016; Paulheim, 2017; Färber et al., 2018), the term has not been used uniformly by researchers. Most studies implicitly adopt a broad definition of KGs, understanding them as “data graphs aimed at accumulating and conveying knowledge about the real world, where nodes represent entities of interest and edges represent relationships between these entities” (Hogan et al., 2022).
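To make the quoted definition concrete, the following minimal sketch (illustrative only; the entities, relations, and helper function are assumptions, not taken from the survey) represents a tiny KG as a set of subject-predicate-object triples and reads the edges attached to one node:

    # A toy knowledge graph as subject-predicate-object triples.
    # Entity and relation names are illustrative, not drawn from the survey.
    triples = {
        ("Berlin", "capital_of", "Germany"),
        ("Germany", "member_of", "European Union"),
        ("Berlin", "located_in", "Germany"),
    }

    def neighbors(entity):
        """Return the (relation, object) edges leaving a given entity node."""
        return [(r, o) for (s, r, o) in triples if s == entity]

    print(neighbors("Berlin"))
    # e.g. [('capital_of', 'Germany'), ('located_in', 'Germany')] (set order may vary)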
Since the introduction of Google’s KG in 2012 (Singhal, 2012), KGs have attracted significant research attention in both academia and industry. Particularly in NLP research, the use of KGs has become increasingly popular over the past five years, and this trend seems to be accelerating. The fundamental paradigm is that combining structured and unstructured knowledge can benefit various NLP tasks. For instance, structured knowledge from knowledge graphs can be injected into the contextual knowledge captured by language models, improving performance on downstream tasks (Colon-Hernandez et al., 2021). Moreover, as the importance of knowledge graphs continues to grow, efforts to construct new knowledge graphs from unstructured text are also expanding.
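As a purely illustrative example of this paradigm (a generic sketch under assumed names, not the specific method of any paper covered by the survey), retrieved KG triples can be verbalized into text and prepended to a language model's input as additional context:

    # Generic knowledge-injection sketch: verbalize retrieved triples and
    # prepend them to the model input. The retrieval result is assumed/stubbed.
    def verbalize(triples):
        return " ".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples)

    retrieved = [("Berlin", "capital_of", "Germany")]  # assumed retrieval output
    question = "Which country is Berlin the capital of?"
    lm_input = verbalize(retrieved) + " " + question
    print(lm_input)
    # Berlin capital of Germany. Which country is Berlin the capital of?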
Google coined the term knowledge graph in 2012, and in the decade since, researchers have proposed numerous novel methods. It is therefore important to gather insights, consolidate existing results, and provide a structured overview. To our knowledge, however, no study has yet comprehensively mapped the entire research landscape of knowledge graphs in NLP. To bridge this gap, we conducted a comprehensive survey that classifies established topics, identifies trends, and outlines areas for future research based on an analysis of the studies published in this field. Our three main contributions are as follows:
Overview of the Research Landscape (RQ1)
Overview of KG Tasks (RQ2)
Based on the tasks identified in the literature on KGs in NLP, we developed an empirical classification, shown in Figure 1. Its two top-level categories are knowledge acquisition and knowledge application. Knowledge acquisition covers NLP tasks that either construct knowledge graphs from unstructured text (knowledge graph construction) or reason over already constructed knowledge graphs (knowledge graph reasoning). Knowledge graph construction tasks are further divided into two subclasses: knowledge extraction, which populates knowledge graphs with entities, relations, or attributes, and knowledge integration, which updates existing knowledge graphs. Knowledge application, the second top-level category, comprises common NLP tasks that are enhanced with structured knowledge from knowledge graphs.
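Purely as an illustration of the structure just described (not an artifact released with the survey), the taxonomy can be written down as a nested dictionary:

    # Nested encoding of the task taxonomy; descriptions paraphrase the text above.
    taxonomy = {
        "knowledge acquisition": {
            "knowledge graph construction": {
                "knowledge extraction": "populate KGs with entities, relations, or attributes",
                "knowledge integration": "update existing KGs",
            },
            "knowledge graph reasoning": "reason over already constructed KGs",
        },
        "knowledge application": "NLP tasks enhanced with structured knowledge from KGs",
    }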
As expected, the frequency of tasks in our classification varies greatly in the literature. Table 2 summarizes the most popular tasks, and Figure 5 compares their popularity over time. Figure 4 shows the number of domains detected for the most prominent tasks. It indicates that certain tasks are more suitable for domain-specific contexts than others.
Research Types and Contributions (RQ3)
Table 3 shows the distribution of papers across the research and contribution types defined in Appendix Tables 4 and 5. It indicates that most papers conduct validation research, investigating novel techniques that have not yet been implemented in practice. A considerable, though noticeably smaller, number of papers propose solutions and demonstrate their merits and applicability with a small example or argument; however, these papers often lack an in-depth empirical evaluation.


