What Exactly Is a Knowledge Graph?

Why Do We Need Knowledge Graphs?

What Is a Knowledge Graph?

The order of the title may be somewhat unfamiliar to some readers. Typically, when introducing an unfamiliar concept, we provide its definition first. The reason for changing the order is to avoid presenting readers with a cold and rigid conceptual description at the outset (I will try to express it with more specific and accurate examples later). On the other hand, I aim to introduce the concept of knowledge graphs naturally through real-life examples. I hope this method deepens readers’ impressions and understanding. To reduce the burden of understanding, I will avoid introducing too many concepts and technical details, reserving them for future articles.

Back to the main topic, this article is mainly divided into three parts. The first part introduces why we need knowledge graphs, the second part discusses the relevant concepts and their formal representations. Finally, a simple summary will be made, and the topics of subsequent articles in this column will be introduced.

Seeing More Than Just Strings

What comes to your mind when you see the following string of text?

Ronaldo Luís Nazário de Lima

Most Chinese people probably do not understand what the above text represents. No worries, let’s look at its corresponding Chinese:

罗纳尔多·路易斯·纳萨里奥·德·利马

Now most people know this is a person’s name, and of course, it’s a foreigner. However, some people may still not know who this person is specifically. Here’s a picture of him:

From this image, we gain additional information: he is a football player. Those unfamiliar with football might still not have any impression of him. Now let’s look at the following image:

I’ll also add that catchy advertisement slogan: “To protect your throat, use Jin Sangzi lozenges. Guangxi Jin Sangzi!” Now, many people should know who he is, as this catchy slogan has tortured us for a long time years ago.

The reason for this example is that computers have always faced the dilemma of being unable to obtain semantic information from web text. Despite significant advancements in artificial intelligence in recent years, surpassing human performance in certain tasks, we are still far from the goal of machines possessing the intelligence of a two or three-year-old child. A large part of this gap is due to the lack of knowledge in machines. Just like the example above, the machine’s reaction to text is no different from ours when we see Ronaldo’s original Portuguese name. To enable machines to understand the meaning behind the text, we need to model describable entities, fill in their attributes, and expand their connections with other entities, i.e., construct the machine’s prior knowledge. Using Ronaldo as an example, when we expand around this entity, we can obtain the following knowledge graph.

With such prior knowledge, when the machine sees Ronaldo Luís Nazário de Lima again, it will “think”: “This is a Brazilian football player named Ronaldo Luís Nazário de Lima.” This is very similar to how we humans make associations and inferences when we see familiar things.

Notice: It should be noted that the knowledge graph above does not represent the actual organizational form of a knowledge graph. On the contrary, it may lead readers to misunderstand the knowledge graph. In the next part, I will provide a more formal representation of the content contained in this diagram within the knowledge graph. In fact, I have seen many articles introducing knowledge graphs that like to present this type of diagram without providing corresponding explanations, which may mislead readers from the start.

In order to improve the quality of the answers returned by the search engine and the efficiency of user queries, Google released the Knowledge Graph on May 16, 2012. With the knowledge graph as an aid, search engines can gain insight into the semantic information behind user queries, returning more accurate, structured information, thereby better meeting users’ query needs. Google’s knowledge graph slogan “things not strings” encapsulates the essence of the knowledge graph, which is, do not return meaningless strings, but obtain the objects or things implied behind the strings.

Still using Ronaldo as an example, if we want to know relevant information about Ronaldo (in many cases, the user’s search intent may also be vague, here we input the query as “Ronaldo”), in the previous versions, we could only get relevant web pages containing this string as the return result, and then had to enter certain web pages to find the information we were interested in; now, in addition to relevant web pages, the search engine will also return a “knowledge card,” containing basic information about the query object and its related other objects (Cristiano Ronaldo’s name is also Ronaldo, the search engine just returns “Fat Ronaldo” basic information based on the referential probability of “Ronaldo,” but perhaps you need information about Cristiano Ronaldo, so the search engine lists this entity as an alternative), as shown in the content of the red box in the following image. If we just want to know Ronaldo’s nationality, age, marital status, and children’s information, we do not need to perform any additional operations. In the shortest time, we obtain the most concise and accurate information.

Of course, this is just one application scenario of knowledge graphs in search engines. I give this example to illustrate that the concept or technology of knowledge graphs is in line with the trends of computer science and internet development.

The Past and Present of Knowledge Graphs

Through the above example, readers should have a preliminary impression of knowledge graphs, which essentially represent knowledge. In fact, the concept of knowledge graphs is not new; the underlying idea can be traced back to a form of knowledge representation proposed in the 1950s and 1960s—Semantic Networks. Semantic networks consist of interconnected nodes and edges, where nodes represent concepts or objects and edges represent the relationships between them (e.g., is-a relationships, such as: a cat is a mammal; part-of relationships, such as: a spine is part of a mammal), as shown in the following diagram. In terms of representation, semantic networks are similar to knowledge graphs, but semantic networks focus more on describing the relationships between concepts (similar to the biological hierarchical classification system—kingdom, phylum, class, order, family, genus, species), while knowledge graphs focus more on describing the relationships between entities.

In addition to semantic networks, branches of artificial intelligence—expert systems, the semantic web proposed by Tim Berners Lee in 1998, and linked data proposed in 2006 are all intricately related to knowledge graphs and can be considered as precursors to knowledge graphs.

Currently, there is no standard definition of knowledge graphs (gold standard definition). Here, I will borrow the definition of knowledge graphs from the book “Exploiting Linked Data and Knowledge Graphs in Large Organisations”:

A knowledge graph consists of a set of interconnected typed entities and their attributes.

That is, knowledge graphs are composed of interconnected entities and their attributes. In other words, knowledge graphs consist of pieces of knowledge, each represented as an SPO triple (Subject-Predicate-Object).

In knowledge graphs, we use RDF to formally represent this triple relationship. RDF (Resource Description Framework) is a standard data model developed by W3C for describing entities/resources. There are three types in RDF graphs: International Resource Identifiers (IRIs), blank nodes, and literals. Below are the type constraints for each part of the SPO:

Subject can be an IRI or blank node.
Predicate is an IRI.
Object can be any of the three types.

IRI—can be seen as a generalization and promotion of URI or URL, it uniquely defines an entity/resource in the entire network or graph, similar to our ID number.

A literal is a literal value that can be seen as plain text with a data type, for example, the original name of Ronaldo mentioned in the first part can be expressed as “Ronaldo Luís Nazário de Lima”^^xsd:string.

A blank node is simply a resource without an IRI and literal, or an anonymous resource. Regarding its function, interested readers can refer to W3C documentation, and I will not elaborate further. Personally, I believe the existence of blank nodes is somewhat redundant, as it not only adds extra difficulty in understanding RDF but also introduces some issues during processing. I usually prefer to use nodes with IRIs to act as blank nodes to perform their functions, somewhat similar to the concept of compound value types (CVT) in Freebase. The final reference materials will provide a blog on the flaws of blank nodes, which interested readers can check out.

So, the triple “Ronaldo’s Chinese name is 罗纳尔多·路易斯·纳萨里奥·德·利马” can be represented in RDF as:

“www.kg.com/person/1″ is an IRI, used to uniquely represent the entity “Ronaldo.” “kg:chineseName” is also an IRI, used to represent the property “Chinese name.” “kg:” is the prefix defined in the RDF file, as shown below:

@prefix kg: <http://www.kg.com/ontology/>

That is, kg:chineseName is actually a shorthand for “http://www.kg.com/ontology/chineseName.”

Let’s illustrate the above knowledge graph in a more formal manner:

We can actually think of knowledge graphs as containing two types of nodes: resources and literals. Borrowing the concept of trees from data structures, literals are similar to leaf nodes, with an out-degree of 0. Now readers should understand why I said the previous diagram was inaccurate and could mislead everyone’s understanding of knowledge graphs. The phrase “Ronaldo Luís Nazário de Lima” as a literal cannot have edges pointing to external nodes, and moreover, the previous diagram does not intuitively reflect the extremely important concept of resources/entities (represented by IRIs) in knowledge graphs.

Summary

This article introduces the practical need for knowledge graphs through the example of Ronaldo, then provides the definition and relevant concepts of knowledge graphs, and discusses the RDF formal representation of knowledge graphs. As a popular science article, many technical details are omitted. In subsequent articles, I will introduce the specific technologies needed in the process of implementing knowledge graphs based on the Semantic Web Stack (as shown in the following image). Additionally, I may combine practice to explain how to build a knowledge graph using data from relational databases and set up a simple knowledge graph-based question-and-answer system (KBQA).

W3C: RDF 1.1 Concepts and Abstract Syntax
Exploiting Linked Data and Knowledge Graphs in Large Organisations
Google: Introducing the Knowledge Graph: things, not strings
Blog: Problems of the RDF model: Blank Nodes
Blog: Compound Value Types in RDF
Chen, L., Zhang, H., Chen, Y., & Guo, W. (2012). Blank nodes in rdf. Journal of Software, 7(9).

Note: The source of this article is online, and the copyright belongs to the original author.

Leave a Comment Cancel reply