GNN Emerges as a Powerful Tool for Causal Reasoning in Deep Learning

Source: New Intelligence

This article contains 4,263 words; suggested reading time: 8 minutes.

This article discusses a recent paper by 27 authors from DeepMind, Google Brain, MIT, and other institutions, which proposes the "Graph Network" (GNN), combining end-to-end learning with inductive reasoning in an attempt to solve the relational reasoning problem that deep learning alone cannot address.

As an industry bellwether, DeepMind's every move is a hot topic in the AI field. Recently, this world-leading AI laboratory seems to have shifted its focus to exploring "relations", publishing several relation-themed papers since June, such as:

  • Relational Inductive Bias for Physical Construction in Humans and Machines

  • Relational Deep Reinforcement Learning

  • Relational Recurrent Neural Networks

There are many papers, but the one most worth reading is this one: "Relational Inductive Biases, Deep Learning, and Graph Networks."

The paper is co-written by 27 authors from DeepMind, Google Brain, MIT, and the University of Edinburgh (22 of them from DeepMind), and comprehensively elaborates on relational inductive biases and graph networks across its 37 pages.

DeepMind research scientist Oriol Vinyals, himself one of the authors, took the rare step of promoting the work on Twitter, calling the review "pretty comprehensive."

Many well-known AI scholars have also commented on the paper.

Denny Britz, who interned at Google Brain and works on deep reinforcement learning research, said he was happy to see someone combining graphs with first-order logic and probabilistic reasoning, and suggested the field may see a revival.

Chris Gray, founder of the chip company Graphcore, commented that if this direction continues to develop and truly achieves results, it will create a far more promising foundation for AI than today's deep learning alone.

Seth Stafford, a PhD in mathematics from Cornell University and a postdoc at MIT, believes that Graph Neural Networks (GNNs) may address the core problem of causal reasoning that Turing Award winner Judea Pearl pointed out deep learning cannot solve.

Paving a More Promising Direction Than Deep Learning Alone

So, what is this paper about? DeepMind's viewpoint and key arguments are clearly stated in this paragraph:

This is both an opinion piece and a review, as well as a unification. We argue that if AI is to achieve human-like abilities, combinatorial generalization must be a top priority, and structured representations and computations are key to realizing this goal.

Just as innate and acquired factors work together in biology, we believe that “hand-engineering” and “end-to-end” learning are not mutually exclusive; we advocate for combining the advantages of both and benefiting from their complementary strengths.

In the paper, the authors explore how relational inductive biases are used within deep learning architectures (such as fully connected layers, convolutional layers, and recurrent layers) to facilitate learning about entities, relations, and the rules for composing them.

They propose a new building block for AI: the Graph Network (GNN), which generalizes and extends various neural network methods that operate on graphs. GNNs have strong relational inductive biases and provide a direct interface for manipulating structured knowledge and producing structured behavior.

The authors also discuss how GNNs support relational reasoning and combinatorial generalization, laying the foundation for more complex, interpretable, and flexible reasoning patterns.

Turing Award Winner Judea Pearl: Deep Learning Cannot Do Causal Reasoning

In early 2018, on the heels of the "deep learning is alchemy" debate at NIPS 2017, deep learning met with a heavyweight critic.

Turing Award winner and father of Bayesian networks Judea Pearl published the paper "Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution", discussing the current theoretical limitations of machine learning and presenting seven inspirations from causal reasoning. Pearl pointed out that current machine learning systems operate almost entirely in a statistical, or model-blind, mode, which cannot serve as a foundation for strong AI. He believes the breakthrough lies in the "causal revolution," which, drawing on structural causal models, can uniquely support automated reasoning.

In a recent interview, Pearl bluntly stated that current deep learning is merely "curve fitting." "This sounds like blasphemy… but from a mathematical perspective, no matter how skillfully you manipulate the data, and no matter how much you read into it, what you are doing is still just fitting a curve."

DeepMind’s Proposal: Merging Traditional Bayesian Causal Networks and Knowledge Graphs with Deep Reinforcement Learning

How can this problem be solved? DeepMind believes the answer starts with "Graph Networks".

Dr. Deng Kan, a CMU PhD and the founder of Dazhong Yida, explained the research background of DeepMind's paper to us.

Dr. Deng noted that the machine learning community has three main schools:

  • Symbolicism

  • Connectionism

  • Actionism

Symbolicism emphasizes knowledge representation and logical reasoning. After decades of research, the main achievements of this school include Bayesian causal networks and knowledge graphs.

The flag bearer of Bayesian causal networks is Professor Judea Pearl, winner of the 2011 Turing Award. Yet it is said that when he spoke at NIPS 2017, the audience was sparse. In 2018 he published a new book, "The Book of Why," defending causal networks and criticizing deep learning for lacking a rigorous logical reasoning process. Knowledge graphs, meanwhile, are promoted mainly by search engine companies, including Google, Microsoft, and Baidu, which aim to advance search from keyword matching to semantic matching.

Connectionism has its origins in biomimicry: using mathematical models to imitate neurons. Professor Marvin Minsky received the Turing Award in 1969 for his contributions to artificial intelligence, including early work on neural networks. Assembling large numbers of artificial neurons yields deep learning models, whose flag bearer is Geoffrey Hinton. The most criticized flaw of deep learning models is their lack of interpretability.

Actionism introduces cybernetics into machine learning; its most notable achievement is reinforcement learning, whose leading figure is Professor Richard Sutton. In recent years, Google DeepMind researchers have merged traditional reinforcement learning with deep learning to produce AlphaGo, which defeated the world's top human Go players.

The DeepMind paper published two days ago proposes merging traditional Bayesian causal networks and knowledge graphs with deep reinforcement learning, and surveys research progress related to this theme.

What Exactly Is DeepMind’s Proposed “Graph Network”?

Here it is worth introducing the much-discussed "Graph Network" in more detail. Of course, you can also skip this section and go directly to the interpretation that follows.

In the paper "Relational Inductive Biases, Deep Learning, and Graph Networks", the authors explain their "Graph Networks" in detail. The GNN framework defines a class of functions for relational reasoning over graph-structured representations. It generalizes and extends various graph neural network, MPNN, and NLNN approaches, and supports constructing complex architectures from simple building blocks.

The main unit of computation in the GNN framework is the GNN block, a "graph-to-graph" module that takes a graph as input, performs computations over the structure, and returns a graph as output. As the paper's Box 3 describes, entities are represented by the graph's nodes, relations by its edges, and system-level properties by global attributes.

The paper's authors use "graph" to mean a directed, attributed multi-graph with a global attribute. A node is denoted $v_i$, an edge $e_k$, and the global attribute $u$; $s_k$ and $r_k$ denote the indices of the sender and receiver nodes of edge $k$, respectively. Specifically:

  • Directed: an edge is unidirectional, going from a "sender" node to a "receiver" node.

  • Attribute: a property that can be encoded as a vector, a set, or even another graph.

  • Attributed: edges and vertices have attributes associated with them.

  • Global attribute: a graph-level attribute.

  • Multi-graph: there can be multiple edges between the same pair of vertices, including self-edges.

The GNN framework's block organization emphasizes customizability and supports constructing new architectures that express the desired relational inductive biases.

To describe GNN more concretely, consider predicting the movements of a set of rubber balls in an arbitrary gravitational field: instead of colliding with one another, the balls are each connected to some (or all) of the others by one or more springs. We will refer to this running example in the definitions below to illustrate the graph representation and the computations that operate on it.

Definition of “graph”

In the GNN framework, a graph is defined as a 3-tuple $G = (u, V, E)$.

u represents a global attribute; for example, u may represent a gravitational field.

$V = \{v_i\}_{i=1:N^v}$ is the set of nodes (with cardinality $N^v$), where each $v_i$ represents the attributes of a node. For example, $V$ might represent each ball, with attributes for position, velocity, and mass.

$E = \{(e_k, r_k, s_k)\}_{k=1:N^e}$ is the set of edges (with cardinality $N^e$), where each $e_k$ represents the attributes of an edge, $r_k$ is the index of the receiver node, and $s_k$ is the index of the sender node. For example, $E$ might represent the springs between the balls, with their spring constants as attributes.
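To make the 3-tuple concrete, here is a minimal Python sketch of the balls-and-springs graph. The array layouts and values are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

# Global attribute u: a uniform gravitational field (an acceleration vector).
u = np.array([0.0, 0.0, -9.8])

# Nodes V: one row per ball; attributes = position (3), velocity (3), mass (1).
V = np.array([
    [0.0, 0.0, 1.0,   0.0, 0.0, 0.0,   1.0],   # ball 0
    [1.0, 0.0, 1.0,   0.0, 0.0, 0.0,   2.0],   # ball 1
    [0.5, 1.0, 1.0,   0.0, 0.0, 0.0,   1.5],   # ball 2
])

# Edges E: one entry per spring, stored as (attribute e_k, receiver r_k, sender s_k).
# Here the edge attribute is just the spring constant.
E = [
    (np.array([50.0]), 1, 0),   # spring connecting ball 0 -> ball 1
    (np.array([30.0]), 2, 1),   # spring connecting ball 1 -> ball 2
]

G = (u, V, E)   # the graph 3-tuple
```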

Algorithm 1: Steps for a Complete GNN Block Calculation

When a graph $G$ is provided as input, computation proceeds from the edges to the nodes and then to the global attribute:

1. $\phi^e$ is applied once per edge to compute updated edge attributes $e'_k$.

2. $\rho^{e \to v}$ aggregates the updated edges incident on each node $i$ into $\bar{e}'_i$.

3. $\phi^v$ is applied once per node to compute updated node attributes $v'_i$.

4. $\rho^{e \to u}$ aggregates all updated edges into $\bar{e}'$.

5. $\rho^{v \to u}$ aggregates all updated nodes into $\bar{v}'$.

6. $\phi^u$ is applied once to compute the updated global attribute $u'$.

The internal structure of the GNN block

A GNN block contains three "update" functions, $\phi^e$, $\phi^v$, and $\phi^u$, and three "aggregation" functions, $\rho^{e \to v}$, $\rho^{e \to u}$, and $\rho^{v \to u}$:

$$e'_k = \phi^e(e_k, v_{r_k}, v_{s_k}, u) \qquad v'_i = \phi^v(\bar{e}'_i, v_i, u) \qquad u' = \phi^u(\bar{e}', \bar{v}', u)$$

Where:

$$\bar{e}'_i = \rho^{e \to v}(E'_i) \qquad \bar{e}' = \rho^{e \to u}(E') \qquad \bar{v}' = \rho^{v \to u}(V')$$

with $E'_i = \{(e'_k, r_k, s_k)\}_{r_k = i,\; k=1:N^e}$, $E' = \bigcup_i E'_i$, and $V' = \{v'_i\}_{i=1:N^v}$.

Figure: Updates in a GNN block. Blue indicates the element being updated; black indicates the other elements involved in the update.
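Below is a minimal NumPy sketch of a single GNN block following the equations above. The $\phi$ functions are passed in as plain callables (in the paper they would typically be learned neural networks), and the $\rho$ aggregations are element-wise sums, one of the choices the paper allows; the data layout matches the balls-and-springs sketch earlier and is otherwise an assumption.

```python
import numpy as np

def gn_block(u, V, E, phi_e, phi_v, phi_u):
    """One GNN block: graph (u, V, E) in, graph (u', V', E') out."""
    # 1. Update each edge: e'_k = phi^e(e_k, v_{r_k}, v_{s_k}, u)
    E_new = [(phi_e(e, V[r], V[s], u), r, s) for (e, r, s) in E]

    # 2./3. Aggregate incident edges per node, then update each node.
    # (For simplicity, this assumes every node has at least one incoming edge.)
    V_new = []
    for i, v in enumerate(V):
        incoming = [e for (e, r, _) in E_new if r == i]   # E'_i
        e_bar_i = np.sum(incoming, axis=0)                # rho^{e->v}: a sum
        V_new.append(phi_v(e_bar_i, v, u))                # v'_i = phi^v(...)

    # 4./5. Aggregate all edges and all nodes for the global update.
    e_bar = np.sum([e for (e, _, _) in E_new], axis=0)    # rho^{e->u}
    v_bar = np.sum(V_new, axis=0)                         # rho^{v->u}

    # 6. Update the global attribute: u' = phi^u(e_bar', v_bar', u)
    u_new = phi_u(e_bar, v_bar, u)
    return u_new, V_new, E_new
```

With the balls-and-springs graph, phi_e could compute the force each spring exerts, phi_v could integrate each ball's motion under that force plus gravity, and phi_u could read out a global quantity such as total energy.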

The Challenges of Combining Knowledge Graphs and Deep Learning

To combine knowledge graphs with deep learning, Dr. Deng Kan identified several major challenges:

1. Node Vectors:

Knowledge graphs are composed of nodes and edges. Nodes represent entities, which have attributes and values. In a traditional knowledge graph, entities are usually conceptual symbols, such as words in a natural language.

Traditional edges in knowledge graphs connect two single nodes, i.e., two entities, and express the relationship between them; the strength of a relationship is expressed by a weight, typically a constant.

To integrate traditional knowledge graphs with deep learning, the first step is to make the nodes differentiable. Replacing natural-language words with numeric word vectors is an effective way to achieve this, and the common approach is to use a language model to analyze large amounts of text and find the word vector best matching each context. In graphs, however, traditional word-vector algorithms do not work well and need modification, as sketched below.
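One widely used modification, not named in the text above, is the DeepWalk/node2vec family: generate random walks over the graph and feed them to an ordinary word-vector trainer as if each walk were a sentence. The sketch below uses gensim's Word2Vec for the skip-gram step; the toy graph, walk lengths, and dimensions are all illustrative assumptions.

```python
import random
from gensim.models import Word2Vec  # assumes gensim is installed

def random_walks(adj, num_walks=10, walk_len=8):
    """Generate random walks; each walk becomes a 'sentence' of node tokens."""
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk, node = [str(start)], start
            for _ in range(walk_len - 1):
                node = random.choice(adj[node])   # step to a random neighbor
                walk.append(str(node))
            walks.append(walk)
    return walks

# A toy undirected triangle graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Train skip-gram vectors on the walks; the result is one numeric vector per node.
model = Word2Vec(random_walks(adj), vector_size=16, window=3, min_count=1, sg=1)
node_vector = model.wv["0"]   # a differentiable numeric representation of node 0
```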

2. Hypernodes:

As mentioned earlier, a traditional knowledge-graph edge connects two single nodes and expresses the relationship between them. This assumption constrains the expressive power of graphs, because in many scenarios several single nodes combine to relate to other single nodes or to other combinations. We call such a combination of single nodes a hypernode.

The question is which single nodes should combine to form a hypernode. Specifying this from human priors is one method; automatically learning the composition of hypernodes from large amounts of training data, for example via dropout or regularization, is another viable approach.

3. Hyperedges:

Traditional edges in knowledge graphs express relationships between nodes, with strength expressed by weights that are typically constants. In many scenarios, however, the weights are not constant: as the values of the nodes change, the weights of the edges change as well, possibly non-linearly.

An edge expressed by a non-linear function is called a hyperedge. Deep learning models can be used to simulate non-linear functions, so each edge in the knowledge graph can be viewed as a deep learning model whose input is a hypernode composed of several single nodes and whose output is another hypernode. If each deep learning model is viewed as a tree, with the input at the root and the output at the leaves, then the entire knowledge graph can be seen as a forest of deep learning models. A toy sketch of this "edge as model" view follows.
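The sketch replaces a constant edge weight with a two-layer MLP that maps an input hypernode (the concatenation of several node vectors) to an output hypernode; the shapes and architecture are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class HyperEdge:
    """An edge modeled as a small two-layer MLP instead of a constant weight."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))

    def __call__(self, node_vectors):
        x = np.concatenate(node_vectors)   # input hypernode: joined node vectors
        h = np.tanh(x @ self.W1)           # the edge's non-linear "weight"
        return h @ self.W2                 # output hypernode vector

# Two 16-dim node vectors combine into one input hypernode; the edge maps
# them to a 16-dim output hypernode.
edge = HyperEdge(in_dim=2 * 16, hidden_dim=32, out_dim=16)
out_hypernode = edge([rng.normal(size=16), rng.normal(size=16)])
```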

4. Paths:

Training a knowledge graph, including its node vectors, hypernodes, and hyperedges, often relies on paths walked through the graph. By fitting a vast number of paths, we obtain the most accurate node vectors, hypernodes, and hyperedges.

Training graphs by fitting paths presents a challenge: the disconnect between the training process and the final evaluation. Suppose, for example, that you are given the outlines of several articles, along with corresponding model essays, and asked to learn how to write. The fitting process emphasizes word-for-word imitation, but the quality of an article is judged not by how strictly it follows every word of the model but by its overall coherence.

How can this disconnect between training and final evaluation be resolved? A promising approach is reinforcement learning. The essence of reinforcement learning is to assign an evaluation to each intermediate state along the path by backtracking from the final evaluation and discounting.

Reinforcement learning, however, runs into difficulty when the number of intermediate states grows too large: with too many states, training fails to converge. One solution to this convergence problem is to use a deep learning model to estimate the values of states. In other words, it is not necessary to estimate the value of every state individually; it suffices to train the limited parameters of a single model, as in the sketch below.
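A minimal sketch of that idea, using temporal-difference learning with a linear value model in place of a table of per-state values; the state features, rewards, and hyperparameters are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(8)   # one small parameter vector shared by all states

def value(state):
    """V_theta(s): the model's estimate of a state's potential value."""
    return state @ theta

def td_update(state, reward, next_state, gamma=0.99, lr=0.01):
    # One temporal-difference step: nudge V(s) toward reward + gamma * V(s').
    td_target = reward + gamma * value(next_state)
    theta[:] += lr * (td_target - value(state)) * state

# Fit on synthetic transitions; a real system would use real trajectories.
for _ in range(1000):
    s, s_next = rng.normal(size=8), rng.normal(size=8)
    td_update(s, reward=float(s.sum()), next_state=s_next)
```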

As noted above, the DeepMind paper proposes merging deep reinforcement learning with knowledge graphs and surveys a wealth of related research. However, the paper does not explicitly state which specific approach DeepMind favors.

Perhaps different scenarios call for different designs, and there is no universally best solution.

Is Graph Deep Learning the Next Hot Topic in AI Algorithms?

Many important real-world datasets appear in the form of graphs or networks, such as social networks, knowledge graphs, and the World Wide Web. Currently, an increasing number of researchers are beginning to focus on how neural network models handle these structured datasets.

Combined with a series of papers on graph deep learning published by DeepMind, Google Brain, and others, does this indicate that “graph deep learning” is the next hot topic in AI algorithms?

In any case, the place to start is this paper.

Paper address: https://arxiv.org/pdf/1806.01261.pdf

References

Interview with Judea Pearl:

https://www.quantamagazine.org/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515/

Graph Convolutional Networks:

http://tkipf.github.io/graph-convolutional-networks/

Relational RNN:

https://arxiv.org/pdf/1806.01822v1.pdf

Relational Deep Reinforcement Learning:

https://arxiv.org/abs/1806.01830

Relational Inductive Bias:

https://arxiv.org/pdf/1806.01203.pdf
