Paper Title: "How much Position Information Do Convolutional Neural Networks Encode?"
Paper Link: https://openreview.net/forum?id=rJeB36NKvB
This article explains how CNNs learn absolute position information within images. The paper comes from Canadian scholars and was accepted at ICLR 2020. It mainly establishes two things:
- Convolutional neural networks can encode absolute position information, and the degree varies across networks.
- Absolute position information is introduced through zero-padding, and it is represented more strongly in deeper layers.
Previously, the concepts of CNN and absolute position were rarely discussed together.
I believe there are two reasons for this: first, it is generally believed that CNNs are translation invariant (for classification tasks), or translation equivariant (for segmentation and detection tasks); second, there is no specific task demand. For example, in the three major object perception tasks in computer vision: classification, segmentation, and detection, object classification is unrelated to position; semantic segmentation, as pixel-level semantic classification, also does not rely on position; the object detection task, which is most likely related to absolute position, has been decoupled from absolute position by mainstream methods, turning into a regression of local relative positions relative to anchor boxes or anchor points. Thus, the network itself does not need to know the absolute position of objects, and position information is used as a prior in preprocessing and postprocessing for coordinate conversion. However, absolute position information is valuable in many tasks, such as instance segmentation, where object + absolute position can uniquely determine an instance.
A very obvious observation is that the human visual system can easily know absolute positions, for example: “There is a bird in the upper left corner, and it has flown to the right.” Moreover, for objects in images, they are essentially distinguished by position and shape.
The article starts from a simple observation.
We know that the salient regions attended to by a convolutional network can be visualized with CAM. The article runs a simple experiment: crop an image and compare the salient regions before and after cropping.
In theory, since the features of the shared content are the same before and after cropping, the salient region on that content should stay in place. In practice, the salient region shifts after cropping, which is hard to explain with the translation invariance of convolutional neural networks, raising the suspicion that position information is at play.
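Below is a minimal sketch of this crop-and-compare check, assuming a torchvision ResNet-18 and classic CAM (GAP + FC weights over the last conv features); the image path and crop box are illustrative only.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

def compute_cam(model, x):
    """Classic CAM: weight the final conv feature map by the FC weights of the predicted class."""
    feats = {}
    h = model.layer4.register_forward_hook(lambda m, i, o: feats.update({"map": o}))
    logits = model(x)
    h.remove()
    cls = logits.argmax(dim=1).item()
    w = model.fc.weight[cls]                                       # [C]
    cam = F.relu((w[None, :, None, None] * feats["map"]).sum(1))   # [1, h, w]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

model = models.resnet18(pretrained=True).eval()   # newer torchvision uses weights=...
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("bird.jpg").convert("RGB")            # hypothetical input image
crop = img.crop((0, 0, img.width // 2, img.height))    # keep only the left half

with torch.no_grad():
    cam_full = compute_cam(model, prep(img).unsqueeze(0))
    cam_crop = compute_cam(model, prep(crop).unsqueeze(0))
# If CNNs were fully translation invariant, the salient region on the shared
# content should stay put; in practice it shifts after cropping.
```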
Thus, the author analyzed through experiments whether convolutional neural networks can learn position information.
Specific network design for the experiments:
The network consists of two parts:
- Encoder: a pre-trained backbone such as ResNet or VGG, whose parameters are frozen; it serves only as a feed-forward network to extract features.
- Position Encoding Module: composed of convolutions with trainable parameters; in the experiments only a single convolution is used.
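A minimal sketch of this two-part setup, assuming a frozen torchvision VGG-16 encoder and a single trainable convolution as the read-out; the class name, kernel size, and input sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class PosReadout(nn.Module):
    """Position Encoding Module: one trainable conv that maps features to a position map."""
    def __init__(self, in_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feats, out_size):
        pred = self.conv(feats)                               # [B, 1, h, w]
        return nn.functional.interpolate(pred, size=out_size,
                                          mode="bilinear", align_corners=False)

encoder = models.vgg16(pretrained=True).features.eval()      # frozen feature extractor
for p in encoder.parameters():
    p.requires_grad = False

readout = PosReadout(in_channels=512)                         # only this part is trained
optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)

x = torch.rand(2, 3, 224, 224)                                # any image, even noise
with torch.no_grad():
    feats = encoder(x)                                        # [2, 512, 7, 7]
pred_map = readout(feats, out_size=(224, 224))                # predicted position map
```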
Any image (with arbitrary content) is fed in, and the network is trained to output a position-related map. For example, given a noise image, the network is expected to output a horizontal coordinate map (pure position information). Five types of normalized gradient-like maps are used as the target masks:
- H: horizontal gradient
- V: vertical gradient
- G: Gaussian distribution
- HS: horizontal repetition (stripes)
- VS: vertical repetition (stripes)
These labels can be regarded as "random" labels with respect to the image: they are unrelated to the image content and depend only on the positions of pixels in the image.
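A sketch of how such content-independent target maps could be generated; the exact Gaussian width and stripe period are assumptions, not values from the paper.

```python
import numpy as np

def make_targets(h, w, period=16, sigma=0.3):
    """Five position-only target maps in [0, 1]; they never depend on image content."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    H = xs                                                     # horizontal gradient
    V = ys                                                     # vertical gradient
    G = np.exp(-((xs - 0.5) ** 2 + (ys - 0.5) ** 2) / (2 * sigma ** 2))
    HS = np.tile((np.arange(w) % period) / (period - 1), (h, 1))          # repeated horizontal gradient
    VS = np.tile(((np.arange(h) % period) / (period - 1))[:, None], (1, w))  # repeated vertical gradient
    return {"H": H, "V": V, "G": G, "HS": HS, "VS": VS}

targets = make_targets(224, 224)   # each value is a [224, 224] map
```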
Evaluation metrics:
- SPC: Spearman correlation coefficient, a non-parametric measure of the dependence between two variables, used to evaluate the correlation between the predicted and ground-truth position maps.
- MAE: mean pixel difference between the predicted position map and the ground-truth gradient position map.
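Both metrics are straightforward to compute; a small sketch using SciPy (the function names are my own):

```python
import numpy as np
from scipy.stats import spearmanr

def spc(pred, gt):
    """Spearman correlation between the flattened predicted and ground-truth maps."""
    return spearmanr(pred.ravel(), gt.ravel()).correlation

def mae(pred, gt):
    """Mean absolute per-pixel difference between predicted and ground-truth maps."""
    return np.abs(pred - gt).mean()
```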
Here, VGG/ResNet indicates that pre-trained weights are used to extract feature maps as input to the Position Encoding Module, while PosENet’s input is the original image directly.
The experimental results table shows that extracting position information directly from the raw image is quite difficult. However, after the image is encoded by the pre-trained network, consistent position information can be extracted. The strong performance on the test set indicates that the model is not blindly fitting noise but is extracting genuine position information.
Specifically, under zero-padding, models based on VGG and ResNet can predict reasonably related position outputs, such as horizontal or vertical coordinates.
Without padding, the output only directly responds to the input content and cannot predict position information unrelated to the content.
The figure visualizes the predictions alongside the ground truth; the predicted position maps are clearly correlated with the real ones, further revealing that position information is present in these networks. Position information is implicitly encoded in classification-pretrained convolutional neural networks without any explicit supervision.
The previous experiments confirmed that the pre-trained weights of convolutional neural networks in classification tasks can encode absolute position information, but what factors influence this encoding?
Since absolute position information is obtained from the feature maps produced by the Encoder module and extracted by the Position Encoding Module, and the extraction process only involves convolution operations, we consider the convolution operations and feature maps.
- Number of convolutional layers: as shown in result (a), stacking more convolutional layers improves the network's ability to read out position information, likely because stacking increases the receptive field (a minimal sketch follows this list).
- Convolution kernel size: result (b) shows that larger kernels capture more position information; like stacking layers, a larger kernel enlarges the receptive field.
- Choice of feature layer: compared with shallow features, deep features achieve better performance. There is a confound, however: the feature dimensions (spatial size and channel count) differ across layers. After controlling for this so that comparable features are compared, it is clear that position information is encoded more strongly in the deeper layers of the network.
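A minimal sketch of how one might vary the depth and kernel size of the position read-out for these ablations; the helper name and channel choices are illustrative.

```python
import torch.nn as nn

def make_readout(in_ch, num_layers=1, kernel_size=3):
    """Stack `num_layers` convolutions of the given kernel size, ending in a 1-channel map."""
    layers, ch = [], in_ch
    for i in range(num_layers):
        out_ch = 1 if i == num_layers - 1 else ch
        layers += [nn.Conv2d(ch, out_ch, kernel_size, padding=kernel_size // 2), nn.ReLU()]
        ch = out_ch
    return nn.Sequential(*layers[:-1])            # drop the trailing ReLU

# Both more layers and larger kernels enlarge the receptive field of the read-out.
readouts = {f"layers{n}_k{k}": make_readout(512, n, k)
            for n in (1, 2, 3) for k in (1, 3, 7)}
```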
From the previous two experiments, we learned that convolutional neural networks can encode position information and that the raw image itself does not carry it in. So if this position information does not come from the original image, where does it come from? The answer is zero-padding.
The author ran a controlled experiment on the amount of padding in the Position Encoding Module, finding that the more padding is used, the more pronounced the position information becomes. Further experiments with the VGG network confirmed that padding is what introduces the position information.
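A small illustration of why zero-padding leaks position: with padding, border pixels see zeros on one side, so even a perfectly uniform input produces different responses near the edges than in the interior. This toy example is mine, not from the paper.

```python
import torch
import torch.nn as nn

conv_pad   = nn.Conv2d(1, 1, 3, padding=1, bias=False)   # zero-padding
conv_nopad = nn.Conv2d(1, 1, 3, padding=0, bias=False)   # no padding
nn.init.constant_(conv_pad.weight, 1.0)
nn.init.constant_(conv_nopad.weight, 1.0)

x = torch.ones(1, 1, 8, 8)                 # a perfectly uniform input
with torch.no_grad():
    y_pad   = conv_pad(x)                  # corners = 4, edges = 6, interior = 9
    y_nopad = conv_nopad(x)                # every output value is 9
print(y_pad.squeeze())
print(y_nopad.squeeze())
# With padding, the output alone reveals where the image border is,
# i.e. absolute position; without padding, the response is position-agnostic.
```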
From the visualizations in the figures, it can also be seen that padding significantly affects the introduction of position information, especially in the final VGG, where the presence or absence of padding has a large impact.
Similarly, for object detection and semantic segmentation tasks, experiments found that removing padding operations led to a significant drop in performance.
The article demonstrates the effect of zero-padding through experiments, but how zero-padding specifically introduces position information and its mechanism may still require further research to explain.
Although current CNN models can implicitly learn a certain degree of position information, it is clearly insufficient. How to better exploit absolute position information is worth further exploration; CoordConv and semi-convolutional operators are good explorations in this direction.
The most straightforward approach is to concatenate the coordinates of each pixel to the input or intermediate features. This simple and direct method can bring a 3.6 AP improvement in SOLO’s instance segmentation results.
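A minimal CoordConv-style sketch of this idea: concatenate normalized x/y coordinate channels to the feature map before a convolution, so the layer sees absolute position explicitly. The class name and sizes are illustrative, not taken from the SOLO or CoordConv code.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv layer that appends two normalized coordinate channels to its input."""
    def __init__(self, in_ch, out_ch, kernel_size=3, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, kernel_size, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([gx, gy]).unsqueeze(0).expand(b, -1, -1, -1)  # [B, 2, H, W]
        return self.conv(torch.cat([x, coords], dim=1))

layer = CoordConv2d(256, 256, kernel_size=3, padding=1)
out = layer(torch.randn(2, 256, 64, 64))    # [2, 256, 64, 64]
```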
Overall, understanding position information, or understanding zero-padding, can lead to better designs for networks and tasks in the future. For instance, instance segmentation and object detection tasks that require position information should incorporate more zero-padding, while in tasks like image stylization where position information is not needed, reducing its use may yield some benefits.