Reprinted from: Jishi Platform
As one of the representative algorithms of deep learning, Convolutional Neural Networks (CNNs) have achieved state-of-the-art results in fields such as computer vision.
In 1998, Yann LeCun proposed LeNet-5, which was trained with the backpropagation (BP) algorithm and formed the prototype of contemporary CNNs. In 2012, in the ImageNet image recognition competition, Hinton’s group introduced AlexNet, which combined a new deep architecture with dropout and cut the error rate from over 25% to about 15%, revolutionizing the field of image recognition. Since then, CNNs have gained fame and flourished. In 2016, CNNs surprised people once again: “AlphaGo”, the Go program developed by Google on top of deep neural networks and tree search, defeated top human players at Go. A year later, drawing on ideas from ResNet and Faster-RCNN, Master swept the leading human Go masters.
It can be said that convolutional neural networks are one of the most successful applications of deep learning algorithms.
Studying the classic papers on convolutional neural networks is essential for learning and researching CNNs. Using its own algorithms, AMiner, a big data mining and service platform for the field of artificial intelligence, extracted keywords related to “convolutional neural networks” from top international conferences and journals, then filtered and recommended 40 classic must-read papers. They cover theory and practice in detection, recognition, classification, segmentation, and tracking across various fields, and are sorted by citation count.
These 40 papers were mostly published between 2015 and 2019, primarily at top conferences such as CVPR, ICCV, ICML, and NeurIPS. Among the scholars who published the most papers in this field, Hinton, the “father of neural networks”, and Bengio, a “pioneer of deep learning”, both made the list and continue to contribute to deep learning research.
The following lists these 40 papers by citation count, with brief comments on some of them:
*1. Fully Convolutional Networks for Semantic Segmentation | CVPR2015 | Citation Count: 13136
Author Information: UC Berkeley | Jonathan Long, Evan Shelhamer, Trevor Darrell
This paper is the representative work of Jonathan Long, Evan Shelhamer, and Trevor Darrell of UC Berkeley, and it received a best paper honorable mention at CVPR 2015. Its core contribution is the concept of the Fully Convolutional Network (FCN): a network built entirely from convolutional layers that accepts an image of arbitrary size and produces an output of the same size as the input. Trained end-to-end, pixel-to-pixel, for semantic segmentation, it achieved state-of-the-art results. This was the first time an FCN was trained end-to-end for pixel-level prediction, and also the first time an FCN was trained from supervised pre-training.
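The "arbitrary input size, same-size output" property can be illustrated with a toy example. The following is a minimal sketch, not the authors' architecture: it assumes a single zero-padded convolution followed by a 1x1 convolution that scores each pixel, and it omits the downsampling and learned upsampling a real FCN uses. The function names `conv2d_same`, `pixelwise_scores`, and `segment` are hypothetical.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image` with zero padding so the output
    has the same height and width as the input (stride 1)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Positions outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def pixelwise_scores(feature_map, class_weights):
    """1x1 convolution: one score per class at every pixel."""
    return [[[wgt * v for wgt in class_weights] for v in row]
            for row in feature_map]


def segment(image, kernel, class_weights):
    """Return a label map with the same height and width as `image`."""
    scores = pixelwise_scores(conv2d_same(image, kernel), class_weights)
    return [[max(range(len(s)), key=s.__getitem__) for s in row]
            for row in scores]
```

Because no layer has a fixed input dimension, `segment` works unchanged on a 2x3 image or a 3x7 image, and the label map always matches the input shape.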
*2. Convolutional Neural Networks for Sentence Classification | EMNLP 2014 | Citation Count: 5978 Author Information: New York University | Yoon Kim
*3. Large-Scale Video Classification with Convolutional Neural Networks | CVPR2014 | Citation Count: 4145 Author Information: Google, Stanford University | Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
This paper studied several ways of extending the connectivity of a CNN along the time dimension so that it can exploit local spatiotemporal information. The authors also proposed a multi-resolution architecture to speed up training. The paper makes three main contributions:
1. Extending CNNs to video classification;
2. Feeding frames at two different resolutions into two CNNs that are merged in the last two fully connected layers; the two streams are a low-resolution context stream that sees the whole frame and a high-resolution stream that sees only the center of each frame;
3. Transferring the CNN learned on the authors' self-built dataset to the UCF-101 dataset.
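The two inputs in contribution 2 can be sketched as follows. This is an illustrative assumption about the preprocessing only, not the authors' code: the low-resolution stream is produced here by 2x2 average pooling of the whole frame, and the high-resolution stream by cropping the central half of the frame. The helper names are hypothetical.

```python
def downsample_half(frame):
    """Low-resolution stream: average each 2x2 block of the whole frame,
    halving height and width."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][x] + frame[y][x + 1]
          + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def center_crop_half(frame):
    """High-resolution stream: keep only the central region of the frame
    at its original resolution."""
    h, w = len(frame), len(frame[0])
    ch, cw = h // 2, w // 2
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in frame[top:top + ch]]


def two_stream_inputs(frame):
    """Return (low-resolution input, high-resolution center input)."""
    return downsample_half(frame), center_crop_half(frame)
```

Both outputs have half the height and width of the original frame, so each CNN processes a quarter of the pixels, which is where the training speed-up comes from in this sketch.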
*4. How transferable are features in deep neural networks? | NIPS 2014 | Citation Count: 3414 Author Information: Cornell University, University of Wyoming, University of Montreal | Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson
This paper is the Bengio team's study of transfer learning. It measures experimentally how general or how specific the neurons in each layer of a deep neural network are, and examines the two main factors that affect a model's transferability. The results are of great significance for understanding how well deep neural network features transfer.
*5. Learning Spatiotemporal Features with 3D Convolutional Networks | ICCV2015 | Citation Count: 2711
Author Information: Facebook, Dartmouth College | Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
This paper introduces a simple and efficient way to learn spatiotemporal features by training a 3D Convolutional Neural Network on large-scale supervised video datasets.
The paper reports three findings:
1) Compared to 2D convolutional networks, 3D convolutional networks are more suitable for learning spatiotemporal features;
2) A homogeneous architecture with small 3×3×3 convolution kernels in every layer is among the best-performing architectures for 3D convolutional networks;
3) The learned features, called C3D, combined with a simple linear classifier, achieved the best performance on 4 different benchmarks and are comparable to the best methods on 2 other benchmarks.
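To make the difference from 2D convolution concrete, the following minimal sketch applies a single 3D kernel to a short clip. It assumes a single channel, stride 1, and no padding; `conv3d_valid` is a hypothetical name, and this is not the C3D implementation. The kernel slides along time as well as height and width, so each output value mixes information from several consecutive frames.

```python
def conv3d_valid(clip, kernel):
    """clip: T x H x W nested lists; kernel: kt x kh x kw nested lists.
    Stride 1, no padding, so each output dimension shrinks by (k - 1)."""
    t, h, w = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(t - kt + 1):            # slide along time
        plane = []
        for y in range(h - kh + 1):        # slide along height
            row = []
            for x in range(w - kw + 1):    # slide along width
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += clip[z + a][y + b][x + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

With a 3×3×3 kernel, a clip of 4 frames yields 2 output frames: the temporal dimension is preserved in the output rather than collapsed after the first layer, as it would be with a 2D convolution over stacked frames.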
*6. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | Citation Count: 2373 Author Information: University of Cambridge | Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla
This paper proposes a deep fully convolutional neural network architecture called SegNet for pixel-level semantic segmentation. The innovation of SegNet lies in how the decoder upsamples its low-resolution input feature maps. Specifically, the decoder reuses the pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need to learn the upsampling itself. The upsampled maps are sparse, so they are then convolved with learnable filters to produce dense feature maps.
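The pooling-index mechanism can be sketched in a few lines. This is a minimal single-channel illustration under the assumption of non-overlapping pooling windows, not the SegNet code; the function names are hypothetical. The encoder side records where each maximum came from, and the decoder side writes each value back to that position and leaves zeros elsewhere.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records the (row, col)
    position of each maximum in the input feature map."""
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for y in range(0, h - size + 1, size):
        prow, irow = [], []
        for x in range(0, w - size + 1, size):
            window = [(fmap[y + i][x + j], (y + i, x + j))
                      for i in range(size) for j in range(size)]
            value, pos = max(window, key=lambda item: item[0])
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position.
    All other positions stay zero, so the result is sparse."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for value, (y, x) in zip(prow, irow):
            out[y][x] = value
    return out
```

Nothing in `max_unpool` is learned; the subsequent convolution with trainable filters is what turns the sparse map into a dense one.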
*7. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks | ECCV2016 | Citation Count: 1713 Author Information: Allen Institute for Artificial Intelligence, University of Washington | Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
This paper proposes two efficient approximations of standard convolutional neural networks: binary weight networks and XNOR networks. In a binary weight network, the convolution kernels are approximated with binary values, reducing storage by a factor of 32. In an XNOR network, both the convolution kernels and the inputs to the convolution layers are binary (+1 and -1), so convolutions can be computed mostly with binary operations. This makes convolution about 58 times faster and saves 32 times the memory.
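Both ideas reduce to small pieces of arithmetic. The following is a minimal sketch, assuming the real-valued weights are approximated as a scaling factor (the mean absolute weight) times their signs, and that +1/-1 vectors are packed into integer bit masks with +1 stored as bit 1. The function names are hypothetical and this is not the authors' implementation.

```python
def binarize(weights):
    """Approximate real-valued weights by alpha * signs,
    with alpha taken as the mean absolute value of the weights."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def pack_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0 inside a Python int."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n.
    XNOR marks positions where the signs agree; each agreement
    contributes +1 and each disagreement -1."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")
    return 2 * matches - n
```

The dot product needs only an XNOR and a bit count instead of n multiplications, and each weight occupies one bit instead of a 32-bit float, which is where the 32-fold memory saving comes from.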
*8. Character-level Convolutional Networks for Text Classification | NIPS2015 | Citation Count: 1701 Author Information: New York University | Xiang Zhang, Junbo Zhao, Yann LeCun
*9. Towards End-To-End Speech Recognition with Recurrent Neural Networks | ICML2014 | Citation Count: 1339 Author Information: DeepMind, University of Toronto | Alex Graves, Navdeep Jaitly
*10. DRAW: A Recurrent Neural Network For Image Generation | ICML 2015 | Citation Count: 1186 Author Information: Google DeepMind | Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra
This paper introduces a Deep Recurrent Attentive Writer (DRAW) neural network model applicable to image generation, which can generate high-quality natural images and improve the performance of generative models on the MNIST dataset. Furthermore, the images generated by the DRAW model trained on the SVHN dataset are indistinguishable from real data to the naked eye.
*11. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | ICLR2013 | Citation Count: 1170 Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
*12. Neural Collaborative Filtering | Citation Count: 1141 Author: Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
*13. Image Style Transfer Using Convolutional Neural Networks | CVPR2016 | Citation Count: 1107 Author: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
*14. Image Super-Resolution Using Deep Convolutional Networks | IEEE TPAMI 2016 | Citation Count: 1035 Author: Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang
*15. Distilling the Knowledge in a Neural Network | Citation Count: 1021 Author: Geoffrey E. Hinton, Oriol Vinyals, Jeffrey Dean
*16. Recurrent Convolutional Neural Networks for Text Classification | AAAI2015 | Citation Count: 916 Author: Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao
*17. Squeeze-and-Excitation Networks | CVPR2018 | Citation Count: 886 Author: Jie Hu, Li Shen, Gang Sun
*18. Convolutional Sequence to Sequence Learning | ICML2017 | Citation Count: 777 Author: Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
*19. Non-local Neural Networks | CVPR2018 | Citation Count: 751 Author: Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, Kaiming He
*20. Residual Attention Network for Image Classification | CVPR2017 | Citation Count: 568 Author: Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang
*21. Image Super-Resolution via Deep Recursive Residual Network | CVPR2017 | Citation Count: 559 Author: Ying Tai, Jian Yang, Xiaoming Liu
*22. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization | ICCV2015 | Citation Count: 503 Author: Alex Kendall, Matthew Grimes, Roberto Cipolla
*23. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks | Citation Count: 483 Author: Aliaksei Severyn, Alessandro Moschitti
*24. Deformable Convolutional Networks | ICCV2017 | Citation Count: 476 Author: Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
*25. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting | Citation Count: 399 Author: Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, Wang-chun Woo
*26. Fast Training of Convolutional Networks through FFTs | Citation Count: 385 Author: Michaël Mathieu, Mikael Henaff, Yann LeCun
*27. Large Kernel Matters – Improve Semantic Segmentation by Global Convolutional Network | CVPR2017 | Citation Count: 377 Author: Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
*28. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition | AAAI2018 | Citation Count: 353 Author: Sijie Yan, Yuanjun Xiong, Dahua Lin
*29. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation | CVPR2017 | Citation Count: 333 Author: Simon Jégou, Michal Drozdzal, David Vázquez, Adriana Romero, Yoshua Bengio
*30. Multi-Oriented Text Detection with Fully Convolutional Networks | CVPR2016 | Citation Count: 313 Author: Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai
*31. Learning Efficient Convolutional Networks through Network Slimming | ICCV2017 | Citation Count: 310 Author: Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang
*32. Multi-View 3D Object Detection Network for Autonomous Driving | CVPR2017 | Citation Count: 276 Author: Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia
*33. Very Deep Convolutional Networks for End-to-End Speech Recognition | ICASSP2017 | Citation Count: 242 Author: Yu Zhang, William Chan, Navdeep Jaitly
*34. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification | Citation Count: 229 Author: Ye Zhang, Byron C. Wallace
*35. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks | ACL2015 | Citation Count: 212 Author: Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, Jun Zhao
*36. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression | ICCV2017 | Citation Count: 208 Author: Jian-Hao Luo, Jianxin Wu, Weiyao Lin
*37. DCAN: Deep Contour-Aware Networks for Accurate Gland Segmentation | CVPR2016 | Citation Count: 166 Author: Hao Chen, Xiaojuan Qi, Lequan Yu, Pheng-Ann Heng
*38. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition | CVPR2017 | Citation Count: 159 Author: Jianlong Fu, Heliang Zheng, Tao Mei
*39. Interpretable Convolutional Neural Networks | CVPR2018 | Citation Count: 154 Author: Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu
*40. A systematic study of the class imbalance problem in convolutional neural networks | Citation Count: 148 Author: Mateusz Buda, Atsuto Maki, Maciej A. Mazurowski