Overview of Recent Advances in Deep Learning

Source: Machine Learning Academy

This article is about 10,500 words long; allow at least 20 minutes to read it.
In this article, we briefly discuss recent advances in deep learning.

“An overview is always one of the quickest ways to get started in a new field!”

Abstract: Deep learning is one of the latest trends in machine learning and artificial intelligence research. It is also one of the most popular scientific research trends today. Deep learning methods have brought revolutionary advances in computer vision and machine learning. New deep learning techniques keep emerging that surpass state-of-the-art machine learning methods and even existing deep learning techniques. In recent years, the field has seen many significant breakthroughs worldwide. Because deep learning is developing so rapidly, its progress has become difficult to follow, especially for new researchers. In this article, we briefly discuss recent advances in deep learning.

1 Introduction

The term “Deep Learning” (DL) was first introduced in machine learning (ML) in 1986 and later used for artificial neural networks (ANN) in 2000. Deep neural networks consist of multiple hidden layers that learn features of the data at multiple levels of abstraction. DL methods allow computers to learn complex concepts by building them out of relatively simple ones. For artificial neural networks, deep learning, also known as hierarchical learning, uses deep architectures with multiple levels of abstraction, i.e., stacked nonlinear operations, such as ANNs with many hidden layers. In summary, deep learning is a subfield of machine learning that employs multilayered nonlinear information processing and abstraction for feature learning, representation, classification, regression, and pattern recognition across supervised, unsupervised, semi-supervised, self-supervised, and weakly supervised learning.

Deep learning, also called representation learning, is a subfield of machine learning, and modern deep learning methods are widely considered to have begun developing in 2006. This article reviews the latest deep learning technologies and is mainly recommended for researchers who are about to enter the field. It covers the basic ideas of DL, its main methods, its latest advances, and its applications.

Review papers are very beneficial, especially for researchers new to a field. When a research area promises great value in the near future and across related application fields, it is often hard to keep track of its latest developments as they happen. Scientific research is an attractive profession today because knowledge and education are more accessible than ever before. For any active line of technology research, the only reasonable assumption is that it will keep improving in many respects, so an overview written a few years ago may already be outdated.

Considering the popularity and promotion of deep learning in recent years, we briefly outline deep learning and neural networks (NN), along with their major advances and significant breakthroughs over the years. We hope this article will help many novice researchers gain a comprehensive understanding of recent research and technologies in deep learning and guide them to start in the right way. At the same time, we hope to pay tribute to the top DL and ANN researchers of this era: Geoffrey Hinton, Juergen Schmidhuber, Yann LeCun, Yoshua Bengio, and many other research scholars whose work has built modern artificial intelligence (AI). Following their work to track the current best DL and ML research progress is also crucial for us.

In this article, we first summarize past review papers and then study the models and methods of deep learning. We then describe the latest advances in this field. We discuss deep learning (DL) methods, deep architectures (i.e., deep neural networks (DNN)), and deep generative models (DGM), followed by important regularization and optimization methods. Additionally, two brief sections summarize open-source DL frameworks and important DL applications. In the last two sections (i.e., discussion and conclusion), we discuss the current state and future of deep learning.

2 Related Research

In recent years, there have been many review papers on deep learning. They describe DL methods, methodologies, applications, and future research directions well. Here, we briefly introduce some excellent review papers on deep learning.

Young et al. (2017) discussed DL models and architectures primarily used in natural language processing (NLP). They showcased DL applications in various NLP domains, compared DL models, and discussed possible future trends.

Zhang et al. (2017) discussed the current best deep learning technologies for front-end and back-end speech recognition systems.

Zhu et al. (2017) reviewed the latest advances in DL remote sensing technology. They also discussed open-source DL frameworks and other technical details of deep learning.

Wang et al. (2017) described the evolution of deep learning models in chronological order. Their paper briefly introduced the models and breakthroughs in DL research, gave an evolutionary perspective on the origins of deep learning, and discussed the optimization of neural networks and future research.

Goodfellow et al. (2016) discussed deep networks and generative models in detail, summarizing recent DL research and applications, starting from the basics of machine learning (ML) and the pros and cons of deep architectures.

LeCun et al. (2015) gave an overview of deep learning (DL) models, centered on convolutional neural networks (CNN) and recurrent neural networks (RNN). They described DL from the perspective of representation learning, showing how DL techniques work and how they are used successfully in various applications, and they predicted that future progress would rest on unsupervised learning (UL). They also pointed out the main advances in the DL literature.

Schmidhuber (2015) provided an overview of deep learning covering CNN, RNN, and deep reinforcement learning (RL). He emphasized RNNs for sequence processing while pointing out the limitations of basic DL and NN and techniques to improve them.

Nielsen (2015) described the details of neural networks with code and examples. He also discussed deep neural networks and deep learning to some extent.

Schmidhuber (2014) discussed time-series-based neural networks, classification using machine learning methods, and the history and progress of using deep learning in neural networks.

Deng and Yu (2014) described the categories and techniques of deep learning, as well as the applications of DL in several fields.

Bengio (2013) briefly summarized DL algorithms from the perspective of representation learning, i.e., supervised and unsupervised networks, optimization, and training models. He focused on many challenges of deep learning, such as scaling algorithms for larger models and data, reducing optimization difficulties, and designing effective scaling methods.

Bengio et al. (2013) discussed representation and feature learning, i.e., deep learning. They explored various methods and models from the perspectives of applications, techniques, and challenges.

Deng (2011) provided an overview of deep structured learning and its architecture from the perspective of information processing and related fields.

Arel et al. (2010) briefly summarized the recent DL technologies.

Bengio (2009) discussed deep architectures, i.e., neural networks and generative models in artificial intelligence.

The recent papers on deep learning (DL) listed above discuss its key points from multiple angles, which is very valuable for DL researchers. However, DL is currently a rapidly developing field, and many new techniques and architectures have been proposed since the most recent overview papers were published. In addition, past papers approached the field from different perspectives. Our paper primarily targets learners and novices just entering this field. To this end, we strive to provide a foundation and clear concepts of deep learning for new researchers and anyone interested in this field.

3 Recent Advances

In this section, we will discuss the main deep learning (DL) methods that have recently emerged from machine learning and artificial neural networks (ANN), with artificial neural networks being the most commonly used form of deep learning.

3.1 Evolution of Deep Architectures

Artificial neural networks (ANN) have made significant progress and have given rise to other deep models. The first generation of ANNs consisted of simple layers of perceptrons that could perform only a limited set of simple computations. The second generation used backpropagation. The core of the backpropagation algorithm is the chain rule of derivatives: the derivative (or gradient) of the objective function with respect to the output layer is propagated backward through the layers until it reaches the first (input) layer. Finally, the features are passed through a nonlinear activation function to obtain the classification result. (The most popular nonlinear activation function today is ReLU, which learns faster than the previously popular tanh and sigmoid functions and lets deep networks be trained directly, without the pre-training that was once used to work around vanishing or exploding gradients.) Support vector machines (SVM) then emerged and surpassed ANNs for a while. To overcome the limitations of backpropagation (vanishing and exploding gradients), restricted Boltzmann machines (RBM) were proposed to make learning easier; the second generation did not yet have ReLU, batch normalization (BN) layers, and similar techniques, which were introduced later to tackle these issues. Around this time other techniques and networks also emerged, such as feedforward neural networks (FNN, commonly called DNNs and also known as multilayer perceptrons, MLP), convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks, autoencoders, and GANs. Since then, ANNs have been improved and redesigned in many ways for various purposes.
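
The chain-rule step described above can be made concrete with a small numerical sketch. The network below (one tanh hidden layer, a linear output, squared-error loss) and all variable names are assumptions chosen for illustration, not something specified in the surveyed papers. The analytic gradient from the backward pass is compared with a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: x -> tanh(W1 x) -> W2 h -> squared-error loss.
x = rng.normal(size=3)          # input vector
t = rng.normal(size=2)          # target vector
W1 = rng.normal(size=(4, 3))    # first-layer weights
W2 = rng.normal(size=(2, 4))    # output-layer weights

def loss(W1, W2):
    h = np.tanh(W1 @ x)
    y = W2 @ h
    return 0.5 * np.sum((y - t) ** 2)

# Forward pass, keeping the intermediate values needed by the backward pass.
a = W1 @ x
h = np.tanh(a)
y = W2 @ h

# Backward pass: the chain rule carries the gradient from the output toward the input.
dy = y - t                      # dL/dy at the output layer
dW2 = np.outer(dy, h)           # dL/dW2
dh = W2.T @ dy                  # dL/dh, propagated back through W2
da = dh * (1.0 - h ** 2)        # dL/da, because tanh'(a) = 1 - tanh(a)^2
dW1 = np.outer(da, x)           # dL/dW1

# Central finite-difference check for a single entry of W1.
eps = 1e-6
W1_plus = W1.copy()
W1_plus[0, 0] += eps
W1_minus = W1.copy()
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, W2) - loss(W1_minus, W2)) / (2 * eps)
```

If the backward pass is correct, `numeric` and `dW1[0, 0]` agree to several decimal places.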

Schmidhuber (2014), Bengio (2009), Deng and Yu (2014), Goodfellow et al. (2016), Wang et al. (2017) provided detailed overviews of the evolution and history of deep neural networks (DNN) and deep learning (DL). In most cases, deep architectures are multilayer nonlinear repetitions of simple architectures, allowing for highly complex functions to be obtained from the input.

4 Deep Learning Methods

Deep neural networks have achieved great success in supervised learning. Deep learning models have also been very successful in unsupervised, hybrid, and reinforcement learning.

4.1 Deep Supervised Learning

Supervised learning applies when the data is labeled and the task is classification or numerical prediction. LeCun et al. (2015) provided a concise explanation of supervised learning methods and the formation of deep structures. Deng and Yu (2014) mentioned and explained many deep networks used for supervised and hybrid learning, such as deep stacking networks (DSN) and their variants. Schmidhuber (2014) covered all neural networks, from early neural networks to the recently successful convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM), and their improvements.

4.2 Deep Unsupervised Learning

When input data is unlabeled, unsupervised learning methods can be applied to extract features and classify or label them. LeCun et al. (2015) predicted the future of unsupervised learning in deep learning. Schmidhuber (2014) also described the neural networks for unsupervised learning. Deng and Yu (2014) briefly introduced the deep architectures for unsupervised learning and provided detailed explanations of deep autoencoders.

4.3 Deep Reinforcement Learning

Reinforcement learning uses a reward-punishment system to predict the next step of the learning model. This is mainly used in games and robotics to solve common decision-making problems. Schmidhuber (2014) described the advances in deep learning in reinforcement learning (RL), as well as the applications of deep feedforward neural networks (FNN) and recurrent neural networks (RNN) in RL. Li (2017) discussed deep reinforcement learning (DRL), its architectures (e.g., Deep Q-Network, DQN), and applications in various fields. (For specific information, see the second edition of “Reinforcement Learning”).

Mnih et al. (2016) proposed a DRL framework that utilizes asynchronous gradient descent for DNN optimization.

van Hasselt et al. (2015) proposed a DRL architecture using deep neural networks (DNN).

5 Deep Neural Networks

In this section, we briefly discuss deep neural networks (DNN) and their recent improvements and breakthroughs. Neural networks work in a way loosely modeled on the human brain: they mainly consist of neurons and connections. When we talk about deep neural networks, we can assume that there are quite a few hidden layers that extract features from the input and compute complex functions.

Bengio (2009) explained the deep structures of neural networks, such as convolutional neural networks (CNN), autoencoders (AE), and their variants. Deng and Yu (2014) provided detailed insights into some neural network architectures, such as AE and its variants. Goodfellow et al. (2016) introduced and technically explained deep feedforward networks, convolutional networks, recurrent networks, and their improvements. Schmidhuber (2014) mentioned the complete history of neural networks from early neural networks to recently successful technologies.

5.1 Deep Autoencoders

Autoencoders (AE) are neural networks (NN) trained to reproduce their input at the output. AEs take raw input, encode it into a compressed representation, and then decode it to reconstruct the input. In deep AEs, lower hidden layers are used for encoding, while higher hidden layers are used for decoding, with error backpropagation used for training.
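
The encode, decode, and backpropagate cycle can be sketched in a few lines. The example below is a minimal illustration under assumed choices (a single tanh encoding layer, a linear decoder, random data, plain gradient descent); real deep autoencoders stack several layers on each side.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # 200 samples with 8 features each
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder weights: 8 inputs -> 3-dim code
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder weights: 3-dim code -> 8 outputs

def reconstruct(X, W_enc, W_dec):
    code = np.tanh(X @ W_enc)                 # compressed representation
    return code, code @ W_dec                 # reconstruction of the input

def mse(X, X_hat):
    return np.mean((X - X_hat) ** 2)

_, X_hat0 = reconstruct(X, W_enc, W_dec)
initial_error = mse(X, X_hat0)

lr = 0.1
for _ in range(1000):
    code, X_hat = reconstruct(X, W_enc, W_dec)
    d_out = 2.0 * (X_hat - X) / X.size        # gradient of the mean squared error
    dW_dec = code.T @ d_out                   # gradient for the decoder
    d_code = d_out @ W_dec.T                  # error propagated back to the code
    d_pre = d_code * (1.0 - code ** 2)        # through the tanh nonlinearity
    dW_enc = X.T @ d_pre                      # gradient for the encoder
    W_enc -= lr * dW_enc
    W_dec -= lr * dW_dec

code, X_hat = reconstruct(X, W_enc, W_dec)
final_error = mse(X, X_hat)
```

The target of training is the input itself, so no labels are needed; `final_error` ends up below `initial_error` as the code layer learns a useful compression.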

5.1.1 Variational Autoencoders

Variational Autoencoders (VAE) can be considered as decoders. VAEs are built on standard neural networks and can be trained through stochastic gradient descent (Doersch, 2016).

5.1.2 Stacked Denoising Autoencoders

In early autoencoders (AE), the encoding layer has fewer dimensions than the input layer (it is narrower). In stacked denoising autoencoders (SDAE), the encoding layer is wider than the input layer (Deng and Yu, 2014).

5.1.3 Transforming Autoencoders

Deep autoencoders (DAE) can be transformation-variant, meaning that the features extracted by multilayer nonlinear processing can change depending on the learner. Transforming Autoencoders (TAE) work with both the input vector and the target output vector to apply a transformation-invariance property and guide the codes in the desired direction (Deng and Yu, 2014).

5.2 Deep Convolutional Neural Networks

Four basic ideas constitute convolutional neural networks (CNN): local connections, shared weights, pooling, and the use of many layers. The first part of a CNN consists of convolutional layers and pooling layers, while the latter part mainly consists of fully connected layers. The convolutional layers detect local conjunctions of features, while the pooling layers merge similar features into one. CNNs use convolution instead of general matrix multiplication in the convolutional layers.

Krizhevsky et al. (2012) proposed a deep convolutional neural network (CNN) architecture, also known as AlexNet, which was a significant breakthrough in deep learning (DL). The network consists of 5 convolutional layers and 3 fully connected layers. The architecture uses graphics processing units (GPUs) for the convolution operations, the rectified linear unit (ReLU) as the activation function, and Dropout to reduce overfitting.
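
Local connections and shared weights can be illustrated with a direct, unoptimized 2-D convolution. This is a sketch with assumed names (`conv2d_valid`, a 5x5 toy image, a 2x2 kernel); like most deep learning libraries, it computes cross-correlation and does not flip the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on a small local patch of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])   # the same four weights are reused at every position
feature_map = conv2d_valid(image, kernel)
```

Here the layer has only four parameters regardless of image size, whereas a fully connected layer mapping 25 inputs to 16 outputs would need 400 weights.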

Iandola et al. (2016) proposed a small CNN architecture called “SqueezeNet.”

Szegedy et al. (2014) proposed a deep CNN architecture named Inception. Dai et al. (2017) proposed improvements to Inception-ResNet.

Redmon et al. (2015) proposed a CNN architecture called YOLO (You Only Look Once) for unified, real-time object detection.

Zeiler and Fergus (2013) proposed a method for visualizing the internal activations of CNNs.

Gehring et al. (2017) proposed a CNN architecture for sequence-to-sequence learning.

Bansal et al. (2017) proposed PixelNet, which uses pixels for representation.

Goodfellow et al. (2016) explained the basic architecture and ideas of CNN. Gu et al. (2015) provided an excellent overview of the latest developments in CNN, various variants of CNN, CNN architecture, regularization methods, and functionalities, as well as applications in various fields.

5.2.1 Deep Max-Pooling Convolutional Neural Networks

Max-Pooling Convolutional Neural Networks (MPCNN) operate mainly through convolution and max-pooling, especially in digital image processing. An MPCNN typically consists of three types of layers besides the input layer. The convolutional layer takes the input image, generates feature maps, and then applies a nonlinear activation function. The max-pooling layer downsamples the image, retaining the maximum value of each sub-region. The fully connected layer performs a linear multiplication. In a deep MPCNN, convolutional and max-pooling layers alternate periodically after the input layer, followed by fully connected layers.
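
The downsampling step can be sketched as follows. The function name and the choice of non-overlapping 2x2 windows are assumptions for illustration; other window sizes and strides are also used in practice.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split each axis into (window index, position inside window), then reduce per window.
    windows = trimmed.reshape(h_out, size, w_out, size)
    return windows.max(axis=(1, 3))

fm = np.array([[1.0, 2.0, 5.0, 6.0],
               [3.0, 4.0, 7.0, 8.0],
               [9.0, 1.0, 2.0, 3.0],
               [0.0, 5.0, 4.0, 1.0]])
pooled = max_pool2d(fm)   # [[4., 8.], [9., 4.]]
```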

5.2.2 Very Deep Convolutional Neural Networks

Simonyan and Zisserman (2014) proposed a very deep convolutional neural network (VDCNN) architecture, also known as VGG Net. VGG Net uses very small convolutional filters, with a depth of 16-19 layers. Conneau et al. (2016) proposed another VDCNN architecture for text classification, using small convolutions and pooling. They claimed that this VDCNN architecture was the first used in text processing, functioning at the character level. The architecture consists of 29 convolutional layers.

5.3 Network In Network

Lin et al. (2013) proposed Network In Network (NIN). NIN replaces the convolutional layers of traditional convolutional neural networks (CNN) with micro neural networks of more complex structure. It uses multilayer perceptron convolution (mlpconv) layers as the micro networks and a global average pooling layer instead of fully connected layers. A deep NIN architecture can be built by stacking multiple NIN structures.
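
A rough sketch of the two ideas follows, with assumed shapes and random weights. It covers only the per-pixel MLP part of an mlpconv layer, which is equivalent to 1x1 convolutions across channels (the spatial convolution that precedes it is omitted), and then applies global average pooling.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6, 6))     # (channels, height, width)
W1 = rng.normal(size=(8, 3))       # first per-pixel layer: 3 -> 8 channels
W2 = rng.normal(size=(10, 8))      # second per-pixel layer: 8 -> 10 channels (one per class)

# The same small MLP is applied independently at every spatial position.
hidden = np.maximum(0.0, np.einsum('oc,chw->ohw', W1, x))
feature_maps = np.maximum(0.0, np.einsum('oc,chw->ohw', W2, hidden))

# Global average pooling: one score per feature map, with no trainable weights.
scores = feature_maps.mean(axis=(1, 2))
```

Because the pooling step has no parameters, it removes the large weight matrices that fully connected classifier layers would otherwise need.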

5.4 Region-based Convolutional Neural Networks

Girshick et al. (2014) proposed Region-based Convolutional Neural Networks (R-CNN) that use regions for recognition. R-CNN uses regions to locate and segment targets. The architecture consists of three modules: a category-independent region proposal that defines a set of candidate regions, a large convolutional neural network (CNN) that extracts features from the regions, and a set of class-specific linear support vector machines (SVM).

5.4.1 Fast R-CNN

Girshick (2015) proposed Fast Region-based Convolutional Networks (Fast R-CNN). This method leverages the R-CNN architecture to generate results quickly. Fast R-CNN consists of convolutional layers, pooling layers, a region proposal layer, and a series of fully connected layers.

5.4.2 Faster R-CNN

Ren et al. (2015) proposed Faster Region-based Convolutional Neural Networks (Faster R-CNN), which use a Region Proposal Network (RPN) for real-time object detection. RPN is a fully convolutional network capable of accurately and efficiently generating region proposals (Ren et al., 2015).

5.4.3 Mask R-CNN

He et al. (2017) proposed Mask R-CNN for object instance segmentation. Mask R-CNN extends the Faster R-CNN architecture with an additional branch that predicts object masks.

5.4.4 Multi-Expert R-CNN

Lee et al. (2017) proposed Multi-Expert Region-based Convolutional Neural Networks (ME R-CNN), leveraging the Fast R-CNN architecture. ME R-CNN generates regions of interest (RoI) from selective and exhaustive searches. It also uses per-RoI multi-expert networks instead of a single per-RoI network. Each expert is the same architecture as the fully connected layer from Fast R-CNN.

5.5 Deep Residual Networks

He et al. (2015) proposed Residual Networks (ResNet) consisting of 152 layers. ResNet has lower errors and is easy to train through residual learning. Deeper ResNets can achieve better performance. In the field of deep learning, ResNet is considered a significant advancement.
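
The idea of residual learning is that a block learns a correction F(x) that is added to its input through a shortcut connection. The sketch below is a simplification under stated assumptions: fully connected layers stand in for the convolutions, and batch normalization is left out.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = W2 @ relu(W1 @ x)     # the residual function F(x)
    return relu(f + x)        # the identity shortcut adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W1 = rng.normal(size=(5, 5))
W2 = np.zeros((5, 5))         # with F(x) = 0 the block reduces to relu(x)
y = residual_block(x, W1, W2)
```

With zero weights in the last layer the block passes its input straight through, which is one intuition for why adding more such blocks does not make a network harder to fit.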

5.5.1 ResNet in ResNet

Targ et al. (2016) proposed ResNet in ResNet (RiR), which integrates ResNets and standard convolutional neural networks (CNN) into a deep dual-stream architecture.

5.5.2 ResNeXt

Xie et al. (2016) proposed the ResNeXt architecture. ResNeXt builds on ResNets and reuses the split-transform-merge strategy.

5.6 Capsule Networks

Sabour et al. (2017) proposed Capsule Networks (CapsNet), an architecture with two convolutional layers and one fully connected layer. A CapsNet typically contains several convolutional layers, with the capsule layers at the end. CapsNet is considered one of the latest breakthroughs in deep learning, as it is said to address the limitations of convolutional neural networks. It uses layers of capsules instead of neurons. Lower-level capsules make predictions, and a higher-level capsule becomes active when multiple predictions agree. A routing-by-agreement mechanism is used in these capsule layers. Hinton later proposed EM routing, improving CapsNet using the Expectation-Maximization (EM) algorithm.

5.7 Recurrent Neural Networks

Recurrent Neural Networks (RNN) are better suited to sequential inputs, such as speech and text, and to sequence generation. When unfolded over time, a recurrent hidden unit can be viewed as a very deep feedforward network that uses the same weights at every step. Because of the vanishing and exploding gradient problems, RNNs were once difficult to train. Many improvements have since been proposed to address this issue.

Goodfellow et al. (2016) provided a detailed analysis of recurrent and recursive neural networks and architectures, as well as related gated and memory networks.

Karpathy et al. (2015) used character-level language models to analyze and visualize the predictions, representations, training dynamics, and error types of RNNs and their variants (such as LSTM).

Józefowicz et al. (2016) explored RNN models and their limits for language modeling.
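
Unfolding over time can be shown with a plain forward pass. This is a minimal sketch of a simple tanh RNN with assumed sizes and random weights; the loop body is the "layer" that is repeated once per time step with identical parameters.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple tanh RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        # The same W_xh, W_hh and b_h are reused at every time step.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(6, 3))            # sequence of 6 steps, 3 features per step
W_xh = rng.normal(scale=0.5, size=(4, 3))   # input-to-hidden weights
W_hh = rng.normal(scale=0.5, size=(4, 4))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)
states = rnn_forward(inputs, W_xh, W_hh, b_h)
```

During training, gradients pass through `W_hh` once per step, so they can shrink or grow geometrically with sequence length, which is the source of the training difficulty mentioned above.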

5.7.1 RNN-EM

Peng and Yao (2015) proposed using external memory (RNN-EM) to enhance the memory capacity of RNNs. They claimed to achieve state-of-the-art results in language understanding, outperforming other RNNs.

5.7.2 GF-RNN

Chung et al. (2015) proposed Gated Feedback Recurrent Neural Networks (GF-RNN), which extend standard RNNs by stacking multiple recurrent layers with global gating units.

5.7.3 CRF-RNN

Zheng et al. (2015) proposed Conditional Random Fields as Recurrent Neural Networks (CRF-RNN), which combine convolutional neural networks (CNN) and conditional random fields (CRF) for probabilistic graphical modeling.

5.7.4 Quasi-RNN

Bradbury et al. (2016) proposed Quasi-Recurrent Neural Networks (QRNN) for neural sequence modeling, which apply in parallel across time steps.

5.8 Memory Networks

Weston et al. (2014) proposed Memory Networks for question answering (QA). A memory network consists of a memory together with four components: input feature mapping, generalization, output feature mapping, and response.

5.8.1 Dynamic Memory Networks

Kumar et al. (2015) proposed Dynamic Memory Networks (DMN) for QA tasks. DMN consists of four modules: input, question, episodic memory, and output.

5.9 Augmented Neural Networks

Olah and Carter (2016) gave a clear demonstration of attention and augmented recurrent neural networks, namely Neural Turing Machines (NTM), attention interfaces, the Neural Programmer, and adaptive computation time. Augmented neural networks typically add extra capabilities, such as logical functions, to standard neural network architectures.

5.9.1 Neural Turing Machines

Graves et al. (2014) proposed Neural Turing Machines (NTM) architecture, consisting of a neural network controller and a memory bank. NTMs typically combine RNNs with external memory banks.

5.9.2 Neural GPUs

Kaiser and Sutskever (2015) proposed Neural GPUs, which address the parallelism problem of NTMs.

5.9.3 Neural Random Access Machines

Kurach et al. (2015) proposed Neural Random Access Machines, which use external variable-sized random access memory.

5.9.4 Neural Programmers

Neelakantan et al. (2015) proposed the Neural Programmer, an augmented neural network with arithmetic and logical functions.

5.9.5 Neural Programmer-Interpreter

Reed and de Freitas (2015) proposed a learnable Neural Programmer-Interpreter (NPI), which consists of a recurrent core, a program memory, and domain-specific encoders.

5.10 Long Short-Term Memory Networks

Hochreiter and Schmidhuber (1997) proposed Long Short-Term Memory (LSTM), which overcomes the error back-flow problem of recurrent neural networks (RNN). LSTM is based on recurrent networks and gradient-based learning algorithms, and it introduces self-loops that create paths along which gradients can flow.
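
One step of an LSTM cell can be sketched as below. The text does not spell out the equations, so this uses the widely used formulation with input, forget, and output gates as an assumption (the forget gate was added after the 1997 paper); the weight layout and names are also illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to the four stacked gate pre-activations."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:n])              # input gate
    f = sigmoid(z[n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])      # candidate cell update
    c = f * c_prev + i * g           # additive self-loop on the cell state
    h = o * np.tanh(c)               # hidden state exposed to the rest of the network
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
```

The line `c = f * c_prev + i * g` is the self-loop: when the forget gate is close to 1, the cell state (and the gradient flowing through it) is carried across steps almost unchanged.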

Greff et al. (2017) conducted a large-scale analysis of the standard LSTM and 8 LSTM variants on speech recognition, handwriting recognition, and polyphonic music modeling. They reported that none of the 8 variants improved significantly on the standard LSTM, which already performed well.

Shi et al. (2016b) proposed Deep Long Short-Term Memory Networks (DLSTM), a stack of LSTM units that learns feature-mapping representations.

5.10.1 Batch-Normalized LSTM

Cooijmans et al. (2016) proposed Batch-Normalized LSTM (BN-LSTM), which applies batch normalization to the hidden states of recurrent neural networks.

5.10.2 Pixel RNN

van den Oord et al. (2016b) proposed Pixel Recurrent Neural Networks (Pixel-RNN), consisting of up to 12 two-dimensional LSTM layers.

5.10.3 Bidirectional LSTM

Wöllmer et al. (2010) proposed Bidirectional LSTM (BLSTM) recurrent networks used in conjunction with Dynamic Bayesian Networks (DBN) for context-sensitive keyword detection.

5.10.4 Variational Bi-LSTM

Shabanian et al. (2017) proposed Variational Bi-LSTM, a variant of the bidirectional LSTM architecture. Variational Bi-LSTM creates an information exchange channel between LSTMs using Variational Autoencoders (VAE) to learn better representations.

5.11 Google Neural Machine Translation

Wu et al. (2016) proposed an automatic translation system called Google Neural Machine Translation (GNMT), which combines encoder networks, decoder networks, and attention networks, following a common sequence-to-sequence learning framework.

5.12 Fader Networks

Lample et al. (2017) proposed Fader Networks, a new type of encoder-decoder architecture that generates realistic input image variations by changing attribute values.

5.13 Hyper Networks

Ha et al. (2016) proposed Hyper Networks, which generate the weights for other neural networks, such as static hypernetworks for convolutional networks and dynamic hypernetworks for recurrent networks.

Deutsch (2018) used hyper-networks to generate neural networks.

5.14 Highway Networks

Srivastava et al. (2015) proposed Highway Networks, which learn to manage information using gating units. Information flow across multiple layers is referred to as the information highway.
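
A single highway layer can be sketched as a gated mix of a transformed input and the untouched input. The sketch assumes the coupled form in which the carry gate equals one minus the transform gate; weight names and the tanh transform are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T * H(x) + (1 - T) * x, with a learned transform gate T."""
    H = np.tanh(W_h @ x + b_h)        # candidate transformation of the input
    T = sigmoid(W_t @ x + b_t)        # transform gate, one value in (0, 1) per unit
    return T * H + (1.0 - T) * x      # the remaining (1 - T) carries the input through

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_h = rng.normal(size=(4, 4))
W_t = np.zeros((4, 4))
b_h = np.zeros(4)

y_carry = highway_layer(x, W_h, b_h, W_t, np.full(4, -50.0))      # gate closed: y ~ x
y_transform = highway_layer(x, W_h, b_h, W_t, np.full(4, 50.0))   # gate open: y ~ H(x)
```

A strongly negative gate bias makes the layer behave like an identity mapping, so information can pass unchanged across many layers.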

5.14.1 Recurrent Highway Networks

Zilly et al. (2017) proposed Recurrent Highway Networks (RHN), which extend the Long Short-Term Memory (LSTM) architecture. RHN uses Highway layers in the recurrent transitions.

5.15 Highway LSTM RNN

Zhang et al. (2016) proposed Highway Long Short-Term Memory (HLSTM) RNN, which extends deep LSTM networks with gated direct connections (i.e., Highways) between the memory cells of adjacent layers.

5.16 Long-Term Recurrent CNN

Donahue et al. (2014) proposed Long-Term Recurrent Convolutional Networks (LRCN), which use a CNN to process the input and then an LSTM for recurrent sequence modeling and prediction generation.

5.17 Deep Neural SVM

Zhang et al. (2015) proposed Deep Neural SVM (DNSVM), which uses a Support Vector Machine (SVM) as the top classification layer of a deep neural network (DNN).

5.18 Convolutional Residual Memory Networks

Moniz and Pal (2016) proposed Convolutional Residual Memory Networks, incorporating memory mechanisms into convolutional neural networks (CNN). It enhances convolutional residual networks with a long short-term memory mechanism.

5.19 Fractal Networks

Larsson et al. (2016) proposed Fractal Networks, or FractalNet, as an alternative to residual networks. They claim to train ultra-deep neural networks without requiring residual learning. Fractals are repeated architectures generated by a simple expansion rule.

5.20 WaveNet

van den Oord et al. (2016) proposed a deep neural network called WaveNet for generating raw audio. WaveNet consists of a stack of convolutional layers and softmax distribution layers for output.

Rethage et al. (2017) proposed a WaveNet model for speech denoising.

5.21 Pointer Networks

Vinyals et al. (2017) proposed Pointer Networks (Ptr-Nets), which address the problem of representing variable-size output dictionaries by using a softmax probability distribution as a “pointer.”
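
The pointer can be sketched as an attention score over the input positions that is normalized with a softmax, so the number of possible outputs always equals the input length. The additive scoring form, the random parameters, and all names below are assumptions for illustration only.

```python
import numpy as np

def pointer_distribution(encoder_states, decoder_state, W1, W2, v):
    """Softmax over input positions: the 'pointer' selects one element of the input."""
    scores = np.tanh(encoder_states @ W1.T + decoder_state @ W2.T) @ v
    scores = scores - scores.max()      # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
d, k = 4, 6                             # state size and attention size (hypothetical)
W1 = rng.normal(size=(k, d))
W2 = rng.normal(size=(k, d))
v = rng.normal(size=k)
decoder_state = rng.normal(size=d)

p_short = pointer_distribution(rng.normal(size=(5, d)), decoder_state, W1, W2, v)
p_long = pointer_distribution(rng.normal(size=(9, d)), decoder_state, W1, W2, v)
```

The same parameters produce a distribution over 5 positions for the first input and over 9 for the second, which is how the output dictionary grows with the input.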

6 Deep Generative Models

In this section, we will briefly discuss other deep architectures that utilize multiple abstract layers and representation layers similar to deep neural networks, also known as deep generative models (DGM). Bengio (2009) explained deep architectures such as Boltzmann machines (BM) and Restricted Boltzmann Machines (RBM) and their variants.

Goodfellow et al. (2016) provided detailed explanations of deep generative models, such as restricted and unrestricted Boltzmann machines and their variants, deep Boltzmann machines, deep belief networks (DBN), directed generative networks, and generative stochastic networks.

Maaløe et al. (2016) proposed Auxiliary Deep Generative Models, in which they extended deep generative models with auxiliary variables. Auxiliary variables utilize stochastic layers and skip connections to generate variational distributions.

Rezende et al. (2016) developed a class of deep generative models capable of one-shot generalization.

6.1 Boltzmann Machines

Boltzmann machines are a connectionist approach to learning arbitrary probability distributions, using the principle of maximum likelihood for learning.

6.2 Restricted Boltzmann Machines

Restricted Boltzmann Machines (RBM) are a special type of Markov random field that contains one layer of stochastic hidden units, i.e., latent variables, and one layer of observable variables.
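
Because an RBM has no connections within a layer, each layer can be sampled in one shot given the other. The sketch below assumes binary units and random, untrained parameters, and shows a single step of block Gibbs sampling; it is not a training procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v, W, b_vis, b_hid, rng):
    """Sample hidden units given visible units, then visible units given hidden units."""
    p_h = sigmoid(b_hid + v @ W)                        # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)     # stochastic hidden layer
    p_v = sigmoid(b_vis + W @ h)                        # p(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h, p_h

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))   # weights between the two layers only
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
v0 = rng.integers(0, 2, size=n_vis).astype(float)
v1, h0, p_h0 = gibbs_step(v0, W, b_vis, b_hid, rng)
```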

Hinton and Salakhutdinov (2011) proposed a deep generative model for document processing using restricted Boltzmann machines (RBM).

6.3 Deep Belief Networks

Deep Belief Networks (DBN) are generative models with multiple layers of latent binary or real-valued variables.

Ranzato et al. (2011) established a deep generative model for image recognition using Deep Belief Networks (DBN).

6.4 Deep Lambertian Networks

Tang et al. (2012) proposed Deep Lambertian Networks (DLN), a multi-layer generative model where the latent variables are reflectance, surface normals, and light sources. DLN is a combination of Lambertian reflectance with Gaussian restricted Boltzmann machines and deep belief networks.

6.5 Generative Adversarial Networks

Goodfellow et al. (2014) proposed Generative Adversarial Networks (GAN), which estimate generative models through an adversarial process. The GAN architecture pits a generative model against an adversary, i.e., a discriminative model that learns to tell samples of the model distribution from samples of the data distribution. Mao et al. (2016) and Kim et al. (2017) proposed further improvements to GAN.

Salimans et al. (2016) proposed several methods for training GANs.

6.5.1 Laplacian Generative Adversarial Networks

Denton et al. (2015) proposed a deep generative model (DGM) called Laplacian Generative Adversarial Networks (LAPGAN), using the generative adversarial network (GAN) method. This model also utilizes convolutional networks within the Laplacian pyramid framework.
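To make the Laplacian pyramid framework concrete, here is a minimal sketch of building and inverting such a pyramid. It is a simplification under stated assumptions: 2x2 averaging and nearest-neighbour repetition stand in for the usual blur-and-resample operators, the 8x8 image is random, and the generative networks that LAPGAN places at each pyramid level are omitted entirely.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarsest image]."""
    pyramid = []
    current = img
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append(current - upsample(coarse))  # detail lost by downsampling
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid, going from the coarsest level to the finest."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current

rng = np.random.default_rng(0)
image = rng.random((8, 8))                          # hypothetical grayscale image
pyr = laplacian_pyramid(image, levels=2)
print([level.shape for level in pyr])
print(np.allclose(reconstruct(pyr), image))
```

In a LAPGAN-style model, the residual at each level would be produced by a generator conditioned on the upsampled coarser image rather than computed from a known image as it is here.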

6.6 Recurrent Support Vector Machines

Shi et al. (2016a) proposed Recurrent Support Vector Machines (RSVM), which use a recurrent neural network (RNN) to extract features from the input sequence and a standard support vector machine (SVM) for sequence-level objective discrimination.

7 Training and Optimization Techniques

In this section, we will briefly outline some key techniques used for regularization and optimization of deep neural networks (DNN).

7.1 Dropout

(There are too many extensions, such as DropConnect, etc. …, to list here.)

Srivastava et al. (2014) proposed Dropout to prevent overfitting in neural networks. Dropout is a model-averaging regularization method that adds noise to a network's hidden units: during training, it randomly drops units, together with their connections, from the network. Dropout can be applied to graphical models such as RBMs (Srivastava et al., 2014) and to any type of neural network. A recent improvement on Dropout is Fraternal Dropout, proposed for recurrent neural networks (RNN).
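A minimal sketch of the idea follows. It uses the "inverted" variant that is common in practice, which rescales the surviving units during training; the original paper instead scales the weights at test time. The function name, drop probability, and input array are illustrative assumptions.

```python
import numpy as np

def dropout(x, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return x
    keep = rng.random(x.shape) >= p_drop
    return x * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))            # hypothetical hidden activations
print(dropout(activations, p_drop=0.5, rng=rng))
print(dropout(activations, p_drop=0.5, rng=rng, training=False))
```

Each training pass samples a different mask, so the network effectively trains many thinned sub-networks that share weights, which is where the model-averaging interpretation comes from.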

7.2 Maxout

Goodfellow et al. (2013) proposed Maxout, a new activation function designed to be used together with Dropout. A Maxout unit outputs the maximum over a set of inputs, which benefits Dropout's model averaging.
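The sketch below shows one way to write a Maxout layer in which each output unit takes the maximum over k affine functions of the input. The tensor shapes and the choice k = 3 are assumptions made for the example.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each output unit is the max over k affine pieces.

    x: (batch, d_in); W: (k, d_in, d_out); b: (k, d_out).
    """
    z = np.einsum("bi,kio->bko", x, W) + b   # all k affine pieces: (batch, k, d_out)
    return z.max(axis=1)                     # max over the k pieces

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))              # hypothetical batch of 2 inputs
W = rng.standard_normal((3, 4, 5))           # k = 3 pieces, 4 inputs, 5 outputs
b = np.zeros((3, 5))
print(maxout(x, W, b).shape)
```

With two pieces whose weights are the identity and its negation, the same layer computes the absolute value, which shows how Maxout can represent familiar piecewise-linear activations.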

7.3 Zoneout

Krueger et al. (2016) proposed Zoneout, a regularization method for recurrent neural networks (RNN). Like Dropout, Zoneout injects random noise during training, but instead of dropping hidden units it randomly preserves their values from the previous time step.
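A minimal sketch of the update rule is given below, assuming that h_prev is the hidden state from the previous time step and h_new is the state the recurrent cell has just computed. The names and the zoneout probability are illustrative.

```python
import numpy as np

def zoneout(h_prev, h_new, p_zone, rng, training=True):
    """Zoneout: with probability p_zone a hidden unit keeps its previous value;
    otherwise it takes the newly computed one."""
    if training:
        keep_old = rng.random(h_prev.shape) < p_zone
        return np.where(keep_old, h_prev, h_new)
    # At test time, use the expected value of the stochastic update.
    return p_zone * h_prev + (1.0 - p_zone) * h_new

rng = np.random.default_rng(0)
h_prev = np.zeros(6)                      # hypothetical previous hidden state
h_new = np.ones(6)                        # hypothetical candidate state
print(zoneout(h_prev, h_new, p_zone=0.3, rng=rng))
print(zoneout(h_prev, h_new, p_zone=0.3, rng=rng, training=False))
```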

7.4 Deep Residual Learning

He et al. (2015) proposed a deep residual learning framework whose networks, known as ResNets, achieve lower training error.
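The core idea of residual learning is that a block learns a residual function F(x) and adds the input back through an identity shortcut, y = F(x) + x. The sketch below is a simplified version in which fully connected layers stand in for the convolutions used in ResNets; the weights and sizes are placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F is a small two-layer transform."""
    f = relu(x @ W1) @ W2      # residual function F(x)
    return relu(f + x)         # identity shortcut, then the final ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))            # hypothetical batch of 2 inputs
W1 = 0.1 * rng.standard_normal((4, 4))
W2 = 0.1 * rng.standard_normal((4, 4))
print(residual_block(x, W1, W2).shape)
```

If the weights of F are zero, the block reduces to passing its input through the final ReLU, so extra blocks can fall back to a near-identity mapping rather than degrading the signal.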

7.5 Batch Normalization

(Including various variants of Bn and bnd…)

Ioffe and Szegedy (2015) proposed Batch Normalization, a method to accelerate deep neural network training by reducing internal covariate shift. Ioffe (2017) proposed Batch Re-normalization, extending previous methods.
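As a sketch of the training-time computation, the function below normalizes each feature over the mini-batch and then applies a learned scale and shift. The parameter names gamma and beta and the input data are assumptions for the example, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift.

    x: (batch, features); gamma, beta: (features,).
    """
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal((32, 4)) + 5.0  # hypothetical activations
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))
```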

7.6 Distillation

Hinton et al. (2015) proposed distillation, a method for transferring knowledge from an ensemble of highly regularized models (i.e., neural networks) into a smaller, compressed model.
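One common way to express this transfer is to train the small model to match the large model's class probabilities after both sets of logits are softened with a temperature T. The sketch below computes only that soft-target term; the logits and the temperature are made-up values, and the usual additional loss on the true labels is omitted.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between temperature-softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -np.mean(np.sum(p_teacher * log_p_student, axis=-1))

teacher_logits = np.array([[2.0, 1.0, 0.1]])  # hypothetical teacher output
student_logits = np.array([[0.5, 0.5, 0.5]])  # hypothetical student output
print(distillation_loss(student_logits, teacher_logits))
```

A higher temperature flattens the teacher's distribution, so the relative probabilities of the non-target classes carry more weight in the loss.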

7.7 Layer Normalization

Ba et al. (2016) proposed Layer Normalization to speed up the training of deep neural networks, especially RNNs, and to address the limitations of batch normalization.
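In contrast to batch normalization, layer normalization computes its statistics over the features of each individual example, so it does not depend on the batch size. A minimal sketch, with the parameter names gain and bias and the input chosen for illustration:

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each example over its feature axis, then scale and shift.

    x: (batch, features); gain, bias: (features,).
    """
    mean = x.mean(axis=-1, keepdims=True)     # per-example mean
    var = x.var(axis=-1, keepdims=True)       # per-example variance
    return gain * (x - mean) / np.sqrt(var + eps) + bias

x = np.arange(8.0).reshape(1, 8)              # hypothetical single example
out = layer_norm(x, gain=np.ones(8), bias=np.zeros(8))
print(out.mean(axis=-1), out.std(axis=-1))
```

The example uses a batch of one, a case in which per-batch statistics would be degenerate but per-example statistics are still well defined. This is part of why the method suits RNNs, where the same normalization can be applied at every time step.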

8 Deep Learning Frameworks

There are numerous open-source libraries and frameworks available for deep learning. Most of them are built for the Python programming language, for example Theano, TensorFlow, PyTorch, PyBrain, Caffe, Blocks and Fuel, cuDNN, Honk, ChainerCV, Pylearn2, Chainer, and Torch.

9 Applications of Deep Learning

In this section, we will briefly discuss some outstanding recent applications of deep learning. Since the inception of deep learning (DL), DL methods have been widely applied in many fields in the form of supervised, unsupervised, semi-supervised, or reinforcement learning. Starting from classification and detection tasks, DL applications are rapidly expanding into every domain.

For example:

Image classification and recognition
Video classification
Sequence generation
Defect classification
Text, speech, image, and video processing
Text classification
Speech processing
Speech recognition and understanding
Text-to-speech generation
Query classification
Sentence classification
Sentence modeling
Vocabulary processing
Premise selection
Document and sentence processing
Generating image captions
Photo style transfer
Natural image manifolds
Image colorization
Image question answering
Generating textures and stylized images
Visual and text question answering
Visual recognition and description
Object recognition
Document processing
Character action synthesis and editing
Song synthesis
Identity recognition
Face recognition and verification
Video action recognition
Human action recognition
Action recognition
Classification and visualization of motion capture sequences
Handwriting generation and prediction
Automated and machine translation
Named entity recognition
Mobile vision
Conversational agents
Genetic variant calling
Cancer detection
X-ray CT reconstruction
Seizure prediction
Hardware acceleration
Robotics

etc.

Deng and Yu (2014) provided a detailed list of DL applications in speech processing, information retrieval, object recognition, computer vision, multimodal, and multi-task learning.

Using deep reinforcement learning (DRL) to master games has become a hot topic. AI agents built with DNNs and DRL now defeat human world champions and chess grandmasters in strategy and other games after only a few hours of training. Examples include AlphaGo and AlphaGo Zero in Go.

10 Discussion

Despite the tremendous success of deep learning in many fields, there is still a long way to go, and many areas remain to be improved. There are quite a few examples of its limitations. For instance, Nguyen et al. showed that deep neural networks (DNN) are easily fooled when recognizing images. Other issues remain as well, such as the transferability of learned features studied by Yosinski et al. Huang et al. proposed an architecture for adversarial attacks on neural networks and argued that future work is needed on defenses against such attacks. Zhang et al. proposed an experimental framework for understanding deep learning models, arguing that understanding deep learning requires rethinking generalization.

Marcus provided an important review in 2018 of the role, limitations, and nature of deep learning (DL). He strongly pointed out the limitations of DL methods: they require more data, have limited capacity, cannot handle hierarchical structure, struggle with open-ended reasoning, are not sufficiently transparent, are not well integrated with prior knowledge, and cannot distinguish causation from correlation. He also noted that DL assumes a stable world, works only as an approximation, is difficult to engineer, and carries the risk of being overhyped. Marcus believes that DL needs to be reconceptualized, and that new possibilities should be sought in unsupervised learning, symbol manipulation, and hybrid models, drawing on insights from cognitive science and psychology and taking on bolder challenges.

11 Conclusion

Although deep learning (DL) is moving the world forward faster than ever, many aspects are still worth researching. We still do not fully understand deep learning, how to make machines as smart as or smarter than humans, or how to make them learn the way humans do. DL has been solving many problems while bringing technology into many areas of life. However, humanity still faces many challenges, such as starvation and food crises, cancer, and other fatal diseases. We hope that deep learning and artificial intelligence will be devoted more to improving the quality of human life by taking on the most challenging scientific research. Last but not least, may our world become a better place.

There are some omissions, but overall this is a good summary.

Source:

https://zhuanlan.zhihu.com/p/85625555

Editor: Yu Tengkai

Proofreader: Tan Jiayao