A Comprehensive Overview of Deep Learning for Beginners

This article is reproduced from Machine Heart. This overview paper lists important research achievements in deep learning in recent years, covering methods, architectures, and regularization and optimization techniques. Machine Heart believes that this overview is a good reference for beginners in deep learning, helping them form a basic picture of the field and guiding literature searches.

Paper: Recent Advances in Deep Learning: An Overview


Paper link: https://arxiv.org/pdf/1807.08169v1.pdf

Abstract: Deep learning is one of the latest trends in machine learning and artificial intelligence research. It is also one of the most popular scientific research trends today. Deep learning methods have brought revolutionary advances to computer vision and machine learning. New deep learning techniques are constantly emerging, surpassing state-of-the-art machine learning and even existing deep learning technologies. In recent years, there have been many significant breakthroughs in this field worldwide. As deep learning is rapidly developing, its progress is difficult to keep up with, especially for new researchers. In this paper, we will briefly discuss the latest advances in deep learning in recent years.

1. Introduction

The term “deep learning” (DL) was first introduced to machine learning (ML) in 1986, and later applied to artificial neural networks (ANN) in 2000. Deep learning methods consist of multiple layers that learn data features at multiple levels of abstraction. DL methods allow computers to learn complex concepts from relatively simple ones. For artificial neural networks (ANN), deep learning (DL), also known as hierarchical learning, refers to accurately assigning credit across many computational stages in order to transform the aggregate activation of the network. To learn complex functions, deep architectures with multiple levels of abstraction (i.e., nonlinear operations) are used, such as ANNs with many hidden layers. In summary, deep learning is a subfield of machine learning that uses multi-level nonlinear information processing and abstraction for supervised or unsupervised feature learning, representation, classification, and pattern recognition.

Deep learning, or representation learning, is a branch or subfield of machine learning, and most people believe that modern deep learning methods began to develop around 2006. This paper is a review of the latest deep learning technologies, mainly recommended for researchers who are about to enter this field. It includes the basic ideas of DL, main methods, latest advancements, and applications.

Review papers are very beneficial, especially for new researchers in a specific field. When a research area has significant value in the near future and wide-ranging applications, it is often difficult to track its latest developments in real time. Nowadays, scientific research is an attractive profession because knowledge and education are more accessible than ever. The only reasonable assumption about a trending technical research area is that it will see improvements in many aspects, so an overview of a field written several years ago may already be outdated.

Considering the popularity and promotion of deep learning in recent years, we briefly summarize deep learning and neural networks (NN), as well as their major advancements and breakthroughs over the years. We hope this article will help many novice researchers gain a comprehensive understanding of recent deep learning research and technologies in this field and guide them to start in the right way. At the same time, we hope to pay tribute to the top DL and ANN researchers of this era: Geoffrey Hinton, Juergen Schmidhuber, Yann LeCun, Yoshua Bengio, and many other researchers whose work has built modern artificial intelligence (AI). Keeping up with their work to track current best DL and ML research progress is also crucial for us.

In this paper, we will first briefly describe past research papers and study the models and methods of deep learning. Then, we will begin to describe the latest advancements in this field. We will discuss deep learning (DL) methods, deep architectures (i.e., deep neural networks (DNN)), and deep generative models (DGM), followed by important regularization and optimization methods. Additionally, we will summarize open-source DL frameworks and important DL applications in two brief sections. We will discuss the current state and future of deep learning in the last two chapters (i.e., discussion and conclusion).

2. Related Research

In recent years, there have been many review papers on deep learning. They describe DL methods, methodologies, and their applications and future research directions in a good way. Here, we briefly introduce some excellent review papers on deep learning.

Young et al. (2017) discuss DL models and architectures, mainly used for natural language processing (NLP). They showcase DL applications in different NLP areas, compare DL models, and discuss possible future trends.

Zhang et al. (2017) discuss the current best deep learning techniques for front-end and back-end speech recognition systems.

Zhu et al. (2017) review the latest advancements in DL remote sensing technology. They also discuss open-source DL frameworks and other technical details of deep learning.

Wang et al. (2017) describe the evolution of deep learning models in chronological order. This short paper briefly introduces models and breakthroughs in DL research. It provides an evolutionary understanding of the origins of deep learning and interprets the optimization of neural networks and future research.

Goodfellow et al. (2016) discuss deep networks and generative models in detail, summarizing recent DL research and applications starting from the fundamentals of machine learning (ML) and the advantages and disadvantages of deep architectures.

LeCun et al. (2015) overview deep learning (DL) models from convolutional neural networks (CNN) and recurrent neural networks (RNN). They describe DL from the perspective of representation learning, showcasing how DL technologies work, how they are successfully used in various applications, and how to learn based on unsupervised learning (UL) to predict the future. They also point out the main advancements in DL in the literature.

Schmidhuber (2015) provides an overview of deep learning covering CNNs, RNNs, and deep reinforcement learning (RL). He emphasizes RNNs for sequence processing, while noting the limitations of basic DL and NN and techniques to improve them.

Nielsen (2015) describes the details of neural networks with code and examples. He also discusses deep neural networks and deep learning to some extent.

Schmidhuber (2014) discusses time-series neural networks, classification using machine learning methods, and the history and progress of deep learning in neural networks.

Deng and Yu (2014) describe the categories and techniques of deep learning, as well as its applications in several fields.

Bengio (2013) briefly outlines DL algorithms from the perspective of representation learning, namely supervised and unsupervised networks, optimization, and training models. He focuses on many challenges of deep learning, such as scaling algorithms for larger models and data, reducing optimization difficulties, and designing effective scaling methods.

Bengio et al. (2013) discuss representation and feature learning, i.e., deep learning. They explore various methods and models from the perspectives of applications, techniques, and challenges.

Deng (2011) provides an overview of deep structured learning and its architectures from the perspective of information processing and related fields.

Arel et al. (2010) briefly summarize recent DL technologies.

Bengio (2009) discusses deep architectures, namely neural networks and generative models in artificial intelligence.

All recent papers on deep learning (DL) discuss its key points from multiple perspectives, which is very valuable for DL researchers. However, DL is currently a thriving field: many new techniques and architectures have been proposed since the recent overview papers were published, and previous overviews approach the subject from different angles. Our paper primarily targets learners and novices who are just entering the field. To this end, we strive to provide new researchers and anyone interested in the field with a foundation and clear concepts of deep learning.

3. Latest Advances

In this section, we will discuss the main deep learning (DL) methods derived from machine learning and artificial neural networks (ANN), with artificial neural networks being the most commonly used form of deep learning.

3.1 Evolution of Deep Architectures

Artificial neural networks (ANN) have made significant progress, leading to other deep models. The first generation of artificial neural networks consisted of simple perceptron neural layers that could only perform limited simple computations. The second generation used backpropagation to update the weights of neurons based on error rates. Then, support vector machines (SVM) emerged, surpassing ANN for a period. To overcome the limitations of backpropagation, restricted Boltzmann machines (RBM) were proposed to facilitate learning. At this time, other techniques and neural networks also emerged, such as feedforward neural networks (FNN), convolutional neural networks (CNN), recurrent neural networks (RNN), as well as deep belief networks, autoencoders, etc. Since then, ANN has been improved and designed for various purposes.

Schmidhuber (2014), Bengio (2009), Deng and Yu (2014), Goodfellow et al. (2016), Wang et al. (2017) provide detailed overviews of the evolution and history of deep neural networks (DNN) and deep learning (DL). In most cases, deep architectures are multilayer nonlinear repetitions of simple architectures, allowing for highly complex functions to be obtained from inputs.

4. Deep Learning Methods

Deep neural networks have achieved great success in supervised learning. Moreover, deep learning models are also very successful in unsupervised, semi-supervised, and reinforcement learning.

4.1 Deep Supervised Learning

Supervised learning is applied when data is labeled and the task is classification or numerical prediction. LeCun et al. (2015) provide a concise explanation of supervised learning methods and the formation of deep structures. Deng and Yu (2014) mention and explain many deep networks used for supervised and semi-supervised learning, such as deep stacked networks (DSN) and their variants. Schmidhuber (2014) covers all neural networks, from early neural networks to recently successful convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM), and their improvements.

4.2 Deep Unsupervised Learning

When input data is unlabeled, unsupervised learning methods can be applied to extract features from the data and classify or label it. LeCun et al. (2015) predict that unsupervised learning will play a larger role in the future of deep learning. Schmidhuber (2014) also describes neural networks for unsupervised learning. Deng and Yu (2014) briefly introduce deep architectures for unsupervised learning and explain deep autoencoders in detail.

4.3 Deep Reinforcement Learning

Reinforcement learning uses a system of rewards and penalties to predict the next step of the learning model. It is mainly used in games and robotics to solve sequential decision-making problems. Schmidhuber (2014) describes the progress of deep learning (DL) in reinforcement learning (RL) and the applications of deep feedforward neural networks (FNN) and recurrent neural networks (RNN) in RL. Li (2017) discusses deep reinforcement learning (DRL), its architectures (e.g., the Deep Q-Network, DQN), and applications in various fields.

Mnih et al. (2016) proposed a DRL framework that optimizes DNN using asynchronous gradient descent.

van Hasselt et al. (2015) proposed a DRL architecture using deep neural networks (DNN).

5. Deep Neural Networks

In this section, we will briefly discuss deep neural networks (DNN) and their recent improvements and breakthroughs. The functionality of neural networks is similar to that of the human brain. They mainly consist of neurons and connections. When we talk about deep neural networks, we can assume that there are quite a few hidden layers that can be used to extract features from inputs and compute complex functions. Bengio (2009) explains deep structured neural networks, such as convolutional neural networks (CNN), autoencoders (AE), and their variants. Deng and Yu (2014) provide a detailed introduction to some neural network architectures, such as AE and its variants. Goodfellow et al. (2016) introduce and technically explain deep feedforward networks, convolutional networks, recurrent networks, and their improvements. Schmidhuber (2014) mentions the complete history of neural networks from early neural networks to recently successful technologies.

5.1 Deep Autoencoders

Autoencoders (AE) are neural networks (NN) where the output is the same as the input. AE takes the raw input, encodes it into a compressed representation, and then decodes it to reconstruct the input. In deep AEs, lower hidden layers are used for encoding, and higher hidden layers are used for decoding, with error backpropagation used for training.
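To make the encode/decode structure concrete, here is a minimal PyTorch sketch of a deep autoencoder trained by backpropagation of the reconstruction error; the layer sizes and the MSE loss are illustrative assumptions, not prescribed by the survey.

    import torch
    import torch.nn as nn

    class DeepAutoencoder(nn.Module):
        """Minimal deep AE: lower layers encode, higher layers decode the input."""
        def __init__(self, input_dim=784, code_dim=32):
            super().__init__()
            # Encoder compresses the raw input into a small code.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, code_dim))
            # Decoder reconstructs the input from the code.
            self.decoder = nn.Sequential(
                nn.Linear(code_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Training minimizes the reconstruction error via backpropagation.
    model = DeepAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(64, 784)                      # a dummy batch of inputs
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)   # output should match the input
    loss.backward()
    optimizer.step()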

5.1.1 Variational Autoencoders

Variational autoencoders (VAE) can be considered as decoders. VAE builds on standard neural networks and can be trained through stochastic gradient descent (Doersch, 2016).

5.1.2 Stacked Denoising Autoencoders

In early autoencoders (AE), the dimensionality of the encoding layer is smaller than that of the input layer (narrow). In stacked denoising autoencoders (SDAE), the encoding layer is wider than the input layer (Deng and Yu, 2014).

5.1.3 Transforming Autoencoders

Deep autoencoders (DAE) can be made transformation-capable, meaning that the features extracted through multi-layer nonlinear processing can be changed according to the learner’s needs. Transforming autoencoders (TAE) can use either the input vector or the target output vector to apply transformation-invariance properties, guiding the code in a desired direction (Deng and Yu, 2014).

5.2 Deep Convolutional Neural Networks

Four basic ideas constitute convolutional neural networks (CNN): local connections, shared weights, pooling, and the use of many layers. The first part of a CNN consists of convolutional and pooling layers, while the latter part mainly comprises fully connected layers. Convolutional layers detect features through local connections, while pooling layers merge similar features into one. CNNs use convolution instead of matrix multiplication in the convolutional layers.
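As a hedged illustration of these ideas, the following PyTorch sketch stacks convolutional layers (local connections, shared weights), pooling layers, and a fully connected layer; the filter counts and the 28x28 grayscale input are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        """Illustrative CNN: conv + pooling front end, fully connected back end."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),  # local connections, shared weights
                nn.ReLU(),
                nn.MaxPool2d(2),                             # pooling merges similar features
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2))
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected back end

        def forward(self, x):
            x = self.features(x)                 # 28x28 input -> 32 feature maps of 7x7
            return self.classifier(x.flatten(1))

    logits = TinyCNN()(torch.rand(8, 1, 28, 28))  # dummy batch of 28x28 grayscale images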

Krizhevsky et al. (2012) proposed a deep convolutional neural network (CNN) architecture known as AlexNet, which is a significant breakthrough in deep learning (DL). The network consists of 5 convolutional layers and 3 fully connected layers. This architecture uses graphics processing units (GPUs) for convolution operations, employs the rectified linear unit (ReLU) as the activation function, and uses dropout to reduce overfitting.

Iandola et al. (2016) proposed a small CNN architecture called “SqueezeNet.”

Szegedy et al. (2014) proposed a deep CNN architecture named Inception. Dai et al. (2017) proposed improvements to Inception-ResNet.

Redmon et al. (2015) proposed a CNN architecture called YOLO (You Only Look Once) for unified, real-time object detection.

Zeiler and Fergus (2013) proposed a method to visualize the internal activations of CNNs.

Gehring et al. (2017) proposed a CNN architecture for sequence-to-sequence learning.

Bansal et al. (2017) proposed PixelNet, which uses pixels for representation.

Goodfellow et al. (2016) explain the basic architecture and ideas of CNN. Gu et al. (2015) provide a good overview of the latest advancements in CNN, various variants of CNN, CNN architectures, regularization methods, functionalities, and applications in various fields.

5.2.1 Deep Max-Pooling Convolutional Neural Networks

Max-pooling convolutional neural networks (MPCNN) mainly operate on convolution and max-pooling, especially in digital image processing. An MPCNN typically consists of three kinds of layers beyond the input layer. Convolutional layers take input images, generate feature maps, and apply a nonlinear activation function. Max-pooling layers downsample the image and retain the maximum value of each sub-region. Fully connected layers perform linear multiplication. In a deep MPCNN, convolutional and max-pooling layers are used periodically after the input layer, followed by the fully connected layers.

5.2.2 Very Deep Convolutional Neural Networks

Simonyan and Zisserman (2014) proposed a very deep convolutional neural network (VDCNN) architecture, also known as VGG Net. VGG Net uses very small convolutional filters, achieving depths of 16-19 layers. Conneau et al. (2016) proposed another VDCNN architecture for text classification, using small convolutions and pooling. They claim this VDCNN architecture is the first used in text processing, operating at the character level. This architecture consists of 29 convolutional layers.

5.3 Network in Network

Lin et al. (2013) proposed Network in Network (NIN). NIN replaces the convolutional layers of traditional convolutional neural networks (CNN) with micro neural networks of more complex structure. It uses multilayer perceptron convolution (MLPConv) layers as these micro networks, and global average pooling instead of fully connected layers. A deep NIN architecture can be built by stacking multiple such NIN structures.

5.4 Region-Based Convolutional Neural Networks

Girshick et al. (2014) proposed Region-Based Convolutional Neural Networks (R-CNN), which use regions for recognition. R-CNN uses regions to localize and segment objects. The architecture consists of three modules: a category-independent region proposal module that generates a set of candidate regions, a large convolutional neural network (CNN) that extracts features from each region, and a set of class-specific linear support vector machines (SVM).
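This three-module pipeline can be summarized as in the sketch below, where propose_regions, cnn_features, and svm_scores are hypothetical placeholders standing in for the category-independent proposal step, the CNN feature extractor, and the class-specific SVMs; none of these names come from the original paper.

    def rcnn_detect(image, propose_regions, cnn_features, svm_scores):
        """Hypothetical sketch of the R-CNN pipeline: propose, extract, classify."""
        detections = []
        for region in propose_regions(image):      # module 1: category-independent proposals
            crop = image.crop(region)              # crop/warp the candidate region
            feats = cnn_features(crop)             # module 2: CNN feature extraction
            scores = svm_scores(feats)             # module 3: class-specific linear SVMs
            best_class = max(scores, key=scores.get)
            detections.append((region, best_class, scores[best_class]))
        return detections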

5.4.1 Fast R-CNN

Girshick (2015) proposed Fast Region-Based Convolutional Networks (Fast R-CNN). This method utilizes the R-CNN architecture to generate results quickly. Fast R-CNN consists of convolutional layers and pooling layers, region proposal layers, and a series of fully connected layers.

5.4.2 Faster R-CNN

Ren et al. (2015) proposed Faster Region-Based Convolutional Neural Networks (Faster R-CNN), which use Region Proposal Networks (RPN) for real-time object detection. RPN is a fully convolutional network that can accurately and efficiently generate region proposals (Ren et al., 2015).

5.4.3 Mask R-CNN

He et al. (2017) proposed Mask Region-Based Convolutional Networks (Mask R-CNN) for instance segmentation. Mask R-CNN extends the architecture of R-CNN and uses an additional branch to predict object masks.

5.4.4 Multi-Expert R-CNN

Lee et al. (2017) proposed Multi-Expert Region-Based Convolutional Neural Networks (ME R-CNN), which build on the Fast R-CNN architecture. ME R-CNN generates regions of interest (RoI) from selective and exhaustive search. It also uses a per-RoI multi-expert network instead of a single per-RoI network, where each expert has the same architecture as the fully connected layers of Fast R-CNN.

5.5 Deep Residual Networks

He et al. (2015) proposed Residual Networks (ResNet) with up to 152 layers. ResNet has lower error and is easier to train thanks to residual learning. Deeper ResNets can achieve better performance. In the field of deep learning, ResNet is considered a significant advancement.

5.5.1 ResNet in ResNet

Targ et al. (2016) proposed ResNet in ResNet (RiR), which integrates ResNets and standard convolutional neural networks (CNN) into a deep dual-stream architecture.

5.5.2 ResNeXt

Xie et al. (2016) proposed the ResNeXt architecture. ResNeXt builds on ResNets, repeating layers that follow a split-transform-merge strategy.

5.6 Capsule Networks

Sabour et al. (2017) proposed Capsule Networks (CapsNet), an architecture consisting of two convolutional layers and a fully connected layer. A CapsNet typically contains several convolutional layers with a capsule layer at the end. CapsNet is considered one of the latest breakthroughs in deep learning, as it was proposed to address the limitations of convolutional neural networks. It uses layers of capsules instead of individual neurons: activated lower-level capsules make predictions, and when multiple predictions agree, a higher-level capsule becomes active. A routing-by-agreement mechanism is used between these capsule layers. Hinton later proposed EM routing, which improves CapsNet using the Expectation-Maximization (EM) algorithm.
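As a hedged sketch of routing-by-agreement, the snippet below assumes the prediction vectors u_hat from lower-level capsules have already been computed; the tensor shapes and the three routing iterations are illustrative choices following the general recipe of Sabour et al. (2017), not a faithful reimplementation.

    import torch
    import torch.nn.functional as F

    def squash(s, dim=-1, eps=1e-8):
        # Squash nonlinearity: short vectors shrink toward 0, long vectors toward length 1.
        sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
        return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

    def routing_by_agreement(u_hat, num_iters=3):
        # u_hat: (batch, num_in, num_out, dim_out) prediction vectors from lower capsules.
        b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
        for _ in range(num_iters):
            c = F.softmax(b, dim=2)                       # coupling coefficients over output capsules
            s = (c.unsqueeze(-1) * u_hat).sum(dim=1)      # weighted sum over input capsules
            v = squash(s)                                 # (batch, num_out, dim_out)
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)  # increase logits where predictions agree
        return v

    v = routing_by_agreement(torch.randn(2, 1152, 10, 16))   # dummy prediction vectors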

5.7 Recurrent Neural Networks

Recurrent Neural Networks (RNN) are better suited to sequential inputs such as speech and text, and to sequence generation. When unfolded in time, a recurrent hidden unit can be viewed as a very deep feedforward network whose layers all share the same weights. RNNs used to be difficult to train because of the vanishing and exploding gradient problems, and many improvements were later proposed to address this.
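The following minimal sketch unrolls a vanilla recurrent cell over time, showing why an unfolded RNN behaves like a very deep feedforward network with shared weights; the dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn

    class VanillaRNNCell(nn.Module):
        """One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.in2hid = nn.Linear(input_dim, hidden_dim)
            self.hid2hid = nn.Linear(hidden_dim, hidden_dim)

        def forward(self, x_t, h_prev):
            return torch.tanh(self.in2hid(x_t) + self.hid2hid(h_prev))

    cell = VanillaRNNCell(input_dim=8, hidden_dim=16)
    x = torch.rand(10, 4, 8)          # (time steps, batch, features)
    h = torch.zeros(4, 16)
    for t in range(x.size(0)):        # unrolling in time: the same weights at every step,
        h = cell(x[t], h)             # so gradients must pass through many repeated layers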

Goodfellow et al. (2016) provide a detailed analysis of the details of recurrent and recursive neural networks and architectures, as well as related gating and memory networks.

Karpathy et al. (2015) use character-level language models to analyze and visualize predictions, represent training dynamics, and types of errors in RNNs and their variants (such as LSTM).

Józefowicz et al. (2016) explore the limitations of RNN models and language models.

5.7.1 RNN-EM

Peng and Yao (2015) proposed RNN-EM, which uses external memory to improve the memory capacity of RNNs. They claim to have achieved state-of-the-art performance in language understanding, outperforming other RNNs.

5.7.2 GF-RNN

Chung et al. (2015) proposed Gated Feedback Recurrent Neural Networks (GF-RNN), which extend standard RNNs by stacking multiple recurrent layers with global gating units.

5.7.3 CRF-RNN

Zheng et al. (2015) proposed Conditional Random Fields as Recurrent Neural Networks (CRF-RNN), which combine convolutional neural networks (CNN) and conditional random fields (CRF) for probabilistic graphical modeling.

5.7.4 Quasi-RNN

Bradbury et al. (2016) proposed Quasi-Recurrent Neural Networks (QRNN) for neural sequence modeling and parallel applications over time steps.

5.8 Memory Networks

Weston et al. (2014) proposed Memory Networks for question answering (QA). Memory networks consist of a memory, an input feature map, generalization, an output feature map, and a response.

5.8.1 Dynamic Memory Networks

Kumar et al. (2015) proposed Dynamic Memory Networks (DMN) for QA tasks. A DMN has four modules: input, question, episodic memory, and output.

5.9 Augmented Neural Networks

Olah and Carter (2016) give an excellent demonstration of attention and augmented recurrent neural networks, namely Neural Turing Machines (NTM), attentional interfaces, neural programmers, and adaptive computation time. Augmented neural networks typically use extra properties, such as logic functions, along with standard neural network architectures.

5.9.1 Neural Turing Machines

Graves et al. (2014) proposed Neural Turing Machines (NTM), consisting of neural network controllers and memory banks. NTM typically combines RNNs with external memory banks.

5.9.2 Neural GPUs

Kaiser and Sutskever (2015) proposed Neural GPUs, addressing the parallelization issue of NTM.

5.9.3 Neural Random Access Machines

Kurach et al. (2015) proposed Neural Random Access Machines, which use external variable-sized random access memory.

5.9.4 Neural Programmers

Neelakantan et al. (2015) proposed Neural Programmers, an enhanced neural network with arithmetic and logical capabilities.

5.9.5 Neural Programmer-Interpreter

Reed and de Freitas (2015) proposed a learnable Neural Programmer-Interpreter (NPI). NPI includes a recurrent core, a program memory, and domain-specific encoders.

5.10 Long Short-Term Memory Networks

Hochreiter and Schmidhuber (1997) proposed Long Short-Term Memory (LSTM) to overcome the error back-flow problems (vanishing and exploding gradients) of recurrent neural networks (RNN). LSTM is based on recurrent networks and gradient-based learning, and it introduces self-recurrent paths through which gradients can flow over long time spans.
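For illustration, here is a minimal sketch of the standard LSTM gating equations (a common modern variant with input, forget, and output gates; the original 1997 formulation lacked the forget gate); the dimensions are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class LSTMCellSketch(nn.Module):
        """Standard LSTM gating: the cell state c gives gradients a self-recurrent path."""
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            # One linear map produces all four gate pre-activations at once.
            self.gates = nn.Linear(input_dim + hidden_dim, 4 * hidden_dim)

        def forward(self, x_t, h_prev, c_prev):
            z = self.gates(torch.cat([x_t, h_prev], dim=-1))
            i, f, g, o = z.chunk(4, dim=-1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
            g = torch.tanh(g)                      # candidate cell update
            c = f * c_prev + i * g                 # self-recurrent cell state
            h = o * torch.tanh(c)                  # gated hidden state / output
            return h, c

    cell = LSTMCellSketch(8, 16)
    h = c = torch.zeros(4, 16)
    for x_t in torch.rand(10, 4, 8):               # iterate over time steps
        h, c = cell(x_t, h, c)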

Greff et al. (2017) conducted a large-scale analysis of standard LSTM and 8 LSTM variants for speech recognition, handwriting recognition, and polyphonic music modeling. They report that none of the 8 variants significantly improved on the standard LSTM, which performs well on its own.

Shi et al. (2016b) proposed Deep Long Short-Term Memory Networks (DLSTM), a stack of LSTM units, for feature-mapping representation learning.

5.10.1 Batch-Normalized LSTM

Cooijmans et al. (2016) proposed Batch-Normalized LSTM (BN-LSTM), which uses batch normalization on the hidden states of recurrent neural networks.

5.10.2 Pixel RNN

van den Oord et al. (2016b) proposed Pixel Recurrent Neural Networks (Pixel-RNN), consisting of 12 two-dimensional LSTM layers.

5.10.3 Bidirectional LSTM

Wöllmer et al. (2010) proposed Bidirectional LSTM (BLSTM), a recurrent network used in conjunction with Dynamic Bayesian Networks (DBN) for context-sensitive keyword detection.

5.10.4 Variational Bi-LSTM

Shabanian et al. (2017) proposed Variational Bi-LSTM, a variant of the bidirectional LSTM architecture. Variational Bi-LSTM creates an information exchange channel between LSTMs using variational autoencoders (VAE) to learn better representations.

5.11 Google Neural Machine Translation

Wu et al. (2016) proposed an automatic translation system called Google Neural Machine Translation (GNMT), which combines encoder networks, decoder networks, and attention networks, following a common sequence-to-sequence learning framework.

5.12 Fader Network

Lample et al. (2017) proposed Fader Networks, a new type of encoder-decoder architecture that generates realistic input image variations by changing attribute values.

5.13 Hyper Networks

Ha et al. (2016) proposed Hyper Networks that generate weights for other neural networks, such as static hyper networks for convolutional networks and dynamic hyper networks for recurrent networks.

Deutsch (2018) used hyper networks to generate neural networks.

5.14 Highway Networks

Srivastava et al. (2015) proposed Highway Networks that learn to manage information using gating units. Information flow across multiple layers is referred to as an information highway.
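A minimal sketch of a single highway layer is shown below: a sigmoid transform gate T(x) decides how much of the transformed signal H(x) to pass and how much of the input to carry through unchanged; the ReLU transform and the negative gate-bias initialization are common illustrative choices, assumed here.

    import torch
    import torch.nn as nn

    class HighwayLayer(nn.Module):
        """y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate."""
        def __init__(self, dim):
            super().__init__()
            self.transform = nn.Linear(dim, dim)     # H(x): the usual nonlinear transform
            self.gate = nn.Linear(dim, dim)          # T(x): how much transformed signal to pass
            nn.init.constant_(self.gate.bias, -2.0)  # bias the gate toward carrying the input

        def forward(self, x):
            h = torch.relu(self.transform(x))
            t = torch.sigmoid(self.gate(x))
            return t * h + (1.0 - t) * x             # the "information highway" carry path

    y = HighwayLayer(64)(torch.rand(8, 64))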

5.14.1 Recurrent Highway Networks

Zilly et al. (2017) proposed Recurrent Highway Networks (RHN), which extend the Long Short-Term Memory (LSTM) architecture. RHN uses Highway layers inside the recurrent transition.

5.15 Highway LSTM RNN

Zhang et al. (2016) proposed Highway Long Short-Term Memory (HLSTM) RNN, which extends deep LSTM networks with gated direct connections (i.e., Highway connections) between memory cells of adjacent layers.

5.16 Long-Term Recurrent CNN

Donahue et al. (2014) proposed Long-Term Recurrent Convolutional Networks (LRCN), which use a CNN on the inputs and then an LSTM for recurrent sequence modeling and prediction generation.

5.17 Deep Neural SVM

Zhang et al. (2015) proposed Deep Neural SVM (DNSVM), which uses a Support Vector Machine (SVM) as the top layer of a Deep Neural Network (DNN) for classification.

5.18 Convolutional Residual Memory Networks

Moniz and Pal (2016) proposed Convolutional Residual Memory Networks, which incorporate memory mechanisms into Convolutional Neural Networks (CNN). It enhances the convolutional residual network with a Long Short-Term Memory mechanism.

5.19 Fractal Networks

Larsson et al. (2016) proposed Fractal Networks (FractalNet) as an alternative to residual networks. They claim to train ultra-deep neural networks without the need for residual learning. Fractals are repetitive architectures generated by simple expansion rules.

5.20 WaveNet

van den Oord et al. (2016) proposed WaveNet, a deep neural network for generating raw audio. WaveNet consists of a stack of convolutional layers and softmax distribution layers for output.

Rethage et al. (2017) proposed a WaveNet model for speech denoising.

5.21 Pointer Networks

Vinyals et al. (2017) proposed Pointer Networks (Ptr-Nets), which solve the problem of representing variable-size output dictionaries by using a softmax probability distribution as a “pointer”.

6. Deep Generative Models

In this section, we will briefly discuss other deep architectures that use multiple abstract layers and representation layers similar to deep neural networks, also known as deep generative models (DGM). Bengio (2009) explains deep architectures such as Boltzmann machines (BM) and Restricted Boltzmann Machines (RBM) and their variants.

Goodfellow et al. (2016) provide a detailed explanation of deep generative models, such as restricted and unrestricted Boltzmann machines and their variants, deep Boltzmann machines, deep belief networks (DBN), directed generative networks, and generative stochastic networks.

Maaløe et al. (2016) proposed auxiliary deep generative models, which extend deep generative models with auxiliary variables. The auxiliary variables produce variational distributions using stochastic layers and skip connections.

Rezende et al. (2016) developed deep generative models capable of one-shot generalization.

6.1 Boltzmann Machines

Boltzmann machines are a connectionist approach to learning arbitrary probability distributions; they learn according to the maximum-likelihood principle.

6.2 Restricted Boltzmann Machines

Restricted Boltzmann Machines (RBM) are a special type of Markov random field, consisting of a layer of random hidden units, i.e., latent variables, and a layer of observable variables.

Hinton and Salakhutdinov (2011) proposed a deep generative model for document processing using restricted Boltzmann machines (RBM).

6.3 Deep Belief Networks

Deep Belief Networks (DBN) are generative models with multiple layers of latent binary or real variables.

Ranzato et al. (2011) used deep belief networks (DBN) to establish deep generative models for image recognition.

6.4 Deep Lambertian Networks

Tang et al. (2012) proposed Deep Lambertian Networks (DLN), a multi-layer generative model where latent variables are reflectance, surface normals, and light sources. DLN is a combination of Lambertian reflectance with Gaussian restricted Boltzmann machines and deep belief networks.

6.5 Generative Adversarial Networks

Goodfellow et al. (2014) proposed Generative Adversarial Networks (GAN) for estimating generative models through an adversarial process. A GAN consists of a generative model pitted against an adversary: a discriminative model that learns to determine whether a sample comes from the model distribution or the data distribution.
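The following minimal sketch illustrates this adversarial training loop in PyTorch; the toy data distribution, the fully connected generator and discriminator, and the non-saturating generator loss are illustrative assumptions rather than the original paper's setup.

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))   # generator: noise -> sample
    D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))    # discriminator: sample -> logit
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    real = torch.randn(128, 2) * 0.5 + 3.0        # stand-in "data distribution"
    for step in range(1000):
        # Discriminator step: tell real data from generated samples.
        fake = G(torch.randn(128, 16)).detach()
        d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: try to fool the discriminator (non-saturating loss).
        g_loss = bce(D(G(torch.randn(128, 16))), torch.ones(128, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()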

Mao et al. (2016), Kim et al. (2017) proposed further improvements to GAN.

Salimans et al. (2016) proposed several methods for training GANs.

6.5.1 Laplacian Generative Adversarial Networks

Denton et al. (2015) proposed a deep generative model (DGM) called Laplacian Generative Adversarial Networks (LAPGAN), which uses generative adversarial networks (GAN) methods. This model also uses convolutional networks within the Laplacian pyramid framework.

6.6 Recurrent Support Vector Machines

Shi et al. (2016a) proposed Recurrent Support Vector Machines (RSVM), which utilize recurrent neural networks (RNN) to extract features from input sequences for sequence-level target recognition using standard support vector machines (SVM).

7. Training and Optimization Techniques

In this section, we will briefly overview some major techniques for regularizing and optimizing deep neural networks (DNN).

7.1 Dropout

Srivastava et al. (2014) proposed Dropout to prevent neural networks from overfitting. Dropout is a regularization method for neural network models that adds noise to their hidden units. During training, it randomly drops units (along with their connections) from the neural network. Dropout can be used in graphical models such as RBMs (Srivastava et al., 2014) and can be applied to any type of neural network. A recent improvement on Dropout is Fraternal Dropout, used for recurrent neural networks (RNN).
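As an illustration, the following sketch implements the standard inverted-dropout formulation on a hidden activation; it is a minimal example, not the authors' original code.

    import torch

    def dropout(h, p=0.5, training=True):
        """Inverted dropout on hidden activations h with drop probability p."""
        if not training or p == 0.0:
            return h                              # no noise at test time
        mask = (torch.rand_like(h) > p).float()   # keep each unit with probability 1 - p
        return h * mask / (1.0 - p)               # rescale so the expected activation is unchanged

    h = torch.rand(4, 8)
    h_train = dropout(h, p=0.5, training=True)
    h_test = dropout(h, p=0.5, training=False)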

7.2 Maxout

Goodfellow et al. (2013) proposed Maxout, a new activation function designed to be used with Dropout. The output of a Maxout unit is the maximum of a set of inputs, which benefits Dropout’s model averaging.
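To make this concrete, here is a minimal sketch of a maxout unit, where each output takes the maximum over k linear feature maps of the input; the choice of k = 3 pieces is an illustrative assumption.

    import torch
    import torch.nn as nn

    class Maxout(nn.Module):
        """Maxout: output = max over k linear pieces of the input."""
        def __init__(self, in_dim, out_dim, k=3):
            super().__init__()
            self.k, self.out_dim = k, out_dim
            self.linear = nn.Linear(in_dim, out_dim * k)   # k candidate feature maps per output unit

        def forward(self, x):
            z = self.linear(x).view(-1, self.out_dim, self.k)
            return z.max(dim=-1).values                    # keep the maximum piece

    y = Maxout(32, 16)(torch.rand(8, 32))                  # -> shape (8, 16)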

7.3 Zoneout

Krueger et al. (2016) proposed Zoneout, a regularization method for recurrent neural networks (RNN). Like Dropout, Zoneout injects noise at training time, but instead of dropping hidden units it preserves their previous values.
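A minimal sketch of the zoneout update for a recurrent hidden state is given below; the zoneout probability and the use of the expected update at test time follow the usual formulation, with values chosen purely for illustration.

    import torch

    def zoneout(h_prev, h_new, p=0.15, training=True):
        """Zoneout: each unit keeps its previous value with probability p during training."""
        if not training:
            # At test time, use the expected update instead of a random mask.
            return p * h_prev + (1.0 - p) * h_new
        keep_prev = (torch.rand_like(h_new) < p).float()
        return keep_prev * h_prev + (1.0 - keep_prev) * h_new

    h_prev, h_new = torch.zeros(4, 16), torch.rand(4, 16)
    h = zoneout(h_prev, h_new)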

7.4 Deep Residual Learning

He et al. (2015) proposed a deep residual learning framework, known as ResNet, with low training error.

7.5 Batch Normalization

Ioffe and Szegedy (2015) proposed batch normalization, a method to accelerate deep neural network training by reducing internal covariate shift. Ioffe (2017) proposed batch renormalization, extending the previous method.
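A minimal sketch of the batch-normalization forward pass is shown below, assuming a mini-batch of feature vectors: activations are standardized with batch statistics and then rescaled by learnable gamma and beta; the running statistics used at inference time are omitted for brevity.

    import torch

    def batch_norm(x, gamma, beta, eps=1e-5):
        """Normalize each feature over the batch dimension, then scale and shift."""
        mean = x.mean(dim=0, keepdim=True)                 # per-feature batch mean
        var = x.var(dim=0, unbiased=False, keepdim=True)   # per-feature batch variance
        x_hat = (x - mean) / torch.sqrt(var + eps)         # standardized activations
        return gamma * x_hat + beta                        # learnable scale and shift

    x = torch.rand(32, 64)                                 # (batch, features)
    gamma, beta = torch.ones(64), torch.zeros(64)
    y = batch_norm(x, gamma, beta)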

7.6 Distillation

Hinton et al. (2015) proposed a method for transferring knowledge from an ensemble of highly regularized models (i.e., neural networks) into a small, compressed model.

7.7 Layer Normalization

Ba et al. (2016) proposed layer normalization to speed up the training of deep neural networks, particularly RNNs, and to address the limitations of batch normalization.
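For contrast with the batch-normalization sketch above, the following minimal snippet normalizes each sample over its feature dimension, so it does not depend on batch statistics and can be applied directly to recurrent hidden states; the shapes are illustrative assumptions.

    import torch

    def layer_norm(x, gamma, beta, eps=1e-5):
        """Normalize each sample over its feature dimension, then scale and shift."""
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return gamma * (x - mean) / torch.sqrt(var + eps) + beta

    h = torch.rand(32, 64)                                 # e.g. a batch of RNN hidden states
    y = layer_norm(h, torch.ones(64), torch.zeros(64))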

8. Deep Learning Frameworks

There are numerous open-source libraries and frameworks available for deep learning. Most of them are built for the Python programming language, such as Theano, TensorFlow, PyTorch, PyBrain, Caffe, Blocks and Fuel, CuDNN, Honk, ChainerCV, PyLearn2, Chainer, torch, etc.

9. Applications of Deep Learning

In this section, we will briefly discuss some outstanding applications in deep learning recently. Since the inception of deep learning (DL), DL methods have been widely applied in various fields in the forms of supervised, unsupervised, semi-supervised, or reinforcement learning. Starting from classification and detection tasks, DL applications are rapidly expanding into every field.

For example:

  • Image classification and recognition

  • Video classification

  • Sequence generation

  • Defect classification

  • Text, speech, image, and video processing

  • Text classification

  • Speech processing

  • Speech recognition and spoken language understanding

  • Text-to-speech generation

  • Query classification

  • Sentence classification

  • Sentence modeling

  • Lexical processing

  • Pre-selection

  • Document and sentence processing

  • Generating image captions

  • Photo style transfer

  • Natural image manifold

  • Image colorization

  • Image question answering

  • Generating textures and stylized images

  • Visual and text question answering

  • Visual recognition and description

  • Object recognition

  • Document processing

  • Human action synthesis and editing

  • Song synthesis

  • Identity recognition

  • Face recognition and verification

  • Video action recognition

  • Human action recognition

  • Action recognition

  • Classification and visualization of motion capture sequences

  • Handwriting generation and prediction

  • Automation and machine translation

  • Named entity recognition

  • Mobile vision

  • Conversational agents

  • Genetic variation calling

  • Cancer detection

  • X-ray CT reconstruction

  • Seizure prediction

  • Hardware acceleration

  • Robotics

etc.

Deng and Yu (2014) provide a detailed list of DL applications in speech processing, information retrieval, object recognition, computer vision, multimodal, and multi-task learning.

Using deep reinforcement learning (DRL) to master games has become a hot topic. Nowadays, AI agents built with DNNs and DRL defeat human world champions and grandmasters in strategy and other games after only a few hours of training, for example AlphaGo and AlphaGo Zero for the game of Go.

10. Discussion

Despite the tremendous success of deep learning in many fields, there is still a long way to go and many areas that need improvement, and its limitations are not few. For instance, Nguyen et al. showed that deep neural networks (DNN) are easily fooled when recognizing images. There are other issues, such as the transferability of learned features studied by Yosinski et al. Huang et al. proposed an architecture for defending against neural network attacks and argued that future work is needed on such defenses. Zhang et al. proposed an experimental framework for understanding deep learning models, arguing that understanding deep learning requires rethinking generalization.

Marcus conducted a significant review in 2018 on the role, limitations, and nature of deep learning (DL). He strongly pointed out the limitations of DL methods: they need more data, have limited capacity, cannot handle hierarchical structure, fall short at open-ended reasoning, are insufficiently transparent, cannot integrate prior knowledge, and cannot distinguish causation from correlation. He also mentioned that DL assumes a stable world, works by approximation, is difficult to engineer, and carries potential risks of overhype. Marcus believes that DL needs to be reconceptualized, and that possibilities should be sought in unsupervised learning, symbolic manipulation, and hybrid models, drawing insights from cognitive science and psychology and embracing bolder challenges.

11. Conclusion

Despite the rapid advancement of deep learning (DL) around the world today, there are still many aspects worth exploring. We still do not fully understand deep learning: how we can make machines smarter, bring them closer to or beyond human intelligence, or have them learn as humans do. DL has been solving many problems and bringing the technology into all aspects of life. However, humanity still faces many challenges, such as people dying from hunger and food crises, from cancer and other deadly diseases. We hope that deep learning and artificial intelligence will become ever more devoted to improving the quality of human life through the most challenging scientific research. Last but not least, may our world become a better place.
