Introduction to CNN and Code Implementation


Author & Code: Shi Jihao, SCIR, Harbin Institute of Technology

1. Introduction

1.1 Article Organization

This article briefly introduces the basic principles of CNNs and, using sentence-level sentiment classification as a running example, shows how a CNN can be used for feature extraction and modeling. At the end of the article, we provide a PyTorch implementation of the CNN for readers' reference.

1.2 Sentiment Classification Task

Sentiment classification in natural language processing is the task of labeling the emotional tendency of a given text, and can be viewed as a kind of classification task. The usual approach is to first obtain representations of words or phrases, then combine the word representations in a sentence into a representation of the sentence, and finally use the sentence representation to classify the sentence's sentiment.

For example, a binary sentiment classification case:

Sentence: I love Sail

Sentiment Label: Positive

1.3 What is CNN

CNN stands for Convolutional Neural Network, a type of feedforward neural network. It consists of one or more convolutional layers and pooling layers, with a fully connected layer at the top, and performs extremely well in image processing. This article focuses on how CNNs are used in natural language processing.

1.4 Why Use CNN

Convolutional neural networks mainly extract local features of the input. When the input is natural language text, such as a sentence, its local features are specific keywords or key phrases. Using a convolutional neural network as a feature extractor therefore resembles a bag-of-words (more precisely, bag-of-n-grams) model: it indicates whether specific keywords or key phrases appear in a sentence. Applied to a classification task, it extracts the feature information most useful for classification.

Compared with other models, convolutional neural networks have fewer parameters. Another advantage is that CNNs have no sequential dependency between positions, so they can be computed in parallel.

2. Brief Introduction to CNN Principles

We illustrate with an early work that introduced CNNs to NLP: Convolutional Neural Networks for Sentence Classification (Kim, EMNLP 2014) [1].

[Figure: CNN architecture for sentence classification, from Kim (2014)]

We divide the figure above into four parts and introduce each in turn.

The first part vectorizes the sentence "wait for the video and don't rent it". Word embedding methods give each word a vector representation xi of dimension k. For a sentence of length n, the sentence representation is then the following n×k matrix:

x1:n = x1 ⊕ x2 ⊕ … ⊕ xn, where ⊕ is the concatenation operator.

Here, static channel and non-static channel refer to fixed word vectors (i.e., not adjusted during training) and updated word vectors (fine-tuned through backpropagation), respectively.
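As an illustrative sketch (not the paper's code), the two channels can be set up in PyTorch roughly as follows, assuming a hypothetical pre-trained embedding matrix `pretrained` of shape (vocab_size, k):

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained embedding matrix of shape (vocab_size, k)
pretrained = torch.randn(10000, 300)

# Static channel: word vectors are frozen (never updated during training)
static_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Non-static channel: word vectors are fine-tuned through backpropagation
non_static_emb = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

sentence = torch.tensor([[12, 45, 7, 99]])  # token ids, shape (1, n)
x_static = static_emb(sentence)             # (1, n, k), not trainable
x_non_static = non_static_emb(sentence)     # (1, n, k), trainable
```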

The second part is the convolution layer, with multiple convolution kernels and their feature maps. The red and yellow dashed lines in the figure mark the features extracted by the convolution operation and the resulting feature-map outputs; the figure shows four convolution kernels. The convolution operation is:

ci = f(w · xi:i+h-1 + b), where f is a nonlinear activation function (e.g., tanh) and b is a bias term.

Here xi:i+h-1 denotes a sliding window covering h words, and w is the convolution kernel. Applying the kernel to each window of h words yields one feature ci; sliding the window across the sentence, one word at a time, yields the feature map:

c = [c1, c2, …, cn-h+1]
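As a minimal PyTorch sketch of this operation (the dimensions are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

n, k, h = 7, 300, 3          # sentence length, embedding dim, window size
x = torch.randn(1, 1, n, k)  # the sentence matrix, treated as a 1-channel image

# One convolution kernel w spanning h words and the full embedding dimension
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(h, k))

c = torch.tanh(conv(x))      # feature map [c1, ..., c_{n-h+1}]
print(c.shape)               # torch.Size([1, 1, 5, 1]), i.e. n-h+1 = 5 features
```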

The third part is max pooling. The paper adopts max pooling, although average pooling can also be used. From the feature map produced by each convolution kernel, the maximum value is selected. The idea is to extract the most important piece of information from each feature map; pooling also handles variable sentence lengths, since each feature map is reduced to a fixed-size value:

ĉ = max{c}
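Continuing the sketch above (shapes illustrative), max-over-time pooling reduces each feature map to a single value:

```python
import torch
import torch.nn.functional as F

c = torch.randn(1, 1, 5)                        # a feature map with n-h+1 = 5 features

# Max-over-time pooling: one value per feature map, regardless of sentence length
c_hat = F.max_pool1d(c, kernel_size=c.size(2))  # shape (1, 1, 1)
```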

The fourth part is the fully connected layer. Passing the pooled features through a fully connected softmax layer yields the probability of each category.
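Sketched with illustrative dimensions (four pooled features, two classes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pooled = torch.randn(1, 4)            # concatenated pooled features from 4 kernels
fc = nn.Linear(4, 2)                  # fully connected layer for 2 classes
probs = F.softmax(fc(pooled), dim=1)  # probability of each category; rows sum to 1
```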

3. Sample Code Implementation of CNN

Code Address:

https://github.com/Shijihao/CNN_for_classification

3.1 Model Construction


The example code uses two convolution kernel sizes, with sliding windows of 2 and 3 words, respectively; a sketch of such a model is shown below. The data in this example is the MR dataset (Movie Review Data), used for sentiment analysis: classifying movie review texts by sentiment polarity (positive or negative).
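The repository above contains the actual implementation; the following is only a hedged sketch of such a model, with illustrative hyperparameters (`embed_dim`, `num_filters`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Sketch of a sentence-classification CNN with window sizes 2 and 3."""
    def __init__(self, vocab_size, embed_dim=128, num_filters=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One Conv2d per window size; each kernel spans the full embedding dimension
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, kernel_size=(h, embed_dim)) for h in (2, 3)
        ])
        self.fc = nn.Linear(num_filters * 2, num_classes)

    def forward(self, x):                              # x: (batch, n) token ids
        x = self.embedding(x).unsqueeze(1)             # (batch, 1, n, embed_dim)
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)             # (batch, num_filters, n-h+1)
            c = F.max_pool1d(c, c.size(2)).squeeze(2)  # max-over-time pooling
            feats.append(c)
        return self.fc(torch.cat(feats, dim=1))        # logits over classes
```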

3.2 Model Training

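The full training code is in the repository; below is a minimal sketch of a typical training loop, assuming the `TextCNN` sketch above and a hypothetical `train_loader` yielding (token ids, labels) batches from the MR data:

```python
import torch
import torch.nn as nn

model = TextCNN(vocab_size=10000)        # the sketch model defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    model.train()
    for batch_x, batch_y in train_loader:  # assumed DataLoader over MR data
        optimizer.zero_grad()
        logits = model(batch_x)
        loss = criterion(logits, batch_y)
        loss.backward()
        optimizer.step()
```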

3.3 Model Testing

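Again deferring to the repository for the real code, a minimal evaluation sketch (assuming the trained `model` above and a hypothetical `test_loader` over the test split) might be:

```python
import torch

model.eval()
correct = total = 0
with torch.no_grad():                     # no gradients needed at test time
    for batch_x, batch_y in test_loader:  # assumed DataLoader over the test split
        preds = model(batch_x).argmax(dim=1)
        correct += (preds == batch_y).sum().item()
        total += batch_y.size(0)
print(f"accuracy: {correct / total:.4f}")
```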

4. Conclusion

This article has introduced how convolutional neural network models were combined with NLP tasks in early work. With continued development in recent years, CNN models have been steadily improved, and a series of enhancements has emerged, such as dilated convolutions and deeper networks for capturing long-distance dependencies. Interested readers can consult the references below.

5. References

[1] KIM Y. Convolutional Neural Networks for Sentence Classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014: 1746–1751.

[2] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[J]. arXiv preprint arXiv:1511.07122, 2015.

[3] KALCHBRENNER N, ESPEHOLT L, SIMONYAN K et al. Neural Machine Translation in Linear Time[J]. CoRR, 2016, abs/1610.10099.

[4] GEHRING J, AULI M, GRANGIER D et al. Convolutional Sequence to Sequence Learning[C]//PRECUP D, TEH Y W. Proceedings of the 34th International Conference on Machine Learning. International Convention Centre, Sydney, Australia: PMLR, 2017, 70: 1243–1252.


This issue edited by: Cui Yiming, Liu Yuanxing
