An Overview of NLP from Linguistics to Deep Learning

Selected from arXiv

Compiled by Machine Heart

Contributors: Li Yazhou, Jiang Siyuan

This article draws on two papers to briefly introduce the basic classifications and concepts of Natural Language Processing (NLP), and then presents NLP as it is practised in deep learning. Both papers are excellent introductory reviews, and readers who wish to delve deeper into NLP can read them in full.

The first part of this article introduces the basic concepts of NLP, dividing it into Natural Language Understanding and Natural Language Generation, and explains the various levels and applications of the NLP pipeline. It suits readers who want a systematic grounding in the basic concepts of NLP.

The second part describes NLP based on deep learning. The paper first discusses word representations in deep learning, from one-hot encoding and the bag-of-words model to word embeddings and word2vec: numerical representations of words are a prerequisite for any further NLP. It then introduces the models applied to NLP, including Convolutional Neural Networks, Recurrent Neural Networks, Long Short-Term Memory, and Gated Recurrent Units. Combined with techniques such as attention mechanisms, these models enable powerful applications such as machine translation, question-answering systems, and sentiment analysis.
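
To make that progression concrete before diving into the papers, here is a minimal sketch in Python contrasting sparse bag-of-words counts with dense word2vec embeddings. The toy corpus, the sizes, and the choice of gensim are illustrative assumptions of ours, not something either paper prescribes.

```python
# Minimal sketch: from sparse bag-of-words counts to dense word2vec
# embeddings. The corpus and all sizes are toy values for illustration.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# Bag-of-words: each sentence becomes a |V|-dimensional sparse count vector.
vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}
bow = np.zeros((len(corpus), len(vocab)), dtype=int)
for row, sent in enumerate(corpus):
    for w in sent:
        bow[row, index[w]] += 1
print(bow)  # one row per sentence, one column per vocabulary word

# word2vec: each word becomes a dense, low-dimensional vector learned
# from its contexts (sg=1 selects skip-gram; sg=0 would give CBOW).
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, seed=0)
print(model.wv["cat"].shape)         # (16,): a dense vector, not a count
print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
```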

Conceptual Foundations

Paper link: https://arxiv.org/abs/1708.05148

Natural Language Processing (NLP), the computational representation and analysis of human language, has recently attracted increasing attention. It has been applied in many fields, such as machine translation, spam detection, information extraction, automatic summarization, healthcare, and question-answering systems. This paper discusses the different levels of NLP and Natural Language Generation (NLG) from a historical and developmental perspective, and presents cutting-edge technologies together with current trends and challenges in NLP applications.

1 Introduction

Natural Language Processing (NLP) is a branch of artificial intelligence and linguistics dedicated to making computers understand the sentences and words of human language. NLP arose to ease users' workload and to satisfy the wish to interact with computers in natural language: since users may not be familiar with machine language, NLP helps them communicate with machines in their own language.

Language can be defined as a set of rules or symbols, which we combine to convey or broadcast information. NLP essentially divides into two parts, Natural Language Understanding and Natural Language Generation, that is, the tasks of understanding and generating text (Figure 1).

Figure 1: Rough Classification of NLP

Linguistics is the science of language, encompassing phonology (the study of sounds), morphology (the study of word formation), syntax (the study of sentence structure), semantics (the study of meaning), and pragmatics (the study of context).

NLP research tasks include automatic summarization, co-reference resolution, discourse analysis, machine translation, morphological segmentation, named entity recognition, optical character recognition, and part-of-speech tagging. Automatic summarization produces a summary of detailed information from a set of texts in a specific format. Co-reference resolution determines which words in a sentence or larger text refer to the same entity. Discourse analysis identifies the discourse structure connecting texts, while machine translation is automatic translation between two or more languages. Morphological segmentation breaks words down into morphemes and identifies their categories. Named Entity Recognition (NER) determines which parts of a text map to proper names such as people, places, or organizations. Optical Character Recognition (OCR) extracts textual information from printed documents (such as PDFs). Part-of-speech tagging assigns each word in a sentence its grammatical category. Although these NLP tasks may look quite different from one another, they are often tackled in combination, as in the sketch below.
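
As a taste of two of these tasks, the following sketch runs part-of-speech tagging and named entity recognition with NLTK. The toolkit, the example sentence, and the resource names are our own illustrative choices, not something either paper mandates.

```python
# Illustrative only: part-of-speech tagging and named entity recognition
# with NLTK. Resource names differ slightly across NLTK versions
# (e.g. "punkt" vs "punkt_tab"); adjust the downloads if one is missing.
import nltk

for resource in ["punkt", "averaged_perceptron_tagger",
                 "maxent_ne_chunker", "words"]:
    nltk.download(resource, quiet=True)

sentence = "Google was founded by Larry Page and Sergey Brin in California."

tokens = nltk.word_tokenize(sentence)  # split the sentence into tokens
tags = nltk.pos_tag(tokens)            # part-of-speech tagging
print(tags)                            # [('Google', 'NNP'), ('was', 'VBD'), ...]

tree = nltk.ne_chunk(tags)             # NER on top of the POS tags
print(tree)                            # subtrees labelled PERSON, GPE, ...
```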

2 Levels of NLP

The levels of language offer the most explanatory way to describe what an NLP system does, and they also guide text generation through content planning, sentence planning, and surface realization (Figure 2).

Figure 2: Stages of NLP Architecture

Linguistics is the discipline that studies language together with its context and its various forms. Important terms related to NLP include:

  • Phonology

  • Morphology

  • Lexicology

  • Syntax

  • Semantics

  • Discourse Analysis

  • Pragmatics

3 Natural Language Generation

NLG is the process of generating meaningful phrases, sentences, and paragraphs from an internal representation. It is a part of NLP and comprises four stages: identifying the goals, planning how to achieve them by evaluating the situation, evaluating the available communicative resources, and realizing the plans as text, as shown in Figure 3. Generation is the inverse process of understanding.

Figure 3: Components of NLG

6 Applications of NLP

NLP can be applied in various fields, such as machine translation, spam detection, information extraction, etc. In this section, the paper introduces the following applications of NLP:

  • Machine Translation

  • Text Classification

  • Spam Filtering

  • Information Extraction

  • Automatic Summarization

  • Dialogue Systems

  • Healthcare

NLP in Deep Learning

The above content provides a basic introduction to NLP but overlooks the recent applications of deep learning in the field, so we supplement it with a paper from Beijing Institute of Technology. The paper reviews the important deep learning models and methods used in NLP, such as Convolutional Neural Networks, Recurrent Neural Networks, and Recursive Neural Networks. It also discusses memory-augmented strategies, attention mechanisms, and the use of unsupervised models, reinforcement learning models, and deep generative models in language-related tasks. Finally, it surveys the various deep learning frameworks, giving a comprehensive overview of current developments in NLP from the perspective of deep learning.

Today, deep learning architectures and algorithms have made remarkable progress in computer vision and pattern recognition fields. Against this backdrop, recent research on NLP based on new deep learning methods has seen significant growth.

Figure 4: Growth trend of deep learning papers presented at ACL, EMNLP, EACL, and NAACL conferences from 2012 to 2017.

For over a decade, machine learning methods for solving NLP problems have been based on shallow models, such as SVM and logistic regression, trained on very high-dimensional, sparse features. In recent years, neural networks based on dense vector representations have produced excellent results across various NLP tasks. This trend has arisen from the success of word embeddings and deep learning methods. Deep learning has made it possible to learn multi-level automatic feature representations. Traditional machine learning-based NLP systems heavily rely on handcrafted features, which is time-consuming and often incomplete.
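
For contrast, this is roughly what such a shallow pipeline looks like in code: sparse, high-dimensional TF-IDF features feeding a logistic regression classifier. The use of scikit-learn and the toy data are illustrative assumptions on our part.

```python
# A minimal sketch of the "shallow model over sparse, high-dimensional
# features" pipeline described in the text. The data is a toy stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it", "terrible plot, wasted my time",
         "wonderful acting", "boring and dull"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TfidfVectorizer yields a sparse matrix with one column per n-gram;
# its dimensionality grows with the vocabulary, unlike dense embeddings.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["loved the acting", "dull movie"]))
```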

In 2011, Collobert et al.’s paper demonstrated that a simple deep learning framework could outperform the state-of-the-art methods in various NLP tasks, such as Named Entity Recognition (NER), Semantic Role Labeling (SRL), and Part-of-Speech (POS) tagging tasks. Since then, various complex algorithms based on deep learning have been proposed to tackle NLP challenges.

This paper reviews important models and methods related to deep learning, such as Convolutional Neural Networks, Recurrent Neural Networks, and Recursive Neural Networks. Additionally, the paper discusses memory-augmented strategies, attention mechanisms, and the applications of unsupervised models, reinforcement learning models, and deep generative models in language-related tasks.

In 2016, Goldberg also introduced deep learning for NLP in tutorial form, providing mainly a technical overview of distributed semantics (word2vec, CNN) without discussing the various deep learning architectures. This paper offers a more comprehensive perspective.

Abstract: Deep learning methods use multiple processing layers to learn hierarchical representations of data and have achieved state-of-the-art results in many domains. Recently, a wealth of model designs and methods has emerged in Natural Language Processing. In this paper, we review the important deep learning models and methods that have been applied to NLP tasks and provide an overview of this progress. We also summarize and compare the various models, offering a detailed understanding of the past, present, and future of deep learning in NLP.

Paper link: https://arxiv.org/abs/1708.02709

Figure 2: Distributed representation: each word is mapped to a dense D-dimensional vector, where D << V and V is the vocabulary size.
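
In code, a distributed representation is just a lookup into a V × D matrix; the following numpy sketch (with made-up sizes) shows how compact the result is compared with a V-dimensional one-hot vector.

```python
# The figure's idea in code: an embedding table maps each of V vocabulary
# items to a D-dimensional dense vector with D << V. Sizes are illustrative.
import numpy as np

V, D = 50_000, 300              # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(size=(V, D))     # embedding matrix, one row per word

word_id = 1234                  # index of some word in the vocabulary
vec = E[word_id]                # its distributed representation
print(vec.shape)                # (300,) instead of a 50,000-dim one-hot
```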

Figure 3: Neural language model proposed by Bengio et al. in 2003, where C(i) is the i-th word embedding.
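
A compressed PyTorch sketch of this model may help: the embeddings C(i) of the previous n-1 words are concatenated, passed through a tanh hidden layer, and mapped to scores over the vocabulary. All sizes are illustrative, and the paper's optional direct input-to-output connections are omitted.

```python
# Compressed sketch of the Bengio et al. (2003) neural language model.
import torch
import torch.nn as nn

V, D, H, CONTEXT = 10_000, 64, 128, 3   # vocab, embed dim, hidden, n-1

class BengioLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.C = nn.Embedding(V, D)            # the C(i) lookup table
        self.hidden = nn.Linear(CONTEXT * D, H)
        self.out = nn.Linear(H, V)             # scores over the vocabulary

    def forward(self, context_ids):            # (batch, CONTEXT) word ids
        x = self.C(context_ids).flatten(1)     # concatenate context embeddings
        return self.out(torch.tanh(self.hidden(x)))

logits = BengioLM()(torch.randint(0, V, (8, CONTEXT)))  # batch of 8 contexts
print(logits.shape)                                     # (8, V): next-word scores
```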

Figure 4: CBOW (continuous bag-of-words) model
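
CBOW can be sketched in a few lines: average the embeddings of the context words and use the result to score the centre word. This is a toy PyTorch rendering of ours, without the negative sampling used in real word2vec training.

```python
# CBOW in miniature: context embeddings are averaged (order-free) and
# used to predict the centre word. Sizes are illustrative.
import torch
import torch.nn as nn

V, D = 10_000, 64

class CBOW(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, D)
        self.out = nn.Linear(D, V)

    def forward(self, context_ids):                 # (batch, window) word ids
        mean = self.embed(context_ids).mean(dim=1)  # average the context
        return self.out(mean)                       # scores for the centre word

logits = CBOW()(torch.randint(0, V, (8, 4)))  # 8 examples, 4 context words each
print(logits.shape)                           # (8, V)
```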

Table 1: Frameworks providing embedding tools and methods

Figure 5: CNN framework used by Collobert et al. for word-level classification prediction

Figure 6: CNN modeling on text (Zhang and Wallace, 2015)
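
A minimal PyTorch sketch of this style of sentence CNN, with toy sizes of our own choosing: filters of several widths slide over the word-embedding matrix, each feature map is max-pooled over time, and the concatenated features are classified.

```python
# Sentence CNN in the style surveyed above (Kim / Zhang and Wallace).
import torch
import torch.nn as nn

V, D, N_FILTERS, CLASSES = 10_000, 64, 100, 2

class TextCNN(nn.Module):
    def __init__(self, widths=(3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(V, D)
        self.convs = nn.ModuleList(
            nn.Conv1d(D, N_FILTERS, kernel_size=w) for w in widths)
        self.fc = nn.Linear(N_FILTERS * len(widths), CLASSES)

    def forward(self, ids):                       # (batch, seq_len) word ids
        x = self.embed(ids).transpose(1, 2)       # (batch, D, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # max-over-time + classify

print(TextCNN()(torch.randint(0, V, (8, 40))).shape)  # (8, CLASSES)
```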

Figure 7: Top 7-grams for four 7-gram kernels; each kernel is sensitive to a specific kind of 7-gram (Kim, 2014)

Figure 8: DCNN subgraph. With dynamic pooling, a filter of small width at a top layer can relate phrases that are far apart in the input sentence (Kalchbrenner et al., 2014).

Figure 9: Simple RNN Network
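
The figure's recurrence can be written in a few lines of numpy: the hidden state at each step is a function of the current input and the previous hidden state. Dimensions and weights here are illustrative.

```python
# One step of a simple (Elman) RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).
import numpy as np

D, H = 8, 16                        # input and hidden dimensions
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(H, D)) * 0.1
W_hh = rng.normal(size=(H, H)) * 0.1
b = np.zeros(H)

def rnn_step(x_t, h_prev):
    # The same weights are applied at every time step.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = np.zeros(H)
for x_t in rng.normal(size=(5, D)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h)
print(h.shape)                       # (16,): summary of the sequence so far
```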

Figure 10: Illustration of LSTM and GRU (Chung et al., 2014)
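
For reference, here is one GRU step written out in numpy from the equations in Chung et al. (2014); biases are omitted and all sizes are illustrative.

```python
# One GRU step: an update gate z and a reset gate r control how much
# of the previous hidden state survives into the new one.
import numpy as np

D, H = 8, 16
rng = np.random.default_rng(0)
Wz, Uz = rng.normal(size=(H, D)) * 0.1, rng.normal(size=(H, H)) * 0.1
Wr, Ur = rng.normal(size=(H, D)) * 0.1, rng.normal(size=(H, H)) * 0.1
Wh, Uh = rng.normal(size=(H, D)) * 0.1, rng.normal(size=(H, H)) * 0.1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h):
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old and new

h = gru_step(rng.normal(size=D), np.zeros(H))
print(h.shape)  # (16,)
```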

Figure 11: Training and validation learning curves for different unit types over iterations (top figure) and wall-clock time (bottom figure). The y-axis describes the model’s negative log likelihood on a logarithmic scale.

Figure 12: LSTM decoder combined with CNN image embedder to generate image descriptions (Vinyals et al., 2015a)

Figure 13: Neural Image QA (Malinowski et al., 2015)

Figure 14: Word alignment matrix (Bahdanau et al., 2014)
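
The matrix the figure visualizes is built from attention weights; the numpy sketch below computes one row of it with additive (Bahdanau-style) attention scoring. All dimensions and values are illustrative.

```python
# Additive attention: score each encoder state against the decoder state,
# softmax into alignment weights, and form a weighted summary (context).
import numpy as np

H, T, A = 16, 6, 12                 # state dim, source length, attention dim
rng = np.random.default_rng(0)
enc = rng.normal(size=(T, H))       # encoder hidden states h_1 .. h_T
dec = rng.normal(size=H)            # current decoder state s
W_s, W_h = rng.normal(size=(A, H)), rng.normal(size=(A, H))
v = rng.normal(size=A)

scores = np.array([v @ np.tanh(W_s @ dec + W_h @ h_j) for h_j in enc])
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # softmax: one row of the alignment matrix
context = weights @ enc             # attention-weighted summary of the source
print(weights.round(3), context.shape)
```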

Figure 15: Attention-based Region Ranking (Wang et al., 2016)

Figure 16: Regions the attention module focuses on for particular phrases (Wang et al., 2016)

Figure 17: Recursive Neural Network applied to sentences containing “but” (Socher et al., 2013)

Figure 18: Sentence generation using an RNN-based VAE (Bowman et al., 2015)

This article is compiled by Machine Heart. For reprints, please contact this public account for authorization.
