Notes on Papers in Natural Language Processing

This article is reprinted with authorization from the WeChat public account PaperWeekly (ID: paperweekly). PaperWeekly shares interesting papers in the field of natural language processing every week.

Introduction

Dialogue systems are currently a research hotspot and a focus for venture capital. Since the beginning of 2016, countless companies building chatbots, voice assistants, and similar products have been founded; whether for users or for enterprises, the application of dialogue systems has reached a new height. Seq2seq is a popular algorithmic framework that, given an input, automatically generates an output, which sounds wonderful. There has been much research on seq2seq in dialogue systems, and this issue of PaperWeekly shares notes on four papers that discuss how to improve the fluency and diversity of generated dialogue, bringing dialogue systems closer to human conversation. The four papers are:

1. Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation, 2016
2. A Simple, Fast Diverse Decoding Algorithm for Neural Generation, 2016
3. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models, 2016
4. A Diversity-Promoting Objective Function for Neural Conversation Models, 2015

1. Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation

Authors

Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, Zhi Jin

Affiliations

Key Laboratory of High Confidence Software Technologies (Peking University), MoE, China; Institute of Software, Peking University, China; Institute of Network Computing and Information Systems, Peking University, China; Institute of Computer Science and Technology, Peking University, China

Keywords

content-introducing approach, neural network-based, generative dialogue systems, seq2BF

Source

arXiv, 2016

Problem

Using a content-introducing method to handle neural network-based generative dialogue systems

Model


The model consists of two parts: 1. Use PMI to predict a keyword for the reply. Pointwise mutual information (PMI) is used to score candidate words, and the word with the highest PMI value is selected as the keyword of the reply; the keyword can appear at any position in the response.
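As a rough illustration of this keyword-prediction step, the sketch below scores candidate reply words by PMI against the query and picks the highest. The count tables and aggregation over query words are hypothetical stand-ins for corpus statistics; the paper's exact PMI estimate may differ in detail.

```python
import math
from collections import Counter

def pmi(word_q, word_r, count_q, count_r, count_pair, n_pairs):
    """Pointwise mutual information between a query word and a reply word,
    estimated from (hypothetical) corpus co-occurrence counts."""
    p_q = count_q[word_q] / n_pairs
    p_r = count_r[word_r] / n_pairs
    p_qr = count_pair[(word_q, word_r)] / n_pairs
    return math.log(p_qr / (p_q * p_r))

def predict_keyword(query_words, candidates, count_q, count_r, count_pair, n_pairs):
    """Pick the candidate reply word with the highest PMI against the query,
    summing the PMI over the query words that co-occur with it."""
    def score(w):
        return sum(pmi(q, w, count_q, count_r, count_pair, n_pairs)
                   for q in query_words
                   if (q, w) in count_pair)
    return max(candidates, key=score)
```

The selected word then serves as the fixed pivot around which the reply is generated.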

2. Generate a reply conditioned on the keyword as well as the query. The sequence to backward and forward sequences (seq2BF) model generates a response that contains the keyword. The response is divided into two sequences around the keyword: (1) backward sequence: all words to the left of the keyword, arranged in reverse order; (2) forward sequence: all words to the right of the keyword, in their original order.

The seq2BF model works as follows: (1) a seq2seq network encodes the query and decodes only the words to the left of the keyword, outputting them in reverse order; (2) another seq2seq network encodes the query again and, conditioned on the already-decoded backward sequence, decodes the remaining words of the response in order, yielding the final word sequence.
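The split and its inverse are easy to make concrete. This minimal sketch shows how a training reply would be decomposed around the keyword and how the two decoders' outputs would be reassembled (function names are illustrative, not from the paper):

```python
def split_for_seq2bf(reply_words, keyword):
    """Split a reply into the two sequences used by seq2BF:
    the backward sequence (words left of the keyword, reversed) and
    the forward sequence (words right of the keyword, in order)."""
    i = reply_words.index(keyword)
    backward = list(reversed(reply_words[:i]))
    forward = reply_words[i + 1:]
    return backward, forward

def assemble_reply(keyword, backward, forward):
    """Invert the split: the backward decoder's output is reversed and
    placed before the keyword, the forward decoder's output after it."""
    return list(reversed(backward)) + [keyword] + forward
```

For example, the reply "i like playing football" with keyword "playing" splits into the backward sequence ["like", "i"] and the forward sequence ["football"], and reassembles exactly.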

Resources

Dataset: http://tieba.baidu.com

Related Work

1. Dialogue Systems
(1) (Isbell et al., 2000; Wang et al., 2013) retrieval methods
(2) (Ritter et al., 2011) phrase-based machine translation
(3) (Sordoni et al., 2015; Shang et al., 2015) recurrent neural networks

2. Neural Networks for Sentence Generation

(1) (Sordoni et al., 2015) bag-of-words features
(2) (Shang et al., 2015) seq2seq-like neural networks
(3) (Yao et al., 2015; Serban et al., 2016a) hierarchical neural networks
(4) (Li et al., 2016a) mutual information training objective

Comments

The innovation of this paper lies in departing from the usual approach of generating the target sentence strictly from beginning to end. It introduces pointwise mutual information to predict the keyword of the response, and the seq2BF mechanism ensures that the keyword can appear at any position in the target response while preserving fluency, significantly improving quality over plain seq2seq generation.

2. A Simple, Fast Diverse Decoding Algorithm for Neural Generation

Authors

Jiwei Li, Will Monroe, and Dan Jurafsky

Affiliations

Stanford

Keywords

seq2seq, diversity, RL

Source

arXiv, 2016

Problem

Improve beam search in seq2seq decoding by introducing a penalty factor that re-ranks candidates, and incorporate a reinforcement learning model to automatically learn the diversity rate, making the decoded results more diverse.

Model


Compared to standard beam search, this model introduces a penalty factor. At each decoding step, the candidates expanded from the same parent hypothesis are ranked by log-probability, and the score of the $k'$-th ranked sibling is reduced in proportion to its rank:

$$\hat{S}(Y_{k,k'}) = S(Y_{k,k'}) - \gamma k'$$

where $\gamma$ is called the diversity rate, $k'$ ranges over $[1, K]$, and $K$ is the beam size. In the reinforcement learning model, a policy selects the diversity rate for each input.

The reward is an evaluation metric, such as the BLEU score in machine translation.
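The re-ranking step can be sketched as follows. This is a simplified illustration with hypothetical data structures, not the authors' code: candidates expanded from the same parent are ranked by log-probability and penalized by gamma times their rank before the global top-K cut.

```python
def rerank_with_diversity(expansions, gamma):
    """Apply the rank-based diversity penalty to beam candidates.
    `expansions` maps each parent hypothesis to a list of (token, log_prob)
    candidates; siblings are ranked by log-probability and the k'-th ranked
    sibling (1-indexed) is penalized by gamma * k'."""
    rescored = []
    for parent, cands in expansions.items():
        ranked = sorted(cands, key=lambda c: c[1], reverse=True)
        for rank, (token, logp) in enumerate(ranked, start=1):
            rescored.append((parent, token, logp - gamma * rank))
    # Standard beam search then keeps the top-K of these adjusted scores.
    return sorted(rescored, key=lambda c: c[2], reverse=True)
```

Because every sibling of the same parent pays an increasing rank penalty, candidates descending from different parents are promoted relative to near-duplicates from one parent.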

Resources

1. Response generation experimental dataset: OpenSubtitles https://github.com/jiweil/mutual-information-for-neural-machine-translation (the code from the author's other paper can be adapted with slight modifications)

2. Machine translation dataset: WMT’14 http://www.statmt.org/wmt13/translation-task.html


Comments

The innovation of this model lies in the penalty factor, which re-ranks candidates during standard beam search decoding, and in the reinforcement learning model that automatically learns the diversity rate. The authors validated the method on three tasks: machine translation, extractive summarization, and dialogue response generation. Performance varies across tasks, but overall the method decodes more diverse sentences to some extent. The idea is clear, and only a slight modification to traditional beam search is needed; as the original text notes, only one line of MATLAB code needs to be changed.

3. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

Authors

Ashwin K Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall & Dhruv Batra

Affiliations

Virginia Tech, Blacksburg, VA, USA; Indiana University, Bloomington, IN, USA

Keywords

Beam Search; Diversity; Image Captioning; Machine Translation; Visual Question Answering; Chatbot

Source

arXiv, 2016.10

Problem

How to improve the beam search decoding algorithm to generate richer results in seq2seq models?

Model

The classic beam search algorithm approximately optimizes a maximum a posteriori objective, retaining only the B best partial hypotheses at each time step. This greedy algorithm is typically used when the number of candidate states is large, as in dialogue generation, image captioning, and machine translation, where the candidate set at every step is the size of the vocabulary. The popularity of seq2seq models has made this decoding algorithm a hot research topic. In dialogue generation, classic beam search often produces responses like "I don't know," which are grammatically correct but of little practical value, so research on diversity has become popular.
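For reference, classic beam search can be sketched in a few lines. This is a toy version in which each step's distribution is a fixed table rather than a model's prefix-conditioned distribution:

```python
import math

def beam_search(step_logprobs, beam_size):
    """Minimal beam search over a fixed table of per-step log-probabilities.
    `step_logprobs` is a list of dicts mapping token -> log-probability
    (a stand-in for a model's conditional distribution at each step)."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for dist in step_logprobs:
        candidates = [(seq + (tok,), score + lp)
                      for seq, score in beams
                      for tok, lp in dist.items()]
        # Greedily keep only the B highest-scoring partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams
```

The greedy top-B cut at each step is exactly where diversity is lost: high-probability beams tend to share long prefixes, which motivates the grouped variant this paper proposes.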

This paper proposes an improved beam search algorithm aimed at generating more diverse outputs.


The main idea of the new algorithm is to partition the beams of the classic algorithm into groups and introduce a penalty mechanism that keeps each group dissimilar from the groups decoded before it, which guarantees greater differences among the generated outputs and thus satisfies the demand for diversity. Within each group, the classic algorithm performs the optimal search. The specific algorithm process is shown in the figure below:

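A minimal sketch of one decoding step of this grouped procedure is shown below. It is deliberately simplified: one candidate is kept per group, and a Hamming-style count serves as the dissimilarity term, which is one of several choices discussed in the paper.

```python
def diverse_beam_step(group_candidates, diversity_strength):
    """One decoding step of diverse beam search (simplified sketch).
    Groups are processed in order; each candidate token in a later group is
    penalized by how many earlier groups already chose that token at this
    step, scaled by `diversity_strength`."""
    chosen_tokens = []   # tokens selected by groups processed so far
    selections = []
    for cands in group_candidates:  # cands: list of (token, log_prob)
        penalized = [(tok, lp - diversity_strength * chosen_tokens.count(tok))
                     for tok, lp in cands]
        best = max(penalized, key=lambda c: c[1])
        selections.append(best)
        chosen_tokens.append(best[0])
    return selections
```

With the strength set to zero every group collapses onto the same top token, recovering the classic behavior; a positive strength pushes later groups toward different tokens.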

In the experiments, comparisons were made on three tasks: Image Captioning, Machine Translation, and Visual Question Answering, validating the effectiveness of this algorithm and conducting sensitivity analysis on several parameters to analyze the impact of the number of groups on diversity.

Resources

1. The algorithm's Torch implementation: https://github.com/ashwinkalyan/dbs
2. Online demo of the algorithm: dbs.cloudcv.org
3. Implementation of neuraltalk2: https://github.com/karpathy/neuraltalk2
4. Open-source machine translation implementation dl4mt: https://github.com/nyu-dl/dl4mt-tutorial

Related Work

The related work falls into two categories: 1. Diverse M-Best Lists; 2. Diverse Decoding for RNNs. Previously, Jiwei Li studied diversity by changing the decoding objective to mutual information and optimizing it during decoding.

Comments

This paper addresses a fundamental issue. Beam search is a classic approximate decoding algorithm with many applications, but in practice, especially in tasks such as dialogue generation and answer generation, it faces adaptability issues such as a lack of diversity: simply generating safe responses is of little practical value, which makes this research significant. The experiments provide solid results validating the improved beam search across three different tasks, and the sensitivity analysis of several key parameters is well-reasoned. The code is open-sourced on GitHub and an online demo is provided. For evaluation, the authors designed several automatic metrics and also used manual evaluation to validate the algorithm, making this a very good paper worth learning from.

4. A Diversity-Promoting Objective Function for Neural Conversation Models

Authors

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan

Affiliations

Stanford University, Stanford, CA, USA; Microsoft Research, Redmond, WA, USA

Keywords

Sequence-to-sequence neural network models, conversational responses, Maximum Mutual Information (MMI)

Source

arXiv, 2015

Problem

Use MMI to train a sequence-to-sequence model for conversational response generation. Traditional maximum likelihood estimation (MLE) training of sequence-to-sequence models often yields "safe" responses unrelated to the input (a drawback of maximum likelihood: it always tries to cover all modes of the data). Maximizing the mutual information between input and output instead effectively avoids responses unrelated to the input, producing more diverse replies.

Model

MMI was first proposed and applied in speech recognition as a discriminative training criterion: an acoustic model first trained with maximum likelihood is typically further tuned with MMI in combination with a language model.

In this paper, the authors propose using MMI to optimize seq2seq models. Because applying MMI directly at decoding time is problematic, they introduce two practical formulations: MMI-antiLM and MMI-bidi.
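Concretely, the two objectives can be written as follows, with $S$ the source, $T$ the response, and $\lambda$ a weighting hyperparameter:

```latex
% MMI-antiLM: penalize a weighted language-model term
\hat{T} = \arg\max_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}

% MMI-bidi: weighted combination of forward and backward likelihoods
\hat{T} = \arg\max_{T} \left\{ (1 - \lambda) \log p(T \mid S) + \lambda \log p(S \mid T) \right\}
```

Both reduce to standard maximum likelihood decoding when $\lambda = 0$; the extra terms down-weight responses that are likely regardless of the input.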

In MMI-antiLM, the authors generate more diverse responses by subtracting a weighted language-model score that penalizes the first words of the response.

In MMI-bidi, the search space is too large to explore exhaustively, so the authors first generate an N-best list and then re-rank it according to the bidirectional criterion.
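The re-ranking can be sketched as follows. The data layout is hypothetical: each N-best entry is assumed to carry the forward score log p(T|S) and a backward score log p(S|T) from a model trained to reconstruct the source from the response.

```python
def mmi_bidi_rerank(nbest, lam):
    """Re-rank an N-best list with the bidirectional MMI criterion (sketch).
    Each entry is (response, log p(T|S), log p(S|T)); the combined score is
    (1 - lam) * log p(T|S) + lam * log p(S|T)."""
    def score(entry):
        response, logp_t_given_s, logp_s_given_t = entry
        return (1 - lam) * logp_t_given_s + lam * logp_s_given_t
    return sorted(nbest, key=score, reverse=True)
```

A generic reply like "I don't know" has a high forward score but a poor backward score (it predicts almost nothing about the source), so the combined criterion demotes it.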

In both formulations of MMI, the authors employ heuristic designs that make decoding tractable and encourage diversity, achieving better BLEU scores and more diverse responses on the relevant datasets.

Comments

Maximum a posteriori probability is typically used as the optimization objective, but in many applications the results are not ideal. This paper replaces it with an objective function commonly used in other fields, yielding richer results in dialogue generation.

Conclusion

Dialogue systems are relatively advanced and complex tasks that rely on many foundational tasks, such as word segmentation, named entity recognition, syntactic parsing, and semantic role labeling. Even for standard written Chinese, syntactic parsing remains an unresolved problem, let alone less standardized human speech, where parsing accuracy has yet to reach the required level, which in turn affects semantic role labeling. Classic foundational tasks still have a long way to go, and the harder, more complex task of dialogue is unlikely to be solved in a year or two; although it is very popular and many people are working on it, at the current level of research there is still a long road ahead. Seq2seq is a good way to sidestep these problems, but it is relatively immature and has many issues of its own; trying to cover everything with large amounts of data is not a very scientific approach. I believe seq2seq is a good method, but traditional NLP methods are also essential, and the two should complement each other. The more attention dialogue systems receive, the faster the field will develop, and I hope to see reliable, mature solutions soon. Thanks to @Penny, @tonya, @zhangjun, and @皓天 for completing these paper notes.
