This article is reprinted with permission from the WeChat official account PaperWeekly (ID: paperweekly), which shares interesting papers in the field of natural language processing every week.
Introduction
Dialogue systems are currently a hot research topic, and a hot area for venture capital as well. Since early 2016, countless companies have been founded to build chatbots, voice assistants, and similar products; for users and enterprises alike, the application of dialogue systems has reached a new height. Seq2seq is a popular framework that, given an input, automatically generates an output, which sounds wonderful, and there is a great deal of seq2seq research in dialogue systems. This issue of PaperWeekly shares notes on four papers about improving the fluency and diversity of generated dialogue, bringing dialogue systems closer to human conversation. The four papers are:
1. Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation, 2016
2. A Simple, Fast Diverse Decoding Algorithm for Neural Generation, 2016
3. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models, 2016
4. A Diversity-Promoting Objective Function for Neural Conversation Models, 2015
1. Sequence to Backward and Forward Sequences: A Content-Introducing Approach to Generative Short-Text Conversation
Authors
Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, Zhi Jin
Affiliations
Key Laboratory of High Confidence Software Technologies (Peking University), MoE, China; Institute of Software, Peking University, China; Institute of Network Computing and Information Systems, Peking University, China; Institute of Computer Science and Technology, Peking University, China
Keywords
content-introducing approach, neural network-based, generative dialogue systems, seq2BF
Source
arXiv, 2016
Problem
How to introduce content (a predicted keyword) into neural network-based generative dialogue systems, so that generated replies are informative rather than generic.
Model
The model consists of two parts:
1. Predict a keyword for the reply. Pointwise mutual information (PMI) is computed between the query words and candidate reply words, and the word with the highest PMI value is selected as the keyword; this keyword may appear at any position in the response.
2. Generate a reply conditioned on both the keyword and the query, using the sequence to backward and forward sequences (seq2BF) model, so that the response is guaranteed to contain the keyword. The response is split into two sequences around the keyword: (1) backward sequence: all words to the left of the keyword, in reverse order; (2) forward sequence: all words to the right of the keyword, in their original order.
The seq2BF model works in two passes: (1) a seq2seq network encodes the query and decodes only the words to the left of the keyword, emitting them in reverse order; (2) a second seq2seq network re-encodes the query and, given the reversed left half from the first pass, decodes the remaining words of the response, yielding the final word sequence.
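A minimal Python sketch of the two-step pipeline, assuming a pre-computed co-occurrence table for PMI and two trained decoders; `backward_decoder` and `forward_decoder` are hypothetical interfaces, not the authors' code:

```python
import math

def pmi_keyword(query_words, pair_counts, query_counts, reply_counts, n_pairs):
    """Pick the reply word w maximizing summed PMI with the query words.

    PMI(q, w) = log[ p(q, w) / (p(q) p(w)) ], estimated from co-occurrence
    counts of (query word, reply word) pairs in the training corpus.
    Summing PMI over the query words is one simple aggregation choice.
    """
    best_word, best_score = None, float("-inf")
    for w, w_count in reply_counts.items():
        score = sum(
            math.log((pair_counts.get((q, w), 1) * n_pairs)
                     / (query_counts[q] * w_count))
            for q in query_words if q in query_counts
        )
        if score > best_score:
            best_word, best_score = w, score
    return best_word

def seq2bf_reply(query, keyword, backward_decoder, forward_decoder):
    """Generate a reply that is guaranteed to contain `keyword`.

    backward_decoder(query, seed) emits the words to the *left* of the
    keyword in reverse order; forward_decoder(query, prefix) then emits
    the words to the right, conditioned on the fixed left half + keyword.
    """
    left_reversed = backward_decoder(query, seed=keyword)    # e.g. ["like", "I"]
    left = list(reversed(left_reversed))                     # -> ["I", "like"]
    right = forward_decoder(query, prefix=left + [keyword])  # e.g. ["very", "much"]
    return left + [keyword] + right
```

Because the keyword only seeds the backward pass, it can end up at any position in the final reply, including the first or last word.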
Resources
Dataset: http://tieba.baidu.com
Related Work
1. Dialogue systems: (1) retrieval methods (Isbell et al., 2000; Wang et al., 2013); (2) phrase-based machine translation (Ritter et al., 2011); (3) recurrent neural networks (Sordoni et al., 2015; Shang et al., 2015).
2. Neural networks for sentence generation: (1) bag-of-words features (Sordoni et al., 2015); (2) seq2seq-like neural networks (Shang et al., 2015); (3) hierarchical neural networks (Yao et al., 2015; Serban et al., 2016a); (4) mutual information training objective (Li et al., 2016a).
Comment
The innovation of this paper lies in introducing pointwise mutual information to predict a keyword for the response, which lets the keyword appear at any position in the target reply while keeping the output fluent. This significantly improves response quality compared with plain seq2seq generation.
2. A Simple, Fast Diverse Decoding Algorithm for Neural Generation
Authors
Jiwei Li, Will Monroe, and Dan Jurafsky
Affiliations
Stanford
Keywords
seq2seq, diversity, RL
Source
arXiv, 2016
Problem
How to make the beam search decoder of a seq2seq model produce more diverse results. The paper introduces a penalty factor that changes the ranking of candidates during decoding and adds a reinforcement learning model that learns the diversity rate automatically.
Model
Compared to standard beam search, this model introduces a penalty factor into the score used for pruning hypotheses:

$S(y_{t,k'}) = \log p(y_{t,k'} \mid x, y_{1:t-1}) - \gamma \cdot k'$

where $\gamma$ is called the diversity rate, $k'$ ranges over $[1, K]$ and is the rank of a candidate among the expansions of its parent hypothesis, and $K$ is the beam size. The diversity rate is learned by a reinforcement learning strategy whose reward is an end evaluation metric, such as the BLEU score in machine translation.
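A minimal sketch of one decoding step with this intra-sibling rank penalty; the diversity rate gamma is taken as given here, whereas the paper learns it with the reinforcement learning model:

```python
def diverse_step(beams, step_logprobs, K, gamma):
    """One beam-search step with the intra-sibling rank penalty.

    beams:             list of (hypothesis, cumulative_logprob), length K.
    step_logprobs[i]:  dict mapping token -> log p(token | x, hypothesis i).
    """
    candidates = []
    for i, (hyp, base) in enumerate(beams):
        # Rank this parent's children by their own log-probability ...
        ranked = sorted(step_logprobs[i].items(), key=lambda kv: -kv[1])
        # ... and penalize the k'-th ranked child by gamma * k', so the
        # surviving candidates are pushed to come from different parents.
        for rank, (tok, lp) in enumerate(ranked[:K], start=1):
            true_lp = base + lp
            candidates.append((hyp + [tok], true_lp, true_lp - gamma * rank))
    # Prune globally by the *penalized* score, but keep the true log-prob.
    candidates.sort(key=lambda c: -c[2])
    return [(hyp, lp) for hyp, lp, _ in candidates[:K]]
```

With gamma = 0 this reduces to standard beam search; larger gamma spreads the kept hypotheses across more parents, which is where the extra diversity comes from.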
Resources
1. Response generation dataset: OpenSubtitles. Code: https://github.com/jiweil/mutual-information-for-neural-machine-translation (the model can be obtained by slightly modifying the code from another paper by the same author).
2. Machine translation dataset: WMT'14: http://www.statmt.org/wmt13/translation-task.html
Related Work
Comment
The innovation of this model lies in introducing a penalty factor that re-ranks the candidates of standard beam search during decoding, together with a reinforcement learning model that learns the diversity rate automatically. The authors validate it on three tasks: machine translation, extractive summarization, and dialogue response generation. Performance varies across tasks, but overall the method decodes more diverse sentences to a certain extent. (The idea is clear and straightforward; the authors mention that only one line of the MATLAB code needed to be changed.)
3. Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Authors
Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall & Dhruv Batra
Affiliations
Virginia Tech, Blacksburg, VA, USA; Indiana University, Bloomington, IN, USA
Keywords
Beam Search; Diversity; Image Captioning; Machine Translation; Visual Question Answering; Chatbot
Source
arXiv, 2016.10
Problem
How to improve the beam search decoding algorithm to generate richer results in the seq2seq model?
Model
The classic beam search algorithm optimizes an objective based on maximum posterior probability, keeping only the top B states at each time step. It is a typical greedy algorithm, commonly used when each decoding step has a vocabulary-sized set of candidate states, as in dialogue generation, image captioning, and machine translation. The popularity of the seq2seq model has made this decoding algorithm a hot research topic. In dialogue generation, classic beam search tends to produce uninformative replies like "I don't know": grammatically correct, and often scoring well under certain evaluation metrics, yet performing poorly in practical applications. As a result, research on diversity has become popular.
This paper proposes an improved beam search algorithm targeting the diversity problem, aiming to generate more diverse dialogues.
The main idea of the new algorithm is to divide the beams of the classic algorithm into groups and introduce a penalty mechanism that minimizes similarity across groups, so that the generated hypotheses differ from one another and meet the diversity requirement; within each group, the classic algorithm still performs the optimal search. One decoding step is sketched below.
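A sketch of one step of group-wise decoding, assuming a Hamming-style diversity penalty (the form the paper uses in its experiments); the `expand` callback and the beam representation are illustrative assumptions:

```python
from collections import Counter

def diverse_beam_search_step(groups, expand, lam):
    """Advance all groups of a diverse beam search by one time step.

    groups: list of G beam lists; each beam is (tokens, cum_logprob).
    expand(tokens) yields (token, logprob) continuations of a beam.
    Group g pays a penalty lam for every earlier group that already chose
    the same token at this step (Hamming diversity), so the groups drift
    apart while each group internally runs ordinary beam search.
    """
    chosen = Counter()                       # tokens picked by earlier groups
    new_groups = []
    for beams in groups:
        candidates = []
        for tokens, cum_lp in beams:
            for tok, lp in expand(tokens):
                penalized = cum_lp + lp - lam * chosen[tok]
                candidates.append(((tokens + [tok], cum_lp + lp), penalized))
        candidates.sort(key=lambda c: -c[1])
        kept = [b for b, _ in candidates[:len(beams)]]
        new_groups.append(kept)
        for tokens, _ in kept:
            chosen[tokens[-1]] += 1          # later groups now pay for these
    return new_groups
```

With a single group, or lam = 0, this reduces to classic beam search; the first group is never penalized, so one group always recovers the standard result.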
The experiments validate the effectiveness of the algorithm on three tasks: image captioning, machine translation, and VQA, with a sensitivity analysis of several parameters, such as the impact of the number of groups on diversity.
Resources
1. Torch implementation of the algorithm: https://github.com/ashwinkalyan/dbs
2. Online demo of this paper: dbs.cloudcv.org
3. Implementation of neuraltalk2: https://github.com/karpathy/neuraltalk2
4. Open-source implementation of machine translation (dl4mt): https://github.com/nyu-dl/dl4mt-tutorial
Related Work
The related work falls into two categories: 1. Diverse M-Best Lists; 2. Diverse Decoding for RNNs. Earlier, Jiwei Li optimized decoding for diversity by changing the decoding objective to mutual information.
Comment
This paper studies a fundamental problem: beam search, as a classic approximate decoding algorithm, applies to many scenarios, but in practical applications, especially tasks like dialogue generation and answer generation, it has adaptability issues such as a lack of diversity. Merely generating simple, safe replies is of little practical value, which makes this research significant. The experiments provide solid results validating the algorithm's effectiveness on three different tasks, together with a sensitivity analysis of several key parameters, so the conclusions are well-founded. The code is open-sourced on GitHub and an online demo is provided. For evaluation, the paper not only designs several automatic metrics but also uses human evaluation to validate the algorithm, making it a very good paper worth studying.
4. A Diversity-Promoting Objective Function for Neural Conversation Models
Authors
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan
Affiliations
Stanford University, Stanford, CA, USA; Microsoft Research, Redmond, WA, USA
Keywords
Sequence-to-sequence neural network models, conversational responses, Maximum Mutual Information (MMI)
Source
arXiv, 2015
Problem
Using MMI to train a sequence-to-sequence model for generating conversational responses. Traditional maximum likelihood estimation (MLE) tends to produce 'safe' responses that are barely related to the input (a known downside of MLE: it always tries to cover all modes of the data). Maximizing the mutual information between input and output instead effectively suppresses responses unrelated to the input and yields more diverse replies.
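Concretely, writing S for the input message and T for the response, maximum likelihood decoding and the generalized MMI objective given in the paper are:

$\hat{T}_{\mathrm{MLE}} = \arg\max_{T} \log p(T \mid S)$

$\hat{T}_{\mathrm{MMI}} = \arg\max_{T} \left\{ \log \frac{p(S,T)}{p(S)\,p(T)^{\lambda}} \right\} = \arg\max_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}$

The $-\lambda \log p(T)$ term directly penalizes high-frequency generic responses, whose marginal probability $p(T)$ is large regardless of the input.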
Model
MMI was first proposed in speech recognition as a discriminative training criterion: an acoustic model is typically trained first with maximum likelihood and then further tuned with MMI together with the language model.
In this paper, the authors propose using MMI as the objective for seq2seq response generation and present two formulations, MMI-antiLM and MMI-bidi, because direct MMI decoding is intractable in the seq2seq setting.
In MMI-antiLM, a weighted language-model term is subtracted from the score, with the penalty concentrated on the first words of the response, which steers decoding away from generic openings and toward more diverse responses.
In MMI-bidi, the search space is too large to explore exhaustively, so the authors first generate an N-best list under the forward model and then re-rank it with the bidirectional criterion.
With these heuristic designs, decoding becomes tractable in both formulations, and the model produces more diverse responses while achieving better BLEU scores on the relevant datasets.
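A minimal sketch of the MMI-bidi decode-then-rerank procedure; `forward_model` and `backward_model` with `nbest`/`logprob` methods are assumed interfaces for illustration, not the paper's code:

```python
def mmi_bidi_decode(source, forward_model, backward_model, lam=0.5, n_best=50):
    """Generate under p(T|S), then rerank with the bidirectional criterion."""
    # Step 1: standard beam search under the forward model p(T|S)
    # produces an N-best list of candidate responses.
    candidates = forward_model.nbest(source, n=n_best)  # [(response, log p(T|S))]
    # Step 2: rescore each candidate with
    # (1 - lam) * log p(T|S) + lam * log p(S|T).
    rescored = [
        (resp,
         (1 - lam) * fwd_lp
         + lam * backward_model.logprob(target=source, given=resp))
        for resp, fwd_lp in candidates
    ]
    return max(rescored, key=lambda c: c[1])[0]
```

The backward term log p(S|T) rewards responses from which the input could be recovered, which is exactly what rules out generic replies that fit every input equally well.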
Comment
Maximum posterior probability is usually used as the optimization objective, but in many application scenarios the results are not ideal. This paper replaces it with an objective that is new to this setting but well established in other fields, and obtains richer results in dialogue generation.
Conclusion
Dialogue systems are a relatively advanced, highly integrated task that relies on many foundational components: word segmentation, named entity recognition, syntactic parsing, semantic role labeling, and so on. Even for well-formed Chinese text, syntactic parsing remains an unresolved problem; for the less standardized language people actually speak, parsing accuracy would need to reach another level entirely, and semantic role labeling suffers accordingly. The classic foundational tasks still have a long way to go, so the harder, more complex task of dialogue is unlikely to be solved in just a year or two. Although the topic is popular and many people are working on it, at the current level of research there is still a long road ahead. Seq2seq is a good way to sidestep these pipeline issues, but it is relatively immature and has many problems of its own; trying to cover everything with sheer data volume is not a scientific approach. I believe seq2seq is a good method, but traditional NLP methods remain essential, and the two should complement each other. The more people who focus on dialogue systems, the faster the field will develop. I hope to see reliable, mature solutions soon. Thanks to @Penny, @tonya, @zhangjun, and @皓天 for completing these paper notes.