Introduction to Neural Machine Translation and Seq2Seq Models

Selected from arXiv

Author: Graham Neubig

Translation by Machine Heart

Contributors: Li Zenan, Jiang Siyuan

This article is a detailed tutorial on machine translation, suitable for readers with a background in computer science. According to Paper Weekly (ID: paperweekly), the paper comes from CMU LTI and covers the foundations of the Seq2Seq approach, including the n-gram language model, the log-linear language model, NNLM, RNNLM, the encoder-decoder framework, and attention. It is a high-quality tutorial for beginners. Readers can click "Read the Original" to download the paper.

Introduction to Neural Machine Translation and Seq2Seq Models

Abstract

This tutorial introduces a set of powerful techniques: neural machine translation and neural Seq2Seq models. These techniques have been applied to many tasks involving human language and have become a powerful tool for modeling sequential data in general. The tutorial assumes that readers have basic mathematical and programming knowledge, but it does not require any background in neural networks or natural language processing. It explains the intuition behind the various methods and then describes them with the full mathematical detail needed for a deep understanding. The tutorial also includes implementation suggestions and exercises that let readers test their understanding of the material.

Background

Machine translation is the technology of using computers to translate between different human languages. Imagine the real-time translator from a sci-fi movie that can instantly convert one language into another. Services such as Google Translate have already made significant progress in this direction. Machine translation can eliminate language barriers and has broad application prospects, so it became a focus for researchers shortly after computers were invented.

We refer to the language input to a machine translation system as the source language, and the language it outputs as the target language. Machine translation can thus be described as the task of converting a sequence of words in the source language into a sequence of words in the target language. The goal of machine translation researchers is to build efficient models that can perform this conversion quickly and accurately across a wide variety of languages.

The Seq2Seq model is a broader class that includes any model mapping one sequence to another. It encompasses machine translation as well as methods for handling many other tasks. In fact, every computer program can be viewed as taking a sequence of bits as input and producing a sequence of bits as output, which means that all programs are Seq2Seq models of some behavior (although in many cases this is not the most natural or intuitive way to express them).

Figure: An example task of a Seq2Seq model

Machine translation, as a representative of Seq2Seq models, has the following characteristics:

1. Machine translation is the most recognized instance of Seq2Seq models, allowing us to use many intuitive examples to illustrate the difficulties in handling such problems.

2. Machine translation is often the task on which new models are developed: new methods are frequently applied to machine translation first and only later carried over to other tasks.

3. Of course, the influence also runs the other way: insights gained from other tasks have contributed to the development of machine translation technology.

Structure of the Tutorial

The tutorial begins with Chapter 2, which introduces the general mathematical definitions and methods needed for machine translation. The remaining chapters then proceed in order of increasing technical sophistication, ending with the attention models that currently represent the state of the art.

Chapters 3 to 6 focus on language models, which calculate the probability of target sequences of interest. Although these models cannot perform translation or transduction on their own, they provide a useful first step toward understanding Seq2Seq models.
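
As a point of reference for these chapters, a language model assigns a probability to a target sequence E = e_1, ..., e_T by factoring it with the chain rule, one word at a time given the words that precede it (this is the standard formulation, written here in our own notation rather than quoted from the paper):

P(E) = \prod_{t=1}^{T} P(e_t \mid e_1, \ldots, e_{t-1})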

  • Chapter 3 covers the n-gram language model, a simple method that calculates word probabilities from counts of word sequences in the training data (a minimal counting sketch appears after this list). The chapter also discusses how to evaluate these models with measures such as perplexity.

  • Chapter 4 covers log-linear language models, which calculate the probability of the next word from features of the context. The chapter also describes how the model parameters are learned with stochastic gradient descent: taking partial derivatives of the likelihood of the observed data and iteratively updating the parameters to increase it (sketched in code after this list).

  • Chapter 5 introduces the basic concepts of neural networks, which make it easier to combine multiple pieces of information than log-linear models do, further improving language-model accuracy. The chapter gives an example of a feedforward neural network language model, which predicts the probability of the next word from the previous words (a forward-pass sketch follows this list).

  • Chapter 6 discusses recurrent neural networks, which can carry information across multiple time steps. This property gives rise to recurrent neural network language models, which can capture long-range dependencies when modeling language or other sequence data (a recurrent-step sketch follows this list).
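
To accompany the Chapter 3 bullet above, here is a minimal counting sketch in Python (our illustration, not code from the tutorial; the toy corpus and the add-one smoothing choice are assumptions made for brevity) of a bigram language model and its perplexity on a sentence:

import math
from collections import Counter

# Toy training corpus; "<s>" and "</s>" mark sentence boundaries.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

vocab = {w for sent in corpus for w in sent}
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
unigrams = Counter(w for s in corpus for w in s)

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing so unseen bigrams still get nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def perplexity(sentence):
    # Perplexity is the exponential of the average negative log-probability per predicted word.
    log_prob = sum(math.log(bigram_prob(sentence[i], sentence[i + 1]))
                   for i in range(len(sentence) - 1))
    return math.exp(-log_prob / (len(sentence) - 1))

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))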
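
To accompany the Chapter 4 bullet, the following sketch (ours, with a made-up one-hot feature set and toy data) trains a log-linear language model by stochastic gradient descent: compute a softmax over feature scores, take the gradient of the log-likelihood, and move the weights in that direction:

import numpy as np

vocab = ["the", "cat", "dog", "sat"]

def features(prev_word):
    # One-hot feature vector for the previous word (a deliberately tiny feature set).
    x = np.zeros(len(vocab))
    x[vocab.index(prev_word)] = 1.0
    return x

# One weight vector per output word: scores are W @ x, probabilities come from a softmax.
W = np.zeros((len(vocab), len(vocab)))
data = [("the", "cat"), ("the", "dog"), ("cat", "sat"), ("dog", "sat")]

learning_rate = 0.1
for epoch in range(100):
    for prev_word, next_word in data:
        x = features(prev_word)
        scores = W @ x
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # Gradient of the log-likelihood: (one-hot target - predicted probs) outer x.
        target = np.zeros(len(vocab))
        target[vocab.index(next_word)] = 1.0
        W += learning_rate * np.outer(target - probs, x)

# After training, inspect the learned distribution over the word following "the".
x = features("the")
probs = np.exp(W @ x)
probs /= probs.sum()
print(dict(zip(vocab, np.round(probs, 2))))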
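
To accompany the Chapter 5 bullet, here is a forward-pass-only sketch of a feedforward neural network language model (ours; the layer sizes are arbitrary and the weights are untrained): the previous word is embedded, passed through a nonlinear hidden layer, and mapped to a distribution over the next word. Training would use the same gradient-based updates as the log-linear sketch above.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "dog", "sat"]
V, D, H = len(vocab), 8, 16   # vocabulary, embedding, and hidden sizes

# Parameters: embedding table, hidden layer, and output (softmax) layer.
E = rng.normal(scale=0.1, size=(V, D))
W_h = rng.normal(scale=0.1, size=(H, D))
W_o = rng.normal(scale=0.1, size=(V, H))

def next_word_probs(prev_word):
    # Look up the embedding of the previous word, apply a tanh hidden layer, then softmax.
    h = np.tanh(W_h @ E[vocab.index(prev_word)])
    scores = W_o @ h
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

print(dict(zip(vocab, np.round(next_word_probs("the"), 2))))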
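
To accompany the Chapter 6 bullet, this sketch (ours, with untrained random weights) shows the recurrent step of an RNN language model: the hidden state is updated from the current word and the previous hidden state, so information from all earlier words can influence each prediction:

import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "cat", "sat", "</s>"]
V, H = len(vocab), 16

E = rng.normal(scale=0.1, size=(V, H))      # word embeddings
W_hh = rng.normal(scale=0.1, size=(H, H))   # hidden-to-hidden recurrence
W_out = rng.normal(scale=0.1, size=(V, H))  # hidden-to-vocabulary output

def rnn_lm_probs(sentence):
    # Process the sentence word by word, carrying the hidden state across time steps.
    h = np.zeros(H)
    probs_per_step = []
    for word in sentence:
        h = np.tanh(E[vocab.index(word)] + W_hh @ h)
        scores = W_out @ h
        exp = np.exp(scores - scores.max())
        probs_per_step.append(exp / exp.sum())   # predicted distribution over the next word
    return probs_per_step

# Print the (here meaningless, since untrained) probability assigned to each actual next word.
for word, probs in zip(["the", "cat", "sat", "</s>"], rnn_lm_probs(["<s>", "the", "cat", "sat"])):
    print(word, np.round(probs[vocab.index(word)], 3))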

Finally, Chapters 7 and 8 describe actual Seq2Seq models capable of performing machine translation or other tasks.

  • Chapter 7 describes the encoder-decoder model, in which one recurrent neural network encodes the input sequence into a numerical vector and another network decodes that vector into an output sentence. The chapter also explains search algorithms for generating output sequences from this model (a bare-bones sketch follows this list).

  • Chapter 8 explains the attention mechanism, which allows the model to focus on different parts of the input sentence while generating a translation. This produces more effective and intuitive sentence representations and often outperforms the simpler encoder-decoder mechanism (the core computation is sketched after this list).
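
To accompany the Chapter 7 bullet above, here is a bare-bones encoder-decoder sketch (ours; the weights are random and untrained, only greedy search is shown, and the Japanese-English pair "kare wa hashitta" / "he ran" is placeholder data): one recurrent network reads the source sentence into a vector, and a second one generates the output word by word.

import numpy as np

rng = np.random.default_rng(0)
src_vocab = ["<s>", "kare", "wa", "hashitta", "</s>"]
trg_vocab = ["<s>", "he", "ran", "</s>"]
H = 16

E_src = rng.normal(scale=0.1, size=(len(src_vocab), H))
E_trg = rng.normal(scale=0.1, size=(len(trg_vocab), H))
W_enc = rng.normal(scale=0.1, size=(H, H))
W_dec = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(len(trg_vocab), H))

def encode(src_words):
    # The encoder RNN compresses the source sentence into its final hidden state.
    h = np.zeros(H)
    for w in src_words:
        h = np.tanh(E_src[src_vocab.index(w)] + W_enc @ h)
    return h

def greedy_decode(src_words, max_len=10):
    # The decoder RNN starts from the encoder's vector and greedily picks the
    # highest-scoring word at each step (beam search would explore more options).
    h = encode(src_words)
    word, output = "<s>", []
    for _ in range(max_len):
        h = np.tanh(E_trg[trg_vocab.index(word)] + W_dec @ h)
        word = trg_vocab[int(np.argmax(W_out @ h))]
        if word == "</s>":
            break
        output.append(word)
    return output

print(greedy_decode(["<s>", "kare", "wa", "hashitta", "</s>"]))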
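
To accompany the Chapter 8 bullet, the fragment below isolates the core attention computation (our sketch; dot-product scoring is just one common choice of scoring function): score every encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder states as the context vector for the next output word.

import numpy as np

def attention_context(decoder_state, encoder_states):
    # Score each encoder state against the decoder state (dot-product attention).
    scores = encoder_states @ decoder_state
    # A softmax turns the scores into weights that sum to one.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector is the weighted sum of encoder states; the weights show
    # which source positions the model is attending to for this output word.
    return weights @ encoder_states, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 16))   # one vector per source word
decoder_state = rng.normal(size=16)
context, weights = attention_context(decoder_state, encoder_states)
print(np.round(weights, 2), context.shape)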

© This article was translated by Machine Heart. Please contact this public account for authorization to reproduce.

✄————————————————

Join Machine Heart (Full-time Reporter/Intern): [email protected]

Submissions or Reporting Requests: [email protected]

Advertising & Business Cooperation: [email protected]
