Applications of Generative Adversarial Networks in Speech Processing

Recommended by New Intelligence

Source: Special Knowledge (LiteProgrammer)

[New Intelligence Introduction] Interspeech is a top conference in the field of speech processing; Interspeech 2019 was held from September 15 to 20 in Graz, Austria. Professor Li Hongyi of National Taiwan University gave a tutorial there titled “Generative Adversarial Network and its Application to Speech Processing and Natural Language Processing”. This article summarizes the main content of the tutorial and shares the presentation slides.

Generative Adversarial Networks (GANs) are an approach to training generative models in which a generator and a discriminator compete against each other, and this competition improves the quality of the generated output. Recently, GANs have achieved striking results in image generation, sparking many new ideas, techniques, and applications. Although successful cases in text and speech are still few, GANs have great potential to overcome the limitations of traditional methods in these fields.
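To make the generator-versus-discriminator competition concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial slides. Everything in it is an assumption chosen for illustration: the one-dimensional "real" data drawn from N(4, 0.5), the linear generator G(z) = a*z + b, the logistic discriminator D(x) = sigmoid(w*x + c), the non-saturating generator objective, and all function names and hyperparameters.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator_step(w, c, real, fake, lr):
    """One gradient-ascent step on mean[log D(x)] + mean[log(1 - D(G(z)))]."""
    gw = gc = 0.0
    for x in real:  # push D(x) toward 1 on real samples
        d = sigmoid(w * x + c)
        gw += (1.0 - d) * x / len(real)
        gc += (1.0 - d) / len(real)
    for g in fake:  # push D(G(z)) toward 0 on generated samples
        d = sigmoid(w * g + c)
        gw -= d * g / len(fake)
        gc -= d / len(fake)
    return w + lr * gw, c + lr * gc


def generator_step(a, b, w, c, noise, lr):
    """One gradient-ascent step on mean[log D(G(z))] (non-saturating form)."""
    ga = gb = 0.0
    for z in noise:
        d = sigmoid(w * (a * z + b) + c)
        ga += (1.0 - d) * w * z / len(noise)
        gb += (1.0 - d) * w / len(noise)
    return a + lr * ga, b + lr * gb


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters (hypothetical toy model)
    w, c = 0.0, 0.0  # discriminator parameters (hypothetical toy model)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        w, c = discriminator_step(w, c, real, fake, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        a, b = generator_step(a, b, w, c, noise, lr)
    return a, b, w, c
```

The two step functions pull in opposite directions on the same quantity D(G(z)): the discriminator lowers it, the generator raises it. With such a simple discriminator the parameters may oscillate rather than settle, which is one reason practical GAN training relies on additional techniques.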

Content Overview

This tutorial is divided into three parts. The first part gives a comprehensive introduction to Generative Adversarial Networks (GANs). The second part focuses on the applications of GANs in speech signal processing, including speech enhancement, voice conversion, speech synthesis, and the use of domain adversarial training in speaker recognition and lip reading. The third part describes the main challenges of generating sentences with GANs and reviews a series of methods for tackling them. It also presents GAN-based algorithms for text style transfer, machine translation, and abstractive summarization without paired data.

Speaker Introduction

Professor Li Hongyi obtained his master’s and doctoral degrees from National Taiwan University in 2010 and 2012, respectively. From September 2012 to August 2013, he was a postdoctoral researcher at the Center for Information Technology Innovation, Academia Sinica. From September 2013 to July 2014, he was a visiting scientist in the Spoken Language Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is currently an assistant professor in the Department of Electrical Engineering at National Taiwan University and is also affiliated with the Department of Computer Science and Information Engineering at the same university. His research focuses on machine learning (especially deep learning), spoken language understanding, and speech recognition.

Associate Researcher Cao Yu obtained his bachelor’s and master’s degrees in electronic engineering from National Taiwan University in 1999 and 2001, respectively. He received his Ph.D. in electrical and computer engineering from Georgia Institute of Technology in 2008. From 2009 to 2011, Dr. Cao was a researcher at the National Institute of Information and Communications Technology (NICT) in Japan, where he worked on research and product development in automatic speech recognition, including multilingual speech-to-speech translation. He is currently an associate researcher at the Center for Information Technology Innovation (CITI) at Academia Sinica in Taipei, Taiwan. He received the Career Development Award from Academia Sinica in 2017. Dr. Cao’s research interests include speech and speaker recognition, acoustic and language modeling, audio coding, and biomedical signal processing.

Table of Contents

Basic Ideas of GAN and Some Fundamental Theoretical Knowledge

– Three Categories of GANs

– Basic Theory of GANs

– Some Useful Techniques

– How to Evaluate GANs

– Relationship with Reinforcement Learning

Applications of GANs in Speech

– Speech Signal Generation

– Speech Signal Recognition

– Conclusion

Applications of GANs in Natural Language Processing

– GAN Sequence Generation

– Unsupervised Conditional Sequence Generation

Original Link:

https://interspeech2019.org/program/tutorials/

Attached PDF Preview:

Three Categories of GANs

Basic Ideas of GAN

GANs have made great progress since they were introduced in 2014.

Conditional GAN

Applications of conditional GANs include generating images from images, sounds from images, and images from labels.

There are two approaches to unsupervised conditional generation with GANs:

Cycle-GAN
Sharing a latent space
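The slides only name Cycle-GAN, so the following is a hedged sketch of the cycle-consistency idea usually associated with it: a sample mapped to the other domain and back should return close to where it started. The L1 distance, the function names, and the toy mappings `to_y`, `to_x`, and `bad_to_x` are all assumptions made for illustration. A full Cycle-GAN also includes adversarial losses, which are left out here.

```python
def l1_distance(u, v):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(u, v)) / len(u)


def cycle_consistency_loss(g_xy, g_yx, xs, ys):
    """Average reconstruction error of X -> Y -> X plus Y -> X -> Y."""
    forward = sum(l1_distance(g_yx(g_xy(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(g_xy(g_yx(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical stand-ins for the two generators (real ones are neural networks).
def to_y(x):
    return [2.0 * v + 1.0 for v in x]


def to_x(y):
    return [(v - 1.0) / 2.0 for v in y]


def bad_to_x(y):
    # Deliberately not the inverse of to_y, so the cycle does not close.
    return [v / 2.0 for v in y]


xs = [[0.0, 1.0], [2.0, 3.0]]  # unpaired samples from domain X
ys = [[1.0, 3.0]]              # unpaired samples from domain Y

consistent = cycle_consistency_loss(to_y, to_x, xs, ys)
inconsistent = cycle_consistency_loss(to_y, bad_to_x, xs, ys)
```

Because the loss compares each sample only with its own reconstruction, it needs no paired examples across the two domains, which is what makes the setting unsupervised.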

To get the download link for the complete version of “Applications of Generative Adversarial Networks in Speech Processing and Natural Language Processing”, click the original link and reply “GANSP” in the Special Knowledge WeChat public account.
