Applications of Generative Adversarial Networks in Speech Processing

Recommended by New Intelligence

Source: Special Knowledge (LiteProgrammer)
【New Intelligence Introduction】Interspeech, the top conference in the field of speech processing, was held from September 15 to 20, 2019 in Graz, Austria. Professor Li Hongyi of National Taiwan University presented a tutorial titled “Generative Adversarial Network and its Application to Speech Processing and Natural Language Processing”. This article summarizes the main content of the tutorial and shares the presentation slides.
Generative Adversarial Networks (GANs) are a new approach to training models in which a generator and a discriminator compete against each other, and this competition improves the quality of what the generator produces. GANs have recently achieved striking results in image generation, prompting many new ideas, techniques, and applications. Although successful cases in text and speech are still few, GANs have great potential in those fields to overcome the limitations of traditional methods.
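For readers who want the competition stated concretely, the widely cited minimax objective from the original 2014 GAN formulation is given here as a reference point. The symbols are not taken from this article and are assumptions of the sketch: G is the generator, D is the discriminator, p_data is the distribution of real data, and p_z is the noise prior. The tutorial slides may use a different but related formulation.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator tries to raise V by scoring real samples high and generated samples low, while the generator tries to lower V by producing samples that the discriminator scores as real.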
Content Overview
This tutorial is divided into three parts. The first part gives a comprehensive introduction to Generative Adversarial Networks (GANs). The second part focuses on the applications of GANs in speech signal processing, including speech enhancement, voice conversion, speech synthesis, and the application of domain adversarial training in speaker recognition and lip reading. The third part describes the main challenges of generating sentences with GANs and reviews a series of methods for tackling them. It also presents GAN-based algorithms for text style transfer, machine translation, and abstractive summarization without paired data.
Speaker Introduction
Professor Li Hongyi obtained his master’s and doctoral degrees from National Taiwan University in 2010 and 2012, respectively. From September 2012 to August 2013, he was a postdoctoral researcher at the Center for Information Technology Innovation, Academia Sinica. From September 2013 to July 2014, he was a visiting scientist in the Spoken Language Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is currently an assistant professor in the Department of Electrical Engineering at National Taiwan University and also serves in the Department of Computer Science and Information Engineering at the same university. His research focuses on machine learning (especially deep learning), spoken language understanding, and speech recognition.
Associate Researcher Cao Yu obtained his bachelor’s and master’s degrees in electronic engineering from National Taiwan University in 1999 and 2001, respectively. He received his Ph.D. in electrical and computer engineering from Georgia Institute of Technology in 2008. From 2009 to 2011, Dr. Cao was a researcher at the National Institute of Information and Communications Technology (NICT) in Japan, working on research and product development in automatic speech recognition, including multilingual speech-to-speech translation. He is currently an associate researcher at the Center for Information Technology Innovation (CITI) at Academia Sinica in Taipei, Taiwan. He received the Career Development Award from Academia Sinica in 2017. Dr. Cao’s research interests include speech and speaker recognition, acoustic and language modeling, audio coding, and biomedical signal processing.
Table of Contents
Basic Ideas of GAN and Some Fundamental Theoretical Knowledge
– Three Categories of GANs
– Basic Theory of GANs
– Some Useful Techniques
– How to Evaluate GANs
– Relationship with Reinforcement Learning
Applications of GANs in Speech
– Speech Signal Generation
– Speech Signal Recognition
– Conclusion
Applications of GANs in Natural Language Processing
– GAN Sequence Generation
– Unsupervised Conditional Sequence Generation
Original Link:
https://interspeech2019.org/program/tutorials/
Attached PDF Preview:
Three Categories of GANs
Basic Ideas of GAN
GANs have made great progress since they were introduced in 2014
Conditional GAN
Applications of Conditional GAN include generating images from images, sounds from images, and images from labels.
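As a rough illustration of what "conditional" means here, the sketch below shows one common way to feed a condition to the two networks: a one-hot label is concatenated with random noise for the generator, and with the candidate sample for the discriminator. This is a minimal sketch under stated assumptions, not the method from the slides. The helper names (`one_hot`, `generator_input`, `discriminator_input`) and the sizes are hypothetical, and no actual networks are trained.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, num_classes, noise_dim, rng):
    """Condition plus noise: the vector a conditional generator would consume."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return one_hot(label, num_classes) + noise


def discriminator_input(label, num_classes, sample):
    """Condition plus sample: the discriminator judges the pair, not the sample alone."""
    return one_hot(label, num_classes) + list(sample)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
g_in = generator_input(label=2, num_classes=4, noise_dim=3, rng=rng)
d_in = discriminator_input(label=2, num_classes=4, sample=[0.1, 0.2])
```

Because the discriminator sees the condition together with the sample, it can reject outputs that look realistic but do not match the requested label.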
There are two methods for unsupervised conditional GAN generation:
  1. Cycle-GAN
  2. Sharing a latent space
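The first of the two methods listed above relies on a cycle-consistency idea: mapping a sample to the other domain and back should return something close to the original. The following is a minimal sketch of that loss term only, assuming an L1 distance and toy hand-written mappings in place of trained networks. All function names are hypothetical, and the adversarial losses that a full Cycle-GAN also uses are omitted.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x_batch, y_batch, g_xy, g_yx):
    """Sum of the X -> Y -> X and Y -> X -> Y reconstruction errors."""
    forward = sum(l1_distance(g_yx(g_xy(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(l1_distance(g_xy(g_yx(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward


# Toy domains: Y is X scaled by 2, so the ideal inverse halves its input.
def to_y(x):
    return [2.0 * v for v in x]


def to_x_good(y):
    return [0.5 * v for v in y]


def to_x_bad(y):
    return [0.25 * v for v in y]


x_batch = [[1.0, 2.0], [3.0, 4.0]]
y_batch = [[2.0, 4.0]]

loss_good = cycle_consistency_loss(x_batch, y_batch, to_y, to_x_good)
loss_bad = cycle_consistency_loss(x_batch, y_batch, to_y, to_x_bad)
```

The consistent pair of mappings gives zero loss, while the mismatched inverse is penalized. This is what lets training proceed without paired examples from the two domains.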