The Transformer, as an attention-based encoder-decoder architecture, has not only revolutionized the field of Natural Language Processing (NLP) but has also made groundbreaking contributions in the field of Computer Vision (CV). Compared with Convolutional Neural Networks (CNNs), the Vision Transformer (ViT) relies on its strong modeling capability to achieve outstanding performance on benchmarks such as ImageNet, COCO, and ADE20K.
As Atlas Wang, a computer scientist at the University of Texas at Austin, said: “We have ample reason to attempt to use Transformers across the entire spectrum of AI tasks.”
Therefore, whether you are a researcher in academia or a practitioner in industry, it is essential to understand Transformer technology in depth and to keep up with cutting-edge Transformer research in order to solidify your technical foundation.
AI is an easy field to enter but a difficult one to master, which is a key reason why top AI talent remains in short supply.
In the workplace:
Can you flexibly propose new models according to actual scenarios?
Or propose modifications to existing models?
These are core competencies and the threshold one must cross to become top-tier talent. It is challenging, but once you cross it, you will find yourself among the TOP 5% in the market.
So we designed a course with one purpose: to give you the opportunity to become part of the TOP 5% in the market. In the course, we will explain the principles, implementation methods, and application techniques of Transformers in the fields of NLP and CV in a comprehensive and detailed manner, as well as the theoretical knowledge system of Federated Learning and its applications in privacy computing and finance. During the learning process, you can expand your thinking through six practical projects, integrating knowledge and improving your problem-solving skills.
- Comprehensive content explanation: Covers the hottest Transformer and Federated Learning topics in current applications and research, including over 70 Transformer models and three major types of Federated Learning with application cases.
- In-depth technical analysis: A deep dive into the model and framework details of Transformers and Federated Learning, covering the most cutting-edge model principles and techniques.
- Six practical projects: Each module includes a project, covering dialogue systems, text generation, image recognition, object detection, privacy computing, and financial risk control, enhancing students' theoretical and practical skills.
- Expert instructor team: Each module is taught by scientists or researchers with years of frontline experience in their respective fields, along with experienced teaching assistants, dedicated to providing the highest quality learning experience.
▶ A comprehensive mastery of knowledge in the fields of Transformer and Federated Learning, flexibly applied in your work
▶ Understanding of the implementation methods of Transformer models and Federated Learning frameworks, and proficiency in their key techniques and methods
▶ A deep understanding of cutting-edge Transformer and Federated Learning technologies, broadening your technical vision for work and research
▶ A comprehensive and systematic understanding of a field in a short time, greatly saving learning time
▶ Meeting a group of like-minded individuals for mutual exchange and learning
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

Below is a detailed introduction to each part of the course; interested readers can inquire for more details.
- Comprehensive technical explanation: Course content covers over 60 models, including ELMo, GPT-3, Codex, AlphaCode, UniLM v2, BERT, RoBERTa, XLM, SpanBERT, and more.
- Project practice, applying what you learn: Students use Transformer models to practice dialogue systems and text generation, the most widely used tasks in the NLP field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities with major Internet companies such as ByteDance, Alibaba, Tencent, and Meituan, as well as AI unicorn companies like SenseTime and Megvii.
Theme: Transformer in Autoregressive Language Models
This lesson will review an important concept in natural language processing, the language model, and introduce Transformer-based autoregressive language models.
- Tokenizers in NLP Models (WordPiece, BPE); a minimal BPE sketch follows
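As a rough illustration of how BPE-style tokenizers learn their merge rules, here is a minimal sketch in plain Python; the toy word counts and the number of merges are invented for illustration and are not part of the course materials.

```python
# Minimal sketch of BPE merge learning (illustrative only, not the course code).
from collections import Counter

def learn_bpe_merges(words, num_merges=10):
    """words: dict mapping a word (as a tuple of symbols) to its corpus frequency."""
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: "low" appears 5 times, "lower" 2 times.
print(learn_bpe_merges({("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}, num_merges=3))
```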
Theme: Knowledge Distillation Optimization, Low-Rank Decomposition Optimization
This lesson will explain knowledge distillation for neural networks and low-rank decomposition for accelerating computation; a short illustrative sketch of both techniques follows the outline below.
- Introduction to Knowledge Distillation Methods
- Principles and Steps of Knowledge Distillation
- Demonstration of Knowledge Distillation Training for Compressing Classification Networks
- Principles of Low-Rank Decomposition
- Applications of Low-Rank Decomposition in Neural Network Inference
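As a rough illustration of the two techniques covered in this lesson, here is a minimal PyTorch sketch under assumed tensor shapes; it is not the course's implementation. Distillation blends a softened-teacher KL term with the usual hard-label loss, and low-rank decomposition factorizes a weight matrix into two thinner matrices.

```python
# Minimal sketches of knowledge distillation and low-rank decomposition (illustrative).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Match the teacher's temperature-softened outputs plus the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale so gradient magnitudes stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def low_rank_factorize(weight, rank):
    """Approximate W (out x in) as A @ B so a dense layer becomes two smaller, faster layers."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]                    # (out, rank)
    B = Vh[:rank, :]                              # (rank, in)
    return A, B
```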
Theme: Transformer Structures in Permutation Language Models
This lesson will introduce the task of Permutation Language Models and modifications to Transformer models based on this task.
- Relative Positional Embedding
- Permutation Language Model
Theme: Transformer Models Combined with Contrastive Learning
This lesson will introduce the contrastive learning framework and discuss how to design positive and negative examples within it using Transformer models; a minimal contrastive-loss sketch follows the outline below.
- Common Loss Functions in Contrastive Learning
- Word-Level Contrast: ELECTRA
- Sentence-Level Contrast: ALBERT, StructBERT
- Other Contrastive Learning Structures
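As a rough illustration of a common contrastive objective, here is a minimal InfoNCE-style sketch in PyTorch, assuming paired embedding batches with in-batch negatives; the exact losses used by ELECTRA, ALBERT, and StructBERT differ.

```python
# Minimal InfoNCE-style contrastive loss (illustrative sketch).
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.1):
    """anchor/positive: [batch, dim]; row i of each is a positive pair, other rows are negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature              # pairwise cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)                   # the diagonal pair should score highest

print(info_nce_loss(torch.randn(8, 128), torch.randn(8, 128)))
```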
Theme: Applications of Transformer Models in Knowledge Modeling
This lesson will mainly introduce how to apply Transformer models in knowledge modeling, including how to inject knowledge into the models and how to better utilize the knowledge within the models.
Theme: Transformers in Multilingual Applications and Transformers Suitable for Chinese
This lesson will introduce how to improve Transformers in multilingual applications and how to better apply Transformer models in Chinese.
- Multilingual Understanding: mBERT, Unicoder, XLM-R, MultiFiT
- Multilingual Generation: MASS, mBART, XNLG
- Transformers for Processing Chinese: BERT-wwm-Chinese, NEZHA, ZEN
- Transformers for Other Languages: BERTje, CamemBERT, FlauBERT, RobBERT
Theme: Applications of Transformers in Dialogue and Summarization Tasks
This lesson will introduce the applications of Transformers in dialogue tasks and in text summarization tasks.
- Transformer Models in Dialogue: TransferTransfo, DialoGPT, BlenderBot, Meena, PLATO, LaMDA, GALAXY
- Transformer Models in Text Summarization: BART, Pegasus
Theme: Advanced Transformers: Faster, Larger, or Smaller
This lesson will introduce practical techniques for Transformer structures: how to make attention faster, how to scale up the parameter count of Transformers, and how to shrink it; a minimal multi-query attention sketch follows the outline below.
- Faster: Multi-Query Attention, Sparse Attention, Performer, Fastformer
- Larger: Mixture of Experts (MoE)
- Smaller: CompressingBERT, Q-BERT, ALBERT, DistilBERT, TinyBERT, MiniLM, BERT-of-Theseus
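As a rough illustration of the "faster" direction, here is a minimal multi-query attention sketch in PyTorch: all query heads share a single key/value head, which shrinks the key/value projections and cache. The dimensions are placeholders, and real implementations add masking, dropout, and KV caching.

```python
# Minimal multi-query attention sketch (illustrative; no masking, dropout, or caching).
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)      # separate queries for every head
        self.k_proj = nn.Linear(d_model, self.d_head)  # one shared key head
        self.v_proj = nn.Linear(d_model, self.d_head)  # one shared value head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                               # x: [batch, seq_len, d_model]
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)   # [b, h, t, d]
        k = self.k_proj(x).unsqueeze(1)                                             # [b, 1, t, d]
        v = self.v_proj(x).unsqueeze(1)                                             # [b, 1, t, d]
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)  # [b, h, t, t]
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)                          # [b, t, d_model]
        return self.out_proj(out)

print(MultiQueryAttention()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```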
Project 1: Customer Service Dialogue System Based on Transformer
In this project, we will guide you through implementing the Natural Language Understanding (NLU) module and the Natural Language Generation (NLG) module based on the Transformer model. A customer service dialogue system carries out automatic dialogue with users and helps them complete specific tasks, such as booking flights, hotels, or restaurants; it is the most widely deployed type of task-oriented dialogue system.
Algorithms Used in the Project:
Contrastive Learning (Contrastive Loss/Triplet Loss)
Tools Used in the Project:
Expected Results of the Project:
1. Using pre-trained Transformer models to implement the intent recognition module:
a) Proficiency in text classification/sequence labeling models based on CNN/LSTM
b) Proficiency in fine-tuning methods for pre-trained Transformer models (a minimal fine-tuning sketch follows this project description)
c) Mastery of model compression techniques
2. Using pre-trained Transformer models to implement retrieval-based dialogue models:
a) Mastery of developing a coarse-screening module based on Elasticsearch
b) Proficiency in text matching algorithms built on pre-trained Transformer models
Corresponding Weeks of the Project: Weeks 1-4
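As a rough illustration of fine-tuning a pre-trained Transformer for intent recognition, here is a minimal Hugging Face sketch; the checkpoint name, label count, and toy utterances are placeholders rather than project specifications.

```python
# Minimal intent-classification fine-tuning sketch (illustrative; checkpoint and labels are placeholders).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["帮我订一张去上海的机票", "附近有什么好吃的餐厅"]   # toy utterances: "book a flight", "find a restaurant"
labels = torch.tensor([0, 1])                                 # toy intent ids

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)                       # one forward pass returns loss and logits
outputs.loss.backward()
optimizer.step()
print(outputs.logits.argmax(dim=-1))                          # predicted intent per utterance
```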
Project 2: Text Generation Model Based on Transformer
In this project, we will guide you through implementing a generative NLG model based on the Transformer model. Although this type of NLG model is somewhat difficult to control, advances in pre-trained model technology now make very impressive generation quality achievable. In certain domains, text generation based on pre-trained Transformer models has already begun to create significant commercial value (for example, text summarization, code generation, and psychological counseling).
Algorithms Used in the Project:
Autoregressive Language Model
Multi-input Self-Attention
Tools Used in the Project:
Expected Results of the Project:
1. Implement the generative NLG model based on pre-trained Transformers
2. Understand common techniques in text generation models, including
a) MLE loss in autoregressive language models
b) Various decoding techniques in text generation models (see the decoding sketch after this project description)
c) Using data parallelism to train models across multiple GPUs
3. Have the ability to independently develop and optimize text generation modules based on pre-trained models.
Corresponding Weeks of the Project: Weeks 1, 2, 5, 6, 7.
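As a rough illustration of the decoding techniques mentioned above, here is a minimal sketch using the Hugging Face generate API; "gpt2" is a stand-in for whichever pre-trained checkpoint the project actually uses.

```python
# Minimal decoding sketch for a generative NLG model (illustrative; "gpt2" is a placeholder checkpoint).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The customer asked about", return_tensors="pt").input_ids

# Greedy decoding: always pick the most likely next token.
greedy = model.generate(input_ids, max_new_tokens=30)

# Top-k / nucleus (top-p) sampling: sample from a truncated next-token distribution.
sampled = model.generate(input_ids, max_new_tokens=30, do_sample=True, top_k=50, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```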
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

- Comprehensive technical knowledge explanation: Course content covers explanations of more than 10 models, including BERT, ViT, SegFormer, DETR, UP-DETR, TimeSformer, DeiT, Mobile-Transformer, Efficient Transformer, Swin Transformer, Point Transformer, MTTR, MMT, Uniformer, and more.
- Project practice, applying what you learn: Students use Transformer models to practice image recognition and object detection, the most widely used tasks in the CV field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities with major Internet companies such as ByteDance, Alibaba, Tencent, and Meituan, as well as AI unicorn companies like SenseTime and Megvii.
Theme: Knowledge Review and Explanation of Transformers/BERT in NLP
This lesson will guide everyone through a review of Transformer/BERT technology in the NLP field, deepening understanding of Transformer/BERT technical details and algorithmic advantages and preparing for further study of Transformer applications in other fields; a minimal self-attention sketch follows the outline below.
- Self-Attention Mechanism and Parallelization Principles of Transformers in NLP
- Advanced Principles of BERT Built on Transformers
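As a rough reminder of the mechanism this lesson reviews, here is a minimal scaled dot-product self-attention sketch in PyTorch, assuming a single head and omitting masking and the multi-head projections.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch).
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # linear projections of the inputs
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # similarity between every pair of positions
    weights = F.softmax(scores, dim=-1)                       # attention distribution per position
    return weights @ v                                         # weighted sum of value vectors

x = torch.randn(2, 4, 8)                                       # toy batch: 2 sequences of 4 tokens
w = [torch.randn(8, 8) for _ in range(3)]
print(self_attention(x, *w).shape)                             # torch.Size([2, 4, 8])
```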
Theme: Applications of Transformers in Image Classification and Semantic Segmentation: Exploring ViT and SegFormer Technologies
Building on the first lesson, this lesson studies how to transfer Transformer ideas to two classification-style problems in computer vision: image classification and semantic segmentation. Using two classic architectures, ViT and SegFormer, students will experience how Transformer ideas are applied in the visual domain.
- How to apply Transformer design ideas to image classification and semantic segmentation problems (a minimal patch-embedding sketch follows)
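As a rough illustration of the core ViT idea, turning an image into a sequence of tokens, here is a minimal patch-embedding sketch in PyTorch; the image size, patch size, and embedding width are commonly used defaults, not course requirements.

```python
# Minimal ViT-style patch embedding (illustrative sketch).
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and applying a linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images):                       # images: [batch, 3, H, W]
        x = self.proj(images)                        # [batch, embed_dim, H/P, W/P]
        return x.flatten(2).transpose(1, 2)          # [batch, num_patches, embed_dim]

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)                                  # torch.Size([1, 196, 768])
```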
Theme: Applications of Transformers in Object Detection: Exploring DETR and UP-DETR Technologies
This lesson will further study how to apply Transformer technology to object detection tasks, particularly how to design Transformer network structures that allow neural networks to learn both object classification and location information.
- Deep understanding of design ideas for applying Transformers to object detection.
Theme: Applications of Transformers in Video Understanding: Exploring TimeSformer Technology
This lesson will further study how to apply Transformer technology to video understanding applications, allowing Transformers to learn spatial correlations over time. Using TimeSformer as an example, students will deeply appreciate the design ideas involved.
- Important considerations when extending Transformer design ideas to spatio-temporal correlation modeling.
Theme: Efficient Transformer Design Discussion: Exploring DeiT and Mobile-Transformer Technologies
Efficient Transformers have always been a goal pursued by researchers. This lesson will discuss how to design efficient Transformer network structures. Using DeiT and Mobile-Transformer as examples, we will delve into considerations to keep in mind when designing efficient networks.
- Considerations when designing efficient Transformers, and perspectives on optimizing Transformer structures.
Theme: Learning Classic Transformer Network Structures: The Swin Transformer Model Family
This lesson will systematically study the Swin Transformer model and its variants. The goal is to help students further appreciate the design considerations when applying Transformers to visual tasks, including the clever ideas involved and how careful design enables parallel computation.
- The Swin Transformer Model Family
- Design ideas of Swin Transformer, and considerations when designing Transformers to solve new problems (a minimal window-partition sketch follows)
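As a rough illustration of the windowed attention that distinguishes Swin Transformer, here is a minimal window-partitioning sketch in PyTorch; the shapes are illustrative, and the shifted-window mechanism and relative position bias are omitted.

```python
# Minimal Swin-style window partitioning (illustrative sketch): attention is computed
# locally within each non-overlapping window instead of over the whole feature map.
import torch

def window_partition(x, window_size):
    # x: [batch, H, W, channels], with H and W divisible by window_size.
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
    # Gather each window's tokens into one row: [num_windows * batch, window_size**2, channels].
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, c)

windows = window_partition(torch.randn(1, 56, 56, 96), window_size=7)
print(windows.shape)   # torch.Size([64, 49, 96])
```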
Theme: Transformers in Point Cloud
This lesson will share applications of Transformers in 3D Point Clouds. Based on the characteristics of 3D Point Cloud data, we will explore how to design suitable Transformer networks to handle massive, unstructured point cloud data, as well as how to further modify Transformer structures for segmentation and clustering tasks.
- Discuss important considerations when designing Transformers to handle point cloud data.
Theme: Transformer Design in Multimodal Applications
This lesson will cover Transformer design issues in multi-modality. Transformers have been applied successfully in different fields, and recent work has explored how to design suitable Transformer structures to handle multimodal data. We will use MTTR, MMT, Uniformer, and related Transformers as examples for discussion.
- Discuss important considerations when designing Transformers to handle multi-modal data.
- How to design suitable Transformers for handling multi-modal related issues: MTTR, MMT, Uniformer.
Project One: Image Recognition System Based on ViT Model
Project Description: As a classic application of Transformers in the visual field, the ViT model first applied the Transformer idea from the NLP field to the image domain, providing great inspiration for subsequent visual Transformer designs. Returning to this starting point, we will use ViT-based image classification as an example to begin applying Transformer ideas to the visual domain.
Algorithms Used in the Project:
Multi-label/multi-class classification
Tools Used in the Project:
Expected Results of the Project:
- First, students implement the ViT model themselves and test results on datasets, then compare with the official implementation; if there are significant differences, students need to investigate the reasons.
- Master how to apply the concepts of tokens and self-attention from Transformers to the image domain, and be encouraged to apply Transformer ideas to other related problems.
- Master the training methods of ViT, walking through the entire pipeline from data preparation, model training, and parameter tuning to model testing and metric calculation.
Corresponding Weeks of the Project: Weeks 1-3.
Project Two: Image Classification and Object Detection Tasks Based on the Swin Transformer Model
Project Description: In the previous project, we studied the ViT model, a successful visual Transformer that applies Transformers to visual classification problems. However, ViT's design is relatively simple and has some shortcomings: image-specific issues such as scale variation are not handled well, and efficiency is not considered. In this project, we will study a more advanced visual Transformer: the Swin Transformer model.
Algorithms Used in the Project:
Forward-Backward Propagation
Tools Used in the Project:
Expected Results of the Project:
- Students implement the Swin Transformer code (with reference to the official implementation) and optimize their implementation against it; if there are significant differences in experimental results, students need to investigate the reasons.
- Experience the ideas behind using Swin Transformer for object detection.
- Master, from a coding perspective, how to optimize the implementation of Swin Transformer's self-attention mechanism from local to global.
- Students will master how to apply Transformer ideas to practical problems in their work or studies.
Corresponding Weeks of the Project: Weeks 6-7.
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

Federated Learning and Privacy Computing
- Comprehensive technical knowledge explanation: The course content covers horizontal federated learning, vertical federated learning, and federated transfer learning architectures, including explanations of federated learning applications in vision, healthcare, finance, privacy computing, and government services.
- Project practice, applying what you learn: Students use federated learning frameworks and algorithms to practice privacy computing and risk detection tasks in the financial field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities for federated learning engineer positions at major Internet companies such as JD.com and Baidu.
Theme: Introduction to Federated Learning and Privacy Computing
Explanation of the definition of federated learning, classification of federated learning, research progress in federated learning, open-source platforms for federated learning, privacy protection technologies used in federated learning, and basic knowledge of privacy computing.
- Federated Learning System Architecture
- Classification of Federated Learning
- Common Open-Source Platforms for Federated Learning
- Privacy Protection Technologies in Federated Learning
- Definition and Classification of Privacy Computing
- Secure Multi-Party Computation
Theme: Distributed Machine Learning
Explanation of the definition of distributed machine learning, distributed machine learning algorithms, and the evolution from distributed machine learning to federated learning.
- Definition of Distributed Machine Learning
- Distributed Machine Learning Platforms
- Large-Scale Machine Learning
- Privacy-Preserving Machine Learning Solutions
- Distributed Machine Learning Algorithms
Theme: Horizontal Federated Learning
Explanation of the definition of horizontal federated learning, horizontal federated learning architecture, horizontal federated learning algorithms, and optimizations in horizontal federated learning.
- Definition of Horizontal Federated Learning
- Architecture of Horizontal Federated Learning
- Federated Averaging Algorithm (a minimal aggregation sketch follows this outline)
- Horizontal Federated Learning Algorithms
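As a rough illustration of the Federated Averaging step, here is a minimal sketch assuming each client sends its model parameters as a dictionary along with its local sample count; secure aggregation and communication details are omitted.

```python
# Minimal FedAvg aggregation sketch (illustrative; client_states are parameter dicts,
# client_sizes are the corresponding local sample counts).
def federated_average(client_states, client_sizes):
    total = float(sum(client_sizes))
    return {
        key: sum(state[key] * (n / total) for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Toy usage with two "clients" holding a single scalar parameter each.
print(federated_average([{"w": 1.0}, {"w": 3.0}], client_sizes=[10, 30]))  # {'w': 2.5}
```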
Theme: Building Financial Risk Control Models Using Privacy Computing
Explanation of the process and analysis of building financial risk control models using privacy computing.
- Process of Building Horizontal Federated Learning
- Analysis of Horizontal Federated Learning Results
Theme: Vertical Federated Learning
Explanation of the definition of vertical federated learning, vertical federated learning architecture, vertical federated learning algorithms, and optimizations in vertical federated learning.
- Definition of Vertical Federated Learning
- Architecture of Vertical Federated Learning
- Vertical Federated Linear Regression
- Vertical Federated Decision Trees
Theme: Federated Transfer Learning
Explanation of the definition of federated transfer learning, federated transfer learning architecture, federated transfer learning algorithms, and optimizations in federated transfer learning.
- Definition of Federated Transfer Learning
- Federated Transfer Learning Framework
- Training and Prediction in Federated Transfer Learning
- Homomorphic Encryption in Federated Transfer Learning
- Secret Sharing in Federated Transfer Learning
Theme: Applications and Cutting-Edge Research of Privacy Computing and Federated Learning in Various Fields
Explanation of application cases, research content, and challenges faced by privacy computing and federated learning in various fields. For example, federated learning object detection networks in computer vision; differential privacy data sharing in government; federated learning user behavior prediction in smart IoT; federated learning health analysis, homomorphic encryption gene analysis in healthcare; federated learning anti-fraud, privacy-preserving joint risk control in finance.
- Federated Learning Application Cases (Federated Learning Object Detection Networks in Computer Vision, Differential Privacy Data Sharing in Government, Federated Learning User Behavior Prediction in Smart IoT, Federated Learning Health Analysis, etc.)
- Research and Challenges Faced by Federated Learning
- Application Cases of Privacy Computing
- Research and Challenges Faced by Privacy Computing
Theme: Building Financial Risk Monitoring Models Using Federated Learning
Explanation of the process and result analysis of building financial risk monitoring models using federated learning.
- Process of Building Financial Risk Monitoring Models
- Analysis of Financial Risk Monitoring Model Results
Project One: Practical Application of Financial Privacy Computing
Project Description: Explanation of concepts related to federated learning and privacy computing, the development of federated learning and privacy computing, basic technologies of federated learning (privacy protection technologies and distributed learning technologies), definition and architecture of horizontal federated learning, and detailed explanation of horizontal federated learning algorithms, ultimately using privacy computing to build financial risk control models.
Algorithms Used in the Project:
Horizontal Federated Learning
Tools Used in the Project:
FATE/examples/data, open-source datasets
Expected Results of the Project:
Familiarity with knowledge and basic concepts related to federated learning and privacy computing, familiarity with the definition and architecture of horizontal federated learning, mastery of horizontal federated learning algorithms, and implementation of financial risk monitoring based on privacy computing.
Corresponding Weeks of the Project: Weeks 1-4
Project Two: Practical Application of Federated Learning in Financial Risk Monitoring
Project Description: Explanation of the definition, architecture, and algorithms of vertical federated learning; explanation of the definition, architecture, and algorithms of federated transfer learning; explanation of the current application status and typical cases of federated learning in the industry; ultimately using the open-source framework FATE to implement the construction of a financial risk monitoring model.
Algorithms Used in the Project:
Tools Used in the Project:
Python, open-source framework FATE
Expected Results of the Project:
Familiarity with the definitions and architectures of vertical federated learning and federated transfer learning, mastery of vertical federated learning and federated transfer learning algorithms, and implementation of a financial risk monitoring model based on the open-source framework FATE.
Corresponding Weeks of the Project: Weeks 5-8
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

- Good foundation in programming and deep learning, aiming to enter the AI industry
- Strong interest in Transformers or Federated Learning, wishing to engage in practical work
- Need to apply machine learning, deep learning, and other technologies in their work
- Aiming to enter the AI algorithm industry as an AI algorithm engineer
- Wishing to broaden future career paths by mastering advanced AI knowledge
Postdoctoral researcher in the Department of Computer Science and Artificial Intelligence at Tsinghua University
Visiting Scholar at Lawrence Berkeley National Laboratory in the USA
Mainly engaged in pioneering research and commercialization in natural language processing and dialogue fields
Has published over ten high-level papers in top international conferences and journals such as AAAI, NeurIPS, ACM, EMNLP
Ph.D. in Computer Science from the University of Oxford
Previously worked as an algorithm scientist at companies like BAT
Engaged in research related to computer vision, deep learning, and speech signal processing
Has published several papers in top international conferences such as CVPR, ICML, AAAI, ICRA
Main Lecturer on Federated Learning
Ph.D. in Signal and Information Processing from Beihang University
Associate Senior Engineer at the research institute of a state-owned financial enterprise
Has published over ten papers in journals at various levels, including SCI Q1, EI, and core journals, and holds five granted invention patents and three software copyrights
Participated in national 863 Program projects, police application innovation projects, and other research work, and has led multiple research projects at the key laboratory for big data decision-making for green development, the Beihang University Jinhua BeiDou Application Research Institute, and a Tsinghua joint research institute
Course Development Consultant
Head of Recommendation Systems at Microsoft (Headquarters) in the USA
Senior Engineer at Amazon (Headquarters) in the USA
Ph.D. from the New Jersey Institute of Technology, USA
14 years of research and project experience in the fields of artificial intelligence, digital image processing, and recommendation systems
Has published over 20 papers in international conferences related to AI
Ph.D. from the University of Southern California, USA
Former Chief Data Scientist at the unicorn company Kingsoft Group, Senior Engineer at Amazon and Goldman Sachs
Pioneer in using knowledge graphs for big data anti-fraud in the financial industry
Has published over 15 papers in international conferences such as AAAI, KDD, AISTATS, and CHI
Basic knowledge explanation
Interpretation of cutting-edge papers
Practical applications of the knowledge content
Project practice of the knowledge
Extension of knowledge in this direction and explanation of future trends
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries
