The Transformer, as an attention-based encoder-decoder architecture, has not only revolutionized the field of Natural Language Processing (NLP) but has also made groundbreaking contributions in the field of Computer Vision (CV). Compared with Convolutional Neural Networks (CNNs), the Vision Transformer (ViT) relies on its strong modeling capability to achieve outstanding performance on benchmarks such as ImageNet, COCO, and ADE20K.
As Atlas Wang, a computer scientist at the University of Texas at Austin, said: “We have ample reason to attempt to use Transformers across the entire spectrum of AI tasks.”
Therefore, whether you are a researcher in academia or a practitioner in industry, it is essential to understand Transformer technology in depth and to keep up with cutting-edge Transformer research in order to solidify your technical foundation.
AI is an easy field to enter but a difficult one to master, which is a key reason why top AI talent remains in short supply.
In the workplace:
Can you flexibly propose new models according to actual scenarios?
Or propose modifications to existing models?
These are core competencies and the threshold one must cross to become top-tier talent. It is challenging, but once you cross it, you will find yourself among the TOP 5% in the market.
So we designed a course with one purpose: to give you the opportunity to become part of the TOP 5% in the market. In the course, we will explain the principles, implementation methods, and application techniques of Transformers in the fields of NLP and CV in a comprehensive and detailed manner, as well as the theoretical knowledge system of Federated Learning and its applications in privacy computing and finance. During the learning process, you can expand your thinking through six practical projects, integrating knowledge and improving your problem-solving skills.
- Comprehensive content explanation: Covers the hottest Transformer and Federated Learning topics in current applications and research, including over 70 Transformer models and three major types of Federated Learning with application cases.
- In-depth technical analysis: A deep dive into the model and framework details of Transformers and Federated Learning, covering the most cutting-edge model principles and techniques.
- Six practical projects: Each module includes a project, covering dialogue systems, text generation, image recognition, object detection, privacy computing, and financial risk control, enhancing students' theoretical and practical skills.
- Expert instructor team: Each module is taught by scientists or researchers with years of frontline experience in their respective fields, along with experienced teaching assistants, dedicated to providing the highest quality learning experience.
▶ A comprehensive mastery of knowledge in the fields of Transformer and Federated Learning, flexibly applied in your work
▶ Understanding of the implementation methods of Transformer models and Federated Learning frameworks, and proficiency in their key techniques and methods
▶ A deep understanding of cutting-edge Transformer and Federated Learning technologies, broadening your technical vision for work and research
▶ A comprehensive and systematic understanding of a field in a short time, greatly saving learning time
▶ Meeting a group of like-minded individuals for mutual exchange and learning
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

Below is a detailed introduction to each part of the course; interested readers can inquire for more details.
- Comprehensive technical explanation: Course content covers over 60 models, including ELMo, GPT-3, Codex, AlphaCode, UniLM v2, BERT, RoBERTa, XLM, SpanBERT, and more.
- Project practice, applying what you learn: Students use Transformer models to practice dialogue systems and text generation, the most widely used tasks in the NLP field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities with major Internet companies such as ByteDance, Alibaba, Tencent, and Meituan, as well as AI unicorn companies like SenseTime and Megvii.
Theme: Transformer in Autoregressive Language Models
This lesson will review an important concept in natural language processing, the language model, and introduce Transformer-based autoregressive language models.
- Tokenizers in NLP Models (WordPiece, BPE); a minimal BPE sketch follows
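As a rough illustration of how BPE-style tokenizers learn their merge rules, here is a minimal sketch in plain Python; the toy word counts and the number of merges are invented for illustration and are not part of the course materials.

```python
# Minimal sketch of BPE merge learning (illustrative only, not the course code).
from collections import Counter

def learn_bpe_merges(words, num_merges=10):
    """words: dict mapping a word (as a tuple of symbols) to its corpus frequency."""
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: "low" appears 5 times, "lower" 2 times.
print(learn_bpe_merges({("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}, num_merges=3))
```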
Theme: Knowledge Distillation Optimization, Low-Rank Decomposition Optimization
This lesson will explain knowledge distillation for neural networks and low-rank decomposition for accelerating computation; a short illustrative sketch of both techniques follows the outline below.
- Introduction to Knowledge Distillation Methods
- Principles and Steps of Knowledge Distillation
- Demonstration of Knowledge Distillation Training for Compressing Classification Networks
- Principles of Low-Rank Decomposition
- Applications of Low-Rank Decomposition in Neural Network Inference
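As a rough illustration of the two techniques covered in this lesson, here is a minimal PyTorch sketch under assumed tensor shapes; it is not the course's implementation. Distillation blends a softened-teacher KL term with the usual hard-label loss, and low-rank decomposition factorizes a weight matrix into two thinner matrices.

```python
# Minimal sketches of knowledge distillation and low-rank decomposition (illustrative).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Match the teacher's temperature-softened outputs plus the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale so gradient magnitudes stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def low_rank_factorize(weight, rank):
    """Approximate W (out x in) as A @ B so a dense layer becomes two smaller, faster layers."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]                    # (out, rank)
    B = Vh[:rank, :]                              # (rank, in)
    return A, B
```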
Theme: Transformer Structures in Permutation Language Models
This lesson will introduce the task of Permutation Language Models and modifications to Transformer models based on this task.
- Relative Positional Embedding
- Permutation Language Model
Theme: Transformer Models Combined with Contrastive Learning
This lesson will introduce the contrastive learning framework and discuss how to design positive and negative examples within it using Transformer models; a minimal contrastive-loss sketch follows the outline below.
- Common Loss Functions in Contrastive Learning
- Word-Level Contrast: ELECTRA
- Sentence-Level Contrast: ALBERT, StructBERT
- Other Contrastive Learning Structures
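As a rough illustration of a common contrastive objective, here is a minimal InfoNCE-style sketch in PyTorch, assuming paired embedding batches with in-batch negatives; the exact losses used by ELECTRA, ALBERT, and StructBERT differ.

```python
# Minimal InfoNCE-style contrastive loss (illustrative sketch).
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.1):
    """anchor/positive: [batch, dim]; row i of each is a positive pair, other rows are negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature              # pairwise cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)                   # the diagonal pair should score highest

print(info_nce_loss(torch.randn(8, 128), torch.randn(8, 128)))
```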
Theme: Applications of Transformer Models in Knowledge Modeling
This lesson will mainly introduce how to apply Transformer models in knowledge modeling, including how to inject knowledge into the models and how to better utilize the knowledge within the models.
Theme: Transformers in Multilingual Applications and Transformers Suitable for Chinese
This lesson will introduce how to improve Transformers in multilingual applications and how to better apply Transformer models in Chinese.
- Multilingual Understanding: mBERT, Unicoder, XLM-R, MultiFiT
- Multilingual Generation: MASS, mBART, XNLG
- Transformers for Processing Chinese: BERT-wwm-Chinese, NEZHA, ZEN
- Transformers for Other Languages: BERTje, CamemBERT, FlauBERT, RobBERT
Theme: Applications of Transformers in Dialogue and Summarization Tasks
This lesson will introduce the applications of Transformers in dialogue tasks and in text summarization tasks.
- Transformer Models in Dialogue: TransferTransfo, DialoGPT, BlenderBot, Meena, PLATO, LaMDA, GALAXY
- Transformer Models in Text Summarization: BART, Pegasus
Theme: Advanced Transformers: Faster, Larger, or Smaller
This lesson will introduce practical techniques for Transformer structures: how to make attention faster, how to scale up the parameter count of Transformers, and how to shrink it; a minimal multi-query attention sketch follows the outline below.
- Faster: Multi-Query Attention, Sparse Attention, Performer, Fastformer
- Larger: Mixture of Experts (MoE)
- Smaller: CompressingBERT, Q-BERT, ALBERT, DistilBERT, TinyBERT, MiniLM, BERT-of-Theseus
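As a rough illustration of the "faster" direction, here is a minimal multi-query attention sketch in PyTorch: all query heads share a single key/value head, which shrinks the key/value projections and cache. The dimensions are placeholders, and real implementations add masking, dropout, and KV caching.

```python
# Minimal multi-query attention sketch (illustrative; no masking, dropout, or caching).
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)      # separate queries for every head
        self.k_proj = nn.Linear(d_model, self.d_head)  # one shared key head
        self.v_proj = nn.Linear(d_model, self.d_head)  # one shared value head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                               # x: [batch, seq_len, d_model]
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)   # [b, h, t, d]
        k = self.k_proj(x).unsqueeze(1)                                             # [b, 1, t, d]
        v = self.v_proj(x).unsqueeze(1)                                             # [b, 1, t, d]
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)  # [b, h, t, t]
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)                          # [b, t, d_model]
        return self.out_proj(out)

print(MultiQueryAttention()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```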
Project 1: Customer Service Dialogue System Based on Transformer
In this project, we will guide you through implementing the Natural Language Understanding (NLU) module and the Natural Language Generation (NLG) module based on the Transformer model. A customer service dialogue system carries out automatic dialogue with users and helps them complete specific tasks, such as booking flights, hotels, or restaurants; it is the most widely deployed type of task-oriented dialogue system.
Algorithms Used in the Project:
Contrastive Learning (Contrastive Loss/Triplet Loss)
Tools Used in the Project:
Expected Results of the Project:
1. Using pre-trained Transformer models to implement the intent recognition module:
a) Proficiency in text classification/sequence labeling models based on CNN/LSTM
b) Proficiency in fine-tuning methods for pre-trained Transformer models (a minimal fine-tuning sketch follows this project description)
c) Mastery of model compression techniques
2. Using pre-trained Transformer models to implement retrieval-based dialogue models:
a) Mastery of developing a coarse-screening module based on Elasticsearch
b) Proficiency in text matching algorithms built on pre-trained Transformer models
Corresponding Weeks of the Project: Weeks 1-4
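As a rough illustration of fine-tuning a pre-trained Transformer for intent recognition, here is a minimal Hugging Face sketch; the checkpoint name, label count, and toy utterances are placeholders rather than project specifications.

```python
# Minimal intent-classification fine-tuning sketch (illustrative; checkpoint and labels are placeholders).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["帮我订一张去上海的机票", "附近有什么好吃的餐厅"]   # toy utterances: "book a flight", "find a restaurant"
labels = torch.tensor([0, 1])                                 # toy intent ids

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)                       # one forward pass returns loss and logits
outputs.loss.backward()
optimizer.step()
print(outputs.logits.argmax(dim=-1))                          # predicted intent per utterance
```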
Project 2: Text Generation Model Based on Transformer
In this project, we will guide you through implementing a generative NLG model based on the Transformer model. Although this type of NLG model is somewhat difficult to control, advances in pre-trained model technology now make very impressive generation quality achievable. In certain domains, text generation based on pre-trained Transformer models has already begun to create significant commercial value (for example, text summarization, code generation, and psychological counseling).
Algorithms Used in the Project:
Autoregressive Language Model
Multi-input Self-Attention
Tools Used in the Project:
Expected Results of the Project:
1. Implement the generative NLG model based on pre-trained Transformers
2. Understand common techniques in text generation models, including
a) MLE loss in autoregressive language models
b) Various decoding techniques in text generation models (see the decoding sketch after this project description)
c) Using data parallelism to train models across multiple GPUs
3. Have the ability to independently develop and optimize text generation modules based on pre-trained models.
Corresponding Weeks of the Project: Weeks 1, 2, 5, 6, 7.
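As a rough illustration of the decoding techniques mentioned above, here is a minimal sketch using the Hugging Face generate API; "gpt2" is a stand-in for whichever pre-trained checkpoint the project actually uses.

```python
# Minimal decoding sketch for a generative NLG model (illustrative; "gpt2" is a placeholder checkpoint).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The customer asked about", return_tensors="pt").input_ids

# Greedy decoding: always pick the most likely next token.
greedy = model.generate(input_ids, max_new_tokens=30)

# Top-k / nucleus (top-p) sampling: sample from a truncated next-token distribution.
sampled = model.generate(input_ids, max_new_tokens=30, do_sample=True, top_k=50, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```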
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

- Comprehensive technical knowledge explanation: Course content covers explanations of more than 10 models, including BERT, ViT, SegFormer, DETR, UP-DETR, TimeSformer, DeiT, Mobile-Transformer, Efficient Transformer, Swin Transformer, Point Transformer, MTTR, MMT, Uniformer, and more.
- Project practice, applying what you learn: Students use Transformer models to practice image recognition and object detection, the most widely used tasks in the CV field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities with major Internet companies such as ByteDance, Alibaba, Tencent, and Meituan, as well as AI unicorn companies like SenseTime and Megvii.
Theme: Knowledge Review and Explanation of Transformers/BERT in NLP
This lesson will guide everyone through a review of Transformer/BERT technology in the NLP field, deepening understanding of Transformer/BERT technical details and algorithmic advantages and preparing for further study of Transformer applications in other fields; a minimal self-attention sketch follows the outline below.
- Self-Attention Mechanism and Parallelization Principles of Transformers in NLP
- Advanced Principles of BERT Built on Transformers
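As a rough reminder of the mechanism this lesson reviews, here is a minimal scaled dot-product self-attention sketch in PyTorch, assuming a single head and omitting masking and the multi-head projections.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch).
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # linear projections of the inputs
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # similarity between every pair of positions
    weights = F.softmax(scores, dim=-1)                       # attention distribution per position
    return weights @ v                                         # weighted sum of value vectors

x = torch.randn(2, 4, 8)                                       # toy batch: 2 sequences of 4 tokens
w = [torch.randn(8, 8) for _ in range(3)]
print(self_attention(x, *w).shape)                             # torch.Size([2, 4, 8])
```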
Theme: Applications of Transformers in Image Classification and Semantic Segmentation: Exploring ViT and SegFormer Technologies
Building on the first lesson, this lesson studies how to transfer Transformer ideas to two classification-style problems in computer vision: image classification and semantic segmentation. Using two classic architectures, ViT and SegFormer, students will experience how Transformer ideas are applied in the visual domain.
- How to apply Transformer design ideas to image classification and semantic segmentation problems (a minimal patch-embedding sketch follows)
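As a rough illustration of the core ViT idea, turning an image into a sequence of tokens, here is a minimal patch-embedding sketch in PyTorch; the image size, patch size, and embedding width are commonly used defaults, not course requirements.

```python
# Minimal ViT-style patch embedding (illustrative sketch).
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and applying a linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images):                       # images: [batch, 3, H, W]
        x = self.proj(images)                        # [batch, embed_dim, H/P, W/P]
        return x.flatten(2).transpose(1, 2)          # [batch, num_patches, embed_dim]

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)                                  # torch.Size([1, 196, 768])
```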
Theme: Applications of Transformers in Object Detection: Exploring DETR and UP-DETR Technologies
This lesson will further study how to apply Transformer technology to object detection tasks, particularly how to design Transformer network structures that allow neural networks to learn both object classification and location information.
- Deep understanding of design ideas for applying Transformers to object detection.
Theme: Applications of Transformers in Video Understanding: Exploring TimeSformer Technology
This lesson will further study how to apply Transformer technology to video understanding applications, allowing Transformers to learn spatial correlations over time. Using TimeSformer as an example, students will deeply appreciate the design ideas involved.
- Important considerations when extending Transformer design ideas to spatio-temporal correlation modeling.
Theme: Efficient Transformer Design Discussion: Exploring DeiT and Mobile-Transformer Technologies
Efficient Transformers have always been a goal pursued by researchers. This lesson will discuss how to design efficient Transformer network structures. Using DeiT and Mobile-Transformer as examples, we will delve into considerations to keep in mind when designing efficient networks.
- Considerations when designing efficient Transformers, and perspectives on optimizing Transformer structures.
Theme: Learning Classic Transformer Network Structures: The Swin Transformer Model Family
This lesson will systematically study the Swin Transformer model and its variants. The goal is to help students further appreciate the design considerations when applying Transformers to visual tasks, including the clever ideas involved and how careful design enables parallel computation.
- The Swin Transformer Model Family
- Design ideas of Swin Transformer, and considerations when designing Transformers to solve new problems (a minimal window-partition sketch follows)
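As a rough illustration of the windowed attention that distinguishes Swin Transformer, here is a minimal window-partitioning sketch in PyTorch; the shapes are illustrative, and the shifted-window mechanism and relative position bias are omitted.

```python
# Minimal Swin-style window partitioning (illustrative sketch): attention is computed
# locally within each non-overlapping window instead of over the whole feature map.
import torch

def window_partition(x, window_size):
    # x: [batch, H, W, channels], with H and W divisible by window_size.
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
    # Gather each window's tokens into one row: [num_windows * batch, window_size**2, channels].
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, c)

windows = window_partition(torch.randn(1, 56, 56, 96), window_size=7)
print(windows.shape)   # torch.Size([64, 49, 96])
```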
Theme: Transformers in Point Cloud
This lesson will share applications of Transformers in 3D Point Clouds. Based on the characteristics of 3D Point Cloud data, we will explore how to design suitable Transformer networks to handle massive, unstructured point cloud data, as well as how to further modify Transformer structures for segmentation and clustering tasks.
- Discuss important considerations when designing Transformers to handle point cloud data.
Theme: Transformer Design in Multimodal Applications
This lesson will cover Transformer design issues in multi-modality. Transformers have been applied successfully in different fields, and recent work has explored how to design suitable Transformer structures to handle multimodal data. We will use MTTR, MMT, Uniformer, and related Transformers as examples for discussion.
- Discuss important considerations when designing Transformers to handle multi-modal data.
- How to design suitable Transformers for handling multi-modal related issues: MTTR, MMT, Uniformer.
Project One: Image Recognition System Based on ViT Model
Project Description: As a classic application of Transformers in the visual field, the ViT model first applied the Transformer idea from the NLP field to the image domain, providing great inspiration for subsequent visual Transformer designs. Returning to this starting point, we will use ViT-based image classification as an example to begin applying Transformer ideas to the visual domain.
Algorithms Used in the Project:
Multi-label/multi-class classification
Tools Used in the Project:
Expected Results of the Project:
- First, students implement the ViT model themselves and test results on datasets, then compare with the official implementation; if there are significant differences, students need to investigate the reasons.
- Master how to apply the concepts of tokens and self-attention from Transformers to the image domain, and be encouraged to apply Transformer ideas to other related problems.
- Master the training methods of ViT, walking through the entire pipeline from data preparation, model training, and parameter tuning to model testing and metric calculation.
Corresponding Weeks of the Project: Weeks 1-3.
Project Two: Image Classification and Object Detection Tasks Based on the Swin Transformer Model
Project Description: In the previous project, we studied the ViT model, a successful visual Transformer that applies Transformers to visual classification problems. However, ViT's design is relatively simple and has some shortcomings: image-specific issues such as scale variation are not handled well, and efficiency is not considered. In this project, we will study a more advanced visual Transformer: the Swin Transformer model.
Algorithms Used in the Project:
Forward-Backward Propagation
Tools Used in the Project:
Expected Results of the Project:
- Students implement the Swin Transformer code (with reference to the official implementation) and optimize their implementation against it; if there are significant differences in experimental results, students need to investigate the reasons.
- Experience the ideas behind using Swin Transformer for object detection.
- Master, from a coding perspective, how to optimize the implementation of Swin Transformer's self-attention mechanism from local to global.
- Students will master how to apply Transformer ideas to practical problems in their work or studies.
Corresponding Weeks of the Project: Weeks 6-7.
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

Federated Learning and Privacy Computing
- Comprehensive technical knowledge explanation: The course content covers horizontal federated learning, vertical federated learning, and federated transfer learning architectures, including explanations of federated learning applications in vision, healthcare, finance, privacy computing, and government services.
- Project practice, applying what you learn: Students use federated learning frameworks and algorithms to practice privacy computing and risk detection tasks in the financial field.
- Professionally crafted course content, cutting-edge and in-depth: The course content has been meticulously designed over hundreds of hours to ensure well-paced content and project milestones, so that learning outcomes are truly achieved.
- Job-oriented, clear goals: Outstanding students can receive referral interview opportunities for federated learning engineer positions at major Internet companies such as JD.com and Baidu.
Theme: Introduction to Federated Learning and Privacy Computing
Explanation of the definition of federated learning, classification of federated learning, research progress in federated learning, open-source platforms for federated learning, privacy protection technologies used in federated learning, and basic knowledge of privacy computing.
- Federated Learning System Architecture
- Classification of Federated Learning
- Common Open-Source Platforms for Federated Learning
- Privacy Protection Technologies in Federated Learning
- Definition and Classification of Privacy Computing
- Secure Multi-Party Computation
Theme: Distributed Machine Learning
Explanation of the definition of distributed machine learning, distributed machine learning algorithms, and the evolution from distributed machine learning to federated learning.
- Definition of Distributed Machine Learning
- Distributed Machine Learning Platforms
- Large-Scale Machine Learning
- Privacy-Preserving Machine Learning Solutions
- Distributed Machine Learning Algorithms
Theme: Horizontal Federated Learning
Explanation of the definition of horizontal federated learning, horizontal federated learning architecture, horizontal federated learning algorithms, and optimizations in horizontal federated learning.
- Definition of Horizontal Federated Learning
- Architecture of Horizontal Federated Learning
- Federated Averaging Algorithm (a minimal aggregation sketch follows this outline)
- Horizontal Federated Learning Algorithms
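As a rough illustration of the Federated Averaging step, here is a minimal sketch assuming each client sends its model parameters as a dictionary along with its local sample count; secure aggregation and communication details are omitted.

```python
# Minimal FedAvg aggregation sketch (illustrative; client_states are parameter dicts,
# client_sizes are the corresponding local sample counts).
def federated_average(client_states, client_sizes):
    total = float(sum(client_sizes))
    return {
        key: sum(state[key] * (n / total) for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# Toy usage with two "clients" holding a single scalar parameter each.
print(federated_average([{"w": 1.0}, {"w": 3.0}], client_sizes=[10, 30]))  # {'w': 2.5}
```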
Theme: Building Financial Risk Control Models Using Privacy Computing
Explanation of the process and analysis of building financial risk control models using privacy computing.
- Process of Building Horizontal Federated Learning
- Analysis of Horizontal Federated Learning Results
Theme: Vertical Federated Learning
Explanation of the definition of vertical federated learning, vertical federated learning architecture, vertical federated learning algorithms, and optimizations in vertical federated learning.
- Definition of Vertical Federated Learning
- Architecture of Vertical Federated Learning
- Vertical Federated Linear Regression
- Vertical Federated Decision Trees
Theme: Federated Transfer Learning
Explanation of the definition of federated transfer learning, federated transfer learning architecture, federated transfer learning algorithms, and optimizations in federated transfer learning.
- Definition of Federated Transfer Learning
- Federated Transfer Learning Framework
- Training and Prediction in Federated Transfer Learning
- Homomorphic Encryption in Federated Transfer Learning
- Secret Sharing in Federated Transfer Learning
Theme: Applications and Cutting-Edge Research of Privacy Computing and Federated Learning in Various Fields
Explanation of application cases, research content, and challenges faced by privacy computing and federated learning in various fields. For example, federated learning object detection networks in computer vision; differential privacy data sharing in government; federated learning user behavior prediction in smart IoT; federated learning health analysis, homomorphic encryption gene analysis in healthcare; federated learning anti-fraud, privacy-preserving joint risk control in finance.
- Federated Learning Application Cases (Federated Learning Object Detection Networks in Computer Vision, Differential Privacy Data Sharing in Government, Federated Learning User Behavior Prediction in Smart IoT, Federated Learning Health Analysis, etc.)
- Research and Challenges Faced by Federated Learning
- Application Cases of Privacy Computing
- Research and Challenges Faced by Privacy Computing
Theme: Building Financial Risk Monitoring Models Using Federated Learning
Explanation of the process and result analysis of building financial risk monitoring models using federated learning.
- Process of Building Financial Risk Monitoring Models
- Analysis of Financial Risk Monitoring Model Results
Project One: Practical Application of Financial Privacy Computing
Project Description: Explanation of concepts related to federated learning and privacy computing, the development of federated learning and privacy computing, basic technologies of federated learning (privacy protection technologies and distributed learning technologies), definition and architecture of horizontal federated learning, and detailed explanation of horizontal federated learning algorithms, ultimately using privacy computing to build financial risk control models.
Algorithms Used in the Project:
Horizontal Federated Learning
Tools Used in the Project:
FATE/examples/data, open-source datasets
Expected Results of the Project:
Familiarity with knowledge and basic concepts related to federated learning and privacy computing, familiarity with the definition and architecture of horizontal federated learning, mastery of horizontal federated learning algorithms, and implementation of financial risk monitoring based on privacy computing.
Corresponding Weeks of the Project: Weeks 1-4
Project Two: Practical Application of Federated Learning in Financial Risk Monitoring
Project Description: Explanation of the definition, architecture, and algorithms of vertical federated learning; explanation of the definition, architecture, and algorithms of federated transfer learning; explanation of the current application status and typical cases of federated learning in the industry; ultimately using the open-source framework FATE to implement the construction of a financial risk monitoring model.
Algorithms Used in the Project:
Tools Used in the Project:
Python, open-source framework FATE
Expected Results of the Project:
Familiarity with the definitions and architectures of vertical federated learning and federated transfer learning, mastery of vertical federated learning and federated transfer learning algorithms, and implementation of a financial risk monitoring model based on the open-source framework FATE.
Corresponding Weeks of the Project: Weeks 5-8
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries

- Good foundation in programming and deep learning, aiming to enter the AI industry
- Strong interest in Transformers or Federated Learning, wishing to engage in practical work
- Need to apply machine learning, deep learning, and other technologies in their work
- Aiming to enter the AI algorithm industry as an AI algorithm engineer
- Wishing to broaden future career paths by mastering advanced AI knowledge
Postdoctoral researcher in the Department of Computer Science and Artificial Intelligence at Tsinghua University
Visiting Scholar at Lawrence Berkeley National Laboratory in the USA
Mainly engaged in pioneering research and commercialization in natural language processing and dialogue fields
Has published over ten high-level papers in top international conferences and journals such as AAAI, NeurIPS, ACM, EMNLP
Ph.D. in Computer Science from the University of Oxford
Previously worked as an algorithm scientist at companies like BAT
Engaged in research related to computer vision, deep learning, and speech signal processing
Has published several papers in top international conferences such as CVPR, ICML, AAAI, ICRA
Main Lecturer on Federated Learning
Ph.D. in Signal and Information Processing from Beihang University
Associate Senior Engineer at the research institute of a state-owned financial enterprise
Has published over ten papers in journals at various levels, including SCI Q1, EI, and core journals, and holds five granted invention patents and three software copyrights
Participated in national 863 Program projects, police application innovation projects, and other research work, and has led multiple research projects at the key laboratory for big data decision-making for green development, the Beihang University Jinhua BeiDou Application Research Institute, and a Tsinghua joint research institute
Course Development Consultant
Head of Recommendation Systems at Microsoft (Headquarters) in the USA
Senior Engineer at Amazon (Headquarters) in the USA
Ph.D. from the New Jersey Institute of Technology, USA
14 years of research and project experience in the fields of artificial intelligence, digital image processing, and recommendation systems
Has published over 20 papers in international conferences related to AI
Ph.D. from the University of Southern California, USA
Former Chief Data Scientist at the unicorn company Kingsoft Group, Senior Engineer at Amazon and Goldman Sachs
Pioneer in using knowledge graphs for big data anti-fraud in the financial industry
Has published over 15 papers in international conferences such as AAAI, KDD, AISTATS, and CHI
Basic knowledge explanation
Interpretation of cutting-edge papers
Practical applications of the knowledge content
Project practice of the knowledge
Extension of knowledge in this direction and explanation of future trends
Helping you become an industry TOP 10% engineer
Students interested in the course
Scan the QR code for inquiries
