NLP Pre-training Models in the Post-BERT Era

This article introduces several papers that improve BERT's pre-training process, including Pre-Training with Whole Word Masking for Chinese BERT, ERNIE: Enhanced Representation through Knowledge Integration, and ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Note: these papers each improve the masking strategy of BERT's pre-training phase in different ways, but do not modify … Read more

Interpretation of QWen2.5 Technical Report

Paper Link: https://arxiv.org/pdf/2412.15115 GitHub Code: https://github.com/QwenLM/Qwen2.5 Alibaba Cloud has released the technical report for its Qwen2.5 series of large language models, covering improvements in model architecture, pre-training, post-training, evaluation, and more. This article provides a brief interpretation. Summary: 1. Core Insights 1.1. Model Improvements ● Architecture and Tokenizer: The Qwen2.5 series includes dense … Read more

Analysis of Qwen2.5 Coder Training Process and Data Distribution

I have read the Qwen2.5-Coder paper and its training-data details and summarize them here. Paper link: https://arxiv.org/pdf/2409.12186 1. Introduction The Qwen2.5-Coder series is a major upgrade over its predecessor, CodeQwen1.5, aiming for state-of-the-art code-task performance across a range of model sizes. The series includes six models: Qwen2.5-Coder-0.5B, Qwen2.5-Coder-1.5B, Qwen2.5-Coder-3B, Qwen2.5-Coder-7B, Qwen2.5-Coder-14B, and Qwen2.5-Coder-32B. The architecture of … Read more

Understanding Alibaba’s Qwen Model and Local Deployment

Introduction Overview Pre-training Data Sources Pre-processing Tokenization Model Design Extrapolation Capability Model Training Experimental Results Deployment Testing Alignment Supervised Fine-tuning (SFT) RM Model Reinforcement Learning Alignment Results (Automatic and Human Evaluation) Automatic Evaluation Human Evaluation Deployment Testing Conclusion Introduction This article introduces Alibaba's Chinese large language model Qwen, covering an interpretation of the model's details and … Read more

Understanding Model Pre-training in Neural Networks

This article explains model pre-training from three aspects: its essence, its principles, and its applications. 1. The Essence of Pre-training AI = Data + Algorithms + Computing Power, the three elements of AI. Dataset: data is one of the three pillars of AI and is central to AI technology. Datasets are generally … Read more

From Word2Vec to BERT: The Evolution of NLP Pre-trained Models

Natural Language Processing Author: Zhang Junlin Source: Deep Learning Frontier Notes Zhihu Column Original Link: https://zhuanlan.zhihu.com/p/49271699 The theme of this article is pre-training in natural language processing (NLP). It outlines how NLP pre-training techniques gradually developed into the BERT model, naturally illustrating how the ideas behind BERT took shape, … Read more

In-Depth Analysis of LLAMA3 Paper

Introduction Recently, while reviewing papers I had previously studied in depth, I found that some of my notes were still very valuable. I made minor adjustments and am publishing them here for everyone. The LLaMA3 paper is a few months old, but each reading still brings new insights. This article discusses key points, … Read more