NLP Pre-training Models in the Post-BERT Era

This article introduces several papers that improve BERT's pre-training process, including Pre-Training with Whole Word Masking for Chinese BERT, ERNIE: Enhanced Representation through Knowledge Integration, and ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Note: these papers each improve the masking strategy of BERT's pre-training phase in different ways, but do not modify … Read more

Interpretation of QWen2.5 Technical Report

Paper Link: https://arxiv.org/pdf/2412.15115 GitHub Code: https://github.com/QwenLM/Qwen2.5 Alibaba Cloud has released the technical report for its Qwen2.5 series of large language models, covering improvements in model architecture, pre-training, post-training, evaluation, and more. This article provides a brief interpretation. Summary: 1. Core Insights 1.1. Model Improvements ● Architecture and Tokenizer: The Qwen2.5 series includes dense … Read more

Analysis of Qwen2.5 Coder Training Process and Data Distribution

I have read the Qwen2.5-Coder paper and its training-data details and summarize them here. Paper link: https://arxiv.org/pdf/2409.12186 1. Introduction The Qwen2.5-Coder series is a major upgrade over its predecessor, CodeQwen1.5, aiming for state-of-the-art code-task performance across a range of model sizes. The series includes six models: Qwen2.5-Coder-0.5B, Qwen2.5-Coder-1.5B, Qwen2.5-Coder-3B, Qwen2.5-Coder-7B, Qwen2.5-Coder-14B, and Qwen2.5-Coder-32B. The architecture of … Read more

Understanding Alibaba’s Qwen Model and Local Deployment

Introduction Overview Pre-training Data Sources Pre-processing Tokenization Model Design Extrapolation Capability Model Training Experimental Results Deployment Testing Alignment Supervised Fine-tuning (SFT) RM Model Reinforcement Learning Alignment Results (Automatic and Human Evaluation) Automatic Evaluation Human Evaluation Deployment Testing Conclusion Introduction This article introduces Alibaba's Chinese large language model Qwen, covering an interpretation of the model's details and … Read more

Understanding Model Pre-training in Neural Networks

This article explains model pre-training from three aspects: its essence, its principles, and its applications. 1. The Essence of Pre-training AI = Data + Algorithms + Computing Power, the three elements of AI. Dataset: data is one of the three pillars of AI and is central to AI technology. Datasets are generally … Read more

From Word2Vec to BERT: The Evolution of NLP Pre-trained Models

Natural Language Processing Author: Zhang Junlin Source: Deep Learning Frontier Notes Zhihu Column Original Link: https://zhuanlan.zhihu.com/p/49271699 The theme of this article is pre-training in natural language processing (NLP). It outlines how NLP pre-training techniques gradually developed into the BERT model, naturally illustrating how the ideas behind BERT took shape, … Read more

In-Depth Analysis of LLAMA3 Paper

Introduction Recently, while reviewing papers I had previously studied in depth, I found that some of my notes were still very valuable. I made minor adjustments and am publishing them here for everyone. The LLaMA3 paper is a few months old, but each reading still brings new insights. This article discusses key points, … Read more