Contextual Word Vectors and Pre-trained Language Models: From BERT to T5

[Introduction] The emergence of BERT has reshaped the model architecture paradigm across many natural language processing tasks. As a representative pre-trained language model (PLM), BERT has refreshed leaderboards on multiple tasks, attracting significant attention from both academia and industry. Stanford University’s classic natural language processing course, CS224N, invited the first author of BERT, Google … Read more

Can Embedding Vectors Understand Numbers? BERT vs. ELMo

Selected from arXiv. Authors: Eric Wallace et al. Translated by Machine Heart. Contributors: Mo Wang. Performing numerical reasoning over natural language text is a long-standing challenge for end-to-end models. Researchers from the Allen Institute for AI, Peking University, and the University of California, Irvine, explore whether “out-of-the-box” neural NLP models can solve this problem, and … Read more

BERT Implementation in PyTorch: A Comprehensive Guide

Selected from GitHub. Author: Junseong Kim. Translated by Machine Heart. Contributors: Lu Xue, Zhang Qian. Recently, Google AI published an NLP paper introducing a new language representation model, BERT, widely regarded as the strongest pre-trained NLP model to date, having set new state-of-the-art records on 11 NLP tasks. Today, Machine Heart discovered a PyTorch implementation of BERT … Read more
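The linked repository implements BERT from scratch; as a quick illustration of what a pre-trained BERT does in PyTorch, here is a minimal masked-language-modeling sketch using the Hugging Face transformers library rather than Junseong Kim’s code (the bert-base-uncased checkpoint and the transformers API are assumptions of this example, not part of the linked repo):

```python
# Minimal sketch (not the linked repo): querying a pre-trained BERT
# masked language model in PyTorch via Hugging Face `transformers`.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the [MASK] token and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))  # expected: "paris"
```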

Understanding Qwen1.5 MoE: Efficient Intelligence of Sparse Large Models

Introduction Official documentation: Qwen1.5-MoE: Achieving the Performance of 7B Models with 1/3 of the Activated Parameters | Qwen. On March 28, Alibaba open-sourced its first MoE large model, Qwen1.5-MoE-A2.7B. The model is based on the existing Qwen-1.8B model. Qwen1.5-MoE-A2.7B has 2.7 billion activated parameters, yet it can achieve the performance … Read more
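To make the “activated parameters” figure concrete: an MoE layer holds many expert networks but routes each token through only the top-k of them, so only a fraction of the total weights participate in any forward pass. The toy sketch below shows that routing mechanism; the dimensions, expert count, and k are illustrative choices, not Qwen’s actual configuration.

```python
# Toy top-k expert routing, the mechanism that lets an MoE model
# activate only a fraction of its total parameters per token.
# All sizes here are illustrative, not Qwen1.5-MoE's real config.
import torch
import torch.nn.functional as F

hidden, n_experts, k = 64, 8, 2
x = torch.randn(4, hidden)                      # 4 tokens
gate = torch.nn.Linear(hidden, n_experts)       # router
experts = torch.nn.ModuleList(
    torch.nn.Linear(hidden, hidden) for _ in range(n_experts)
)

scores = F.softmax(gate(x), dim=-1)             # routing probabilities
topk_scores, topk_idx = scores.topk(k, dim=-1)  # keep the k best experts
topk_scores = topk_scores / topk_scores.sum(-1, keepdim=True)

out = torch.zeros_like(x)
for token in range(x.size(0)):
    for slot in range(k):                       # only k of n_experts run
        e = topk_idx[token, slot].item()
        out[token] += topk_scores[token, slot] * experts[e](x[token])
```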

Qwen1.5: The Path to Model Excellence

Introduction In the article on Qwen’s upgrade path, we explored the optimization process of the Qianwen (Qwen) model in depth. The new version, Qwen1.5, improves further on its predecessor. This article continues the analysis of the reasons behind the impressive performance of the new Qwen1.5 model. The structure of the article … Read more

How Effective Is Tongyi Qwen-7B? Firefly Fine-Tuning Practice Shows Great Results

01 Introduction On August 3, Alibaba Cloud released its first open-source large model: Tongyi Qwen-7B, which is both open-source and commercially usable. Although expectations have been raised by the wave of hundred-billion-parameter models, the fact that this one comes from Alibaba has drawn widespread attention and discussion among peers, and it has performed excellently on various … Read more
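Firefly’s full training pipeline is not reproduced here, but as a hedged sketch of the parameter-efficient setup such fine-tuning typically relies on, the snippet below attaches a LoRA adapter to Qwen-7B with the peft library. The hyperparameters and the c_attn target module are illustrative assumptions, not Firefly’s exact configuration.

```python
# Hedged sketch: a LoRA adapter on Qwen-7B via `peft`; hyperparameters
# and the target module are illustrative, not Firefly's exact settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", trust_remote_code=True  # Qwen ships custom model code
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["c_attn"],  # assumed: Qwen's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```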

In-Depth Study of the Qwen2.5 Paper

Introduction I must say, Qwen is really impressive. Its foundational capabilities seem to have firmly established it as the leader among open-source models, and it is not at all inferior to most closed-source ones. Many companies’ foundation-model teams are likely already having to justify the value of building their own base models. Qwen’s open-source momentum is … Read more

A Detailed Look at the Qwen Technical Report

Introduction Alibaba open-sourced the Qwen-7B model some time ago, but for some reason it was taken down. Just yesterday, Alibaba open-sourced the new Qwen-14B model (the original 7B model was also re-released) and simultaneously published the Qwen technical report. Today, I would like to share it with everyone. PS: These days, domestic open-source large models … Read more

Understanding Alibaba’s Qwen Model and Local Deployment

Contents: Introduction · Overview · Pre-training (Data Sources, Pre-processing, Tokenization) · Model Design (Extrapolation Capability) · Model Training · Experimental Results · Deployment Testing · Alignment (Supervised Fine-tuning (SFT), RM Model, Reinforcement Learning) · Alignment Results (Automatic Evaluation, Human Evaluation) · Deployment Testing · Conclusion. Introduction: This article introduces Alibaba’s Chinese large model Qwen, covering a detailed interpretation of the model and … Read more
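As a minimal local-deployment sketch matching the article’s “Deployment Testing” topic: the Hub id below is Qwen’s published chat checkpoint, and the chat() helper is exposed by Qwen’s own custom modeling code loaded via trust_remote_code; treat the exact call signature as an assumption if your version differs.

```python
# Hedged sketch: running the Qwen-7B-Chat checkpoint locally with
# Hugging Face `transformers`; trust_remote_code loads Qwen's own
# modeling code, which exposes the chat() convenience method.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# history=None starts a fresh conversation; chat() returns the reply
# plus the updated history for multi-turn use.
response, history = model.chat(tokenizer, "Hello! Please introduce yourself.", history=None)
print(response)
```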

Interpretation of Qwen2.5 Technical Report

The MLNLP community is a well-known machine learning and natural language processing community in China and abroad, whose members include NLP master’s and doctoral students, university professors, and industry researchers. The community’s vision is to promote communication and progress between academia and industry in natural language processing and machine learning, especially for the advancement … Read more