Contextual Word Vectors and Pre-trained Language Models: From BERT to T5

[Introduction] The emergence of BERT has revolutionized the model architecture paradigm across many natural language processing tasks. As a representative pre-trained language model (PLM), BERT has topped leaderboards on multiple tasks, attracting significant attention from both academia and industry. Stanford University’s classic natural language processing course, CS224N, invited the first author of BERT, Google … Read more

NVIDIA’s 50-Minute BERT Training: Beyond Just GPUs

Selected from arXiv Author: Mohammad Shoeybi et al. Translated by Machine Heart Contributors: Mo Wang Previously, Machine Heart introduced a study by NVIDIA that broke three records in the NLP field: reducing BERT’s training time to 53 minutes; reducing BERT’s inference time to 2.2 milliseconds; and increasing the parameter count of GPT-2 to 8 billion (previously, GPT-2 … Read more

Amazon: We Extracted an Optimal BERT Subarchitecture, 16% of BERT-large, 7x CPU Inference Speedup

Selected from arXiv Authors: Adrian de Wynter, Daniel J. Perry Translated by the Machine Heart Editorial Team Extracting BERT subarchitectures is a highly worthwhile topic, but existing research falls short in subarchitecture accuracy and selection. Recently, researchers from the Amazon Alexa team refined the process of extracting BERT subarchitectures and extracted an optimal subarchitecture … Read more

Can Embedding Vectors Understand Numbers? BERT vs. ELMo

Selected from arXiv Authors: Eric Wallace et al. Translated by Machine Heart Contributors: Mo Wang Performing numerical reasoning on natural language text is a long-standing challenge for end-to-end models. Researchers from the Allen Institute for AI, Peking University, and the University of California, Irvine, attempt to explore whether “out-of-the-box” neural NLP models can solve this problem, and … Read more
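The methodology behind this kind of study is a "probe": train a small model to decode a number's value from its embedding, and measure held-out error. The sketch below illustrates the idea with synthetic stand-in embeddings (a hypothetical setup for illustration only, not the paper's code or real BERT/ELMo vectors):

```python
import numpy as np

# Minimal probing sketch: can a linear model decode a number's value from
# its vector? We fabricate embeddings as a noisy linear function of the
# value so the example stays self-contained and deterministic.
rng = np.random.default_rng(0)
dim = 16
numbers = np.arange(0, 100, dtype=float)

# Stand-in "embeddings": a fixed random direction scaled by the value, plus noise.
proj = rng.normal(size=dim)
embeddings = numbers[:, None] * proj[None, :] + 0.1 * rng.normal(size=(100, dim))

# Fit a least-squares linear probe on the first 80 numbers, test on the rest.
train, test = embeddings[:80], embeddings[80:]
w, *_ = np.linalg.lstsq(train, numbers[:80], rcond=None)
pred = test @ w

# If value is linearly decodable from the vectors, held-out error is small.
mae = np.abs(pred - numbers[80:]).mean()
print(f"held-out MAE: {mae:.3f}")
```

With real contextual embeddings, the same recipe applies: swap the synthetic matrix for token vectors extracted from the model and compare probes across BERT, ELMo, and static embeddings.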

BERT Implementation in PyTorch: A Comprehensive Guide

Selected from GitHub Author: Junseong Kim Translated by Machine Heart Contributors: Lu Xue, Zhang Qian Recently, Google AI published an NLP paper introducing a new language representation model, BERT, which is considered the strongest pre-trained NLP model, setting new state-of-the-art performance records on 11 NLP tasks. Today, Machine Heart discovered a PyTorch implementation of BERT … Read more
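At the heart of any BERT implementation is the masked-language-modeling objective, which corrupts inputs with the 80/10/10 rule from the BERT paper. A hedged sketch of that input-corruption step (toy vocabulary and helper names of our own choosing, not code from the repository above):

```python
import random

# Toy vocabulary standing in for a real tokenizer's vocab.
VOCAB = ["[PAD]", "[MASK]", "the", "cat", "sat", "on", "mat"]
MASK_ID = 1

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption. Returns (corrupted_ids, labels);
    labels are -1 at positions where no prediction is required."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_prob:
            labels.append(tid)                  # model must recover the original id
            r = rng.random()
            if r < 0.8:                         # 80%: replace with [MASK]
                corrupted.append(MASK_ID)
            elif r < 0.9:                       # 10%: replace with a random token
                corrupted.append(rng.randrange(2, len(VOCAB)))
            else:                               # 10%: keep the original token
                corrupted.append(tid)
        else:
            labels.append(-1)                   # not selected: excluded from the loss
            corrupted.append(tid)
    return corrupted, labels

ids = [2, 3, 4, 5, 6]                           # "the cat sat on mat"
print(mask_tokens(ids, mask_prob=0.5))
```

In a full implementation the labels feed a cross-entropy loss that ignores the -1 positions, so gradients flow only through the corrupted tokens.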

Redefining NLP Rules: From Word2Vec and ELMo to BERT

Introduction Remember not long ago in the field of machine reading comprehension, when Microsoft and Alibaba surpassed humans on SQuAD with R-Net+ and SLQA respectively, and Baidu topped the MS MARCO leaderboard with V-Net while exceeding human performance on BLEU? These networks have grown increasingly complex, and it seems that the research … Read more

Can NLP Work Like the Human Brain? Insights from CMU and MIT

Machine Heart Analyst Network Analyst: Wu Jiying Editor: Joni Zhong As an important research topic in the fields of computer science and artificial intelligence, Natural Language Processing (NLP) has been extensively studied and discussed across various domains. As research deepens, some scholars have begun to explore whether there are connections between natural language … Read more

Detailed Explanation of HuggingFace BERT Source Code

Reprinted from | PaperWeekly ©PaperWeekly Original · Author | Li Luoqiu School | Master’s Student at Zhejiang University Research Direction | Natural Language Processing, Knowledge Graphs This article records my understanding of the code in the HuggingFace open-source Transformers project. As we all … Read more
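The computational core that a walkthrough of HuggingFace's `BertSelfAttention` module has to explain is scaled dot-product self-attention. A compact numpy sketch of that computation (single head, no dropout or attention mask, random toy weights; a simplification for illustration, not the library's actual code):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ wq, x @ wk, x @ wv            # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                          # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, hidden = 4, 8
x = rng.normal(size=(seq_len, hidden))
wq, wk, wv = (rng.normal(size=(hidden, hidden)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (4, 8): one output vector per input position
```

The library version adds multiple heads, an attention mask, dropout, and an output projection on top of this core, but the shapes and scaling follow the same pattern.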

BERT Lightweight: Optimal Parameter Subset Bort at 16% Size

Zheng Jiyang from Aofeisi | QbitAI Report, WeChat Official Account QbitAI Recently, the Amazon Alexa team released a research result: researchers performed parameter selection on the BERT model, obtaining an optimal parameter subset of BERT, called Bort. The results indicate that Bort is only 16% the size of BERT-large, but its speed on CPU is 7.9 … Read more

EdgeBERT: Extreme Compression, 13 Times Lighter Than ALBERT!

Machine Heart Reprint Source: Xixiaoyao’s Cute Selling House Author: Sheryc_Wang Su There are two kinds of highly challenging engineering projects in this world: the first scales something ordinary up to the extreme, like expanding a language model until it can write poetry, prose, and code the way GPT-3 does; the other is exactly the opposite, shrinking something very … Read more