Beyond ReLU: The GELU Activation Function in BERT and GPT-2

Reported by the Machine Heart Editorial Team. At least in the field of NLP, GELU has become the activation of choice for many industry-leading models. As the “switch” that decides whether a neuron passes information onward, the activation function is crucial to a neural network. But is the ReLU in common use today really the most effective choice? …
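
For reference: GELU weights its input by the standard normal CDF, GELU(x) = x * Φ(x), and the original BERT and GPT-2 code use a tanh approximation of it. A minimal PyTorch sketch (the function names are ours, for illustration):

    import math
    import torch

    def gelu_exact(x: torch.Tensor) -> torch.Tensor:
        # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
        return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

    def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
        # The tanh approximation used in the original BERT and GPT-2 code.
        c = math.sqrt(2.0 / math.pi)
        return 0.5 * x * (1.0 + torch.tanh(c * (x + 0.044715 * x ** 3)))

Unlike ReLU's hard zero-or-pass gate, GELU gates smoothly, letting small negative inputs through with a small weight.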

76 Minutes to Train BERT! Google Brain's New Optimizer LAMB Accelerates Large-Batch Training

Selected from arXiv. Authors: Yang You, Jing Li, et al. Edited by the Machine Heart Editorial Team. Last year, Google released and open-sourced BERT, a large-scale pre-trained language model built on the bidirectional Transformer. With roughly 300 million parameters, the model takes a long time to train. Recently, researchers from Google Brain proposed a …
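
The core idea in LAMB is a layer-wise trust ratio that rescales an Adam-style step by ||w|| / ||u||, keeping training stable at very large batch sizes. Below is a minimal NumPy sketch of one update for a single layer; the function name and hyperparameter defaults are ours, for illustration, and the paper should be consulted for the exact algorithm:

    import numpy as np

    def lamb_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-6, weight_decay=0.01):
        # Adam-style first and second moments with bias correction.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Adam direction plus decoupled weight decay.
        u = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
        # Layer-wise trust ratio ||w|| / ||u|| rescales the step so each
        # layer moves at a scale proportional to its own weights.
        w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(u)
        trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
        return w - lr * trust * u, m, v

Because the trust ratio normalizes each layer's step to the scale of its own weights, the learning rate can be raised along with the batch size without destabilizing training.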

Contextual Word Vectors and Pre-trained Language Models: From BERT to T5

[Introduction] The emergence of BERT reshaped the model architecture paradigm across many natural language processing tasks. As a representative pre-trained language model (PLM), BERT refreshed the leaderboards of multiple tasks, attracting significant attention from both academia and industry. Stanford University's classic natural language processing course, CS224N, invited the first author of BERT, Google …

Choosing Between BERT, RoBERTa, DistilBERT, and XLNet

Planning | Liu Yan. Author | Suleiman Khan. Translation | Nuclear Cola. Editor | Linda. AI Frontline overview: Google's BERT and other Transformer-based models have recently swept the NLP field, significantly surpassing previous state-of-the-art solutions on a wide range of tasks. Since then, Google has made several improvements to BERT, leading to a series of impressive follow-ups. In …
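
For readers who want to try the four models side by side, here is a minimal sketch assuming the Hugging Face transformers library (our choice of toolkit and checkpoint names, not the article's):

    # Assumes the Hugging Face transformers library; the article compares
    # the models themselves, not any particular toolkit.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    CHECKPOINTS = {
        "BERT": "bert-base-uncased",
        "RoBERTa": "roberta-base",
        "DistilBERT": "distilbert-base-uncased",
        "XLNet": "xlnet-base-cased",
    }

    def load(name):
        ckpt = CHECKPOINTS[name]
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(
            ckpt, num_labels=2)  # classification head starts untrained
        return tokenizer, model

    tokenizer, model = load("DistilBERT")
    inputs = tokenizer("Which model should I pick?", return_tensors="pt")
    logits = model(**inputs).logits  # fine-tune before trusting these

Swapping the checkpoint string is enough to benchmark all four under the same fine-tuning loop, which leaves the real trade-off where the article puts it: accuracy versus model size and inference cost.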

NVIDIA’s 50-Minute BERT Training: Beyond Just GPUs

Selected from arXiv. Authors: Mohammad Shoeybi et al. Translated by Machine Heart. Contributor: Mo Wang. Previously, Machine Heart covered a study in which NVIDIA broke three records in the NLP field: reducing BERT's training time to 53 minutes, reducing BERT's inference time to 2.2 milliseconds, and increasing the parameter count of GPT-2 to 8 billion (previously, GPT-2 …

Amazon: We Extracted an Optimal BERT Subarchitecture, 16% the Size of BERT-Large, with a 7x CPU Inference Speedup

Selected from arXiv. Authors: Adrian de Wynter, Daniel J. Perry. Translated by the Machine Heart Editorial Team. Extracting BERT subarchitectures is a highly worthwhile topic, but existing research falls short in both the accuracy and the selection of subarchitectures. Recently, researchers from the Amazon Alexa team refined the extraction procedure and obtained an optimal subarchitecture …

Can Embedded Vectors Understand Numbers? BERT vs. ELMo

Selected from arXiv. Authors: Eric Wallace et al. Translated by Machine Heart. Contributor: Mo Wang. Performing numerical reasoning over natural language text is a long-standing challenge for end-to-end models. Researchers from the Allen Institute for AI, Peking University, and the University of California, Irvine explore whether “out-of-the-box” neural NLP models can solve this problem, and …
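
To make the question concrete, here is a toy probe in the spirit of the paper's experiments, though the setup below (model, number range, train/test split) is our illustration rather than the authors' protocol: fit a linear map that tries to recover a number's magnitude from its static BERT embedding.

    import numpy as np
    import torch
    from transformers import BertModel, BertTokenizer

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    emb = BertModel.from_pretrained("bert-base-uncased").get_input_embeddings()

    nums = list(range(1, 100))  # numbers absent from the vocab map to [UNK]
    ids = torch.tensor([tok.convert_tokens_to_ids(str(n)) for n in nums])
    X = emb(ids).detach().numpy()  # static wordpiece embeddings
    y = np.array(nums, dtype=float)

    # Linear probe: train on even numbers, evaluate on held-out odd ones.
    # (A regularized regressor would be used in a serious experiment.)
    tr = [i for i, n in enumerate(nums) if n % 2 == 0]
    te = [i for i, n in enumerate(nums) if n % 2 == 1]
    W, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    print("held-out mean abs error:", np.abs(X[te] @ W - y[te]).mean())

If the probe's held-out error is small, the embeddings encode numeric magnitude even though the model was never trained to do arithmetic.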

BERT Implementation in PyTorch: A Comprehensive Guide

Selected from GitHub. Author: Junseong Kim. Translated by Machine Heart. Contributors: Lu Xue, Zhang Qian. Recently, Google AI published an NLP paper introducing a new language representation model, BERT, widely considered the strongest pre-trained NLP model to date, which set new state-of-the-art records on 11 NLP tasks. Today, Machine Heart discovered a PyTorch implementation of BERT …
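
At the heart of any such implementation is the Transformer encoder block. A minimal sketch with BERT-Base hyperparameters (hidden size 768, 12 heads, 3072-dim feed-forward); this is illustrative, not the repository's actual code:

    import torch
    import torch.nn as nn

    class EncoderLayer(nn.Module):
        # One BERT-style Transformer encoder block: self-attention and a
        # GELU feed-forward sublayer, each with residual + LayerNorm.
        def __init__(self, hidden=768, heads=12, ffn=3072, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(hidden, heads,
                                              dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))
            self.norm1, self.norm2 = nn.LayerNorm(hidden), nn.LayerNorm(hidden)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, pad_mask=None):
            a, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
            x = self.norm1(x + self.drop(a))                # attention sublayer
            return self.norm2(x + self.drop(self.ffn(x)))   # FFN sublayer

    out = EncoderLayer()(torch.randn(2, 16, 768))  # (batch, seq, hidden)

A full BERT stacks 12 (Base) or 24 (Large) of these blocks on top of token, position, and segment embeddings.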

Redefining NLP Rules: From Word2Vec and ELMo to BERT

Introduction: Remember how, not long ago in machine reading comprehension, Microsoft and Alibaba surpassed human performance on SQuAD with R-Net+ and SLQA respectively, while Baidu topped the MS MARCO leaderboard with V-Net and exceeded human performance on BLEU? These networks have grown ever more complex, and it seems that the research …

Can NLP Work Like the Human Brain? Insights from CMU and MIT

Machine Heart Analyst Network. Analyst: Wu Jiying. Editor: Joni Zhong. As an important research topic in computer science and artificial intelligence, natural language processing (NLP) has been studied and discussed extensively across many domains. As research deepens, some scholars have begun to explore whether there are connections between natural language …