Reviewing Progress and Insights on BERT Models

Authorized reprint from Microsoft Research AI Headlines. Since BERT was published on arXiv, it has achieved remarkable success and attention, opening the Pandora's box of two-stage (pre-train, then fine-tune) approaches in NLP. Subsequently, a large number of BERT-like pre-trained models have emerged, including XLNet, a generalized autoregressive model that incorporates BERT's bidirectional context information, as well … Read more

Training CT-BERT on COVID-19 Data from Twitter

Big Data Digest, authorized repost from Data Party THU. Author: Chen Zhiyan. Twitter has always been an important source of news, and during the COVID-19 pandemic the public has turned to it to voice their anxieties. However, manually classifying, filtering, and summarizing the massive volume of COVID-19 information on Twitter is nearly impossible. This … Read more
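
The task described here is essentially sequence classification with a domain-adapted BERT checkpoint. Below is a minimal, hedged sketch of what fine-tuning such a model on tweets might look like with the Hugging Face transformers library; the checkpoint name, label scheme, and example tweets are illustrative assumptions, not details taken from the article.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint name is an assumption; substitute whichever CT-BERT release you actually use.
checkpoint = "digitalepidemiologylab/covid-twitter-bert-v2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

tweets = ["Vaccines are rolling out in my city.", "I can't find masks anywhere."]
labels = torch.tensor([0, 1])  # hypothetical topic labels for illustration

batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one illustrative fine-tuning step; wrap in an optimizer loop in practice
```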

Beyond ReLU: The GELU Activation Function in BERT and GPT-2

Reported by Machine Heart, Machine Heart Editorial Team. At least in the field of NLP, GELU has become the choice of many industry-leading models. As the "switch" that decides whether a neuron passes information along, the activation function is crucial to neural networks. But is the ReLU in common use today really the most effective option? … Read more
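
For reference, GELU is defined as GELU(x) = x · Φ(x), where Φ is the standard normal CDF; BERT- and GPT-2-style codebases typically use a tanh approximation. A small PyTorch sketch of both forms (added here for illustration, not taken from the article):

```python
import math
import torch

def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh_approx(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation commonly used in BERT/GPT-2 codebases
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

x = torch.linspace(-3, 3, steps=7)
print(gelu_exact(x))
print(gelu_tanh_approx(x))  # closely matches the exact form
```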

76 Minutes to Train BERT! Google Brain's New Optimizer LAMB Accelerates Large Batch Training

Selected from arXiv. Authors: Yang You, Jing Li, et al. Editor: Machine Heart Editorial Team. Last year, Google released the large-scale pre-trained language model BERT, based on the bidirectional Transformer, and open-sourced it. The model has a large number of parameters (about 300 million) and requires a long training time. Recently, researchers from Google Brain proposed a … Read more
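
The key idea behind LAMB is to keep an Adam-style update but rescale it per layer by a "trust ratio" between the weight norm and the update norm, which keeps training stable at very large batch sizes. The snippet below is an illustrative single-tensor sketch of that idea, not the authors' reference implementation; hyperparameter values are placeholders.

```python
import torch

def lamb_step(param, grad, exp_avg, exp_avg_sq, lr=1e-3, betas=(0.9, 0.999),
              eps=1e-6, weight_decay=0.01, step=1):
    """Illustrative single LAMB-style update for one parameter tensor (sketch only)."""
    beta1, beta2 = betas
    # Adam-style first and second moment estimates with bias correction
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = exp_avg / (1 - beta1 ** step)
    v_hat = exp_avg_sq / (1 - beta2 ** step)
    update = m_hat / (v_hat.sqrt() + eps) + weight_decay * param
    # Layer-wise trust ratio: scale the step by ||w|| / ||update||
    w_norm, u_norm = param.norm(), update.norm()
    trust_ratio = (w_norm / u_norm).item() if w_norm > 0 and u_norm > 0 else 1.0
    param.add_(update, alpha=-lr * trust_ratio)

# One illustrative step on a random parameter tensor
w = torch.randn(128, 64)
g = torch.randn_like(w)
m, v = torch.zeros_like(w), torch.zeros_like(w)
lamb_step(w, g, m, v)
```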

Choosing Between BERT, RoBERTa, DistilBERT, and XLNet

Planning | Liu Yan Author | Suleiman Khan Translation | Nuclear Cola Editor | Linda. AI Frontline Overview: Google's BERT and other Transformer-based models have recently swept the NLP field, significantly surpassing previous state-of-the-art results on a variety of tasks. More recently, Google has made several improvements to BERT, yielding a series of impressive gains. In … Read more
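
Because all four model families expose the same interface in the Hugging Face transformers library, comparing them in practice is largely a matter of swapping checkpoint names. A minimal sketch, assuming transformers is installed and using the standard Hub identifiers:

```python
from transformers import AutoTokenizer, AutoModel

# Standard Hugging Face Hub checkpoints for the four model families compared in the article
checkpoints = [
    "bert-base-uncased",
    "roberta-base",
    "distilbert-base-uncased",
    "xlnet-base-cased",
]

for name in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer("BERT and its descendants share one interface.", return_tensors="pt")
    outputs = model(**inputs)
    print(name, outputs.last_hidden_state.shape)
```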

Contextual Word Vectors and Pre-trained Language Models: From BERT to T5

[Introduction] The emergence of BERT has revolutionized the model architecture paradigm in many natural language processing tasks. As a representative of pre-trained language models (PLM), BERT has refreshed leaderboards in multiple tasks, attracting significant attention from both academia and industry. Stanford University’s classic natural language processing course, CS224N, invited the first author of BERT, Google … Read more

NVIDIA’s 50-Minute BERT Training: Beyond Just GPUs

Selected from arXiv. Author: Mohammad Shoeybi et al. Translated by Machine Heart. Contributors: Mo Wang. Previously, Machine Heart introduced a study by NVIDIA that broke three records in the NLP field: reducing BERT's training time to 53 minutes; reducing BERT's inference time to 2.2 milliseconds; and increasing the parameter count of GPT-2 to 8 billion (previously, GPT-2 … Read more
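
Part of NVIDIA's speedup comes from intra-layer model parallelism: splitting a Transformer layer's weight matrices across GPUs so each device computes a shard of the output. The toy snippet below illustrates the column-parallel idea on a single device; it is a conceptual sketch, not Megatron-LM's actual implementation.

```python
import torch

# Toy illustration of column-parallel splitting of a linear layer, run on one device.
torch.manual_seed(0)
x = torch.randn(4, 8)          # batch of activations
w = torch.randn(8, 16)         # full weight matrix

# Split the output dimension across two "workers"; each holds half the columns.
w_a, w_b = w[:, :8], w[:, 8:]
y_a, y_b = x @ w_a, x @ w_b    # each worker computes its shard independently
y = torch.cat([y_a, y_b], dim=-1)  # gather shards (an all-gather in a real multi-GPU setup)

print(torch.allclose(y, x @ w))    # True: sharded result matches the full matmul
```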

Amazon: We Extracted an Optimal BERT Subarchitecture, 16% of BERT-Large, 7x CPU Inference Speedup

Selected from arXiv Authors: Adrian de Wynter, Daniel J. Perry Translated by Machine Heart Machine Heart Editorial Team Extracting BERT subarchitectures is a highly worthwhile topic, but existing research has shortcomings in subarchitecture accuracy and selection. Recently, researchers from the Amazon Alexa team refined the process of extracting BERT subarchitectures and extracted an optimal subarchitecture … Read more

Can Embedding Vectors Understand Numbers? BERT vs. ELMo

Selected from arXiv. Authors: Eric Wallace et al. Translation by Machine Heart. Contributors: Mo Wang. Performing numerical reasoning on natural language text is a long-standing challenge for end-to-end models. Researchers from the Allen Institute for AI, Peking University, and the University of California, Irvine, attempt to explore whether "out-of-the-box" neural NLP models can solve this problem, and … Read more
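
One common way to test numeracy in embeddings is a linear probe: embed number tokens and check whether a simple regressor can recover their values. The sketch below is a simplified stand-in for that kind of probe; the model choice, mean pooling, and Ridge regressor are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

numbers = list(range(1, 100))
embeddings = []
with torch.no_grad():
    for n in numbers:
        enc = tokenizer(str(n), return_tensors="pt")
        out = model(**enc).last_hidden_state.mean(dim=1)  # pool over subword tokens
        embeddings.append(out.squeeze(0).numpy())

# Can a linear model recover the numeric value from the embedding?
X, y = np.stack(embeddings), np.array(numbers, dtype=float)
probe = Ridge().fit(X[:80], y[:80])
print("held-out R^2:", probe.score(X[80:], y[80:]))
```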

BERT Implementation in PyTorch: A Comprehensive Guide

Selected from GitHub Author: Junseong Kim Translated by Machine Heart Contributors: Lu Xue, Zhang Qian Recently, Google AI published an NLP paper introducing a new language representation model, BERT, which is considered the strongest pre-trained NLP model, setting new state-of-the-art performance records on 11 NLP tasks. Today, Machine Heart discovered a PyTorch implementation of BERT … Read more
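
At the heart of any BERT pretraining implementation is the masked language modeling objective: roughly 15% of tokens are selected, of which 80% become [MASK], 10% become random tokens, and 10% stay unchanged, and only the selected positions contribute to the loss. A minimal PyTorch sketch of that masking step follows; the helper name and constants are hypothetical and are not the repository's actual API.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """Minimal sketch of BERT-style masking (hypothetical helper, not the repo's API)."""
    labels = input_ids.clone()
    # Choose ~15% of positions to predict
    masked = torch.rand(input_ids.shape) < mlm_prob
    labels[~masked] = -100  # ignore unmasked positions in the loss
    # Of the chosen positions: 80% -> [MASK], 10% -> random token, 10% -> unchanged
    replace_mask = masked & (torch.rand(input_ids.shape) < 0.8)
    input_ids[replace_mask] = mask_token_id
    random_mask = masked & ~replace_mask & (torch.rand(input_ids.shape) < 0.5)
    input_ids[random_mask] = torch.randint(vocab_size, (int(random_mask.sum()),))
    return input_ids, labels

# Example with made-up ids; 103 is used as the [MASK] id purely for illustration
ids = torch.randint(5, 30000, (2, 16))
masked_ids, labels = mask_tokens(ids.clone(), mask_token_id=103, vocab_size=30000)
```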