76 Minutes to Train BERT! Google Brain's New Optimizer LAMB Accelerates Large-Batch Training

Selected from arXiv. Authors: Yang You, Jing Li, et al. Editor: Machine Heart editorial team. Last year, Google released BERT, a large-scale pre-trained language model based on the bidirectional Transformer, and open-sourced it. The model has a large number of parameters—about 300 million—and requires a long training time. Recently, researchers from Google Brain proposed a …