76 Minutes to Train BERT! Google Brain's New Optimizer LAMB Accelerates Large-Batch Training
Selected from arXiv. Authors: Yang You, Jing Li, et al. Editor: Machine Heart Editorial Team

Last year, Google released BERT, a large-scale pre-trained language model based on the bidirectional Transformer, and open-sourced it. The model has roughly 300 million parameters and requires a long training time. Recently, researchers from Google Brain proposed a new optimizer, LAMB, which keeps training stable at very large batch sizes and cuts BERT's pre-training time down to 76 minutes.
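To make concrete what LAMB changes relative to Adam, here is a minimal NumPy sketch of the per-layer LAMB update rule, assuming bias-corrected Adam moments, decoupled weight decay, and the identity scaling function for the trust ratio; the function name and hyperparameter defaults are illustrative, not taken from the authors' code.

```python
import numpy as np

def lamb_update(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-6, weight_decay=0.01):
    """One LAMB step for a single layer's weights w with gradient g.

    m, v are the running first/second Adam moments for this layer;
    t is the 1-based step count. Returns updated (w, m, v).
    """
    # Adam-style moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Adam direction plus decoupled (AdamW-style) weight decay.
    r = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w

    # Layer-wise trust ratio: rescale the step by ||w|| / ||r||.
    w_norm, r_norm = np.linalg.norm(w), np.linalg.norm(r)
    trust_ratio = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0

    return w - lr * trust_ratio * r, m, v
```

The trust ratio is the key ingredient: by normalizing the update magnitude layer by layer, it prevents individual layers from taking disproportionately large steps, which is what makes very large batch sizes usable in practice.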