Hidden Traps of Gradient Accumulation: Flaws and Fixes in the Transformers Library
Source: DeepHub IMBA

This study not only points out a long-ignored technical issue but also provides important optimization directions for future model training practices. When fine-tuning large language models (LLMs) in a local environment, it is often difficult to …
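As background for the discussion that follows, here is a minimal sketch of plain gradient accumulation in PyTorch. The toy model, synthetic data, and the `accumulation_steps` name are assumptions for illustration only; this is not the Transformers library's implementation.

```python
import torch
from torch import nn

# Hypothetical toy setup: a tiny linear classifier and random data stand in
# for the fine-tuning workload the article discusses.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4  # micro-batches accumulated per optimizer update

optimizer.zero_grad()
for step in range(8):
    inputs = torch.randn(2, 16)             # micro-batch of 2 samples
    labels = torch.randint(0, 4, (2,))
    loss = nn.functional.cross_entropy(model(inputs), labels)
    # Scaling the mean micro-batch loss by 1/accumulation_steps approximates
    # the loss over the full effective batch; the approximation is only exact
    # when every micro-batch contributes the same number of valid targets,
    # which is the kind of assumption the article examines.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```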