Continuous Progress: Overview of New Features in TensorFlow 2.4!

By Goldie Gadde and Nikita Namjoshi, TensorFlow. TensorFlow 2.4 has been officially released! With expanded support for distributed training and mixed precision, a new NumPy frontend, and tools for monitoring and diagnosing performance bottlenecks, this release focuses on improvements in performance and scalability. New features of tf.distribute … Read more

Summary of Multi-GPU Parallel Training with PyTorch

Why Use Multi-GPU Parallel Training? In simple terms, there are two reasons: first, a model may not fit on a single GPU but can run completely across two or more GPUs (as with the early AlexNet); second, parallel computation across multiple GPUs can speed up training. To become a “master alchemist”, … Read more

17 Ways To Speed Up PyTorch Training

Reprinted from: Machine Heart. Master these 17 methods to accelerate your PyTorch deep learning training with minimal effort. Recently, a post on this topic went viral on Reddit. The original author, Lorenz Kuhn, a master’s student in computer science at ETH Zurich, introduces … Read more

Teaching You to Implement PyTorch Operators with CUDA

Introduction: CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture from NVIDIA that enables GPUs to solve complex computational problems. Developers can write programs for the CUDA architecture in C, and these programs run with very high performance on CUDA-capable processors. Editor | Heart of Autonomous Driving Author | Yuppie@Zhihu Link | https://zhuanlan.zhihu.com/p/595851188 … Read more

Shanghai Jiao Tong University: Accelerating LSTM Training Based on Approximate Random Dropout

Machine Heart Release. Authors: Song Zhuoran, Wang Ru, Ru Dongyu, Peng Zhenghao, Jiang Li (Shanghai Jiao Tong University). In this article, the authors use the Dropout method to generate large amounts of sparsity during neural network training in order to accelerate it. The paper has been accepted by the Design Automation and Test in Europe … Read more

How Are AI Large Models with Trillions of Parameters Built? Discussing Four Strategies of Parallel Computing

Source: OneFlow. This article is approximately 3,611 words; suggested reading time is 7 minutes. It introduces how AI large models are built and four strategies for parallel computing. Many recent advances in AI revolve around large-scale neural networks, but training these large-scale neural networks is a daunting … Read more

Decoding Performance Issues in Large Model Training, Fine-tuning, and Inference

Source: Shi Zhi AI wisemodel. This article is approximately 3,335 words; suggested reading time is 7 minutes. It introduces benchmark tests conducted by research teams from the Hong Kong University of Science and Technology and Beijing DaMo Technology on the performance of LLMs of different sizes across various GPU platforms. … Read more

Why Is the 4090 Much Faster Than the A100?

Author: Li Bojie @ Zhihu, PhD in Computer Science from USTC and MSRA, Huawei “Genius Youth”. This is a good question. First, the conclusion: the 4090 is not suitable for training large models, but … Read more

Get Your GPU Ready for Deep Learning (With Code)

Author: Saurabh Bodhe. Translator: Chen Zhendong. Proofreader: Che Qianzi. This article is approximately 1,000 words; suggested reading time is 5 minutes. It is a tutorial on setting up a GPU-based TensorFlow platform using NVIDIA’s official tools. “Building Deep Learning on Google Cloud Platform”: I know that building a high-end deep learning system based on GPU … Read more

Unexpected Results of Technological Evolution: How Games and Cryptocurrency Became AI’s “Computing Power Base”?

In the spring that just passed, we witnessed the largest tech carnival of the new century. Describing the development of artificial intelligence (AI) over the past few months as “springing up like bamboo shoots after a rain” would be too conservative; “big bang” might be a more appropriate description. Even Lu Qi, the former president … Read more