Speeding Up PyTorch by Four Times: Enhancing DALI Utilization and Creating CPU-Based Pipelines

Big Data Digest production. Source: Medium. Compiled by: Zhao Jike. In recent years, deep learning hardware has advanced significantly: Nvidia's latest products, the Tesla V100 and GeForce RTX series, feature dedicated tensor cores designed to accelerate common neural network operations. Notably, the V100 has sufficient capability to train neural networks at thousands …

17 Methods To Speed Up PyTorch Training

Selected from efficientdl.com. Author: LORENZ KUHN. Translated by: Machine Heart. Editor: Chen Ping. Master these 17 methods to accelerate your PyTorch deep learning training with minimal effort. A recent Reddit post on how to speed up PyTorch training became very popular. The original author is LORENZ KUHN, a master's student in …

Building and Training Deep Learning Models with PyTorch

PyTorch occupies an important position in the field of deep learning. In practice, it is widely used in areas such as image recognition and natural language processing. For example, in medical image diagnosis, models built with PyTorch can quickly and accurately identify lesions in X-ray and CT images; in intelligent customer service systems, …

Exploring NVIDIA Blackwell GPU Features Beyond Neural Rendering

During CES 2025, NVIDIA unveiled GPUs based on the Blackwell architecture and showcased the performance and features of NVIDIA RTX AI technology at its Editor's Day event. NVIDIA then held a follow-up briefing session in Shenzhen, detailing the Blackwell architecture GPUs and their functionality. So, what other aspects are worth exploring in depth? …

Colossal-AI: Reducing AIGC Training Costs Significantly

Released by Machine Heart. Machine Heart Editorial Team. How to train and fine-tune AIGC models better, faster, and more cheaply has become the biggest pain point for the commercialization and widespread adoption of AIGC. Colossal-AI, drawing on its technical expertise in democratizing large models, has open-sourced a complete Stable Diffusion pre-training and personalized fine-tuning solution, accelerating …

Cost-Saving Techniques in DeepSeek: Unveiling the Secrets

Tencent Technology "AI Future Guide". Special Contributor: Hao Boyang. Editor: Zheng Kejun. There is no such thing as being GPU-poor, only not squeezing hard enough. The launch of DeepSeek-V3 perfectly illustrates this statement with a set of astonishing figures. While models like O1, Claude, Gemini, and Llama 3 struggle with billions in training costs, DeepSeek-V3 achieved performance on par with them …