Automate Writing Spring Festival Couplets with VSCode and DeepSeek-R1
Introduction

We introduced our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained through large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and it demonstrates strong reasoning performance. Through RL, DeepSeek-R1-Zero naturally developed many powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero faced challenges such as endless repetition, …