Overview of Qwen Series Technology 1 – The Evolution of Qwen

Introduction: "The moon of ancient times is unseen by people today, yet this moon once shone upon the ancients." Hello everyone, I am the little girl selling hot dry noodles. I am very glad to share cutting-edge technologies and thoughts in the field of artificial intelligence with my friends. With the rapid development of Large …

Key Details of Qwen MoE: Enhancing Model Performance Through Global Load Balancing

Today, we share with you the latest paper from Alibaba Cloud's Tongyi Qianwen team, "Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models" (original paper link: https://arxiv.org/abs/2501.11873). This paper focuses on improving the training method of Mixture-of-Experts (MoE) models by relaxing local balance to global balance through lightweight communication, significantly …

Qwen2.5-VL: Alibaba’s Latest Open Source Visual Language Model

🚀 Quick Read
Model Introduction: Qwen2.5-VL is the flagship open-source visual language model from Alibaba’s Tongyi Qianwen team, available in three sizes: 3B, 7B, and 72B.
Main Features: Supports visual understanding, long video processing, structured output, and device operation.
Technical Principles: Utilizes a series structure of ViT and Qwen2, supports multi-modal rotary position encoding …

Qwen’s Year-End Gift: Enhancing MoE Training Efficiency

Today, we will learn about a powerful technology …

Qwen Series Technical Interpretation 3 – Architecture

"Shadows slant across the shallow water, a faint fragrance drifts in the moonlight at dusk." Hello everyone, I am the little girl selling hot dry noodles. I am very happy to share cutting-edge technology and thoughts in the field of artificial intelligence with my friends. Following the previous posts in the same series: Qwen Series …

Understanding Qwen2.5 Technical Report: 18 Trillion Token Training

Introduction: The development of large language models (LLMs) is advancing rapidly, with each major update potentially bringing significant improvements in performance and extending application scenarios. In this context, the latest Qwen2.5 series models released by Alibaba have garnered widespread attention. This technical report provides a detailed overview of the development process, innovations, and performance of …

Bridging Virtual and Reality: AI Empowering the Future

BUMBLE: We are moving towards a world where many robots will be capable of performing complex multi-step tasks at home and in other environments, but so far, we haven’t seen many attempts to truly accomplish this in open-vocabulary tasks. Now, we have BUMBLE, which has over 90 hours of evaluation and user research! …

Goodbye Baidu! Creating My Own Custom URL Navigation Page with AI

There’s something a bit hard to say. Actually, I’ve been using Baidu’s homepage navigation feature for over a decade, which is this thing: (image from the internet). Because hao123 is too flashy and I am a long-time Tieba user, Baidu’s navigation is indeed more convenient for me. After years of using it, I have accumulated over a …

Testing OpenAI Operator: Browser Automation Beyond Previous SOTA

Still copying and pasting manually? Still being tortured by tedious online tasks? The latest release from OpenAI, Operator, will completely change the way you work! Today, let’s see the capabilities of this AI agent and how it handles various complex tasks and boosts your efficiency! Hello everyone, I am Kate, welcome to …

Assisting Program Development with DeepSeek

Concept
DeepSeek: An AI model released by a company in Hangzhou, proficient in code generation. Official website: https://www.deepseek.com
Llama: An AI model released by Meta
Copilot: A code generation tool released by GitHub
Roo-Cline: A plugin for using AI models

Configuration
Log in to the DeepSeek official website, apply for API keys, and then configure in …