Experience the Cloud Deployment of Qwen2.5 in 5 Minutes

Qwen2.5 is a large-scale language and multimodal model developed by the Tongyi Qianwen team. With its advantages in long text processing, knowledge integration, large-scale dataset pre-training, and multilingual processing, it provides users with quick and accurate responses, becoming an effective tool for enterprise intelligence transformation.

Deploying the Qwen2.5 model on Function Compute FC allows users to adjust resource configurations according to business needs, effectively handling high-concurrency scenarios. By optimizing resource allocation, such as adjusting instance specifications, multi-GPU deployment, and model quantization, inference speed can be improved. Additionally, Function Compute supports diverse GPU billing models (pay-as-you-go, tiered pricing, and ultra-fast mode), which can be adjusted based on business needs, significantly reducing overall costs when facing high-frequency requests and large-scale data processing.

Prize Experience Ongoing!

Use Function Compute FC to deploy the Qwen2.5 model with one click: set up the Ollama and Open WebUI applications in two steps and win a Year of the Snake couplet!

Event Time: 2024.12.23 00:00:00 - 2025.01.10 16:00:00

Experience Now: https://developer.aliyun.com/topic/dec/fcqwen

Applicable Customers

  • Customers who have high requirements for deep understanding of AI, multi-domain knowledge integration, efficient instruction execution, and multilingual support.

  • Enterprises that expect to ensure efficient model inference and low-latency responses through controllable cloud service resource configurations.

Products Used

  • Function Compute FC

Solution Overview

This solution introduces how to deploy the open-source Qwen2.5 model on Function Compute FC. Two applications are deployed through Function Compute FC: Ollama, which hosts the Qwen2.5 model and serves it to clients, and Open WebUI, which provides a user-friendly interactive interface for working with the model. The Ollama application offers images pre-configured with three parameter sizes (1.5B, 3B, and 7B) for users to choose from.

With Function Compute FC, users can quickly and conveniently deploy models without worrying about underlying resource management and operational issues, allowing them to focus on application innovation and development. Function Compute FC provides a maintenance-free efficient development environment, with elastic scaling and high availability, and uses a pay-as-you-go model to effectively reduce resource idle costs.

During the actual deployment, you can adjust some configurations based on your specific resource planning, but the final operating environment will be similar to the architecture described below.

The technical architecture of this solution includes the following cloud services:
  • Function Compute FC: Fully managed serverless computing service for deploying model services and web applications.

Deploying the Model

Deploying the Qwen2.5 Model Based on Ollama

We will deploy the Ollama application to provide model services. Ollama is a convenient model deployment and management tool that helps developers efficiently host and service models, facilitating quick integration of AI capabilities.

1. Click the Ollama template link[1] to open the Create Ollama Application page.

2. The current application template provides versions of the Qwen2.5 model with parameter sizes of 1.5B, 3B, and 7B, which can be selected as needed from the Model Name dropdown list.

3. The remaining configuration items can be left unchanged. Click the Create and Deploy Default Environment button to deploy the application and wait for the deployment to complete.

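Once the Ollama application is deployed, it serves Ollama's standard HTTP API from the function's access address. The snippet below is a minimal sketch, not part of the official deployment steps, of how you might sanity-check the service with Python; the base URL and the model tag are placeholders, so replace them with your own access address (the internal access address described in the next section works from within the same VPC) and with a tag actually reported by the service.

```python
import requests

# Placeholder: replace with the access address of your deployed Ollama application.
OLLAMA_BASE_URL = "https://your-ollama-app.example.com"

# List the models the deployed Ollama instance currently serves.
tags = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=30).json()
print([m["name"] for m in tags.get("models", [])])

# Send a single test prompt. The tag "qwen2.5:3b" is an assumption based on the
# 3B template option; use whichever name was printed above.
resp = requests.post(
    f"{OLLAMA_BASE_URL}/api/generate",
    json={"model": "qwen2.5:3b", "prompt": "Hello, who are you?", "stream": False},
    timeout=300,
)
print(resp.json()["response"])
```

If the output lists a Qwen2.5 model and the prompt returns a reply, the model service is ready for Open WebUI to use.
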
Deploying Open WebUI to Call the Qwen2.5 Model

Open WebUI is an open-source project that provides a graphical interface for managing and operating models.

1. Click the Open WebUI template link[2] to open the Create Open WebUI Application page.

2. In the Advanced Configuration > Region dropdown list, select the region in which to deploy the application.

Warning: Ensure that the selected region is consistent with the region selected when creating the Ollama application.

3. The application template provides an option to Enable Authentication, which is recommended to be turned on in production environments to increase security and prevent unauthorized access.

4. Fill in the Ollama interface address with the internal access address of the Ollama application.

Note: How to obtain the internal access address of the Ollama application:

a. Open the Function Compute FC application[3] page, find the Ollama application, and click on the application name to enter the application details.

b. On the application details page, find Function Resources and click the function name to open the function details page.

c. Hover over HTTP Trigger and copy the Internal Access Address from the expanded information.

5. The remaining configuration items can be left unchanged. Click the Create and Deploy Default Environment button to deploy the application and wait for the deployment to complete.

6. After the Open WebUI application is deployed, click the Access Domain Name.

7. The first time you open it, the instance creation and model deployment need to complete, so please wait 3-5 minutes.

Application Experience

1. Using Open WebUI to Call the Qwen2.5 Model

1. After logging into Open WebUI, click Select a model, and select the Qwen2.5 model from the expanded dropdown list. If the available models do not appear in the dropdown list, please try refreshing the page to update the list.

2. You can chat in the dialog box to call the model service and get its responses; a programmatic equivalent is sketched after these steps.

3. The Qwen2.5 model supports more than 29 languages; for example, it can introduce itself in French.

4. Thanks to the integration of domain expert models, Qwen2.5's breadth of knowledge and its coding and mathematics capabilities have improved significantly. We can pose a math problem, and Qwen2.5 provides the correct answer.

5. Click the icon to select and upload a local document; you can use the provided "Bailian" Mobile Detailed Parameters.docx[4].

6. For the uploaded document, enter the prompt "Summarize the document content".

7. You can see that the Qwen2.5 model successfully extracted the key information from the document.

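All of the interactions above go through the Ollama application's HTTP API behind the scenes (the Ollama interface address configured earlier). If you would rather call the model programmatically than through the browser, here is a minimal multi-turn sketch using Ollama's /api/chat endpoint; the base URL and model tag are placeholders to replace with your own values.

```python
import requests

# Placeholders: replace with your Ollama application's access address and
# with a model tag actually served by it (see GET /api/tags).
OLLAMA_BASE_URL = "https://your-ollama-app.example.com"
MODEL = "qwen2.5:3b"

# Keep the conversation history so the model sees previous turns.
messages = [{"role": "user", "content": "Introduce yourself in French."}]

reply = requests.post(
    f"{OLLAMA_BASE_URL}/api/chat",
    json={"model": MODEL, "messages": messages, "stream": False},
    timeout=300,
).json()["message"]
print(reply["content"])

# Append the assistant reply and ask a follow-up question in the same conversation.
messages.append(reply)
messages.append({"role": "user", "content": "Now summarize your answer in English in one sentence."})
followup = requests.post(
    f"{OLLAMA_BASE_URL}/api/chat",
    json={"model": MODEL, "messages": messages, "stream": False},
    timeout=300,
).json()["message"]["content"]
print(followup)
```

Resending the accumulated messages list is what gives the model the conversation context, mirroring what the web dialog does for you.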

Note: How to switch the Open WebUI interface language to Chinese:

1. Click the icon in the upper-right corner and select Settings from the dropdown list.

2. In the Settings popup, select General > Language.

3. In the expanded dropdown list, find and click Chinese (Simplified).

4. The page will automatically refresh, and at this point, the interface language of Open WebUI has switched to Simplified Chinese. Click the Save button to close the popup.

2. Function Compute FC Auto Scaling Mechanism

1. Go back to the Ollama function details page and click the Instances tab. If the current instance list is empty, you can click the refresh icon to refresh the list.

2. In the instance list, you can see that the number of Ollama function instances has changed. This is because Function Compute FC automatically scales based on the function call volume. When calls increase, instances are created; when requests decrease and instances are idle for a certain period (usually 3-5 minutes), they are automatically destroyed to save resources. This dynamic scaling mechanism not only improves resource utilization but also reduces operating costs, allowing developers to focus on business logic without worrying about managing the underlying infrastructure.

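To see the scaling behavior described above in action, you can send a short burst of concurrent requests to the Ollama application and then check the Instances tab again. The sketch below uses Python's standard thread pool; the base URL and model tag are the same placeholders as before.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

# Placeholders: replace with your Ollama application's access address and model tag.
OLLAMA_BASE_URL = "https://your-ollama-app.example.com"
MODEL = "qwen2.5:3b"

def ask(i: int) -> str:
    """Send one generation request and return the first few words of the answer."""
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/generate",
        json={"model": MODEL, "prompt": f"Write a one-line fun fact #{i}.", "stream": False},
        timeout=600,
    )
    return resp.json()["response"][:60]

# Fire 8 requests in parallel; concurrent calls encourage Function Compute FC
# to start additional instances, which you can then see in the Instances tab.
with ThreadPoolExecutor(max_workers=8) as pool:
    for answer in pool.map(ask, range(8)):
        print(answer)
```

After the burst finishes and the instances stay idle for a few minutes, they should be reclaimed automatically, matching the pay-as-you-go behavior described above.
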
Related Links:

[1] Ollama Template Link

https://fcnext.console.aliyun.com/applications/create?template=ollama-qwen2_5&deployType=template-direct&from=solution

[2] Open WebUI Template Link

https://fcnext.console.aliyun.com/applications/create?template=fc-open-webui&deployType=template-direct

[3] Function Compute FC Application

https://fcnext.console.aliyun.com/applications

[4] “Bailian” Mobile Detailed Parameters.docx

https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240701/geijms/%E7%99%BE%E7%82%BC%E7%B3%BB%E5%88%97%E6%89%8B%E6%9C%BA%E4%BA%A7%E5%93%81%E4%BB%8B%E7%BB%8D.docx
