Background: From May 23 to 24, the Tencent “Cloud + Future” summit was held in Guangzhou with the theme “Rejuvenation”, where leaders from various government agencies in Guangdong Province, domestic and foreign academic experts, industry leaders, and technical experts gathered to discuss innovation and development in cloud computing and the digital industry.
Dr. Wang Caihua, the technical head of Tencent Cloud AI platform, presented a technical talk titled “Smart Titanium・One-stop Machine Learning TI-ONE: Machine (Deep) Learning IDE on Tencent Cloud” at the “Developer Session” of the Tencent “Cloud + Future” summit.
Below is the full text of the speech (lightly edited):
Have you all seen Marvel’s “Avengers: Infinity War” recently?
The Iron Man suit is made of titanium, which is lightweight and strong, and TI-ONE is an artificial intelligence platform, so we chose the tech-savvy name “Smart Titanium” to describe it.
First, let’s talk about why we need TI-ONE.
The importance of artificial intelligence needs no further emphasis from me; Andrew Ng declared at Spark Summit 2017 that “AI is the new electricity.”
Major companies are racing to release their own machine learning frameworks, such as Microsoft’s CNTK and Google’s TensorFlow.
However, to answer the question of why we need TI-ONE, we must start from the characteristics of cloud computing and the lifecycle of machine learning.
In the cloud, everything moves toward services: infrastructure as a service, platforms as a service, algorithms as a service, and machine learning algorithms are no exception.
However, machine learning has a long lifecycle: acquire data, preprocess it, choose a framework and write the algorithm, train to obtain a model, and finally use that model for prediction. In the cloud, we also need to turn the model into a service.
Because this process is so long, we need to accelerate the machine learning lifecycle and the servicing of models; that is why we need TI-ONE.
Specifically, TI-ONE provides the following features:
First, it integrates a data preprocessing platform to improve data preprocessing efficiency.
It supports mainstream machine learning frameworks, with commonly used algorithms built-in, allowing algorithm development to be completed through drag-and-drop.
It supports automated hyperparameter tuning, multi-level collaboration, one-click model deployment and servicing, and online inference.
In developer terms, TI-ONE is the machine learning IDE on Tencent Cloud.
What is TI-ONE?
I will share from several aspects: architecture, workflow, hyperparameter tuning, collaboration, and deployment.
TI-ONE has a hierarchical architecture, with the lowest layer being the COS storage layer, above which is the GaiaStack resource scheduling layer. GaiaStack gives TI-ONE many commercial features, which I will elaborate on later.
Above the scheduling layer is the computing framework layer, where we integrate TensorFlow, PyTorch, XGBoost, Angel, and Spark, with Angel being developed by Tencent and Spark being enhanced by Tencent.
In terms of algorithms, we have integrated a large number of commonly used algorithms, including deep learning algorithms such as CNN, RNN, DBN, as well as traditional machine learning algorithms such as GBDT, FFM, etc. Users can use these algorithms to train their own models, supporting applications such as image recognition, speech recognition, precise recommendations, and real-time risk control.
TI-ONE provides users with a graphical development interface, allowing machine learning algorithms to be developed through drag-and-drop. Here is an example:
- Obtain data from the COS layer or the local file system
- Preprocess the data
- Split the data; note that this splits it into training and validation sets, not a test set
- Select an algorithm by drag-and-drop, taking logistic regression as the example
- Set the parameters the algorithm requires
- Train to obtain the model
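For readers who prefer code, the six steps above can be sketched in plain Python. Everything here is an illustrative stand-in, not TI-ONE’s actual components: the synthetic data replaces COS, and the hand-rolled gradient-descent loop replaces the built-in logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Obtain data (synthetic here, in place of COS or a local file)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

# 2. Preprocess: standardize the features
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 3. Split into training and validation sets (not a test set)
idx = rng.permutation(len(X))
train, valid = idx[:150], idx[150:]

# 4-6. Choose logistic regression, set its parameters, train to get a model
w = np.zeros(X.shape[1])
lr, epochs = 0.1, 500            # hyperparameters the user would set in the UI
for _ in range(epochs):
    p = 1.0 / (1.0 + np.exp(-X[train] @ w))       # predicted probabilities
    w -= lr * X[train].T @ (p - y[train]) / len(train)

# The trained model, checked on the validation split
pred = (1.0 / (1.0 + np.exp(-X[valid] @ w)) > 0.5).astype(float)
print("validation accuracy:", (pred == y[valid]).mean())
```

On the platform these steps are nodes connected by drag-and-drop; the code only mirrors their order.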
If you want to validate this model, it is also very simple:

- Obtain data from the storage layer
- Preprocess the data
- Feed it into the model
- Run the algorithm evaluation
Upon completion, a confusion matrix and AUC value will be provided.
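As a rough illustration of what the evaluation step reports, here is how a confusion matrix and AUC can be computed by hand; the labels and scores below are made up for the example, and TI-ONE produces these figures automatically.

```python
import numpy as np

# Made-up ground-truth labels and model scores for eight samples
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
score  = np.array([0.1, 0.75, 0.35, 0.8, 0.65, 0.9, 0.7, 0.2])
pred   = (score >= 0.5).astype(int)

# Confusion matrix: rows = actual class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, pred):
    cm[t, p] += 1
print(cm)

# AUC = probability that a random positive is scored above a random negative
pos, neg = score[y_true == 1], score[y_true == 0]
auc = (pos[:, None] > neg[None, :]).mean()
print("AUC:", auc)
```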
Hyperparameter tuning is a crucial part of machine learning and requires considerable skill. TI-ONE provides an automated hyperparameter tuning tool: it expands parameter combinations into multiple training instances, runs those instances in parallel, and selects the best one.
For example, to train a random forest you must decide the number of trees in the forest and the number of features used to train each tree; just provide the candidate combinations and hand them to TI-ONE, which will select the best one for you.
In other cases we may need to tune regularization hyperparameters; we simply provide a range, hand it to TI-ONE, and it selects the optimal values.
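The tuning strategy just described can be sketched as follows; the parameter grid, the scoring function, and the thread pool are all illustrative, not TI-ONE’s interface.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def validation_score(n_trees, max_features):
    """Stand-in for a real training-and-validation run; returns a made-up score."""
    return 1.0 - abs(n_trees - 200) / 1000 - abs(max_features - 8) / 100

# Candidate values for a random forest, as in the example above
grid = {"n_trees": [50, 100, 200, 400], "max_features": [4, 8, 16]}

# Expand the grid into one instance per parameter combination
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

# Run the instances in parallel and keep the best combination
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(lambda p: validation_score(**p), combos))

best = combos[scores.index(max(scores))]
print("best combination:", best)
```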
Collaboration is also very important for machine learning, and TI-ONE provides multiple levels of collaboration.
The first is model-level sharing; trained models can be shared with your colleagues. For example, if you are both developing algorithms for the same business and want to compare whose accuracy is higher, you can share the model with each other.
The second is workflow-level sharing, where the workflow represents the machine learning lifecycle. Sharing the workflow means sharing the entire machine learning lifecycle. Suppose you previously completed a skin recommendation task and later need to do an equipment recommendation task; you can essentially do it with minor modifications.
The third is service-level sharing; once the model is deployed, it can also be shared. You can share the model with backend personnel to help you troubleshoot issues.
Deployment and serving are what distinguish cloud-based machine learning from traditional machine learning.
TI-ONE provides a one-click deployment tool.
We can deploy the trained model as an Application, load multiple instances of it, and allow different versions to coexist.
Both third-party users and the model’s developers can call it through a REST API, which is very convenient.
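As a sketch of what such a REST call might look like from the client side, the endpoint URL, payload shape, and token below are hypothetical, not TI-ONE’s documented interface.

```python
import json
from urllib import request

# Hypothetical endpoint for a deployed model version (illustrative only)
ENDPOINT = "https://example.com/ti-one/models/my-model/versions/1:predict"

def build_request(features, token="YOUR-API-TOKEN"):
    """Assemble (but do not send) a JSON prediction request for the deployed model."""
    body = json.dumps({"instances": [features]}).encode("utf-8")
    return request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )

req = build_request([0.3, 1.2, -0.7])
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)  (needs a live service)
```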
TI-ONE’s Differentiated Competitive Advantage
Earlier, we discussed the features of TI-ONE; developers will certainly want to know the design philosophy behind it.
I like to use the iceberg theory to explain the principles behind things; what we see in terms of workflow, hyperparameter tuning, collaboration, and deployment tools is just the tip of the iceberg. What lies beneath the water?
Part of the answer is integration: we have integrated COS storage, GaiaStack scheduling, and the commonly used machine learning frameworks and algorithms. But integration alone is not enough; we also do independent research to build differentiated competitive advantages, and that is what makes TI-ONE special.
The first special feature is Angel, Tencent’s self-developed machine learning framework. It overcomes Spark’s limitation of holding the model on a single node and optimizes the underlying math libraries, allowing it to support models with trillions of parameters. Very few computing frameworks in the industry can handle models this large.
In terms of algorithms, we have implemented commonly used traditional machine learning algorithms such as logistic regression and SVM, and some are original to us, such as LDA*, our work published in VLDB.
In terms of performance, we compared Angel with Spark, XGBoost, and other platforms and found that Angel performs exceptionally well, with some algorithms performing over 20 times better than Spark.
The second special feature is graph computing algorithms. We know that there are three main players in the graph computing field: Pregel, GraphLab, and GraphX, with Pregel being closed-source by Google, GraphLab being commercial software, and only GraphX being open-source.
However, GraphX is updated slowly and ships few algorithms. In light of this, we have added many graph computing algorithms on top of GraphX, including vertex evaluation algorithms, community detection algorithms, and statistical feature algorithms, all finely optimized to support relationship chains at the trillion scale.
The third special feature is support for user-defined algorithms. Earlier we mentioned that we have integrated many algorithms: deep learning, traditional machine learning, regression, classification, recommendation, and so on. For some advanced users this is still not enough, so we allow users to define their own algorithms and execute them on TI-ONE. It is a small feature, but it gives users great flexibility.
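To make the idea concrete, here is a toy user-defined algorithm written with a fit/predict shape that a platform could invoke; the interface is illustrative, not TI-ONE’s actual plugin API.

```python
import numpy as np

class NearestCentroid:
    """A user-defined classifier: assign each sample to the closest class centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class: the mean of that class's training samples
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every centroid, then pick the nearest
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Tiny two-class example
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [1.9, 2.2]])
y = np.array([0, 0, 1, 1])
model = NearestCentroid().fit(X, y)
print(model.predict(np.array([[0.1, 0.0], [2.1, 2.0]])))
```

A platform only needs to call `fit` on the training data and `predict` on new data, which is why a small hook like this buys so much flexibility.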
Earlier, we discussed the features and special aspects of TI-ONE; now let’s talk about the capabilities that commercial users particularly care about, which strictly speaking are not inherent to TI-ONE but are provided by GaiaStack.
The first is dedicated clusters; when a user’s data volume is large, we can provide multiple complete clusters for them to use. When the user’s data is relatively small, multiple users can share a cluster. We have implemented effective multi-tenancy with resource and data isolation for users.
The second is hot upgrades: services are upgraded without interruption, and users perceive nothing.
The third is high availability: primary and backup instances switch over automatically, and when traffic grows, new instances are launched and load-balanced automatically.
Who are TI-ONE’s users?
Finally, let’s look at the users. We have many users both inside and outside the company; for example, Tencent Games, WeChat, Yingyongbao (Tencent’s app store), and QQ Music are all our users.
Lastly, a small bonus: scan the QR code to apply for a trial or to read the documentation. Thank you all.

