2B-Parameter Performance Surpasses Mistral-7B: Wall Intelligence Open-Sources Its Multimodal Edge Model

Machine Heart reports

Editor: Zenan

It can run locally even on low-cost devices.

While large models keep scaling up, progress is also being made in the opposite direction: optimizing them for deployment.

On February 1, Wall Intelligence, in collaboration with the Tsinghua NLP Laboratory, officially launched its flagship edge large model "Wall MiniCPM" in Beijing. Billed as a "performance powerhouse," the new-generation model is designed for on-device deployment and claims the strongest multimodal capabilities in its class.

MiniCPM has a parameter count of only 2 billion and was trained on a curated 1T-token dataset. That puts it at a scale comparable to BERT from 2018, but Wall Intelligence has layered aggressive performance optimization and cost control on top, letting the model "punch above its weight class."

Li Dahai, co-founder and CEO of Wall Intelligence, compared the new model with the well-known open-source model Mistral-7B: MiniCPM-2B outperforms it across multiple mainstream benchmark leaderboards.

MiniCPM also holds a clear advantage over Phi-2, the "small model" Microsoft released recently.

Li Dahai added that the new model can match the capabilities of 13B, 30B, or even 40B models. On MT-Bench, a benchmark that tracks closely with real user experience, MiniCPM scored 7 (GPT-4-Turbo scores 9).

At the event, Wall Intelligence also demonstrated MiniCPM in practice. Despite its small parameter count, the model handles the tasks typical of large models, such as text translation and role-play, and carries enough knowledge to explain complex code.

Because it can be deployed at the edge, MiniCPM can also provide timely assistance in unexpected situations.

Phone makers have recently been announcing edge large models of their own: compressing LLMs to smaller sizes makes them usable in far more scenarios, delivering more intelligence within tight compute and memory budgets. Wall Intelligence's approach is lighter still, and can run on lower-spec or older handsets.

According to Wall Intelligence, Int4 quantization compresses MiniCPM by 75%, down to a footprint of only about 2 GB of memory, with almost no loss in performance, which has allowed it to run on a range of common phone models.
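As a back-of-the-envelope check on those figures (our own arithmetic, not Wall Intelligence's published calculation), going from 16-bit to 4-bit weights cuts weight storage by exactly 75%:

```python
# Rough memory arithmetic for int4 quantization of a ~2B-parameter model.
# Illustrative estimates only: real deployments add overhead for quantization
# scales, activations, and the KV cache, which is consistent with the quoted
# ~2 GB total footprint.
params = 2.0e9                  # ~2B weights, per the article

fp16_bytes = params * 2         # fp16: 2 bytes per weight
int4_bytes = params * 0.5       # int4: 0.5 bytes per weight

print(f"fp16 weights: {fp16_bytes / 2**30:.2f} GiB")        # ~3.73 GiB
print(f"int4 weights: {int4_bytes / 2**30:.2f} GiB")        # ~0.93 GiB
print(f"compression:  {1 - int4_bytes / fp16_bytes:.0%}")   # 75%
```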

Because it can run inference on a mobile CPU, MiniCPM dramatically lowers the cost of use. By Wall Intelligence's estimate, a phone with a Snapdragon 855 running MiniCPM can process 1.7 million tokens on just one yuan of electricity, roughly 1% of the cost of running Mistral-Medium in the cloud.
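Taken at face value, the quoted numbers imply the following per-token economics (a straightforward reading of the article's figures, not an independent measurement):

```python
# Cost per million tokens implied by the article's numbers.
tokens_per_yuan = 1_700_000   # Snapdragon 855 handset, electricity only

edge_cost = 1_000_000 / tokens_per_yuan
print(f"edge inference: {edge_cost:.2f} yuan per million tokens")  # ~0.59

# The article puts this at ~1% of Mistral-Medium's cloud cost,
# implying roughly a 100x gap:
print(f"implied cloud cost: {edge_cost * 100:.0f} yuan per million tokens")  # ~59
```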

Beyond the edge model, Wall Intelligence also showcased its exploration of multimodal large models, open-sourcing the 12B-parameter OmniLMM. At the launch it ran a rock-paper-scissors demo similar to the one shown at Gemini's release: asked in English, "What game am I playing?", the model answers, "Rock-paper-scissors."
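For readers who want to try this kind of query, here is a minimal sketch. It assumes OmniLMM exposes a chat-style interface through Hugging Face `transformers` with `trust_remote_code`, as OpenBMB's multimodal releases typically do; the repo ID and the exact `chat()` signature are assumptions, so consult the OmniLMM GitHub repository for the supported usage.

```python
# Hypothetical sketch of querying OmniLMM about a rock-paper-scissors frame.
# The repo ID and chat() signature are assumptions modeled on OpenBMB's usual
# multimodal interface; see https://github.com/OpenBMB/OmniLMM for the real API.
from PIL import Image
from transformers import AutoModel, AutoTokenizer

repo = "openbmb/OmniLMM-12B"  # assumed Hugging Face repo ID
model = AutoModel.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

image = Image.open("hand_gesture.jpg").convert("RGB")   # a frame from the camera
msgs = [{"role": "user", "content": "What game am I playing?"}]

answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)  # per the demo, something like: "Rock-paper-scissors."
```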

Meanwhile, OmniLMM can also recognize human gestures and tell you what to play to win.

OmniLMM can also extract rich information from images and reason over it, for example recognizing landmarks, TV station logos, and the events taking place.

It seems we are not far from truly multimodal large models and new forms of applications.

The extreme performance of Wall Intelligence's models stems from years of technical groundwork. Since 2021, the company has been building an efficient technology stack across three areas: infrastructure, algorithms, and data methodology. Its self-developed BMTrain training framework plays a crucial role in that stack.

On the algorithmic side, Wall Intelligence has built a "model sandbox" system that turns large-model training from trial-and-error experimentation into a science, continuously searching for the best hyperparameter and scale configurations, for example, optimal batch sizes and hyperparameter settings that transfer across model sizes.
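The article gives no detail on how the sandbox works, but the general technique it gestures at, sweeping hyperparameters on cheap small models and extrapolating the trend to the target scale, can be sketched as follows (a generic illustration with made-up numbers, not Wall Intelligence's actual system):

```python
# Generic "train small, extrapolate" sketch: fit a power law
# optimal_batch ~ a * model_size^b from a few cheap sandbox runs,
# then predict the optimal batch size at the full scale.
# All numbers below are invented for demonstration.
import numpy as np

model_sizes = np.array([10e6, 30e6, 100e6, 300e6])      # params of sandbox runs
best_batches = np.array([0.5e6, 1.0e6, 2.1e6, 4.2e6])   # tokens/batch found by sweeps

# A power law is linear in log space: log B = log a + b * log N.
b, log_a = np.polyfit(np.log(model_sizes), np.log(best_batches), 1)

def predict_batch(n_params: float) -> float:
    """Extrapolated optimal batch size (in tokens) for a model of n_params."""
    return float(np.exp(log_a) * n_params ** b)

print(f"predicted optimal batch at 2B params: {predict_batch(2e9):,.0f} tokens")
```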

Wall Intelligence has also accumulated a large amount of high-quality data. Following yesterday's launch, it open-sourced its next-generation model series, including MiniCPM-SFT/DPO, MiniCPM-V, and MiniCPM-SFT/DPO-int4, along with the training recipes for MiniCPM's two training stages, for the industry's reference.

Open-source addresses (including technical reports):

MiniCPM GitHub: https://github.com/OpenBMB/MiniCPM

OmniLMM GitHub: https://github.com/OpenBMB/OmniLMM

Wall Intelligence originated from the Tsinghua NLP Laboratory, one of the earliest teams in China to research large models, which released ERNIE, the world's first knowledge-guided pre-training model, in 2018. Since incorporating in August 2022, Wall Intelligence has closed two rounds of financing, and its application "Wall Luka" received its filing in the second batch of large-model registrations with the Cyberspace Administration of China.

Currently, Wall Intelligence has assembled a research team of over 100 people, 80% of whom are from Tsinghua and Peking University, with an average age of 28.

Wall Intelligence is pursuing a dual-engine strategy of large models plus agents, aiming for solutions that are smaller, faster, and cheaper.

This year, Wall Intelligence will also speed up its pace of iteration. "We will keep releasing new versions of MiniCPM after the Spring Festival, with further performance improvements; we wanted to give everyone some rest over the holiday," said Liu Zhiyuan.
