MiniCPM-2B Series Lightweight Model Surpasses Mistral-7B

Source: Shizhi AI

This article is 1,838 words; suggested reading time: 5 minutes.
The Tsinghua NLP Laboratory and Mianbi Intelligent have released the MiniCPM-2B series of lightweight models on the wisemodel.cn open-source community. The series is a strong performer: it surpasses Mistral-7B, outdoes many larger models in the 13B to 33B range, and can run directly on mobile phones and other edge devices.


Lightweight large models are a hot topic in the open-source community. On February 1, 2024, the Tsinghua NLP Laboratory and Mianbi Intelligent released and open-sourced the MiniCPM-2B series, along with two multimodal models: MiniCPM-V, built on MiniCPM-2B, and OmniLMM-12B. All of the released models are simultaneously available in the Shizhi AI wisemodel.cn open-source community. MiniCPM-2B has only about 2 billion parameters, yet outperforms larger models such as Mistral-7B and Llama2-13B.

https://wisemodel.cn/organization/OpenBMB (Organization Page)
1. Overview of MiniCPM
MiniCPM is a series of edge-side large language models jointly open-sourced by Mianbi Intelligent and the Tsinghua University Natural Language Processing Laboratory. The flagship language model, MiniCPM-2B, has only 2.4 billion (2.4B) non-embedding parameters.
  • After SFT, MiniCPM is comparable to Mistral-7B on public comprehensive benchmarks (with stronger capabilities in Chinese, mathematics, and code), and its overall performance surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B.
  • After DPO, MiniCPM outperforms many representative open-source large models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, on MT-Bench, currently the evaluation set closest to real user perception.
  • After Int4 quantization, MiniCPM can be deployed for on-device inference on mobile phones, with streaming output slightly faster than human speech (see the loading sketch after this list). MiniCPM-V is also the first multimodal large model to be successfully deployed and run on a phone.
  • Parameter-efficient fine-tuning is possible on a single 1080/2080 GPU, and full-parameter fine-tuning on a single 3090/4090; a single machine can continuously train MiniCPM, keeping the cost of secondary development low (see the LoRA sketch after this list).
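
As a rough illustration of the quantized-inference point above, here is a minimal sketch that loads the released checkpoint with Hugging Face transformers and 4-bit weight quantization via bitsandbytes on a desktop GPU. The repo id openbmb/MiniCPM-2B-dpo-bf16 and the chat() helper come from the OpenBMB release; the quantization settings here are illustrative assumptions, and the official on-phone Int4 deployment goes through a dedicated edge inference runtime rather than this path.

```python
# Sketch: run MiniCPM-2B with 4-bit quantized weights on a desktop GPU.
# The on-phone Int4 deployment uses a dedicated edge runtime, not bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

path = "openbmb/MiniCPM-2B-dpo-bf16"

quant = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear weights to 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep activations in bf16
)

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=quant,
    device_map="auto",
    trust_remote_code=True,  # MiniCPM ships custom modeling code, incl. chat()
)

response, history = model.chat(
    tokenizer, "Introduce MiniCPM in one sentence.", temperature=0.8, top_p=0.8
)
print(response)
```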
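The low fine-tuning cost comes from parameter-efficient methods such as LoRA, where only small low-rank adapters are trained while the base weights stay frozen. Below is a minimal, hypothetical setup with the peft library; the target_modules names are an assumption about the attention projection layers and would need to be verified against the released architecture.

```python
# Sketch: parameter-efficient fine-tuning (LoRA) of MiniCPM-2B with peft.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM-2B-sft-bf16",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed names; check the real model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters train -> fits a 1080/2080
```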
2. Overview of OmniLMM-12B
OmniLMM-12B is the most capable version in the current series. It connects EVA02-5B and Zephyr-7B-β through a perceiver resampler layer and is trained on multimodal data with a curriculum-learning approach. The model has three notable features:
🔥 Excellent performance. OmniLMM-12B achieves leading performance on multiple benchmarks (including MME, MMBench, and SEED-Bench) compared with other models of the same scale. It also supports OCR and possesses rich multimodal world knowledge.
πŸ† Trustworthy behavior.The hallucination problem of LMMs has received much attention, as models often generate text that contradicts the facts in images (for example, confidently describing objects that do not exist in the picture). OmniLMM-12B is the first latest open-source LMM aligned through multimodal RLHF to achieve trustworthy behavior. This model ranks first among open-source models on the MMHal-Bench hallucination evaluation benchmark and surpasses GPT-4V in Object HalBench.
🕹 Real-time multimodal interaction. OmniLMM-12B and GPT-3.5 can be combined into a real-time multimodal interactive assistant that accepts a video stream from a camera and a voice stream from a microphone and replies with voice output. Although still at an early stage, this combination reproduces some of the interesting examples from the Gemini demonstration video without any video editing; a sketch of the loop follows below.
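
To make the interaction loop concrete, here is a hypothetical sketch of its shape: OmniLMM-12B acts as the eyes, GPT-3.5 as the dialogue brain, with speech in and out at the edges. Every helper below is a stub standing in for a real capture, model, or TTS call; none of these names come from a published API.

```python
# Sketch of the described loop: camera frames + microphone speech in,
# spoken replies out. All helpers are placeholder stubs.

def grab_frame() -> bytes:
    return b"<jpeg bytes from the camera>"       # stub: camera capture

def record_utterance() -> str:
    return "What am I holding?"                  # stub: speech-to-text

def omnilmm_describe(frame: bytes, question: str) -> str:
    return "The user is holding a red apple."    # stub: OmniLMM-12B call

def gpt35_reply(history: list, user_text: str, scene: str) -> str:
    return f"Based on what I can see: {scene}"   # stub: GPT-3.5 call

def speak(text: str) -> None:
    print("assistant:", text)                    # stub: text-to-speech

def assistant_step(history: list) -> None:
    frame = grab_frame()
    user_text = record_utterance()
    scene = omnilmm_describe(frame, question=user_text)  # vision -> text
    reply = gpt35_reply(history, user_text, scene)       # text -> dialogue
    history.append((user_text, reply))
    speak(reply)

assistant_step(history=[])
```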
https://wisemodel.cn/space/gradio/OmniLMM-12B (Experience Address)
3. Overview of MiniCPM-V Model
MiniCPM-V is the high-efficiency version of the model, suited to deployment on edge devices. It is built on SigLip-400M and MiniCPM-2.4B, connected via a perceiver resampler. MiniCPM-V has several notable features:
⚡️ High efficiency. MiniCPM-V can be deployed efficiently on most GPUs and personal computers, and even on mobile phones and other edge devices. In visual encoding, the perceiver resampler compresses image representations into just 64 tokens, far fewer than in MLP-based LMMs (typically more than 512 tokens), giving MiniCPM-V lower memory cost and faster inference (see the sketch after this list).
🔥 Outstanding performance. MiniCPM-V achieves state-of-the-art performance on multiple benchmarks among models of similar size, surpassing existing Phi-2-based LMMs. It even achieves performance comparable to or better than the 9.6B Qwen-VL-Chat.
🙌 Bilingual support. MiniCPM-V is the first edge-deployable LMM to support bilingual interaction in English and Chinese, achieved through cross-lingual generalization of multimodal capabilities.
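
The 64-token visual encoding mentioned above is the key efficiency trick: in a perceiver-style resampler, a small set of learned query vectors cross-attends over the full grid of image patch features, so the language model only ever sees 64 visual tokens regardless of how many patches the vision encoder produces. A minimal PyTorch sketch of the idea follows; the dimensions are illustrative, not the released MiniCPM-V configuration.

```python
# Sketch of a perceiver-resampler-style layer: 64 learned queries cross-attend
# over the image patch features and become the 64 visual tokens fed to the LM.
import torch
import torch.nn as nn

class Resampler(nn.Module):
    def __init__(self, dim: int = 1024, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, dim), e.g. hundreds of ViT patches
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, _ = self.attn(q, patch_feats, patch_feats)  # cross-attention
        return self.norm(out)                            # (batch, 64, dim)

feats = torch.randn(2, 576, 1024)  # e.g. a 24x24 patch grid from a ViT
tokens = Resampler()(feats)
print(tokens.shape)                # torch.Size([2, 64, 1024])
```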
https://wisemodel.cn/space/gradio/MiniCPM-V (Experience Address)
4. Demos
[Demo screenshots]
β€”β€”ENDβ€”β€”
