





More than two weeks after the release of the Llama 3 model, developers around the world remain enthusiastic about this open-source large model, which has been hailed as the “King of Open Source” and the “AI Source God.”
As of May 8, shortly before this article went to press, the open-source large model from the American social media giant Meta had nearly 19,600 stars on GitHub, the global code hosting platform, and the number was still climbing. Stars reflect how much interest a project attracts on GitHub.
Developers’ enthusiasm for the Llama 3 model stems not only from Meta’s claim that it is the best-performing open-source large model of its scale on the market, but also from the strong supporting resources Meta provided: the two versions of the Llama 3 model were trained on a computing cluster of 24,000 NVIDIA graphics cards (GPUs), using a high-quality pre-training dataset of 15 trillion (15T) tokens, the smallest units of text a model processes.
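For readers unfamiliar with the unit: a token is the basic chunk of text a model reads, usually a word or word fragment. The snippet below is a minimal sketch using the Hugging Face transformers library and Meta’s public Llama 3 8B tokenizer (downloading it requires accepting Meta’s license on Hugging Face); it simply counts the tokens in one sentence to make the unit concrete.

```python
# Minimal illustration of what a "token" is; not part of Meta's training code.
# Access to the meta-llama repository on Hugging Face requires accepting Meta's license.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
text = "Llama 3 was pre-trained on more than 15 trillion tokens."
ids = tokenizer.encode(text)

print(len(ids), "tokens:", tokenizer.convert_ids_to_tokens(ids))
# A 15T-token corpus means roughly 15 trillion such units seen during pre-training.
```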
Chen Tianchu works in the Computer System Architecture Laboratory at Zhejiang University on research related to large models. He found that this open-source large model, backed by powerful computing power and high-quality data at massive scale, has indeed “opened a window” for enterprises and individual users without sufficient computing power to experience such models.
However, Chen Tianchu also noted that, given the restrictions the Llama 3 license places on use in certain fields and on using the model’s output to train other models, it is not a completely open-source large model.

On April 18, Meta released the Llama 3 model and opened two parameter-scale versions to developers: Llama 3 8B and Llama 3 70B. Coincidentally, that day was also the birthday of AI scholar Andrew Ng. An advocate of open-source AI, Andrew Ng called the Llama 3 model “the best gift to date” and thanked Meta.
Within hours of its release, the Llama 3 model topped the model rankings on the AI code community Hugging Face, an unprecedented feat. NVIDIA senior scientist Jim Fan then predicted, “The upcoming Llama 3 400B from Meta will mark a watershed moment, as the community gains open-weight access to a GPT-4-class model.”
Meta has always emphasized innovation, scaling, and optimization, yet in developing Llama 3 it made no significant changes to the architecture or underlying algorithms of the previous generation, Llama 2. The differences between the two generations are concentrated mainly in data engineering.
The pre-training dataset used for the Llama 3 model exceeds 15 trillion tokens, seven times the size of the dataset used for Llama 2, with four times as much code, reflecting the abundant resources Meta poured into the model. Meta has previously stated, “The increase in data helps the Llama 3 model better recognize differences and patterns.”
The Llama 3 8B model is the small-parameter version of Llama 3. Chen Tianchu said that most open-source models of about 8B parameters, at home and abroad, generally meet their training needs with datasets of around 200 billion (0.2T) tokens, which makes the scale of the corpus Meta used for Llama 3 unexpected.
In addition, the computing resources behind the Llama 3 model are even more striking: it was trained on a cluster of 24,000 NVIDIA GPUs. A computing cluster of this scale poses engineering challenges such as optimizing network communication and building power infrastructure. Chen Tianchu noted that it is rare, in either open-source or closed-source communities, to see such extensive resources used to train a small-parameter model.
Therefore, once released, the Llama 3 model attracted many large model players to test, fine-tune, and retrain it. The open-source model community OpenBuddy is one of them, with Chen Tianchu serving as the model training leader.
Meta stated on its official blog that the Llama 3 model was fine-tuned only for English output and primarily offers conversational ability in English. Just three days after Llama 3’s release, the OpenBuddy team published a Chinese-capable optimized version of the Llama 3 8B model, the OpenBuddy-Llama3-8B model, on ModelScope, a community platform that gathers AI developers. Chen Tianchu said, “We primarily optimized its cross-language understanding, giving it stronger stability and cognitive ability in Chinese.”
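A derivative such as OpenBuddy-Llama3-8B is produced by further training (fine-tuning) Meta’s published weights on additional data. The sketch below shows the general shape of such a fine-tune using LoRA adapters with the Hugging Face transformers, peft, and datasets libraries; it is not the OpenBuddy team’s actual recipe, and the dataset file name and hyperparameters are placeholders.

```python
# Sketch of a LoRA fine-tune on top of Llama 3 8B; illustrative only.
# "bilingual_chat.jsonl" and all hyperparameters are hypothetical placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Train only small low-rank adapter matrices instead of all 8B parameters.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Placeholder corpus: one JSON object per line with a "text" field.
data = load_dataset("json", data_files="bilingual_chat.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-8b-zh-sketch",
                           per_device_train_batch_size=1, num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```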
Chen Tianchu revealed that a user from the ModelScope community gave the OpenBuddy team feedback: the OpenBuddy-Llama3-8B model is not yet accurate enough on some traditional Chinese culture and niche Chinese knowledge points, but it shows greater potential than other open-source models of the same scale, and its understanding of Chinese is already close to that of a large-parameter native Chinese model.
Large-parameter models of around 70B are generally believed to possess reflective and error-correcting abilities. Yet Chen Tianchu observed that the OpenBuddy-Llama3-8B model “can recognize what it has said, realize it was wrong, and even go through a process of reflection to correct its answer after admitting the mistake.”
Small-parameter models are often considered suitable only for simple everyday tasks. But the performance of Llama 3 8B and its derivatives has revealed deeper reflective and error-correcting mechanisms, making users in the open-source community realize that complex cognition is no longer the exclusive domain of large-parameter models. On this basis, Chen Tianchu predicts that applying the Llama 3 8B model in budget-constrained vertical industries may hold greater promise.

The Llama 3 model’s power and openness have led developers to crown it with titles such as “King of Open Source” and “AI Source God,” but Chen Tianchu takes a different view.
The OpenBuddy team has long been committed to providing the open-source community with Chinese open-source models of strong cognitive ability. Whenever a new open-source large model is released, they immediately check its license for restrictions on how it can be used. Chen Tianchu explained that if a model can only be used in a specific language or cannot be used commercially, “our derivative results built on this open-source foundation (through fine-tuning) may be subject to the same restrictions.”
The OpenBuddy team found that the Llama 3 license does in fact restrict use in certain fields, and that the model’s output cannot be used to train other models.
Chen Tianchu noted that for enterprises building applications on the model, the bottleneck of Llama 3 is not language but its lack of support for large-scale enterprise commercial use: “For vendors with more than 700 million monthly active users, counting affiliated companies, it is simply not feasible.”
Chen Tianchu also pays close attention to where large models’ training data comes from and how it may be used. During the tuning of the Llama 3 model, however, the OpenBuddy team found that many open-source vendors, Meta included, are reluctant to disclose the sources or proportions of their data. Chen Tianchu said this may be because they train on some copyrighted data.
Given these limiting factors, Chen Tianchu’s analysis is that, by the open-source community’s strict definition, Llama 3 is not entirely an open-source large model. “It is still an open model with reservations; we cannot truly call it an open-source work.”
On the limited openness of the Llama 3 model, Sun Jin, product director at CloudWalk Technology’s research institute, believes the open-source version is certainly not the best version of a large model: “Even if a substitute for GPT-4 were open-sourced, it would be a version cut down before release.”
Since last year, many vendors at home and abroad have open-sourced their large models. Yet after talking with industry clients, Sun Jin found that “they have all gone from getting started with open-source models to giving up on them, and now they come to us to buy large-model algorithms directly.”
Sun Jin’s team has also received requests from local governments offering subsidies in exchange for open-sourcing large-model technology, but nothing concrete has come of this. In Sun Jin’s view, a vendor that chooses to open-source a large model needs a supporting ecosystem of computing services to sustain a profitable business; only cloud computing vendors, computing hardware vendors, and AI startups have the motivation to open-source large models.
Chen Tianchu understands vendors’ concerns about the business model of open-source large models, but he does not recommend that every vendor train a large model from scratch: “Tracking the latest results in the open-source community may also be a route worth considering.”
Judging from Llama 3’s open model, Chen Tianchu believes it has opened a window for the open-source community: it lets developers see what can be achieved with massive computing power and datasets, and it gives many enterprises and individual users without sufficient computing power a chance to experience the capabilities of large models.
Chen Tianchu said Meta invested millions of hours of H100 (an NVIDIA GPU) computing time to train the Llama 3 8B model, an outlay no startup can afford. Given the good training results the model has achieved, he predicts that for a long time to come, especially in English-language settings, further optimization and development on top of the Llama 3 model will be a commercially meaningful option for some startups.
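For a rough sense of that outlay, under stated assumptions: Meta’s Llama 3 model card reports on the order of 1.3 million H100 GPU hours for the 8B pre-training run, and the hourly price below is an assumed illustrative cloud rental rate rather than a quoted figure.

```python
# Back-of-envelope only; the GPU-hour count comes from Meta's model card,
# the hourly price is an assumed illustrative figure.
gpu_hours = 1_300_000           # reported H100 hours for Llama 3 8B pre-training
assumed_usd_per_gpu_hour = 2.0  # hypothetical cloud rental rate
print(f"~${gpu_hours * assumed_usd_per_gpu_hour:,.0f} of rented compute")
```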



