Author: Dr. Chen Wei, former chief scientist of a Huawei-affiliated natural language processing (NLP) company.
An expert in integrated storage-computing architectures and AI, holding a senior professional title. Expert of the Zhongguancun Cloud Computing Industry Alliance, member of the Chinese Optical Engineering Society, member of the Association for Computing Machinery (ACM), and professional member of the China Computer Federation (CCF). Former chief scientist of an AI company and head of 3D NAND design at a major storage chip manufacturer. Major achievements include: the country's first high-performance reconfigurable storage-computing processor architecture (prototyped and tested at major internet companies); the first AI processor dedicated to the medical field (already deployed); the first AI acceleration compiler compatible with RISC-V/x86/ARM platforms (developed in cooperation with Alibaba Pingtouge (T-Head)/Xinlai, already deployed); establishment of the country's first 3D NAND chip architecture and design team (benchmarked against Samsung); and the country's first embedded flash compiler (benchmarked against TSMC, already deployed at the platform level).
0. Introduction
First, refer to the webpage or paper. Professional readers can directly look at the paper.
On November 30, 2022, OpenAI launched the AI chat prototype ChatGPT, once again drawing widespread attention and sparking discussions in the AI community similar to those triggered earlier by AIGC image generators, which had raised fears of artists losing their jobs.
Reportedly, ChatGPT attracted over one million registered users within just a few days of its open trial. All sorts of amusing dialogues, of users querying or teasing ChatGPT, circulated on social networks. Some even likened ChatGPT to a combination of "search engine + social software," capable of giving reasonable answers to questions through real-time interaction.
ChatGPT is a language model focused on dialogue generation. It can generate corresponding intelligent responses based on user text input. This response can be a brief phrase or a long discourse. GPT stands for Generative Pre-trained Transformer.
By learning a large amount of existing text and dialogue collections (e.g., Wiki), ChatGPT can converse in real-time like a human, smoothly answering various questions (of course, its response speed is still slower than a human). Whether in English or other languages (e.g., Chinese, Korean, etc.), from answering historical questions to writing stories, and even drafting business plans and industry analyses, it is “almost” capable of anything. Even programmers have posted dialogues showing ChatGPT modifying code.
ChatGPT can also be used in conjunction with other AIGC models to obtain more cool and practical functions. For example, it can generate living room design drawings through dialogue. This greatly enhances the AI’s application and dialogue capabilities with customers, making us see the dawn of large-scale AI implementation.
1. The Heritage and Characteristics of ChatGPT

1.1 OpenAI Family
First, let’s understand who OpenAI is.
OpenAI is headquartered in San Francisco and was co-founded in 2015 by Tesla's Elon Musk, Sam Altman, and other investors, with the goal of developing AI technology that benefits all of humanity. Musk left the board in 2018 over disagreements about the company's direction.
Previously, OpenAI was known for launching the GPT series of natural language processing models. Since 2018, OpenAI has begun to release generative pre-trained language models GPT (Generative Pre-trained Transformer), which can be used to generate articles, code, machine translation, Q&A, and various types of content.
Each generation of the GPT model has seen explosive growth in parameter count, seemingly confirming that "bigger is better." GPT-2, released in February 2019, had 1.5 billion parameters, while GPT-3, released in May 2020, had 175 billion.

Comparison of Main Models in the GPT Family
1.2 Main Characteristics of ChatGPT
ChatGPT is a dialogue AI model developed on the GPT-3.5 (Generative Pre-trained Transformer 3.5) architecture and a sibling model of InstructGPT. ChatGPT is likely a rehearsal before OpenAI's official launch of GPT-4, or a vehicle for collecting large amounts of dialogue data.
Main Characteristics of ChatGPT
OpenAI trained ChatGPT using RLHF (Reinforcement Learning from Human Feedback) technology and added more human supervision for fine-tuning.
Additionally, ChatGPT has the following features:
1) It can actively admit its mistakes. If a user points out its errors, the model will listen and optimize its answer.
2) ChatGPT can question incorrect queries. For example, when asked about “Columbus arriving in America in 2015,” the robot will explain that Columbus did not belong to that era and adjust its output.
3) ChatGPT can acknowledge its ignorance and admit its lack of knowledge about specialized techniques.
4) It supports continuous multi-turn dialogues.
Unlike the smart speakers we use in daily life, often mocked as "artificial stupidity" (a Chinese internet pun on "artificial intelligence"), ChatGPT can remember earlier turns of the conversation, i.e., it understands context, and can use that context to answer hypothetical follow-up questions. This ability to hold a continuous dialogue greatly enhances the user experience of conversational interaction.
For accurate translation (especially of Chinese and transliterated names), ChatGPT is not yet perfect, but in textual fluency and in recognizing specific names it is comparable to other online translation tools.
Since ChatGPT is a large language model, it currently has no online search capability, so it can only respond from the data it was trained on, which extends through 2021. For example, it does not know about the 2022 World Cup, nor will it, like Siri, answer questions about today's weather or search for information for you. If ChatGPT could browse the internet to find learning materials and look up knowledge, it would likely achieve even greater breakthroughs.
Even with limited knowledge, ChatGPT can still answer many of the quirky questions posed by imaginative humans. To prevent ChatGPT from picking up bad habits, it uses algorithmic screening to reduce harmful and deceptive training inputs, filters queries through the Moderation API, and rejects potentially racist or sexist prompts.
2. Principles of ChatGPT/GPT
2.1 NLP
Known limitations in the NLP/NLU field include misunderstanding of repetitive text, misinterpretation of highly specialized topics, and misreading of contextual phrases.
For both humans and AI, it usually takes years of training to converse normally. NLP models not only need to understand the meanings of words but also how to construct sentences and provide contextually meaningful answers, even using appropriate slang and technical vocabulary.

Application Areas of NLP Technology
Essentially, GPT-3 or GPT-3.5, which underpins ChatGPT, is a massive statistical language model or sequential text prediction model.
2.2 GPT vs. BERT
Similar to the BERT model, ChatGPT/GPT-3.5 automatically generates answers from the input statements according to the probabilities of its language corpus. From a mathematical or machine-learning perspective, a language model models the probability distribution over sequences of words: conditioning on the statements already produced (which can be represented mathematically as vectors), it predicts the probability distribution over the next statement, or even over entire sets of possible continuations.
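The paragraph above treats a language model as a predictor of probability distributions over what comes next. The final step of that prediction can be sketched in a few lines; the vocabulary and logits below are hand-picked toys, not output from any real model:

```python
import math

def softmax(logits):
    # Convert raw model scores (logits) into a probability distribution.
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary and hand-picked logits for a context like "the cat sat on the"
vocab = ["mat", "dog", "moon", "chair"]
logits = [3.2, 0.1, -1.0, 1.5]

probs = softmax(logits)                   # a valid distribution: sums to 1
next_token = vocab[probs.index(max(probs))]   # greedy pick of the next token
```

A real model produces logits over tens of thousands of tokens from billions of parameters, but the step from scores to a next-token distribution is essentially this.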
ChatGPT is trained using reinforcement learning from human feedback, a method that enhances machine learning through human intervention for better results. During training, human trainers act as users and AI assistants and fine-tune the model through the proximal policy optimization algorithm.
Due to ChatGPT’s stronger performance and vast parameters, it contains more topic data and can handle more niche topics. ChatGPT can now further handle tasks such as answering questions, writing articles, summarizing texts, language translation, and generating computer code.

Technical Architecture of BERT and GPT (In the figure, En is the input word, Tn is the output word)
3. Technical Architecture of ChatGPT
3.1 Evolution of the GPT Family
When talking about ChatGPT, one cannot avoid mentioning the GPT family.
Before ChatGPT there were several well-known siblings: GPT-1, GPT-2, and GPT-3. Each successive sibling is larger than the last, and ChatGPT most closely resembles GPT-3.

Technical Comparison of ChatGPT and GPT 1-3
Both the GPT family and the BERT model are well-known NLP models, both based on Transformer technology. GPT-1 has only 12 Transformer layers, while GPT-3 has increased to 96 layers.
3.2 Reinforcement Learning from Human Feedback
The main difference between InstructGPT/GPT3.5 (the predecessor of ChatGPT) and GPT-3 is the addition of a training paradigm known as RLHF (Reinforcement Learning from Human Feedback). This training paradigm enhances human regulation of the model’s output and provides a more understandable ranking of results.
In InstructGPT, the evaluation criteria for the "goodness" of a response are:
- Truthfulness: is the information false or misleading?
- Harmlessness: does it cause physical or mental harm to people or the environment?
- Usefulness: does it solve the user's task?
3.3 TAMER Framework
Here, we must mention the TAMER (Training an Agent Manually via Evaluative Reinforcement) framework. This framework introduces human annotators into the learning loop of agents, allowing humans to provide reward feedback to guide agents during training, thus quickly achieving training task goals.

TAMER Framework Paper
The main purpose of introducing human annotators is to accelerate training speed. Although reinforcement learning techniques perform well in many areas, they still have many shortcomings, such as slow convergence speed and high training costs. In the real world, the exploration cost or data acquisition cost for many tasks is very high. How to speed up training efficiency is one of the important issues to be solved in current reinforcement learning tasks.
TAMER can use the knowledge of human annotators, delivered as reward feedback, to accelerate the agent's convergence. TAMER does not require annotators to have expert knowledge or programming skills, so corpus costs are lower. By combining TAMER with RL (reinforcement learning), the feedback from human annotators can enhance reinforcement learning based on Markov Decision Process (MDP) rewards.
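The combination of TAMER's human signal with the MDP reward can be sketched as a simple reward-shaping step. The linear blend and the coefficient below are illustrative assumptions; the TAMER+RL literature explores several combination schemes:

```python
def shaped_reward(env_reward, human_signal, beta=0.5):
    # Linear blend of the MDP environment reward with the human
    # reinforcement signal (one common choice; beta is illustrative).
    return env_reward + beta * human_signal

def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    # Standard temporal-difference value update, driven by the shaped reward.
    return value + alpha * (reward + gamma * next_value - value)

# One step: the environment pays +1 and the annotator approves (+1)
r = shaped_reward(1.0, 1.0)        # 1.0 + 0.5 * 1.0 = 1.5
v = td_update(0.0, r, 0.0)         # 0.0 + 0.1 * (1.5 - 0.0) = 0.15
```

The human signal effectively densifies the reward, which is why convergence speeds up.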

Application of TAMER Architecture in Reinforcement Learning
In practice, human annotators act as users and AI assistants, providing dialogue samples for the model to generate responses. The annotators will score and rank the response options, feeding back the better results to the model. Agents learn from two feedback modes—human reinforcement and Markov Decision Process rewards—as an integrated system, continuously fine-tuning the model through reward strategies.
Based on this, ChatGPT can better understand and complete human language or instructions than GPT-3, mimicking humans and providing coherent and logical text information.
3.4 Training of ChatGPT
The training process of ChatGPT is divided into the following three stages:
First Stage: Training the Supervised Policy Model
It is difficult for GPT-3.5 to understand the different intentions embedded in various types of human instructions, or to judge whether generated content is of high quality. To give GPT-3.5 an initial ability to understand the intent of instructions, questions are randomly sampled from the dataset and human annotators provide high-quality answers, which are then used to fine-tune the GPT-3.5 model (yielding the SFT model, Supervised Fine-Tuning).
At this stage, the SFT model already outperforms GPT-3 in following instructions/dialogues, but may not necessarily align with human preferences.
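The first-stage objective is ordinary supervised learning: maximize the likelihood of the annotator-written answer. A minimal sketch of the per-token negative log-likelihood, with invented probabilities for illustration:

```python
import math

def sft_loss(token_probs):
    # Mean negative log-likelihood of the gold (annotator-written)
    # answer tokens under the model: the supervised fine-tuning objective.
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Probabilities the model assigns to each token of one reference answer
loss = sft_loss([0.9, 0.8, 0.95])   # lower is better; 0 means a perfect fit
```

Fine-tuning adjusts the model's parameters to drive this quantity down across the annotated dataset.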

Training Process of ChatGPT Model
Second Stage: Training the Reward Model (RM)
This stage trains the reward model on manually annotated data (about 33K samples). Questions are randomly sampled from the dataset, and multiple different answers are generated with the model from the first stage. Human annotators rank these results after weighing them comprehensively, a process similar to coaching or tutoring.
Next, this ranking result data is used to train the reward model. For multiple ranking results, pairwise combinations are formed to create multiple training data pairs. The RM model receives an input and gives a score evaluating the quality of the responses. Thus, for a pair of training data, parameters are adjusted so that the score for high-quality responses is higher than that for low-quality responses.
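The pairwise training described above is commonly implemented with a Bradley-Terry style ranking loss, -log σ(r_better − r_worse). A minimal sketch with invented reward-model scores:

```python
import math

def rm_pairwise_loss(score_better, score_worse):
    # Bradley-Terry style ranking loss: -log(sigmoid(r_better - r_worse)).
    # Minimizing it pushes the preferred response's score above the other's.
    margin = score_better - score_worse
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A human ranking of three answers (best first) yields C(3,2) = 3 pairs
scores = {"A": 2.0, "B": 0.5, "C": -1.0}          # invented RM outputs
pairs = [("A", "B"), ("A", "C"), ("B", "C")]      # (better, worse)
loss = sum(rm_pairwise_loss(scores[b], scores[w]) for b, w in pairs) / len(pairs)
```

When every pair is already ordered correctly (positive margin), each term falls below log 2, the loss of an undecided pair.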
Third Stage: Using PPO (Proximal Policy Optimization) Reinforcement Learning to Optimize Policies.
The core idea of PPO is to let the on-policy training process of policy-gradient methods reuse samples collected from a slightly earlier version of the policy, an off-policy correction achieved through importance sampling, which greatly improves sample efficiency. This stage uses the reward model trained in the second stage to update the parameters of the pre-trained model according to reward scores: questions are randomly sampled from the dataset, responses are generated by the PPO model, and quality scores are given by the RM model from the previous stage. The reward scores then yield policy gradients, and the PPO model's parameters are updated through reinforcement learning.
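The clipped surrogate objective at the heart of PPO can be sketched in a few lines. The importance-sampling ratio r = π_new(a|s) / π_old(a|s) is what lets this stage reuse samples generated by the previous policy:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Clipped surrogate objective: min(r*A, clip(r, 1-eps, 1+eps)*A),
    # where r = pi_new(a|s) / pi_old(a|s) is the importance-sampling ratio.
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, the payoff is capped once ratio > 1 + eps,
# which keeps the updated policy close to the one that produced the data
obj = ppo_clip_objective(1.5, advantage=2.0)   # clipped: 1.2 * 2.0 = 2.4
```

The clipping is what makes the update "proximal": the policy cannot move too far from the one that generated the samples in a single step.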
If we continuously repeat the second and third stages through iteration, we will train a higher quality ChatGPT model.
4. Limitations of ChatGPT
Just because ChatGPT can provide answers to user questions, does it mean we no longer need to feed keywords to Google or Baidu to get the answers we want?
Although ChatGPT demonstrates excellent contextual dialogue abilities and even programming capabilities, transforming the public's perception of chatbots from "artificial stupidity" to "genuinely interesting," we must acknowledge that the technology still has limitations and is continuously improving.
1) ChatGPT lacks “common sense” and inferential abilities in areas where it has not been trained on a large corpus, and it can even “speak nonsense” seriously. ChatGPT can “create answers” in many fields, but when users seek correct answers, ChatGPT may also provide misleading responses. For example, asking ChatGPT to solve a primary school application problem, it can write out a long calculation process, but the final answer may be incorrect.
2) ChatGPT cannot handle complex, lengthy, or particularly specialized language structures. For questions from highly specialized fields such as finance, natural sciences, or medicine, if not sufficiently “fed” with corpus, ChatGPT may fail to generate appropriate responses.
3) ChatGPT requires an enormous amount of computing power (chips) to support its training and deployment. Beyond the large corpus needed for training, ChatGPT requires powerful servers to serve requests, which ordinary users cannot afford; even models with billions of parameters demand an astonishing amount of computing resources to run and train. If it faced the billions of user requests a real search engine handles, any enterprise would struggle to bear this cost under the current free-of-charge strategy. Ordinary users will therefore have to wait for lighter models or more cost-effective computing platforms.
4) ChatGPT cannot incorporate new knowledge online, and retraining the GPT model whenever new knowledge emerges is unrealistic, both in terms of training time and costs that are difficult for ordinary trainers to accept. If an online training model for new knowledge is adopted, it seems feasible and the corpus cost is relatively low, but it can easily lead to catastrophic forgetting of existing knowledge due to the introduction of new data.
5) ChatGPT remains a black box model. Currently, we cannot decompose ChatGPT’s internal algorithmic logic, so we cannot guarantee that ChatGPT will not produce statements that attack or harm users.
Of course, flaws do not obscure the merits, as engineers have posted dialogues requesting ChatGPT to write Verilog code (chip design code). It can be seen that ChatGPT’s level has already surpassed that of some Verilog beginners.
5. Future Improvement Directions for ChatGPT
5.1 Reducing Human Feedback with RLAIF
At the end of 2020, Dario Amodei, former vice president of research at OpenAI, founded an AI company called Anthropic with ten employees. Most of the founding team members of Anthropic were early and core employees of OpenAI, having participated in OpenAI’s GPT-3, multimodal neurons, and reinforcement learning of human preferences.
In December 2022, Anthropic published a paper titled “Constitutional AI: Harmlessness from AI Feedback” introducing the AI model Claude.

Training Process of CAI Model
Claude and ChatGPT both rely on reinforcement learning (RL) to train preference models. CAI (Constitutional AI) is also based on RLHF, but the difference is that CAI uses models (rather than humans) to provide an initial ranking of all generated outputs.
CAI uses AI feedback to replace human preferences for expressing harmlessness, i.e., RLAIF, where AI evaluates response content based on a set of constitutional principles.

5.2 Addressing Mathematical Shortcomings
Although ChatGPT has strong dialogue capabilities, it often produces serious nonsense in mathematical calculation dialogues.
Computer scientist Stephen Wolfram has proposed a solution to this problem. Wolfram created the Wolfram Language and the computational knowledge engine Wolfram|Alpha, whose backend is implemented in Mathematica.

Integration of ChatGPT and Wolfram|Alpha for Problem Solving
In this combined system, ChatGPT can “converse” with Wolfram|Alpha like a human, and Wolfram|Alpha will use its symbolic translation capabilities to convert the natural language expressions obtained from ChatGPT into corresponding symbolic computation language. In the past, there has been a divide in academia regarding the “statistical methods” used by ChatGPT and the “symbolic methods” of Wolfram|Alpha. However, the complementarity of ChatGPT and Wolfram|Alpha now offers the possibility of advancing the NLP field.
ChatGPT does not need to generate such code itself; it only needs to produce ordinary natural language, which Wolfram|Alpha translates into precise Wolfram Language, and the underlying Mathematica then performs the computation.
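A dispatcher for such a combined system might look like the following sketch. The keyword heuristic and both handler names are hypothetical illustrations, not any real API:

```python
import re

def route_query(user_query):
    # Hypothetical dispatcher: computational questions go to a symbolic
    # engine such as Wolfram|Alpha, everything else to the language model.
    # The keyword heuristic and both handler names are illustrative
    # assumptions, not any real API.
    computational = re.search(r"\d|integral|derivative|solve|compute",
                              user_query, re.IGNORECASE)
    return "symbolic_engine" if computational else "language_model"

route_query("solve x^2 = 4")                # -> "symbolic_engine"
route_query("Tell me a story about a cat")  # -> "language_model"
```

In practice the routing decision would itself be made by the language model, but the division of labor, statistical generation on one side and symbolic computation on the other, is the same.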
5.3 Miniaturization of ChatGPT
Although ChatGPT is powerful, its model size and usage costs deter many people.
There are three types of model compression methods that can reduce model size and costs.
The first method is quantization, which reduces the numerical precision of individual weights. For example, reducing a Transformer's weights from FP32 to INT8 has little impact on its accuracy.
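Symmetric per-tensor quantization, one of the simplest INT8 schemes, can be sketched as follows (a minimal illustration, not a production quantizer):

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map FP32 weights onto the
    # signed INT8 range [-127, 127] using a single scale factor.
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    # Recover approximate FP32 values from the INT8 codes.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.003]          # original FP32 weights
q, s = quantize_int8(w)          # q = [50, -127, 0]
w_hat = dequantize(q, s)         # close to w; tiny values round to zero
```

Storing one byte per weight instead of four cuts memory by 4x, and integer arithmetic is cheaper on most hardware, which is where the cost savings come from.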
The second model compression method is pruning, which involves deleting network elements, from individual weights (unstructured pruning) to higher granularity components like channels of weight matrices. This method is effective in visual and smaller scale language models.
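Unstructured magnitude pruning, the simplest form mentioned above, can be sketched as follows (a toy illustration over a flat weight list):

```python
def magnitude_prune(weights, sparsity=0.5):
    # Unstructured magnitude pruning: zero out the given fraction of
    # weights with the smallest absolute values (ties may prune extra).
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, -0.01], sparsity=0.5)
# the two smallest-magnitude weights (-0.05 and -0.01) are zeroed
```

Structured variants prune whole channels or rows instead of individual weights, which is harder on accuracy but easier for hardware to exploit.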
The third model compression method is sparsification. For example, SparseGPT, proposed by the Institute of Science and Technology Austria (ISTA), can prune GPT-family models to 50% sparsity in one shot without any retraining. For a model at the scale of OPT-175B, this takes only a few hours on a single GPU.

SparseGPT Compression Process
6. Future of ChatGPT in Industry and Investment Opportunities
6.1 AIGC
When talking about ChatGPT, one must mention AIGC.
AIGC (AI-Generated Content) refers to the use of artificial intelligence technology to generate content. Compared with the UGC (User-Generated Content) and PGC (Professionally-Generated Content) of the Web 1.0 and Web 2.0 eras, AIGC represents a new round of transformation in content production, and AIGC content is expected to grow exponentially in the Web 3.0 era.
The emergence of the ChatGPT model is of great significance for text/audio modality AIGC applications and will have a significant impact on the upstream and downstream of the AI industry.
6.2 Beneficial Scenarios
In terms of downstream applications that stand to benefit, these include, but are not limited to, no-code programming, novel generation, conversational search engines, voice companionship, voice work assistants, conversational virtual humans, AI customer service, machine translation, and chip design. In terms of increased upstream demand, these include computing chips, data annotation, and natural language processing (NLP).

Large models are in an explosive state (more parameters/more demand for computing chips)
With continued advances in algorithms and computing technology, ChatGPT will evolve into more capable versions, be applied in more fields, and generate better dialogues and content for humanity.
Finally, the author asked ChatGPT about the place of integrated storage-computing (compute-in-memory) technology in the ChatGPT field (the author is currently focused on bringing integrated storage-computing chips to market). ChatGPT boldly predicted that this technology will come to dominate ChatGPT chips. (This resonates deeply with the author.)
Source: https://zhuanlan.zhihu.com/p/590655677