The Evolution of the GPT Family

Abstract

GPT (Generative Pre-trained Transformer) is a family of neural network models based on the Transformer architecture that has become an important research direction in natural language processing. This article reviews the development history and technological changes of GPT, outlining the technical upgrades and application scenarios from GPT-1 to GPT-4, and exploring GPT's applications in natural language generation, text classification, and language understanding, as well as the challenges it faces and its future development directions.

Timeline

June 2018
OpenAI released the GPT-1 model, with 117 million parameters.
February 2019
OpenAI announced the GPT-2 model, with 1.5 billion parameters, but due to concerns about misuse released only a smaller version at first, withholding the full model's weights.
May–August 2019
OpenAI released progressively larger partial versions of GPT-2, though access to the full model remained restricted.
November 2019
OpenAI released the full 1.5-billion-parameter GPT-2 model along with its code.
May 2020
OpenAI announced the GPT-3 model, with 175 billion parameters, making it the largest natural language processing model to date.
June 2020
OpenAI opened a beta API for GPT-3, providing access to select partners.
March 2022
OpenAI released InstructGPT, which uses instruction tuning and reinforcement learning from human feedback (RLHF) to align model outputs with user instructions.
November 30, 2022
OpenAI officially launched ChatGPT, a new conversational AI model fine-tuned from the GPT-3.5 series of large language models.
December 15, 2022
ChatGPT received its first update, enhancing overall performance and adding new features to save and view historical conversations.
January 9, 2023
ChatGPT received its second update, improving answer authenticity and adding a new “stop generating” feature.
January 21, 2023
OpenAI released the paid version of ChatGPT Professional, limited to select users.
January 30, 2023
ChatGPT received its third update, improving answer authenticity and enhancing mathematical capabilities.
February 2, 2023
OpenAI officially launched the subscription service for the paid version of ChatGPT, which is faster and more stable compared to the free version.
March 15, 2023
OpenAI launched the groundbreaking large multimodal model GPT-4, which accepts image as well as text input and generates text output, and made it available to Plus users through ChatGPT.

GPT-1: A Pre-trained Model Based on Unidirectional Transformer

Before the emergence of GPT, NLP models were mainly trained on large amounts of labeled data for specific tasks. This led to several limitations:
  • High-quality labeled data is difficult to obtain;

  • Models generalize poorly beyond the specific tasks they were trained on;

  • They cannot be used out of the box on new tasks, which limits their practical application.

To overcome these issues, OpenAI embarked on the path of pre-training large models. GPT-1, released by OpenAI in 2018, was the first model in the series; it used a unidirectional (left-to-right) Transformer and was pre-trained on the BooksCorpus dataset of roughly 7,000 unpublished books. The key recipe of GPT-1 is generative pre-training (unsupervised) plus discriminative task fine-tuning (supervised). The model was first pre-trained with unsupervised learning, spending one month on 8 GPUs to build up language capabilities from a large amount of unlabeled text and acquire broad knowledge, and was then fine-tuned with supervision on labeled task datasets to improve its performance on downstream NLP tasks. GPT-1 demonstrated excellent performance in text generation and understanding tasks, becoming one of the most advanced natural language processing models of its time.
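To make the two-stage recipe concrete, here is a minimal PyTorch sketch of the two objectives, with illustrative model sizes; the module names (backbone, lm_head, task_head) and the auxiliary-loss weight are stand-ins of our own, not OpenAI's code.

```python
# A minimal sketch (not OpenAI's code) of GPT-1's two objectives:
# L1 = language-modeling loss for pre-training, and L3 = L2 + lambda * L1
# for supervised fine-tuning. All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, n_labels = 40000, 768, 2

embed = nn.Embedding(vocab_size, d_model)
backbone = nn.TransformerEncoder(          # stands in for the unidirectional
    nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
    num_layers=12,                         # Transformer (causal mask below)
)
lm_head = nn.Linear(d_model, vocab_size)   # pre-training head
task_head = nn.Linear(d_model, n_labels)   # fine-tuning head

def lm_loss(tokens):                       # tokens: [batch, seq] token ids
    """Stage 1: maximize log P(u_i | u_{<i}) with a causal attention mask."""
    mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1) - 1)
    hidden = backbone(embed(tokens[:, :-1]), mask=mask)
    logits = lm_head(hidden)
    return F.cross_entropy(logits.reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))

def finetune_loss(tokens, labels, lam=0.5):
    """Stage 2: classify from the last hidden state, keeping the LM loss
    as an auxiliary objective (L3 = L2 + lambda * L1 in the GPT-1 paper)."""
    mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
    hidden = backbone(embed(tokens), mask=mask)
    task = F.cross_entropy(task_head(hidden[:, -1]), labels)
    return task + lam * lm_loss(tokens)
```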

GPT-2: A Multi-task Pre-trained Model

Because single-task models lack generalization ability and multi-task learning requires large numbers of effective training pairs, GPT-2 expanded and optimized on the basis of GPT-1, dropping the supervised fine-tuning stage and retaining only unsupervised learning. GPT-2 was trained on larger text data (the roughly 40GB WebText corpus) with more powerful computing resources, reaching a parameter scale of 1.5 billion, far exceeding GPT-1's 117 million parameters. In addition to using larger datasets and models, GPT-2 also introduced a new, more challenging setting: zero-shot learning, in which the pre-trained model is applied directly to downstream tasks without any task-specific training. GPT-2 demonstrated outstanding performance across several natural language processing tasks, including text generation, text classification, and language understanding.
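As an illustration of the zero-shot setting, the sketch below poses a classification task purely as text continuation; the Hugging Face transformers library and the public gpt2 checkpoint are stand-ins chosen here, not part of the original work.

```python
# Zero-shot use of a pre-trained LM: the task is stated entirely in the
# prompt and the model simply continues the text; no fine-tuning occurs.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Sentiment classification posed as plain text continuation:
prompt = "Review: The movie was a complete waste of time.\nSentiment:"
print(generator(prompt, max_new_tokens=2)[0]["generated_text"])
```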

GPT-3: Creating New Natural Language Generation and Understanding Capabilities

GPT-3, announced in 2020, scaled the series up with an even larger parameter count and richer training data. The parameter scale of GPT-3 reaches 175 billion, more than 100 times that of GPT-2. GPT-3 demonstrates astonishing capabilities in natural language generation, dialogue generation, and other language processing tasks, even producing novel forms of language expression in some tasks.
GPT-3 introduced a very important concept: in-context learning, which will be explained in the next article.
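As a brief preview, in-context (few-shot) learning places worked examples directly in the prompt, and the model infers the pattern with no weight updates; the translation prompt below follows the style of the examples in the GPT-3 paper.

```python
# In-context (few-shot) learning: the "training examples" live in the prompt
# itself; the model's weights are never updated.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
# Sent to GPT-3, this prompt would typically be completed with "fromage".
```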

InstructGPT & ChatGPT

The training of InstructGPT/ChatGPT consists of three steps, each requiring slightly different data, which we will introduce separately.
Starting from a pre-trained language model, the following three steps are applied.
Step 1: Supervised Fine-Tuning (SFT): Collect demonstration data to train a supervised policy. Our labelers provide demonstrations of the desired behavior on the input prompt distribution. We then fine-tune the pre-trained GPT-3 model on this data using supervised learning.
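As a minimal sketch of what this step computes, the function below applies the standard next-token loss to a single (prompt, response) sequence while ignoring the prompt positions; masking the prompt out of the loss is a common practice assumed here, not a detail taken from the paper.

```python
# Hedged sketch of an SFT loss on one (prompt, response) sequence: ordinary
# next-token cross-entropy, with prompt positions excluded from the loss.
import torch
import torch.nn.functional as F

def sft_loss(logits, tokens, prompt_len):
    """logits: [seq, vocab] model outputs; tokens: [seq] token ids;
    only positions at or after prompt_len are supervised."""
    targets = tokens[1:].clone()        # position i predicts token i+1
    targets[: prompt_len - 1] = -100    # -100 = ignore_index: skip the prompt
    return F.cross_entropy(logits[:-1], targets, ignore_index=-100)
```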
Step 2: Reward Model training. Collect comparison data to train a reward model. We collected a dataset comparing outputs between models, where labelers indicated which output they preferred for a given input. We then train a reward model to predict the outputs preferred by humans.
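The comparison data trains the reward model with a pairwise ranking loss: the reward of the preferred output is pushed above the reward of the rejected one. A minimal sketch, matching the form used in the InstructGPT paper (the names are our own):

```python
# Pairwise ranking loss for the reward model: maximize the log-sigmoid of the
# margin between the preferred and rejected outputs' scalar rewards.
import torch
import torch.nn.functional as F

def rm_loss(reward_preferred, reward_rejected):
    """Both arguments are tensors of scalar rewards r(x, y)."""
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()
```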
Step 3: Reinforcement Learning with Proximal Policy Optimization (PPO) using the reward model’s output as scalar rewards. We fine-tune the supervised policy using the PPO algorithm to optimize this reward.
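The scalar reward PPO actually optimizes is typically the reward model's score minus a KL penalty that keeps the policy close to the SFT model, so the policy cannot drift arbitrarily far just to please the reward model. A hedged sketch of that combined reward (the coefficient beta is an assumed hyperparameter, and the paper's additional pretraining-mix term is omitted):

```python
# Reward optimized during the RL step: reward-model score minus a KL penalty
# toward the supervised (SFT) policy.
def rl_reward(rm_score, logprob_policy, logprob_sft, beta=0.02):
    kl_penalty = beta * (logprob_policy - logprob_sft)  # per-token KL estimate
    return rm_score - kl_penalty
```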
Steps 2 and 3 can iterate consecutively; more comparison data is collected on the current optimal policy, which is used to train a new reward model, followed by a new policy.
The prompts for the first two steps come from users of OpenAI's online API as well as from prompts written by hired labelers; the prompts for the last step are sampled entirely from API data. The specific datasets for InstructGPT are:
1. SFT Dataset
The SFT dataset is used to train the supervised model in Step 1: newly collected demonstration data is used to fine-tune GPT-3 following its original training method. Since GPT-3 is a prompt-based generative model, the SFT dataset consists of prompt-response pairs. Part of the SFT data comes from users of OpenAI's Playground, and the rest comes from the roughly 40 labelers OpenAI hired and trained for this task, whose job was to write demonstrations of the desired output for each prompt.
2. RM Dataset
The RM dataset is used to train the reward model in Step 2. Training InstructGPT/ChatGPT requires setting a reward objective; this objective does not need to be differentiable, but it must align as comprehensively and accurately as possible with the content we want the model to generate. Naturally, we can provide this reward through manual labeling: by giving lower scores to generated content that exhibits bias, for example, we encourage the model not to generate content that humans dislike. InstructGPT/ChatGPT's approach is to first generate a batch of candidate texts and then have labelers rank these candidates by quality, as sketched below.
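One practical detail from the InstructGPT paper: labelers rank K candidate responses per prompt (K is between 4 and 9 in the paper), and each ranking expands into K(K-1)/2 pairwise comparisons for the reward-model loss shown earlier:

```python
# A ranking of K candidates yields K*(K-1)/2 (preferred, rejected) pairs.
from itertools import combinations

ranked = ["best answer", "okay answer", "worst answer"]   # labeler's ordering
pairs = list(combinations(ranked, 2))  # each tuple is (preferred, rejected)
print(len(pairs))                      # 3 candidates -> 3 comparison pairs
```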
3. PPO Dataset
The PPO data for InstructGPT is not labeled; it comes entirely from users of the GPT-3 API. It covers the various task types submitted by different users, the most common being open-ended generation (45.6%), open QA (12.4%), brainstorming (11.2%), and dialogue (8.4%).

Appendix:

(Figure: where the capabilities of ChatGPT come from.)

(Figure: the capabilities and training methods of GPT-3, ChatGPT, and the iterative versions in between.)

References

1. Dissecting the Origins of GPT-3.5’s Capabilities: https://yaofu.notion.site/GPT-3-5-360081d91ec245f29029d37b54573756

2. The Most Comprehensive Timeline! From the Past and Present of ChatGPT to the Current Competitive Landscape in AI: https://www.bilibili.com/read/cv22541079

3. GPT-1 Paper: Improving Language Understanding by Generative Pre-Training, OpenAI.

4. GPT-2 Paper: Language Models are Unsupervised Multitask Learners, OpenAI.

5. GPT-3 Paper: Language Models are Few-Shot Learners, OpenAI.

6. Wei J, Bosma M, Zhao V Y, et al. Finetuned Language Models Are Zero-Shot Learners. arXiv preprint arXiv:2109.01652, 2021.

7. How OpenAI "Devil-Trained" GPT: An Interpretation of the InstructGPT Paper: https://cloud.tencent.com/developer/news/979148


Author: Zhang Wanyue, University of Chinese Academy of Sciences

Editor: Vision
