The Evolution of ChatGPT: Past, Present, and Future

  • Editor’s Note

Since its launch on November 30, 2022, OpenAI’s ChatGPT has gained over a million users and sparked extensive discussion in Silicon Valley. It can perform a wide range of common text tasks, including writing and debugging code, translating literature, writing novels, creating business copy, generating recipes, doing homework, and grading assignments. Moreover, it remembers the context of a conversation, producing remarkably realistic responses.

Although industry insiders note that ChatGPT still has issues such as outdated training data, we cannot stop pondering: Where will human-made artificial intelligence end up? How will the relationship between humans and thinking machines evolve?

Written by | Sun Ruichen

Reviewed by | Zhang Zheng

Edited by | Chen Xiaoxue

Promotional poster for the movie “Dune” (Image source: IMDb.com)

The movie “Dune,” released at the end of last year, is a science fiction story set in the year 10191 (8,169 years from now). While watching it, I kept wondering: the lives of the people in this story seem more primitive than ours today, and there are few traces of artificial intelligence (AI) anywhere. Later, when I read the original novel, I realized this was a deliberate choice by the author: at some point before the year 10191, a war broke out in which humanity’s opponent was the thinking machines it had created. At the brutal end of that war, humanity fought with everything it had to defeat the thinking machines, and afterward decided to ban such machines permanently. Thus the primitive world of Dune in 10191 came to be.

Last week, OpenAI launched a new AI conversational model, ChatGPT. Many people, myself included, have spent the past week trying out this new chatbot. After using it, you may already have guessed: I was reminded of the world of Dune.

The past decade has been something of a “Cambrian explosion” for artificial intelligence, with a plethora of new terms emerging and gaining popularity in short order. Many of these terms and their abbreviations have no standardized Chinese translations, and practitioners generally just use the English abbreviations, which creates a cognitive barrier for outsiders who want to fully understand these technologies.

To understand the ChatGPT chatbot, one must first understand InstructGPT, GPT-3, GPT-2, GPT, Transformer, and the commonly used RNN models in the field of natural language processing prior to these.

1. The Predecessors of ChatGPT

In 2017, the Google Brain team published a paper titled “Attention is All You Need” at the Neural Information Processing Systems conference (NeurIPS, a premier academic conference in machine learning and AI). In this paper, the authors introduced the transformer model based on the self-attention mechanism for the first time and applied it to the understanding of human language, i.e., natural language processing.

Before this paper was published, the mainstream models in natural language processing were recurrent neural networks (RNNs). RNNs are good at handling sequential data such as language, but that same sequential design makes training unstable or causes it to stop improving on longer sequences, such as long articles or books (gradients tend to vanish or explode during training; I won’t elaborate here), and it also makes training slow, since the data must be processed step by step and cannot be parallelized.
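
To make the sequential bottleneck concrete, here is a minimal, purely illustrative NumPy sketch of a vanilla RNN forward pass (random weights, no training; the dimensions are arbitrary assumptions for illustration):

```python
import numpy as np

# Purely illustrative sketch (random weights, no training): a vanilla RNN
# must update its hidden state one token at a time, so step t cannot start
# until step t-1 has finished. This is what blocks parallel training.
rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16
W_xh = rng.normal(size=(d_in, d_hidden)) * 0.1
W_hh = rng.normal(size=(d_hidden, d_hidden)) * 0.1

tokens = rng.normal(size=(100, d_in))   # a sequence of 100 token vectors
h = np.zeros(d_hidden)
for x_t in tokens:                      # strictly sequential loop
    h = np.tanh(x_t @ W_xh + h @ W_hh)  # each step depends on the previous h

# The repeated multiplication by W_hh at every step is also why gradients
# tend to vanish or explode as the sequence grows longer.
print(h.shape)  # (16,)
```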

The architecture of the original Transformer model (Image source: Reference [1])

The transformer model proposed in 2017 can perform its computations and training in parallel, which greatly reduces training time, and the attention patterns of the resulting model can be related to grammatical structure, giving it a degree of interpretability.
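
As a rough illustration of where that parallelism comes from, below is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation of the paper; it omits multi-head projections, masking, positional encodings, and the rest of the full architecture.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a whole sequence.

    X: (seq_len, d_model) token embeddings; Wq, Wk, Wv: projection matrices.
    Every position attends to every other position through a few batched
    matrix multiplications, which is what makes training parallelizable.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```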

This initial transformer model had 65 million trainable parameters. The Google Brain team trained it on several publicly available language datasets, including the WMT 2014 English-German machine translation dataset (4.5 million sentence pairs), the WMT 2014 English-French dataset (36 million sentence pairs), and sentences from the Penn Treebank (about 40,000 sentences from the Wall Street Journal portion, plus roughly 17 million additional sentences in a semi-supervised setting). Moreover, the team published the model architecture in the paper, so anyone can build a similar model and train it on their own data.

After training, this initial transformer model achieved top scores in translation accuracy, English constituency parsing, and other benchmarks, becoming the most advanced large language model (LLM) of its time.

Major milestones of large language models (LLMs)

From the moment the transformer model was born, it has profoundly shaped the trajectory of AI development. In just a few years its influence has spread across the field: from natural language models of every kind to AlphaFold2, the protein structure prediction model, all build on it.

2. Continuous Iteration: Seeking the Limits of Language Models

Among the many teams researching the transformer model, OpenAI has been one of the few consistently focused on exploring its limits.

OpenAI was founded in December 2015 in San Francisco, California. Tesla CEO Elon Musk was one of its co-founders and provided early funding (he later stepped away from the company but kept up his financial support). In its early days, OpenAI was a nonprofit with the mission of developing AI technologies beneficial and friendly to human society. In 2019, OpenAI changed course and announced a for-profit (capped-profit) arm, a change closely related to the transformer model.

In 2018, less than a year after the transformer model appeared, OpenAI published the paper “Improving Language Understanding by Generative Pre-Training” and released the GPT-1 (Generative Pre-training Transformer) model, with 117 million parameters. The model was pre-trained on a large dataset of book text (BookCorpus), which contains over 7,000 unpublished books spanning genres such as adventure, fantasy, and romance. After pre-training, the authors further trained the model on separate datasets for four specific language scenarios (a step known as fine-tuning). The final model outperformed the base transformer model on question answering, text similarity assessment, semantic entailment, and text classification, becoming the new industry leader.

In 2019, the company announced a model with 1.5 billion parameters: GPT-2. The architecture of this model is similar to that of GPT-1, with the main difference being that GPT-2 is ten times larger. At the same time, they published a paper introducing this model titled “Language Models are Unsupervised Multitask Learners.” In this work, they used a new dataset primarily composed of web text that they collected. Unsurprisingly, the GPT-2 model set new scoring records for large language models across multiple language scenarios. In the paper, they provided the results of the GPT-2 model answering new questions (questions and answers not present in the model’s training data).

Results of the GPT-2 model answering new questions (Image source: [3])

In 2020, this startup team surpassed itself again, publishing the paper “Language Models are Few-Shot Learners” and launching the GPT-3 model, with 175 billion parameters. The architecture of GPT-3 is essentially the same as GPT-2’s, except that it is two orders of magnitude larger. Its training set is also much larger than those of the previous two GPT models: a filtered dataset of web-crawled text (429 billion tokens), Wikipedia articles (3 billion tokens), and two different book datasets (67 billion tokens in total).

Because of the enormous number of parameters and the scale of the training data, training a GPT-3 model conservatively costs between five million and twenty million dollars; using more GPUs raises the cost but shortens the time, and vice versa. A language model of this magnitude is no longer a research project that ordinary scholars or individuals can afford. Facing such a large model, users can supply only a few-shot prompt, or even no examples at all, and still receive high-quality answers that meet their needs. A few-shot prompt means giving the model a handful of examples before posing the actual language task (translation, text creation, question answering, and so on).

GPT-3 can provide better answers based on user prompts (Image source: [4])

When GPT-3 was released, it did not offer a broad user interaction interface and required users to submit applications and get approved before registering, so the number of people who directly experienced the GPT-3 model was not large. Based on the experiences shared online by those who did try it, we know that GPT-3 can automatically generate complete, coherent long articles based on simple prompts, making it hard to believe that this is the work of a machine. GPT-3 can also write program code, create recipes, and perform almost all text creation tasks. After early testing, OpenAI commercialized the GPT-3 model: paying users can connect to GPT-3 via an application programming interface (API) to use the model for their required language tasks. In September 2020, Microsoft obtained exclusive licensing for the GPT-3 model, meaning they could exclusively access the source code of GPT-3. This exclusive license does not affect paying users’ continued access to the GPT-3 model via the API.
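
For readers curious what “connecting via the API” looked like in practice, here is a hedged sketch of a few-shot translation prompt (the example pairs echo those in the GPT-3 paper) sent through the pre-2023 `openai` Python client; the engine name, client version, and response handling shown here are assumptions that have since changed.

```python
# A hedged sketch of few-shot prompting through the GPT-3 API, assuming the
# pre-2023 `openai` Python client (openai<1.0) and a davinci-family engine
# name; model names and the client interface have changed since.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

few_shot_prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

response = openai.Completion.create(
    engine="davinci",        # GPT-3 base engine name used at the time
    prompt=few_shot_prompt,  # the two examples condition the next completion
    max_tokens=10,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # expected: "fromage"
```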

In March 2022, OpenAI published a paper titled “Training Language Models to Follow Instructions with Human Feedback” and launched the InstructGPT model, which is based on the GPT-3 model and further fine-tuned. InstructGPT’s model training incorporated human evaluations and feedback data, rather than relying solely on pre-prepared datasets.

During the public testing of GPT-3, users provided a large amount of dialogue and prompt data, and OpenAI’s internal data labeling team also generated a significant amount of manually labeled datasets. These labeled data can help the model learn not only from the data itself but also from human annotations (for example, certain sentences or phrases should be avoided).

OpenAI first used these data for supervised training to fine-tune GPT-3.

Next, they collected sample answers generated by the fine-tuned model. For any given prompt the model can produce countless answers, but users typically want to see just one (which matches human communication habits), so the answers must be ranked and the best one selected. The data labeling team therefore manually scored and ranked several candidate answers for each prompt, choosing the one that best fits human communication habits. These manual rankings were then used to build a reward model: a model that automatically gives the language model feedback, encouraging good answers and suppressing bad ones, so that the language model can find the best answer on its own.
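
The InstructGPT paper trains this reward model on pairwise comparisons; below is a minimal sketch of that ranking loss (in the real system the two scores come from a trained network evaluating full prompt-and-answer pairs, not hand-picked numbers).

```python
import numpy as np

def pairwise_ranking_loss(score_preferred, score_rejected):
    """Loss for one human comparison: push the reward model to score the
    answer labelers preferred higher than the answer they ranked lower,
    i.e. loss = -log(sigmoid(r_preferred - r_rejected))."""
    return -np.log(1.0 / (1.0 + np.exp(-(score_preferred - score_rejected))))

# Toy numbers standing in for reward-model outputs: the loss shrinks as the
# gap between the preferred and rejected answers' scores widens.
print(pairwise_ranking_loss(1.2, 0.8))   # ~0.51
print(pairwise_ranking_loss(3.0, -1.0))  # ~0.018
```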

Finally, the team used the reward model and more labeled data to continue optimizing the fine-tuned language model and iterated on it. The resulting model is called InstructGPT.
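
In this last step the paper optimizes, with the PPO algorithm, a reward that combines the reward model’s score with a penalty for drifting away from the supervised fine-tuned model; the sketch below shows only the shape of that objective, with an illustrative coefficient rather than the paper’s actual settings.

```python
def rl_step_reward(rm_score, logprob_policy, logprob_sft, beta=0.02):
    """Reward used when further optimizing the fine-tuned model: the reward
    model's score minus a KL-style penalty that keeps the new policy from
    drifting too far from the supervised fine-tuned (SFT) model.
    beta is an illustrative value, not the coefficient used in the paper."""
    return rm_score - beta * (logprob_policy - logprob_sft)

# Example: a high reward-model score gets discounted if the optimized policy
# assigns the answer a much higher log-probability than the SFT model did.
print(rl_step_reward(rm_score=2.0, logprob_policy=-5.0, logprob_sft=-9.0))  # 1.92
```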

3. The Birth of ChatGPT

Today’s protagonist is ChatGPT and its predecessors, so OpenAI has inevitably been the main thread of this story. But if we focus solely on OpenAI from GPT-1 to InstructGPT, we risk overlooking that other AI companies and teams were making similar attempts over the same period. In the two years after GPT-3 launched, several comparable large language models appeared, though it must be said that the most famous remains GPT-3.

Some competitors of GPT-3 (Image source: gpt3demo.com)

Returning to the present, during this year’s Neural Information Processing Systems conference, OpenAI announced to the world via social media their latest large-scale pre-trained language model: ChatGPT.

Like the InstructGPT model, ChatGPT is a chatbot that OpenAI developed by fine-tuning a GPT-3 series model (the version referred to as GPT-3.5). According to OpenAI’s website, the ChatGPT and InstructGPT models are sibling models. Given that the largest InstructGPT model has 175 billion parameters (the same as GPT-3), it is reasonable to believe that ChatGPT’s parameter count is in the same range. However, according to the paper, the InstructGPT model that performs best on conversational tasks has only 1.3 billion parameters, so it is also possible that ChatGPT’s parameter count is closer to that.[5]

Since its launch on November 30, 2022, ChatGPT has gained over a million users. The conversation examples users have shared on social media show that ChatGPT, like GPT-3, can perform a range of common text tasks, including writing and debugging code, translating literature, writing novels, creating business copy, generating recipes, doing homework, and grading assignments. One advantage of ChatGPT over GPT-3 is that it responds far more conversationally, whereas GPT-3 is better at producing long articles but lacks colloquial expression. Some users have used ChatGPT to talk with customer service and recover money they were overcharged (which may suggest that ChatGPT has, in a sense, passed the Turing test), and it may become a good companion for socially anxious people.

4. Warning of Issues

The OpenAI development team warned users of some issues with the ChatGPT model upon its release, and these issues have been confirmed by global internet users through repeated testing.

First, the training data behind ChatGPT’s large language model only run up to the end of 2021, so it cannot give accurate answers about events of the past year. Second, when users ask ChatGPT for precise information (for example, code or recipes), the accuracy of its responses is inconsistent, and users need to be able to judge the quality and correctness of the answers themselves. Because of these accuracy problems, the programming Q&A site Stack Overflow has banned users from posting ChatGPT-generated code on its site.

In response, Zhang Zheng, director of the Amazon AWS AI Research Institute in Shanghai, commented: the training method of the ChatGPT model has a fatal flaw. The scoring of candidate answers is based only on ranking, so the second step amounts to a coarse scoring, and erroneous guesses get mixed in (for example, the fact that a higher-ranked answer A beats a lower-ranked answer B does not mean A is free of common-sense or factual errors). Question answering is not only open-ended; it also requires finely distinguishing reasonable responses from unreasonable ones. The problem is not insurmountable, but there is still a great deal of foundational work to be done.

Finally, how the questioner phrases a problem also affects the accuracy of ChatGPT’s answers, and this sensitivity to wording has consequences beyond chatbots. Earlier this year, OpenAI launched its latest AI drawing system, DALL·E 2 (around the same time, several similar products such as Midjourney also appeared). Users only need to supply a text description, and DALL·E 2 generates an image from it. It is no exaggeration to say that the quality and style of these images can rival the work of professional artists.

An abstract painting generated by DALL·E 2 (Image source: openai.com)

As a result, while the art world reels, a business in prompt engineering has quietly emerged: good prompts can guide AI models to generate more suitable and more beautiful works, whereas poor prompts often yield mediocre, student-level results (or worse). How to write good prompts and hold high-quality dialogues with AI models has thus become a new entrepreneurial hotspot. PromptBase, a San Francisco startup, sells prompts at $1.99 apiece, mainly targeting content-generation models such as DALL·E 2 and GPT-3. Perhaps they will soon add ChatGPT to their catalog.

Based on the previously mentioned principles of few-shot learning and incorporating human feedback, we know that if we first provide the ChatGPT model with a few examples before posing a language task, or continuously provide feedback to guide ChatGPT, its responses will better meet our requirements. Thus, crafting a good prompt can lead to more surprises from ChatGPT.

5. The Evolution of Artificial Intelligence: Where Will It End?

From the 2017 Transformer to today’s ChatGPT, large language models have undergone numerous iterations, each more powerful than the last. In the future, OpenAI will continue to bring us GPT-4, GPT-5, and even GPT-100. Meanwhile, our exciting, bizarre, and imaginative chat records with ChatGPT will all become training data for the next generation of models.

When OpenAI was founded at the end of 2015, its original intention was to develop AI technologies beneficial to humanity. In the years since, there has been no evidence that it has strayed from that intention; indeed, ChatGPT and the large language models behind it look like an advanced productive force oriented toward the future. We have reason to believe that AI technologies exemplified by large language models can help us learn and work better and live better lives, and we have reason to keep supporting, developing, and promoting AI so that it benefits the public. But we can no longer ignore that AI technology evolves and iterates far faster than humans, or any other living thing, can.

Elon Musk, co-founder of OpenAI, once explained the company’s original intention, having recognized the enormous potential of artificial intelligence: “How can we ensure that the future brought by AI is friendly? In the process of trying to develop friendly AI technologies, there is always a risk that we might create something that worries us. However, the best barrier might be to allow as many people as possible to access and possess AI technology. If everyone can utilize AI technology, then the possibility of a small group of people possessing excessively powerful AI technology and causing dangerous consequences would diminish.”

However, what Musk did not mention is this: even if everyone has the opportunity and the ability to use AI technology, what happens if AI development one day slips beyond human control? How do we build our own fortress then? How do we avoid the kind of world war between humans and thinking machines alluded to in the story of “Dune”? ChatGPT as it exists today is still far from such a worry, but where will the evolution of AI end?

In the journey of creating artificial intelligence, it is difficult for humanity to stop questioning—will the rapidly developing AI technology one day force us to choose a primitive future like that in “Dune”?

ChatGPT does not know either.

Author Bio:

Sun Ruichen, PhD in Neurobiology from the University of California, San Diego, currently a data scientist at a pharmaceutical company.

References:

1. https://arxiv.org/abs/1706.03762

2. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

3. https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf

4. https://arxiv.org/abs/2005.14165v4

5. https://arxiv.org/abs/2203.02155

Editorial editing | Xiao Mao

Source: Sirexian

Editor: Xiao Fan
