With the popularity of AI products like ChatGPT and Wenxin Yiyan, generative AI has become a hot topic of discussion.
But why do we need to add the word “generative” in front of AI? Is there another type of AI?
1
What Exactly Is Generative AI?
If we simply categorize artificial intelligence by its purpose, AI can actually be divided into two types: decision-making AI and generative AI.
- Decision-making AI: Focuses on analyzing situations and making decisions. It helps users or systems choose the best course of action by evaluating multiple options and their possible outcomes. For example, in self-driving vehicles, decision-making AI determines when to accelerate, decelerate, or change lanes.
- Generative AI: Focuses on creating new content. It can automatically generate text, images, music, and other content based on the data it has learned. For example, you can send several papers to a generative AI, and it can produce a literature review covering their key ideas and main conclusions.
Now you understand why ChatGPT and Wenxin Yiyan belong to generative AI, right?
Next, let’s formally step into the world of generative AI.
2
The Past and Present of Generative AI
In fact, generative AI did not just emerge in recent years; it has gone through three stages:
Early Development Stage
- In 1950, Alan Turing proposed the famous “Turing Test,” a milestone for generative AI that pointed to the possibility of AI content generation.
- In 1957, Lejaren Hiller and Leonard Isaacson completed the first music piece entirely composed by a computer, titled “Illiac Suite.”
- Between 1964 and 1966, Joseph Weizenbaum developed the world’s first chatbot, “Eliza,” which completed interactive tasks through keyword scanning and reorganization.
- In the 1980s, IBM created the voice-controlled typewriter “Tangora” based on a hidden Markov model.
Accumulation Stage
With the development of the internet, the scale of data expanded rapidly, providing massive training data for AI algorithms. However, limited by the hardware of the time, progress was not fast.
- In 2007, an AI system built by New York University researcher Ross Goodwin wrote the novel “1 The Road,” the world’s first novel created entirely by AI.
- In 2012, Microsoft publicly demonstrated a fully automated simultaneous interpretation system that could translate an English speaker’s words into spoken Chinese in real time, combining speech recognition, machine translation, and speech synthesis.
Rapid Development Stage
Starting from 2014, the introduction and iteration of numerous deep learning methods marked a new era for generative AI.
- In 2017, Microsoft’s AI chatbot “Xiaoice” published the world’s first poetry collection composed entirely by AI, titled “The Sunshine Lost the Glass Window.”
- In 2019, the Google DeepMind team released the DVD-GAN architecture for generating continuous videos.
- In 2020, OpenAI released GPT-3, an important milestone in the field of natural language processing (NLP) and AIGC.
- In 2021, OpenAI launched DALL-E, used primarily to generate images from text descriptions.
- From 2022 to the present, OpenAI has released ChatGPT and a series of newer models, sparking another wave of AIGC; these models can understand and generate natural language and hold complex conversations with humans.
Since then, generative AI has entered a state of explosive growth. So, what principles does generative AI rely on?
3
Understanding the Principles of Generative AI
From the introduction above, you should have a rough picture of generative AI: it learns knowledge, then generates new knowledge. But how does it learn? And how does it generate?

At this point, we need a deeper definition of generative AI:

Definition: Generative AI, represented by ChatGPT, vectorizes and summarizes existing data and knowledge to derive the joint probabilities of the data. When generating content, it then produces new content based on the user’s needs and the probabilities of associated words.

Feeling a bit confused? Don’t worry; this touches on the principles of generative AI, which I will explain step by step.
In fact, creating a generative AI is like turning a clay figure into a genius, requiring four steps: shape the clay figure → install the brain → feed knowledge → produce output.
Step 1: Shaping the Clay Figure – Building the Hardware Architecture
To create a “clay figure” of generative AI, the first consideration is the underlying hardware. The underlying hardware constitutes the computing and storage power of generative AI.
Computing Power – The Skeleton of the Clay Figure

Generative AI requires a great deal of computation, especially when processing images and videos. Large-scale computing tasks rely on the following key hardware:
- GPU (Graphics Processing Unit): Provides powerful parallel computing capabilities. By enabling thousands of small processing units to work in parallel, it significantly improves computing efficiency.
- TPU (Tensor Processing Unit): Hardware specifically designed to accelerate AI learning, significantly speeding up computation and further enhancing the strength of the skeleton.
Storage Power – The Blood of the Clay Figure

Generative AI also needs to process and store huge amounts of data. For example, GPT-3 has 175 billion parameters, was trained on 45TB of data, and generates 4.5 billion characters of content daily. Storing this data relies on the following hardware:
- Large-capacity RAM: When training generative AI models, a large number of intermediate computation results and model parameters need to be stored in memory, and large-capacity RAM can significantly improve data processing speed.
- SSD (Solid State Drive): A large-capacity SSD has high-speed read and write capabilities, allowing for quick loading and saving of data, enabling the clay figure to efficiently store information.
Once the clay figure is shaped, it is still just a puppet without any abilities, so we need to install a brain.
Step 2: Installing the Brain – Constructing the Software Architecture
The software architecture is the brain of the clay figure, determining how it will think and reason about data.
From a bionic perspective, humans hope that AI can mimic the operation mechanism of the human brain to think and reason about knowledge — this is commonly known as deep learning.
To achieve deep learning, researchers have proposed a variety of neural network architectures:
- Deep Neural Networks (DNNs) are the most common neural network architecture, but as data and tasks become increasingly complex, this approach gradually becomes inadequate.
- Convolutional Neural Networks (CNNs) are designed specifically for image data and handle it effectively, though they require the input data to be carefully preprocessed.
- As task complexity increased, Recurrent Neural Networks (RNNs) became a common method for processing sequential data.
- Because RNNs suffer from vanishing gradients and model degradation on long sequences, the famous Transformer architecture was proposed.
With the development of computing power, the network architecture of generative AI has become increasingly mature and has begun to focus on different aspects:
- Transformer architecture: Currently the mainstream architecture for text generation; models like GPT and Llama2 are based on the Transformer and achieve outstanding performance.
- GAN architecture: Widely used in image and video generation, capable of producing high-quality images and videos.
- Diffusion architecture: Has achieved good results in image and audio generation, producing high-quality and diverse content.
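To make these families concrete, here is a minimal sketch in PyTorch (our choice of framework; the article names none) showing how each kind of layer is instantiated. All layer sizes are arbitrary illustrations.

```python
# Minimal sketch of the architecture families above, using PyTorch
# (our assumed framework); all layer sizes are arbitrary illustrations.
import torch
import torch.nn as nn

# DNN: a stack of fully connected layers
dnn = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# CNN: convolutions designed for image grids (here, 3-channel images)
cnn = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# RNN: processes a sequence step by step, carrying a hidden state
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)

# Transformer: self-attention over the whole sequence at once
encoder = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)

x = torch.randn(1, 10, 32)   # a batch of 1 sequence: 10 tokens, 32 features
print(encoder(x).shape)      # torch.Size([1, 10, 32])
```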
With the network architecture built, the brain is there, but it is empty. Therefore, we need to feed knowledge into this artificial brain through data training.
Step 3: Feeding Knowledge – Data Training
Currently, there are two main training methods: pre-training and SFT (Supervised Fine-Tuning).
- Pre-training: Feeding a large, general dataset to the AI for initial learning. Models that have undergone pre-training are called “base models”; they have some understanding of every domain but are not yet experts in any specific field.
- SFT: Feeding a task-specific dataset to the AI after pre-training to train the model further. For example, fine-tuning an already pre-trained language model on specialized medical texts makes it better at medical Q&A and medical text generation.
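As a rough illustration of how the two phases relate, here is a conceptual sketch in plain PyTorch. The `model`, `general_corpus`, and `medical_corpus` names are hypothetical placeholders, not anything specified in the article.

```python
# Conceptual sketch: the same training loop serves both phases;
# only the data and the learning rate change.
import torch
import torch.nn.functional as F

def train(model, dataset, epochs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, targets in dataset:      # dataset yields (inputs, targets)
            loss = F.cross_entropy(model(inputs), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Phase 1, pre-training: a huge general corpus produces a "base model".
# train(model, general_corpus, epochs=1, lr=1e-4)

# Phase 2, SFT: a small domain corpus (e.g. medical texts), usually with a
# lower learning rate so the general knowledge is not overwritten.
# train(model, medical_corpus, epochs=3, lr=1e-5)
```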
However, whether it is pre-training or SFT, how does the AI’s brain absorb this knowledge?
This involves the ability to “understand.” Taking the Transformer architecture as an example, let’s look at how AI understands text.

For AI, understanding happens in two steps: understanding words and understanding sentences.

Understanding a word essentially means classifying it. Researchers proposed a method: decompose words along different dimensions and score them. Suppose there are four words: watermelon, strawberry, tomato, and cherry. AI decomposes these words along two dimensions:
- Color Dimension: 1 represents red, 2 represents green.
- Shape Dimension: 1 represents round, 2 represents oval.
Based on these dimensions, AI scores and classifies the words:
- Watermelon: Color=2 (green), Shape=1 (round)
- Strawberry: Color=1 (red), Shape=2 (oval)
- Tomato: Color=1 (red), Shape=1 (round)
- Cherry: Color=1 (red), Shape=1 (round)
From these scores, we can see how the words are classified along different dimensions. For example, “tomato” and “cherry” have identical scores for color and shape, indicating they are similar in these dimensions; “strawberry” and “watermelon” differ in both dimensions, indicating their meanings are further apart.

Of course, the dimensions are not limited to two; AI can also score words by size, sweetness, whether they have seeds, and many other dimensions. As long as there are enough dimensions and the scoring is accurate, the model can capture the meaning of a word more precisely.
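To make the scoring concrete, here is a tiny sketch that stores the four fruits as score vectors and measures how far apart they are; the Euclidean distance is our own illustrative choice, not something the article prescribes.

```python
# Each word becomes a vector of scores: [color, shape],
# with color 1=red, 2=green and shape 1=round, 2=oval.
import numpy as np

words = {
    "watermelon": np.array([2.0, 1.0]),
    "strawberry": np.array([1.0, 2.0]),
    "tomato":     np.array([1.0, 1.0]),
    "cherry":     np.array([1.0, 1.0]),
}

def distance(a, b):
    """Euclidean distance: smaller means more similar in these dimensions."""
    return float(np.linalg.norm(words[a] - words[b]))

print(distance("tomato", "cherry"))          # 0.0  -> identical scores
print(distance("watermelon", "strawberry"))  # ~1.41 -> differ in both dimensions
```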
In today’s advanced AI models, the number of dimensions often reaches into the thousands.

Learning words and representing them as quantifiable scores completes only the first step. Next, AI must understand collections of words: sentences. We know that the same word can mean different things in different sentences. For example:
- This is a green hat.
- A certain company is dedicated to building a green data center.
In different sentences, the word “green” has different meanings. How does AI know they have different meanings?
This is thanks to the “self-attention” mechanism in the Transformer architecture.
Simply put, when AI processes a sentence, it does not just understand each word in isolation; it also “looks at” the surrounding words. The degree to which one word relates to the other words in the sentence is called “attention,” and because each word attends to words within its own sentence, the mechanism is called “self-attention.”
Therefore, in the Transformer architecture, the process can be divided into the following two steps:
- Convert each word into a vector. This vector represents the position of the word in multi-dimensional space, reflecting various features of the word.
- Use the self-attention mechanism to focus on different parts of the sentence. It can consider the information from other words in the sentence while processing each word.
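Here is a minimal sketch of those two steps: a toy scaled dot-product self-attention in NumPy. The random projection matrices stand in for the learned weights a real Transformer would use.

```python
# Toy self-attention: each word's vector is updated by a weighted
# mix of every word in the sentence; the weights are the "attention".
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 4, 8                    # a 4-word sentence, 8-dim word vectors
x = rng.normal(size=(n_words, d))    # step 1: each word as a vector

# Random stand-ins for the learned query/key/value projections
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)        # how strongly each word "looks at" the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                 # step 2: each word's new vector mixes in context

print(weights.round(2))              # each row sums to 1: one word's attention
```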
Step 4: Producing Output – Content Generation
Once AI has understood a large number of words and sentences, it can generate content. But how?

This becomes a question of probability. Let me ask you a question: “I eat × in a restaurant.” What would you fill in for ×? Based on your experience, you would probably fill in “rice.” But × could also be “cake,” “noodles,” “egg,” and so on.
Like humans, generative AI assigns probabilities to these candidate words based on the experience it gained in step three, then selects a word, typically the most probable one, as the generated content. It then repeats this process, choosing the next word to extend the content.

However, sometimes we want diverse answers. Returning to the example above, what if we do not want AI to fill in “rice”? For this, AI provides a tuning parameter called temperature, typically ranging from 0 to 1:
- At a temperature of 0, AI almost always picks the highest-probability word; in the example above, it would choose “rice.”
- At a temperature of 1, AI samples much more freely, giving lower-probability words a real chance; in the example above, it might choose “cake.”
The closer the value is to 1, the more imaginative the content becomes. For example, with the temperature set to 0.8, AI might generate: “I eat a cake in the restaurant; this cake is big and round, and I want to wear it around my neck…”
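Here is a minimal sketch of temperature sampling for the restaurant example; the candidate words and scores are made up for illustration.

```python
# Temperature reshapes the word probabilities before one word is drawn.
import numpy as np

words = ["rice", "noodles", "egg", "cake"]
logits = np.array([3.0, 2.0, 1.0, 0.5])   # model scores: "rice" is most likely

rng = np.random.default_rng(0)

def sample(temperature):
    t = max(temperature, 1e-6)            # avoid division by zero at t = 0
    p = np.exp(logits / t)                # low t sharpens, high t flattens
    p /= p.sum()
    return rng.choice(words, p=p)

print(sample(0.1))   # almost certainly "rice"
print(sample(1.0))   # "cake" or "egg" now have a real chance
```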
However, most AI products only give us a dialogue box. How do we adjust the temperature? The answer lies in “prompts,” the instructions we type into that box:
- If your input is “You are an expert in a certain field; please write a literature review about xx in a rigorous tone,” AI behaves as if its temperature were close to 0, choosing high-probability words to build its sentences.
- If your input is “Please imagine the future of xx,” AI behaves as if its temperature were close to 1, choosing lower-probability words and generating unexpected content.
Now you understand why prompts matter! We can therefore think of the essence of AI generation as a game of word association: AI selects the next word based on the current words, the probabilities it has learned, and your expectations.
Of course, the internal principles of generative AI are far more complex than what I have described. This is just a basic introduction.
4
Where Is Generative AI Headed?
So will generative AI truly achieve general artificial intelligence and replace humans? Currently, there are two perspectives:
- Optimists: Led by OpenAI CEO Sam Altman and NVIDIA CEO Jensen Huang, optimists are very bullish on the future of generative AI. They have said that “in a few years, artificial intelligence will be stronger and more mature than it is now; in ten years, it will surely shine,” and that “AI may surpass human intelligence within 5 years.”
- Pessimists: Led by deep learning pioneer Yann LeCun, pessimists have long argued that generative AI cannot lead to general artificial intelligence. He has said on multiple occasions that “large language models like ChatGPT will never reach human intelligence levels,” and that “AI trained by humans can hardly surpass humans.”
So how should we ordinary people treat generative AI? I believe we should treat it as a tool: learn to use it to improve our work efficiency, enrich our daily lives, stay curious about the world, and fully enjoy the convenience technology brings!
Source: ZTE Document
Original Title: This Is What Generative AI Really Is!!
Editor: K.Collider
Reproduced content only represents the author’s views
Does not represent the position of the Institute of Physics, Chinese Academy of Sciences
For reprint requests, please contact the original public account