AI is short for artificial intelligence.
The word "artificial" sometimes confuses readers, who may assume it relates to "art." In fact, artificial means "man-made" or "synthetic," the opposite of natural.
"Intelligence" is harder to mistake: it is the capacity to learn, understand, and reason.
Combined, then, artificial intelligence is intelligence created through human means.
There are many definitions of AI within the industry. A more academic definition states:
AI is a comprehensive science that studies and develops theories, methods, technologies, and application systems to simulate, extend, and enhance human intelligent behavior.
This definition is quite convoluted and can be overwhelming.
In fact, we can break down AI for better understanding.
First of all, the essential attribute of AI is that it is a science, a field of technology.
It involves knowledge from various disciplines such as computer science, mathematics, statistics, philosophy, and psychology, but overall, it falls under the category of computer science.
Secondly, the purpose of AI research is to enable a “system” to possess intelligence.
This “system” can be a software program, a computer, or even a robot.
Thirdly, what level constitutes true intelligence?
This is the crux of the issue. Currently, the ability to perceive, understand, think, judge, and decide like a human is considered the realization of artificial intelligence.
When paired with physical carriers like robots and mechanical arms, AI can also achieve mobility.
Combining these three points makes understanding the definition of AI much easier.
█ What Is the Difference Between AI and Ordinary Computers?
AI is still built on the basic functions of computers, relying on semiconductor chip technology (which is why it is often called "silicon-based") as well as conventional computer systems and platforms.
So, how does it differ from traditional computer programs?
A traditional computer program is simply a set of rules. Programmers instruct the computer on the rules through code, and the computer processes input data based on these rules.
For example, consider the classic “if…else…” statement—”If age is greater than 65, then retire; otherwise, continue working.”
Then, the computer program will judge and process all input age data according to this rule.
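As a minimal sketch, here is what such a rule-based program looks like in Python (the age threshold is just the example above):

```python
# A rule-based program: the programmer writes the rule explicitly in code.
def employment_status(age):
    if age > 65:
        return "retire"
    else:
        return "continue working"

print(employment_status(70))  # retire
print(employment_status(40))  # continue working
```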
However, in real life, many inputs (such as images and sounds) are extremely complex and varied, making it hard to write fixed rules that let a computer judge and process them accurately.
For instance, consider determining whether an image shows a dog.
Dogs come in many breeds, each with different colors, sizes, and facial features. Dogs also display different expressions and postures at different times, and they can be in various backgrounds.
Therefore, the dog images a computer captures through a camera are endlessly varied, and it is very difficult to cover them all with a limited set of rules.
To enable computers to achieve intelligence similar to humans, we cannot rely on simple rule-driven methods; instead, we should continuously input data and answers, allowing the system to summarize features and form its own judgment rules.
In other words, in classic program design, what people input are rules (i.e., the program) and data, and the system outputs answers.
In contrast, the computational process of AI is divided into two steps:
The first step involves inputting data and the expected answers, while the system outputs the rules.
The second step applies the outputted rules to new data and then outputs answers.
The first step can be termed “training.” The second step is the actual “work.”
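To make the two steps concrete, here is a toy sketch using scikit-learn (my choice of library, purely for illustration; the point does not depend on it). Step one feeds ages and known answers to a model; step two applies the learned rule to new ages:

```python
from sklearn.tree import DecisionTreeClassifier

# Step 1, "training": input data plus expected answers; the system learns the rule.
ages = [[30], [40], [50], [64], [66], [67], [70], [80]]
answers = ["work", "work", "work", "work", "retire", "retire", "retire", "retire"]
model = DecisionTreeClassifier()
model.fit(ages, answers)

# Step 2, "work": apply the learned rule to new data to get answers.
print(model.predict([[25], [68]]))  # ['work' 'retire']
```

Notice that nobody typed "if age > 65" anywhere; the model recovered that boundary from the examples on its own.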
This is a typical difference between traditional computing programs and current mainstream AI technology. (Note that I am referring to “current mainstream AI.” There are some “historical AI” and “non-mainstream AI” that operate differently and cannot be generalized.)
█ What Categories of AI Exist?
As mentioned earlier, artificial intelligence is a vast scientific field.
Since its official emergence in the 1950s, many scientists have conducted extensive research on AI and produced remarkable results.
These studies fall into schools defined by their technical approaches. The most representative are the symbolic, connectionist, and behaviorist schools.
These schools do not have a right or wrong distinction and often intersect and integrate.
In the early decades (roughly the 1960s through the 1980s), symbolic AI (represented by expert systems and knowledge graphs) was mainstream. From the 1980s onward, connectionism (represented by neural networks) rose and has remained mainstream to this day.
In the future, new technologies may emerge, forming new schools as well.
In addition to technical approach, we can also categorize AI by intelligence level and by application field.
By intelligence level, we can classify AI into: Weak AI, Strong AI, and Super AI.
Weak AI specializes in a single task or a group of related tasks and does not possess general intelligence capabilities. We are currently at this stage.
Strong AI is more advanced, possessing general intelligence: the ability to understand, learn, and adapt across a wide range of tasks. It remains theoretical and has not yet materialized.
Super AI is, of course, the most advanced. It would surpass human intelligence in almost every respect, including creativity and social skills. It is the hypothetical ultimate form, assuming it can be realized at all.
We will discuss the classification of AI by application field later.
█ What Is Machine Learning?
In fact, we have already mentioned machine learning when discussing rule summarization.
The core idea of machine learning is to construct a model that can learn from data and use this model for prediction or decision-making.
Machine learning is not a specific model or algorithm. It encompasses various types, such as the following (the first two are sketched in code after this list):
Supervised Learning: The algorithm learns from labeled datasets, where each training sample has a known result.
Unsupervised Learning: The algorithm learns from unlabeled datasets.
Semi-supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data for training.
Reinforcement Learning: Learns which actions can yield rewards and which actions can lead to penalties through trial and error.
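Here is a toy contrast between the first two types, again using scikit-learn for illustration. The same data is learned once with labels and once without:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])

# Supervised: every training sample comes with a known label.
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.1], [5.1]]))   # [0 1]

# Unsupervised: no labels; the algorithm groups the data on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                    # two clusters; the cluster ids themselves are arbitrary
```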
Deep learning refers specifically to learning with deep neural networks.
It is an important branch of machine learning: within machine learning there is a "neural network" approach, and deep learning is an enhanced version of it.
Neural networks are representative of connectionism. As the name suggests, this pathway mimics the workings of the human brain, establishing connection models between neurons to achieve artificial neural computation.
The “depth” in deep learning refers to the number of hidden layers in the neural network.
Classic neural networks have an input layer, one or two "hidden layers," and an output layer.
Deep learning models use many more hidden layers (sometimes hundreds). The extra depth gives the network the capacity to handle far more complex tasks.
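A rough sketch in PyTorch (layer sizes chosen arbitrarily) of what "adding depth" means in practice:

```python
import torch.nn as nn

# A "classic" shallow network: one hidden layer between input and output.
shallow = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),   # the single hidden layer
    nn.Linear(32, 2),               # output layer
)

# A "deep" network: the same idea, just with many more hidden layers stacked up.
layers = [nn.Linear(10, 64), nn.ReLU()]
for _ in range(20):                 # 20 extra hidden layers
    layers += [nn.Linear(64, 64), nn.ReLU()]
layers += [nn.Linear(64, 2)]
deep = nn.Sequential(*layers)

print(sum(1 for m in deep if isinstance(m, nn.Linear)), "linear layers")  # 22
```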
The relationship is nested: neural networks are one approach within machine learning, and deep learning is in turn a subset of neural network methods.
█ What Are Convolutional Neural Networks and Recurrent Neural Networks?
Since the rise of neural networks in the 1980s, many models and algorithms have been developed. Different models and algorithms have their own characteristics and functions.
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are well-known neural network models that emerged around the 1990s.
Their specific working principles are quite complex. Just remember:
Convolutional Neural Networks (CNNs) are used for processing data with a grid-like structure (such as images and videos). Therefore, they are typically used in computer vision for image recognition and image classification.
Recurrent Neural Networks (RNNs) are used for processing sequential data, such as language models and time series predictions. Thus, they are commonly used in natural language processing and speech recognition.
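A small PyTorch sketch (shapes chosen arbitrarily) showing the kind of data each model consumes:

```python
import torch
import torch.nn as nn

# CNN: grid-like input, e.g. a batch of 8 RGB images of 32x32 pixels.
images = torch.randn(8, 3, 32, 32)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv(images).shape)        # torch.Size([8, 16, 30, 30])

# RNN: sequential input, e.g. 8 sequences of 20 time steps with 10 features each.
sequences = torch.randn(8, 20, 10)
rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
output, hidden = rnn(sequences)
print(output.shape)              # torch.Size([8, 20, 16])
```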
A transformer is also a neural network model. It is younger (introduced by Google Research in 2017) and more powerful than CNNs and RNNs.
As a non-specialist, you do not need to study its working principles; just know:
1. It is a deep learning model;
2. It uses a mechanism called self-attention (sketched after this list);
3. It effectively resolves the bottleneck (limitations) issues of CNNs and RNNs;
4. It is well-suited for natural language processing (NLP) tasks. Compared to RNNs, its computations can be highly parallelized, simplifying model architecture and greatly improving training efficiency;
5. It has also been extended to other fields, such as computer vision and speech recognition;
6. Most of the large models we frequently mention are built on the transformer architecture.
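For the curious, here is a stripped-down NumPy sketch of the self-attention idea from point 2. A real transformer learns separate query, key, and value projections; this toy version skips them to show only the core computation:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a token sequence X of shape (seq_len, d)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # how strongly each token attends to every other token
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X                              # each output token is a weighted mix of all tokens

X = np.random.randn(5, 8)         # 5 tokens, each an 8-dimensional embedding
print(self_attention(X).shape)    # (5, 8)
```

Because every token attends to every other token in one matrix multiplication, this computation parallelizes well, which is exactly the training-efficiency advantage over RNNs mentioned in point 4.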
There are many other types of neural networks beyond these; each has its own niche.

█ What Is a Large Model?

The recent AI boom is largely due to the rise of large models. So, what is a large model?
A large model is a machine learning model with a massive parameter scale and complex computational structure.
Parameters are the variables a model learns and adjusts during training. Their values determine the model's behavior and performance, while their sheer number drives implementation cost and demand for computational resources. In simple terms, parameters are the internal components the model uses to make predictions or decisions.
Large models typically have billions of parameters, sometimes hundreds of billions. Models with far fewer parameters are, by contrast, called small models. For some niche areas or scenarios, small models may suffice.
Large models require extensive data for training and consume substantial computational resources.
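A back-of-the-envelope illustration of why parameter counts balloon (the layer sizes here are hypothetical): a single fully connected layer mapping 1,000 inputs to 1,000 outputs already holds about a million parameters, and large models stack many far wider layers.

```python
# Parameters of one fully connected layer: one weight per input-output
# pair, plus one bias per output.
inputs, outputs = 1000, 1000
params = inputs * outputs + outputs
print(params)   # 1001000: about a million, from a single modest layer
```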
There are many types of large models. Used on its own, the term usually refers to large language models (trained on text data). But there are also large vision models (trained on image data) and multimodal large models (trained on both text and images).
The vast majority of large models are built on the core structure of the transformer and its variants.
By application field, large models can be divided into general large models and industry-specific large models.
General large models have broader training datasets covering more comprehensive fields. Industry-specific large models, as the name suggests, are trained on data from specific industries and applied in specialized areas (such as finance, healthcare, law, and industry).
█ What Is the Essence of GPT?
GPT-1, GPT-2, and so on up to GPT-4o are all large language models launched by the US company OpenAI, all based on the transformer architecture.
The full name of GPT is Generative Pre-trained Transformer.
Generative indicates that the model is capable of generating coherent and logical text content, such as completing dialogues, creating stories, writing code, or composing poetry and songs.
It is worth mentioning that the term AIGC stands for AI Generated Content, which refers to content generated by AI. This content can include text, images, audio, video, etc.
The GPT series focuses on text; Google launched its own influential transformer-based language model, BERT.
For text-to-image generation, notable examples include DALL·E (also from OpenAI), Midjourney (well-known), and Stable Diffusion (open-source).
For text-to-audio (music) generation, there are Suno, Stable Audio Open (Stability AI), and Audiobox (Meta).
For text-to-video generation, there are Sora (OpenAI), Stable Video Diffusion (Stability AI), and open-source alternatives. Images can also be used to generate videos, as with Tencent's Follow-Your-Click.
AIGC is a definition from the “application dimension”; it is not a specific technology or model. The emergence of AIGC has expanded the capabilities of AI, breaking the previous limitations of AI primarily being used for recognition and broadening application scenarios.
Now, let’s continue explaining the second letter of GPT—Pre-trained.
Pre-trained indicates that the model is first trained on a large-scale unlabeled text corpus to learn the statistical rules and latent structures of language.
Through pre-training, the model gains a degree of general capability. The larger and broader the training data (web text, news, and so on), the stronger the model.
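As a toy illustration of "learning statistical rules from unlabeled text" (vastly simplified; real pre-training predicts tokens with a neural network, not a counting table), consider a bigram model:

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# "Pre-training" in miniature: from raw, unlabeled text, count which word
# tends to follow which. No human-written rules, no labels.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# "Generation": start somewhere and repeatedly sample a statistically likely next word.
word, text = "the", ["the"]
for _ in range(5):
    candidates = follows[word]
    if not candidates:               # dead end: nothing ever followed this word
        break
    word = random.choices(list(candidates), weights=list(candidates.values()))[0]
    text.append(word)
print(" ".join(text))                # e.g. "the cat sat on the mat"
```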
The public's attention toward AI surged mainly due to the explosive popularity of ChatGPT, launched in late 2022.
The "Chat" in ChatGPT refers to its conversational interface. ChatGPT is an AI dialogue service that OpenAI built on top of the GPT model (initially GPT-3.5).
Through this service, people can personally experience the power of the GPT model, which is beneficial for the promotion and dissemination of technology.
It has been proven that OpenAI’s strategy was successful. ChatGPT has attracted significant public attention and has successfully driven the development boom in the AI field.
The applications of AI are extremely broad.
In summary, compared to traditional computer systems, AI provides expanded capabilities, including image recognition, speech recognition, natural language processing, and embodied intelligence.
Image recognition falls under computer vision (CV), which enables computers to understand and process images and videos. Common applications include cameras, industrial quality inspection, and facial recognition.
Speech recognition involves understanding and processing audio to extract the information carried by the audio. Common applications include smartphone voice assistants, telephone call centers, and voice-controlled smart homes, often used in interactive scenarios.
Natural language processing, as previously mentioned, enables computers to understand and work with human language, grasping what we are actually saying. It is hugely popular right now, often applied to creative tasks such as writing news articles, drafting documents, video production, game development, and music creation.
Embodied intelligence involves integrating artificial intelligence into a physical form (“body”) to gain and demonstrate intelligence through interaction with the environment.
Robots equipped with AI fall under embodied intelligence.
Stanford University recently unveiled a home-oriented embodied robot called "Mobile ALOHA" that can cook, brew coffee, and even play with cats; it went viral online.
It is worth mentioning that not all robots are humanoid, nor do all robots utilize AI.
AI excels at processing vast amounts of data, learning from massive datasets, and completing tasks humans cannot. Above all, it can identify latent patterns within large datasets.
Currently, AI’s applications across various vertical industries primarily extend from the capabilities mentioned above.
Let’s look at some common examples.
In healthcare, AI can analyze X-rays, CT scans, and MRI images to help identify abnormal areas and even make diagnostic judgments. AI can also identify cell mutations in tissue samples, assisting pathologists in cancer screening and other disease diagnoses.
AI can analyze patient genomic data to determine the most suitable treatment options. It can also predict disease trends based on a patient’s medical history and physiological indicators.
In drug development, AI can help simulate the interactions of chemical compounds, shortening the drug development cycle.
During serious public health events, AI can analyze epidemic data to predict disease transmission trends.
In finance, AI can monitor market dynamics in real-time, identify potential market risks, and formulate corresponding risk hedging strategies.
AI can also assess credit risks by analyzing multidimensional data such as borrowers’ credit histories, income situations, and spending behaviors. Additionally, AI can provide optimal investment portfolio recommendations based on investors’ personal financial situations, risk preferences, and return goals.
There are countless similar examples. In industrial manufacturing, education and tourism, commercial retail, agriculture, forestry, animal husbandry, public safety, and government governance, AI has already established practical scenarios and cases.
AI is changing society and transforming the work and lives of each of us.
The commercial and social value of AI is undeniable. Its rise is also unstoppable.
From the perspective of enterprises, AI can automate repetitive and tedious tasks, improve productivity and quality while reducing production costs and labor costs.
This advantage is crucial for manufacturing and service industries, directly impacting a company’s competitiveness and even survival.
From the government’s perspective, AI can enhance governance efficiency, bring new business models, products, and services, and stimulate the economy.
Powerful AI is also a form of national competitiveness. In the context of technological competition and national defense, if AI technology lags behind others, it could lead to severe consequences.
From an individual perspective, AI can help us accomplish certain tasks and enhance our quality of life.
From the perspective of humanity as a whole, AI can play a significant role in disease treatment, disaster prediction, climate forecasting, and eradicating poverty.
However, everything has two sides. As a tool, AI has both advantages and disadvantages.
The most immediate disadvantage is the potential threat to a large number of human jobs. According to McKinsey's research, between 2030 and 2060, roughly half of today's work activities could gradually be taken over by AI, with knowledge workers especially affected.

In addition, AI could be used to wage war, commit fraud (such as voice imitation or deepfakes for scams), and infringe on civil rights (excessive data collection, invasion of privacy).
If only a few companies possess advanced AI technology, it may exacerbate social inequities. AI algorithm biases could also lead to unfairness.
As AI becomes increasingly powerful, it may create dependency, causing people to lose their ability to think independently and solve problems. The powerful creativity of AI could diminish human motivation and confidence in creativity.
There are also a series of issues surrounding AI development, including security (data breaches, system crashes), ethics, and moral dilemmas.

Currently, we do not have reliable solutions to these problems. Therefore, we can only explore, reflect, and resolve these issues gradually as we develop AI. It is essential to maintain vigilance and precaution regarding AI.
As ordinary people, the most practical approach is to first understand and learn about AI. Start by learning to use common AI tools and platforms to enhance work efficiency and improve quality of life.
As the saying goes: “In the future, it is not AI that will eliminate you, but those who master AI.” Instead of being anxious, it is better to face it bravely and embrace it actively, taking control early on.
Well, that’s all for today’s article. For an ordinary person, knowing these AI fundamentals is the first step to embracing AI. At least when chatting with others about AI, one won’t be completely lost.
Thank you all for your patience in reading, and see you next time!
