The MLNLP Community is a well-known machine learning and natural language processing community in China and abroad, covering NLP master's and doctoral students, university faculty, and industry researchers. The community's vision is to promote communication and progress between academia and industry in natural language processing and machine learning, especially for beginners.

Reprinted from | Zhihu
Author | Glan

The following hot terms are neither limited to 2024 nor a complete list for 2024. Let's see how many you recognize.

[ MoE, Agent/Agentic, Sora, GraphRAG, GPT-4o, o1, ORM, PRM, test-time compute, Inference Scaling Laws, MCTS, Self-Play, Self-Rewarding, RFT, PPO, DPO, GRPO ...... ]
A disclaimer up front: the "Chinese" field gives the most common Chinese name, or a literal translation if there is none; the "Related" field lists similar items, not necessarily all of them; the "Source" is whatever could be found on the internet and may not be accurate; the "Hot Take" field is just for fun, one person's opinion.
Feel free to share your opinions in the comments. Let's begin.
MoE
Full Name: Mixture-of-Experts
Chinese: 混合专家(模型)
Source: The concept was first proposed by Hinton and colleagues in 1991 in the paper "Adaptive Mixtures of Local Experts" [1]. It took off after the release of GPT-4 in March 2023, when hackers speculated that it used an MoE architecture. In December 2023, Mistral AI released the first open-source MoE model, Mixtral-8x7B [2], and in January 2024 DeepSeek released the first domestic open-source MoE model, DeepSeekMoE [3].
Hot Take: 2024 opened with DeepSeekMoE, and it looked as if the MoE architecture would shine all year, but o1 stole the spotlight. By the end of the year, DeepSeek-V3 [4] somewhat restored MoE's reputation. But V3 is huge, and its download count has already reached 155K [5]. Is everyone that wealthy now?
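To make the term concrete, here is a minimal sketch of the core MoE idea: a learned gate routes each input to its top-k experts and mixes their outputs. The layer sizes, the softmax gate, and top-2 routing are assumptions for this toy example, not the configuration of any particular model.

import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Toy Mixture-of-Experts layer: route input x to its top-k experts.

    x       : (d,) input vector
    experts : list of callables, each mapping (d,) -> (d,)
    gate_w  : (d, n_experts) gating weights
    """
    logits = x @ gate_w                          # one gating score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax gate
    top = np.argsort(probs)[-top_k:]             # indices of the top-k experts
    weights = probs[top] / probs[top].sum()      # renormalize over the selected experts
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Tiny usage example with random linear "experts".
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(rng.normal(size=d), experts, gate_w))

Only the selected experts are evaluated, which is why MoE models can grow total parameter count without growing per-token compute proportionally.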
Agentic
Chinese: 智能体化
Source: To discuss Agentic one must first mention Agent, a term that has been around for a long time. In the LLM field it was first laid out comprehensively in the June 2023 blog post "LLM Powered Autonomous Agents" [6] by OpenAI's Lilian Weng. Later, in December 2023, OpenAI's report "Practices for Governing Agentic AI Systems" [7] used the term Agentic.
Hot Take: Agent/Agentic applications were expected to flourish in 2024, yet by year's end the impact was modest. Anthropic's late-2024 report "Building effective agents" [8], however, was refreshingly realistic. Right now most agents are essentially Workflow + Prompt, but the story still has to be told under the name "Agent".
Sora
Source: In February 2024, OpenAI unveiled this video generation model.
Related: June 2024 Kuaishou Kling, July Zhipu Qingying, August MiniMax video-01, September ByteDance PixelDance and Seaweed.
Hot Take: At the end of 2024 I could finally pay to try Sora, but the experience fell well short of what was touted at the beginning of the year!
GraphRAG
Chinese: 图检索增强生成
Source: Proposed by Microsoft in April 2024 in "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" [9].
Related: The concept of RAG was first proposed by Meta in 2020 in "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" [10]. Countless RAG variants keep appearing; they will not be enumerated here.
Hot Take: RAG has become the standard recipe for problems such as stale knowledge, hallucination, and missing domain expertise in LLMs. The RAG direction stayed hot in 2024 because it can actually be deployed. But is the graph approach the right way to unlock RAG? A sketch of the plain (non-graph) RAG loop that GraphRAG builds on follows below.
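Here is a minimal sketch of the basic retrieve-then-generate loop. The relevance score, the in-memory corpus, and the llm_generate callable are placeholders; a real system would use a trained embedding model, a vector store, and an actual LLM API.

def score(query, doc):
    """Placeholder relevance: word overlap between query and document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def retrieve(query, corpus, k=2):
    """Return the k documents most relevant to the query."""
    return sorted(corpus, key=lambda doc: -score(query, doc))[:k]

def rag_answer(query, corpus, llm_generate):
    """Stuff the retrieved context into the prompt, then ask the model."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_generate(prompt)

corpus = [
    "RAG retrieves relevant documents and feeds them to the model as context.",
    "MoE routes each token to a small subset of expert networks.",
    "PPO is a policy-gradient algorithm with a clipped objective.",
]
# Stub "LLM" that just echoes its prompt; swap in a real model call in practice.
print(rag_answer("what does RAG do", corpus, llm_generate=lambda p: p))

GraphRAG, roughly speaking, replaces the flat document list with an entity graph plus community summaries, so retrieval can also answer global, corpus-level questions rather than only local ones.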
GPT-4o
Source: In May 2024, OpenAI released this multimodal model.
Related: July 2024 Step-1.5V, September Meta Llama 3.2, September Mistral AI Pixtral 12B, October Alibaba Qwen2-VL, October Baichuan-Omni.
Hot Take: 4o is end-to-end multimodal, but looking back at 2024, multimodal still has a long way to go. The AGI of the future will surely be multimodal, but today's AGI is still text-based.
o1
Source: In September 2024, OpenAI released this reasoning model.
Related: November 2024 Alibaba QwQ-32B-Preview, November DeepSeek-R1-Lite, November Moonshot AI k0-math, December Zhipu GLM-Zero-Preview.
Hot Take: What a year 2024 turned out to be!
Next come the hot terms related to o1, since everyone spent the second half of the year studying it
ORM; PRM
Full Name: Outcome-supervised Reward Model; Process-supervised Reward Model
Chinese: 结果监督奖励模型;过程监督奖励模型
Source: Proposed by OpenAI as early as May 2023 in "Let's Verify Step by Step" [11].
Hot Take: Ever since o1 appeared, everyone has been trying to decrypt it. A PRM is likely one of its core ingredients, so people have started training PRMs. OpenAI, however, has 800K step-level labels; that dataset is open-source, but how much more is not?
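A minimal sketch of the difference between the two reward models: an ORM scores only the final answer, while a PRM scores every intermediate step (aggregated here with min, one common choice). The scorers below are stand-ins for trained reward models.

def orm_score(solution, score_answer):
    """Outcome-supervised: one scalar for the final answer only."""
    return score_answer(solution[-1])

def prm_score(solution, score_step):
    """Process-supervised: one scalar per reasoning step, then aggregate (here: min)."""
    step_scores = [score_step(step) for step in solution]
    return min(step_scores), step_scores

# Toy solution: a list of reasoning steps ending in an answer.
solution = ["3 apples + 4 apples = 7 apples",
            "7 apples * 2 = 14 apples",
            "Answer: 14"]

# Stand-in scorers; real ORMs/PRMs are trained models that output correctness probabilities.
score_answer = lambda final: 1.0 if "14" in final else 0.0
score_step = lambda step: 0.9 if ("=" in step or "Answer" in step) else 0.5

print("ORM:", orm_score(solution, score_answer))
print("PRM:", prm_score(solution, score_step))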
train-time compute; test-time compute
Chinese: 训练时计算量;测试时计算量
Source: Mentioned by OpenAI in the September 2024 blog post "Learning to reason with LLMs" [12].
Hot Take: Quoting the original text: "We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute)." Time spent really does buy performance.
Inference Scaling Laws/Test-Time Scaling
Chinese: 推理扩展定律
Source: After the release of o1, the inference-time version of Scaling Laws took off. The exact origin is unclear, but this paper does the empirical work well: "Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving" [13].
Hot Take: A new phase has opened, and Jensen Huang's diagram of the three scaling phases captures it nicely.
[Figure: Jensen Huang's three scaling phases diagram]
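A minimal illustration of spending more test-time compute: draw N candidate solutions and keep the one a verifier scores highest (best-of-N). The sampler and the verifier below are toy stand-ins; the only point is that accuracy tends to rise as N grows.

import random

def best_of_n(sample_solution, verifier_score, n):
    """Spend more test-time compute: draw n samples and keep the best-scored one."""
    candidates = [sample_solution() for _ in range(n)]
    return max(candidates, key=verifier_score)

# Toy task: the "model" guesses the answer to 17 * 24 and is right only 30% of the time.
random.seed(0)
def sample_solution():
    return 408 if random.random() < 0.3 else random.randint(100, 900)

# Toy verifier: it cannot solve the problem, but it can check divisibility by 17 and 24.
def verifier_score(ans):
    return (ans % 17 == 0) + (ans % 24 == 0)

for n in (1, 4, 16, 64):
    hits = sum(best_of_n(sample_solution, verifier_score, n) == 408 for _ in range(200))
    print(f"N={n:3d}  accuracy={hits / 200:.2f}")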
MCTS
Full Name: Monte Carlo Tree Search
Chinese: 蒙特卡洛树搜索
Source: First proposed in 2006 in "Bandit based Monte-Carlo Planning" [14].
Hot Take: Does o1 actually use it?
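Since the term keeps coming up in o1 speculation, here is a compact, generic UCT-style MCTS sketch on a toy counting game (players alternately add 1, 2, or 3; whoever reaches 21 wins). It only illustrates the four phases (selection, expansion, simulation, backpropagation); it is not a claim about how o1 works.

import math, random

TARGET = 21  # players 0 and 1 alternately add 1-3; whoever makes the total exactly 21 wins

def legal_moves(total):
    return [m for m in (1, 2, 3) if total + m <= TARGET]

def is_terminal(total):
    return total == TARGET

class Node:
    def __init__(self, total, player, parent=None):
        self.total, self.player, self.parent = total, player, parent
        self.children = []
        self.untried = legal_moves(total)
        self.visits = 0
        self.wins = 0.0   # wins for the player who moved INTO this node (= 1 - self.player)

    def uct_child(self, c=1.4):
        # UCT: exploitation (win rate) + exploration (visit-count bonus)
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(total, player):
    """Play uniformly random moves to the end; return the winner (0 or 1)."""
    while not is_terminal(total):
        total += random.choice(legal_moves(total))
        player = 1 - player
    return 1 - player   # the player who just reached 21 wins

def mcts_best_move(root_total, root_player, iterations=3000):
    root = Node(root_total, root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one unexplored child, if any.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.total + move, 1 - node.player, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from this node.
        winner = rollout(node.total, node.player)
        # 4. Backpropagation: update visit and win counts along the path.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:
                node.wins += 1
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return best.total - root_total

random.seed(0)
# From a total of 16 the winning move is to add 1 (reach 17), so the opponent can neither
# reach 21 directly nor avoid leaving it within reach.
print("MCTS suggests adding:", mcts_best_move(16, root_player=0))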
Speculated o1 reasoning paradigm: SA, MR, DC, SR, CI, EC
Source: A paper speculating on o1's reasoning patterns: "A Comparative Study on Reasoning Patterns of OpenAI's o1 Model" [15].
Hot Take: You keep yours closed-source; we'll keep doing our research.
Next are several “self” terms
Self-Play
Chinese: 自博弈
Source: The first wave of popularity came with AlphaGo's 2016 match against Lee Sedol; this time it became popular again thanks to o1. Self-Play is a classic concept in traditional RL, and the August 2024 survey "A Survey on Self-play Methods in Reinforcement Learning" [16] covers it well.
Hot Take: NLP people now working on LLMs should embrace traditional RL.
Self-Rewarding
Chinese: 自我奖励
Source: Proposed by Meta in January 2024 in "Self-Rewarding Language Models" [17].
Hot Take: The idea is that humans no longer need to label data; the LLM acts as its own judge. It still feels like a long road ahead.
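A minimal sketch of the self-rewarding loop described above: the same model generates candidate responses, judges them itself, and the best/worst pair becomes preference data for the next training round. All of the callables (generate, judge, train_dpo) are placeholders standing in for real sampling, LLM-as-a-judge prompting, and a preference-optimization update.

import random

def self_rewarding_round(model, prompts, generate, judge, train_dpo, k=4):
    """One iteration: the model labels its own outputs and learns from the resulting pairs."""
    preference_pairs = []
    for prompt in prompts:
        candidates = [generate(model, prompt) for _ in range(k)]            # sample k responses
        ranked = sorted(candidates, key=lambda r: judge(model, prompt, r))  # model judges itself
        preference_pairs.append((prompt, ranked[-1], ranked[0]))            # (prompt, chosen, rejected)
    return train_dpo(model, preference_pairs)                               # e.g. a DPO update

# Usage with trivial stubs, just to show the data flow.
random.seed(7)
model = {"step": 0}
generate = lambda m, p: f"{p}: draft {random.randint(0, 9)}"
judge = lambda m, p, r: int(r.split()[-1])      # stand-in judge: prefers the higher draft number
train_dpo = lambda m, pairs: {**m, "step": m["step"] + 1, "pairs_seen": len(pairs)}
print(self_rewarding_round(model, ["q1", "q2"], generate, judge, train_dpo))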
Self-Correct/Correction
Chinese: 自我纠错
Source: The concept emerged soon after LLMs did; in September 2024 DeepMind used it in "Training Language Models to Self-Correct via Reinforcement Learning" [18].
Hot Take: DeepMind released this paper eight days after o1, yet it seems to have received little attention.
Self-Refine
Chinese: 自我优化
Source: Generally refers to the March 2023 Carnegie Mellon University paper "Self-Refine: Iterative Refinement with Self-Feedback" [19].
Hot Take: It has become a baseline in many papers.
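The Self-Refine recipe is a simple loop: generate an answer, ask the model to critique it, regenerate with the feedback, and repeat until the critique says to stop. Below is a minimal sketch with a placeholder llm callable and made-up prompts, not the paper's exact prompting setup.

def self_refine(task, llm, max_iters=3, stop_token="LOOKS GOOD"):
    """Iteratively improve an answer using the model's own feedback."""
    answer = llm(f"Task: {task}\nGive an initial answer.")
    for _ in range(max_iters):
        feedback = llm(f"Task: {task}\nAnswer: {answer}\n"
                       f"Critique this answer, or say '{stop_token}' if it needs no changes.")
        if stop_token in feedback:
            break
        answer = llm(f"Task: {task}\nAnswer: {answer}\nFeedback: {feedback}\n"
                     f"Rewrite the answer, addressing the feedback.")
    return answer

# Stub "LLM" that critiques once and then approves, just to exercise the loop.
calls = {"n": 0}
def fake_llm(prompt):
    calls["n"] += 1
    if "Critique" in prompt:
        return "LOOKS GOOD" if calls["n"] > 3 else "Too vague, name the key terms."
    return f"draft version {calls['n']}"

print(self_refine("summarize the 2024 hot terms", fake_llm))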
Self-Reflection
Chinese: 自我反思
Source: This one gets mentioned a lot; a few good papers:
"Self-Reflection in LLM Agents: Effects on Problem-Solving Performance" [20];
"Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection" [21];
"Towards Mitigating Hallucination in Large Language Models via Self-Reflection" [22].
Hot Take: What is reflection for a human, and what is reflection for an LLM?
Self-Consistency
Chinese: 自我一致性
Source: Generally refers to the 2023 Google paper "Self-Consistency Improves Chain of Thought Reasoning in Language Models" [23].
Hot Take: Looking forward to more practical "Self-" methods, since humans prefer low-effort work and would rather not do things themselves (data cleaning, for instance).
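Self-Consistency in one line: sample several chains of thought, extract each final answer, and return the majority answer. A minimal sketch with a stub sampler (the real method samples a language model with temperature):

import random
from collections import Counter

def self_consistency(sample_chain, extract_answer, n=10):
    """Sample n reasoning chains and return the most common final answer."""
    answers = [extract_answer(sample_chain()) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub: a noisy "model" that reasons its way to the right answer (42) only 60% of the time.
random.seed(1)
sample_chain = lambda: f"...steps... final answer: {42 if random.random() < 0.6 else random.choice([41, 43, 44])}"
extract_answer = lambda chain: chain.rsplit(":", 1)[1].strip()

print(self_consistency(sample_chain, extract_answer))   # usually prints 42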
RFT
Full Name: Reinforcement Fine-Tuning
Chinese: 强化微调
Source: Announced on Day 2 of OpenAI's "12 Days of OpenAI" live stream; here is the live-stream video [24], and here is the application form [25]. Note the difference from ByteDance's ReFT (is there really one?); OpenAI's official abbreviation is RFT. Quoting the stream:
"Today, we're excited to introduce a new way of model customization for our O1 series: reinforcement fine-tuning, or RFT for short."
ReFT
Full Name: Reinforced Fine-Tuning
Chinese: 强化微调
Source: Proposed by ByteDance in January 2024 in "ReFT: Reasoning with Reinforced Fine-Tuning" [26].
Hot Take: From what OpenAI has disclosed so far, the principle should not differ much from ByteDance's ReFT, but OpenAI's framing is brilliant: PPO ≈ RFT, Reward Model ≈ Verifier. In specialized domains where the answer is fixed and the verifier is easy to define, if "a few dozen examples" really are that effective, it is exciting. Hopefully we will not have to wait until the end of 2025 to try it, as with Sora... A sketch of the "verifier as reward" idea follows below.
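To illustrate the Reward Model ≈ Verifier point: when the task has a checkable answer, the reward can simply be whether the model's extracted answer matches the reference, and that scalar feeds whatever RL update is used (PPO, GRPO, and so on). This is a hedged sketch of the general idea, not OpenAI's or ByteDance's actual pipeline; the answer-extraction regex and the stub sampler are assumptions.

import random
import re

def verifier_reward(model_output, reference_answer):
    """Binary reward: 1.0 if the extracted numeric answer matches the reference."""
    match = re.search(r"answer\s*[:=]\s*(-?\d+(?:\.\d+)?)", model_output.lower())
    if match is None:
        return 0.0
    return 1.0 if abs(float(match.group(1)) - float(reference_answer)) < 1e-6 else 0.0

def rft_step(prompt, reference, sample, k=8):
    """Sample k completions and grade them with the verifier; an RL algorithm would
    then turn these (completion, reward) pairs into a policy update."""
    completions = [sample(prompt) for _ in range(k)]
    return [(c, verifier_reward(c, reference)) for c in completions]

# Usage with a stub sampler.
random.seed(3)
sample = lambda p: f"reasoning... answer: {random.choice([12, 13, 14])}"
for completion, reward in rft_step("What is 7 + 6?", 13, sample, k=4):
    print(reward, completion)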
Next are several “O” terms
PPO
Full Name: Proximal Policy Optimization
Chinese: 近端策略优化
Source: Proposed by OpenAI in 2017 in "Proximal Policy Optimization Algorithms" [27].
Hot Take: The ancestor of all the "O" terms below.
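The heart of PPO is the clipped surrogate objective: push the policy toward higher-advantage actions, but clip the probability ratio so one update cannot move it too far from the old policy. A minimal per-sample sketch (no value loss, entropy bonus, or GAE):

import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate loss for one action (averaged over a batch and minimized in practice)."""
    ratio = math.exp(logp_new - logp_old)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    return -min(unclipped, clipped)                    # pessimistic bound, negated for minimization

# Example: the new policy likes a good action 30% more than the old one did,
# but the clip at 1 + eps = 1.2 limits how much credit it gets.
print(ppo_clip_loss(logp_new=math.log(0.26), logp_old=math.log(0.20), advantage=1.5))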
DPO
Full Name: Direct Preference Optimization
Chinese: 直接偏好优化
Source: Proposed by Stanford in 2023 in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" [28].
Hot Take: Its arrival was a godsend for the small shops!
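DPO skips the explicit reward model and optimizes preferences directly from log-probabilities under the policy and a frozen reference model. A minimal numeric sketch of the per-pair loss (chosen response y_w preferred over rejected y_l):

import math

def dpo_loss(logp_pi_w, logp_pi_l, logp_ref_w, logp_ref_l, beta=0.1):
    """-log sigmoid(beta * [(logpi_w - logref_w) - (logpi_l - logref_l)]) for one pair."""
    margin = (logp_pi_w - logp_ref_w) - (logp_pi_l - logp_ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))   # -log sigmoid(beta * margin)

# Example: the policy already prefers the chosen response a bit more than the reference does,
# so the loss is somewhat below log(2) ~= 0.693.
print(dpo_loss(logp_pi_w=-12.0, logp_pi_l=-15.0, logp_ref_w=-12.5, logp_ref_l=-14.8, beta=0.1))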
GRPO
Full Name: Group Relative Policy Optimization
Source: Proposed by DeepSeek in February 2024 in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" [29].
Hot Take: Elegant, practical, and efficient.
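GRPO's key simplification over PPO is dropping the value network: each sample's advantage is computed relative to the group of rollouts drawn for the same prompt (reward minus the group mean, divided by the group standard deviation). A minimal sketch of that advantage computation; the clipped policy update and the KL term to the reference model from the paper are omitted here.

import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its prompt's group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Example: 6 sampled answers to the same math prompt, graded 1 (correct) or 0 (wrong).
# Correct samples get positive advantages, wrong ones negative.
print(grpo_advantages([1, 0, 0, 1, 0, 0]))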
Some common “O” terms that have already been implemented
ORPO
Full Name: Odds Ratio Preference Optimization
Source: Proposed by KAIST AI in March 2024 in "ORPO: Monolithic Preference Optimization without Reference Model" [30].
KTO
Full Name: Kahneman-Tversky Optimization
Source: Proposed in February 2024 in "KTO: Model Alignment as Prospect Theoretic Optimization" [31].
SimPO
Full Name: Simple Preference Optimization
Source: Proposed in May 2024 in "SimPO: Simple Preference Optimization with a Reference-Free Reward" [32].
RLOO
Full Name: REINFORCE Leave-One-Out
Source: Proposed by Cohere For AI in February 2024 in "Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs" [33].
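RLOO's trick in one line: with k samples per prompt, each sample's baseline is the mean reward of the other k-1 samples, which gives a variance-reduced REINFORCE advantage without training a value network. A minimal sketch:

def rloo_advantages(rewards):
    """Leave-one-out advantages: each sample's reward minus the mean reward of the others."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Example: 4 rollouts for the same prompt, graded by a reward model or verifier.
print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))   # [0.667, -0.667, -0.667, 0.667]

These advantages then weight the usual REINFORCE log-probability gradient for each sampled completion.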
As 2024 draws to a close, what waves will GPT-5 and o3 bring in 2025? The bigger the waves, the more expensive the fish!

References

To apply to join the Natural Language Processing / PyTorch technical groups, please note: Name - School/Company - Research Direction (e.g., Xiao Zhang - Harbin Institute of Technology - Dialogue Systems).
About Us
The MLNLP Community is a grassroots academic community jointly built by machine learning and natural language processing scholars from China and abroad. It has grown into a well-known machine learning and natural language processing community, aiming to promote progress between academia and industry in machine learning and natural language processing. The community provides an open platform for communication on further education, employment, and research for related practitioners. Everyone is welcome to follow and join us.