Scientists Achieve Dynamic Reasoning Selection in Large Models, Surpassing Static Techniques

In recent years, enhancing the reasoning capabilities of large models has garnered widespread attention. For instance, OpenAI’s o1, a reasoning-enhanced large model, has attracted significant interest from the AI community.

Yuerong Yue, a doctoral researcher at George Mason University, and his team noted that many previous studies have demonstrated the effectiveness of various prompting strategies in helping large models reason, such as prompting the model to think step by step, to reflect before answering, or to write programs to solve problems.

Figure | Yuerong Yue (Source: Yuerong Yue)

However, these methods typically apply a static, predefined reasoning action path uniformly across all questions, for example requiring step-by-step thinking and reflection for every question.

This overlooks two points. First, the optimal reasoning action may vary with the characteristics of each question: when solving an equation, adding a verification step afterward can help, whereas for a knowledge-based question, self-verification by the model may yield no improvement. Second, different large models may be suited to different reasoning actions; a model trained primarily on code, for instance, may do better by writing code to solve the problem.

Therefore, the researchers set out to enable large models to learn to dynamically select reasoning actions based on the specifics of each question and on their own capabilities.

In a recent paper, they proposed DOTS, a method that enables large models to reason dynamically via optimal reasoning trajectory search.

This method involves three key steps: i) defining atomic reasoning action modules that can be combined into various reasoning action trajectories; ii) allowing the target large model to iteratively explore and evaluate to find the optimal action trajectory for each training question; iii) using the collected optimal trajectories to train the large model to plan reasoning trajectories for unseen questions.
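The trajectory search in step ii can be pictured as an enumerate-and-score loop over combinations of atomic actions. The sketch below is illustrative only: the action names, the three-slot split, the `solve` callback, and the scoring by repeated sampling are assumptions, not the paper's exact modules or procedure.

```python
from itertools import product

# Hypothetical atomic reasoning actions, grouped into three slots
# (names are illustrative, not the paper's exact module set).
ANALYSIS_ACTIONS = ["default", "rewrite_query", "decompose"]
SOLUTION_ACTIONS = ["chain_of_thought", "write_code"]
VERIFICATION_ACTIONS = ["none", "self_verify"]

def search_optimal_trajectory(question, gold_answer, solve, n_trials=3):
    """Enumerate action combinations and keep the one that solves the
    training question most reliably.

    `solve(question, trajectory)` stands in for running the target LLM
    with the given reasoning actions and returning its final answer.
    """
    best_traj, best_score = None, -1.0
    for traj in product(ANALYSIS_ACTIONS, SOLUTION_ACTIONS, VERIFICATION_ACTIONS):
        # Score = fraction of sampled attempts that reach the gold answer.
        score = sum(solve(question, traj) == gold_answer
                    for _ in range(n_trials)) / n_trials
        if score > best_score:
            best_traj, best_score = traj, score
    return best_traj, best_score
```

In practice the search would be stochastic and pruned rather than an exhaustive sweep, but the idea is the same: each training question gets the trajectory that empirically works best for this particular model.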

They also proposed two learning paradigms. For closed-source large models such as the GPT series, they fine-tune an external model as a planner to guide the closed-source model; for open-source large models, they fine-tune the model itself so that it internalizes the ability to plan reasoning actions.

(Source: arXiv)

Experiments on multiple reasoning tasks showed that their method consistently outperforms static reasoning techniques and vanilla instruction tuning. Further analysis indicates that the method lets large models adjust their computation to the complexity of the question, allocating deeper thinking and reasoning to more challenging problems.

Recently, the related paper titled “DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search” has been accepted by the International Conference on Learning Representations (ICLR) 2025.

Figure | Related paper (Source: arXiv)

Reviewers noted that this paper presents a dynamic reasoning method that allows models to determine appropriate atomic actions based on the characteristics of input questions, and it conducted comprehensive experiments to demonstrate the effectiveness of the proposed method.

The core of the DOTS method lies in dynamically searching for the optimal reasoning path. This dynamic reasoning capability offers unique advantages in scenarios that demand complex reasoning and flexibility across different problems. In an intelligent assistant, for instance, users may pose anything from a very simple question such as “What is the weather today?” to a highly specialized one; DOTS can improve the interaction experience by adjusting the reasoning path to match.

Additionally, the DOTS method can be considered a way to collect high-quality training data, which can also be used to enhance reasoning capabilities in future large model post-training.

This research began during Yuerong Yue’s internship at Tencent’s Seattle AI Lab, under the guidance of Dr. Wenlin Yao (now a senior applied scientist at Amazon).

The reasoning capabilities of large models have always been a hot topic in both academia and industry, so their initial goal was to explore how to further enhance this critical capability.

Initially, they examined the mainstream methods for enhancing large model reasoning, including prompt engineering and instruction tuning. During this analysis, they gradually identified a limitation of existing methods: they often skip a crucial step, namely letting the model actively think about how to answer before answering.

Just as humans decide whether to use a computational tool when facing a complex math problem, or consciously verify whether a proposed solution is valid when playing the 24-point game, a model should be able to choose how to reason. Existing large models, especially open-source ones, lack this flexible thinking pattern.

They recognized that the root of this problem lies in the training data. Traditional training data typically contains only questions and answers, with little guidance on how to select and apply reasoning strategies. Training data for a math problem, for example, may show only the steps of the solution: the model sees the correct answer but never explores which reasoning behaviors, such as decomposing the problem or verifying intermediate results, actually help reach it.

Based on this reflection, they conceived a new method: given training data, let large models autonomously explore the possible combinations of reasoning actions and learn the best strategies from the outcomes. Facing a new question, the model can then solve it by selecting reasoning actions such as problem decomposition, code use, and result verification.

They guided the large model to learn how to predict the best reasoning path based on the results of their attempts, thereby optimizing its reasoning capability.
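One plausible way to turn searched trajectories into supervision for this prediction step is to build fine-tuning records in which the target output states the plan before the solution. This is a hedged sketch: the record fields and the `Plan:`/`Solution:` prompt format are my assumptions, not the paper's actual data format.

```python
def build_training_record(question, best_trajectory, solution_text):
    """Convert a searched optimal trajectory into a supervised fine-tuning
    record: the target completion plans the reasoning actions first,
    then executes them."""
    plan = " -> ".join(best_trajectory)
    return {
        "prompt": question,
        "completion": f"Plan: {plan}\nSolution: {solution_text}",
    }
```

Fine-tuning on such records would internalize the planning step (the open-source paradigm); for closed-source models, similar records could instead train an external planner that emits only the plan line.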

In their research, they continually adjusted and refined the method. For instance, when initial experiments showed minimal improvement, they reflected on whether to provide clearer guidance for the large model—such as through explanations to help it understand and learn reasoning actions.

After multiple improvements, they conducted extensive testing across various datasets and settings, demonstrating that the reasoning capabilities of large models improved under different conditions. The success of the experiments not only validated the effectiveness of their method but also highlighted the immense potential of large models: they can be trained to possess the ability to think deeply and autonomously plan reasoning actions.

Going forward, they hope to train on larger datasets, integrating more reasoning actions while exploring how to better utilize the results obtained from searches.

Currently, Yuerong Yue is a doctoral student at George Mason University, under the supervision of Professor Ziyu Yao, focusing on designing efficient, safe, and economical large model agents to handle complex reasoning tasks.

References:

1. https://arxiv.org/pdf/2410.03864

Operations/Typesetting: He Chenlong