A New Google AI Research Proposes Deep-Thinking Ratio to Improve LLM Accuracy While Cutting Total Inference Costs by Half

Recent research from the University of Virginia and Google challenges a piece of conventional wisdom about Large Language Models (LLMs): that lengthening a model's Chain-of-Thought (CoT) is a reliable way to improve accuracy on complex problems. Finding that approach ineffective, the researchers instead introduce a new metric, the Deep-Thinking Ratio (DTR), to measure the quality of a model's reasoning.

The study found that simply adding more tokens to a model's response can actually decrease its accuracy. The DTR, which focuses on the quality of thinking rather than the quantity of tokens, emerged as a more reliable predictor of accuracy. It measures the proportion of "deep-thinking tokens" in the model's output: tokens whose representations still undergo significant revision in the deeper layers of the model before converging.
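The paper's precise formula is not reproduced in this summary, but the idea of measuring how much the deeper layers still revise each token can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the cosine-distance revision measure, the `deep_frac` split, and the `threshold` are all assumptions, and random arrays stand in for real model activations.

```python
import numpy as np

def deep_thinking_ratio(hidden_states, deep_frac=0.5, threshold=0.1):
    """Illustrative Deep-Thinking Ratio sketch.

    hidden_states: array of shape (num_layers, num_tokens, dim) holding
    the per-layer hidden representations for one generated response.
    A token counts as "deep-thinking" if its representation is still
    revised substantially somewhere in the deeper half of the network.
    """
    num_layers, num_tokens, _ = hidden_states.shape
    deep_start = int(num_layers * (1.0 - deep_frac))

    # Per-token revision at each layer transition: cosine distance
    # between consecutive layers' representations.
    a = hidden_states[:-1]                      # layers 0 .. L-2
    b = hidden_states[1:]                       # layers 1 .. L-1
    cos = (a * b).sum(-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    )
    revision = 1.0 - cos                        # shape (L-1, num_tokens)

    # A token is "deep-thinking" if any deep-layer revision is large.
    deep_revision = revision[deep_start:].max(axis=0)
    return float((deep_revision > threshold).mean())

# Toy usage: random activations standing in for a real model's states.
rng = np.random.default_rng(0)
states = rng.normal(size=(24, 100, 64))
print(deep_thinking_ratio(states))
```

The returned value is a fraction in [0, 1]; by the article's account, higher values should track higher answer accuracy.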

Engineers often treat token count as a proxy for the effort an AI puts into a task. The research, however, found a negative correlation between token count and accuracy: longer reasoning traces tend toward "overthinking" and lower-quality results. The DTR, by contrast, showed a strong positive correlation with accuracy, outperforming traditional length-based metrics.

A key outcome of the research is Think@n, a new approach to scaling AI performance at inference time. By halting early based on the DTR, Think@n significantly improved accuracy while reducing total inference cost by 49% compared to standard voting methods such as Cons@n.
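The article does not spell out the Think@n procedure, but the contrast with consensus voting can be sketched. In the sketch below, `cons_at_n` draws a fixed n samples and majority-votes, while `think_at_n` draws samples one at a time and halts as soon as one trace shows a high DTR. The threshold value, the fallback-to-voting behavior, and the toy sampler are illustrative assumptions, not the paper's method.

```python
import random
from collections import Counter

def cons_at_n(sample_fn, n):
    """Baseline Cons@n: draw n samples, return the majority answer
    and the number of samples spent (always n)."""
    answers = [sample_fn()[0] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0], n

def think_at_n(sample_fn, n, dtr_threshold=0.6):
    """Think@n-style early halting (sketch): stop as soon as a sample's
    DTR clears the threshold; otherwise fall back to majority voting."""
    answers = []
    for used in range(1, n + 1):
        answer, dtr = sample_fn()
        answers.append(answer)
        if dtr >= dtr_threshold:   # a "deep" trace: trust it and halt
            return answer, used    # used <= n samples spent
    return Counter(answers).most_common(1)[0][0], n

# Toy sampler returning (answer, dtr) pairs; a real system would run
# the LLM and compute the DTR from its hidden states.
def toy_sampler():
    dtr = random.random()
    return ("42" if dtr > 0.3 else "41"), dtr

random.seed(0)
answer, cost = think_at_n(toy_sampler, n=8)
print(answer, cost)
```

The cost saving comes from `think_at_n` typically spending fewer than n samples, which is consistent with the reported 49% reduction relative to always drawing n.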

Overall, the study highlights the importance of focusing on the quality of thinking in AI models rather than simply increasing token count. By prioritizing deep-thinking tokens and utilizing the DTR metric, AI systems can achieve higher accuracy and efficiency in solving complex tasks. The results on the AIME 25 math benchmark demonstrated the effectiveness of the DTR-based approach in improving performance while reducing costs.

In conclusion, the research underscores the value of measuring the depth of thinking in AI models, and strategies like Think@n show how such measurements can improve both performance and efficiency. Incorporating these findings into AI development should yield better results while making more effective use of compute.

