MIT method teaches AI to recognize uncertainty without performance loss

4/24/2026, 4:21:37 AM

MIT researchers have developed RLCR, a training method that lets AI models express uncertainty, making their responses more reliable and reducing overconfident "hallucinations" without sacrificing performance.

Researchers from the Massachusetts Institute of Technology's (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed an innovative training method that allows artificial intelligence (AI) models to express uncertainty in their responses. This breakthrough aims to combat the problem of overconfidence inherent in modern reasoning models, which often leads to so-called "hallucinations" and significantly undermines the overall reliability of their conclusions.

The new technique, named RLCR (Reinforcement Learning with Calibration Rewards), trains language models to generate not only answers but also well-calibrated confidence scores for them. The experimental results are striking: RLCR reduces calibration error by up to 90% while maintaining or even improving accuracy, both on tasks the model was trained on and on entirely new ones. The work will be presented at the International Conference on Learning Representations (ICLR) later this month.
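The article does not say which calibration metric the 90% figure refers to. One widely used measure is expected calibration error (ECE): group answers into bins by stated confidence, then average the gap between each bin's accuracy and its mean confidence, weighted by how many answers fall in the bin. The sketch below assumes ECE with ten equal-width bins; the function name is hypothetical and this is not necessarily the metric the MIT team reported.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Estimate ECE with equal-width confidence bins over [0, 1].

    confidences: stated confidence per answer, each in [0, 1]
    correct: True/False per answer, same length as confidences
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    bins = [[] for _ in range(n_bins)]
    for q, is_right in zip(confidences, correct):
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"confidence out of range: {q}")
        # Equal-width binning; q == 1.0 is folded into the top bin.
        idx = min(int(q * n_bins), n_bins - 1)
        bins[idx].append((q, is_right))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(q for q, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of answers.
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    outcomes = [True, False, True, False]
    print(expected_calibration_error([0.95] * 4, outcomes))  # overconfident model
    print(expected_calibration_error([0.50] * 4, outcomes))  # calibrated model
```

On these toy inputs, a model that claims 95% confidence on every answer but is right only half the time has an ECE of 0.45, while one that states 50% on the same answers scores 0.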

The root cause of this overconfidence lies in the standard reinforcement learning (RL) methods behind recent advances in AI reasoning. These approaches, including those used in systems such as OpenAI's o1, reward a model only for correct answers and penalize it for incorrect ones, with nothing in between and no role for the model's uncertainty. The model therefore has no incentive to express doubt and is effectively trained to answer every question with full confidence, even when its response is merely a guess.
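To see why, consider a single question that the model answers correctly with probability p while stating some confidence q. Under a purely binary correctness reward (a simplified model assumed here for illustration), q never enters the reward, so the expected reward does not depend on it at all:

```latex
R_{\text{bin}} = \mathbb{1}[\hat{y} = y^{*}], \qquad
\mathbb{E}[R_{\text{bin}}] = p \cdot 1 + (1 - p) \cdot 0 = p, \qquad
\frac{\partial\, \mathbb{E}[R_{\text{bin}}]}{\partial q} = 0
```

Since changing q changes nothing, training provides no pressure toward reporting an honest confidence.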

RLCR addresses this problem by adding a single but crucial term to the reward function: the Brier score. This well-established measure penalizes the squared difference between a model's stated confidence and the actual outcome, that is, whether its answer was correct. During training, the model therefore learns to reason about the task and, at the same time, to assess its own uncertainty, producing both an answer and a confidence score. The reward penalizes both confidently incorrect answers and needlessly uncertain correct ones.
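A minimal sketch of such a reward, assuming it is simply the correctness reward (1 or 0) minus the Brier score of the stated confidence; the function name `rlcr_reward` is hypothetical, and the exact form and weighting used by the MIT team may differ:

```python
def rlcr_reward(correct: bool, confidence: float) -> float:
    """Hypothetical RLCR-style reward: correctness minus Brier score.

    correct: whether the model's final answer matched the reference
    confidence: the model's stated probability that its answer is correct
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    outcome = 1.0 if correct else 0.0
    # Brier score: squared gap between stated confidence and actual outcome.
    brier = (confidence - outcome) ** 2
    return outcome - brier


if __name__ == "__main__":
    cases = [
        ("confident, correct", True, 0.9),
        ("confident, wrong", False, 0.9),
        ("hesitant, correct", True, 0.3),
        ("hesitant, wrong", False, 0.3),
    ]
    for label, correct, q in cases:
        print(f"{label:20s} q={q:.1f} reward={rlcr_reward(correct, q):+.2f}")
```

Under this assumed form, a confidently wrong answer scores -0.81, far below a hesitant wrong one at -0.09, while a correct answer earns more when stated confidently (0.99) than hesitantly (0.51), matching both penalties described above.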

The consequences of such artificial overconfidence are especially serious in areas where user decisions depend directly on AI outputs, such as medicine, law, or finance. A system that expresses high confidence regardless of how reliable its recommendations actually are is untrustworthy and potentially dangerous, because the user never receives the signal to seek a second opinion or further verification. The researchers also proved mathematically that the RLCR reward structure guarantees models that are both accurate and well calibrated.
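The intuition behind that result can be sketched for a single question. This is not the paper's proof; it assumes the reward is correctness minus the Brier score, with the answer correct with probability p and stated confidence q:

```latex
\begin{aligned}
\mathbb{E}[R] &= p - \bigl[\, p\,(1 - q)^{2} + (1 - p)\,q^{2} \,\bigr] \\
\frac{\partial\, \mathbb{E}[R]}{\partial q} &= 2p\,(1 - q) - 2(1 - p)\,q = 2(p - q) = 0
  \;\Rightarrow\; q^{\star} = p \\
\mathbb{E}[R]\big|_{q = p} &= p - p\,(1 - p) = p^{2}
\end{aligned}
```

The second derivative is -2, so q = p is the maximum: reporting one's true chance of being right is optimal. The best achievable expected reward, p squared, still grows with p, so under this assumed reward calibration is not bought at the expense of accuracy.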

The MIT team also demonstrated that the confidence scores produced by RLCR are useful at inference time. For example, when a model generates several candidate answers, selecting the one with the highest reported confidence, or weighting each vote by its confidence in a majority-voting scheme, significantly improves the accuracy and reliability of the overall system. Notably, standard reinforcement learning not only fails to promote calibration but actively harms it, making models more capable yet also more overconfident.
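The two inference-time strategies mentioned above can be sketched as follows. The function names and the sample answers and confidences are hypothetical toy values, not results from the study, and the article does not specify how the team aggregated votes:

```python
from collections import Counter, defaultdict


def majority_vote(samples):
    """Plain majority vote: the most frequent answer wins, confidence ignored."""
    if not samples:
        raise ValueError("no samples")
    counts = Counter(answer for answer, _ in samples)
    return counts.most_common(1)[0][0]


def select_most_confident(samples):
    """Pick the single answer whose reported confidence is highest."""
    if not samples:
        raise ValueError("no samples")
    answer, _ = max(samples, key=lambda s: s[1])
    return answer


def confidence_weighted_vote(samples):
    """Each sample votes for its answer with weight equal to its confidence."""
    if not samples:
        raise ValueError("no samples")
    totals = defaultdict(float)
    for answer, confidence in samples:
        totals[answer] += confidence
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # (answer, stated confidence) pairs from several sampled generations.
    samples = [("42", 0.95), ("42", 0.90), ("41", 0.30), ("41", 0.20), ("41", 0.25)]
    print("plain majority:     ", majority_vote(samples))
    print("most confident:     ", select_most_confident(samples))
    print("confidence-weighted:", confidence_weighted_vote(samples))
```

Here a plain majority picks "41" (three low-confidence votes), while both confidence-aware strategies pick "42", which two samples backed with high confidence. This only helps if the confidences are calibrated, which is exactly what RLCR trains for.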

Sources

MIT News — AI topic · 4/22/2026