Monday, 30 June 2025
AI & Robotics

MIT Researchers Unveil “SEAL”: A New Step Towards Self-Improving AI

The concept of AI self-improvement has been a hot topic in recent research circles, with a flurry of new papers emerging and major figures such as OpenAI CEO Sam Altman weighing in on the future of self-improving intelligent systems. Now, a new paper from MIT, titled “Self-Adapting Language Models”, introduces SEAL, a novel framework that allows a large language model (LLM) to update its own weights. The development is seen as another important step towards self-improving AI.

The research paper, published yesterday, has already sparked considerable discussion, including on Hacker News. SEAL proposes a method in which an LLM generates its own training data through “self-edits” and then updates its weights based on the new input. Crucially, this self-editing process is learned through reinforcement learning, with the reward signal tied to the downstream performance of the updated model.

The timing of this paper is particularly notable, given the recent surge of interest in AI self-improvement. Earlier this month, several other research efforts attracted attention, including Sakana AI and the University of British Columbia's “Darwin Gödel Machine (DGM)”, CMU's “Self-Rewarded Training (SRT)”, and Shanghai Jiao Tong University's “MM-UPT” framework for self-improving multimodal models.

Adding to the buzz, OpenAI CEO Sam Altman recently shared his vision of a future with self-improving AI and robots in his blog post “The Gentle Singularity”. He wrote that while the initial million humanoid robots would need to be manufactured the traditional way, they could then operate the entire supply chain to build more robots, which in turn could build more chip fabrication facilities, data centers, and so on. Shortly afterward, a tweet from @vraserx, who claims to be an OpenAI insider, asserted that the company was already running recursively self-improving AI internally, a claim that sparked a wide-ranging debate about its veracity.

Regardless of what is actually happening inside OpenAI, the MIT paper on SEAL provides concrete evidence of progress toward self-improving AI.

Understanding SEAL: Self-Adapting Language Models

The core idea behind SEAL is to enable a language model to improve itself by generating its own synthetic data when it encounters new information, and by adapting its parameters through self-edits. The model is trained to generate these self-edits (SEs) directly, using the data provided in its context.

The generation of these self-edits is learned through reinforcement learning. The model is rewarded when its generated self-edits, once applied, lead to better performance on the target task. SEAL can therefore be conceptualized as an algorithm with two nested loops: an outer reinforcement learning (RL) loop that optimizes the generation of self-edits, and an inner update loop that uses the generated self-edits to update the model via gradient descent.
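
To make the two-loop structure concrete, here is a minimal Python sketch of a single adaptation round. The function names and stub bodies are illustrative placeholders, not code from the SEAL repository:

```python
# Minimal sketch of a single SEAL-style adaptation round (illustrative only).
# The helpers below are placeholder stubs; real code would call an LLM and a
# fine-tuning routine.

def generate_self_edit(model, context):
    """Placeholder: the model writes its own training data ("self-edit") from the context."""
    return f"synthetic training text derived from: {context}"

def sft_update(model, self_edit):
    """Placeholder for the inner loop: supervised fine-tuning on the self-edit."""
    return model + [self_edit]          # stand-in for a gradient-based weight update

def evaluate(model, task):
    """Placeholder: downstream evaluation tau (e.g., QA accuracy after adaptation)."""
    return 1.0 if model else 0.0

def adaptation_round(model, context, task):
    self_edit = generate_self_edit(model, context)   # action taken by the policy
    updated_model = sft_update(model, self_edit)     # inner update loop
    reward = evaluate(updated_model, task)           # reward signal for the outer RL loop
    return self_edit, reward

if __name__ == "__main__":
    # The outer RL loop (not shown here) repeats this over many (context, task)
    # pairs and reinforces the self-edits that earned a positive reward.
    print(adaptation_round(model=[], context="a new passage of text", task="questions about it"))
```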

This method can be seen as an instance of meta-learning, where the focus is on learning how to produce effective self-edits.

The general framework

SEAL operates on individual task instances (C, τ), where C is context containing information relevant to the task, and τ defines the downstream evaluation used to assess the model's adaptation. For example, in a knowledge-incorporation task, C might be a passage to be integrated into the model's internal knowledge, and τ a set of questions about that passage.

Given C, the model generates a self-edit SE, which is then used to update its parameters through supervised fine-tuning: θ′ ← SFT(θ, SE). Reinforcement learning is used to optimize this self-edit generation: the model takes an action (generating SE), receives a reward r based on the performance of LM_θ′ on τ, and updates its policy to maximize the expected reward.
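
Written out, the objective being optimized is roughly the following (a paraphrase of the paper's formulation; the exact notation in the original may differ):

```latex
% RL objective over self-edit generation (paraphrased; notation may differ from the paper)
\max_{\theta} \;
  \mathbb{E}_{(C,\tau)} \,
  \mathbb{E}_{SE \sim \mathrm{LM}_{\theta}(\cdot \mid C)}
  \big[ \, r\big( \mathrm{LM}_{\theta'}, \tau \big) \big],
\qquad \text{where } \theta' = \mathrm{SFT}(\theta, SE).
```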

The researchers found that standard on-policy methods such as GRPO and PPO led to unstable training. They ultimately chose ReST^EM, a simpler filtering-based behavior-cloning approach from a DeepMind paper. The method can be viewed as an expectation-maximization (EM) procedure: the E-step samples candidate outputs from the current model policy, and the M-step reinforces only the samples that received a positive reward, via supervised fine-tuning.
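
The outer loop then looks roughly like the sketch below, where sampling, reward computation, and fine-tuning are again hypothetical stand-ins rather than the authors' code:

```python
import random

# Illustrative sketch of a ReST^EM-style outer loop: sample self-edits (E-step),
# keep only those with positive reward, and fine-tune on the survivors (M-step).
# All helpers are placeholder stand-ins for real LLM sampling / SFT / evaluation.

def sample_self_edits(model, context, n=4):
    return [f"candidate self-edit {i} for {context}" for i in range(n)]

def reward_of(model, context, self_edit, task):
    # Placeholder: apply the self-edit via SFT, evaluate on tau,
    # and compare against the pre-update score.
    return random.choice([0.0, 1.0])

def finetune_on(model, examples):
    # Placeholder: supervised fine-tuning on (context -> self-edit) pairs (behavior cloning).
    return model + examples

def rest_em_iteration(model, dataset, task):
    kept = []
    for context in dataset:
        for se in sample_self_edits(model, context):      # E-step: sample from current policy
            if reward_of(model, context, se, task) > 0:   # filter: keep only positive-reward edits
                kept.append((context, se))
    return finetune_on(model, kept)                       # M-step: reinforce the kept samples

if __name__ == "__main__":
    model = []
    for _ in range(2):  # a couple of outer RL iterations
        model = rest_em_iteration(model, ["passage A", "passage B"], task="QA on the passage")
    print(f"model now carries {len(model)} reinforced (context, self-edit) pairs")
```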

The paper also notes that while the current implementation uses a single model to both generate and learn from self-edits, these roles could be separated in a “teacher-student” setup.

Instantiating SEAL in specific domains

The MIT team instantiated SEAL in two specific domains: knowledge incorporation and few-shot learning.

  • Knowledge incorporation: the goal is to effectively integrate information from articles into the model's weights (a sketch of what such a self-edit might look like follows this list).
  • Few-shot learning: the model adapts to a new task from only a handful of examples.
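
For intuition, here is a minimal sketch of how a knowledge-incorporation self-edit might be prompted, assuming the self-edit takes the form of synthetic statements derived from a passage; the prompt wording and output format are illustrative, not copied from the paper:

```python
# Hypothetical prompt builder for a knowledge-incorporation self-edit.
# The model would answer with a list of synthetic training statements
# derived from the passage, which are then used for supervised fine-tuning.

def build_self_edit_prompt(passage: str) -> str:
    return (
        "Read the following passage and rewrite its content as a list of "
        "self-contained statements suitable for training:\n\n"
        f"{passage}\n\n"
        "Statements:"
    )

if __name__ == "__main__":
    passage = "SEAL is a framework from MIT that lets a language model update its own weights."
    print(build_self_edit_prompt(passage))
    # The generated statements would be fed to SFT, and the adapted model would
    # then be evaluated on questions about the passage to compute the reward.
```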

Experimental results

The experimental results on both few-shot learning and knowledge incorporation demonstrate the effectiveness of the SEAL framework.

In few-shot learning, using a Llama-3.2-1B-Instruct model, SEAL improved the adaptation success rate considerably: 72.5% after RL training, compared to 20% when using self-edits from the untrained model and 0% without any adaptation. While still below “Oracle TTT” (an idealized baseline), this marks substantial progress.

For knowledge incorporation, SEAL consistently outperformed baseline methods, using a larger Qwen2.5-7B model to integrate new facts from SQuAD passages. Training on synthetic data generated by the base Qwen2.5-7B model already produced notable improvements, and reinforcement learning pushed them further. Accuracy improved rapidly over the outer RL iterations, surpassing a setup that used GPT-4.1-generated data within just two iterations.

Qualitative examples from the paper illustrate how reinforcement learning leads to the generation of more detailed self-edits, which in turn yield better performance.

Despite these promising results, the researchers also acknowledge some limitations of the SEAL framework, including catastrophic forgetting, computational overhead, and context-dependent evaluation. These are discussed in detail in the original paper.

Original paper: https://arxiv.org/pdf/2506.10943

Project site: https://jyopari.github.io/Posts/seal

GitHub repo: https://github.com/Continual-Intelligence/seal
