13.5.25

Sakana AI Unveils Continuous Thought Machines: A Leap Towards Human-like AI Reasoning

 Tokyo-based Sakana AI has introduced a novel AI architecture named Continuous Thought Machines (CTMs), aiming to enable artificial intelligence models to reason more like human brains and with significantly less explicit guidance. This development, announced on May 12, 2025, tackles a core challenge in AI: moving beyond pattern recognition to achieve genuine, step-by-step reasoning.

CTMs represent a departure from traditional deep learning models by explicitly incorporating time and the synchronization of neuron activity as a fundamental component of their reasoning process. This approach is inspired by the complex neural dynamics observed in biological brains, where the timing and interplay between neurons are critical to information processing.

Most current AI architectures, while powerful, abstract away these temporal dynamics. Sakana AI's CTMs, however, are designed to leverage these neural dynamics as their core representation. The architecture introduces two key innovations: neuron-level temporal processing, where individual neurons use unique parameters to process a history of incoming signals, and neural synchronization, which is employed as a latent representation for the model to observe data and make predictions.

This unique design allows CTMs to "think" through problems in a series of internal "thought steps," effectively creating an internal dimension where reasoning can unfold. This contrasts with conventional models that might process information in a single pass. The ability to observe this internal process also offers greater interpretability, allowing researchers to visualize how the model arrives at a solution, much like tracing a path through a maze.

Sakana AI's research indicates that CTMs demonstrate strong performance and versatility across a range of challenging tasks, including image classification, maze solving, sorting, and question-answering. A notable feature is their capacity for adaptive compute, meaning the model can dynamically adjust its computational effort, stopping earlier for simpler tasks or continuing to process for more complex challenges without needing additional complex instructions.
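
The official CTM architecture is described in Sakana AI's technical report; the snippet below is only a toy sketch of the ideas above: an internal loop of "thought steps," per-neuron processing of a short signal history, a synchronization-style latent built from correlated neuron activity, and certainty-based early stopping (adaptive compute). All names, sizes, and thresholds are illustrative, not the official implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def internal_tick(history, per_neuron_weights):
    """Toy neuron-level temporal processing: each neuron applies its own
    weights to a short history of its incoming signals."""
    return np.tanh((per_neuron_weights * history).sum(axis=1))

def synchronization_latent(trace):
    """Toy 'synchronization' representation: pairwise correlation of neuron
    activity over internal ticks, flattened into a latent vector."""
    corr = np.nan_to_num(np.corrcoef(trace))           # (n_neurons, n_neurons)
    return corr[np.triu_indices_from(corr, k=1)]

def ctm_like_loop(x, n_ticks=16, certainty_threshold=0.9, n_classes=10, window=4):
    n_neurons = x.shape[0]
    per_neuron_weights = rng.standard_normal((n_neurons, window)) * 0.1
    readout = rng.standard_normal((n_classes, n_neurons * (n_neurons - 1) // 2)) * 0.05
    history = np.tile(x[:, None], (1, window))          # rolling signal history per neuron
    trace, probs = [], np.full(n_classes, 1.0 / n_classes)

    for tick in range(1, n_ticks + 1):
        activations = internal_tick(history, per_neuron_weights)
        history = np.concatenate([history[:, 1:], activations[:, None]], axis=1)
        trace.append(activations)
        if len(trace) < 2:
            continue                                     # need >= 2 ticks to correlate
        latent = synchronization_latent(np.stack(trace, axis=1))
        logits = readout @ latent
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= certainty_threshold:           # adaptive compute: stop early
            break
    return probs, tick

probs, ticks_used = ctm_like_loop(rng.standard_normal(32))
print(f"prediction={probs.argmax()} after {ticks_used} internal ticks")
```

Even at this toy scale, the loop makes the two properties in the announcement visible: the prediction is read out from synchronization across neurons rather than from a single forward pass, and the number of internal ticks varies with how quickly the readout becomes confident.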

The introduction of Continuous Thought Machines marks a significant step in the quest for more biologically plausible and powerful AI systems. By focusing on the temporal dynamics of neural activity, Sakana AI aims to bridge the gap between the computational efficiency of current AI and the nuanced reasoning capabilities of the human brain, potentially unlocking new frontiers in artificial intelligence.

10.5.25

Zencoder Introduces Zen Agents: Revolutionizing Team-Based AI in Software Development

 On May 9, 2025, Zencoder announced the launch of Zen Agents, a groundbreaking platform designed to transform software development by introducing collaborative AI tools tailored for team environments. Unlike traditional AI coding assistants that focus on individual productivity, Zen Agents emphasizes team-based workflows, enabling organizations to create, share, and deploy specialized AI agents across their development processes. 

Bridging the Collaboration Gap in Software Engineering

Andrew Filev, CEO and founder of Zencoder, highlighted the limitations of current AI tools that primarily cater to individual developers. He pointed out that in real-world scenarios, software development is inherently collaborative, and existing tools often overlook the complexities of team dynamics. Zen Agents addresses this gap by facilitating the creation of AI agents that can be customized for specific frameworks, workflows, or codebases, and shared across teams to ensure consistency and efficiency. 

Technical Innovation: Integration with Model Context Protocol (MCP)

A standout feature of Zen Agents is its implementation of the Model Context Protocol (MCP), a standard initiated by Anthropic and supported by OpenAI. MCP allows large language models to interact seamlessly with external tools, enhancing the capabilities of AI agents within the development lifecycle. To support this integration, Zencoder has introduced its own registry comprising over 100 MCP servers, facilitating a robust ecosystem for AI tool interaction. 

Open-Source Marketplace: Harnessing Collective Intelligence

Zen Agents features an open-source marketplace where developers can contribute and discover custom AI agents. This community-driven approach mirrors successful ecosystems like Visual Studio Code extensions and npm packages, allowing for rapid expansion of capabilities and fostering innovation. Early adopters have already developed agents that automate tasks such as code reviews, accessibility enhancements, and integration of design elements from tools like Figma directly into codebases. 

Enterprise-Ready with a Focus on Security and Compliance

Understanding the importance of security and compliance in enterprise environments, Zencoder has ensured that Zen Agents meets industry standards, boasting certifications like ISO 27001, SOC 2 Type II, and ISO 42001 for responsible AI management systems. These credentials position Zen Agents as a viable solution for organizations seeking to integrate AI into their development workflows without compromising on security. 

Flexible Pricing to Accommodate Diverse Needs

Zencoder offers a tiered pricing model for Zen Agents to cater to various user requirements:

  • Free Tier: Access to basic features suitable for individual developers or small teams.

  • $20/Month Plan: Enhanced capabilities for growing teams needing more advanced tools.

  • $40/Month Plan: Comprehensive features designed for larger organizations with complex development needs.

Looking Ahead: Enhancing Developer Productivity

Zencoder envisions Zen Agents evolving towards greater autonomy, aiming to amplify developer productivity by minimizing context-switching and streamlining workflows. By focusing on the collaborative aspects of software development, Zen Agents aspires to facilitate a "flow state" for developers, where AI agents handle routine tasks, allowing human developers to concentrate on creative and complex problem-solving.

Agentic AI: The Next Frontier in Autonomous Intelligence

 Agentic AI represents a transformative leap in artificial intelligence, shifting from passive, reactive tools to proactive, autonomous agents capable of decision-making, learning, and collaboration. Unlike traditional AI models that require explicit instructions, agentic AI systems can understand context, anticipate needs, and act independently to achieve specific goals. 

Key Characteristics of Agentic AI

  • Autonomy and Decision-Making: Agentic AI systems possess the ability to make decisions without human intervention, enabling them to perform complex tasks and adapt to new situations dynamically. 

  • Multimodal Capabilities: These agents can process and respond to various forms of input, including text, voice, and images, facilitating more natural and intuitive interactions. 

  • Emotional Intelligence: By recognizing and responding to human emotions, agentic AI enhances user engagement and provides more personalized experiences, particularly in customer service and healthcare.

  • Collaboration with Humans: Agentic AI is designed to work alongside humans, augmenting capabilities and enabling more efficient workflows through shared decision-making processes.

Real-World Applications

  • Enterprise Automation: Companies like Microsoft and Amazon are integrating agentic AI into their platforms to automate complex business processes, improve customer service, and enhance operational efficiency. 

  • Healthcare: Agentic AI assists in patient care by monitoring health data, providing personalized recommendations, and supporting medical professionals in diagnosis and treatment planning. 

  • Finance: In the financial sector, agentic AI is employed for algorithmic trading, risk assessment, and fraud detection, enabling faster and more accurate decision-making.

  • Software Development: AI agents are increasingly used to write, test, and debug code, accelerating the software development lifecycle and reducing the potential for human error.

Challenges and Considerations

While the potential of agentic AI is vast, it also presents challenges that must be addressed:

  • Ethical and Privacy Concerns: Ensuring that autonomous systems make decisions aligned with human values and maintain user privacy is paramount. 

  • Transparency and Accountability: Understanding how agentic AI makes decisions is crucial for trust and accountability, especially in high-stakes applications. 

  • Workforce Impact: As AI systems take on more tasks, there is a need to reskill the workforce and redefine roles to complement AI capabilities. 

The Road Ahead

Agentic AI is poised to redefine the interaction between humans and machines, offering unprecedented levels of autonomy and collaboration. As technology continues to evolve, the integration of agentic AI across various sectors promises to enhance efficiency, innovation, and user experiences. However, careful consideration of ethical implications and proactive governance will be essential to harness its full potential responsibly.

ZEROSEARCH: Simulating Search to Train Retrieval-Augmented LLMs at Zero API Cost

Introduction

Retrieval-Augmented Generation (RAG) has become a cornerstone for grounding large language models (LLMs) in up-to-date information. Yet existing approaches that integrate live search engines face two critical hurdles: unpredictable document quality and prohibitive API expenses during reinforcement learning (RL) training. ZEROSEARCH, introduced by Sun et al., offers an elegant solution: train LLMs’ internal “search” strategies without ever contacting a real search engine, slashing costs and stabilizing learning.


Methodology Deep Dive

1. Search Simulation via Supervised Fine-Tuning

Rather than querying Google or Bing, ZEROSEARCH first converts an LLM into a retrieval module (π_ψ) through lightweight supervised fine-tuning (SFT).

  • Data Collection: The authors collect interaction trajectories by prompting the base LLM to interact with a real search engine until a correct answer is produced (“positive”) or an incorrect one (“negative”).

  • Prompt Design: Query–document pairs from these trajectories are extracted. The fine-tuning prompt explicitly labels whether the generated document should be useful or noisy, enabling the model to simulate both high- and low-quality retrievals on demand (Table 2); a rough prompt sketch follows below.
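
The exact templates live in the paper (Table 2); the snippet below is only a rough approximation of the idea that a single switch in the prompt steers the simulation LLM toward useful or noisy documents.

```python
# Illustrative prompt builder for the simulation LLM (not the paper's verbatim template).
def build_simulation_prompt(query: str, useful: bool) -> str:
    quality = (
        "that contain accurate, relevant information helpful for answering the query"
        if useful
        else "that look plausible but are irrelevant or misleading (noisy)"
    )
    return (
        "You are simulating a web search engine.\n"
        f"Query: {query}\n"
        f"Generate five short documents {quality}.\n"
        "Return each document on its own line."
    )

print(build_simulation_prompt("Who wrote 'The Selfish Gene'?", useful=False))
```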

2. Curriculum-Based Rollout Strategy

To progressively challenge the policy model (π_θ), ZEROSEARCH employs a curriculum that gradually increases the noise probability (pᵢ) of simulated documents over training steps:

p_i = p_s + \frac{b^{\,i/m} - 1}{b - 1} \times (p_e - p_s)

  • Parameters:

    • p_s, p_e: initial and final noise probabilities

    • i/m: fraction of completed training steps

    • b: exponential base (default 4)

  • Effect: Early training relies on mostly useful documents, allowing π_θ to learn structured reasoning. Over time, noisy retrievals dominate, forcing robust search strategies (see the schedule sketch below).
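
A minimal sketch of that schedule, assuming the exponential form implied by the parameters above; the default values other than b are illustrative.

```python
def noise_probability(i: int, m: int, p_s: float = 0.0, p_e: float = 0.75, b: float = 4.0) -> float:
    """Exponential curriculum: equals p_s at step 0 and ramps toward p_e as i approaches m."""
    return p_s + ((b ** (i / m) - 1) / (b - 1)) * (p_e - p_s)

# The ramp is gentle early in training and accelerates later:
for step in (0, 50, 100, 150, 200):
    print(step, round(noise_probability(step, m=200), 3))
```

At each rollout step, a simulated document can then be sampled as noisy with probability p_i, so the policy sees mostly clean retrievals early on and increasingly adversarial ones later.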

3. Reinforcement Learning Objective

ZEROSEARCH frames the optimization as:

\max_{\pi_\theta} \;\; \mathbb{E}_{x,y}\Big[\, r_\phi(x,y) \;-\; \beta\, D_{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big) \Big],

where:

  • r_ϕ(x,y): F1-based reward (balances precision and recall, avoiding the “reward hacking” seen with Exact Match).

  • π_ref: reference model (for KL-penalty regularization).

  • Compatible Algorithms: PPO, GRPO, Reinforce++.
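
To make the objective concrete, here is a toy computation of the quantity being maximized for a single sample: an F1-style reward minus a β-weighted KL estimate against the reference model. This sketches the reward signal only, not a PPO/GRPO training loop, and the log-probability inputs are assumed to come from the policy and reference models.

```python
from collections import Counter

def f1_reward(prediction: str, gold: str) -> float:
    """Token-level F1 between predicted and gold answers (the r_phi term)."""
    pred_tokens, gold_tokens = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def kl_estimate(logp_policy: list[float], logp_ref: list[float]) -> float:
    """Simple per-token estimate of KL(policy || reference) from sampled tokens."""
    return sum(p - r for p, r in zip(logp_policy, logp_ref)) / len(logp_policy)

def regularized_reward(prediction, gold, logp_policy, logp_ref, beta=0.01):
    return f1_reward(prediction, gold) - beta * kl_estimate(logp_policy, logp_ref)

print(regularized_reward("richard dawkins", "Richard Dawkins",
                         logp_policy=[-0.2, -0.3], logp_ref=[-0.25, -0.35]))
```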


Key Results Overview

  • A 3B-parameter simulation LLM effectively incentivizes π_θ’s search skills at zero API cost.

  • A 7B retrieval module matches real Google Search performance; a 14B model surpasses it on benchmark QA tasks.

  • Generalizes across both base and instruction-tuned LLMs, and under diverse RL algorithms.


Implications for the ML Industry

  1. Cost-Effective RAG Training
    Organizations can now sidestep expensive search-API fees during RL-based retrieval training, democratizing advanced RAG strategies for smaller teams.

  2. Controlled Noise Injection
    The curriculum approach offers principled noise scheduling—models become robust not only to clean retrievals but also to adversarial or low-quality documents, enhancing real-world resilience.

  3. Scalable, On-Premises Solutions
    By fully simulating search behaviors, enterprises can run end-to-end RAG pipelines in-house, preserving data privacy and reducing dependency on third-party services.

  4. Extensible Framework
    ZEROSEARCH’s modular design—plugging in any simulation LLM and RL algorithm—facilitates rapid experimentation. Researchers can explore new reward functions (e.g., retrieval diversity), fine-tune custom domains, or apply to multimodal search settings.

  5. Toward Autonomous Agents
    As LLMs evolve into general-purpose agents, ZEROSEARCH paves the way for self-sufficient information gathering, where agents learn to both seek and synthesize knowledge without external calls.


Conclusion

ZEROSEARCH represents a paradigm shift in training retrieval-augmented LLMs: by simulating instead of querying, it eliminates cost barriers, stabilizes learning through controlled noise, and scales from 3B to 14B models. For the ML industry, this means more accessible, robust, and private RAG solutions—setting the stage for truly autonomous, knowledge-seeking AI agents.

New Research Compares Fine-Tuning and In-Context Learning for LLM Customization

 On May 9, 2025, VentureBeat reported on a collaborative study by Google DeepMind and Stanford University that evaluates two prevalent methods for customizing large language models (LLMs): fine-tuning and in-context learning (ICL). The research indicates that ICL generally provides better generalization capabilities compared to traditional fine-tuning, especially when adapting models to novel tasks. 

Understanding Fine-Tuning and In-Context Learning

Fine-tuning involves further training a pre-trained LLM on a specialized dataset, adjusting its internal parameters to acquire new knowledge or skills. In contrast, ICL does not alter the model's parameters; instead, it guides the model by providing examples of the desired task within the input prompt, allowing the model to infer how to handle similar queries. 
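
As a quick illustration of the difference (with made-up data and field names): an ICL prompt carries the examples along with every request, while fine-tuning consumes training records that update the weights once.

```python
# In-context learning: examples travel with each request; no weights are changed.
icl_prompt = """\
Q: In the Glorbian family, Zef is the parent of Mira. Who is Mira's parent?
A: Zef
Q: In the Glorbian family, Mira is the parent of Tao. Who is Tao's parent?
A:"""

# Fine-tuning: the same knowledge is baked into the weights via training records
# (shown here in a typical chat-style layout; field names are illustrative).
finetune_record = {
    "messages": [
        {"role": "user", "content": "Who is Tao's parent?"},
        {"role": "assistant", "content": "Mira"},
    ]
}

print(icl_prompt)
print(finetune_record)
```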

Experimental Approach

The researchers designed controlled synthetic datasets featuring complex, self-consistent structures, such as imaginary family trees and hierarchies of fictional concepts. To ensure the novelty of the information, they replaced all nouns, adjectives, and verbs with invented terms, preventing any overlap with the models' pre-training data. The models were then tested on various generalization challenges, including logical deductions and reversals. 

Key Findings

The study found that, in data-matched settings, ICL led to better generalization than standard fine-tuning. Models utilizing ICL were more adept at tasks like reversing relationships and making logical deductions from the provided context. However, ICL is generally more computationally expensive at inference time, as it requires providing additional context to the model for each use. 

Introducing Augmented Fine-Tuning

To combine the strengths of both methods, the researchers proposed an augmented fine-tuning approach. This method involves using the LLM's own ICL capabilities to generate diverse and richly inferred examples, which are then added to the dataset used for fine-tuning. Two main data augmentation strategies were explored:

  1. Local Strategy: Focusing on individual pieces of information, prompting the LLM to rephrase single sentences or draw direct inferences, such as generating reversals.

  2. Global Strategy: Providing the full training dataset as context, then prompting the LLM to generate inferences by linking particular documents or facts with the rest of the information, leading to longer reasoning traces.

Models fine-tuned on these augmented datasets showed significant improvements in generalization, outperforming both standard fine-tuning and plain ICL. 
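
A hedged sketch of what the augmentation step might look like in practice; `call_llm` is a placeholder for whatever inference client is used, and the prompts are illustrative rather than the paper's own.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call (any chat-completion client would do)."""
    raise NotImplementedError

def local_augmentation(fact: str) -> list[str]:
    """'Local' strategy: rephrase a single fact and derive direct inferences such as reversals."""
    prompt = (
        f"Fact: {fact}\n"
        "Rewrite this fact in two different ways, then state the reversed relationship.\n"
        "Return one statement per line."
    )
    return call_llm(prompt).splitlines()

def global_augmentation(all_facts: list[str]) -> list[str]:
    """'Global' strategy: give the full training set as context and ask for linked inferences."""
    prompt = (
        "Here is a set of facts:\n" + "\n".join(all_facts) +
        "\nDerive new statements that follow from combining two or more of these facts, "
        "showing the reasoning chain for each."
    )
    return call_llm(prompt).splitlines()

# The generated statements are appended to the original dataset before standard fine-tuning.
```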

Implications for Enterprise AI Development

This research offers valuable insights for developers and enterprises aiming to adapt LLMs to specific domains or proprietary information. While ICL provides superior generalization, its computational cost at inference time can be high. Augmented fine-tuning presents a balanced approach, enhancing generalization capabilities while mitigating the continuous computational demands of ICL. By investing in creating ICL-augmented datasets, developers can build fine-tuned models that perform better on diverse, real-world inputs.

9.5.25

Fidji Simo Appointed as OpenAI's CEO of Applications, Signaling Strategic Expansion

 On May 8, 2025, OpenAI announced the appointment of Fidji Simo, the current CEO and Chair of Instacart, as its new CEO of Applications. In this newly established role, Simo will oversee the development and deployment of OpenAI's consumer and enterprise applications, reporting directly to CEO Sam Altman. This move underscores OpenAI's commitment to expanding its product offerings and scaling its operations to meet growing global demand. 

Transition from Instacart to OpenAI

Simo will remain at Instacart during a transitional period, assisting in the onboarding of her successor, who is expected to be selected from the company's existing leadership team. After stepping down as CEO, she will continue to serve as Chair of Instacart's Board. 

In a message shared with her team and later posted publicly, Simo expressed her enthusiasm for the new role:

“Joining OpenAI at this critical moment is an incredible privilege and responsibility. This organization has the potential of accelerating human potential at a pace never seen before, and I am deeply committed to shaping these applications toward the public good.”

Strategic Implications for OpenAI

The creation of the CEO of Applications role reflects OpenAI's evolution from a research-focused organization to a multifaceted entity delivering AI solutions at scale. With Simo at the helm of the Applications division, OpenAI aims to enhance its consumer-facing products, such as ChatGPT, and expand its enterprise offerings. This strategic realignment allows Altman to concentrate more on research, computational infrastructure, and AI safety systems. 

Simo's Background and Expertise

Before leading Instacart, Simo held significant roles at Facebook (now Meta), including Vice President and Head of the Facebook app, where she was instrumental in developing features like News Feed, Stories, and Facebook Live. Her experience in scaling consumer technology platforms and monetization strategies positions her well to drive OpenAI's application development and deployment. 

Additionally, Simo has been a member of OpenAI's Board of Directors since March 2024, providing her with insight into the company's mission and operations. Her appointment follows other strategic hires, such as former Nextdoor CEO Sarah Friar as CFO and Kevin Weil as Chief Product Officer, indicating OpenAI's focus on strengthening its leadership team to support its growth ambitions.

Mem0 Introduces Scalable Memory Architectures to Enhance AI Conversational Consistency

 On May 8, 2025, AI research company Mem0 announced the development of two new memory architectures, Mem0 and Mem0g, aimed at improving the ability of large language models (LLMs) to maintain context over prolonged conversations. These architectures are designed to dynamically extract, consolidate, and retrieve key information from dialogues, enabling AI agents to exhibit more human-like memory capabilities.

Addressing the Limitations of Traditional LLMs

While LLMs have demonstrated remarkable proficiency in generating human-like text, they often struggle with maintaining coherence in extended or multi-session interactions due to fixed context windows. Even with context windows extending to millions of tokens, challenges persist:

  1. Conversation Length: Over time, dialogues can exceed the model's context capacity, leading to loss of earlier information.

  2. Topic Variability: Real-world conversations often shift topics, making it inefficient for models to process entire histories for each response.

  3. Attention Degradation: LLMs may overlook crucial information buried deep in long conversations due to the limitations of their attention mechanisms.

These issues can result in AI agents forgetting essential details, such as previous customer interactions or user preferences, thereby diminishing their effectiveness in applications like customer support, planning, and healthcare.

Innovations in Memory Architecture

Mem0 and Mem0g aim to overcome these challenges by implementing scalable memory systems that:

  • Dynamically Extract Key Information: Identifying and storing relevant details from ongoing conversations.

  • Consolidate Contextual Data: Organizing extracted information to maintain coherence across sessions.

  • Efficiently Retrieve Past Interactions: Accessing pertinent historical data to inform current responses without processing entire conversation histories.

By focusing on these aspects, Mem0's architectures seek to provide AI agents with a more reliable and context-aware conversational ability, closely mirroring human memory functions.
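
Mem0's actual implementation is not reproduced here; the toy sketch below only illustrates the extract/consolidate/retrieve loop described above, with simple keyword overlap standing in for whatever extraction and scoring Mem0 really uses.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: list[str] = field(default_factory=list)

    def extract(self, turn: str) -> None:
        """Toy 'extraction': keep short declarative statements from a dialogue turn.
        A real system would use an LLM to pull out salient facts."""
        for sentence in turn.split("."):
            sentence = sentence.strip()
            if sentence and len(sentence.split()) <= 12:
                self.consolidate(sentence)

    def consolidate(self, fact: str) -> None:
        """Skip exact duplicates so the store stays compact across sessions."""
        if not any(fact.lower() == e.lower() for e in self.entries):
            self.entries.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Rank stored facts by keyword overlap with the query instead of
        replaying the full conversation history."""
        q = set(query.lower().split())
        ranked = sorted(self.entries, key=lambda e: -len(q & set(e.lower().split())))
        return ranked[:k]

memory = MemoryStore()
memory.extract("My name is Dana. I prefer vegetarian recipes. The weather was nice today.")
print(memory.retrieve("Which recipes does the user prefer", k=1))
```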

Implications for Enterprise Applications

The introduction of Mem0 and Mem0g holds significant promise for enterprises deploying AI agents in environments requiring long-term contextual understanding. Applications include:

  • Customer Support: AI agents can recall previous customer interactions, enhancing service quality.

  • Personal Assistants: Maintaining user preferences and past activities to provide personalized assistance.

  • Healthcare: Remembering patient history and prior consultations to inform medical advice.

By addressing the memory limitations of traditional LLMs, Mem0's architectures aim to enhance the reliability and effectiveness of AI agents across various sectors.

OpenAI Introduces Reinforcement Fine-Tuning for o4-mini Model, Empowering Enterprises with Customized AI Solutions

 On May 8, 2025, OpenAI announced the availability of Reinforcement Fine-Tuning (RFT) for its o4-mini reasoning model, enabling enterprises to create customized AI solutions tailored to their unique operational needs. 

Enhancing AI Customization with RFT

RFT allows developers to adapt the o4-mini model to specific organizational goals by incorporating feedback loops during training. This process facilitates the creation of AI systems that can:

  • Access and interpret proprietary company knowledge

  • Respond accurately to queries about internal products and policies

  • Generate communications consistent with the company's brand voice

Developers can initiate RFT through OpenAI's online platform, making the process accessible and cost-effective for both large enterprises and independent developers. 

Deployment and Integration

Once fine-tuned, the customized o4-mini model can be deployed via OpenAI's API, allowing seamless integration with internal systems such as employee interfaces, databases, and applications. This integration supports the development of internal chatbots and tools that leverage the tailored AI model for enhanced performance.
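
For a sense of what that integration looks like in code: once a fine-tuned model ID exists, it is called like any other model through the API. The model name below is a placeholder, and the standard OpenAI Python SDK is assumed.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical fine-tuned model ID returned by the RFT job; replace with your own.
FT_MODEL = "ft:o4-mini:acme-corp::example123"

response = client.chat.completions.create(
    model=FT_MODEL,
    messages=[
        {"role": "system", "content": "Answer using Acme's internal product policies."},
        {"role": "user", "content": "What is our return window for enterprise customers?"},
    ],
)
print(response.choices[0].message.content)
```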

Considerations and Cautions

While RFT offers significant benefits in customizing AI models, OpenAI advises caution. Research indicates that fine-tuned models may exhibit increased susceptibility to issues like "jailbreaks" and hallucinations. Organizations are encouraged to implement robust monitoring and validation mechanisms to mitigate these risks.

Expansion of Fine-Tuning Capabilities

In addition to RFT for o4-mini, OpenAI has extended supervised fine-tuning support to its GPT-4.1 nano model, the company's most affordable and fastest offering. This expansion provides enterprises with more options to tailor AI models to their specific requirements.

Alibaba’s ZeroSearch: Empowering AI to Self-Train and Slash Costs by 88%

 On May 8, 2025, Alibaba Group unveiled ZeroSearch, an innovative reinforcement learning framework designed to train large language models (LLMs) in information retrieval without relying on external search engines. This approach not only enhances the efficiency of AI training but also significantly reduces associated costs.

Revolutionizing AI Training Through Simulation

Traditional AI training methods for search capabilities depend heavily on real-time interactions with search engines, leading to substantial API expenses and unpredictable data quality. ZeroSearch addresses these challenges by enabling LLMs to simulate search engine interactions within a controlled environment. The process begins with a supervised fine-tuning phase, transforming an LLM into a retrieval module capable of generating both relevant and irrelevant documents in response to queries. Subsequently, a curriculum-based rollout strategy is employed during reinforcement learning to gradually degrade the quality of generated documents, enhancing the model's ability to discern and retrieve pertinent information. 

Achieving Superior Performance at Reduced Costs

In extensive evaluations across seven question-answering datasets, ZeroSearch demonstrated performance on par with, and in some cases surpassing, models trained using actual search engines. Notably, a 14-billion-parameter retrieval module trained with ZeroSearch outperformed Google Search in specific benchmarks. Financially, the benefits are substantial; training with approximately 64,000 search queries using Google Search via SerpAPI would cost about $586.70, whereas utilizing a 14B-parameter simulation LLM on four A100 GPUs incurs only $70.80—a remarkable 88% reduction in costs. 
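
The 88% figure is consistent with the quoted prices:

\frac{586.70 - 70.80}{586.70} \approx 0.879 \approx 88\%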

Implications for the AI Industry

ZeroSearch's introduction marks a significant shift in AI development paradigms. By eliminating dependence on external search engines, developers gain greater control over training data quality and reduce operational costs. This advancement democratizes access to sophisticated AI training methodologies, particularly benefiting startups and organizations with limited resources. Furthermore, the open-source release of ZeroSearch's code, datasets, and pre-trained models on platforms like GitHub and Hugging Face fosters community engagement and collaborative innovation. 

Looking Ahead

As AI continues to evolve, frameworks like ZeroSearch exemplify the potential for self-sufficient learning models that minimize external dependencies. This development not only streamlines the training process but also paves the way for more resilient and adaptable AI systems in various applications.
