10.5.25

Agentic AI: The Next Frontier in Autonomous Intelligence

 Agentic AI represents a transformative leap in artificial intelligence, shifting from passive, reactive tools to proactive, autonomous agents capable of decision-making, learning, and collaboration. Unlike traditional AI models that require explicit instructions, agentic AI systems can understand context, anticipate needs, and act independently to achieve specific goals. 

Key Characteristics of Agentic AI

  • Autonomy and Decision-Making: Agentic AI systems possess the ability to make decisions without human intervention, enabling them to perform complex tasks and adapt to new situations dynamically. 

  • Multimodal Capabilities: These agents can process and respond to various forms of input, including text, voice, and images, facilitating more natural and intuitive interactions. 

  • Emotional Intelligence: By recognizing and responding to human emotions, agentic AI enhances user engagement and provides more personalized experiences, particularly in customer service and healthcare.

  • Collaboration with Humans: Agentic AI is designed to work alongside humans, augmenting capabilities and enabling more efficient workflows through shared decision-making processes.

Real-World Applications

  • Enterprise Automation: Companies like Microsoft and Amazon are integrating agentic AI into their platforms to automate complex business processes, improve customer service, and enhance operational efficiency. 

  • Healthcare: Agentic AI assists in patient care by monitoring health data, providing personalized recommendations, and supporting medical professionals in diagnosis and treatment planning. 

  • Finance: In the financial sector, agentic AI is employed for algorithmic trading, risk assessment, and fraud detection, enabling faster and more accurate decision-making.

  • Software Development: AI agents are increasingly used to write, test, and debug code, accelerating the software development lifecycle and reducing the potential for human error.

Challenges and Considerations

While the potential of agentic AI is vast, it also presents challenges that must be addressed:

  • Ethical and Privacy Concerns: Ensuring that autonomous systems make decisions aligned with human values and maintain user privacy is paramount. 

  • Transparency and Accountability: Understanding how agentic AI makes decisions is crucial for trust and accountability, especially in high-stakes applications. 

  • Workforce Impact: As AI systems take on more tasks, there is a need to reskill the workforce and redefine roles to complement AI capabilities. 

The Road Ahead

Agentic AI is poised to redefine the interaction between humans and machines, offering unprecedented levels of autonomy and collaboration. As technology continues to evolve, the integration of agentic AI across various sectors promises to enhance efficiency, innovation, and user experiences. However, careful consideration of ethical implications and proactive governance will be essential to harness its full potential responsibly.

ZEROSEARCH: Simulating Search to Train Retrieval-Augmented LLMs at Zero API Cost

Introduction

Retrieval-Augmented Generation (RAG) has become a cornerstone for grounding large language models (LLMs) in up-to-date information. Yet existing approaches that integrate live search engines face two critical hurdles: unpredictable document quality and prohibitive API expenses during reinforcement learning (RL) training. ZEROSEARCH, introduced by Sun et al. (arXiv), addresses both: it trains an LLM's search strategy without ever contacting a real search engine, which cuts API costs and stabilizes learning.


Methodology Deep Dive

1. Search Simulation via Supervised Fine-Tuning

Rather than querying Google or Bing, ZEROSEARCH first converts an LLM into a retrieval module (π_ψ) through lightweight supervised fine-tuning (SFT).

  • Data Collection: The authors collect interaction trajectories by prompting the base LLM to interact with a real search engine until it reaches a final answer. Trajectories that end in a correct answer are labeled “positive”; those that end in an incorrect answer are labeled “negative”.

  • Prompt Design: Query–document pairs are extracted from these trajectories. The fine-tuning prompt explicitly states whether the generated document should be useful or noisy, so the model can simulate both high- and low-quality retrievals on demand (Table 2 in Sun et al., arXiv).
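The two steps above can be sketched as a small data-preparation routine. This is a minimal illustration, not the authors' code: the `Trajectory` fields, the prompt wording, and the rule that documents from positive trajectories are labeled useful are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Trajectory:
    """One query-document pair taken from a search interaction (hypothetical schema)."""
    query: str
    document: str
    answer_correct: bool  # True for a "positive" trajectory


def to_sft_example(t: Trajectory) -> Dict[str, str]:
    """Turn a trajectory into a prompt/completion pair for supervised fine-tuning.

    Assumption: documents from positive trajectories are treated as useful,
    documents from negative trajectories as noisy.
    """
    label = "useful" if t.answer_correct else "noisy"
    # Illustrative wording only; the real prompt is given in the paper's Table 2.
    prompt = (
        f"Query: {t.query}\n"
        f"Generate a {label} document for this query.\n"
        f"Document:"
    )
    return {"prompt": prompt, "completion": " " + t.document}


example = to_sft_example(
    Trajectory(query="capital of France", document="Paris is the capital.", answer_correct=True)
)
```

Because the quality label sits in the prompt, the same fine-tuned model can later be asked for either kind of document by changing one word.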

2. Curriculum-Based Rollout Strategy

To progressively challenge the policy model (π_θ), ZEROSEARCH employs a curriculum that gradually increases the noise probability (pᵢ) of simulated documents over training steps:

p_i = p_s + \frac{b^{i/m} - 1}{b - 1} \times (p_e - p_s)

  • Parameters:

    • p_s, p_e: initial and final noise probabilities

    • i/m: fraction of training steps completed (step i of m total)

    • b: exponential base (default 4)

  • Effect: Early training relies on mostly useful documents, allowing π_θ to learn structured reasoning. As training proceeds, noisy retrievals become increasingly frequent, forcing the policy to develop robust search strategies (Sun et al., arXiv).
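A short sketch of the schedule may help. It assumes the exponential form p_i = p_s + (b^(i/m) − 1)/(b − 1) × (p_e − p_s), which matches the description of b as an exponential base; the function and argument names are illustrative, and the example values for p_s and p_e are chosen here, not taken from the paper.

```python
def noise_probability(step: int, total_steps: int,
                      p_start: float, p_end: float,
                      base: float = 4.0) -> float:
    """Probability of serving a noisy simulated document at a given training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 0 or base == 1:
        raise ValueError("base must be positive and different from 1")
    # Clamp the progress fraction i/m into [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Exponential interpolation: 0 at the start, 1 at the end.
    weight = (base ** frac - 1.0) / (base - 1.0)
    return p_start + weight * (p_end - p_start)


# Illustrative values only: start with no noise, end at 50% noisy documents.
schedule = [noise_probability(i, 100, 0.0, 0.5) for i in (0, 25, 50, 75, 100)]
```

With a base above 1 the curve is convex: halfway through training (with b = 4, p_s = 0, p_e = 0.5) the noise probability is only about 0.167, so most of the added difficulty arrives late.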

3. Reinforcement Learning Objective

ZEROSEARCH frames the optimization as:

\max_{\pi_\theta} \; \mathbb{E}_{x,y}\Big[\, r_\phi(x,y) \;-\; \beta\, D_{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big) \Big],

where:

  • r_φ(x,y): F1-based reward (balances precision and recall, and avoids the “reward hacking” seen with Exact Match) (Sun et al., arXiv).

  • π_ref: reference model (for KL-penalty regularization).

  • Compatible Algorithms: PPO, GRPO, Reinforce++.
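To make the reward concrete, here is a minimal sketch of a token-level F1 score and the per-sample quantity r − β·KL. The whitespace tokenization, lowercasing, and function names are assumptions for illustration; the paper's exact normalization may differ, and the KL term is taken as a precomputed estimate rather than derived from model logits.

```python
from collections import Counter


def f1_reward(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def kl_penalized_score(reward: float, kl_estimate: float, beta: float) -> float:
    """Per-sample value of r - beta * KL, given an externally computed KL estimate."""
    return reward - beta * kl_estimate


concise = f1_reward("Paris", "paris")               # exact answer
padded = f1_reward("the capital is paris", "paris")  # correct but padded answer
```

The padded answer scores lower than the concise one because precision falls as extra tokens are added, which is the property that makes F1 harder to game than a pure match-based reward.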


Key Results Overview

  • A 3B-parameter simulation LLM effectively incentivizes π_θ’s search skills at zero API cost.

  • A 7B retrieval module matches real Google Search performance; a 14B model surpasses it on benchmark QA tasks.

  • Generalizes across both base and instruction-tuned LLMs, and under diverse RL algorithms (Sun et al., arXiv).


Implications for the ML Industry

  1. Cost-Effective RAG Training
    Organizations can now sidestep expensive search-API fees during RL-based retrieval training, democratizing advanced RAG strategies for smaller teams.

  2. Controlled Noise Injection
    The curriculum approach offers principled noise scheduling—models become robust not only to clean retrievals but also to adversarial or low-quality documents, enhancing real-world resilience.

  3. Scalable, On-Premises Solutions
    By fully simulating search behaviors, enterprises can run end-to-end RAG pipelines in-house, preserving data privacy and reducing dependency on third-party services.

  4. Extensible Framework
    ZEROSEARCH’s modular design—plugging in any simulation LLM and RL algorithm—facilitates rapid experimentation. Researchers can explore new reward functions (e.g., retrieval diversity), fine-tune custom domains, or apply to multimodal search settings.

  5. Toward Autonomous Agents
    As LLMs evolve into general-purpose agents, ZEROSEARCH paves the way for self-sufficient information gathering, where agents learn to both seek and synthesize knowledge without external calls.


Conclusion
ZEROSEARCH represents a paradigm shift in training retrieval-augmented LLMs: by simulating instead of querying, it eliminates cost barriers, stabilizes learning through controlled noise, and scales from 3B to 14B models. For the ML industry, this means more accessible, robust, and private RAG solutions—setting the stage for truly autonomous, knowledge-seeking AI agents.

New Research Compares Fine-Tuning and In-Context Learning for LLM Customization

 On May 9, 2025, VentureBeat reported on a collaborative study by Google DeepMind and Stanford University that evaluates two prevalent methods for customizing large language models (LLMs): fine-tuning and in-context learning (ICL). The research indicates that ICL generally provides better generalization capabilities compared to traditional fine-tuning, especially when adapting models to novel tasks. 

Understanding Fine-Tuning and In-Context Learning

Fine-tuning involves further training a pre-trained LLM on a specialized dataset, adjusting its internal parameters to acquire new knowledge or skills. In contrast, ICL does not alter the model's parameters; instead, it guides the model by providing examples of the desired task within the input prompt, allowing the model to infer how to handle similar queries. 
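As a rough illustration of the ICL side of this contrast, the sketch below assembles a few-shot prompt from example pairs. The `Q:`/`A:` layout and the function name are assumptions for the example, not a format used in the study; the point is only that the examples travel with every request and no weights change.

```python
from typing import List, Tuple


def build_icl_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Place worked examples ahead of the new query so the model can infer the task."""
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


prompt = build_icl_prompt(
    [("2 + 2", "4"), ("3 + 5", "8")],
    "7 + 6",
)
```

Every call pays for the tokens in `examples` again, which is where the inference-time cost of ICL comes from.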

Experimental Approach

The researchers designed controlled synthetic datasets featuring complex, self-consistent structures, such as imaginary family trees and hierarchies of fictional concepts. To ensure the novelty of the information, they replaced all nouns, adjectives, and verbs with invented terms, preventing any overlap with the models' pre-training data. The models were then tested on various generalization challenges, including logical deductions and reversals. 
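A toy version of this setup can be sketched as follows. It is not the study's data generator: the syllable-based word maker, the sentence templates, and the field names are all invented for illustration. It only shows the idea of pairing a training fact built from made-up words with a reversed statement held out for testing.

```python
import random
from typing import Dict, List

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def nonsense_word(rng: random.Random, syllables: int = 2) -> str:
    """Build a pronounceable invented word from consonant-vowel syllables."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))


def make_reversal_items(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Create n training facts, each with a reversed statement for evaluation."""
    rng = random.Random(seed)
    used = set()

    def fresh(syllables: int) -> str:
        # Draw until we get a word not used before, so facts never collide.
        while True:
            word = nonsense_word(rng, syllables)
            if word not in used:
                used.add(word)
                return word

    items = []
    for _ in range(n):
        subject, obj, verb = fresh(2), fresh(2), fresh(3)
        items.append({
            "train": f"The {subject} {verb}s the {obj}.",
            "reversal_test": f"The {obj} is {verb}ed by the {subject}.",
        })
    return items


items = make_reversal_items(3, seed=1)
```

Since the words are invented, a model can only answer the reversed statement by generalizing from the training fact, not by recalling pre-training data.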

Key Findings

The study found that, in data-matched settings, ICL led to better generalization than standard fine-tuning. Models utilizing ICL were more adept at tasks like reversing relationships and making logical deductions from the provided context. However, ICL is generally more computationally expensive at inference time, as it requires providing additional context to the model for each use. 

Introducing Augmented Fine-Tuning

To combine the strengths of both methods, the researchers proposed an augmented fine-tuning approach. This method involves using the LLM's own ICL capabilities to generate diverse and richly inferred examples, which are then added to the dataset used for fine-tuning. Two main data augmentation strategies were explored:

  1. Local Strategy: Focusing on individual pieces of information, prompting the LLM to rephrase single sentences or draw direct inferences, such as generating reversals.

  2. Global Strategy: Providing the full training dataset as context, then prompting the LLM to generate inferences by linking particular documents or facts with the rest of the information, leading to longer reasoning traces.

Models fine-tuned on these augmented datasets showed significant improvements in generalization, outperforming both standard fine-tuning and plain ICL. 
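The two strategies can be sketched with a generic text-generation callable. Everything here is an assumption for illustration: `LLM` is a hypothetical prompt-in, text-out function standing in for whatever model API is used, and the prompt wording is not taken from the study.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, generated text out


def local_augment(facts: List[str], llm: LLM) -> List[str]:
    """Local strategy: work on one fact at a time (rephrasings, direct reversals)."""
    return [
        llm(f"Rephrase this sentence and state its reversal:\n{fact}")
        for fact in facts
    ]


def global_augment(facts: List[str], llm: LLM) -> List[str]:
    """Global strategy: show the whole dataset, then ask for inferences tied to one fact."""
    context = "\n".join(facts)
    return [
        llm(
            f"Context:\n{context}\n\n"
            f"Using the full context, state what follows from this fact:\n{fact}"
        )
        for fact in facts
    ]


def build_augmented_dataset(facts: List[str], llm: LLM) -> List[str]:
    """Original facts plus both kinds of generated examples, ready for fine-tuning."""
    return list(facts) + local_augment(facts, llm) + global_augment(facts, llm)


# Stub model for demonstration: echoes the last line of the prompt.
def echo_llm(prompt: str) -> str:
    return "GEN: " + prompt.splitlines()[-1]


dataset = build_augmented_dataset(["A is the parent of B.", "B is the parent of C."], echo_llm)
```

The generation cost is paid once when the dataset is built, rather than on every inference call as with plain ICL.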

Implications for Enterprise AI Development

This research offers valuable insights for developers and enterprises aiming to adapt LLMs to specific domains or proprietary information. While ICL provides superior generalization, its computational cost at inference time can be high. Augmented fine-tuning presents a balanced approach, enhancing generalization capabilities while mitigating the continuous computational demands of ICL. By investing in creating ICL-augmented datasets, developers can build fine-tuned models that perform better on diverse, real-world inputs.

Karpathy doesn't use a fancy app to manage his research. He uses a folder, Obsidian, and an AI — and I want to copy it. He posted about ...