Wandering Nomad

19.5.25

AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications, and Challenges

A recent study by researchers Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee delves into the nuanced differences between AI Agents and Agentic AI, providing a structured taxonomy, application mapping, and an analysis of the challenges inherent to each paradigm.

Defining AI Agents and Agentic AI

AI Agents: These are modular systems primarily driven by Large Language Models (LLMs) and Large Image Models (LIMs), designed for narrow, task-specific automation. They often rely on prompt engineering and tool integration to perform specific functions.
Agentic AI: Representing a paradigmatic shift, Agentic AI systems are characterized by multi-agent collaboration, dynamic task decomposition, persistent memory, and orchestrated autonomy. They move beyond isolated tasks to coordinated systems capable of complex decision-making processes.

Architectural Evolution

The transition from AI Agents to Agentic AI involves significant architectural enhancements:

AI Agents: Utilize core reasoning components like LLMs, augmented with tools to enhance functionality.
Agentic AI: Incorporates additional components for coordinating multiple agents, such as an orchestration layer and shared persistent memory, enabling higher autonomy and more context-aware operation.
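To make the architectural difference concrete, here is a minimal sketch of an orchestration layer, not the design described in the study. Every name in it (Agent, Orchestrator, decompose, the search and summarize stubs) is hypothetical, and the stub functions stand in for LLM-backed agents. The task plan is fixed here, whereas an agentic system would generate it dynamically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """A single task-specific agent (hypothetical; a real one would wrap an LLM plus tools)."""
    name: str
    handle: Callable[[str, List[str]], str]  # (subtask, shared memory) -> result


@dataclass
class Orchestrator:
    """Coordinates several agents and keeps memory that persists across subtasks."""
    agents: Dict[str, Agent]
    memory: List[str] = field(default_factory=list)

    def decompose(self, goal: str) -> List[Tuple[str, str]]:
        # Assumption: a fixed two-step plan, standing in for dynamic task decomposition.
        return [
            ("search", f"find sources on {goal}"),
            ("summarize", f"summarize findings on {goal}"),
        ]

    def run(self, goal: str) -> str:
        result = ""
        for role, subtask in self.decompose(goal):
            result = self.agents[role].handle(subtask, self.memory)
            # Each result is stored so later agents can build on earlier work.
            self.memory.append(f"{role}: {result}")
        return result


def search(subtask: str, memory: List[str]) -> str:
    return "3 sources found"


def summarize(subtask: str, memory: List[str]) -> str:
    # Reads what earlier agents wrote to shared memory.
    return f"summary based on {len(memory)} prior step(s)"


orchestrator = Orchestrator({
    "search": Agent("search", search),
    "summarize": Agent("summarize", summarize),
})
final = orchestrator.run("agentic AI")
print(final)
```

A single AI Agent corresponds to one Agent instance used on its own. The Orchestrator, with its plan and shared memory, is the part that the Agentic AI paradigm adds.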

Applications

AI Agents: Commonly applied in areas such as customer support, scheduling, and data summarization, where tasks are well-defined and require specific responses.
Agentic AI: Finds applications in more complex domains like research automation, robotic coordination, and medical decision support, where tasks are dynamic and require adaptive, collaborative problem-solving.

Challenges and Proposed Solutions

Both paradigms face unique challenges:

AI Agents: Issues include hallucination, where the system produces inaccurate or fabricated outputs, and brittleness, where it fails once inputs stray from what it was designed to handle.
Agentic AI: Challenges include emergent behavior and coordination failures among agents.

To address these, the study suggests solutions such as ReAct loops, Retrieval-Augmented Generation (RAG), orchestration layers, and causal modeling to enhance system robustness and explainability.
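As an illustration of the first of these, a ReAct loop alternates between a reasoning step, an action (usually a tool call), and an observation that is fed back into the next reasoning step. The sketch below is a toy version under stated assumptions: scripted_policy is a hypothetical stand-in for an LLM, and the calculator tool and all function names are invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action taken, observation)


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action name, action input)."""
    if not history:
        return ("I need to compute this.", "calculator", "12 * 7")
    last_observation = history[-1][2]
    return ("I have the result.", "finish", last_observation)


def calculator(expression: str) -> str:
    """Toy tool: evaluates 'a op b' for integer a, b and op in + - *."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str({"+": a + b, "-": a - b, "*": a * b}[op])


def react(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, history)   # reason
        if action == "finish":
            return arg
        observation = tools[action](arg)                   # act
        history.append((thought, f"{action}[{arg}]", observation))  # observe
    return "no answer within step budget"


answer = react("What is 12 * 7?", scripted_policy, {"calculator": calculator})
print(answer)
```

Grounding each step in a tool observation is what is meant to reduce hallucination, and the max_steps budget is one simple guard against an agent that never converges.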

References

Sapkota, R., Roumeliotis, K. I., & Karkee, M. (2025). AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. arXiv preprint arXiv:2505.10468.

Ultra-FineWeb: A Trillion-Token Dataset Enhancing LLM Accuracy Across Benchmarks

Researchers from Tsinghua University and ModelBest have introduced Ultra-FineWeb, a large-scale, high-quality dataset comprising approximately 1 trillion English tokens and 120 billion Chinese tokens. This dataset aims to enhance the performance of large language models (LLMs) by providing cleaner and more efficient training data.

Efficient Data Filtering Pipeline

The creation of Ultra-FineWeb involved an efficient data filtering pipeline that addresses two main challenges in data preparation for LLMs:

Lack of Efficient Data Verification Strategy:
Traditional methods struggle to provide timely feedback on data quality. To overcome this, the researchers introduced a computationally efficient verification strategy that enables rapid evaluation of data impact on LLM training with minimal computational cost.
Selection of Seed Data for Classifier Training:
Selecting appropriate seed data often relies heavily on human expertise, introducing subjectivity. The team optimized the selection process by integrating the verification strategy, improving filtering efficiency and classifier robustness.

A lightweight classifier based on fastText was employed to efficiently filter high-quality data, significantly reducing inference costs compared to LLM-based classifiers.
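The filtering step itself can be sketched as follows. This is not the Ultra-FineWeb pipeline: the scorer here is a hypothetical toy heuristic rather than a trained model, and the label name "__label__hq" and the 0.5 threshold are assumptions made for illustration. The scorer interface is kept to a simple (label, confidence) pair so that a trained fastText model could be plugged in behind a thin wrapper.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A scorer maps one line of text to (label, confidence).
Scorer = Callable[[str], Tuple[str, float]]


def filter_high_quality(
    docs: Iterable[str],
    scorer: Scorer,
    threshold: float = 0.5,
) -> Iterator[str]:
    """Yield only the documents the scorer labels as high quality."""
    for doc in docs:
        # fastText-style classifiers expect single-line input, so flatten newlines.
        label, confidence = scorer(doc.replace("\n", " "))
        if label == "__label__hq" and confidence >= threshold:
            yield doc


def toy_scorer(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained classifier (not a real quality model)."""
    words = text.split()
    looks_clean = len(words) >= 5 and text.rstrip().endswith(".")
    return ("__label__hq", 0.9) if looks_clean else ("__label__lq", 0.9)


docs = [
    "Click here!!! buy now",
    "The mitochondrion produces most of the cell's ATP.",
    "lorem",
]
kept = list(filter_high_quality(docs, toy_scorer))
print(kept)
```

Because the classifier is a cheap per-document function call, the same loop can stream over a very large corpus, which is where the cost advantage over LLM-based scoring comes from.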

Benchmark Performance

Empirical results demonstrate that LLMs trained on Ultra-FineWeb exhibit significant performance improvements across multiple benchmark tasks, including MMLU, ARC, CommonsenseQA, and others. The dataset's quality contributes to enhanced training efficiency and model accuracy.

Availability

Ultra-FineWeb is available on Hugging Face, providing researchers and developers with access to this extensive dataset for training and evaluating LLMs.

17.5.25

How FutureHouse’s AI Agents Are Reshaping Scientific Discovery

In a major leap for scientific research, FutureHouse—a nonprofit backed by former Google CEO Eric Schmidt—has introduced a powerful lineup of AI research agents aimed at accelerating the pace of scientific discovery. Built to support scientists across disciplines, these agents automate key parts of the research workflow—from literature search to chemical synthesis planning—reducing bottlenecks and enhancing productivity.

This suite includes four primary agents: Crow, Falcon, Owl, and Phoenix, each specialized in a unique aspect of the research pipeline. Together, they form a comprehensive AI-powered infrastructure for modern science.

Meet the AI Agents Changing Science

1. Crow – The Concise Search Specialist

Crow acts as a rapid-response research assistant. It provides short, precise answers to technical queries by intelligently retrieving evidence from full-text scientific papers. Designed for speed and accuracy, it’s especially useful for API-based interactions, where precision and performance matter most. Crow is built on top of FutureHouse’s custom PaperQA2 architecture.

2. Falcon – Deep Research Assistant

Falcon takes things further by conducting expansive literature reviews. It produces full-length research reports in response to broader or more open-ended scientific questions. By analyzing papers, data sources, and context-rich materials, Falcon allows researchers to dive deep into topics without manually sorting through endless PDFs.

3. Owl – Precedent Investigator

Owl helps scientists find out whether an experiment or research idea has already been executed. This is crucial for grant applications, patent filings, and ensuring that researchers don’t waste time reinventing the wheel. By surfacing related studies and experiments, Owl enables more informed, original work.

4. Phoenix – The Chemistry Innovator

Phoenix is built for early-stage chemistry research. Leveraging cheminformatics tools, it assists in designing molecules, suggesting synthetic routes, and evaluating chemical feasibility. It builds upon ChemCrow, an earlier chemistry agent, and remains in active development as a sandbox tool for chemists to explore and provide feedback.

Performance and Potential

In benchmark tests, Crow, Falcon, and Owl outperformed PhD-level biologists on scientific retrieval and reasoning tasks. Unlike many AI tools that only read paper abstracts or summaries, these agents consume and analyze full-text documents, allowing them to detect nuanced issues like methodological flaws or statistical limitations.

Although Phoenix is still in its experimental phase and may sometimes produce errors, it represents an important step toward automating complex tasks in synthetic chemistry.

Why This Matters

The bottlenecks of modern science often lie not in experimentation, but in navigating the overwhelming volume of prior work. By offloading repetitive and time-consuming research tasks to AI, FutureHouse's agents free up scientists to focus on creativity, innovation, and critical thinking.

These tools are also being made openly available for scientists and research institutions, fostering a collaborative environment for AI-augmented science.

Final Takeaway

FutureHouse’s AI agents aren’t just productivity boosters—they’re a vision of a new research paradigm. By augmenting human researchers with scalable, intelligent assistants, we’re witnessing the early stages of a revolution in how science is done. As these tools evolve, they hold the potential to dramatically accelerate scientific discovery across disciplines.