Wandering Nomad: LLM

Showing posts with label LLM. Show all posts

19.6.25

Andrej Karpathy Declares the Era of Software 3.0: Programming in English, Building for Agents, and Rewriting the Stack

Andrej Karpathy on the Future of Software: The Rise of Software 3.0 and the Agent Era

At a packed AI event, Andrej Karpathy—former Director of AI at Tesla and founding member of OpenAI—delivered a compelling address outlining a tectonic shift in how we write, interact with, and deploy software. “Software is changing again,” Karpathy declared, positioning today’s shift as more radical than anything the industry has seen in 70 years.

From Software 1.0 to 3.0

Karpathy breaks down the evolution of software into three stages:

Software 1.0: Traditional code written explicitly by developers in programming languages like Python or C++.
Software 2.0: Neural networks trained via data and optimized using backpropagation—no explicit code, just learned weights.
Software 3.0: Large Language Models (LLMs) like GPT-4 and Claude, where natural language prompts become the new form of programming.

“We are now programming computers in English,” Karpathy said, highlighting how the interface between humans and machines is becoming increasingly intuitive and accessible.

GitHub, Hugging Face, and the Rise of LLM Ecosystems

Karpathy draws powerful parallels between historical shifts in tooling: GitHub was the hub for Software 1.0; Hugging Face and similar platforms are now becoming the repositories for Software 2.0 and 3.0. Prompting an LLM is no longer just a trick—it’s a paradigm. And increasingly, tools like Cursor and Perplexity represent what he calls partial autonomy apps, with sliding scales of control for the user.

In these apps, humans perform verification while AIs handle generation, and GUIs become crucial for maintaining speed and safety.

AI as Utilities, Fabs, and Operating Systems

Karpathy introduced a powerful metaphor: LLMs as a new form of operating system. Just as Windows or Linux manage memory and processes, LLMs orchestrate knowledge and tasks. He explains that while LLMs operate with the reliability and ubiquity of utilities (like electricity), they also require the massive capex and infrastructure akin to semiconductor fabs.

But the most accurate analogy, he claims, is that LLMs are emerging operating systems, with multimodal abilities, memory management (context windows), and apps running across multiple providers—just like early days of Linux vs. Windows.

Vibe Coding and Natural Language Development

Vibe coding—the concept of programming through intuition and natural language—has exploded, thanks in part to Karpathy’s now-famous tweet. “I can’t program in Swift,” he said, “but I built an iOS app with an LLM in a day.”

The viral idea is about empowerment: anyone who speaks English can now create software. And this unlocks massive creative and economic potential, especially for young developers and non-programmers.

The Next Frontier: Building for AI Agents

Karpathy argues that today’s digital infrastructure was designed for humans and GUIs—not for autonomous agents. He proposes tools like llm.txt (analogous to robots.txt) to make content agent-readable, and praises platforms like Vercel and Stripe that are transitioning documentation and tooling to be LLM-native.

“You can’t just say ‘click this’ anymore,” he explains. Agents need precise, machine-readable instructions—not vague human UX metaphors.

He also showcases tools like Deep Wiki and Ingest to convert GitHub repos into digestible formats for LLMs. In short, we must rethink developer experience not just for humans, but for machine collaborators.

Iron Man Suits, Not Iron Man Robots

Karpathy closes with a compelling analogy: most AI applications today should act more like Iron Man suits (human-augmented intelligence) rather than fully autonomous Iron Man robots. We need GUIs for oversight, autonomy sliders to control risk, and workflows that let humans verify, adjust, and approve AI suggestions in tight loops.

“It’s not about replacing developers,” he emphasizes. “It’s about rewriting the stack, building intelligent tools, and creating software that collaborates with us.”

Takeaway:
The future of software isn’t just about writing better code. It’s about redefining what code is, who gets to write it, and how machines will interact with the web. Whether you’re a developer, founder, or student, learning to work with and build for LLMs isn’t optional—it’s the next operating system of the world.

30.5.25

Mistral Enters the AI Agent Arena with New Agents API

The AI landscape is rapidly evolving, and the latest "status symbol" for billion-dollar AI companies isn't a fancy office or high-end swag, but a robust agents framework or, as Mistral AI has just unveiled, an Agents API. This new offering from the well-funded and innovative French AI startup signals a significant step towards empowering developers to build more capable, useful, and active problem-solving AI applications.

Mistral has been on a roll, recently releasing models like "Devstral," their latest coding-focused LLM. Their new Agents API aims to provide a dedicated, server-side solution for building and orchestrating AI agents, contrasting with local frameworks by being a cloud-pinged service. This approach is reminiscent of OpenAI's "requests API" but tailored for agentic workflows.

Key Features of the Mistral Agents API

Mistral's Agents API isn't trying to be a one-size-fits-all framework. Instead, it focuses on providing powerful tools and capabilities specifically for leveraging Mistral's models in agentic systems. Here are some of the standout features:

Persistent Memory Across Conversations: A significant advantage, this allows agents to maintain context and history over extended interactions, a common pain point in many existing agent frameworks where managing memory can be tedious.

Built-in Connectors (Tools): The API comes equipped with a suite of pre-built tools to enhance agent functionality:

Code Execution: Leveraging models like Devstral, agents can securely run Python code in a server-side sandbox, enabling data visualization, scientific computing, and more.

Web Search: Provides agents with access to up-to-date information from online sources, news outlets, and reputable databases.

Image Generation: Integrates with Black Forest Lab's FLUX models (including FLUX1.1 [pro] Ultra) to allow agents to create custom visuals for diverse applications, from educational aids to artistic images.

Document Library (Beta): Enables agents to access and leverage content from user-uploaded documents stored in Mistral Cloud, effectively providing built-in Retrieval-Augmented Generation (RAG) functionality.

MCP (Model Context Protocol) Tools: Supports function calling, allowing agents to interact with external services and data sources.

Agentic Orchestration Capabilities: The API facilitates complex workflows:

Handoffs: Allows different agents to collaborate as part of a larger workflow, with one agent calling another.

Sequential and Parallel Processing: Supports both step-by-step task execution and parallel subtask processing, similar to concepts seen in LangGraph or LlamaIndex, but managed through the API.

Structured Outputs: The API supports structured outputs, allowing developers to define data schemas (e.g., using Pydantic) for more reliable and predictable agent responses.

Illustrative Use Cases and Examples

Mistral has provided a "cookbook" with various examples demonstrating the Agents API's capabilities. These include:

GitHub Agent: A developer assistant powered by Devstral that can manage tasks like creating repositories, handling pull requests, and improving unit tests, using MCP tools for GitHub interaction.

Financial Analyst Agent: An agent designed to handle user queries about financial data, fetch stock prices, generate reports, and perform analysis using MCP servers and structured outputs.

Multi-Agent Earnings Call Analysis System (MAECAS): A more complex example showcasing an orchestration of multiple specialized agents (Financial, Strategic, Sentiment, Risk, Competitor, Temporal) to process PDF earnings call transcripts (using Mistral OCR), extract insights, and generate comprehensive reports or answer specific queries.

These examples highlight how the API can be used for tasks ranging from simple, chained LLM calls to sophisticated multi-agent systems involving pre-processing, parallel task execution, and synthesized outputs.

Differentiation and Implications

The Mistral Agents API positions itself as a cloud-based service rather than a local library like LangChain or LlamaIndex. This server-side approach, particularly with built-in connectors and orchestration, aims to simplify the development of enterprise-grade agentic platforms.

Key differentiators include:

API-centric approach: Focuses on providing endpoints for agentic capabilities.

Tight integration with Mistral models: Optimized for Mistral's own LLMs, including specialized ones like Devstral for coding and their OCR model.

Built-in, server-side tools: Reduces the need for developers to implement and manage these integrations themselves.

Persistent state management: Addresses a critical aspect of building robust conversational agents.

This offering is particularly interesting for organizations looking at on-premise deployments of AI models. Mistral, like other smaller, agile AI companies, has shown more openness to licensing proprietary models for such use cases. The Agents API provides a clear pathway for these on-prem users to build sophisticated agentic systems.

The Path Forward

Mistral's Agents API is a significant step in making AI more capable, useful, and an active problem-solver. It reflects a broader trend in the AI industry: moving beyond foundational models to building ecosystems and platforms that enable more complex and practical applications.

While still in its early stages, the API, with its focus on robust features like persistent memory, built-in tools, and orchestration, provides a compelling new option for developers looking to build the next generation of AI agents. As the tools and underlying models continue to improve, the potential for what can be achieved with such an API will only grow. Developers are encouraged to explore Mistral's documentation and cookbook to get started.

29.5.25

Mistral AI Launches Agents API to Simplify AI Agent Creation for Developers

Mistral AI has unveiled its Agents API, a developer-centric platform designed to simplify the creation of autonomous AI agents. This launch represents a significant advancement in agentic AI, offering developers a structured and modular approach to building agents that can interact with external tools, data sources, and APIs.

Key Features of the Agents API

Built-in Connectors:
The Agents API provides out-of-the-box connectors, including:
- Web Search: Enables agents to access up-to-date information from the web, enhancing their responses with current data.
- Document Library: Allows agents to retrieve and utilize information from user-uploaded documents, supporting retrieval-augmented generation (RAG) tasks.
- Code Execution: Facilitates the execution of code snippets, enabling agents to perform computations or run scripts as part of their workflow.
- Image Generation: Empowers agents to create images based on textual prompts, expanding their multimodal capabilities.
Model Context Protocol (MCP) Integration:
The API supports MCP, an open standard that allows agents to seamlessly interact with external systems such as APIs, databases, and user data. This integration ensures that agents can access and process real-world context effectively.
Persistent State Management:
Agents built with the API can maintain state across multiple interactions, enabling more coherent and context-aware conversations.
Agent Handoff Capability:
The platform allows for the delegation of tasks between agents, facilitating complex workflows where different agents handle specific subtasks.
Support for Multiple Models:
Developers can leverage various Mistral models, including Mistral Medium and Mistral Large, to power their agents, depending on the complexity and requirements of the tasks.

Performance and Benchmarking

In evaluations using the SimpleQA benchmark, agents utilizing the web search connector demonstrated significant improvements in accuracy. For instance, Mistral Large achieved a score of 75% with web search enabled, compared to 23% without it. Similarly, Mistral Medium scored 82.32% with web search, up from 22.08% without. (Source)

Developer Resources and Accessibility

Mistral provides comprehensive documentation and SDKs to assist developers in building and deploying agents. The platform includes cookbooks and examples for various use cases, such as GitHub integration, financial analysis, and customer support. (Docs)

The Agents API is currently available to developers, with Mistral encouraging feedback to further refine and enhance the platform.

Implications for AI Development

The introduction of the Agents API by Mistral AI signifies a move toward more accessible and modular AI development. By providing a platform that simplifies the integration of AI agents into various applications, Mistral empowers developers to create sophisticated, context-aware agents without extensive overhead. This democratization of agentic AI has the potential to accelerate innovation across industries, from customer service to data analysis.

19.5.25

Ultra-FineWeb: A Trillion-Token Dataset Enhancing LLM Accuracy Across Benchmarks

Researchers from Tsinghua University and ModelBest have introduced Ultra-FineWeb, a large-scale, high-quality dataset comprising approximately 1 trillion English tokens and 120 billion Chinese tokens. This dataset aims to enhance the performance of large language models (LLMs) by providing cleaner and more efficient training data.

Efficient Data Filtering Pipeline

The creation of Ultra-FineWeb involved an efficient data filtering pipeline that addresses two main challenges in data preparation for LLMs:

Lack of Efficient Data Verification Strategy:
Traditional methods struggle to provide timely feedback on data quality. To overcome this, the researchers introduced a computationally efficient verification strategy that enables rapid evaluation of data impact on LLM training with minimal computational cost.
Selection of Seed Data for Classifier Training:
Selecting appropriate seed data often relies heavily on human expertise, introducing subjectivity. The team optimized the selection process by integrating the verification strategy, improving filtering efficiency and classifier robustness.

A lightweight classifier based on fastText was employed to efficiently filter high-quality data, significantly reducing inference costs compared to LLM-based classifiers.

Benchmark Performance

Empirical results demonstrate that LLMs trained on Ultra-FineWeb exhibit significant performance improvements across multiple benchmark tasks, including MMLU, ARC, CommonSenseQA, and others. The dataset's quality contributes to enhanced training efficiency and model accuracy.

Availability

Ultra-FineWeb is available on Hugging Face, providing researchers and developers with access to this extensive dataset for training and evaluating LLMs.