Wandering Nomad

19.6.25

Andrej Karpathy Declares the Era of Software 3.0: Programming in English, Building for Agents, and Rewriting the Stack

Andrej Karpathy on the Future of Software: The Rise of Software 3.0 and the Agent Era

At a packed AI event, Andrej Karpathy—former Director of AI at Tesla and founding member of OpenAI—delivered a compelling address outlining a tectonic shift in how we write, interact with, and deploy software. “Software is changing again,” Karpathy declared, positioning today’s shift as more radical than anything the industry has seen in 70 years.

From Software 1.0 to 3.0

Karpathy breaks down the evolution of software into three stages:

Software 1.0: Traditional code written explicitly by developers in programming languages like Python or C++.
Software 2.0: Neural networks trained via data and optimized using backpropagation—no explicit code, just learned weights.
Software 3.0: Large Language Models (LLMs) like GPT-4 and Claude, where natural language prompts become the new form of programming.

“We are now programming computers in English,” Karpathy said, highlighting how the interface between humans and machines is becoming increasingly intuitive and accessible.

GitHub, Hugging Face, and the Rise of LLM Ecosystems

Karpathy draws powerful parallels between historical shifts in tooling: GitHub was the hub for Software 1.0; Hugging Face and similar platforms are now becoming the repositories for Software 2.0 and 3.0. Prompting an LLM is no longer just a trick—it’s a paradigm. And increasingly, tools like Cursor and Perplexity represent what he calls partial autonomy apps, with sliding scales of control for the user.

In these apps, humans perform verification while AIs handle generation, and GUIs become crucial for maintaining speed and safety.

AI as Utilities, Fabs, and Operating Systems

Karpathy introduced a powerful metaphor: LLMs as a new form of operating system. Just as Windows or Linux manage memory and processes, LLMs orchestrate knowledge and tasks. He explains that while LLMs operate with the reliability and ubiquity of utilities (like electricity), they also require the massive capex and infrastructure akin to semiconductor fabs.

But the most accurate analogy, he claims, is that LLMs are emerging operating systems, with multimodal abilities, memory management (context windows), and apps running across multiple providers—just like early days of Linux vs. Windows.

Vibe Coding and Natural Language Development

Vibe coding—the concept of programming through intuition and natural language—has exploded, thanks in part to Karpathy’s now-famous tweet. “I can’t program in Swift,” he said, “but I built an iOS app with an LLM in a day.”

The viral idea is about empowerment: anyone who speaks English can now create software. And this unlocks massive creative and economic potential, especially for young developers and non-programmers.

The Next Frontier: Building for AI Agents

Karpathy argues that today’s digital infrastructure was designed for humans and GUIs—not for autonomous agents. He proposes tools like llm.txt (analogous to robots.txt) to make content agent-readable, and praises platforms like Vercel and Stripe that are transitioning documentation and tooling to be LLM-native.

“You can’t just say ‘click this’ anymore,” he explains. Agents need precise, machine-readable instructions—not vague human UX metaphors.

He also showcases tools like Deep Wiki and Ingest to convert GitHub repos into digestible formats for LLMs. In short, we must rethink developer experience not just for humans, but for machine collaborators.

Iron Man Suits, Not Iron Man Robots

Karpathy closes with a compelling analogy: most AI applications today should act more like Iron Man suits (human-augmented intelligence) rather than fully autonomous Iron Man robots. We need GUIs for oversight, autonomy sliders to control risk, and workflows that let humans verify, adjust, and approve AI suggestions in tight loops.

“It’s not about replacing developers,” he emphasizes. “It’s about rewriting the stack, building intelligent tools, and creating software that collaborates with us.”

Takeaway:
The future of software isn’t just about writing better code. It’s about redefining what code is, who gets to write it, and how machines will interact with the web. Whether you’re a developer, founder, or student, learning to work with and build for LLMs isn’t optional—it’s the next operating system of the world.

18.6.25

OpenBMB Launches MiniCPM4: Ultra-Efficient LLMs Tailored for Edge Devices

OpenBMB recently announced the release of MiniCPM4, a suite of lightweight yet powerful language models designed for seamless deployment on edge devices. The series includes two configurations: a 0.5-billion and an 8-billion-parameter model. By combining innovations in model design, training methodology, and inference optimization, MiniCPM4 delivers unprecedented performance for on-device applications.

What Sets MiniCPM4 Apart

InfLLM v2: Sparse Attention Mechanism
Utilizes trainable sparse attention where tokens attend to fewer than 5% of others during 128 K-long sequence processing. This dramatically reduces computation without sacrificing context comprehension.
BitCPM Quantization:
Implements ternary quantization across model weights, achieving up to 90% reduction in bit-width and enabling storage-efficient deployment on constrained devices.
Efficient Training Framework:
Employs ultra-clean dataset filtering (UltraClean), instruction fine-tuning (UltraChat v2), and optimized hyperparameter tuning strategies (ModelTunnel v2), all trained on only ~8 trillion tokens.
Optimized Inference Stack:
Slow inference is addressed via CPM.cu—an efficient CUDA framework that integrates sparse attention, quantization, and speculative sampling. Cross-platform support is provided through ArkInfer.

Performance Highlights

Speed:
On devices like the Jetson AGX Orin, the 8B MiniCPM4 model processes long text (128K tokens) up to 7× faster than competing models like Qwen3‑8B.
Benchmark Results:
Comprehensive evaluations show MiniCPM4 outperforming open-source peers in tasks across long-text comprehension and multi-step generation.

Deploying MiniCPM4

On CUDA Devices: Use the CPM.cu stack for optimized sparse attention and speculative decoding performance.
With Transformers API: Supports Hugging Face interfacing via tensor-mode bfloat16 and trust_remote_code=True.
Server-ready Solutions: Includes support for styles like SGLang and vLLM, enabling efficient batching and chat-style endpoints.

Why It Matters

MiniCPM4 addresses critical industry pain points:

Local ML Capabilities: Brings powerful LLM performance to devices without relying on cloud infrastructure.
Performance & Efficiency Balance: Achieves desktop-grade reasoning on embedded devices thanks to sparse attention and quantization.
Open Access: Released under Apache 2.0 with documentation, model weights, and inference tooling available via Hugging Face.

Conclusion

MiniCPM4 marks a significant step forward in making advanced language models practical for edge environments. Its efficient attention mechanisms, model compression, and fast decoding pipeline offer developers and researchers powerful tools to embed AI capabilities directly within resource-constrained systems. For industries such as industrial IoT, robotics, and mobile assistants, MiniCPM4 opens doors to real-time, on-device intelligence without compromising performance or privacy.

OpenAI’s Deprecation of GPT-4.5 API Shakes Developer Community Amid Transition to GPT-4.1

OpenAI has announced it's removing GPT‑4.5 Preview from its API on July 14, 2025, triggering disappointment among developers who have relied on its unique blend of performance and creativity. Despite being a favorite among many, the decision aligns with OpenAI’s earlier warning in April 2025, marking GPT‑4.5 as an experimental model meant to inform future iterations.

🚨 Why Developers Are Frustrated

Developers took to X (formerly Twitter) to express their frustration:

“GPT‑4.5 is one of my fav models,” lamented @BumrahBachi.
“o3 + 4.5 are the models I use the most everyday,” said Ben Hyak, Raindrop.AI co-founder.
“What was the purpose of this model all along?” questioned @flowersslop.

For many, GPT‑4.5 offered a distinct combination of creative fluency and nuanced writing—qualities they haven't fully found in newer models like GPT‑4.1 or o3.

🔄 OpenAI’s Response

OpenAI maintains that GPT‑4.5 will remain available in ChatGPT via subscription, even after being dropped from the API. Developers have been directed to migrate to other models such as GPT‑4.1, which the company considers a more sustainable option for API integration.

The removal reflects OpenAI’s ongoing efforts to optimize compute costs while streamlining its model lineup—GT‑4.5’s high GPU requirements and premium pricing made it a natural candidate for phasing out .

💡 What This Means for You

API users must switch models before the mid-July deadline.
Expect adjustments in tone and output style when migrating to GPT‑4.1 or o3.
Organizations using GPT‑4.5 need to test and validate behavior changes in their production pipelines.

🧭 Broader Implications

This move underscores the challenges of balancing model innovation with operational demands and developer expectations.
GPT‑4.5, known as “Orion,” boasted reduced hallucinations and strong language comprehension—yet its high costs highlight the tradeoff between performance and feasibility.
OpenAI’s discontinuation of GPT‑4.5 in the API suggests a continued focus on models that offer the best value, efficiency, and scalability.

✅ Final Takeaway

While API deprecation may frustrate developers who valued GPT‑4.5’s unique strengths, OpenAI’s decision is rooted in economic logic and forward momentum. As the company transitions to GPT‑4.1 and other models, developers must reevaluate their strategies—adapting prompts and workflows to preserve effectiveness while embracing more sustainable AI tools.