
13.8.25

Claude Sonnet 4 Now Handles 1M Tokens: Anthropic’s Big Leap in Long-Context Reasoning

Anthropic has expanded Claude Sonnet 4’s context window to a full 1,000,000 tokens, a five-fold jump that shifts what teams can do in a single request—from whole-repo code reviews to end-to-end research synthesis. In practical terms, that means you can feed the model entire codebases (75,000+ lines) or dozens of papers at once and ask for structured analysis without manual chunking gymnastics. The upgrade is live in public beta on the Anthropic API and Amazon Bedrock; support on Google Cloud’s Vertex AI is “coming soon.”
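To opt in, you pass a beta header with your API call. Here is a minimal sketch using the Anthropic Python SDK; the model ID and header value reflect the launch announcement, but treat them as assumptions to verify against the current docs.

```python
# Minimal sketch: Claude Sonnet 4 with the 1M-token context beta enabled.
# Model ID and beta header are assumptions from the launch docs; verify them.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("repo_dump.txt") as f:  # e.g., a concatenated codebase
    repo_text = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # 1M-context opt-in
    messages=[{
        "role": "user",
        "content": f"{repo_text}\n\nMap the dependency graph and flag risky functions.",
    }],
)
print(response.content[0].text)
```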

Why this matters: bigger context changes workflows, not just numbers. When prompts can carry requirements, source files, logs, and prior discussion all together, you get fewer lost references and more coherent plans. It also smooths multi-agent and tool-calling patterns where a planner, executor, and reviewer share one evolving, grounded workspace—without constant re-fetching or re-summarizing. Press coverage framed the jump as removing a major pain point: breaking big problems into fragile fragments. 

What you can do today

• Audit whole repos: Ask for dependency maps, risky functions, and minimally invasive refactors across tens of thousands of lines, then request diffs (a prompt-packing sketch follows this list).
• Digest literature packs: Load a folder of PDFs and prompt for a matrix of methods, datasets, and limitations, plus follow-up questions the papers don’t answer. 
• Conduct long-form investigations: Keep logs, configs, and transcripts in the same conversation so the model can track hypotheses over hours or days. 
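As referenced above, a simple way to stage a whole repo is to pack files into one prompt with explicit path headers, so the model can cite locations when proposing diffs. A minimal sketch, with illustrative extensions and size cap:

```python
# Pack a repository into one structured prompt. File-path headers give the
# model stable anchors to cite when proposing diffs. The extensions and the
# character cap below are illustrative, not a fixed recipe.
from pathlib import Path

def build_repo_prompt(root: str, exts=(".py", ".ts", ".md"), max_chars=3_000_000) -> str:
    parts, total = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        block = f"=== FILE: {path.relative_to(root)} ===\n{path.read_text(errors='ignore')}\n"
        if total + len(block) > max_chars:  # keep headroom under the window
            break
        parts.append(block)
        total += len(block)
    return "".join(parts)

prompt = build_repo_prompt("my-project") + "\nAudit the code above: dependency map, risky functions, minimal refactors."
```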

Where to run it

• Anthropic API: public beta with 1M-token support. 
• Amazon Bedrock: available now in public preview. 
• Google Vertex AI: listed as “coming soon.” 

How to get the most from 1M tokens

  1. Keep retrieval in the loop. A giant window isn’t a silver bullet; relevant-first context still beats raw volume. Anthropic’s own research shows that better retrieval dramatically reduces failure cases. Use hybrid search (BM25 + embeddings) and reranking to stage only what matters; a retrieval sketch follows this list.

  2. Structure the canvas. With big inputs, schema matters: headings, file paths, and short summaries up top make it easier for the model to anchor its reasoning and cite sources accurately.

  3. Plan for latency and cost. Longer prompts mean more compute. Batch where you can, and use summaries or “table of contents” stubs for less-critical sections before expanding on demand. (Early reports note the upgrade targets real enterprise needs like analyzing entire codebases and datasets.) 
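Here is one way to wire up point 1. The sketch blends BM25 and embedding scores, then reranks with a cross-encoder before staging passages into the window; the libraries and model names (rank_bm25, sentence-transformers) are one reasonable stack, not the only one.

```python
# Hybrid retrieval sketch: blend sparse (BM25) and dense (embedding) scores,
# then cross-encoder rerank. Only the winners are staged into the 1M prompt.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = ["...corpus passages..."]  # your chunked sources
query = "Where is the retry logic configured?"

sparse = np.array(BM25Okapi([d.split() for d in docs]).get_scores(query.split()))

encoder = SentenceTransformer("all-MiniLM-L6-v2")
dense = encoder.encode(docs) @ encoder.encode(query)

# Rough score normalization is fine for a sketch; blend the two signals 50/50.
blend = 0.5 * sparse / (np.abs(sparse).max() + 1e-9) + 0.5 * dense / (np.abs(dense).max() + 1e-9)
candidates = [docs[i] for i in np.argsort(blend)[::-1][:50]]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
order = np.argsort(reranker.predict([(query, d) for d in candidates]))[::-1]
staged = [candidates[i] for i in order[:20]]  # goes into the long-context prompt
```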

Competitive context

Anthropic’s 1M-token Sonnet 4 puts the company squarely in the long-context race that’s become table stakes for serious coding and document-intelligence workloads. Trade press called out the move as catching up with million-token peers, while emphasizing the practical benefit: fewer seams in real projects. 

The bottom line

Claude Sonnet 4’s 1M-token window is less about bragging rights and more about coherence at scale. If your teams juggle sprawling repos, dense discovery packets, or multi-day investigations, this update lets you bring the full problem into one place—and keep it there—so plans, diffs, and decisions line up without constant re-stitching. With availability on the Anthropic API and Bedrock today (Vertex AI next), it’s an immediately useful upgrade for engineering and research-heavy organizations.

18.6.25

MiniMax-M1: A Breakthrough Open-Source LLM with a 1 Million Token Context & Cost-Efficient Reinforcement Learning

MiniMax, a Chinese AI startup renowned for its Hailuo video model, has unveiled MiniMax-M1, a landmark open-source language model released under the Apache 2.0 license. Designed for long-context reasoning and agentic tool use, M1 supports a 1 million token input window and an 80,000 token output budget, vastly exceeding most commercial LLMs and enabling it to process large documents, contracts, or codebases in one pass.

Built on a hybrid Mixture-of-Experts (MoE) architecture with lightning attention, MiniMax-M1 balances performance and cost. The model spans 456 billion parameters, with 45.9 billion activated per token. Training used a custom reinforcement learning algorithm, CISPO (sketched below), which delivers substantial efficiency gains. Remarkably, M1’s reinforcement-learning phase reportedly cost just $534,700, versus an estimated $5–6 million for DeepSeek‑R1 and over $100 million for GPT‑4.
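For intuition, here is a toy PyTorch sketch of a CISPO-style loss. It paraphrases the published idea (clip and detach the importance-sampling weight so every token keeps contributing a gradient, unlike PPO-style clipping that zeroes some tokens out); the shapes and clipping bounds are illustrative, not MiniMax’s training code.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Toy CISPO-style objective. The importance-sampling ratio is clipped
    and detached, so the gradient flows through logp_new for every token
    instead of being dropped when the ratio leaves the trust region."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    return -(clipped * advantages * logp_new).mean()

# Dummy per-token log-probs for a sampled batch, just to show the call.
logp_new = torch.log(torch.tensor([0.30, 0.10, 0.55], requires_grad=True))
logp_old = torch.log(torch.tensor([0.25, 0.20, 0.50]))
cispo_loss(logp_new, logp_old, advantages=torch.tensor([1.0, -0.5, 0.8])).backward()
```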


⚙️ Key Architectural Innovations

  • 1M Token Context Window: Enables comprehensive reasoning across lengthy documents or multi-step workflows.

  • Hybrid MoE + Lightning Attention: Delivers high performance without excessive computational overhead.

  • CISPO RL Algorithm: Efficiently trains the model with clipped importance sampling, lowering cost and training time.

  • Dual Variants: The M1-40k and M1-80k versions support different output lengths, with 40K- and 80K-token “thinking budgets” respectively.


📊 Benchmark-Topping Performance

MiniMax-M1 excels in diverse reasoning and coding benchmarks:

  • AIME 2024 (math): 86.0%
  • LiveCodeBench (coding): 65.0%
  • SWE‑bench Verified: 56.0%
  • TAU‑bench: 62.8%
  • OpenAI MRCR (4-needle): 73.4%

Thanks to its architectural optimizations, these results surpass leading open-weight models like DeepSeek‑R1 and Qwen3‑235B‑A22B and narrow the gap with top-tier commercial LLMs such as OpenAI’s o3 and Google’s Gemini.


🚀 Developer-Friendly & Agent-Ready

MiniMax-M1 supports structured function calling and ships with an agent-capable API that includes search, multimedia generation, speech synthesis, and voice cloning. MiniMax recommends vLLM for deployment, which is optimized for efficient serving and batch handling; standard Transformers compatibility is also provided. A serving sketch follows.
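As a concrete starting point, here is a minimal offline-inference sketch with vLLM. The Hugging Face repo ID (MiniMaxAI/MiniMax-M1-80k) and parallelism settings are assumptions based on the release; serving anywhere near the full 1M-token context realistically requires a multi-GPU node.

```python
# Sketch: offline inference for MiniMax-M1 via vLLM. Repo ID and GPU count
# are assumptions; check the model card for the recommended configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M1-80k",  # assumed repo ID
    trust_remote_code=True,            # custom hybrid-attention architecture
    tensor_parallel_size=8,
)
params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=8192)
out = llm.generate(["Summarize this contract's termination clauses: ..."], params)
print(out[0].outputs[0].text)
```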

For enterprises, technical leads, and AI orchestration engineers, MiniMax-M1 provides:

  • Lower operational costs and compute footprint

  • Simplified integration into existing AI pipelines

  • Support for in-depth, long-document tasks

  • A self-hosted, secure alternative to cloud-bound models

  • Business-grade performance with full community access


🧩 Final Takeaway

MiniMax-M1 marks a milestone in open-source AI—combining extreme context length, reinforcement-learning efficiency, and high benchmark performance within a cost-effective, accessible framework. It opens new possibilities for developers, researchers, and enterprises tackling tasks requiring deep reasoning over extensive content—without the limitations or expense of closed-weight models.
