Anthropic launched Fast Mode for Claude Opus 4.6 in research preview. The feature delivers up to 2.5x higher output tokens per second from the same model at a higher cost per token.
Fast Mode is available now in Claude Code for users with extra usage enabled and through a waitlist for API access. The feature is also rolling out to GitHub Copilot, Cursor, and other platforms.
How Fast Mode Works
Fast Mode is not a different model. It uses the same Opus 4.6 with a different API configuration that prioritizes speed over cost efficiency. You get identical quality and capabilities, just faster responses.
The speed improvement focuses on output tokens per second, not time to first token. The same model weights and behavior remain unchanged.
Accessing Fast Mode
In Claude Code, toggle Fast Mode on or off by typing /fast in the CLI or VS Code extension. You can also enable it in your user settings file by setting "fastMode": true. Fast Mode persists across sessions.
When enabled, Claude Code automatically switches to Opus 4.6 if you're on a different model. A small ↯ icon appears next to the prompt while Fast Mode is active.
For API users, set speed: "fast" in your API request to enable Fast Mode. The feature is currently in limited research preview with waitlist access.
Pricing and Availability
Fast Mode pricing starts at $30 per million input tokens and $150 per million output tokens. This is 6x the standard Opus pricing of $5 per million input and $25 per million output.
A 50% discount is available for all plans until February 16, 2026, bringing the cost to 3x standard pricing during the discount period.
Fast Mode usage is billed directly to extra usage, even if you have remaining usage on your plan. Fast Mode tokens do not count against your plan's included usage.
Requirements and Limitations
Fast Mode requires extra usage enabled on your account. For individual accounts, enable this in Console billing settings. For Teams and Enterprise, an admin must enable both extra usage and Fast Mode for the organization.
Fast Mode is not available on third-party cloud providers including Amazon Bedrock, Google Vertex AI, or Microsoft Azure Foundry. It's only available through the Anthropic Console API and Claude subscription plans using extra usage.
Fast Mode has separate rate limits from standard Opus 4.6. When you hit the rate limit or run out of extra usage credits, Fast Mode automatically falls back to standard Opus 4.6.
When to Use Fast Mode
Fast Mode works best for interactive workflows where speed matters more than cost. Use it for rapid iteration, live debugging, or real-time agent interactions.
Toggle it off when cost efficiency is more important than latency. You can combine Fast Mode with lower effort levels for maximum speed on straightforward tasks.
For API users, note that switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
No comments:
Post a Comment