May 14, 2026 · Anthropic API
Anthropic API: Direct Access to Claude Models
The Anthropic API provides direct pay-per-token access to the Claude model family. It is the substrate that [Claude Code](claude-code-launch-overview) is built on and the endpoint teams call when buil
Overview
The Anthropic API provides direct pay-per-token access to the Claude model family. It is the substrate that Claude Code is built on and the endpoint teams call when building agents, pipelines, or integrations with Claude.
Models and pricing (May 2026)
| Model | Input $/M | Output $/M | Context |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M |
Opus 4.7 (released April 16 2026) carries the same price as Opus 4.6. Both Sonnet 4.6 and Opus 4.6/4.7 support extended thinking. Haiku 4.5 is the fast, cheap option for classification, routing, and light extraction tasks.
Cost reduction levers
Prompt caching: Frequently reused context blocks (system prompts, large documents) can be cached. Cache write costs roughly 10% of the normal input rate; cache hits are free after the first write. On sessions that repeat the same system prompt, this can reduce input costs by 50% to 70%.
Batch API: Asynchronous inference with 50% off standard rates. Latency is up to 24 hours. Suitable for offline evaluation runs, bulk classification, and dataset annotation.
Extended thinking: Available on Sonnet 4.6 and Opus 4.6/4.7. Thinking tokens are billed at the same input rate but only when extended thinking is explicitly enabled.
1 million token context
Sonnet 4.6 and Opus 4.6/4.7 support a 1 million token context window at standard pricing. This enables full-codebase ingestion, long document analysis, and multi-session context without context stuffing tricks.
Integration surface
The Anthropic API is OpenAI-compatible (via the /v1/messages endpoint pattern plus an adapter). Most orchestration frameworks (LangChain, Mastra) have native Anthropic SDK support. Observability platforms (Helicone, Langfuse) work via proxy or native integration. Teams that need multi-provider routing add OpenRouter or Portkey in front.
Where it fits
Use the Anthropic API directly when: (1) you are building a product on Claude and want direct cost control, (2) you need features only available via the API (batch, prompt caching, extended thinking), or (3) you are building an agent framework that needs programmatic model control.
For teams that want managed routing across Claude and other models, OpenRouter or Portkey abstract the API layer. For teams that want a managed Claude experience without any API setup, Claude Code is the turnkey option.
Field notes
- Anthropic published pricing for Opus 4.7 on April 16 2026: same rates as Opus 4.6 ($5/$25 per million tokens). The simultaneous release of 4.6 and 4.7 at the same price point caused short-term confusion about which to default to; 4.7 is the current recommended Opus variant. [changelog, 2026-04-16]
- Prompt caching write cost confirmed at approximately 10% of base input rate in production usage across multiple teams. The break-even point is roughly 10 cache hits per write. Long-running agentic sessions with a stable system prompt typically exceed this within the first hour. [community-thread, 2026-04-20]
See also
Claude Code, OpenAI API, OpenRouter, Portkey, Helicone, Langfuse
Field notes synthesized from build evidence ; postmortems, dev-team blogs, and vendor retros. Methodology is public. Corrections to hello@vybing.dev.