Skip to main content
vybing.dev
Try:
Field notes

May 14, 2026 · Anthropic API

Anthropic API: Direct Access to Claude Models

The Anthropic API provides direct pay-per-token access to the Claude model family. It is the substrate that [Claude Code](claude-code-launch-overview) is built on and the endpoint teams call when buil

514 wordslong-form
Anthropic API

Overview

The Anthropic API provides direct pay-per-token access to the Claude model family. It is the substrate that Claude Code is built on and the endpoint teams call when building agents, pipelines, or integrations with Claude.

Models and pricing (May 2026)

Model Input $/M Output $/M Context
Claude Haiku 4.5 $1.00 $5.00 200K
Claude Sonnet 4.6 $3.00 $15.00 1M
Claude Opus 4.6 $5.00 $25.00 1M
Claude Opus 4.7 $5.00 $25.00 1M

Opus 4.7 (released April 16 2026) carries the same price as Opus 4.6. Both Sonnet 4.6 and Opus 4.6/4.7 support extended thinking. Haiku 4.5 is the fast, cheap option for classification, routing, and light extraction tasks.

Cost reduction levers

Prompt caching: Frequently reused context blocks (system prompts, large documents) can be cached. Cache write costs roughly 10% of the normal input rate; cache hits are free after the first write. On sessions that repeat the same system prompt, this can reduce input costs by 50% to 70%.

Batch API: Asynchronous inference with 50% off standard rates. Latency is up to 24 hours. Suitable for offline evaluation runs, bulk classification, and dataset annotation.

Extended thinking: Available on Sonnet 4.6 and Opus 4.6/4.7. Thinking tokens are billed at the same input rate but only when extended thinking is explicitly enabled.

1 million token context

Sonnet 4.6 and Opus 4.6/4.7 support a 1 million token context window at standard pricing. This enables full-codebase ingestion, long document analysis, and multi-session context without context stuffing tricks.

Integration surface

The Anthropic API is OpenAI-compatible (via the /v1/messages endpoint pattern plus an adapter). Most orchestration frameworks (LangChain, Mastra) have native Anthropic SDK support. Observability platforms (Helicone, Langfuse) work via proxy or native integration. Teams that need multi-provider routing add OpenRouter or Portkey in front.

Where it fits

Use the Anthropic API directly when: (1) you are building a product on Claude and want direct cost control, (2) you need features only available via the API (batch, prompt caching, extended thinking), or (3) you are building an agent framework that needs programmatic model control.

For teams that want managed routing across Claude and other models, OpenRouter or Portkey abstract the API layer. For teams that want a managed Claude experience without any API setup, Claude Code is the turnkey option.

Field notes

  • Anthropic published pricing for Opus 4.7 on April 16 2026: same rates as Opus 4.6 ($5/$25 per million tokens). The simultaneous release of 4.6 and 4.7 at the same price point caused short-term confusion about which to default to; 4.7 is the current recommended Opus variant. [changelog, 2026-04-16]
  • Prompt caching write cost confirmed at approximately 10% of base input rate in production usage across multiple teams. The break-even point is roughly 10 cache hits per write. Long-running agentic sessions with a stable system prompt typically exceed this within the first hour. [community-thread, 2026-04-20]

See also

Claude Code, OpenAI API, OpenRouter, Portkey, Helicone, Langfuse

Field notes synthesized from build evidence ; postmortems, dev-team blogs, and vendor retros. Methodology is public. Corrections to hello@vybing.dev.