Together AI: Open-Source Inference at Scale

Overview

Together AI is a cloud inference provider focused on open-source models: Llama, DeepSeek, Mistral, Qwen, and dozens more. It offers serverless pay-per-token pricing, dedicated GPU deployments, and fine-tuning -- all under one API using the OpenAI-compatible format.

Pricing (May 2026)

Model	Input $/M	Output $/M
Llama 3.3 70B	$0.88	varies
DeepSeek V3	$1.25	$1.25
DeepSeek R1	$3.00	$7.00
Llama 3.1 405B	$3.50	$3.50
Mistral Large	varies	$9.00

New accounts get $5 in free credits (enough for millions of tokens on smaller models). No monthly minimum.

Fine-tuning

Together AI supports fine-tuning on the full Llama, Mistral, and Qwen lineup including 405B, at roughly $8-$12 per million training tokens. This is the differentiator against purely inference-focused competitors like Groq -- Together AI covers the full train-deploy loop.

Where it fits

Together AI is the default choice for teams that need:

Fine-tuning on large open-source models (including 405B)
Access to DeepSeek V3/R1 via a reliable US-based endpoint
A single billing account across many open-source models

For raw throughput on a single model, Groq delivers faster inference via LPUs. For maximum model breadth and routing across both open and closed providers, OpenRouter is the aggregator layer.

Switching from the OpenAI API to Together AI requires only a base URL and API key change -- no SDK changes.

Field notes

Together AI pricing comparison published by Digital Applied in Q2 2026 benchmarked inference providers across 15 popular open-source models. Together AI had the strongest coverage for DeepSeek and Qwen model families among US-based providers. [community-thread, 2026-04-28]

Overview

Pricing (May 2026)

Fine-tuning

Where it fits

Field notes

See also