Fireworks AI
Third-party hosts for open-source LLMs (Together, Fireworks, Groq, Replicate).
Fast, low-cost inference for open and proprietary models with native function calling.
Sub-100ms LPU-based inference for Llama, Mixtral, and other open models.
Inference and fine-tuning across 200+ open-source LLMs with serverless and dedicated endpoints.
Compute substrate for AI agents: lightweight enough to live on your laptop, elastic enough to scale into the cloud and unleash unlimited resources.
SGLang is a high-performance serving framework for large language models and multimodal models.
A high-throughput, memory-efficient inference and serving engine for LLMs.