The product has no tiers — the correct framing is that some models are free to use (Groq and Cerebras-hosted open-source models) and others are paid. Updated all user-facing copy, docs, and landing page content to reflect this.
Two new endpoints alongside the existing /v1/chat/completions:
/v1/completions for legacy text completion workflows, and
/v1/responses for the OpenAI Responses API format.
Both accept the same model parameter values as /v1/chat/completions.
Note: these endpoints require model pinning; auto-routing with tier aliases
(economy, standard, premium) is not available on them.
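As a sketch, here is what pinned-model request bodies for the two new endpoints might look like. The payload shapes follow the OpenAI-compatible conventions used elsewhere in the API, and the model id is illustrative, not a guaranteed name:

```python
import json

# Hypothetical request bodies for the two new endpoints. The model id is
# illustrative; any pinned (non-alias) model id is accepted, tier aliases are not.
completions_req = {
    "model": "llama-3.1-8b",   # pinned model, not "economy"/"standard"/"premium"
    "prompt": "Translate to French: hello",
    "max_tokens": 64,
}

responses_req = {
    "model": "llama-3.1-8b",   # the same pinning rule applies to /v1/responses
    "input": "Translate to French: hello",
}

print(json.dumps(completions_req, indent=2))
```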
Pass model: "auto" and the router analyses your messages to pick the
appropriate tier automatically. Simple queries route to economy, complex reasoning to
standard or premium. Overridable with explicit prefer hints
(fast, coding, reasoning).
Full documentation at /docs/api#auto-routing.
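A minimal auto-routed request could look like the following. This assumes prefer is a top-level body field alongside model; check the linked docs for the authoritative shape:

```python
import json

# Sketch of an auto-routed chat request against /v1/chat/completions.
# Assumption: "prefer" sits at the top level of the request body.
request = {
    "model": "auto",            # let the router choose the tier
    "prefer": "reasoning",      # optional hint: fast | coding | reasoning
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}
print(json.dumps(request, indent=2))
```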
New /try page: send requests to the router directly from your browser, authenticated with your session. Usage is billed to your credit balance at standard rates. Useful for testing routing decisions without writing code.
Added Groq and Cerebras as providers. Models on these platforms (including Llama 3.3 70B and Llama 3.1 8B) are available at no cost — the router covers provider costs. Paid credits are only required for models on Anthropic, OpenAI, Google, and AWS Bedrock.
By default, the economy tier now routes only to free models.
API keys no longer have a tier. The tier is specified per-request via the
model parameter. This simplifies key management — one key works across all
tiers and all endpoints.
Several small improvements to reduce the activation gap, i.e. the drop-off between signing up and making a first successful request.
claude-sonnet-4-6 and claude-opus-4-6 context windows
updated to 1,048,576 tokens (1M) following Anthropic's GA announcement on March 13, 2026.
Vertex AI provider adapter added (Google's managed API endpoint, separate from the direct Gemini API). Nemotron 3 Nano 8B and 51B models added via AWS Bedrock.
Users can configure an OTLP endpoint in their profile to receive request telemetry
(latency, token counts, routing decisions, provider used) as OpenTelemetry spans.
Every response includes an X-Request-Id header for correlation.
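One way to use this: copy the X-Request-Id from each response into your own logs or spans so they line up with the exported telemetry. A trivial sketch, with headers standing in for a real HTTP response's headers:

```python
# Correlate a response with exported telemetry via X-Request-Id.
# `headers` is a stand-in for the headers of any router response.
headers = {"X-Request-Id": "req_abc123", "Content-Type": "application/json"}

request_id = headers.get("X-Request-Id")
print(f"router request id: {request_id}")  # attach to your own logs/spans
```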
Added /v1/embeddings with embed-small and
embed-large tier aliases. Backed by Amazon Titan Embed Text v2.
Supports batch input and returns normalized vectors.
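Because the returned vectors are unit-normalized, cosine similarity reduces to a plain dot product, with no norm division needed. A sketch using stand-in two-dimensional vectors in place of real API output:

```python
import math

# With unit-length embeddings, cosine similarity is just the dot product.
# The vectors below are stand-ins for real (much higher-dimensional) output.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [0.6, 0.8]   # pretend embedding, already unit-normalized
b = [0.8, 0.6]

assert math.isclose(math.hypot(*a), 1.0)  # unit norm, as the API guarantees
print(dot(a, b))  # cosine similarity
```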
Added prefer: "coding" as a routing hint. Routes to models with
strong code benchmark scores within the selected tier.
Token bucket rate limiter applied per API key. Limits are tier-based and
visible in response headers (X-RateLimit-*).
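Clients can read these headers to back off before hitting the limit. The exact header suffixes below (Limit, Remaining, Reset) are an assumption based on the common X-RateLimit-* convention:

```python
# Sketch: inspect rate-limit headers returned with every response.
# Suffix names (Limit/Remaining/Reset) are assumed, per common convention.
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "97",
    "X-RateLimit-Reset": "1760000000",
}

remaining = int(headers["X-RateLimit-Remaining"])
if remaining == 0:
    # back off until the reset timestamp before retrying
    reset_at = int(headers["X-RateLimit-Reset"])
print(remaining)
```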
Concurrent overdraft prevention via atomic credit reservation.
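The idea behind atomic reservation can be illustrated with a lock-based sketch: the balance check and the deduction happen in one critical section, so two concurrent requests cannot both pass the check and overdraw the account. This is a toy model, not the production implementation:

```python
import threading

# Toy model of overdraft prevention via atomic credit reservation:
# check-and-deduct is a single critical section under one lock.
class CreditAccount:
    def __init__(self, balance: int):
        self._balance = balance
        self._lock = threading.Lock()

    def reserve(self, amount: int) -> bool:
        with self._lock:
            if self._balance < amount:
                return False       # reject instead of overdrawing
            self._balance -= amount
            return True

acct = CreditAccount(balance=100)
print(acct.reserve(80), acct.reserve(80))  # -> True False
```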
Authentication switched to passwordless email codes. Enter your email, receive a 6-digit code, done. No passwords to manage or forget.
Account system launched: sign up, manage API keys, view usage history, and add payment methods — all from /profile.
Credit-based billing via Stripe. Add credits to your account and they are deducted at cost as you make requests. A 4% platform fee applies to each top-up.
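The fee arithmetic, under the assumption that the 4% is added on top of the top-up (the entry does not say whether it is added on top or deducted from the credited amount):

```python
# Illustrative fee arithmetic. Assumption: the 4% platform fee is charged
# on top of the top-up, so the full top-up amount lands as usable credit.
FEE_RATE = 0.04

def charge_for_topup(credits_usd: float) -> float:
    """Amount charged to the card for `credits_usd` of usable credit."""
    return round(credits_usd * (1 + FEE_RATE), 2)

print(charge_for_topup(100.00))  # -> 104.0
```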
Initial release. One endpoint (/v1/chat/completions),
OpenAI-compatible, routes across Anthropic, OpenAI, and Google Gemini.
Circuit-breaker failover, streaming support, cost-based tier selection.