What shipped, when it shipped.
2026
Replaced "free tier" with "free models" throughout

The product has no tiers — the correct framing is that some models are free to use (Groq and Cerebras-hosted open-source models) and others are paid. Updated all user-facing copy, docs, and landing page content to reflect this.

Added /v1/completions and /v1/responses endpoints

Two new endpoints alongside the existing /v1/chat/completions: /v1/completions for legacy text completion workflows, and /v1/responses for the OpenAI Responses API format. Both support auto-routing and the same model parameter values.

Note: model pinning is required for these endpoints — auto-routing with tier aliases (economy, standard, premium) is not available.

Auto-routing — intelligent tier selection from conversation context

Pass model: "auto" and the router analyses your messages to pick the appropriate tier automatically. Simple queries route to economy, complex reasoning to standard or premium. Overridable with explicit prefer hints (fast, coding, reasoning).

Full documentation at /docs/api#auto-routing.

Try page — interactive playground in the browser

New /try page: send requests to the router directly from your browser, authenticated with your session. Usage is billed to your credit balance at standard rates. Useful for testing routing decisions without writing code.

Free models via Groq and Cerebras

Added Groq and Cerebras as providers. Models on these platforms (including Llama 3.3 70B and Llama 3.1 8B) are available at no cost — the router covers provider costs. Paid credits are only required for models on Anthropic, OpenAI, Google, and AWS Bedrock.

The economy tier now routes exclusively to free models by default.

Removed tier from API key creation

API keys no longer have a tier. The tier is specified per-request via the model parameter. This simplifies key management — one key works across all tiers and all endpoints.

Activation improvements — key reveal, curl example, first-call nudge

Several small improvements to reduce the activation gap:

  • New API keys are revealed in full on creation (with a copy button) before being masked
  • A curl example using the new key is shown inline
  • A banner nudges users who have a key but haven't made their first call
  • Welcome email content updated — arrives 1 hour after signup with clearer next steps
Claude models updated to 1M token context window

claude-sonnet-4-6 and claude-opus-4-6 context windows updated to 1,048,576 tokens (1M) following Anthropic GA announcement on March 13 2026.

Added Vertex AI and Nemotron 3 Nano models

Vertex AI provider adapter added (Google's managed API endpoint, separate from the direct Gemini API). Nemotron 3 Nano 8B and 51B models added via AWS Bedrock.

OpenTelemetry observability — per-user OTLP export

Users can configure an OTLP endpoint in their profile to receive request telemetry (latency, token counts, routing decisions, provider used) as OpenTelemetry spans. Every response includes an X-Request-Id header for correlation.

Embeddings endpoint — /v1/embeddings

Added /v1/embeddings with embed-small and embed-large tier aliases. Backed by Amazon Titan Embed Text v2. Supports batch input and returns normalized vectors.

Coding preference — prefer:coding routing hint

Added prefer: "coding" as a routing hint. Routes to models with strong code benchmark scores within the selected tier.

Rate limiting — per-key token bucket

Token bucket rate limiter applied per API key. Limits are tier-based and visible in response headers (X-RateLimit-*). Concurrent overdraft prevention via atomic credit reservation.

Passwordless auth — email code login

Authentication switched to passwordless email codes. Enter your email, receive a 6-digit code, done. No passwords to manage or forget.

User accounts and API key management

Account system launched: sign up, manage API keys, view usage history, and add payment methods — all from /profile.

Stripe billing — credit top-ups

Credit-based billing via Stripe. Add credits to your account and they are deducted at cost as you make requests. A 4% platform fee applies to each top-up.

Public launch — model routing API

Initial release. One endpoint (/v1/chat/completions), OpenAI-compatible, routes across Anthropic, OpenAI, and Google Gemini. Circuit-breaker failover, streaming support, cost-based tier selection.