What shipped, when it shipped.
2026
Paid tier — 600 RPM and $300/day for users with Stripe credits

Users who have purchased credits via Stripe now get elevated limits: 600 RPM (vs 10–60) and a $300/day spend cap (vs $30). The upgrade is automatic — no configuration needed. Personal spend caps set on the profile page still override tier defaults.
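As a rough illustration of how these limits combine, the sketch below resolves the effective limits for one user. The function and field names are hypothetical, and the base RPM is passed in because the entry gives only a 10–60 range; treating a personal cap as a straight replacement for the tier default is an assumption based on the word "override".

```python
from dataclasses import dataclass
from typing import Optional

PAID_RPM = 600                # from the entry above
PAID_DAILY_CAP_USD = 300.0    # from the entry above
DEFAULT_DAILY_CAP_USD = 30.0  # from the entry above


@dataclass
class Limits:
    rpm: int
    daily_cap_usd: float


def effective_limits(has_stripe_credits: bool,
                     base_rpm: int,
                     personal_cap_usd: Optional[float] = None) -> Limits:
    """Pick tier defaults first, then apply the personal cap if one is set."""
    if has_stripe_credits:
        limits = Limits(PAID_RPM, PAID_DAILY_CAP_USD)
    else:
        limits = Limits(base_rpm, DEFAULT_DAILY_CAP_USD)
    # Assumption: a personal cap replaces the tier default outright.
    if personal_cap_usd is not None:
        limits.daily_cap_usd = personal_cap_usd
    return limits
```

Under these assumptions, a paid user with a personal cap of $50 keeps the 600 RPM limit but is capped at $50/day.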

Also added IP-level rate limiting to the verify-code endpoint to prevent account creation floods, sharing the same limiter bucket as request-code.

Removed free provider routing — economy tier is now fully paid

Groq and Cerebras free-tier models have been removed from the economy tier. Free provider rate limits (39% failure rate on Cerebras) were causing cascading 502 errors as user volume grew beyond what free quotas could sustain.

Economy tier now routes to cheap paid models: GPT-4.1 Mini, Gemini 2.5 Flash, Claude Haiku 4.5, Grok 3 Mini, and Bedrock models. New accounts still get $1 in trial credits — enough for millions of tokens.

Zero-balance users receive a 402 Payment Required response with a link to /billing.
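Client code may want to treat 402 as a distinct, user-actionable state rather than a generic failure. A minimal sketch, assuming the caller already has the HTTP status code; the function name and the base URL are placeholders, and only the 402 status and the /billing path come from this entry.

```python
def next_step(status: int, base_url: str = "https://router.example") -> str:
    """Map an HTTP status from the router to a message for the caller."""
    if status == 402:
        # Zero balance: point the user at the billing page.
        return f"Out of credits. Top up at {base_url}/billing"
    if 200 <= status < 300:
        return "OK"
    return f"Request failed with status {status}"
```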

Replaced "free tier" with "free models" throughout

The product has no tiers — the correct framing is that some models are free to use (Groq and Cerebras-hosted open-source models) and others are paid. Updated all user-facing copy, docs, and landing page content to reflect this.

Added /v1/completions and /v1/responses endpoints

Two new endpoints join the existing /v1/chat/completions: /v1/completions for legacy text completion workflows, and /v1/responses for the OpenAI Responses API format. Both accept the same pinned model identifiers as /v1/chat/completions.

Note: model pinning is required for these endpoints — auto-routing with tier aliases (economy, standard, premium) is not available.
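The sketch below builds request bodies for the two endpoints and rejects anything that is not a pinned model. The prompt and input field names follow the public OpenAI formats for these endpoints; the helper name is hypothetical, and rejecting "auto" alongside the tier aliases is an assumption drawn from the pinning requirement in the note above.

```python
TIER_ALIASES = {"economy", "standard", "premium"}


def build_body(endpoint: str, model: str, text: str) -> dict:
    """Build a JSON body for /v1/completions or /v1/responses."""
    # Assumption: "auto" is rejected along with the tier aliases.
    if model == "auto" or model in TIER_ALIASES:
        raise ValueError(f"{endpoint} needs a pinned model, got {model!r}")
    if endpoint == "/v1/completions":
        return {"model": model, "prompt": text}
    if endpoint == "/v1/responses":
        return {"model": model, "input": text}
    raise ValueError(f"unsupported endpoint: {endpoint}")
```

For example, build_body("/v1/responses", "claude-sonnet-4-6", "Hello") produces a body with model and input fields, while passing "economy" raises a ValueError before any request is sent.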

Auto-routing — intelligent tier selection from conversation context

Pass model: "auto" and the router analyses your messages to pick the appropriate tier automatically. Simple queries route to economy, complex reasoning to standard or premium. The choice can be overridden with an explicit prefer hint (fast, coding, or reasoning).

Full documentation at /docs/api#auto-routing.
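A minimal sketch of an auto-routed chat request body. Placing prefer as a top-level field is an assumption made for illustration, and the helper name is hypothetical; the documentation linked above is the authority on the real request shape.

```python
from typing import Dict, List, Optional

PREFER_HINTS = {"fast", "coding", "reasoning"}


def auto_request(messages: List[Dict[str, str]],
                 prefer: Optional[str] = None) -> dict:
    """Build a /v1/chat/completions body that lets the router pick the tier."""
    body = {"model": "auto", "messages": messages}
    if prefer is not None:
        if prefer not in PREFER_HINTS:
            raise ValueError(f"unknown prefer hint: {prefer!r}")
        # Assumption: the hint travels as a top-level "prefer" field.
        body["prefer"] = prefer
    return body
```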

Try page — interactive playground in the browser

New /try page: send requests to the router directly from your browser, authenticated with your session. Usage is billed to your credit balance at standard rates. Useful for testing routing decisions without writing code.

Free models via Groq and Cerebras

Added Groq and Cerebras as providers. Models on these platforms (including Llama 3.3 70B and Llama 3.1 8B) are available at no cost — the router covers provider costs. Paid credits are only required for models on Anthropic, OpenAI, Google, and AWS Bedrock.

The economy tier now routes exclusively to free models by default.

Removed tier from API key creation

API keys no longer have a tier. The tier is specified per-request via the model parameter. This simplifies key management — one key works across all tiers and all endpoints.

Activation improvements — key reveal, curl example, first-call nudge

Several small improvements to reduce the activation gap:

  • New API keys are revealed in full on creation (with a copy button) before being masked
  • A curl example using the new key is shown inline
  • A banner nudges users who have a key but haven't made their first call
  • Welcome email content updated; it arrives 1 hour after signup with clearer next steps

Claude models updated to 1M token context window

claude-sonnet-4-6 and claude-opus-4-6 context windows updated to 1,048,576 tokens (1M) following Anthropic's GA announcement on March 13, 2026.

Added Vertex AI and Nemotron 3 Nano models

Vertex AI provider adapter added (Google's managed API endpoint, separate from the direct Gemini API). Nemotron 3 Nano 8B and 51B models added via AWS Bedrock.

OpenTelemetry observability — per-user OTLP export

Users can configure an OTLP endpoint in their profile to receive request telemetry (latency, token counts, routing decisions, provider used) as OpenTelemetry spans. Every response includes an X-Request-Id header for correlation.

Embeddings endpoint — /v1/embeddings

Added /v1/embeddings with embed-small and embed-large tier aliases. Backed by Amazon Titan Embed Text v2. Supports batch input and returns normalized vectors.
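One practical consequence of normalized vectors is that cosine similarity reduces to a plain dot product. The sketch below illustrates this with two hand-written unit vectors standing in for real embeddings; the helper names are illustrative and not part of the API.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Stand-ins for two normalized embeddings (each has unit length).
u = [0.6, 0.8]
v = [1.0, 0.0]

# For unit vectors the two measures agree up to floating-point error.
same = math.isclose(dot(u, v), cosine(u, v))
```

When ranking many documents against one query, this means the norm computation can be skipped entirely.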

Coding preference — prefer:coding routing hint

Added prefer: "coding" as a routing hint. Routes to models with strong code benchmark scores within the selected tier.

Rate limiting — per-key token bucket

Token bucket rate limiter applied per API key. Limits are tier-based and visible in response headers (X-RateLimit-*). Concurrent overdraft prevention via atomic credit reservation.
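For readers unfamiliar with the technique, here is a generic token bucket keyed by API key. It is a textbook sketch rather than the router's implementation: the class and function names are hypothetical, and sizing the bucket from an RPM value is an assumption.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously over time."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity        # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if possible."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def allow_request(api_key: str, rpm: int) -> bool:
    """Look up (or create) the bucket for this key and try to take a token."""
    bucket = buckets.get(api_key)
    if bucket is None:
        # Assumption: burst capacity equals one minute's worth of requests.
        bucket = buckets[api_key] = TokenBucket(rpm, rpm / 60.0)
    return bucket.allow()
```

The injectable clock is there so the limiter can be tested deterministically. A production version would also need locking or an atomic store if several workers share a bucket.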

Passwordless auth — email code login

Authentication switched to passwordless email codes. Enter your email, receive a 6-digit code, done. No passwords to manage or forget.
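A minimal sketch of how a 6-digit login code could be generated and checked, using only the Python standard library. It says nothing about how the service actually issues, stores, or expires codes; both function names are hypothetical.

```python
import hmac
import secrets


def new_login_code() -> str:
    """Return a zero-padded 6-digit code from a cryptographic RNG."""
    return f"{secrets.randbelow(10**6):06d}"


def codes_match(expected: str, submitted: str) -> bool:
    """Compare codes in constant time to avoid leaking timing information."""
    return hmac.compare_digest(expected.encode(), submitted.encode())
```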

User accounts and API key management

Account system launched: sign up, manage API keys, view usage history, and add payment methods — all from /profile.

Stripe billing — credit top-ups

Credit-based billing via Stripe. Add credits to your account and they are deducted at cost as you make requests. A 4% platform fee (minimum $0.80) applies to each top-up.
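The fee rule works out to max(4% of the top-up, $0.80), so the minimum applies to any top-up below $20. A small sketch of that arithmetic; the function name is hypothetical, and rounding to the nearest cent is an assumption because the entry does not specify a rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

FEE_RATE = Decimal("0.04")   # 4% platform fee
MIN_FEE = Decimal("0.80")    # minimum fee per top-up


def platform_fee(top_up_usd: Decimal) -> Decimal:
    """Fee for one top-up: 4% of the amount, but never less than $0.80."""
    fee = max(top_up_usd * FEE_RATE, MIN_FEE)
    # Assumption: fees are rounded half-up to whole cents.
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a $10 top-up incurs the $0.80 minimum (4% would be only $0.40), while a $50 top-up incurs $2.00.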

Public launch — model routing API

Initial release. One endpoint (/v1/chat/completions), OpenAI-compatible, routes across Anthropic, OpenAI, and Google Gemini. Circuit-breaker failover, streaming support, cost-based tier selection.
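To illustrate what circuit-breaker failover means in practice, the sketch below skips a provider after repeated failures and falls through to the next one. It is a generic pattern, not the router's code: the thresholds, class name, and send callback are all assumptions.

```python
import time
from typing import Callable, Dict, Optional, Sequence


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then cools down."""

    def __init__(self, max_failures: int = 3, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_sec = cooldown_sec
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        """Closed breakers are available; open ones allow a trial after cooldown."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_sec

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_failover(providers: Sequence[str],
                       breakers: Dict[str, CircuitBreaker],
                       send: Callable[[str], str]) -> str:
    """Try providers in order, skipping any whose breaker is open."""
    for name in providers:
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            result = send(name)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return result
    raise RuntimeError("all providers unavailable")
```

The point of the breaker is that a failing provider stops receiving traffic for the cooldown period, so later requests do not pay the latency of a doomed first attempt.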