The product has no tiers — the correct framing is that some models are free to use (Groq and Cerebras-hosted open-source models) and others are paid. Updated all user-facing copy, docs, and landing page content to reflect this.
Two new endpoints alongside the existing /v1/chat/completions:
/v1/completions for legacy text completion workflows, and
/v1/responses for the OpenAI Responses API format.
Both accept the same model parameter values as /v1/chat/completions.
Note: these endpoints require model pinning; auto-routing with tier aliases
(economy, standard, premium) is not available on them.
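As a sketch, here is what pinned-model request bodies for the two new endpoints might look like. The payload shapes follow the OpenAI-compatible conventions used elsewhere in the API, and the model id is illustrative, not a guaranteed name:

```python
import json

# Hypothetical request bodies for the two new endpoints. The model id is
# illustrative; any pinned (non-alias) model id is accepted, tier aliases are not.
completions_req = {
    "model": "llama-3.1-8b",   # pinned model, not "economy"/"standard"/"premium"
    "prompt": "Translate to French: hello",
    "max_tokens": 64,
}

responses_req = {
    "model": "llama-3.1-8b",   # the same pinning rule applies to /v1/responses
    "input": "Translate to French: hello",
}

print(json.dumps(completions_req, indent=2))
```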
Pass model: "auto" and the router analyses your messages to pick the
appropriate tier automatically. Simple queries route to economy, complex reasoning to
standard or premium. Overridable with explicit prefer hints
(fast, coding, reasoning).
Full documentation at /docs/api#auto-routing.
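A minimal auto-routed request could look like the following. This assumes prefer is a top-level body field alongside model; check the linked docs for the authoritative shape:

```python
import json

# Sketch of an auto-routed chat request against /v1/chat/completions.
# Assumption: "prefer" sits at the top level of the request body.
request = {
    "model": "auto",            # let the router choose the tier
    "prefer": "reasoning",      # optional hint: fast | coding | reasoning
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}
print(json.dumps(request, indent=2))
```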
New /try page: send requests to the router directly from your browser, authenticated with your session. Usage is billed to your credit balance at standard rates. Useful for testing routing decisions without writing code.
Added Groq and Cerebras as providers. Models on these platforms (including Llama 3.3 70B and Llama 3.1 8B) are available at no cost — the router covers provider costs. Paid credits are only required for models on Anthropic, OpenAI, Google, and AWS Bedrock.
By default, the economy tier now routes only to free models.
API keys no longer have a tier. The tier is specified per-request via the
model parameter. This simplifies key management — one key works across all
tiers and all endpoints.
Several small improvements to reduce the activation gap, i.e. the drop-off between signing up and making a first successful request.
claude-sonnet-4-6 and claude-opus-4-6 context windows
updated to 1,048,576 tokens (1M) following Anthropic's GA announcement on March 13, 2026.
Vertex AI provider adapter added (Google's managed API endpoint, separate from the direct Gemini API). Nemotron 3 Nano 8B and 51B models added via AWS Bedrock.
Users can configure an OTLP endpoint in their profile to receive request telemetry
(latency, token counts, routing decisions, provider used) as OpenTelemetry spans.
Every response includes an X-Request-Id header for correlation.
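One way to use this: copy the X-Request-Id from each response into your own logs or spans so they line up with the exported telemetry. A trivial sketch, with headers standing in for a real HTTP response's headers:

```python
# Correlate a response with exported telemetry via X-Request-Id.
# `headers` is a stand-in for the headers of any router response.
headers = {"X-Request-Id": "req_abc123", "Content-Type": "application/json"}

request_id = headers.get("X-Request-Id")
print(f"router request id: {request_id}")  # attach to your own logs/spans
```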
Added /v1/embeddings with embed-small and
embed-large tier aliases. Backed by Amazon Titan Embed Text v2.
Supports batch input and returns normalized vectors.
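Because the returned vectors are unit-normalized, cosine similarity reduces to a plain dot product, with no norm division needed. A sketch using stand-in two-dimensional vectors in place of real API output:

```python
import math

# With unit-length embeddings, cosine similarity is just the dot product.
# The vectors below are stand-ins for real (much higher-dimensional) output.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [0.6, 0.8]   # pretend embedding, already unit-normalized
b = [0.8, 0.6]

assert math.isclose(math.hypot(*a), 1.0)  # unit norm, as the API guarantees
print(dot(a, b))  # cosine similarity
```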
Added prefer: "coding" as a routing hint. Routes to models with
strong code benchmark scores within the selected tier.
Token bucket rate limiter applied per API key. Limits are tier-based and
visible in response headers (X-RateLimit-*).
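Clients can read these headers to back off before hitting the limit. The exact header suffixes below (Limit, Remaining, Reset) are an assumption based on the common X-RateLimit-* convention:

```python
# Sketch: inspect rate-limit headers returned with every response.
# Suffix names (Limit/Remaining/Reset) are assumed, per common convention.
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "97",
    "X-RateLimit-Reset": "1760000000",
}

remaining = int(headers["X-RateLimit-Remaining"])
if remaining == 0:
    # back off until the reset timestamp before retrying
    reset_at = int(headers["X-RateLimit-Reset"])
print(remaining)
```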
Concurrent overdraft prevention via atomic credit reservation.
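The idea behind atomic reservation can be illustrated with a lock-based sketch: the balance check and the deduction happen in one critical section, so two concurrent requests cannot both pass the check and overdraw the account. This is a toy model, not the production implementation:

```python
import threading

# Toy model of overdraft prevention via atomic credit reservation:
# check-and-deduct is a single critical section under one lock.
class CreditAccount:
    def __init__(self, balance: int):
        self._balance = balance
        self._lock = threading.Lock()

    def reserve(self, amount: int) -> bool:
        with self._lock:
            if self._balance < amount:
                return False       # reject instead of overdrawing
            self._balance -= amount
            return True

acct = CreditAccount(balance=100)
print(acct.reserve(80), acct.reserve(80))  # -> True False
```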
Authentication switched to passwordless email codes. Enter your email, receive a 6-digit code, done. No passwords to manage or forget.
Account system launched: sign up, manage API keys, view usage history, and add payment methods — all from /profile.
Credit-based billing via Stripe. Add credits to your account and they are deducted at cost as you make requests. A 4% platform fee applies to each top-up.
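The fee arithmetic, under the assumption that the 4% is added on top of the top-up (the entry does not say whether it is added on top or deducted from the credited amount):

```python
# Illustrative fee arithmetic. Assumption: the 4% platform fee is charged
# on top of the top-up, so the full top-up amount lands as usable credit.
FEE_RATE = 0.04

def charge_for_topup(credits_usd: float) -> float:
    """Amount charged to the card for `credits_usd` of usable credit."""
    return round(credits_usd * (1 + FEE_RATE), 2)

print(charge_for_topup(100.00))  # -> 104.0
```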
Initial release. One endpoint (/v1/chat/completions),
OpenAI-compatible, routes across Anthropic, OpenAI, and Google Gemini.
Circuit-breaker failover, streaming support, cost-based tier selection.