Model Router — AI API Gateway

$ curl https://api.lxg2it.com/v1/chat/completions \
    -H "Authorization: Bearer $KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "standard",
      "prefer": "cheap",
      "messages": [{"role": "user", "content": "Hello"}]
    }'

Provider APIs are chaos — models launch and deprecate, quotas exhaust, latency spikes, prices shift. You don't need a model catalog. You need requests that just complete, reliably, at the right capability level. We handle the routing, failover, and cost optimisation. Your code stays the same through all of it.

model

The capability tier — sets the floor for what models are eligible. auto analyses your conversation context (system prompt, code blocks, message history) to pick the right tier automatically.

economy·standard·premium·auto

prefer

The optimisation direction within that tier.

cheap·fast·balanced·quality·coding

You can also pass a specific model name (gpt-4.1, claude-sonnet-4-6) to pin routing and bypass tier selection entirely.

Current tiers

economy gemini-2.5-flash · gemini-3.1-flash-lite · gpt-4.1-mini · gpt-5.4-nano · gpt-5.4-mini · o4-mini · claude-haiku-4-5 · grok-3-mini-beta · nvidia.nemotron-nano-3-30b · nvidia.nemotron-nano-9b-v2 · zai.glm-4.7-flash · qwen.qwen3-32b-v1:0 · openai.gpt-oss-120b-1:0

standard gemini-2.5-pro · gemini-3.5-flash · gpt-4.1 · gpt-5.3-chat-latest · gpt-5.3-codex · gpt-5.1-codex-mini · o3 · claude-sonnet-4-6 · claude-sonnet-5 · grok-3-beta · deepseek.v3.2 · mistral.mistral-large-3-675b-instruct · moonshotai.kimi-k2.5 · moonshotai.kimi-k2-thinking · minimax.minimax-m2.1 · qwen.qwen3-next-80b-a3b · us.meta.llama4-maverick-17b-instruct-v1:0 · us.meta.llama4-scout-17b-instruct-v1:0 · mistral.devstral-2-123b · qwen.qwen3-coder-480b-a35b-v1:0 · nvidia.nemotron-super-3-120b · zai.glm-4.7 · qwen.qwen3-235b-a22b-2507-v1:0

premium gemini-3.1-pro-preview · claude-opus-4-7 · claude-opus-4-6 · claude-opus-4-8 · claude-fable-5 · gpt-5.5 · gpt-5.4 · zai.glm-5

Context-window guard: never routes to a model that can't handle your input. Circuit breakers reroute around provider outages automatically.

See the full tier × prefer routing grid →

Automatic failover

Circuit breakers detect provider outages and reroute requests in real time. Context-window guards ensure requests never go to a model that can't handle them. Your code doesn't change — routing adapts automatically.

Auto-routing

Set model: "auto" and the router analyses your full conversation context — system prompt, code blocks, message history, tool use, reasoning markers — and picks the right tier automatically. No heuristics on individual messages; it reads the whole picture. Every auto-routed response includes X-Model-Router-Auto-Tier and X-Model-Router-Auto-Score headers so you can see exactly why a tier was chosen. How it works →

You stay in control

Block providers you don't want to fund. Set daily spend limits. Enable auto-recharge so you never hit a wall mid-project. Export request traces to any OTLP backend — Axiom, Grafana, Honeycomb, Datadog.

$1 trial credit

Every new account starts with $1 in credits — enough for millions of tokens on economy models like GPT-4.1 Mini and Gemini 2.5 Flash. Add a payment method when you need more.

Transparent pricing

A 4% fee (minimum $0.80) on credit deposits. Requests are billed at actual provider market rates — you pay what the model costs, nothing more. Every response includes X-Model-Router-Model and X-Model-Router-Provider headers so you always know exactly what ran and what it cost.

Embeddings included

Same key, same endpoint pattern. embed-small, embed-large, and embed-titan aliases route to the best available embedding model. Batch inputs, optional dimension truncation, billed at input tokens only.

Get started

Create an account — no password, just an email code.

Point any OpenAI-compatible client at https://api.lxg2it.com

That's it. Your $1 trial credit starts immediately. Add more credits when needed.

Full documentation →