A high-performance async router that load-balances, routes intelligently, and fails over across every LLM provider — OpenAI, Anthropic, Ollama, LlamaCpp and beyond. One endpoint. Many backends. Zero drama.
Provider abstraction, intelligent distribution, automatic recovery, and live metrics — wired together in a single async binary.
Round-robin and configurable routing strategies spread traffic across multiple LLM backends, so latency stays low and no single provider is overwhelmed.
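To make the round-robin idea concrete, here is a minimal sketch using only the standard library. `RoundRobin` and `pick` are illustrative names, not the router's actual types, and real strategies would presumably also skip unhealthy providers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin selector (illustrative, not the router's API).
pub struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    pub fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns the next item in rotation, or None when the list is empty.
    pub fn pick<'a, T>(&self, items: &'a [T]) -> Option<&'a T> {
        if items.is_empty() {
            return None;
        }
        // fetch_add wraps on overflow; the modulo keeps the index in bounds.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % items.len();
        items.get(i)
    }
}
```

Because the counter is atomic, a single selector can be shared across async tasks without a lock.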
Clean trait-based architecture. OpenAI, Anthropic, Ollama, LlamaCpp — and trivially extensible to the next one.
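A trait-based provider layer might look roughly like the sketch below. The `Provider` trait, `StaticProvider`, and `slugs_serving` are assumptions made for illustration; the project's real trait almost certainly carries async request methods as well.

```rust
/// Hypothetical provider abstraction; the project's real trait may differ.
pub trait Provider {
    /// Short identifier used to address this provider.
    fn slug(&self) -> &str;
    /// Whether this provider can serve the given model name.
    fn supports(&self, model: &str) -> bool;
}

/// Illustrative provider backed by a fixed model list.
pub struct StaticProvider {
    slug: String,
    models: Vec<String>,
}

impl StaticProvider {
    pub fn new(slug: &str, models: &[&str]) -> Self {
        Self {
            slug: slug.to_string(),
            models: models.iter().map(|m| m.to_string()).collect(),
        }
    }
}

impl Provider for StaticProvider {
    fn slug(&self) -> &str {
        &self.slug
    }

    fn supports(&self, model: &str) -> bool {
        self.models.iter().any(|m| m == model)
    }
}

/// Slugs of every provider that serves `model`, in registration order.
pub fn slugs_serving<'a>(all: &'a [Box<dyn Provider>], model: &str) -> Vec<&'a str> {
    all.iter()
        .filter(|p| p.supports(model))
        .map(|p| p.slug())
        .collect()
}
```

Adding a new backend under this shape means writing one more `impl Provider`, which is the extensibility the trait approach is meant to buy.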
Full Server-Sent Events support so tokens flow through the router the instant a provider emits them.
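For readers unfamiliar with the wire format, a Server-Sent Events stream is a sequence of text lines, and payloads arrive on lines beginning with `data:`. The helpers below are a sketch of how one such line could be read; the function names are hypothetical, and the `[DONE]` sentinel is the OpenAI streaming convention, assumed here to pass through the router unchanged.

```rust
/// Extracts the payload of an SSE `data:` line, or None for other lines
/// (comments, `event:` fields, blank separators).
pub fn sse_data(line: &str) -> Option<&str> {
    let rest = line.strip_prefix("data:")?;
    // The SSE format allows one optional space after the colon.
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}

/// True when the line is the OpenAI-style end-of-stream marker.
pub fn is_done(line: &str) -> bool {
    sse_data(line) == Some("[DONE]")
}
```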
Exponential backoff with automatic failover to the next healthy provider. Honors a provider's Retry-After header (retry_after) when one is sent.
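The retry delay could be computed along these lines. This is a sketch under an assumed policy: a provider-supplied retry hint takes priority, otherwise the delay doubles per attempt up to a cap. The function name, the base delay, and the cap are all illustrative, not the router's documented settings.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// Assumed policy: a retry-after hint wins; otherwise base * 2^attempt, capped at `max`.
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    max: Duration,
    retry_after: Option<Duration>,
) -> Duration {
    if let Some(hint) = retry_after {
        return hint;
    }
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 100 ms base and a 5 s cap, attempts 0 through 3 wait 100, 200, 400 and 800 ms, and later attempts level off at the cap.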
TTFT, latency, TPS, throughput, token usage and success rates — streamed over WebSocket to the dashboard.
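As a rough illustration of how two of these metrics relate, the sketch below derives tokens per second from time to first token (TTFT) and total latency. The struct and field names are hypothetical, and measuring TPS over the generation phase only (after the first token) is one common definition, assumed here rather than taken from the router.

```rust
use std::time::Duration;

/// Timing captured for one streamed response (illustrative shape).
pub struct StreamTiming {
    /// Request sent until first token received.
    pub ttft: Duration,
    /// Request sent until last token received.
    pub total: Duration,
    /// Number of tokens the provider generated.
    pub output_tokens: u32,
}

impl StreamTiming {
    /// Tokens per second over the generation phase, or None when the
    /// phase has zero (or inconsistent) duration.
    pub fn tokens_per_second(&self) -> Option<f64> {
        let generation = self.total.checked_sub(self.ttft)?;
        let secs = generation.as_secs_f64();
        if secs == 0.0 {
            return None;
        }
        Some(f64::from(self.output_tokens) / secs)
    }
}
```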
Internal user/password accounts, Nostr (NIP-98) login and per-provider API keys, plus a built-in Lightning Network top-up system (Routstr / PPQ) with auto-generated invoices and per-model pricing.
Pin a request to an exact provider, or let the engine choose. The model string decides.
A model name containing a slash (provider slug, then model) routes straight to the named provider, bypassing load balancing entirely.
slug/model → provider slug + model → route_by_slug()
A bare model name is matched against your routing config and load-balanced across every active provider serving it.
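The two routing paths described above could be told apart by a small parser like the one below. It is a sketch, not the router's code: `Route` and `parse_model` are hypothetical names, and it assumes the provider slug is everything before the first slash.

```rust
/// How a request's model string should be routed (illustrative type).
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    /// Send directly to the provider with this slug.
    Pinned { slug: &'a str, model: &'a str },
    /// Match against the routing config and load-balance.
    Balanced { model: &'a str },
}

/// Splits at the first '/'; anything without a usable slug and model
/// on both sides falls back to load balancing.
pub fn parse_model(model: &str) -> Route<'_> {
    match model.split_once('/') {
        Some((slug, rest)) if !slug.is_empty() && !rest.is_empty() => {
            Route::Pinned { slug, model: rest }
        }
        _ => Route::Balanced { model },
    }
}
```

Splitting at the first slash keeps model names that themselves contain slashes intact on the right-hand side.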
Every provider carries a health state. The router degrades gracefully, backs off, and recovers automatically on success.
Normal operation — accepting requests at full rate.
Elevated error rate — still serving, now with backoff.
High failure rate — temporarily pulled from rotation.
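One way to picture the three states is as a classification over a recent error rate. Everything in this sketch is an assumption made for illustration: the state names `Healthy`, `Degraded` and `Unhealthy`, the 10% and 50% thresholds, and the `classify` function do not come from the router itself.

```rust
/// Hypothetical names for the three health states described above.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Health {
    /// Normal operation, full request rate.
    Healthy,
    /// Elevated error rate, still serving with backoff.
    Degraded,
    /// High failure rate, temporarily out of rotation.
    Unhealthy,
}

/// Classifies a provider from recent success and failure counts.
/// Thresholds (10% and 50%) are assumed values, not the router's.
pub fn classify(successes: u64, failures: u64) -> Health {
    let total = successes.saturating_add(failures);
    if total == 0 {
        // No traffic yet: treat the provider as healthy.
        return Health::Healthy;
    }
    let error_rate = failures as f64 / total as f64;
    if error_rate >= 0.5 {
        Health::Unhealthy
    } else if error_rate >= 0.1 {
        Health::Degraded
    } else {
        Health::Healthy
    }
}
```

Because the state is recomputed from counts, a run of successes lowers the error rate and moves the provider back toward the healthy state, which matches the automatic recovery described above.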
React 19 + TypeScript + Vite SPA served right alongside the router. Manage everything without touching a config file.
Provider health grid, TTFT / latency / TPS stat cards, live WebSocket updates and a top-up dialog.
Full CRUD with quick-add templates for OpenAI, Anthropic, Ollama, LlamaCpp & OpenRouter.
Routing configuration CRUD with nested provider-per-config assignment.
Real-time Recharts for P90 TTFT & TPS, model breakdown, and a live event stream.
User & API-key management with per-user model permission overrides.
Balances, model pricing, transaction history, invoices and admin credit/debit.
Run the container, mount your config, and hit the OpenAI-compatible endpoint.
# Pull & run the router
docker run -p 3000:3000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/config.yaml:/app/config.yaml \
  voidic/yalr:latest
# Admin UI → http://localhost:3000
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'
# Build & run the server
cargo run --bin yalr-server
# With verbose logging
cargo run --bin yalr-server -- --verbose
# The CLI companion
cargo run --bin yalr-cli
server:
  host: "0.0.0.0"
  port: 3000
database:
  url: "sqlite:data/llm_router.db?mode=rwc"
auth:
  enabled: true
  allowed_pubkeys:
    - "your-nostr-pubkey"
Drop-in /v1 endpoints for inference, plus a complete admin REST API for providers, metrics, users and payments.
One async binary stands between your app and every LLM provider you run. Clone it, configure it, own it.