Built in Rust · Axum + Tokio

Yet Another
LLM Router.

A high-performance async router that load-balances, routes intelligently, and fails over across every LLM provider — OpenAI, Anthropic, Ollama, LlamaCpp and beyond. One endpoint. Many backends. Zero drama.

OpenAI-compatible /v1 API surface
SSE streaming first
SQLite-backed config
// ROUTING ENGINE round-robin · active
01 / CAPABILITIES

Everything a router
should actually do.

Provider abstraction, intelligent distribution, automatic recovery, and live metrics — wired together in a single async binary.

⚖️ Load Balancing

Distribute across providers

Round-robin and configurable routing strategies spread traffic across multiple LLM backends, keeping latency low and preventing any single provider from being overwhelmed.

🧩

Provider abstraction

Clean trait-based architecture. OpenAI, Anthropic, Ollama, LlamaCpp — and trivially extensible to the next one.
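As a rough illustration of what a trait-based provider layer can look like, here is a minimal sketch. The trait name `Provider`, its methods, and the `ChatRequest` / `ChatResponse` / `EchoProvider` types are all hypothetical; the router's actual trait is async and is not shown on this page.

```rust
/// Hypothetical request type; the router's real types are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatRequest {
    pub model: String,
    pub prompt: String,
}

/// Hypothetical response type.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatResponse {
    pub provider: String,
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Unavailable(String),
}

/// Assumed trait shape: every backend exposes the same small interface,
/// so the routing layer never needs to know which vendor it is talking to.
/// Kept synchronous here for brevity.
pub trait Provider {
    fn slug(&self) -> &str;
    fn supports(&self, model: &str) -> bool;
    fn complete(&self, req: &ChatRequest) -> Result<ChatResponse, ProviderError>;
}

/// Stand-in backend that echoes the prompt instead of calling a real API.
pub struct EchoProvider {
    pub slug: String,
    pub models: Vec<String>,
}

impl Provider for EchoProvider {
    fn slug(&self) -> &str {
        &self.slug
    }

    fn supports(&self, model: &str) -> bool {
        self.models.iter().any(|m| m == model)
    }

    fn complete(&self, req: &ChatRequest) -> Result<ChatResponse, ProviderError> {
        if !self.supports(&req.model) {
            return Err(ProviderError::Unavailable(format!(
                "{} does not serve {}",
                self.slug, req.model
            )));
        }
        Ok(ChatResponse {
            provider: self.slug.clone(),
            text: format!("echo: {}", req.prompt),
        })
    }
}
```

Under this shape, adding a new backend means writing one more `impl Provider`, and the router can hold a `Vec<Box<dyn Provider>>` without caring which vendor sits behind each entry.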

🌊

Streaming, end-to-end

Full Server-Sent Events support so tokens flow through the router the instant a provider emits them.
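For readers consuming the stream, the sketch below shows one simplified way to pull payloads out of an SSE body. It assumes the OpenAI-style convention of one JSON payload per `data:` line and a closing `data: [DONE]` sentinel; the names `SseItem` and `parse_sse` are hypothetical, and multi-line `data:` events from the SSE specification are deliberately not joined.

```rust
/// One item extracted from an SSE body.
#[derive(Debug, PartialEq)]
pub enum SseItem {
    /// Raw payload of a `data:` line (typically a JSON chunk).
    Data(String),
    /// The `[DONE]` sentinel that ends an OpenAI-style stream.
    Done,
}

/// Extract `data:` payloads from SSE text, stopping at `[DONE]`.
/// Simplification: each `data:` line is treated as its own payload.
pub fn parse_sse(body: &str) -> Vec<SseItem> {
    let mut items = Vec::new();
    for line in body.lines() {
        // Comment lines (starting with `:`), blank separators, and other
        // fields such as `event:` or `id:` are skipped.
        let Some(rest) = line.strip_prefix("data:") else {
            continue;
        };
        // The spec allows one optional space after the colon.
        let payload = rest.strip_prefix(' ').unwrap_or(rest);
        if payload == "[DONE]" {
            items.push(SseItem::Done);
            break;
        }
        items.push(SseItem::Data(payload.to_string()));
    }
    items
}
```

A real client would feed this incrementally as bytes arrive rather than parsing a complete body, but the line handling stays the same.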

🔁

Retry & fallback

Exponential backoff with automatic failover to the next healthy provider. Honors Retry-After headers.
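To make the retry timing concrete, here is a small sketch of exponential backoff that also respects a provider's retry hint. The function names, the base and cap values, and the rule "wait at least as long as the hint" are assumptions for illustration, not the router's documented behavior. Only the delta-seconds form of `Retry-After` is parsed; the HTTP-date form is not handled.

```rust
use std::time::Duration;

/// Parse a `Retry-After` header value given in seconds (e.g. "2").
/// The HTTP-date form of the header is not handled in this sketch.
pub fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt,
/// capped at `max`. If the provider sent a retry hint, wait at least
/// that long, even when the hint exceeds `max` (an assumed policy).
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    max: Duration,
    retry_after: Option<Duration>,
) -> Duration {
    // Saturate instead of overflowing on large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let exponential = base.checked_mul(factor).unwrap_or(max).min(max);
    match retry_after {
        Some(hint) => hint.max(exponential),
        None => exponential,
    }
}
```

With a 100 ms base and a 5 s cap, attempts 0 through 3 wait 100, 200, 400 and 800 ms, and later attempts settle at the cap. A production version would usually add random jitter so that many clients do not retry in lockstep.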

📊

Live metrics

Time to first token (TTFT), latency, tokens per second (TPS), throughput, token usage and success rates, streamed over WebSocket to the dashboard.
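The sketch below shows one way these figures could be derived from per-request samples. The `RequestSample` struct and the function names are hypothetical, TPS is assumed to mean completion tokens divided by total request time, and the percentile uses the nearest-rank method; the router may define any of these differently.

```rust
use std::time::Duration;

/// Hypothetical per-request measurement.
pub struct RequestSample {
    /// Time from sending the request to receiving the first token.
    pub ttft: Duration,
    /// Total wall-clock time for the request.
    pub total: Duration,
    pub completion_tokens: u32,
    pub success: bool,
}

impl RequestSample {
    /// Tokens per second, assumed here to be tokens / total seconds.
    pub fn tps(&self) -> f64 {
        let secs = self.total.as_secs_f64();
        if secs > 0.0 {
            f64::from(self.completion_tokens) / secs
        } else {
            0.0
        }
    }
}

/// Fraction of successful requests, or 0.0 when there are no samples.
pub fn success_rate(samples: &[RequestSample]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let ok = samples.iter().filter(|s| s.success).count();
    ok as f64 / samples.len() as f64
}

/// Nearest-rank percentile of TTFT, e.g. `pct = 90` for P90.
/// Returns None for an empty slice or a percentile outside 1..=100.
pub fn percentile_ttft(samples: &[RequestSample], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut values: Vec<Duration> = samples.iter().map(|s| s.ttft).collect();
    values.sort();
    // Integer form of ceil(pct * n / 100); always within 1..=n here.
    let rank = (pct * values.len() + 99) / 100;
    Some(values[rank - 1])
}
```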

🔐 Auth & Payments

Session auth, NIP-98 Nostr, API keys, and Lightning top-ups

Internal user/password accounts, Nostr (NIP-98) login and per-provider API keys, plus a built-in Lightning Network top-up system (Routstr / PPQ) with auto-generated invoices and per-model pricing.

02 / DUAL ROUTING

Two ways to route
the same request.

Pin a request to an exact provider, or let the engine choose. The model string decides.

→ Prefixed · Direct

Target one provider

A slash-prefixed model name routes straight to a named provider, bypassing load balancing entirely.

model: "provider-1/gpt-4"
  1. Split on / → provider slug + model
  2. Route via route_by_slug()
  3. Skip the balancer — go direct
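The split in step 1 can be sketched as follows. `RouteTarget` and `parse_model` are hypothetical names, and splitting on the first slash only is an assumption (it keeps upstream model names that themselves contain a slash intact). `route_by_slug()` is named above, but its signature is not shown, so the sketch stops at the parsing step.

```rust
/// Where a request should go, as decided by the model string alone.
#[derive(Debug, PartialEq)]
pub enum RouteTarget<'a> {
    /// "provider-1/gpt-4": pin to one provider and skip the balancer.
    Direct { provider_slug: &'a str, model: &'a str },
    /// "gpt-4": let the routing config and balancer choose.
    Balanced { model: &'a str },
}

/// Split on the first '/' into provider slug and model name.
/// Strings with an empty slug or empty model are treated as unprefixed.
pub fn parse_model(model: &str) -> RouteTarget<'_> {
    match model.split_once('/') {
        Some((slug, rest)) if !slug.is_empty() && !rest.is_empty() => RouteTarget::Direct {
            provider_slug: slug,
            model: rest,
        },
        _ => RouteTarget::Balanced { model },
    }
}
```

Under this assumption, `"provider-1/org/some-model"` yields the slug `provider-1` and the model `org/some-model`.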
⇄ Unprefixed · Balanced

Let the engine decide

A bare model name is matched against your routing config and load-balanced across every active provider serving it.

model: "gpt-4"
  1. Match name in routing config
  2. Round-robin across active providers
  3. Fall back to first available config
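The three steps above can be sketched with an atomic cursor per routing config. All names here (`ProviderEntry`, `RoutingConfig`, `select`) are hypothetical, and "first available config" is interpreted as the first config that has at least one active provider; the router's own rule may differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical provider record inside a routing config.
pub struct ProviderEntry {
    pub slug: String,
    pub active: bool,
}

/// Hypothetical routing config: one model name mapped to several providers.
pub struct RoutingConfig {
    pub model: String,
    pub providers: Vec<ProviderEntry>,
    /// Monotonic counter shared by all requests for this config.
    cursor: AtomicUsize,
}

impl RoutingConfig {
    pub fn new(model: &str, providers: Vec<ProviderEntry>) -> Self {
        Self {
            model: model.to_string(),
            providers,
            cursor: AtomicUsize::new(0),
        }
    }

    /// Step 2: round-robin over the providers that are currently active.
    pub fn next_provider(&self) -> Option<&ProviderEntry> {
        let active: Vec<&ProviderEntry> =
            self.providers.iter().filter(|p| p.active).collect();
        if active.is_empty() {
            return None;
        }
        let index = self.cursor.fetch_add(1, Ordering::Relaxed) % active.len();
        Some(active[index])
    }
}

/// Steps 1 and 3: match the model name, otherwise fall back to the first
/// config that still has an active provider (an assumed interpretation).
pub fn select<'a>(configs: &'a [RoutingConfig], model: &str) -> Option<&'a ProviderEntry> {
    let config = configs
        .iter()
        .find(|c| c.model == model)
        .or_else(|| configs.iter().find(|c| c.providers.iter().any(|p| p.active)))?;
    config.next_provider()
}
```

The atomic counter lets concurrent request handlers share one config without a lock. Inactive providers are filtered out on every call, so a provider that leaves or rejoins the rotation is picked up on the next request.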
03 / RESILIENCE

Self-healing by design.

Every provider carries a health state. The router degrades gracefully, backs off, and recovers automatically on success.

Healthy

Normal operation — accepting requests at full rate.

Degraded

Elevated error rate — still serving, now with backoff.

Unhealthy

High failure rate — temporarily pulled from rotation.
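One way to model these three states is a sliding window over recent request outcomes. In the sketch below, the `HealthTracker` type, the window size, and the 20% and 50% thresholds are assumptions chosen for illustration; the router's real thresholds are not stated on this page.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Tracks the last `capacity` request outcomes for one provider.
pub struct HealthTracker {
    window: VecDeque<bool>,
    capacity: usize,
}

impl HealthTracker {
    pub fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Record one outcome, evicting the oldest once the window is full.
    pub fn record(&mut self, success: bool) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(success);
    }

    /// Share of failures in the window; 0.0 when nothing is recorded yet.
    pub fn error_rate(&self) -> f64 {
        if self.window.is_empty() {
            return 0.0;
        }
        let failures = self.window.iter().filter(|ok| !**ok).count();
        failures as f64 / self.window.len() as f64
    }

    /// Assumed thresholds: 50% or more failures is Unhealthy,
    /// 20% or more is Degraded, anything lower is Healthy.
    pub fn state(&self) -> Health {
        let rate = self.error_rate();
        if rate >= 0.5 {
            Health::Unhealthy
        } else if rate >= 0.2 {
            Health::Degraded
        } else {
            Health::Healthy
        }
    }

    /// Unhealthy providers are pulled from rotation.
    pub fn in_rotation(&self) -> bool {
        self.state() != Health::Unhealthy
    }
}
```

Because old failures age out of the window as successes arrive, a provider drifts back to Healthy without any manual reset, which matches the "recovers automatically on success" behavior described above.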

04 / ADMIN UI

A control room
in your browser.

React 19 + TypeScript + Vite SPA served right alongside the router. Manage everything without touching a config file.

Dashboard

Provider health grid, TTFT / latency / TPS stat cards, live WebSocket updates and a top-up dialog.

Providers

Full CRUD with quick-add templates for OpenAI, Anthropic, Ollama, LlamaCpp & OpenRouter.

Config

Routing configuration CRUD with nested provider-per-config assignment.

Metrics

Real-time Recharts for P90 TTFT & TPS, model breakdown, and a live event stream.

Users

User & API-key management with per-user model permission overrides.

Payments

Balances, model pricing, transaction history, invoices and admin credit/debit.

05 / QUICK START

Up in one command.

Run the container, mount your config, and hit the OpenAI-compatible endpoint.

docker · run
# Pull & run the router
docker run -p 3000:3000 \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  voidic/yalr:latest

# Admin UI → http://localhost:3000
curl · /v1/chat/completions
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{ "role":"user",
                  "content":"hi" }],
    "stream": true
  }'
cargo · from source
# Build & run the server
cargo run --bin yalr-server

# With verbose logging
cargo run --bin yalr-server -- --verbose

# The CLI companion
cargo run --bin yalr-cli
config.yaml
server:
  host: "0.0.0.0"
  port: 3000

database:
  url: "sqlite:data/llm_router.db?mode=rwc"

auth:
  enabled: true
  allowed_pubkeys:
    - "your-nostr-pubkey"
06 / API SURFACE

OpenAI-compatible
& then some.

Drop-in /v1 endpoints for inference, plus a complete admin REST API for providers, metrics, users and payments.

POST  /v1/chat/completions       Chat completion · streaming & non-streaming
GET   /v1/models                 List all available models
GET   /api/providers             List & manage configured providers
GET   /api/metrics               Current provider performance snapshot
WS    /api/metrics/ws?token=…    Real-time metrics event stream
GET   /api/metrics/health        Per-provider health state overview
POST  /api/payments/topup        Create a Lightning top-up invoice
GET   /health                    Liveness probe
MIT Licensed · Self-Hosted

Route smarter.
Ship faster.

One async binary stands between your app and every LLM provider you run. Clone it, configure it, own it.