YALR — Yet Another LLM Router

01 / CAPABILITIES

Everything a router
should actually do.

Provider abstraction, intelligent distribution, automatic recovery, and live metrics — wired together in a single async binary.

⚖️ Load Balancing

Distribute across providers

Round-robin and configurable routing strategies spread traffic across multiple LLM backends, keeping latency low and preventing any single provider from being overwhelmed.
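As a rough illustration of the round-robin strategy, the sketch below rotates through a list of provider slugs with an atomic counter. The `RoundRobin` type and its `pick` method are hypothetical names chosen for this example, not YALR's actual API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin picker (illustrative, not YALR's real type).
pub struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    pub fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns the next provider in rotation, or None if the list is empty.
    pub fn pick<'a>(&self, providers: &'a [String]) -> Option<&'a str> {
        if providers.is_empty() {
            return None;
        }
        // fetch_add wraps on overflow, so the modulo always yields a valid index.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % providers.len();
        Some(providers[i].as_str())
    }
}
```

An atomic counter lets concurrent request handlers share one picker without a lock, at the cost of only approximate fairness under contention.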

🧩

Provider abstraction

Clean trait-based architecture. OpenAI, Anthropic, Ollama, LlamaCpp — and trivially extensible to the next one.
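To give a feel for what a trait-based provider layer can look like, here is a minimal sketch. The `Provider` trait, its methods, and `EchoProvider` are assumptions made for illustration; the real trait is presumably async and carries more than this synchronous version does.

```rust
/// Hypothetical provider trait (illustrative; the real one is likely async).
pub trait Provider {
    /// Short identifier used in routing, e.g. "provider-1".
    fn slug(&self) -> &str;
    /// Whether this backend serves the given model name.
    fn serves(&self, model: &str) -> bool;
    /// Sends a prompt and returns the completion text.
    fn complete(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// A stand-in backend that echoes the prompt; handy for wiring tests.
pub struct EchoProvider {
    pub slug: String,
    pub models: Vec<String>,
}

impl Provider for EchoProvider {
    fn slug(&self) -> &str {
        &self.slug
    }

    fn serves(&self, model: &str) -> bool {
        self.models.iter().any(|m| m == model)
    }

    fn complete(&self, model: &str, prompt: &str) -> Result<String, String> {
        if !self.serves(model) {
            return Err(format!("{} does not serve {}", self.slug, model));
        }
        Ok(format!("[{}] {}", model, prompt))
    }
}
```

Under this shape, supporting a new backend means writing one more `impl Provider for ...` block; the router itself only ever sees `Box<dyn Provider>`.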

🌊

Streaming, end-to-end

Full Server-Sent Events support so tokens flow through the router the instant a provider emits them.
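For readers consuming the stream, the sketch below pulls the payload out of each `data:` line in a buffer of Server-Sent Events text and stops at the `[DONE]` sentinel used by OpenAI-style streams. The function name `sse_data_lines` is hypothetical, and the sketch deliberately ignores multi-line events and other SSE fields.

```rust
/// Extracts the payload of each `data:` line from a buffer of SSE text.
/// Stops at the OpenAI-style `[DONE]` sentinel. Simplified: events that
/// span several `data:` lines are not joined together.
pub fn sse_data_lines(buffer: &str) -> Vec<&str> {
    let mut out = Vec::new();
    for line in buffer.lines() {
        // Lines that do not start with "data:" (comments, blanks) are skipped.
        if let Some(rest) = line.strip_prefix("data:") {
            // A single space after the colon is optional in the SSE format.
            let payload = rest.strip_prefix(' ').unwrap_or(rest);
            if payload == "[DONE]" {
                break;
            }
            out.push(payload);
        }
    }
    out
}
```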

🔁

Retry & fallback

Exponential backoff with automatic failover to the next healthy provider. Honors Retry-After headers.
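The backoff half of this can be sketched as a small delay function: the wait doubles per attempt up to a cap, and a retry-after hint from the provider, when present, overrides the computed value. The function name, the 250 ms base, and the 30 s cap are assumptions for illustration, not YALR's configured values.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// A provider-supplied retry-after value takes precedence when present.
pub fn retry_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    const BASE: Duration = Duration::from_millis(250); // assumed base delay
    const MAX: Duration = Duration::from_secs(30); // assumed upper bound

    if let Some(hint) = retry_after {
        return hint;
    }
    // 250 ms, 500 ms, 1 s, ... capped at MAX; saturating ops avoid overflow.
    let factor = 2u32.saturating_pow(attempt);
    BASE.saturating_mul(factor).min(MAX)
}
```

A production version would usually add random jitter so that many clients retrying at once do not synchronize.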

📊

Live metrics

TTFT, latency, TPS, throughput, token usage and success rates — streamed over WebSocket to the dashboard.
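As a hedged example of how two of these figures can be derived from one streamed request: TTFT is the time from sending the request to receiving the first token, and TPS is taken here as output tokens divided by the generation time after that first token. The `RequestTiming` struct and this particular TPS definition are assumptions; YALR may measure them differently.

```rust
use std::time::Duration;

/// Timing facts for one completed streaming request (hypothetical shape).
pub struct RequestTiming {
    /// Time from sending the request to receiving the first token.
    pub first_token_after: Duration,
    /// Time from sending the request to receiving the last token.
    pub total: Duration,
    /// Number of tokens the provider generated.
    pub output_tokens: u32,
}

impl RequestTiming {
    /// Time to first token.
    pub fn ttft(&self) -> Duration {
        self.first_token_after
    }

    /// Tokens per second over the generation phase (after the first token).
    /// Returns None when the generation phase has zero or negative length.
    pub fn tps(&self) -> Option<f64> {
        let generation = self.total.checked_sub(self.first_token_after)?;
        if generation.is_zero() {
            return None;
        }
        Some(f64::from(self.output_tokens) / generation.as_secs_f64())
    }
}
```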

🔐 Auth & Payments

Session auth, Nostr (NIP-98), API keys, and Lightning top-ups

Internal username/password accounts, Nostr (NIP-98) login and per-provider API keys, plus a built-in Lightning Network top-up system (Routstr / PPQ) with auto-generated invoices and per-model pricing.

02 / DUAL ROUTING

Two ways to route
the same request.

Pin a request to an exact provider, or let the engine choose. The model string decides.

→ Prefixed · Direct

Target one provider

A slash-prefixed model name routes straight to a named provider, bypassing load balancing entirely.

model: "provider-1/gpt-4"

Split on / → provider slug + model
Route via route_by_slug()
Skip the balancer — go direct
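The split step above can be sketched as follows. `parse_model` is a hypothetical helper, not the project's `route_by_slug()`; it assumes the split happens on the first slash, so any further slashes stay in the model name.

```rust
/// Splits "provider-1/gpt-4" into (Some("provider-1"), "gpt-4").
/// A bare name such as "gpt-4" yields (None, "gpt-4").
/// Assumption: only the first '/' separates the slug from the model.
pub fn parse_model(model: &str) -> (Option<&str>, &str) {
    match model.split_once('/') {
        Some((slug, name)) if !slug.is_empty() && !name.is_empty() => (Some(slug), name),
        // No slash, or an empty half: treat the whole string as a bare model name.
        _ => (None, model),
    }
}
```

A `Some` slug would then go to the direct path, and `None` to the balancer.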

⇄ Unprefixed · Balanced

Let the engine decide

A bare model name is matched against your routing config and load-balanced across every active provider serving it.

model: "gpt-4"

Match name in routing config
Round-robin across active providers
Fall back to first available config
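One possible reading of these three steps is sketched below: find the config whose model name matches, rotate across its active providers, and if that yields nothing, use the first config that has an active provider. `RoutingConfig`, `route`, and this interpretation of the fallback are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical routing config: a model name plus the providers serving it.
pub struct RoutingConfig {
    pub model: String,
    /// (provider slug, is_active)
    pub providers: Vec<(String, bool)>,
    cursor: AtomicUsize,
}

impl RoutingConfig {
    pub fn new(model: &str, providers: &[(&str, bool)]) -> Self {
        Self {
            model: model.to_string(),
            providers: providers.iter().map(|(s, a)| (s.to_string(), *a)).collect(),
            cursor: AtomicUsize::new(0),
        }
    }

    /// Round-robin over the active providers only.
    fn next_active(&self) -> Option<&str> {
        let active: Vec<&str> = self
            .providers
            .iter()
            .filter(|(_, is_active)| *is_active)
            .map(|(slug, _)| slug.as_str())
            .collect();
        if active.is_empty() {
            return None;
        }
        let i = self.cursor.fetch_add(1, Ordering::Relaxed) % active.len();
        Some(active[i])
    }
}

/// Picks a provider slug for a bare model name.
pub fn route<'a>(configs: &'a [RoutingConfig], model: &str) -> Option<&'a str> {
    configs
        .iter()
        .find(|c| c.model == model)
        .and_then(|c| c.next_active())
        // Assumed fallback: the first config that has any active provider.
        .or_else(|| configs.iter().find_map(|c| c.next_active()))
}
```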

03 / RESILIENCE

Self-healing by design.

Every provider carries a health state. The router degrades gracefully, backs off, and recovers automatically on success.

Healthy

Normal operation — accepting requests at full rate.

Degraded

Elevated error rate — still serving, now with backoff.

Unhealthy

High failure rate — temporarily pulled from rotation.
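The three states map naturally onto an enum. The sketch below classifies a provider from its recent error rate; the 10% and 50% thresholds, like the names `Health` and `classify`, are illustrative assumptions rather than YALR's actual values.

```rust
/// The three health states described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

impl Health {
    /// Only Unhealthy providers are pulled from rotation.
    pub fn in_rotation(self) -> bool {
        self != Health::Unhealthy
    }
}

/// Classifies a provider from recent request counts.
/// Thresholds are assumed for illustration: >= 50% errors is Unhealthy,
/// >= 10% is Degraded, anything lower (or no traffic yet) is Healthy.
pub fn classify(errors: u32, total: u32) -> Health {
    if total == 0 {
        return Health::Healthy;
    }
    let rate = f64::from(errors) / f64::from(total);
    if rate >= 0.5 {
        Health::Unhealthy
    } else if rate >= 0.1 {
        Health::Degraded
    } else {
        Health::Healthy
    }
}
```

Because the state is recomputed from recent counts, a run of successes lowers the error rate and moves the provider back toward Healthy without manual intervention.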

04 / ADMIN UI

A control room
in your browser.

React 19 + TypeScript + Vite SPA served right alongside the router. Manage everything without touching a config file.

◆ Dashboard

Provider health grid, TTFT / latency / TPS stat cards, live WebSocket updates and a top-up dialog.

◆ Providers

Full CRUD with quick-add templates for OpenAI, Anthropic, Ollama, LlamaCpp & OpenRouter.

◆ Config

Routing configuration CRUD with nested provider-per-config assignment.

◆ Metrics

Real-time Recharts charts for P90 TTFT & TPS, model breakdown, and a live event stream.
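For context, P90 is the value that 90% of samples fall at or below. The sketch below computes it with the nearest-rank method; whether the dashboard uses nearest-rank or an interpolating variant is not stated, so treat the method and the `percentile` function name as assumptions.

```rust
/// Nearest-rank percentile of a sample set; `p` must be in (0, 100].
/// Returns None for an empty set or an out-of-range `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: the smallest value with at least p% of samples at or below it.
    let n = sorted.len();
    let rank = (p * n as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, n) - 1])
}
```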

◆ Users

User & API-key management with per-user model permission overrides.

◆ Payments

Balances, model pricing, transaction history, invoices and admin credit/debit.

05 / QUICK START

Up in one command.

Run the container, mount your config, and hit the OpenAI-compatible endpoint.

docker · run

# Pull & run the router
docker run -p 3000:3000 \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  voidic/yalr:latest

# Admin UI → http://localhost:3000

curl · /v1/chat/completions

curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{ "role":"user",
                  "content":"hi" }],
    "stream": true
  }'

cargo · from source

# Build & run the server
cargo run --bin yalr-server

# With verbose logging
cargo run --bin yalr-server -- --verbose

# The CLI companion
cargo run --bin yalr-cli

config.yaml

server:
  host: "0.0.0.0"
  port: 3000

database:
  url: "sqlite:data/llm_router.db?mode=rwc"

auth:
  enabled: true
  allowed_pubkeys:
    - "your-nostr-pubkey"

06 / API SURFACE

OpenAI-compatible
& then some.

Drop-in /v1 endpoints for inference, plus a complete admin REST API for providers, metrics, users and payments.

POST  /v1/chat/completions       Chat completion · streaming & non-streaming
GET   /v1/models                 List all available models
GET   /api/providers             List & manage configured providers
GET   /api/metrics               Current provider performance snapshot
WS    /api/metrics/ws?token=…    Real-time metrics event stream
GET   /api/metrics/health        Per-provider health state overview
POST  /api/payments/topup        Create a Lightning top-up invoice
GET   /health                    Liveness probe

Route smarter.
Ship faster.
