OpenAI-compatible chat completions API
POST /v1/chat/completions with the same request shape as OpenAI. Drop-in for any OpenAI SDK.
What's shipped, what's in flight, and what's coming. SafeRouter is the OpenRouter you can verify — every line below is tracked against that mission.
OpenRouter is a smart router. SafeRouter is a smart router that runs inside a TEE and lets anyone independently prove your prompt was processed by the open-source code on GitHub — not by a tampered build with logging hooks.
POST /v1/chat/completions with the same request shape as OpenAI. Drop-in for any OpenAI SDK.
Generic OpenAiCompatible abstraction routes by model prefix. Live: DeepSeek (V4 flash/pro), GLM via Z.AI (4.5 / 4.6 / 4.7 / 5.x families). Adding a provider is a config change, not new code.
Byte-level pass-through of upstream SSE — token-by-token output, including reasoning_content for thinking models like DeepSeek V4 and GLM 4.5-flash.
Customers use one SafeRouter key. We hold the upstream provider keys. Authorization: Bearer sk-sr-... required on chat completions; verifier and health stay public.
Anyone can hit our endpoint, see the attestation report, and (in v2) verify the AMD SEV-SNP cert chain. Hosted on GitHub Pages so we cannot tamper with it. Currently in DEMO mode.
Persistent ed25519 signing identity, synthetic 3-layer cert chain, measurement = sha256(binary‖providers‖models). New endpoints: POST /attestation/quote, GET /attestation/verifying-key. Verifier page does real client-side signature + chain check via @noble/ed25519. Same wire format as future SEV-SNP — only the signing backend swaps.
Every gateway startup appends a signed entry — measurement, public key, build, audience — to an append-only hash-chained log. Endpoints: GET /attestation/transparency[/head|/{index}]. Verifier walks the chain, recomputes every payload hash and entry hash, verifies every ed25519 signature, and confirms the latest startup entry matches the live identity. Defends against split-view / equivocation attacks. Wire format aligned with public log services (Sigstore Rekor) so the storage backend can be swapped later without touching the verifier.
When upstream returns 5xx / 429 / network timeout, SafeRouter transparently retries on a capability-matched alternative model (hand-curated chain per model in src/fallback.rs). 4xx client errors bubble up unretried. Successful fallbacks surface as X-SafeRouter-Fallback-From / -To response headers; each attempt is its own Activity entry with attempt + original_model fields so users can see exactly what was tried.
Per-key USD balance persisted to ~/.saferouter-credits.json. Each successful non-stream request deducts cost from prompt_tokens × input_per_million + completion_tokens × output_per_million. Pre-flight rejects calls when balance hits zero (402 Payment Required). New endpoints: GET /v1/credits (self balance + per-model usage), POST /v1/credits/topup. Dashboard at credits.html. Stream billing deferred (no upstream usage at this layer until we buffer SSE).
Probabilistic sampler (default 5%, configurable via SAFEROUTER_AUDIT_RATE) captures sha256(prompt) + sha256(response) for sampled requests, signs each with the gateway identity key, and appends to a hash-chained audit log. No plaintext is logged. GET /attestation/audit/head is public for verifier integrity check (Check 7); GET /v1/audit/samples exposes full sample data for authorised holders. The future zkTLS layer (cryptographic proof that upstream really produced the content) will swap the signing backend without changing the wire format.
Card grid of all routed models with provider badge, context length, pricing. Search + provider filter. "Try it" button → copy curl/python with your key pre-filled.
OpenRouter-style pill button; click expands a search-filterable model drawer. Streaming responses, thinking-chain folding, conversation persistence.
For models that emit reasoning_content separately (V4 hybrid thinking, GLM 4.5+), we pass it through unchanged and surface it in the chat UI as a foldable "Thinking…" block.
OpenAI-standard catalog endpoint. SDK clients can call client.models.list(). Models page and chat picker now fetch dynamically; backend is the single source of truth.
Live request log polling /v1/activity every 5s. Stats: total requests, success rate, p50/p95 latency, breakdown by model and provider. Persists to ~/.saferouter-activity.json across restarts.
Create new keys with names, see prefix + usage, inline rename, revoke. Full secret shown once at creation. Backend uses a JSON-persisted key store; revoked keys immediately stop authenticating.
One prompt, up to 4 columns streaming in parallel. Per-column model picker, status pill, latency timer. Selection persists in localStorage.
Real SEV-SNP attester code (src/tee/sevsnp.rs) is written and compiles: /dev/sev-guest ioctl, AMD KDS VCEK fetch, AttestationBundle in the same wire format as mock. Awaiting AWS c7a confidential VM provisioning to verify on real hardware — code falls back to mock when /dev/sev-guest is absent, so it works locally today.
Customer brings their own DeepSeek/GLM/OpenAI key. The key is sealed and only released to an attested SafeRouter binary — so even our own ops can't decrypt it. The cornerstone of "we literally cannot see your traffic." Depends on real SEV-SNP hardware for sealing.