Beta · Q2 2026

Cut your LLM costs by 40% with one URL change.

CostGhost sits between your app and LLM providers. It routes every request to the cheapest model that meets your quality bar. You change one line of code.

api.openai.com → gw.costghost.dev

Three steps. Zero config.

No SDK changes. No model mapping. No manual rules. CostGhost learns your traffic and optimizes automatically.

01

Route through CostGhost

Point your existing OpenAI/Anthropic client at our gateway. One base URL change. Your code, prompts, and parameters stay identical.

02

Classify & route at the edge

Every request is classified in <1ms at the edge. Budget phase, task type, and priority determine the optimal model. 23 routing rules, negligible latency overhead.
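The classify-then-route step could look roughly like the sketch below. The task types, priority levels, model names, and rule logic here are illustrative stand-ins, not CostGhost's actual routing table:

```typescript
type Priority = "low" | "normal" | "critical";
type Phase = "GREEN" | "YELLOW" | "ORANGE" | "RED";

interface RequestMeta {
  taskType: "classification" | "summarization" | "reasoning";
  priority: Priority;
  phase: Phase; // current budget phase for this tenant
}

// Hypothetical routing function: picks the cheapest model that
// still meets the quality bar implied by priority and task type.
function routeModel(meta: RequestMeta): string {
  // Critical reasoning always gets the strongest model,
  // regardless of budget phase.
  if (meta.priority === "critical" && meta.taskType === "reasoning") {
    return "claude-opus-4";
  }
  // Low-priority work and late budget phases prefer cheap models.
  if (meta.phase === "RED" || meta.priority === "low") {
    return "gpt-4o-mini";
  }
  if (meta.phase === "ORANGE") {
    return "mistral-small-latest";
  }
  return "gpt-4o"; // GREEN/YELLOW default for normal-priority work
}
```

The real gateway derives `phase` from the tenant's budget state and `taskType` from the request itself; both arrive as plain fields here to keep the sketch self-contained.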

03

Save without losing quality

Low-priority classification? GPT-4o-mini instead of Claude Opus. Critical reasoning task? Best model, guaranteed. You save money on requests that never needed the expensive model.

Edge-native. Zero servers.

Built on Cloudflare Workers with Durable Objects for per-tenant state. Deployed across 300+ locations. No cold starts.

Your app
   │ POST /v1/chat/completions
   ▼
┌────────────────────────────────────┐
│ CostGhost Gateway (Hono + Edge)    │
│ Classify → Route → Forward         │
└──────────────┬─────────────────────┘
               ▼
┌────────────────────────────────────┐
│ Budget State Machine (DO)          │ ◄── Moat
│ GREEN → YELLOW → ORANGE → RED      │
│ 23 rules × 5 phases × task type    │
└──────────────┬─────────────────────┘
     ┌─────────┼─────────┐
     ▼         ▼         ▼
 Anthropic   OpenAI   Mistral
<5ms
Routing latency
23
Routing rules
5
Budget phases
4+
LLM providers

Not another dashboard. Invisible infrastructure.

CostGhost runs as middleware. Your team never sees it. Your CFO sees the savings.

Budget state machine

5 phases: GREEN through HARD_STOP. Sequential transitions only. Idempotent spend recording. Automatic monthly reset. No request ever exceeds your budget.
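A minimal sketch of how such a state machine might behave, assuming hypothetical spend thresholds (50/75/90/100% of budget — the real values are not published) and a per-request-id dedupe set for idempotency:

```typescript
const PHASES = ["GREEN", "YELLOW", "ORANGE", "RED", "HARD_STOP"] as const;
type BudgetPhase = (typeof PHASES)[number];

class BudgetStateMachine {
  private phase: BudgetPhase = "GREEN";
  private spent = 0;
  private seen = new Set<string>(); // request ids already recorded

  constructor(private budgetUsd: number) {}

  // Idempotent: recording the same request id twice has no effect.
  recordSpend(requestId: string, usd: number): BudgetPhase {
    if (!this.seen.has(requestId)) {
      this.seen.add(requestId);
      this.spent += usd;
    }
    const ratio = this.spent / this.budgetUsd;
    const target: BudgetPhase =
      ratio >= 1.0 ? "HARD_STOP" :
      ratio >= 0.9 ? "RED" :
      ratio >= 0.75 ? "ORANGE" :
      ratio >= 0.5 ? "YELLOW" : "GREEN";
    // Sequential transitions only: advance at most one phase per call,
    // so a spend spike cannot jump straight from GREEN to HARD_STOP.
    const cur = PHASES.indexOf(this.phase);
    if (PHASES.indexOf(target) > cur) this.phase = PHASES[cur + 1];
    return this.phase;
  }

  // Monthly rollover: back to GREEN with a clean ledger.
  reset(): void {
    this.phase = "GREEN";
    this.spent = 0;
    this.seen.clear();
  }
}
```

In production this state would live inside a per-tenant Durable Object, which is what makes the idempotent recording and single-writer transitions safe under concurrent requests.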

Learning cache

Exponential moving average over your actual traffic. After ~500 requests, the system knows which model delivers acceptable quality for each task type at the lowest cost.
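The exponential moving average itself is a one-liner; here is a self-contained sketch with an illustrative smoothing factor of 0.1 (the production value and the quality-scoring method are assumptions):

```typescript
const ALPHA = 0.1; // hypothetical smoothing factor
const qualityEma = new Map<string, number>();

// Blend a new observed quality score in [0, 1] into the running
// average for a (taskType, model) pair; first observation seeds it.
function updateQuality(taskType: string, model: string, score: number): number {
  const key = `${taskType}:${model}`;
  const prev = qualityEma.get(key);
  const next = prev === undefined ? score : ALPHA * score + (1 - ALPHA) * prev;
  qualityEma.set(key, next);
  return next;
}
```

A small α means one great (or terrible) response barely moves the average, which is why it takes a few hundred requests per task type before the cache's picture of "acceptable quality at lowest cost" stabilizes.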

Fail-closed design

No request passes through unvalidated. Per-tenant rate limiting in Durable Objects. Sliding window. If the budget engine is unreachable, requests are rejected — never silently forwarded.
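A sliding-window limiter of the kind described can be sketched in a few lines; the limit and window size below are made-up values, and the real timestamp list would live in a Durable Object rather than process memory:

```typescript
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  // Returns false (fail closed) when the tenant is over its limit.
  allow(nowMs: number): boolean {
    // Drop request timestamps that have slid out of the window.
    this.timestamps = this.timestamps.filter(t => nowMs - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

The fail-closed stance extends beyond rate limiting: any "can't reach the budget engine" condition maps to a rejection, never to a silent pass-through.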

Append-only audit log

Every routing decision logged to R2. NDJSON, partitioned by tenant/day/hour. Full traceability of what was routed where and why. Export anytime.
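The tenant/day/hour partitioning and NDJSON framing might look like this; the object-key prefix and record fields are illustrative, not the documented schema:

```typescript
interface RoutingDecision {
  tenant: string;
  requestId: string;
  taskType: string;
  chosenModel: string;
  reason: string; // e.g. which rule fired
  ts: string;     // ISO 8601 timestamp
}

// One R2 object per tenant/day/hour partition (hypothetical layout).
function auditKey(d: RoutingDecision): string {
  const t = new Date(d.ts);
  const day = t.toISOString().slice(0, 10); // YYYY-MM-DD
  const hour = String(t.getUTCHours()).padStart(2, "0");
  return `audit/${d.tenant}/${day}/${hour}.ndjson`;
}

// NDJSON: one JSON object per line, appended, never rewritten.
function auditLine(d: RoutingDecision): string {
  return JSON.stringify(d) + "\n";
}
```

Because NDJSON is line-delimited, an export is just a byte-range read and a split on newlines — no parser state spans records.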

30 seconds to integrate.

If you use the OpenAI SDK (or any compatible client), you change one line. That's the entire integration. No new dependencies, no config files, no migration.

  • Works with OpenAI, Anthropic, Mistral, and Google models
  • Optional priority header for fine-grained control
  • Team and project sub-budgets via custom headers
  • Usage API for real-time cost monitoring
app.ts
import OpenAI from "openai";

// Before:
// const client = new OpenAI({ baseURL: "https://api.openai.com/v1" });

// After:
const client = new OpenAI({ baseURL: "https://gw.costghost.dev/v1" });

// Done. Everything else stays identical.

// Optional: set priority per request
const res = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
  },
  { headers: { "X-CG-Priority": "low" } },
);

We're onboarding the first 50 teams.

No commitment. No credit card. Just your work email.

We'll reach out within 24 hours to set up your tenant.