Beta · Q2 2026

Cut your LLM costs by 40% with one URL change.

CostGhost sits between your app and LLM providers. It routes every request to the cheapest model that meets your quality bar. You change one line of code.

api.openai.com → gw.costghost.dev

Three steps. Zero config.

No SDK changes. No model mapping. No manual rules. CostGhost learns your traffic and optimizes automatically.

01

Route through CostGhost

Point your existing OpenAI/Anthropic client at our gateway. One base URL change. Your code, prompts, and parameters stay identical.

02

Classify & route at the edge

Every request is classified in <1ms at the edge. Budget phase, task type, and priority determine the optimal model. 23 routing rules, negligible latency overhead.
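The classify-then-route step could look roughly like the sketch below. The task types, priority levels, model names, and rule logic here are illustrative stand-ins, not CostGhost's actual routing table:

```typescript
type Priority = "low" | "normal" | "critical";
type Phase = "GREEN" | "YELLOW" | "ORANGE" | "RED";

interface RequestMeta {
  taskType: "classification" | "summarization" | "reasoning";
  priority: Priority;
  phase: Phase; // current budget phase for this tenant
}

// Hypothetical routing function: picks the cheapest model that
// still meets the quality bar implied by priority and task type.
function routeModel(meta: RequestMeta): string {
  // Critical reasoning always gets the strongest model,
  // regardless of budget phase.
  if (meta.priority === "critical" && meta.taskType === "reasoning") {
    return "claude-opus-4";
  }
  // Low-priority work and late budget phases prefer cheap models.
  if (meta.phase === "RED" || meta.priority === "low") {
    return "gpt-4o-mini";
  }
  if (meta.phase === "ORANGE") {
    return "mistral-small-latest";
  }
  return "gpt-4o"; // GREEN/YELLOW default for normal-priority work
}
```

The real gateway derives `phase` from the tenant's budget state and `taskType` from the request itself; both arrive as plain fields here to keep the sketch self-contained.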

03

Save without losing quality

Low-priority classification? GPT-4o-mini instead of Claude Opus. Critical reasoning task? Best model, guaranteed. You save money on requests that never needed the expensive model.

Edge-native. Zero servers.

Built on Cloudflare Workers with Durable Objects for per-tenant state. Deployed across 300+ locations. No cold starts.

Your app
   │ POST /v1/chat/completions
   ▼
┌────────────────────────────────────┐
│ CostGhost Gateway (Hono + Edge)    │
│ Classify → Route → Forward         │
└──────────────┬─────────────────────┘
               ▼
┌────────────────────────────────────┐
│ Budget State Machine (DO)          │ ◄── Moat
│ GREEN → YELLOW → ORANGE → RED      │
│ 23 rules × 5 phases × task type    │
└──────────────┬─────────────────────┘
     ┌─────────┼─────────┐
     ▼         ▼         ▼
 Anthropic   OpenAI   Mistral
<5ms
Routing latency
23
Routing rules
5
Budget phases
4+
LLM providers

Not another dashboard. Invisible infrastructure.

CostGhost runs as middleware. Your team never sees it. Your CFO sees the savings.

Budget state machine

5 phases: GREEN through HARD_STOP. Sequential transitions only. Idempotent spend recording. Automatic monthly reset. No request ever exceeds your budget.
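A minimal sketch of how such a state machine might behave, assuming hypothetical spend thresholds (50/75/90/100% of budget — the real values are not published) and a per-request-id dedupe set for idempotency:

```typescript
const PHASES = ["GREEN", "YELLOW", "ORANGE", "RED", "HARD_STOP"] as const;
type BudgetPhase = (typeof PHASES)[number];

class BudgetStateMachine {
  private phase: BudgetPhase = "GREEN";
  private spent = 0;
  private seen = new Set<string>(); // request ids already recorded

  constructor(private budgetUsd: number) {}

  // Idempotent: recording the same request id twice has no effect.
  recordSpend(requestId: string, usd: number): BudgetPhase {
    if (!this.seen.has(requestId)) {
      this.seen.add(requestId);
      this.spent += usd;
    }
    const ratio = this.spent / this.budgetUsd;
    const target: BudgetPhase =
      ratio >= 1.0 ? "HARD_STOP" :
      ratio >= 0.9 ? "RED" :
      ratio >= 0.75 ? "ORANGE" :
      ratio >= 0.5 ? "YELLOW" : "GREEN";
    // Sequential transitions only: advance at most one phase per call,
    // so a spend spike cannot jump straight from GREEN to HARD_STOP.
    const cur = PHASES.indexOf(this.phase);
    if (PHASES.indexOf(target) > cur) this.phase = PHASES[cur + 1];
    return this.phase;
  }

  // Monthly rollover: back to GREEN with a clean ledger.
  reset(): void {
    this.phase = "GREEN";
    this.spent = 0;
    this.seen.clear();
  }
}
```

In production this state would live inside a per-tenant Durable Object, which is what makes the idempotent recording and single-writer transitions safe under concurrent requests.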

Learning cache

Exponential moving average over your actual traffic. After ~500 requests, the system knows which model delivers acceptable quality for each task type at the lowest cost.
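The exponential moving average itself is a one-liner; here is a self-contained sketch with an illustrative smoothing factor of 0.1 (the production value and the quality-scoring method are assumptions):

```typescript
const ALPHA = 0.1; // hypothetical smoothing factor
const qualityEma = new Map<string, number>();

// Blend a new observed quality score in [0, 1] into the running
// average for a (taskType, model) pair; first observation seeds it.
function updateQuality(taskType: string, model: string, score: number): number {
  const key = `${taskType}:${model}`;
  const prev = qualityEma.get(key);
  const next = prev === undefined ? score : ALPHA * score + (1 - ALPHA) * prev;
  qualityEma.set(key, next);
  return next;
}
```

A small α means one great (or terrible) response barely moves the average, which is why it takes a few hundred requests per task type before the cache's picture of "acceptable quality at lowest cost" stabilizes.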

Fail-closed design

No request passes through unvalidated. Per-tenant rate limiting in Durable Objects. Sliding window. If the budget engine is unreachable, requests are rejected — never silently forwarded.
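A sliding-window limiter of the kind described can be sketched in a few lines; the limit and window size below are made-up values, and the real timestamp list would live in a Durable Object rather than process memory:

```typescript
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  // Returns false (fail closed) when the tenant is over its limit.
  allow(nowMs: number): boolean {
    // Drop request timestamps that have slid out of the window.
    this.timestamps = this.timestamps.filter(t => nowMs - t < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

The fail-closed stance extends beyond rate limiting: any "can't reach the budget engine" condition maps to a rejection, never to a silent pass-through.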

Append-only audit log

Every routing decision logged to R2. NDJSON, partitioned by tenant/day/hour. Full traceability of what was routed where and why. Export anytime.
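The tenant/day/hour partitioning and NDJSON framing might look like this; the object-key prefix and record fields are illustrative, not the documented schema:

```typescript
interface RoutingDecision {
  tenant: string;
  requestId: string;
  taskType: string;
  chosenModel: string;
  reason: string; // e.g. which rule fired
  ts: string;     // ISO 8601 timestamp
}

// One R2 object per tenant/day/hour partition (hypothetical layout).
function auditKey(d: RoutingDecision): string {
  const t = new Date(d.ts);
  const day = t.toISOString().slice(0, 10); // YYYY-MM-DD
  const hour = String(t.getUTCHours()).padStart(2, "0");
  return `audit/${d.tenant}/${day}/${hour}.ndjson`;
}

// NDJSON: one JSON object per line, appended, never rewritten.
function auditLine(d: RoutingDecision): string {
  return JSON.stringify(d) + "\n";
}
```

Because NDJSON is line-delimited, an export is just a byte-range read and a split on newlines — no parser state spans records.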

30 seconds to integrate.

If you use the OpenAI SDK (or any compatible client), you change one line. That's the entire integration. No new dependencies, no config files, no migration.

  • Works with OpenAI, Anthropic, Mistral, and Google models
  • Optional priority header for fine-grained control
  • Team and project sub-budgets via custom headers
  • Usage API for real-time cost monitoring
app.ts
import OpenAI from "openai";

// Before:
// const client = new OpenAI({ baseURL: "https://api.openai.com/v1" });

// After:
const client = new OpenAI({ baseURL: "https://gw.costghost.dev/v1" });

// Done. Everything else stays identical.

// Optional: set priority per request
const res = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
  },
  { headers: { "X-CG-Priority": "low" } },
);

We're onboarding the first 50 teams.

No commitment. No credit card. Just your work email.

We'll reach out within 24 hours to set up your tenant.