ai-cost · v0.3.0 · MIT · Node ≥ 20

LLM cost observability
for engineers who ship.

Wrap your LLM SDK. KostAI records every call, scores it for waste across nine categories, and — in shadow mode — runs a cheaper or local path in parallel so you can see, per call, what each optimized route would have saved. Nothing leaves the machine.

npm install @sapperjohn/kostai
9 · waste categories scored per call
6 · provider wrappers (Anthropic, OpenAI, Google, Ollama, LM Studio, OpenAI-compat)
20 · MCP tools over stdio + HTTP bridge
0 · data leaves the machine by default

What it does

Wrap & record

One-line wrappers for Anthropic, OpenAI, Google, Ollama, LM Studio, and any OpenAI-compatible endpoint. Append-only JSONL event store you can cat. Optional SQLite backend. Optional Elasticsearch sink with an ECS-shaped document.
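Because the store is plain JSONL, ad-hoc rollups need nothing beyond standard tools. A sketch with jq — the field names route and cost_usd are assumptions about the event shape, not the documented schema:

```shell
# Feed three hypothetical events through jq and sum cost per route.
printf '%s\n' \
  '{"route":"bugfix-agent","cost_usd":0.012}' \
  '{"route":"bugfix-agent","cost_usd":0.030}' \
  '{"route":"summarize","cost_usd":0.004}' |
jq -s 'group_by(.route) | map({route: .[0].route, usd: (map(.cost_usd) | add)})'
```

In practice you would cat the store file into jq instead of printf.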

Score waste, per call

Nine categories — oversized context, redundant history, over-long output where a short one would do, over-model for the task class, missed cache hits, and more. Every call gets an efficiency score and an avoidable-cost estimate in USD.

Shadow mode A/B

For a route you flag, run the frontier call and a cheaper/local one in parallel. The user always sees the frontier result. KostAI records both, grades them with a quality evaluator, and shows you exactly which optimized path would have saved money without quality regression.
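A hypothetical sketch of flagging a route in ai-cost.config.json (the "shadow" key and its "routes" and "challenger" fields are assumptions here, not the documented schema):

```json
{
  "shadow": {
    "routes": ["bugfix-agent"],
    "challenger": {
      "provider": "ollama",
      "model": "llama3.1:8b"
    }
  }
}
```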

Route

A pure function: given a call, it classifies the task, checks the model, and emits one of four decisions — local sufficient, cheaper API sufficient, frontier required, or cache hit — each with a USD-denominated savings estimate.
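A minimal sketch of the shape such a pure router might take. The task classes, thresholds, and per-token prices here are illustrative assumptions, not KostAI's actual tables:

```typescript
type Decision =
  | "local_sufficient"
  | "cheaper_api_sufficient"
  | "frontier_required"
  | "cache_hit";

interface CallInfo {
  promptTokens: number;
  taskClass: "extraction" | "chat" | "reasoning"; // illustrative classes
  model: string;
  cacheKeyHit: boolean;
}

// Illustrative per-1K-token prices (USD); a real table would track provider pricing.
const FRONTIER_PRICE = 0.015;
const CHEAP_PRICE = 0.0005;

function route(call: CallInfo): { decision: Decision; estSavingsUsd: number } {
  const frontierCost = (call.promptTokens / 1000) * FRONTIER_PRICE;
  // Cache hit: the whole frontier call is avoidable.
  if (call.cacheKeyHit) return { decision: "cache_hit", estSavingsUsd: frontierCost };
  // Simple tasks: a local model would do, saving the full frontier cost.
  if (call.taskClass === "extraction")
    return { decision: "local_sufficient", estSavingsUsd: frontierCost };
  // Mid-tier tasks: a cheaper API saves the price difference.
  if (call.taskClass === "chat")
    return {
      decision: "cheaper_api_sufficient",
      estSavingsUsd: frontierCost - (call.promptTokens / 1000) * CHEAP_PRICE,
    };
  // Hard tasks: keep the frontier model, nothing saved.
  return { decision: "frontier_required", estSavingsUsd: 0 };
}
```

Because it is pure, the same function can drive both live routing and after-the-fact scoring of recorded calls.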

MCP server

Twenty tools over both stdio (Claude Desktop / Claude Code) and an HTTP+SSE bridge. Local↔frontier handoff, cheap-API routing, local preprocessing, durable task queue with exponential backoff.

Dashboard

Eight tabs on :3674 — Overview, Shadow Mode, Router, Local LLMs, Bridge, Queue, Calls, Trends. Live-syncs over SSE. /health, /ready, /metrics (Prometheus) on the side.

Quick start

1 · install
npm install @sapperjohn/kostai
npx ai-cost init
2 · wrap your client
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic } from "@sapperjohn/kostai";

const client = wrapAnthropic(new Anthropic(), {
  appName: "my-app",
  route: "bugfix-agent",
});
3 · open the dashboard
npx ai-cost dashboard
# → http://localhost:3674

Prefer zero-code adoption? Run npx ai-cost proxy --mode observe and set OPENAI_BASE_URL=http://localhost:4311/v1.

The nine waste categories

Oversized context
Redundant history
Over-long output
Over-model for task
Missed cache hit
Retry burn
Tool-call fan-out
System-prompt bloat
Unnecessary streaming

Every event carries llm.efficiency_score (0–100) and llm.avoidable_context_cost_usd. Roll up by route, model, app, or workflow. Ship to Elasticsearch and grade your whole fleet.
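An illustrative event line — llm.efficiency_score and llm.avoidable_context_cost_usd are the documented fields; the surrounding envelope here is an assumption:

```json
{"ts":"2025-01-15T10:24:03Z","app":"my-app","route":"bugfix-agent","model":"example-model","llm":{"efficiency_score":72,"avoidable_context_cost_usd":0.0041}}
```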

Two-machine bridge

Run a MacBook as your frontier node and a Mac Mini as a local-LLM workhorse. HTTP + SSE transport with bearer-token auth, a durable 24-hour task queue, exponential backoff, and a live dashboard that syncs across both machines.

on both machines
npx ai-cost bridge --listen --with-worker --with-dashboard
npx ai-cost bridge --doctor

Built for Elastic-shaped observability

KostAI ships an Elasticsearch sink out of the box. Events flow through _bulk as ECS 8.11 documents with a dedicated llm.* namespace — cost, tokens, avoidable cost, efficiency, route, model, provider — plus the standard event.category, event.action, and labels.* you already index.
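Roughly what one indexed document might look like. The event.category, event.action, and labels.* fields follow ECS as named above; the exact llm.* field names beyond the two documented scoring fields are assumptions:

```json
{
  "@timestamp": "2025-01-15T10:24:03.000Z",
  "event": { "category": ["web"], "action": "llm-call" },
  "labels": { "app": "my-app" },
  "llm": {
    "provider": "anthropic",
    "model": "example-model",
    "route": "bugfix-agent",
    "tokens": { "input": 1840, "output": 212 },
    "cost_usd": 0.012,
    "avoidable_context_cost_usd": 0.0041,
    "efficiency_score": 72
  }
}
```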

A starter Kibana dashboard (kibana/dashboards/ai-cost-overview.ndjson) lands a five-panel overview in one import: total spend, avoidable spend, average efficiency, spend by model over time, top routes by spend.

ai-cost.config.json
{
  "elasticsearch": {
    "enabled": true,
    "url": "https://es.example.com:9200",
    "index": "ai-cost-events",
    "apiKey": "BASE64_API_KEY",
    "batchSize": 50,
    "flushIntervalMs": 5000
  },
  "redactPII": true
}

PII redaction is on by default: email, US phone, SSN, IPv4/v6, credit card, plus GitHub/Slack/GitLab tokens. Fail-soft buffered sink — a network blip never loses events, never blocks a call.
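The built-in patterns aren't listed here; a minimal sketch of what regex-based redaction of this kind looks like — the patterns and replacement tags are illustrative, not KostAI's actual rules:

```typescript
// Each pattern maps to a category tag; patterns are applied in order.
const PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[EMAIL]"],       // email address
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],            // US SSN
  [/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, "[IPV4]"],     // IPv4 address
  [/\bghp_[A-Za-z0-9]{36}\b/g, "[GITHUB_TOKEN]"], // GitHub PAT shape
];

function redact(text: string): string {
  // Replace every occurrence of every pattern with its tag.
  return PATTERNS.reduce((acc, [re, tag]) => acc.replace(re, tag), text);
}
```

Running redaction before events hit the JSONL store or the Elasticsearch sink keeps raw PII out of every persistence layer at once.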

Not a SaaS. Not a proxy you have to trust.

Everything runs on your machine. The JSONL store is a file you can read. The dashboard listens on localhost. The Elasticsearch sink is opt-in. KostAI is a library and a CLI — not a service that charges per event.