# PRD: LLM Spend Guard — Real-Time AI API Cost Enforcement

> **Stage:** Lean MVP PRD
> **Decision:** BUILD (Score: 30/40)
> **Generated:** 2026-05-01
> **Stack:** Next.js 14 + TypeScript + Supabase + Stripe + Tailwind

---

## 1. Problem & User

**Who:** Individual developers and small startup teams (1–5 engineers) running production LLM applications or AI agents using OpenAI, Anthropic, or compatible APIs.

**Pain:** Cloud provider spend caps are alerts, not enforcers: users report being charged $1,800 after setting a $100 limit. Runaway agents (infinite loops, prompt injection, misconfigured retries) can exhaust a month's budget in minutes. Current tools like Helicone and Portkey focus on observability: they tell you what happened but do not stop it before it does.

**User Quote:** *"I was charged $1,800, but the cap I set was $100."* (r/googlecloud, 91 upvotes; translated from Chinese)

**Wedge:** The fastest way to add a hard kill-switch to any LLM app without changing existing SDKs.

---

## 2. Target Outcome & KPIs

| KPI | Target |
|-----|--------|
| **Aha Moment rate** | ≥60% of signups route their first API call within 5 min of getting proxy key |
| **7-day retention** | ≥40% still making proxy calls after 7 days |
| **Free → Paid conversion** | ≥8% of free users upgrade within 14 days |
| **Paywall trigger rate** | ≥30% of free users hit the paywall within 7 days (signals real usage) |
| **MRR (Month 3)** | $551 (19 paying customers × $29/mo) |

**Aha Moment definition:** User registers → gets proxy key → routes first API call → sees it logged in dashboard with cost estimate. Total time: <5 minutes.

---

## 3. MVP Scope (In)

**PLG Free Tier:**
- 1 project, $100/month protected spend
- Unlimited API calls (until cap hit)
- Real-time dashboard (last 24h usage + block events)
- Email alert on 80% threshold

**Paid Tier ($29/mo):**
- Up to 3 projects
- $500/month protected spend per project
- Per-agent caps (via `X-Agent-ID` header)
- Anomaly detection: auto-block if >$10 spent in 5 minutes
- Slack webhook alerts
- 30-day usage history
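
Per-agent caps could be enforced by summing today's spend for the `X-Agent-ID` value before forwarding. A minimal sketch over in-memory rows, assuming "daily" means the current UTC day; `UsageEvent`, `agentSpendTodayUsd`, and `isAgentOverCap` are hypothetical names, not part of the spec.

```typescript
// Hypothetical in-memory shape of a usage_events row (subset of columns).
interface UsageEvent {
  agentId: string | null; // from the X-Agent-ID header, null if absent
  costUsd: number;
  ts: Date;
  blocked: boolean;
}

// Start of the current UTC day. Assumption: per-agent limits reset at 00:00 UTC.
function startOfUtcDay(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

// Sum of today's (UTC) non-blocked spend for one agent.
function agentSpendTodayUsd(events: UsageEvent[], agentId: string, now: Date): number {
  const dayStart = startOfUtcDay(now).getTime();
  return events
    .filter((e) => e.agentId === agentId && !e.blocked && e.ts.getTime() >= dayStart)
    .reduce((sum, e) => sum + e.costUsd, 0);
}

// True when the agent has already reached its configured daily limit.
// No header or no configured limit means no per-agent enforcement (FR-12).
function isAgentOverCap(
  events: UsageEvent[],
  agentId: string | null,
  dailyLimitUsd: number | undefined,
  now: Date,
): boolean {
  if (agentId === null || dailyLimitUsd === undefined) return false;
  return agentSpendTodayUsd(events, agentId, now) >= dailyLimitUsd;
}
```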

**Core Mechanics:**
- OpenAI-compatible proxy endpoint (`/v1/chat/completions`)
- User replaces `https://api.openai.com` → `https://api.spendguard.io` and adds `Authorization: Bearer sg_<project_key>`
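- Checks month-to-date spend in Supabase before forwarding; blocks with HTTP 402 if the cap is already reached
- Forwards allowed requests to the real provider using the encrypted stored key (decrypted server-side)
- Estimates cost from the token usage in the provider response and records it; because cost is only known after the response, one in-flight request can push spend slightly past the cap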
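
The cost-estimation step can be sketched as a pure function over the `usage` object in an OpenAI chat completion response and the per-1K prices in `model_pricing`. `PricingRow` and `estimateCostUsd` are hypothetical names, and throwing on an unpriced model (rather than silently recording $0) is an assumption.

```typescript
// Mirrors the model_pricing table (USD per 1,000 tokens).
interface PricingRow {
  inputPer1k: number;
  outputPer1k: number;
}

// Subset of the `usage` object in an OpenAI chat completion response.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

// cost = input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k
function estimateCostUsd(model: string, usage: Usage, pricing: Map<string, PricingRow>): number {
  const row = pricing.get(model);
  if (!row) {
    // Assumption: surface an unpriced model as an error instead of recording $0;
    // the caller decides how to handle it (e.g. alert an admin).
    throw new Error(`No pricing row for model "${model}"`);
  }
  const cost =
    (usage.prompt_tokens / 1000) * row.inputPer1k +
    (usage.completion_tokens / 1000) * row.outputPer1k;
  // Round to the 6 decimal places stored in usage_events.cost_usd (NUMERIC(10,6)).
  return Math.round(cost * 1e6) / 1e6;
}
```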

---

## 4. Out of Scope (MVP)

- Streaming responses (v2)
- Embeddings / image / audio endpoints (v2)
- Multi-provider routing / fallback (separate product: Card #2)
- Team roles / multi-user orgs (v2)
- Evaluation, tracing, or prompt logging (not our wedge)
- Self-hosted / open-source version
- Azure OpenAI, Bedrock, Vertex (v2)

---

## 5. User Flow (Happy Path — PLG)

```
1. Land on homepage → "Stop runaway LLM spend" → "Get Started Free"
2. Sign up (email/Google, Supabase Auth) — no CC required
3. Dashboard: "Create your first project" → name it → copy proxy key (30 sec)
4. One-liner migration:
     Before: openai.baseURL = "https://api.openai.com/v1"
     After:  openai.baseURL = "https://api.spendguard.io/v1"
             openai.apiKey  = "sg_<your_project_key>"
5. Make any API call → it proxies through → see it logged in dashboard
   ★ AHA MOMENT: User sees real cost in real time, under their cap
6. Use product for days → hits free cap ($100) or needs 2nd project
   → Paywall: "Upgrade to protect more" → Stripe Checkout
7. Upgrade → $29/mo → continue working
8. Auto-email sequence triggers on paywall hit (see FR-16–18)
```
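
Step 4 amounts to changing the base URL and the bearer token. A dependency-free sketch of the request a client would send, using plain `fetch`-style options rather than the OpenAI SDK; `buildProxyRequest` is a hypothetical helper, and the optional `X-Agent-ID` header only matters on the paid tier (FR-12).

```typescript
const PROXY_BASE_URL = "https://api.spendguard.io/v1";

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

interface ProxyRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and options for an OpenAI-style chat completion routed
// through the proxy. Only the base URL and the key differ from a direct
// OpenAI call; the JSON body is unchanged.
function buildProxyRequest(projectKey: string, body: ChatRequest, agentId?: string): ProxyRequest {
  if (!projectKey.startsWith("sg_")) {
    throw new Error("Expected a SpendGuard project key (sg_...)");
  }
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${projectKey}`,
  };
  if (agentId) headers["X-Agent-ID"] = agentId; // optional per-agent attribution
  return {
    url: `${PROXY_BASE_URL}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

With the official `openai` Node SDK, the equivalent change is passing `baseURL` and `apiKey` to the constructor: `new OpenAI({ baseURL: PROXY_BASE_URL, apiKey: "sg_..." })`.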

---

## 6. Functional Requirements (P0)

| ID | Requirement |
|----|-------------|
| FR-01 | User can sign up with email or Google (Supabase Auth), no CC |
| FR-02 | User can create a project, name it, and receive a `sg_<key>` proxy key |
| FR-03 | Proxy accepts requests at `/v1/chat/completions` with `sg_<key>` auth |
| FR-04 | Proxy validates key, checks project cap in Supabase, proxies or blocks (402) |
| FR-05 | After the provider responds, extract `usage.prompt_tokens` and `usage.completion_tokens`, compute cost from the `model_pricing` table, and write a row to `usage_events` |
| FR-06 | Dashboard shows: today's spend, monthly spend, remaining cap, last 10 events |
| FR-07 | Block event logged when 402 returned; dashboard shows block count and reason |
| FR-08 | Free tier hard limit: $100/month per project, 1 project max |
| FR-09 | Paid tier: $29/mo via Stripe Checkout, unlocks 3 projects + $500 cap |
| FR-10 | Middleware blocks API proxy if subscription expired; dashboard redirects to `/billing` |
| FR-11 | Email alert when 80% of monthly cap consumed (Resend.com) |
| FR-12 | Per-agent cap: if `X-Agent-ID` header present, enforce per-agent daily limit if configured |
| FR-13 | Anomaly block: if >$10 spent in any rolling 5-minute window, auto-block and alert |
| FR-14 | Slack webhook: POST block event JSON if webhook URL configured in project settings |
| FR-15 | Provider API key stored AES-256 encrypted in Supabase (never returned to client) |
| FR-16 | On paywall trigger: send email "You hit your free cap — here's what you protected" |
| FR-17 | 24h later if not upgraded: send "Your agents are currently paused" urgency email |
| FR-18 | 72h later if not upgraded: send "Most common mistake that causes $1,800 bills" story email with upgrade CTA |
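
FR-16 to FR-18 reduce to one rule: given when the paywall was hit, which emails were already sent, and whether the user upgraded, pick the next email that is due. A minimal sketch; the `PaywallState` shape and the email IDs are hypothetical, since the minimal data model has no column for the paywall timestamp.

```typescript
type EmailId = "paywall-hit" | "agents-paused" | "cost-story";

interface PaywallState {
  paywallHitAt: Date; // when the free cap first blocked a request
  upgraded: boolean;  // true once a paid subscription is active
  sent: Set<EmailId>; // emails already delivered for this paywall hit
}

const HOUR_MS = 60 * 60 * 1000;

// Delay after the paywall hit at which each email becomes due (FR-16/17/18).
const SEQUENCE: { id: EmailId; afterMs: number }[] = [
  { id: "paywall-hit", afterMs: 0 },
  { id: "agents-paused", afterMs: 24 * HOUR_MS },
  { id: "cost-story", afterMs: 72 * HOUR_MS },
];

// Returns the next email to send now, or null if nothing is due.
// Upgraded users exit the sequence; emails go out in order, one per call.
function nextDueEmail(state: PaywallState, now: Date): EmailId | null {
  if (state.upgraded) return null;
  const elapsed = now.getTime() - state.paywallHitAt.getTime();
  for (const step of SEQUENCE) {
    if (state.sent.has(step.id)) continue;
    return elapsed >= step.afterMs ? step.id : null;
  }
  return null;
}
```

One way to split the work: the proxy sends email #1 inline on the first 402, and the cron calls `nextDueEmail` for the follow-ups.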

---

## 7. Minimal Data Model

```sql
-- users (managed by Supabase Auth)

CREATE TABLE projects (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     UUID REFERENCES auth.users NOT NULL,
  name        TEXT NOT NULL,
  proxy_key   TEXT UNIQUE NOT NULL,          -- SHA-256 hash of the sg_ key (plaintext shown once)
  provider    TEXT DEFAULT 'openai',
  api_key_enc TEXT NOT NULL,                 -- AES-256 encrypted provider key
  monthly_cap NUMERIC(10,4) DEFAULT 100,     -- USD
  created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE usage_events (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  project_id      UUID REFERENCES projects NOT NULL,
  agent_id        TEXT,                      -- from X-Agent-ID header
  model           TEXT NOT NULL,
  input_tokens    INT NOT NULL,
  output_tokens   INT NOT NULL,
  cost_usd        NUMERIC(10,6) NOT NULL,
  blocked         BOOLEAN DEFAULT false,
  block_reason    TEXT,
  ts              TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE model_pricing (
  model           TEXT PRIMARY KEY,
  input_per_1k    NUMERIC(10,6) NOT NULL,
  output_per_1k   NUMERIC(10,6) NOT NULL,
  updated_at      TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE subscriptions (
  user_id         UUID REFERENCES auth.users PRIMARY KEY,
  stripe_sub_id   TEXT,
  plan            TEXT DEFAULT 'free',       -- 'free' | 'starter'
  status          TEXT DEFAULT 'active',
  period_end      TIMESTAMPTZ
);
```
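
The `proxy_key` column stores a hash of the `sg_` key for lookup, so the plaintext can only be shown to the user at creation time. A minimal sketch with Node's built-in `crypto`; the 32-byte length, base64url encoding, and the helper names are assumptions.

```typescript
import { createHash, randomBytes } from "node:crypto";

// SHA-256 hex digest used both when storing the key and when looking it up.
function hashProxyKey(plaintext: string): string {
  return createHash("sha256").update(plaintext, "utf8").digest("hex");
}

// Generates a new project key. The plaintext is returned to the user once;
// only the hash goes into projects.proxy_key.
function generateProxyKey(): { plaintext: string; hash: string } {
  // Assumption: 32 random bytes, base64url-encoded, behind the sg_ prefix.
  const plaintext = `sg_${randomBytes(32).toString("base64url")}`;
  return { plaintext, hash: hashProxyKey(plaintext) };
}

// Extracts the sg_ key from an Authorization header, or null if malformed.
function parseBearer(header: string | null): string | null {
  const match = header?.match(/^Bearer (sg_[A-Za-z0-9_-]+)$/);
  return match ? match[1] : null;
}
```

A fast, unsalted hash is acceptable here because the key is 256 bits of randomness rather than a user-chosen password, and it keeps the lookup a simple equality match on an indexed column.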

---

## 8. API / Integration Notes

- **Proxy endpoint:** `POST /api/proxy/v1/chat/completions` — OpenAI-compatible passthrough, served publicly as `https://api.spendguard.io/v1/chat/completions` (e.g. via a Next.js rewrite)
- **Auth:** `Authorization: Bearer sg_<key>` → SHA-256 hash → lookup in `projects`
- **Provider call:** fetch `https://api.openai.com/v1/chat/completions` with stored decrypted key
- **Cost calc:** `(input_tokens / 1000 * input_per_1k) + (output_tokens / 1000 * output_per_1k)`
- **Spend check:** `SELECT COALESCE(SUM(cost_usd), 0) FROM usage_events WHERE project_id = $1 AND ts >= date_trunc('month', now())` (`COALESCE` returns 0 instead of `NULL` for a project with no events yet)
- **Stripe:** Checkout Session for upgrade, webhook for `invoice.payment_succeeded` / `customer.subscription.deleted`
- **Email:** Resend.com — transactional + automated sequence (3 emails)
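- **Encryption:** `crypto.createCipheriv('aes-256-gcm', KEY, iv)` from `node:crypto`, which requires the Node.js runtime; the Edge Runtime has no `node:crypto`, so an edge deployment would use the Web Crypto API (`crypto.subtle` with AES-GCM) instead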
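
The encryption note can be fleshed out as `lib/encrypt.ts`-style helpers. A minimal sketch for the Node.js runtime, assuming a 32-byte key supplied as 64 hex characters in a hypothetical `PROVIDER_KEY_SECRET` env var, and a storage format of `iv:authTag:ciphertext` (base64 parts) that fits the single `api_key_enc TEXT` column.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Loads the 32-byte AES-256 key. Assumption: hex-encoded in an env var.
function loadKey(hex: string | undefined = process.env.PROVIDER_KEY_SECRET): Buffer {
  const key = Buffer.from(hex ?? "", "hex");
  if (key.length !== 32) throw new Error("Encryption key must be 32 bytes (64 hex chars)");
  return key;
}

// Encrypts a provider API key with AES-256-GCM and a fresh 12-byte IV per call.
function encryptProviderKey(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

// Decrypts the stored value; final() throws if the data or tag was tampered with.
function decryptProviderKey(stored: string, key: Buffer): string {
  const [ivB64, tagB64, dataB64] = stored.split(":");
  if (!ivB64 || !tagB64 || dataB64 === undefined) throw new Error("Malformed ciphertext");
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivB64, "base64"));
  decipher.setAuthTag(Buffer.from(tagB64, "base64"));
  const data = Buffer.from(dataB64, "base64");
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}
```

GCM is used rather than CBC because its authentication tag makes a modified `api_key_enc` value fail loudly instead of decrypting to garbage.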

---

## 9. Acceptance Criteria

| # | Criteria |
|---|----------|
| AC-01 | New user signs up, creates project, copies proxy key — all within 60 seconds |
| AC-02 | Replacing `api.openai.com` with proxy URL routes successfully; response status and body are passed through unmodified, with the same schema as a direct OpenAI call |
| AC-03 | After cap exceeded, next request returns HTTP 402 with `{"error": "spend_cap_exceeded", "cap_usd": 100, "spent_usd": 100.42}` |
| AC-04 | Dashboard shows correct spend total within 3 seconds of request completion |
| AC-05 | Block event appears in dashboard immediately after 402 response |
| AC-06 | Free user with 1 project cannot create a 2nd project — UI shows upgrade CTA |
| AC-07 | Stripe checkout completes → subscription row updated → 2nd project creation unlocked |
| AC-08 | `X-Agent-ID: crawler-bot` header logged correctly in `usage_events.agent_id` |
| AC-09 | Anomaly rule: 3 rapid large calls totaling >$10 in 5 min → 4th call blocked |
| AC-10 | Provider API key is never returned in any API response or client-visible field |
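
AC-03 pins down the 402 body. A minimal sketch of the pre-flight cap check that produces it; `checkMonthlyCap` and `CapDecision` are hypothetical names, and rounding `spent_usd` to the cent is an assumption based on the example value.

```typescript
interface CapExceededBody {
  error: "spend_cap_exceeded";
  cap_usd: number;
  spent_usd: number;
}

type CapDecision =
  | { allowed: true }
  | { allowed: false; status: 402; body: CapExceededBody };

// Pre-flight check run before forwarding: blocks once month-to-date spend has
// reached the cap. Because cost is only known after a response, recorded spend
// can end slightly above the cap (e.g. 100.42 against a cap of 100).
function checkMonthlyCap(spentUsd: number, capUsd: number): CapDecision {
  if (spentUsd < capUsd) return { allowed: true };
  return {
    allowed: false,
    status: 402,
    body: {
      error: "spend_cap_exceeded",
      cap_usd: capUsd,
      spent_usd: Math.round(spentUsd * 100) / 100, // assumption: report to the cent
    },
  };
}
```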

---

## 10. Delivery Plan

### M1 — Proxy Core + Aha Moment (Hours 1–8)
**Goal:** User can sign up and route a real API call in 5 minutes.

Files to create:
- `app/api/proxy/v1/chat/completions/route.ts` — proxy handler, auth, spend check, block logic
- `lib/encrypt.ts` — AES-256 encrypt/decrypt for provider keys
- `lib/pricing.ts` — cost calculation from model pricing table
- `app/(auth)/login/page.tsx` — Supabase Auth UI
- `app/dashboard/page.tsx` — spend overview + recent events
- `app/dashboard/projects/new/page.tsx` — create project, copy proxy key
- `supabase/migrations/001_init.sql` — all 4 tables

Exit criteria:
- `POST /api/proxy/v1/chat/completions` with valid `sg_<key>` returns the provider's status and response body unmodified, matching a direct OpenAI call
- Usage event written to DB with correct token counts and cost
- Dashboard reflects the new request in today's spend within 3 seconds
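
The M1 handler's order of operations (authenticate, pre-flight cap check, forward, record) can be sketched independently of Next.js by injecting storage and upstream calls. Every member of `ProxyDeps` is a hypothetical seam for illustration; a real `route.ts` would back them with Supabase queries, the decrypted provider key, and `fetch`.

```typescript
interface Usage { prompt_tokens: number; completion_tokens: number }
interface Project { id: string; monthlyCapUsd: number; providerKey: string }
interface UpstreamResult { status: number; body: { usage?: Usage; [k: string]: unknown } }
interface UsageRow {
  projectId: string; model: string; inputTokens: number; outputTokens: number;
  costUsd: number; blocked: boolean; blockReason?: string;
}

// Hypothetical seams: key hashing, Supabase queries, provider call, pricing lookup.
interface ProxyDeps {
  hashKey(key: string): string;
  findProjectByKeyHash(hash: string): Promise<Project | null>;
  monthToDateSpendUsd(projectId: string): Promise<number>;
  callProvider(providerKey: string, body: unknown): Promise<UpstreamResult>;
  costUsd(model: string, usage: Usage): number;
  recordEvent(row: UsageRow): Promise<void>;
}

async function handleProxy(
  authHeader: string | null,
  body: { model: string },
  deps: ProxyDeps,
): Promise<{ status: number; body: unknown }> {
  // Authenticate: only sg_ project keys are accepted ("Bearer " is 7 chars).
  const key = authHeader && authHeader.startsWith("Bearer sg_") ? authHeader.slice(7) : null;
  const project = key ? await deps.findProjectByKeyHash(deps.hashKey(key)) : null;
  if (!project) return { status: 401, body: { error: "invalid_proxy_key" } };

  // Pre-flight cap check: block before any provider tokens are spent.
  const spent = await deps.monthToDateSpendUsd(project.id);
  if (spent >= project.monthlyCapUsd) {
    await deps.recordEvent({
      projectId: project.id, model: body.model, inputTokens: 0, outputTokens: 0,
      costUsd: 0, blocked: true, blockReason: "spend_cap_exceeded",
    });
    return {
      status: 402,
      body: { error: "spend_cap_exceeded", cap_usd: project.monthlyCapUsd, spent_usd: spent },
    };
  }

  // Forward with the provider key; the sg_ key never leaves the proxy.
  const upstream = await deps.callProvider(project.providerKey, body);

  // Record the actual cost from the provider's usage block, then pass the body through as-is.
  const usage = upstream.body.usage;
  if (upstream.status === 200 && usage) {
    await deps.recordEvent({
      projectId: project.id, model: body.model,
      inputTokens: usage.prompt_tokens, outputTokens: usage.completion_tokens,
      costUsd: deps.costUsd(body.model, usage), blocked: false,
    });
  }
  return upstream;
}
```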

---

### M2 — Paywall + Conversion Sequence (Hours 9–14)
**Goal:** Monetization activated; free users auto-enter email sequence on cap hit.

Files to create/modify:
- `app/api/billing/checkout/route.ts` — Stripe Checkout session
- `app/api/billing/webhook/route.ts` — handle subscription events
- `middleware.ts` — block proxy API if subscription expired
- `lib/emails/paywall-hit.tsx` — email template #1 (Resend React Email)
- `lib/emails/agents-paused.tsx` — email template #2
- `lib/emails/cost-story.tsx` — email template #3
- `app/api/cron/email-sequence/route.ts` — Vercel cron (hourly check, send next due email; a daily check could delay email #2 to as late as 48h after the paywall hit)
- `app/billing/page.tsx` — upgrade CTA with price + feature comparison

Exit criteria:
- Free user hitting $100 cap gets 402 on next request + email #1 within 60 seconds
- Stripe checkout → webhook → `subscriptions` row updated to `starter` → 2nd project creatable
- Email #2 sends automatically 24h after paywall if no upgrade
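
The paywall rules in FR-08 to FR-10 can live in one function that both `middleware.ts` and the project-creation UI read. A minimal sketch; the `Entitlements` shape is hypothetical, and treating only the Stripe statuses `active` and `trialing` as live (so `past_due` blocks the proxy) is an assumption.

```typescript
// Mirrors the subscriptions table.
interface Subscription {
  plan: "free" | "starter";
  status: string;         // Stripe subscription status for paid plans
  periodEnd: Date | null; // null for free users
}

interface Entitlements {
  proxyAllowed: boolean;
  maxProjects: number;
  monthlyCapUsd: number; // per project
}

const FREE: Entitlements = { proxyAllowed: true, maxProjects: 1, monthlyCapUsd: 100 };
const STARTER: Entitlements = { proxyAllowed: true, maxProjects: 3, monthlyCapUsd: 500 };
// Lapsed paid subscription: proxy blocked, dashboard redirects to /billing (FR-10).
const LAPSED: Entitlements = { proxyAllowed: false, maxProjects: 0, monthlyCapUsd: 0 };

function entitlementsFor(sub: Subscription, now: Date): Entitlements {
  if (sub.plan === "free") return FREE;
  const live =
    (sub.status === "active" || sub.status === "trialing") &&
    sub.periodEnd !== null &&
    sub.periodEnd.getTime() > now.getTime();
  return live ? STARTER : LAPSED;
}

// AC-06 / AC-07: gate the "create project" action on the plan's project limit.
function canCreateProject(sub: Subscription, existingProjects: number, now: Date): boolean {
  return existingProjects < entitlementsFor(sub, now).maxProjects;
}
```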

---

### M3 — Anomaly Detection + Slack Alerts (Hours 15–20)
**Goal:** Paid tier fully functional; enterprise-lite safety features.

Files to create/modify:
- `lib/anomaly.ts` — rolling 5-min spend check logic
- `lib/slack.ts` — POST block event to configured webhook URL
- `app/dashboard/projects/[id]/settings/page.tsx` — agent caps + Slack URL config
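- `app/api/cron/monthly-reset/route.ts` — Vercel cron to reset monthly spend aggregates (only needed if totals are cached; the `date_trunc('month', now())` spend query resets on its own at the month boundary)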
- `app/dashboard/projects/[id]/agents/page.tsx` — per-agent spend breakdown

Exit criteria:
- 3 rapid calls totaling >$10 in 5 min → 4th blocked with `anomaly_detected` reason in DB
- Slack webhook receives JSON payload within 2 seconds of block event
- Per-agent dashboard shows correct cost breakdown by `agent_id`
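
The rolling-window rule behind `lib/anomaly.ts` (FR-13, AC-09) is: sum the cost of non-blocked events in the last 5 minutes and refuse the next request once that sum exceeds $10. A minimal sketch over in-memory events; in production the same filter would be a `SUM(cost_usd)` over `usage_events` with `ts > now() - interval '5 minutes'`. `SpendEvent` and `anomalyBlockReason` are hypothetical names.

```typescript
interface SpendEvent {
  costUsd: number;
  ts: Date;
  blocked: boolean;
}

const WINDOW_MS = 5 * 60 * 1000; // rolling 5-minute window (FR-13)
const THRESHOLD_USD = 10;        // auto-block above $10 in the window

// Non-blocked spend with timestamps in (now - 5 min, now].
function windowSpendUsd(events: SpendEvent[], now: Date): number {
  const end = now.getTime();
  const cutoff = end - WINDOW_MS;
  return events
    .filter((e) => !e.blocked && e.ts.getTime() > cutoff && e.ts.getTime() <= end)
    .reduce((sum, e) => sum + e.costUsd, 0);
}

// Pre-flight check: returns the block reason if the next request should be refused.
function anomalyBlockReason(events: SpendEvent[], now: Date): "anomaly_detected" | null {
  return windowSpendUsd(events, now) > THRESHOLD_USD ? "anomaly_detected" : null;
}
```

As with the monthly cap, the window uses recorded (post-response) costs, so the calls that cross the threshold are completed and the following call is the one blocked, which is the AC-09 behaviour.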

---

## 11. Risks & Mitigations

| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| Proxy latency adds >200ms to every call | Medium | Deploy on Vercel Edge Runtime in a US-East region, assumed to be close to OpenAI's API servers; add `x-proxy-latency` header for debugging |
| Provider API key compromise via DB breach | Low | AES-256 encryption with the encryption key stored in an env var (not the DB); audit log for all key decryption events |
| Users abuse free tier with throwaway accounts | Medium | Rate limit signups per IP; require email verification before proxy key issued |
| OpenAI blocks proxy traffic from single IP | Low | Use Vercel edge functions (distributed IPs); proxy passes through original request headers except `Authorization`, which is replaced with the decrypted provider key |
| Pricing table stale after model price changes | Medium | Weekly Vercel cron compares `model_pricing` with OpenAI's pricing page; admin Slack alert on mismatch |

---

## 12. Chargeability Rationale

**One blocked runaway incident can save more than a year's subscription:** a single misconfigured loop at GPT-4o pricing can cost >$300/hour, so roughly 70 minutes of runaway spend exceeds a full year of the $29/mo "starter" plan ($348). The plan pays for itself the moment it blocks one runaway loop.

**PLG conversion logic:**
- Free tier gives real value ($100 protection) with zero friction
- Paywall triggers naturally at the point of highest motivation (just got blocked, cap proved its value)
- 3-email automated sequence converts async — no sales call needed

**Target:** 19 paying customers at $29/mo = $551 MRR by end of Month 3. CAC ~$0 (PLG inbound). Churn is expected to be low because of integration-level stickiness: once an app's base URL points at the proxy, switching back is trivial but rarely prioritized.

