# DocFlow — Bookkeeping/Tax Document Collection Hub MVP PRD (v2)

> Generated: 2026-03-23
> Source: `demand-discovery-engine/output/Bookkeeping-Client-Document-Collection-Filing-Challenges-20260323.md`
> Engine: `demand-discovery-engine` (Run ID: `20260323_220032`)
> Based on Decision Cards: #1 (33/40), #2 (32/40), #3 (31/40), #4 (30/40)
> Goal: Provide a development-ready MVP specification covering all 4 BUILD cards in a phased rollout.

---

## 1. Project Definition

### 1.1 Project Name
DocFlow for Bookkeepers

### 1.2 One-Line Value Proposition
A "zero-login document collection + completeness verification + audit receipt" system for 1–10 person bookkeeping/tax firms: clients upload via magic links with no registration, firms see real-time collection status with automated reminders, and every submission carries a timestamped receipt for dispute defense.

### 1.3 Core Problems (Extracted from 10 Pain Points)
1. **Documents scattered across channels**: Clients submit via email, text, portal, cloud drives — 5+ channels with no unified tracking.
2. **Chasing is unbillable**: Under fixed-fee models, repeated follow-ups are hidden costs eroding profit (>10h/month).
3. **No reliable completeness signal**: No trustworthy "all documents received" indicator, preventing firms from starting work on time.
4. **Portal adoption is abysmal**: Even firms that purchased TaxDome/Canopy find clients refuse to log in and upload.
5. **No dispute trail**: Clients later claim "I submitted everything" — firms lack traceable evidence.

### 1.4 Wedge Strategy
- **Zero-login magic links**: Clients click and upload — no registration, no app download — directly eliminating the portal abandonment problem.
- **Completeness signal light**: Item-by-item checklist confirmation; green light only when all required documents are received — replacing manual Excel tracking.
- **Starting at $27/mo**: Significantly undercuts TaxDome ($50+/user/mo) and Canopy ($100+/mo), targeting price-sensitive solo bookkeepers.

---

## 2. Target Users & Scenarios

### 2.1 Target Users (B2B)
| Attribute | Description |
|-----------|-------------|
| Role | Bookkeeper / Tax Professional / Small firm owner |
| Team Size | 1–10 people |
| Client Base | 20–100 small business clients |
| Pricing Model | Fixed monthly fee (not hourly billing) |
| Current Tools | TaxDome / Canopy / SmartVault / Email + Excel |

### 2.2 Core Scenarios
1. **Monthly document collection**: On the 1st–10th of each month, collect bank statements, credit card statements, receipts, etc. from all clients.
2. **Tax season document collection**: January–April, intensive collection of W-2s, 1099 series, K-1s, and other tax forms.
3. **Dispute resolution**: After receiving an IRS Underreporter Notice, produce client submission records as evidence.

### 2.3 Jobs To Be Done
- When I start monthly document collection, I want to send upload links with one click and have automated follow-ups, so I can stop manually tracking who has submitted what.
- When a client uploads documents, I want the system to check completeness against a checklist, so I know exactly when I can start working.
- When a client claims "I already submitted everything," I want a timestamped receipt as proof, so I can protect myself from liability.

---

## 3. Phased Scope

### 3.1 Phase 1 — MVP Core (P0, <=20h)
> Maps to Decision Card #1 + #2

| # | Feature | Description |
|---|---------|-------------|
| 1 | Firm signup/login | Supabase Auth (Magic Link) |
| 2 | Client management | Create/edit/archive clients with name, email, phone |
| 3 | Monthly document checklist templates | Configurable required items per client (Bank Statement, CC Statement, Receipts, etc.) |
| 4 | Zero-login upload link generation | UUID token with configurable expiration |
| 5 | Client upload page | Mobile-first, no registration required, select checklist items to upload against |
| 6 | File storage | Isolated by firm/client/cycle |
| 7 | Completeness engine | All required items have at least one submission => COMPLETE |
| 8 | Automated email reminders | Daily cron job sends reminder emails to INCOMPLETE clients |
| 9 | Collection dashboard | Three-state view: Complete / Incomplete / Unresponsive |
| 10 | Basic analytics | Track signup/client creation/request sent/upload/completion events |

### 3.2 Phase 1.5 — Reminder Enhancement (P0.5, +5h)

| # | Feature | Description |
|---|---------|-------------|
| 11 | SMS reminders | Twilio-powered upload link delivery |
| 12 | Escalating reminder cadence | Day 1: gentle → Day 3: reminder → Day 7: urgent |
| 13 | Client submission confirmation | Client clicks "I have submitted all documents" button, timestamp recorded |
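
The escalating cadence in item 12 can be sketched as a pure function that maps elapsed days to the `escalation_level` values used in `reminder_logs` (1=gentle, 2=reminder, 3=urgent). This is a minimal sketch, assuming days are counted from the moment the upload request was sent; the function and type names are hypothetical, and level 0 ("no reminder yet") is an addition for illustration.

```typescript
// Assumption: level 0 means "too early to remind"; levels 1-3 follow the PRD cadence.
type EscalationLevel = 0 | 1 | 2 | 3;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the escalation level for a cycle, based on whole days elapsed
// since the request was sent (hypothetical reference point).
function escalationLevel(requestSentAt: Date, now: Date): EscalationLevel {
  const days = Math.floor((now.getTime() - requestSentAt.getTime()) / MS_PER_DAY);
  if (days >= 7) return 3; // Day 7+: urgent
  if (days >= 3) return 2; // Day 3-6: reminder
  if (days >= 1) return 1; // Day 1-2: gentle
  return 0; // same day: nothing to send yet
}
```

Keeping the mapping free of I/O lets the daily reminder job and the dashboard's "next reminder time" column share one definition of the cadence.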

### 3.3 Phase 2 — Audit Receipts (P1, +8h)
> Maps to Decision Card #3 (31/40, REAL GAP)

| # | Feature | Description |
|---|---------|-------------|
| 14 | Submission receipt PDF generation | On cycle close, generate timestamped receipt with file names, upload times, and file hashes |
| 15 | Client acknowledgment | Client confirms "this submission is complete" via magic link; IP + timestamp recorded |
| 16 | Automatic receipt delivery | PDF receipt emailed to both the firm and the client |
| 17 | Audit log query | View full operation history by client/cycle |
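
To illustrate item 14, the receipt content can be assembled as plain text lines before any PDF rendering happens. The sketch below is one possible shape, not the final receipt layout: `ReceiptEntry`, `buildReceiptLines`, and the line format are hypothetical, and the hashes are assumed to be the `file_hash` values already stored on each submission.

```typescript
// Hypothetical shape: one entry per row in the submissions table.
interface ReceiptEntry {
  fileName: string;
  uploadedAt: Date;
  fileHash: string; // hex SHA-256, as stored in submissions.file_hash
}

// Builds the text lines of a receipt, ordered by upload time.
// Rendering these lines into a PDF is left to the PDF library.
function buildReceiptLines(
  clientName: string,
  periodKey: string,
  entries: ReceiptEntry[],
  generatedAt: Date,
): string[] {
  // Copy before sorting so the caller's array is not reordered.
  const sorted = [...entries].sort(
    (a, b) => a.uploadedAt.getTime() - b.uploadedAt.getTime(),
  );
  return [
    `Submission receipt for ${clientName}, period ${periodKey}`,
    `Generated at ${generatedAt.toISOString()}`,
    `Files received: ${sorted.length}`,
    ...sorted.map(
      (e, i) =>
        `${i + 1}. ${e.fileName} | ${e.uploadedAt.toISOString()} | sha256:${e.fileHash}`,
    ),
  ];
}
```

Using server-side UTC timestamps in ISO 8601 form avoids ambiguity when the firm and the client are in different time zones.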

### 3.4 Phase 3 — AI Document Classification (P2, +15h)
> Maps to Decision Card #4 (30/40, REAL GAP)

| # | Feature | Description |
|---|---------|-------------|
| 18 | Automatic PDF classification | Upload mixed PDF → AI identifies document types (W-2, 1099 series, etc.) |
| 19 | Auto-split and rename | Split into individual files based on classification results |
| 20 | Structured manifest output | CSV/JSON manifest (document type, page range, taxpayer name) |
| 21 | Usage-based billing | Credit model, independent of monthly subscription |
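
The CSV manifest in item 20 could be produced along these lines. This is a sketch under assumptions: the column names (`document_type`, `page_range`, `taxpayer_name`) and the `ManifestRow` shape are hypothetical, and quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes.

```typescript
// Hypothetical row shape for one classified document inside a mixed PDF.
interface ManifestRow {
  documentType: string;
  pageStart: number;
  pageEnd: number;
  taxpayerName: string;
}

// Quote a field only when it contains a comma, a quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize rows to CSV with a header line (column names are assumptions).
function toManifestCsv(rows: ManifestRow[]): string {
  const header = "document_type,page_range,taxpayer_name";
  const lines = rows.map((r) =>
    [r.documentType, `${r.pageStart}-${r.pageEnd}`, r.taxpayerName]
      .map(csvField)
      .join(","),
  );
  return [header, ...lines].join("\n");
}
```

Taxpayer names such as "Doe, Jane" are the main reason the quoting step matters; without it the name would split into two columns.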

### 3.5 Explicitly Out of Scope
- Client-side account system and permissions
- Native deep integration with TaxDome/Canopy (can bridge via Zapier Webhook)
- Native mobile app
- Custom AI model training
- Video processing
- Enterprise-grade compliance certification

---

## 4. Functional Requirements (Phase 1)

### 4.1 Firm Side: Client & Checklist Management

**Description**
- Users can create clients (name, email, phone, timezone).
- Users can configure a "document checklist template" per client (e.g., Bank Statement, Credit Card Statement, Receipts).
- Each collection cycle (e.g., 2026-03) inherits from a template and generates period-specific tasks.

**Acceptance Criteria**
- A client + checklist can be created within 2 minutes.
- Checklist items support `required = true/false`.
- Batch cycle creation from templates is supported.
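
The "cycle inherits from a template" step can be read as a snapshot: template items are copied into `cycle_required_items` when the cycle is created, so later template edits do not change what an open cycle asks for. The sketch below assumes that reading; the interfaces mirror the `checklist_items` and `cycle_required_items` tables, and `snapshotTemplate` is a hypothetical helper name.

```typescript
// Hypothetical in-memory shapes mirroring the two tables.
interface ChecklistItem {
  templateId: string;
  documentType: string;
  required: boolean;
  sortOrder: number;
}

interface CycleRequiredItem {
  cycleId: string;
  documentType: string;
  required: boolean;
  sortOrder: number;
}

// Copy one template's items into rows for a new cycle, ordered by sortOrder.
function snapshotTemplate(
  cycleId: string,
  templateId: string,
  items: ChecklistItem[],
): CycleRequiredItem[] {
  return items
    .filter((item) => item.templateId === templateId) // filter returns a new array
    .sort((a, b) => a.sortOrder - b.sortOrder) // so sorting does not touch the input
    .map(({ documentType, required, sortOrder }) => ({
      cycleId,
      documentType,
      required,
      sortOrder,
    }));
}
```

Batch cycle creation would then call this once per client for the target period and insert the resulting rows in one statement.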

### 4.2 Firm Side: Send Upload Request

**Description**
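- User clicks "Send Request" and the system generates a tokenized upload link: `/upload/{token}`. The token is not single-use: the client can return to the same link until it expires or is revoked.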
- Default email template includes: client name, due date, required checklist summary, upload button.

**Acceptance Criteria**
- Resending within the same cycle reuses the existing valid token (configurable expiration).
- Email delivery status is trackable (sent/failed).
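
The reuse rule in the acceptance criteria can be sketched as follows: return the existing token for the cycle if it is neither revoked nor expired, otherwise mint a new one. This is a minimal sketch with hypothetical names (`resolveUploadToken`, `UploadToken`); the token generator is passed in as a parameter so the example stays independent of any particular UUID source, and the 30-day default mirrors the non-functional requirements.

```typescript
// Hypothetical shape mirroring the upload_tokens table.
interface UploadToken {
  cycleId: string;
  token: string;
  expiresAt: Date;
  isRevoked: boolean;
}

const DEFAULT_TTL_DAYS = 30; // default expiration stated in the non-functional requirements

// Reuse a still-valid token for the cycle, or create a fresh one.
// `newToken` is an injected generator (for example, a UUID v4 function).
function resolveUploadToken(
  cycleId: string,
  existing: UploadToken[],
  now: Date,
  newToken: () => string,
  ttlDays: number = DEFAULT_TTL_DAYS,
): UploadToken {
  const valid = existing.find(
    (t) =>
      t.cycleId === cycleId &&
      !t.isRevoked &&
      t.expiresAt.getTime() > now.getTime(),
  );
  if (valid) return valid;
  return {
    cycleId,
    token: newToken(),
    expiresAt: new Date(now.getTime() + ttlDays * 24 * 60 * 60 * 1000),
    isRevoked: false,
  };
}
```

Reusing the token keeps earlier emails working after a resend, which matters when a client opens the first message rather than the latest one.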

### 4.3 Client Side: Zero-Login Upload

**Description**
- Client visits the link and can upload files without registration.
- Client must select the corresponding checklist item (document_type) when uploading.
- Supports multi-file upload and drag-and-drop.
- After successful upload, client sees a list of submitted items and progress.

**Acceptance Criteria**
- Fully functional on mobile (iOS/Android browsers)
- Per-file size limit is configurable (default 20MB)
- Page load < 3 seconds (on 3G network conditions)
- Submitted items list updates immediately after upload
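
A server-side check for each uploaded file might look like the sketch below. It covers only the two rules stated above (the file must map to a checklist item, and it must respect the size limit); the names are hypothetical, and "20MB" is read as 20 × 1024 × 1024 bytes, which is an assumption.

```typescript
// Assumption: the PRD's "20MB" default is interpreted as 20 MiB.
const DEFAULT_MAX_BYTES = 20 * 1024 * 1024;

// Hypothetical description of one incoming file.
interface IncomingFile {
  name: string;
  sizeBytes: number;
  documentType: string; // checklist item chosen by the client
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Validate one file against the cycle's checklist and the size limit.
function validateUpload(
  file: IncomingFile,
  allowedDocumentTypes: string[],
  maxBytes: number = DEFAULT_MAX_BYTES,
): ValidationResult {
  if (!allowedDocumentTypes.includes(file.documentType)) {
    return { ok: false, reason: `Unknown checklist item: ${file.documentType}` };
  }
  if (file.sizeBytes <= 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (file.sizeBytes > maxBytes) {
    return { ok: false, reason: `File exceeds ${maxBytes} bytes` };
  }
  return { ok: true };
}
```

Running the same check in the browser before upload gives faster feedback on mobile, but the server-side check remains the one that counts.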

### 4.4 Completeness Engine

**Description**
- Rule: `all required items have at least 1 submission` => status `COMPLETE`
- Otherwise: `INCOMPLETE`
- If a cycle is `INCOMPLETE`, has had no submissions for N days (N is a configurable threshold), AND reminders have been sent >= 2 times => `UNRESPONSIVE`

**Acceptance Criteria**
- Status refreshes within 3 seconds of each upload
- Dashboard status and detail views are consistent
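
The three rules above can be sketched as one pure function that also yields the `completion_ratio` stored on the cycle. Several points are assumptions rather than part of the specification: "no submissions for N days" is measured from the latest upload, or from the request send time when nothing has been uploaded; a cycle with zero required items is treated as `COMPLETE`; and all names (`computeCycleStatus`, `CycleSnapshot`, `quietDays`) are hypothetical.

```typescript
type CycleStatus = "COMPLETE" | "INCOMPLETE" | "UNRESPONSIVE";

interface RequiredItem {
  documentType: string;
  required: boolean;
}

interface Submission {
  documentType: string;
  uploadedAt: Date;
}

// Hypothetical view of one cycle, loaded from the database by the caller.
interface CycleSnapshot {
  items: RequiredItem[];
  submissions: Submission[];
  remindersSent: number;
  requestSentAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// quietDays is the "N" in the UNRESPONSIVE rule.
function computeCycleStatus(
  cycle: CycleSnapshot,
  now: Date,
  quietDays: number,
): { status: CycleStatus; completionRatio: number } {
  const required = cycle.items.filter((i) => i.required);
  const submittedTypes = new Set(cycle.submissions.map((s) => s.documentType));
  const received = required.filter((i) => submittedTypes.has(i.documentType)).length;
  // Assumption: no required items means nothing is missing.
  const completionRatio = required.length === 0 ? 1 : received / required.length;

  if (received === required.length) {
    return { status: "COMPLETE", completionRatio };
  }

  // Latest activity: most recent upload, falling back to when the request was sent.
  const lastActivity = cycle.submissions.reduce(
    (latest, s) => Math.max(latest, s.uploadedAt.getTime()),
    cycle.requestSentAt.getTime(),
  );
  const quiet = now.getTime() - lastActivity >= quietDays * MS_PER_DAY;
  const status: CycleStatus =
    quiet && cycle.remindersSent >= 2 ? "UNRESPONSIVE" : "INCOMPLETE";
  return { status, completionRatio };
}
```

Because optional items are excluded from both the numerator and the denominator, uploading only optional documents never moves a cycle toward `COMPLETE`.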

### 4.5 Automated Reminders

**Description**
- Scheduled task runs daily (Supabase Edge Function / cron).
- Sends reminder emails to `INCOMPLETE` clients that haven't exceeded frequency limits.
- Email template includes a list of "still missing documents."

**Acceptance Criteria**
- Daily task execution is logged
- Maximum 1 reminder per client per 24 hours
- Reminder email includes missing items list and upload link
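
One way to express the selection step of the daily job is sketched below: pick `INCOMPLETE` cycles whose client has not been reminded in the last 24 hours, at most one cycle per client per run, and compute the missing-items list for the email body. The names are hypothetical, and `lastClientReminderAt` is assumed to be the most recent reminder sent to that client across all of their cycles.

```typescript
// Hypothetical view of a cycle joined with its client's reminder history.
interface OpenCycle {
  clientId: string;
  cycleId: string;
  status: "COMPLETE" | "INCOMPLETE" | "UNRESPONSIVE";
  lastClientReminderAt: Date | null; // latest reminder to this client, any cycle
}

// Select cycles to remind, enforcing at most one reminder per client per window.
function selectCyclesToRemind(
  cycles: OpenCycle[],
  now: Date,
  minGapHours: number = 24,
): OpenCycle[] {
  const minGapMs = minGapHours * 60 * 60 * 1000;
  const remindedClients = new Set<string>();
  const selected: OpenCycle[] = [];
  for (const cycle of cycles) {
    if (cycle.status !== "INCOMPLETE") continue;
    if (remindedClients.has(cycle.clientId)) continue; // one per client per run
    const tooSoon =
      cycle.lastClientReminderAt !== null &&
      now.getTime() - cycle.lastClientReminderAt.getTime() < minGapMs;
    if (tooSoon) continue;
    remindedClients.add(cycle.clientId);
    selected.push(cycle);
  }
  return selected;
}

// Required document types that have no submission yet, in checklist order.
function missingDocumentTypes(
  requiredTypes: string[],
  submittedTypes: string[],
): string[] {
  const submitted = new Set(submittedTypes);
  return requiredTypes.filter((t) => !submitted.has(t));
}
```

Checking the gap inside the selection step, rather than relying on the cron schedule alone, keeps the 24-hour limit intact if the job is retried or triggered manually.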

### 4.6 Collection Dashboard

**Description**
- List columns: client name, period, completion %, status, last submission time, next reminder time.
- Filter by status and sort by due date.
- Top summary cards: Complete / Incomplete / Unresponsive counts.

**Acceptance Criteria**
- First screen loads in < 2 seconds at 50-client scale
- Reflects the latest upload status within 3 seconds of each upload, matching the completeness engine's refresh target (4.4)

---

## 5. Information Architecture & Pages

| Route | Description | Auth |
|-------|-------------|------|
| `/auth/login` | Login page (Magic Link) | Public |
| `/dashboard` | Overview dashboard (status counts + client list) | Authenticated |
| `/clients` | Client list | Authenticated |
| `/clients/:id` | Client detail (checklist templates, historical cycles) | Authenticated |
| `/clients/:id/cycles/:cycleId` | Cycle detail (submissions, send reminders) | Authenticated |
| `/templates` | Checklist template management | Authenticated |
| `/upload/:token` | Client upload page (zero-login) | Public (token validated) |
| `/receipt/:token` | Submission receipt viewer (Phase 2) | Public (token validated) |

---

## 6. Data Model (Supabase/Postgres)

### 6.1 Core Tables

#### firms
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| name | text | Firm name |
| owner_user_id | uuid, fk | |
| created_at | timestamptz | |

#### clients
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| firm_id | uuid, fk | |
| name | text | Client name |
| email | text | |
| phone | text | |
| timezone | text | Default: America/New_York |
| status | text | active / archived |
| created_at | timestamptz | |

#### checklist_templates
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| firm_id | uuid, fk | |
| name | text | Template name |
| created_at | timestamptz | |

#### checklist_items
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| template_id | uuid, fk | |
| document_type | text | e.g., Bank Statement |
| required | bool | |
| sort_order | int | |

#### collection_cycles
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| firm_id | uuid, fk | |
| client_id | uuid, fk | |
| template_id | uuid, fk | Source template |
| period_key | text | e.g., 2026-03 |
| due_date | date | |
| status | text | complete / incomplete / unresponsive |
| completion_ratio | numeric | 0.00 – 1.00 |
| client_confirmed_at | timestamptz | Client confirmation timestamp (Phase 1.5) |
| created_at | timestamptz | |

#### cycle_required_items
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| cycle_id | uuid, fk | |
| document_type | text | |
| required | bool | |
| sort_order | int | |

#### upload_tokens
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| cycle_id | uuid, fk | |
| token | text, unique | UUID v4 |
| expires_at | timestamptz | |
| is_revoked | bool | |
| created_at | timestamptz | |

#### submissions
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| cycle_id | uuid, fk | |
| document_type | text | |
| file_path | text | Supabase Storage path |
| file_name | text | Original file name |
| file_size | bigint | Bytes |
| file_hash | text | SHA-256 (for Phase 2 audit) |
| uploaded_at | timestamptz | Server-side timestamp |
| source | text | link / email / manual |
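
The `file_hash` column could be populated with a helper like the one below. This is a minimal sketch that assumes the upload handler runs on Node.js and has the file bytes in memory; `sha256Hex` is a hypothetical name, while `createHash` comes from Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the file content.
// Assumption: the whole file is available in memory; large files
// would be hashed incrementally from a stream instead.
function sha256Hex(content: Uint8Array | string): string {
  return createHash("sha256").update(content).digest("hex");
}
```

Computing the hash on the server at upload time, rather than accepting one from the browser, is what lets the Phase 2 receipt serve as evidence of exactly which bytes were received.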

#### reminder_logs
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| cycle_id | uuid, fk | |
| channel | text | email / sms |
| status | text | sent / failed |
| escalation_level | int | 1=gentle, 2=reminder, 3=urgent |
| sent_at | timestamptz | |

#### intake_receipts (Phase 2)
| Column | Type | Description |
|--------|------|-------------|
| id | uuid, pk | |
| cycle_id | uuid, fk | |
| receipt_pdf_path | text | Supabase Storage path |
| client_ip | text | Client IP at confirmation |
| client_confirmed_at | timestamptz | |
| acknowledgment_text | text | Confirmation copy |
| created_at | timestamptz | |

---

## 7. API Specification (Next.js Route Handlers)

### 7.1 Phase 1 APIs

| Method | Route | Description | Auth |
|--------|-------|-------------|------|
| POST | `/api/clients` | Create client | Authenticated |
| GET | `/api/clients` | List clients | Authenticated |
| PUT | `/api/clients/:id` | Update client | Authenticated |
| POST | `/api/templates` | Create checklist template | Authenticated |
| GET | `/api/templates` | List templates | Authenticated |
| POST | `/api/cycles` | Create collection cycle (copy from template) | Authenticated |
| GET | `/api/cycles` | List cycles (supports status filter) | Authenticated |
| GET | `/api/cycles/:id` | Get cycle details | Authenticated |
| POST | `/api/cycles/:id/send-request` | Send request email and return upload link | Authenticated |
| GET | `/api/upload/:token` | Get upload page context | Public |
| POST | `/api/upload/:token/files` | Upload files | Public |
| POST | `/api/reminders/run` | Scheduled reminder task entry point | Cron |
| GET | `/api/dashboard/summary` | Dashboard summary stats | Authenticated |

### 7.2 Phase 2 APIs (Audit Receipts)

| Method | Route | Description | Auth |
|--------|-------|-------------|------|
| POST | `/api/cycles/:id/confirm` | Client confirms submission complete | Public (token) |
| POST | `/api/cycles/:id/receipt` | Generate and send receipt PDF | Authenticated |
| GET | `/api/receipt/:token` | View receipt | Public (token) |

---

## 8. Tech Stack

| Layer | Choice | Notes |
|-------|--------|-------|
| Frontend | Next.js 14 (App Router) | SSR + client-side interactivity |
| Hosting | Vercel | Start on free tier |
| Auth | Supabase Auth | Magic Link login |
| Database | Supabase (Postgres) | RLS row-level security |
| File Storage | Supabase Storage | Isolated by firm/client/cycle |
| Email | Resend or SendGrid | Transactional emails |
| SMS | Twilio (Phase 1.5) | SMS reminders |
| PDF Generation | pdf-lib (Phase 2) | Receipt generation |
| Payments | Stripe (post Phase 2) | Monthly subscriptions |
| Analytics | PostHog | Behavioral analytics |
| AI | OpenAI GPT-4o (Phase 3) | Document classification |

---

## 9. Analytics & Monitoring

### 9.1 Key Events
- `signup_completed` — Firm registration complete
- `client_created` — Client created
- `template_created` — Checklist template created
- `collection_request_sent` — Collection request sent
- `upload_link_opened` — Client opened upload link
- `file_uploaded` — File upload completed
- `cycle_completed` — Cycle fully collected
- `client_confirmed_complete` — Client confirmed submission complete
- `reminder_sent` — Reminder email/SMS sent
- `receipt_generated` — Receipt generated (Phase 2)
- `dashboard_return_visit` — Firm user return visit

### 9.2 Funnel Metrics
- Requests sent → Link open rate → First upload rate → Completion rate
- Firm signup → First client created → First request sent (activation)
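
The client-side funnel can be turned into rates with a few divisions. The sketch below assumes every rate is measured against requests sent, which the document does not state explicitly; the names are hypothetical, and the thresholds in `meetsClientFunnelTargets` are the 60% open-rate and 35% completion-rate targets from the Core KPIs table.

```typescript
// Hypothetical event counts for one reporting window.
interface FunnelCounts {
  requestsSent: number; // collection_request_sent
  linksOpened: number; // upload_link_opened (unique per request)
  firstUploads: number; // requests with at least one file_uploaded
  cyclesCompleted: number; // cycle_completed
}

interface FunnelRates {
  openRate: number;
  firstUploadRate: number;
  completionRate: number;
}

// Safe division: an empty denominator yields 0 rather than NaN.
function rate(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

// Assumption: each rate uses requests sent as its denominator.
function funnelRates(c: FunnelCounts): FunnelRates {
  return {
    openRate: rate(c.linksOpened, c.requestsSent),
    firstUploadRate: rate(c.firstUploads, c.requestsSent),
    completionRate: rate(c.cyclesCompleted, c.requestsSent),
  };
}

// Checks the two client-side targets from the Core KPIs table.
function meetsClientFunnelTargets(r: FunnelRates): boolean {
  return r.openRate >= 0.6 && r.completionRate >= 0.35;
}
```

If step-to-step conversion is preferred instead, only the denominators in `funnelRates` need to change.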

### 9.3 Core KPIs (First 30 Days)
| Metric | Target |
|--------|--------|
| Activation Rate (first request sent within 48h of signup) | >= 40% |
| Upload Link Open Rate | >= 60% |
| Upload Completion Rate (cycle fully collected) | >= 35% |
| D7 Return Rate (firm users) | >= 25% |

---

## 10. Competitive Positioning

| Competitor | Price | Key Gap |
|------------|-------|---------|
| TaxDome | $50+/user/mo | Bloated all-in-one; clients must log into portal |
| Canopy | $100+/mo | Same portal friction; expensive |
| Karbon | $59+/user/mo | Targets mid-size teams; high complexity |
| Liscio | $50+/mo | Requires client app download |
| SmartVault | Mid-range | Storage only — no proactive collection or completeness tracking |
| FileInvite | Mid-range | Generic use case, not accounting-specific |
| Hubdoc (Xero) | — | Bank auto-fetch only, not document collection |

**DocFlow differentiation**: Zero-login + completeness signals + audit receipts + starting at $27/mo

---

## 11. Pricing Hypothesis

### Starter — $27/mo
- Up to 50 active clients
- Email reminders
- Collection dashboard
- 14-day free trial, no credit card required

### Pro — $49/mo
- Unlimited clients
- SMS reminders
- Audit receipt PDFs
- Escalating reminder cadence

### AI Add-on (Phase 3)
- $19 / 25 credits (per-PDF usage billing)
- Mixed PDF split + classification + manifest

### Path to Revenue
- 200 Starter users → $5,400 MRR
- Target channels: Reddit r/bookkeeping, Facebook bookkeeper groups, accounting forums
- Near-zero customer acquisition cost (content-driven)

---

## 12. Non-Functional Requirements

- All file transfers over HTTPS
- File storage isolated by `firm_id/client_id/cycle_id/` path
- Supabase RLS ensures row-level data isolation
- Upload tokens expire after 30 days by default
- Basic audit logging (who, when, what)
- Error monitoring: Sentry
- File size limit: 20MB per file, configurable
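
The storage isolation rule can be sketched as a path builder that places every object under `firm_id/client_id/cycle_id/`. The sanitization rules and the submission-ID prefix are assumptions added for illustration (they keep client-supplied names from escaping the folder or colliding); `buildStoragePath` and `sanitizeFileName` are hypothetical names.

```typescript
// Keep only the final path segment and replace unusual characters,
// so a client-supplied name cannot introduce extra folders.
function sanitizeFileName(name: string): string {
  const base = name.split(/[\\/]/).pop() ?? "";
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, "_");
  // Strip leading dots; fall back to a placeholder if nothing is left.
  return cleaned.replace(/^\.+/, "") || "file";
}

// Hypothetical identifiers for one stored submission.
interface StorageIds {
  firmId: string;
  clientId: string;
  cycleId: string;
  submissionId: string;
}

// Build the object path: firm/client/cycle/<submissionId>-<safe file name>.
function buildStoragePath(ids: StorageIds, originalName: string): string {
  const safeName = sanitizeFileName(originalName);
  return `${ids.firmId}/${ids.clientId}/${ids.cycleId}/${ids.submissionId}-${safeName}`;
}
```

The original file name is still kept separately in `submissions.file_name`, so the receipt can show the name the client actually used.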

---

## 13. Iteration Plan

### Phase 1 — MVP Core (Week 1–2, 20h)
**Week 1 (12h)**
1. Supabase project init + Auth + RLS (2h)
2. Data model tables + CRUD APIs (3h)
3. Client management + checklist template UI (3h)
4. Token upload page + Storage upload (4h)

**Week 2 (8h)**
1. Completeness engine `recomputeCycleStatus(cycleId)` (2h)
2. Automated email reminder cron + templates (2h)
3. Dashboard + filtering (3h)
4. PostHog analytics integration (1h)

### Phase 1.5 — Reminder Enhancement (Week 3, 5h)
1. Twilio SMS integration (2h)
2. Escalating reminder cadence logic (1.5h)
3. Client "submission complete" confirmation button (1.5h)

### Phase 2 — Audit Receipts (Week 4–5, 8h)
1. File hash computation + intake_receipts table (2h)
2. PDF receipt generation (pdf-lib) (3h)
3. Receipt delivery + viewer page (2h)
4. Audit log query page (1h)

### Phase 3 — AI Document Classification (Requires Smoke Test validation, 15h)
1. PDF text extraction (pdf-parse) (2h)
2. GPT-4o classification prompt + batch processing (4h)
3. PDF split + ZIP bundle download (3h)
4. Credit billing system (3h)
5. Classification results UI + manifest download (3h)

---

## 14. Launch & Validation Checklist

### Phase 1 Go-Live Criteria
- [ ] Can create 10 test clients and batch-send requests
- [ ] Clients can upload via mobile browser and see success feedback
- [ ] Dashboard reflects Complete / Incomplete / Unresponsive in real time
- [ ] Daily reminder task executes successfully with logs
- [ ] Manual beta test with 5–10 target users

### Phase 2 Go-Live Criteria
- [ ] PDF receipt generated on cycle close
- [ ] Receipt contains file names, timestamps, and file hashes
- [ ] Client confirmation records IP + timestamp
- [ ] Both parties receive receipt email

---

## 15. Stripe Integration Trigger Conditions

Proceed with payment integration when ANY of the following is met:
1. Target users actively return for 2 consecutive weeks (>= 2 sessions/week)
2. At least 5 firm users complete the full loop: send request → client uploads → cycle complete
3. At least 3 users express explicit willingness to pay monthly

---

## 16. Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Low email deliverability | Clients don't receive upload links | SMS fallback (Phase 1.5); use professional email service |
| File security compliance | Client tax documents contain PII | HTTPS + storage isolation + token expiration |
| Competitors add zero-login | Large platforms close the gap | Speed advantage + small-team pricing focus + audit receipt differentiation |
| AI classification accuracy | Misidentified documents erode trust | Phase 3 deferred; validate core collection first; human review fallback |
