Claude Vision for Document Extraction — Invoices, IDs, Forms Read in 4 Seconds · Blog

Your team drowns in documents: invoices with vendor logos crammed in the margins, driver's licenses at weird angles, handwritten forms that Tesseract wouldn't touch. You've probably tried Textract (AWS's OCR). It's good—but it's also $25/month minimum plus $0.15 per page, with a three-month onboarding loop and a vendor lock-in smell that never quite washes out.

Velocity X's document extraction uses Claude Vision instead. Same-day setup. $0.005 per page. Handles rotated images, terrible lighting, handwriting, mixed-language documents. Returns structured JSON you can drop straight into your database—no post-processing regex nightmare.

Why Textract Fails on Real-World Docs

Textract is trained on clean, well-scanned documents. Real invoices are phone photos under fluorescent lights. ID cards get snapped at 45 degrees. Forms come in folded or stained. Textract's confidence scores plummet, and you end up building a whole triage pipeline: "medium confidence → human review → downstream error correction."

Claude Vision doesn't just read text—it understands context. It sees that a number at the top of an invoice is the invoice ID, not a date. It recognises a driver's license even if it's partially obscured. It handles cursive handwriting with 70%+ accuracy. Most importantly: it fails gracefully. If it can't read something, it says so explicitly rather than hallucinating a number.

The Architecture: Upload → Base64 → Vision + JSON Schema → Structured Output

The flow is simple. A document lands in Velocity X—uploaded from a mobile camera, a form submission, or a batch import. An edge function encodes it to base64 and sends it to Claude Vision (Sonnet, the fast model) with a structured prompt.

The prompt is the secret. Instead of "extract all text", you tell Claude exactly what you want—and more importantly, what shape it should return. For an invoice, that's:

{`{
  "vendor": "Acme Supply Co",
  "invoice_id": "INV-2026-45782",
  "date": "2026-06-10",
  "due_date": "2026-07-10",
  "total": 1250.50,
  "line_items": [
    {
      "description": "Widget pack (x100)",
      "qty": 100,
      "unit_price": 12.50,
      "amount": 1250.00
    }
  ],
  "confidence": "high",
  "warnings": []
}`}

Claude returns this JSON directly. No intermediate plain-text file to parse. No regex nightmares. Insert it straight into a Supabase table with a simple Zod schema validation. The edge function logs confidence levels and any warnings (e.g., "unable to read due_date" or "document appears rotated")—that's your triage flag.

Document Types: Invoices, IDs, Forms, Receipts, Multi-Page PDFs

Invoices. Vendor name, invoice ID, date, due date, line items (description, qty, unit price, total), tax, grand total, payment terms. Works on printed invoices, email attachments, PDF scans, phone photos.

ID documents. Driver's licenses, passports, work permits. Extract name, DOB, ID number, expiry date, issuing country. Claude Vision handles multiple ID formats (EU driver's licenses look nothing like US ones) without retraining. Includes a orientation_corrected: true flag if it had to rotate the image internally.

Forms. Application forms, claim forms, intake questionnaires. Extract field labels and values, preserving form structure. If a checkbox was ticked, "checkbox_field": true. If a date field was left blank, "date_field": null.

Multi-page PDFs. Send the whole PDF as base64 (Claude Vision handles PDF encoding natively). If the document spans pages, Claude extracts all of it and returns a single unified JSON with a "page_count": 3 flag so you know the source wasn't single-page.

Failure Modes and Graceful Degradation

Dark or blurry images. Claude returns what it can extract and sets "confidence": "low". The lead processor sees a flag and decides whether to accept it or ask the user to recapture. No silent failures.

Handwritten documents. Clean handwriting: 70–80% accuracy. Messy handwriting: 40–60%. Claude explicitly returns "handwriting_detected": true so you know to expect gaps. Velocity X flags these for human review, which takes 30 seconds and costs way less than a rescan.

Rotated or skewed pages. Claude auto-rotates. If it had to correct orientation, the response includes "orientation_corrected": true and an "original_angle": 270 for logging.

Mixed language documents. Invoices in Spanish, IDs in Arabic, forms in Chinese—Claude handles UTF-8 seamlessly. Supabase stores it. No charset surprises.

Cost Economics: $0.005 Per Page, Not $0.15

Claude Vision (Sonnet) costs $3 per million input tokens, roughly $0.005 per document page. Compare:

Textract: $25 minimum + $0.15/page = $500/month for 3,000 pages.

Claude Vision: $0.005/page = $15/month for 3,000 pages. Add your Supabase + Netlify infra, still under $50/month for the whole pipeline.

Even if you add 10% for storage and processing overhead, you're 10× cheaper than Textract. And you own your data—no closed-source vendor API black box.

Frequently Asked Questions

What if a document is partially obscured or damaged?

Claude extracts what's visible and returns "missing_fields": ["due_date", "payment_terms"]. Supabase stores the partial record. Velocity X flags it for review. User can then manually fill gaps or request a recapture. Partial data is better than no data.

Can it extract from photos taken at bad angles?

Yes—with a confidence hit. If you snap a receipt at 45 degrees under a desk lamp, Claude returns what it can read and logs "confidence": "medium". Recapture if needed, but it usually works.

Does it work on PDFs?

Yes. Send the PDF as base64. Claude reads all pages in one call and returns unified JSON with a "page_count" field. Multi-page invoices with line items spread across pages still extract cleanly.

What about handwritten signatures or dates?

Signatures are skipped (returned as "signature_detected": true but not extracted—no need). Handwritten dates are attempted with 60–70% accuracy. Velocity X flags uncertain dates for manual review.

Can it extract tables from forms?

Yes. If a form has a table of items, Claude preserves the structure and returns it as a JSON array of objects. E.g., a multi-item claim form returns "claim_items": [{ "date": "...", "amount": ... }, ...].

Is there a rate limit?

Claude has standard API rate limits (depends on your plan), not per-page fees. Velocity X batches extraction jobs—10,000 pages run in parallel if needed, no daily caps. Textract would require separate quota requests and cost $1,500+.

Does it work offline?

No—it requires an API call to Anthropic. But Velocity X queues uploads and batches them if connection drops. Users see "processing offline" until sync resumes. Suitable for field teams with spotty coverage.

The Bottom Line

Claude Vision doesn't try to be Textract—it's simpler and cheaper. If your documents are messy, handwritten, rotated, or multi-language, Claude Vision handles them better AND faster. Invoices extract in 3 seconds. IDs in 2. Forms in 4–6 depending on complexity. At $0.005/page, the math is brutal for Textract: you'd need 10,000 pages a month for Textract to even make sense. For field teams processing 200–500 documents daily, Claude Vision is the obvious choice. See the extraction flow at /pricing, or read how we use Vision for business cards at Business Card OCR.

AI Features