Typical output schema
Schemas are defined per project — line items, header fields, signatures, and footnotes can each map to their own columns or nested JSON.
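As a rough illustration of how one schema can feed both a nested JSON shape and flat columns, here is a minimal sketch. The field names (`invoice_number`, `invoice_date`, `total`, `description`, `quantity`, `unit_price`) and the helper functions are hypothetical and chosen for the example; real schemas are agreed per project.

```python
import json

# Hypothetical schema: header fields become flat columns,
# line items become a nested list (JSON) or one row each (CSV/Excel).
SCHEMA = {
    "header": ["invoice_number", "invoice_date", "total"],
    "line_items": ["description", "quantity", "unit_price"],
}

def to_nested(record: dict) -> dict:
    """Keep header fields at the top level and line items as a nested list."""
    out = {name: record.get(name) for name in SCHEMA["header"]}
    out["line_items"] = [
        {name: item.get(name) for name in SCHEMA["line_items"]}
        for item in record.get("line_items", [])
    ]
    return out

def to_flat_rows(record: dict) -> list[dict]:
    """One row per line item, with the header fields repeated in their own columns."""
    nested = to_nested(record)
    header = {k: v for k, v in nested.items() if k != "line_items"}
    return [{**header, **item} for item in nested["line_items"]]

sample = {
    "invoice_number": "INV-001",
    "invoice_date": "2026-01-15",
    "total": 30.0,
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Shipping", "quantity": 1, "unit_price": 10.0},
    ],
}

print(json.dumps(to_nested(sample), indent=2))  # nested JSON shape
print(to_flat_rows(sample))                     # flat rows for CSV or Excel
```

The nested form suits engineering pipelines that consume JSON, while the flat form repeats header fields on every line-item row so it opens cleanly in a spreadsheet.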
Invoices, statements, reports, and scanned archives — extracted as structured CSV, Excel, or JSON with a defined schema. We handle mixed templates, multi-page tables, and OCR for scanned documents.
We've extracted from regulatory filings, multi-vendor invoice piles, decades-old scanned archives, and AI-generated reports. Mixed quality is the norm — we plan for it.
Drop 3–10 representative PDFs and a target schema. Mixed templates, scanned pages, and multi-language documents are fine.
Within 1–3 business days you get a CSV or JSON with the parsed fields. We flag low-confidence rows so you can spot-check before approval.
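To show what spot-checking flagged rows could look like on your side, here is a small sketch. It assumes each row carries a per-field `confidence` mapping and uses a made-up threshold of 0.85; the actual output layout and cut-off may differ per project.

```python
# Hypothetical threshold; the real cut-off would be agreed per project.
REVIEW_THRESHOLD = 0.85

def flag_low_confidence(rows, threshold=REVIEW_THRESHOLD):
    """Return copies of the rows with needs_review=True when any field scores below the threshold."""
    flagged = []
    for row in rows:
        lowest = min(row["confidence"].values())
        flagged.append({**row, "needs_review": lowest < threshold})
    return flagged

# Illustrative rows; the second one has a doubtful invoice number.
rows = [
    {"invoice_number": "INV-001", "total": "30.00",
     "confidence": {"invoice_number": 0.99, "total": 0.97}},
    {"invoice_number": "INV-0O2", "total": "18.50",
     "confidence": {"invoice_number": 0.62, "total": 0.94}},
]

flagged = flag_low_confidence(rows)
to_check = [r["invoice_number"] for r in flagged if r["needs_review"]]
print(to_check)
```

Flagging on the lowest field score is a deliberately conservative choice: one doubtful field is enough to send the whole row for a second look.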
Send PDFs by email, S3, SFTP, or API webhook. Output is returned on a schedule in the same CSV or JSON format, with per-field confidence scores and validation rules.
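Per-field validation rules can be pictured as one small check per field. The sketch below is an assumption about how such rules might be expressed, not a description of the production rule set; the `INV-` number pattern and the field names are invented for the example.

```python
import re
from datetime import date

def is_iso_date(value):
    """True when value is a string in YYYY-MM-DD form."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical rules: one predicate per field name.
RULES = {
    "invoice_number": lambda v: bool(re.fullmatch(r"INV-\d+", v or "")),
    "invoice_date": is_iso_date,
    "total": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def validate(record):
    """Return the names of the fields that fail their rule (missing fields fail too)."""
    return [name for name, rule in RULES.items() if not rule(record.get(name))]

good = {"invoice_number": "INV-001", "invoice_date": "2026-01-15", "total": 30.0}
bad = {"invoice_number": "001", "invoice_date": "15/01/2026", "total": -5}

print(validate(good))
print(validate(bad))
```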
Yes. We detect tabular regions, reconstruct row and column boundaries, and output one row per line item with consistent columns across documents. Multi-page tables are stitched together automatically.
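The stitching idea can be sketched in a few lines, assuming table detection has already produced a list of rows per page. The function name `stitch_pages` and the sample data are hypothetical, and real documents need more than this (for example, handling rows split across a page break).

```python
def stitch_pages(pages):
    """Merge per-page table fragments into one list of row dicts.

    Each page is a list of rows, and each row is a list of cell strings.
    The first row of the first page is treated as the header; a header row
    repeated at the top of later pages is dropped.
    """
    if not pages or not pages[0]:
        return []
    header = pages[0][0]
    rows = []
    for page in pages:
        for row_number, cells in enumerate(page):
            if row_number == 0 and cells == header:
                continue  # skip the header and any repeated header
            # Pad or trim so every row has the same columns as the header.
            cells = (cells + [""] * len(header))[: len(header)]
            rows.append(dict(zip(header, cells)))
    return rows

pages = [
    [["description", "qty", "amount"], ["Widget", "2", "20.00"]],
    [["description", "qty", "amount"], ["Gadget", "1", "15.00"]],  # header repeated
    [["Shipping", "1"]],                                           # continuation, short row
]

stitched = stitch_pages(pages)
for row in stitched:
    print(row)
```

Padding short rows with empty strings keeps the columns consistent across documents, which is what makes the result safe to load into a spreadsheet or database table.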
Yes. We run OCR (English plus most European and CJK scripts) before structured parsing. Output quality depends on scan resolution — we recommend 300 DPI or better for production use.
Our pipeline does not require a fixed template. Field detection is driven by layout cues plus a schema you define (e.g. "invoice number", "line items", "total"). Edge cases get sent to human review on paid plans.
Files are processed in an isolated environment, encrypted at rest, and deleted after the agreed retention window. We can sign an NDA before sample delivery. We don't use client documents for model training.
All four. Pick whichever fits your downstream pipeline. CSV and Excel are the default for finance and operations teams; JSON or direct Postgres / BigQuery loads are common for engineering teams.
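For a sense of how the same extracted rows look in two of these formats, here is a minimal sketch using only the Python standard library. The row contents and the helper names `to_csv` and `to_json` are illustrative assumptions; Excel and database loads would need extra libraries and are left out.

```python
import csv
import io
import json

def to_csv(rows):
    """Serialize a list of row dicts to CSV text, using the first row's keys as columns."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Serialize the same rows to a JSON array."""
    return json.dumps(rows, indent=2)

# Illustrative rows only.
rows = [
    {"description": "Widget", "quantity": 2, "amount": 20.0},
    {"description": "Shipping", "quantity": 1, "amount": 10.0},
]

print(to_csv(rows))
print(to_json(rows))
```

CSV flattens every value to text, so types such as numbers and dates have to be re-parsed downstream; JSON keeps numbers as numbers, which is one reason engineering teams tend to prefer it.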
© 2026 VSTOCK LIMITED. All rights reserved.
Built for data-driven teams worldwide.