AI & Technology · 7 min read

I've Processed Thousands of Invoices by Hand. Here's What AI Actually Gets Right (and Wrong).

Founder

Domain Architect, Finance B2B · 2026-05-06

Last year I watched a senior AP clerk spend twenty minutes on a single invoice from a logistics vendor in Mumbai. The PDF was a scan of a scan - tilted, with a coffee ring over the GST number. She squinted, cross-referenced the vendor master in NetSuite, manually typed the line items, and moved on to the next one. She'd been doing this for nine years.

That moment stuck with me because it captures everything that's wrong with how finance teams still handle documents, and everything that's hard about fixing it.

The problem isn't what you think it is

Most pitches for document automation focus on speed. "Process invoices 10x faster!" Sure, speed matters. But in the AP teams I've worked with across Singapore, Dubai, and London, the real pain was never just slowness - it was the cascading mess that follows from bad data entry.

One team I advised had duplicate vendor records for the same supplier under three slightly different names. Their old OCR system read "1,000.00" as "100000" on a freight invoice, and nobody caught it until month-end reconciliation. That's a $99,000 error sitting in the ledger for weeks. Speed doesn't help if the extraction is garbage.
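A cheap cross-check would have caught that freight invoice on the day it arrived. Here is a minimal sketch in Python, assuming the extracted total and line amounts come back as strings; the function names `parse_amount` and `total_matches_lines` are made up for illustration and are not from any particular product.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse an amount string such as '1,000.00' into a Decimal.

    Assumes a comma is a thousands separator and a dot is the decimal
    point; other locales would need different handling.
    """
    return Decimal(text.replace(",", "").strip())


def total_matches_lines(total_text, line_texts, tolerance=Decimal("0.01")):
    """Return True if the extracted total agrees with the sum of the lines."""
    total = parse_amount(total_text)
    line_sum = sum((parse_amount(t) for t in line_texts), Decimal("0"))
    return abs(total - line_sum) <= tolerance


# Correct read: the total agrees with the line items.
print(total_matches_lines("1,000.00", ["600.00", "400.00"]))  # True

# Misread total ("1,000.00" extracted as "100000"): the check fails,
# so the invoice can be held instead of posting to the ledger.
print(total_matches_lines("100000", ["600.00", "400.00"]))  # False
```

The check is deliberately dumb: it knows nothing about layouts or vendors, only that a total should equal the sum of its lines. The same comparison could be made against the PO amount where one exists.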

According to [Ardent Partners' 2025 AP Metrics That Matter report](https://ardentpartners.com), the average invoice exception rate is 22%. Best-in-class teams get that down to about 9%. That gap isn't really about whether a team has the technology - it's about how well the technology understands context.

What's actually different now

I'll be honest: I was skeptical of the latest generation of AI document processing. I've lived through the rule-based OCR era, the template-matching era, and the "machine learning" era that was mostly just rebranded templates. But a few things genuinely are different this time.

Modern systems use large language models that can read a document the way a person does - understanding that the number next to "Total Due" is probably the invoice total, even if the layout is completely different from the last vendor's format. Computer vision handles the spatial reasoning: where's the header, where are the line items, and is that a table or just text that happens to be aligned?

The combination means you can throw a handwritten receipt from a street vendor in Jakarta and a 40-page contract from a London law firm at the same system, and it actually knows what to do with both. That wasn't true three years ago.

Where it still falls apart

Here's my contrarian take: most AI document processing vendors oversell accuracy and undersell the importance of the human-in-the-loop workflow. The extraction might be right 90%+ of the time, but in finance, the remaining cases are where the real risk lives. A misread tax ID. A currency conversion that the model guessed on. A line item that got merged with the one below it.

The useful systems aren't the ones that claim near-perfect accuracy - they're the ones that know when they're uncertain and route those documents for review with the AI's best guess pre-filled. Confidence scoring matters more than headline accuracy numbers.
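To make that concrete, here is a rough sketch of confidence-based routing in Python. Everything in it is an assumption for illustration: the field names, the thresholds, and the idea that the extraction model reports a per-field confidence between 0 and 1. It is not a description of how any specific vendor does it.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction model


# Hypothetical thresholds: fields that carry financial risk get a
# stricter bar than descriptive fields.
REVIEW_THRESHOLD = 0.90
CRITICAL_THRESHOLD = 0.98
CRITICAL_FIELDS = {"tax_id", "total", "currency"}


def fields_needing_review(fields):
    """Return the names of fields whose confidence is below their bar."""
    flagged = []
    for field in fields:
        bar = CRITICAL_THRESHOLD if field.name in CRITICAL_FIELDS else REVIEW_THRESHOLD
        if field.confidence < bar:
            flagged.append(field.name)
    return flagged


def route(fields):
    """Send the document to a reviewer if any field is uncertain."""
    return "human_review" if fields_needing_review(fields) else "auto_post"


invoice = [
    ExtractedField("vendor_name", "Acme Logistics", 0.97),
    ExtractedField("tax_id", "27AAAAA0000A1Z5", 0.95),  # placeholder value
    ExtractedField("total", "1000.00", 0.99),
]
print(fields_needing_review(invoice))  # ['tax_id']
print(route(invoice))                  # human_review
```

Note that the extracted values stay attached to the flagged fields, so the reviewer sees the model's best guess pre-filled rather than a blank form.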

What we're actually doing with this at ScribeArc

We're building ScribeArc around the idea that extraction is maybe 30% of the problem. The other 70% is what happens after: three-way matching against POs, GL coding, accrual entries at month-end, vendor master cleanup. We're designing the system so that when an invoice comes in, it doesn't just get "processed" - it gets validated against your actual business logic, coded to the right accounts, and routed to the right approver based on your rules.
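For readers who haven't worked in AP, three-way matching means checking the invoice against both the purchase order and the goods receipt. The sketch below shows the general idea in Python; it is a simplified illustration with made-up types and names (`Line`, `three_way_match`), not ScribeArc's implementation, and it assumes lines can be matched on a shared SKU.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    sku: str
    quantity: int
    unit_price: Decimal


def three_way_match(po_lines, received_qty, invoice_lines,
                    price_tolerance=Decimal("0.00")):
    """Compare invoice lines with the PO and the goods receipt.

    po_lines / invoice_lines: lists of Line.
    received_qty: dict mapping SKU to the quantity actually received.
    Returns a list of exception messages; an empty list means a clean match.
    """
    exceptions = []
    po_by_sku = {line.sku: line for line in po_lines}
    for inv in invoice_lines:
        po_line = po_by_sku.get(inv.sku)
        if po_line is None:
            exceptions.append(f"{inv.sku}: not on PO")
            continue
        if abs(inv.unit_price - po_line.unit_price) > price_tolerance:
            exceptions.append(f"{inv.sku}: price differs from PO")
        received = received_qty.get(inv.sku, 0)
        if inv.quantity > received:
            exceptions.append(
                f"{inv.sku}: billed {inv.quantity}, received {received}"
            )
    return exceptions


po = [Line("FRT-01", 10, Decimal("100.00"))]
receipt = {"FRT-01": 8}
invoice = [Line("FRT-01", 10, Decimal("100.00"))]
print(three_way_match(po, receipt, invoice))
# ['FRT-01: billed 10, received 8']
```

Real matching rules are messier than this: partial deliveries, unit-of-measure differences, and per-vendor tolerances all need handling, which is a large part of why the post-extraction work dominates.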

We're in private beta right now, so I won't quote accuracy numbers we haven't properly measured yet. What I will say is that we're spending more engineering time on the exception-handling workflow than on the extraction model itself. That feels like the right priority.

ScribeArc is in private beta. Numbers cited about our product are targets, not measured results.