Private Beta · Validating models with 20 finance teams.
Engineering · 9 min read

Four Architecture Decisions We've Made Building ScribeArc (and One We're Still Arguing About)

Founder

Domain Architect, Finance B2B · 2026-04-12

My co-founder and I have been going back and forth for weeks about caching strategy, and honestly, I'm not sure we've landed on the right answer yet. That's probably a weird way to start an engineering blog post - most of these read like the decisions were obvious in hindsight. Ours weren't.

We're building ScribeArc as an intelligent document processing platform for finance teams. We're in private beta, which means we're still making foundational architecture calls that we'll live with for years. Here are four decisions we've committed to, one we haven't, and why.

Async-first, event-driven processing

This was actually the easiest call. Document processing is inherently variable - a one-page invoice from Xero might take two seconds to parse; a 50-page vendor contract with nested tables could take thirty. If we'd built a synchronous request-response system, half of our requests would time out during month-end when finance teams upload batches of 200+ documents.

We designed around AWS SQS and Lambda, with S3 as the document store. Each upload triggers a chain: ingestion, classification, extraction, validation, workflow routing. If one step fails - say, the extraction model chokes on a particularly ugly scan - the document gets retried without blocking everything else in the queue.
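To make the retry behaviour concrete, here is a minimal in-memory sketch of the stage chain. It is an illustration only: a plain Python deque stands in for SQS, the handlers stand in for Lambda functions, and the names `Job`, `run_pipeline`, and the retry budget of three attempts are assumptions made for the example, not details of the real system.

```python
from collections import deque
from dataclasses import dataclass

# Stage names taken from the chain described above.
STAGES = ["ingestion", "classification", "extraction", "validation", "workflow_routing"]
MAX_ATTEMPTS = 3  # assumed retry budget per stage, purely illustrative


@dataclass
class Job:
    doc_id: str
    stage_index: int = 0
    attempts: int = 0


def run_pipeline(doc_ids, handlers, max_attempts=MAX_ATTEMPTS):
    """Run every document through every stage; return (done, dead_letter)."""
    queue = deque(Job(doc_id) for doc_id in doc_ids)
    done, dead_letter = [], []
    while queue:
        job = queue.popleft()
        stage = STAGES[job.stage_index]
        try:
            handlers[stage](job.doc_id)
        except Exception:
            job.attempts += 1
            if job.attempts >= max_attempts:
                # Out of retries: park the document instead of blocking the queue.
                dead_letter.append((job.doc_id, stage))
            else:
                # Re-enqueue at the back so other documents keep moving.
                queue.append(job)
            continue
        # Stage succeeded: advance and reset the retry counter.
        job.stage_index += 1
        job.attempts = 0
        if job.stage_index == len(STAGES):
            done.append(job.doc_id)
        else:
            queue.append(job)
    return done, dead_letter
```

A document that keeps failing at extraction ends up in the dead-letter list with the stage it died in, while the documents queued behind it still complete.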

The trade-off is complexity. Debugging an async pipeline is harder than debugging a request-response endpoint. We've invested heavily in observability from the start (structured logging, trace IDs across services, dead-letter queues for failed documents) because we learned from past projects that you either build this on Day 1 or you build it in a panic at 2 AM on Day 90.
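As a rough sketch of what "structured logging with trace IDs" can mean in practice: one trace ID is minted per upload and attached to every log line as JSON, so a single document can be followed across stages. The field names and the helpers `new_trace_id`, `format_event`, and `log_event` are hypothetical, chosen for the example; only the standard library is used.

```python
import json
import logging
import uuid

logger = logging.getLogger("pipeline")


def new_trace_id() -> str:
    """Mint one ID per uploaded document; every stage reuses it."""
    return uuid.uuid4().hex


def format_event(trace_id: str, stage: str, event: str, **fields) -> str:
    """Render one log line as JSON so it can be filtered by trace_id later."""
    record = {"trace_id": trace_id, "stage": stage, "event": event, **fields}
    return json.dumps(record, sort_keys=True)


def log_event(trace_id: str, stage: str, event: str, **fields) -> None:
    logger.info(format_event(trace_id, stage, event, **fields))
```

Searching the logs for a single `trace_id` then returns the full history of one document, including the stage where it failed.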

Specialized models, not one monolith

We're using separate models for classification (is this an invoice, a PO, a receipt, a bank statement?), extraction (pull out the vendor name, amounts, dates, line items), and validation (does this extracted data make sense given the business rules?).

The alternative was a single end-to-end model. Simpler to deploy, but worse at everything. In my experience working with finance documents across APAC and EMEA, the variation is enormous. A GST invoice from India looks nothing like a VAT invoice from Germany, which looks nothing like a commercial invoice from a freight forwarder in Dubai. Specialized models let us fine-tune each stage independently and swap out one component without retraining the whole system.
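The "swap out one component" point is easiest to see with interfaces. The sketch below is illustrative only: the `Classifier`, `Extractor`, and `Validator` protocols and the toy implementations are assumptions made for the example (real stages would wrap trained models), but the shape shows how each stage can be replaced without touching the others.

```python
import re
from dataclasses import dataclass
from typing import Protocol


class Classifier(Protocol):
    def classify(self, text: str) -> str: ...


class Extractor(Protocol):
    def extract(self, text: str, doc_type: str) -> dict: ...


class Validator(Protocol):
    def validate(self, fields: dict, doc_type: str) -> list: ...


@dataclass
class Pipeline:
    classifier: Classifier
    extractor: Extractor
    validator: Validator

    def process(self, text: str) -> dict:
        doc_type = self.classifier.classify(text)
        fields = self.extractor.extract(text, doc_type)
        errors = self.validator.validate(fields, doc_type)
        return {"doc_type": doc_type, "fields": fields, "errors": errors}


# Toy stand-ins for the three models, just to make the sketch runnable.
class KeywordClassifier:
    def classify(self, text: str) -> str:
        return "invoice" if "invoice" in text.lower() else "unknown"


class TotalExtractor:
    def extract(self, text: str, doc_type: str) -> dict:
        match = re.search(r"total:\s*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
        return {"total": float(match.group(1))} if match else {}


class PositiveTotalValidator:
    def validate(self, fields: dict, doc_type: str) -> list:
        if "total" not in fields:
            return ["missing total"]
        if fields["total"] <= 0:
            return ["total must be positive"]
        return []
```

Replacing `TotalExtractor` with a different extractor changes one constructor argument; the classifier and validator are untouched.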

We're planning to add an anomaly detection model later - something that flags unusual patterns that might indicate duplicate payments or fraud. That's on the roadmap, not in production.

Confidence-based routing with tunable thresholds

Not every document needs human review. Not every document can be fully automated. We're designing a confidence spectrum:

Above 95% confidence: auto-processed, no human touch

80-95%: auto-processed but flagged for async spot-check

Below 80%: routed for human review with AI-suggested values pre-filled

The thresholds are the part we're still tuning. We'll adjust them during private beta based on what our design partners' teams are comfortable with. A two-person finance team at a 30-person company will have a different risk tolerance than a ten-person AP team at a 200-person manufacturer.
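The routing rule itself is small. A minimal sketch, assuming confidence arrives as a float between 0 and 1 and that thresholds are configured per customer; the `Thresholds` class, the `route` function, and the route names are hypothetical, and the boundary handling (exactly 95% goes to spot-check, exactly 80% also goes to spot-check) is one reading of the three bands above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    auto: float = 0.95    # above this: no human touch
    review: float = 0.80  # below this: human review

    def __post_init__(self):
        if not 0.0 <= self.review <= self.auto <= 1.0:
            raise ValueError("expected 0 <= review <= auto <= 1")


def route(confidence: float, thresholds: Thresholds = Thresholds()) -> str:
    """Map a model confidence score to one of the three handling paths."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > thresholds.auto:
        return "auto_process"
    if confidence >= thresholds.review:
        return "auto_process_spot_check"
    return "human_review"
```

A customer with a higher risk tolerance could be given something like `Thresholds(auto=0.90, review=0.75)` without any change to the routing code.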

My opinion: most vendors set their auto-approval threshold too low because it makes their automation numbers look better in demos. We'd rather route a few more documents for review than let a bad extraction slip into someone's GL.

Privacy-first, tenant-isolated infrastructure

Financial documents contain bank account numbers, tax IDs, revenue figures. This isn't "nice to have" security - a breach would be catastrophic for our customers.

Our commitments, in effect today: encryption at rest (AES-256) and in transit (TLS 1.3), AI models trained only on anonymized and synthetic datasets (never on customer documents), and customer data used strictly for inference. This is already in our Privacy Policy.

What we're building toward: full tenant isolation so one customer's data never touches another's infrastructure, customer-managed encryption keys for enterprise accounts, data residency controls (important for EU customers dealing with GDPR, and increasingly for APAC customers too), and automatic PII masking in logs and analytics. These are on the roadmap and not yet in production.
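To give a sense of what PII masking in logs could look like, here is a deliberately crude sketch. It assumes that any run of eight or more digits might be an account number or tax ID and masks all but the last four; that heuristic, and the names `mask_digits` and `MaskingFilter`, are assumptions for illustration, and a real implementation would need far more careful detection.

```python
import logging
import re

# Assumed heuristic: 8+ consecutive digits may be an account number or tax ID.
_LONG_DIGITS = re.compile(r"\d{8,}")


def mask_digits(message: str, keep_last: int = 4) -> str:
    """Replace long digit runs with asterisks, keeping the last few digits."""
    def _mask(match):
        digits = match.group(0)
        keep = max(0, min(keep_last, len(digits)))
        return "*" * (len(digits) - keep) + digits[len(digits) - keep:]
    return _LONG_DIGITS.sub(_mask, message)


class MaskingFilter(logging.Filter):
    """Logging filter that masks the fully formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask_digits(record.getMessage())
        record.args = ()  # message is already formatted
        return True
```

Attaching the filter with `handler.addFilter(MaskingFilter())` means masking happens in one place instead of relying on every call site to remember it.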

The thing we're still arguing about

Caching. Specifically: how aggressively to cache document templates and extraction patterns.

The case for aggressive caching: if we've seen 500 invoices from the same vendor, we should be able to process the 501st nearly instantly by recognizing the template.

The case against: financial documents change. Vendors update their invoice formats. Tax rates change. A cached template that's slightly stale could cause systematic extraction errors across hundreds of documents before anyone notices.

We're leaning toward a hybrid - cache the layout recognition but re-extract the actual values every time - but we haven't committed. If you've solved this problem and want to tell me I'm wrong, I'd genuinely like to hear it.
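A minimal sketch of that hybrid, to make the trade-off easier to argue about. Everything here is an assumption for illustration: the cache is keyed by vendor plus a structural fingerprint of the document (numbers collapsed to a placeholder), so a changed invoice format produces a new key rather than reusing a stale layout, and `detect_layout` and `extract_values` are toy stand-ins for the expensive layout model and the extraction step.

```python
import hashlib
import re


def structural_fingerprint(text: str) -> str:
    """Hash the document with numbers collapsed, so values do not affect the key."""
    skeleton = re.sub(r"\d+(?:[.,]\d+)*", "#", text)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()


class LayoutCache:
    def __init__(self, detect_layout):
        self._detect_layout = detect_layout
        self._layouts = {}
        self.hits = 0
        self.misses = 0

    def layout_for(self, vendor_id: str, text: str) -> dict:
        key = (vendor_id, structural_fingerprint(text))
        if key in self._layouts:
            self.hits += 1
        else:
            # Unknown vendor or changed format: run layout detection again.
            self.misses += 1
            self._layouts[key] = self._detect_layout(text)
        return self._layouts[key]


def process(text: str, vendor_id: str, cache: LayoutCache, extract_values) -> dict:
    layout = cache.layout_for(vendor_id, text)  # cached when the structure matches
    return extract_values(text, layout)         # values are re-extracted every time


# Toy stand-ins so the sketch runs end to end.
def detect_layout(text: str) -> dict:
    for index, line in enumerate(text.splitlines()):
        if line.lower().startswith("total"):
            return {"total_line": index}
    return {}


def extract_values(text: str, layout: dict) -> dict:
    line = text.splitlines()[layout["total_line"]]
    return {"total": float(line.split(":")[1])}
```

Two invoices from the same vendor with the same structure share one cached layout but still get their own extracted totals. A reordered invoice produces a different fingerprint and a cache miss. What this does not catch is a change that leaves the structure identical but shifts meaning, such as a new tax rate, which is part of why the question is still open.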

ScribeArc is in private beta. The architecture described here is how we're building the platform today. Some components noted as planned are on our roadmap and not yet in production.