
Building ScribeArc: The Engineering Decisions That Shaped Our Platform

Tushar Naresh

Co-Founder, ScribeArc · 2026-04-12

Building an AI-powered finance platform isn't just about training the best model or designing the prettiest dashboard. It's about making hundreds of architectural decisions that balance performance, reliability, cost, and user experience. Here's a look inside our engineering process.

Decision 1: Event-Driven vs. Request-Response

Early on, we faced a fundamental architectural question: should document processing be synchronous (request-response) or asynchronous (event-driven)?

The answer was clearly event-driven, for several reasons:

Processing time is inherently variable: a simple one-page invoice might take 2 seconds, while a 50-page contract could take 30 seconds. Handling both synchronously would mean request timeouts and poor UX.

Workload is bursty: Finance teams often upload batches of documents at month-end. Event queues absorb these bursts gracefully.

Failure isolation: If one processing step fails, the document can be retried without affecting other operations.

We built our pipeline on AWS SQS and Lambda, with S3 as the document store. Each document upload triggers a chain of processing events: ingestion → classification → extraction → validation → workflow routing.
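To make the stage chain and the failure-isolation point concrete, here is a minimal sketch in plain Python. It is not our production code: an in-memory queue stands in for SQS, plain callables stand in for Lambda functions, and the names `Event`, `run_pipeline`, and `MAX_ATTEMPTS` are hypothetical. Only the stage order comes from the pipeline described above.

```python
from collections import deque
from dataclasses import dataclass

# Stage order taken from the article; everything else is illustrative.
STAGES = ["ingestion", "classification", "extraction", "validation", "workflow_routing"]
MAX_ATTEMPTS = 3  # hypothetical retry budget per stage


@dataclass
class Event:
    doc_id: str
    stage: str
    attempts: int = 0


def run_pipeline(doc_ids, handlers):
    """Drain an in-memory queue that stands in for SQS.

    `handlers` maps a stage name to a callable taking a doc_id; a handler
    signals failure by raising. Returns (completed, dead_lettered) doc ids.
    """
    queue = deque(Event(doc_id, STAGES[0]) for doc_id in doc_ids)
    completed, dead_lettered = [], []
    while queue:
        event = queue.popleft()
        try:
            handlers[event.stage](event.doc_id)
        except Exception:
            # Failure isolation: only this document's event is retried;
            # other documents keep moving through the queue.
            event.attempts += 1
            if event.attempts < MAX_ATTEMPTS:
                queue.append(event)
            else:
                dead_lettered.append(event.doc_id)
            continue
        next_index = STAGES.index(event.stage) + 1
        if next_index < len(STAGES):
            # Success emits the event that triggers the next stage.
            queue.append(Event(event.doc_id, STAGES[next_index]))
        else:
            completed.append(event.doc_id)
    return completed, dead_lettered
```

In this sketch a document that keeps failing ends up in a dead-letter list after its retry budget is spent, which mirrors how a queue-based design keeps one bad document from blocking a month-end batch.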

Decision 2: Multi-Model Architecture

Rather than relying on a single AI model for everything, we built a multi-model architecture:

Classification Model: Determines document type (invoice, PO, receipt, statement) with 99.2% accuracy

Extraction Models: Specialized models for each document type, fine-tuned on real financial documents

Validation Model: Cross-references extracted data against business rules and historical patterns

Anomaly Detection: Identifies unusual patterns that might indicate errors or fraud

This approach gives us better accuracy than a monolithic model and lets us update individual models without retraining the entire system.
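The dispatch pattern behind this can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `classify` uses keyword matching where a trained classifier would sit, and the extractors return the fields they would look for rather than running real inference. The point is the shape, a classifier choosing among per-type extractors held in a registry.

```python
from typing import Callable, Dict


def classify(text: str) -> str:
    """Toy classifier; a real one would be a trained model."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "purchase order" in lowered:
        return "po"
    if "receipt" in lowered:
        return "receipt"
    return "statement"


def extract_invoice(text: str) -> dict:
    return {"doc_type": "invoice", "fields": ["vendor", "total", "due_date"]}


def extract_po(text: str) -> dict:
    return {"doc_type": "po", "fields": ["buyer", "line_items"]}


def extract_receipt(text: str) -> dict:
    return {"doc_type": "receipt", "fields": ["merchant", "total"]}


def extract_statement(text: str) -> dict:
    return {"doc_type": "statement", "fields": ["account", "period", "balance"]}


# Registry of specialized extractors, keyed by document type.
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "invoice": extract_invoice,
    "po": extract_po,
    "receipt": extract_receipt,
    "statement": extract_statement,
}


def process(text: str) -> dict:
    """Classify first, then hand off to the extractor for that type."""
    doc_type = classify(text)
    return EXTRACTORS[doc_type](text)
```

Because each extractor is a separate registry entry, replacing `EXTRACTORS["receipt"]` with a newer version leaves the other document types untouched.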

Decision 3: Confidence-Based Routing

Not every document needs human review. Not every document can be fully automated. Our confidence-based routing system creates a spectrum:

>95% confidence: Auto-processed, no human touch required

80–95% confidence: Auto-processed with async human audit (spot-checked)

<80% confidence: Routed for human review with AI-suggested values pre-filled

This approach maximizes throughput while maintaining the accuracy standards that financial data demands.
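As a rough illustration, the three bands map onto a small routing function. The thresholds come from the list above; the names `Route` and `route` are hypothetical, and treating exactly 95% and exactly 80% as part of the middle band is an assumption, since the bands as written leave the boundaries open.

```python
from enum import Enum


class Route(Enum):
    AUTO = "auto_processed"
    AUTO_WITH_AUDIT = "auto_processed_with_async_audit"
    HUMAN_REVIEW = "human_review"


AUTO_THRESHOLD = 0.95    # above this: no human touch
REVIEW_THRESHOLD = 0.80  # below this: human review


def route(confidence: float) -> Route:
    """Map a model confidence in [0, 1] to a processing route.

    Assumption: the boundary values 0.95 and 0.80 fall in the audit band.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > AUTO_THRESHOLD:
        return Route.AUTO
    if confidence >= REVIEW_THRESHOLD:
        return Route.AUTO_WITH_AUDIT
    return Route.HUMAN_REVIEW
```

Keeping the thresholds as named constants makes them easy to tune per document type or per customer if the accuracy requirements differ.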

Decision 4: Privacy-First Architecture

Financial documents contain sensitive data — bank account numbers, tax IDs, revenue figures. Our privacy architecture includes:

Encryption at rest and in transit: AES-256 for stored data and TLS 1.3 for data on the wire

Tenant isolation: one customer's data never touches another customer's infrastructure

Data residency controls: customers specify where their data is stored

Automatic PII detection and masking: applied to logs and analytics
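To show what log masking can look like in practice, here is a minimal sketch built on Python's standard `logging` module. The two regular expressions are hypothetical and deliberately simplistic (a US SSN-style tax ID and long digit runs that resemble account numbers); real PII detection needs much broader coverage than this.

```python
import logging
import re

# Hypothetical patterns for illustration only.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[TAX_ID]"),   # SSN-style tax ID
    (re.compile(r"\b\d{8,17}\b"), "[ACCOUNT_NUMBER]"),    # long digit runs
]


def mask_pii(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class PiiMaskingFilter(logging.Filter):
    """Logging filter that masks the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so values passed as arguments are masked too.
        record.msg = mask_pii(record.getMessage())
        record.args = None
        return True  # keep the record, now with masked content
```

Attaching the filter to a handler with `handler.addFilter(PiiMaskingFilter())` masks records from every logger that writes through that handler.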

What We'd Do Differently

No system is perfect, and hindsight is valuable. If we were starting over:

1. We'd invest more in observability from Day 1. Our early monitoring was insufficient for debugging complex multi-model pipelines.

2. We'd adopt a more aggressive caching strategy for frequently accessed document templates.

3. We'd build our internal tooling earlier — the admin dashboard and model evaluation tools that make our engineering team more productive.

Building in public means being honest about these lessons. We hope sharing our journey helps other teams building in this space.