Building ScribeArc: The Engineering Decisions That Shaped Our Platform
Tushar Naresh
Co-Founder, ScribeArc · 2026-04-12
Building an AI-powered finance platform isn't just about training the best model or designing the prettiest dashboard. It's about making hundreds of architectural decisions that balance performance, reliability, cost, and user experience. Here's a look inside our engineering process.
Decision 1: Event-Driven vs. Request-Response
Early on, we faced a fundamental architectural question: should document processing be synchronous (request-response) or asynchronous (event-driven)?
The answer was clearly event-driven, for several reasons:
Document processing is inherently variable: A simple one-page invoice might take 2 seconds; a 50-page contract could take 30 seconds. Synchronous processing would mean timeouts and poor UX.
Workload is bursty: Finance teams often upload batches of documents at month-end. Event queues absorb these bursts gracefully.
Failure isolation: If one processing step fails, the document can be retried without affecting other operations.
We built our pipeline on Amazon SQS and AWS Lambda, with Amazon S3 as the document store. Each document upload triggers a chain of processing events: ingestion → classification → extraction → validation → workflow routing.
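To make the event chain concrete, here is a minimal in-memory sketch of the idea, not our production code. A `deque` stands in for the queue, and the names `Event`, `run_pipeline`, and `MAX_ATTEMPTS` are hypothetical, chosen only for illustration. Only the stage order comes from the pipeline described above.

```python
from collections import deque
from dataclasses import dataclass, field

# Stage order taken from the article; everything else here is hypothetical.
STAGES = ["ingestion", "classification", "extraction", "validation", "workflow_routing"]
MAX_ATTEMPTS = 3  # assumed retry budget for this sketch


@dataclass
class Event:
    document_id: str
    stage: str
    attempts: int = 0
    history: list = field(default_factory=list)


def run_pipeline(queue, handlers):
    """Drain the queue, advancing each document one stage per event."""
    completed, dead_letter = [], []
    while queue:
        event = queue.popleft()
        try:
            handlers[event.stage](event)
        except Exception:
            event.attempts += 1
            # Retry only the failing document; other queued events are unaffected.
            if event.attempts < MAX_ATTEMPTS:
                queue.append(event)
            else:
                dead_letter.append(event)
            continue
        event.history.append(event.stage)
        next_index = STAGES.index(event.stage) + 1
        if next_index < len(STAGES):
            # Emit a fresh event for the next stage with a reset attempt counter.
            queue.append(Event(event.document_id, STAGES[next_index], 0, event.history))
        else:
            completed.append(event)
    return completed, dead_letter
```

In this sketch a handler that raises is retried up to the assumed budget and then parked in a dead-letter list, while every other document keeps moving through its own stages.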
Decision 2: Multi-Model Architecture
Rather than relying on a single AI model for everything, we built a multi-model architecture:
Classification Model: Determines document type (invoice, PO, receipt, statement) with 99.2% accuracy
Extraction Models: Specialized models for each document type, fine-tuned on real financial documents
Validation Model: Cross-references extracted data against business rules and historical patterns
Anomaly Detection: Identifies unusual patterns that might indicate errors or fraud
This approach gives us better accuracy than a monolithic model, and allows us to update individual models without retraining the entire system.
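One way to picture the "update individual models" point is a registry keyed by document type. The sketch below is illustrative only: `keyword_classifier` is a trivial placeholder for the classification model, and `make_stub_extractor` and `process_document` are hypothetical names, not part of our codebase.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]
Extractor = Callable[[str], Dict[str, str]]

# Placeholder keyword table; a real classifier would be a trained model.
KEYWORDS = {
    "invoice": "invoice",
    "purchase order": "po",
    "receipt": "receipt",
    "statement": "statement",
}


def keyword_classifier(text: str) -> str:
    """Stand-in for the classification model: a naive keyword match."""
    lowered = text.lower()
    for keyword, doc_type in KEYWORDS.items():
        if keyword in lowered:
            return doc_type
    raise ValueError("unrecognized document type")


def make_stub_extractor(doc_type: str, version: str) -> Extractor:
    """Build a fake extractor that only reports which model handled the text."""
    def extract(text: str) -> Dict[str, str]:
        return {"document_type": doc_type, "model_version": version}
    return extract


def process_document(text: str, classifier: Classifier,
                     extractors: Dict[str, Extractor]) -> Dict[str, str]:
    """Classify first, then dispatch to the extractor registered for that type."""
    doc_type = classifier(text)
    return extractors[doc_type](text)
```

Swapping in a new invoice model is then a single registry assignment, such as `extractors["invoice"] = make_stub_extractor("invoice", "v2")`, and the other document types are untouched.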
Decision 3: Confidence-Based Routing
Not every document needs human review. Not every document can be fully automated. Our confidence-based routing system creates a spectrum:
>95% confidence: Auto-processed, no human touch required
80–95% confidence: Auto-processed with async human audit (spot-checked)
<80% confidence: Routed for human review with AI-suggested values pre-filled
This approach maximizes throughput while maintaining the accuracy standards that financial data demands.
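The three bands above translate into a small routing function. This is a sketch under a few assumptions: confidence is expressed as a fraction between 0 and 1, and the names `Route` and `route_document` are hypothetical. The thresholds follow the bands as written, so a score of exactly 95% lands in the audited band.

```python
from enum import Enum


class Route(Enum):
    AUTO = "auto_processed"
    AUTO_WITH_AUDIT = "auto_processed_with_async_audit"
    HUMAN_REVIEW = "human_review"


AUTO_THRESHOLD = 0.95    # strictly above this: no human touch
REVIEW_THRESHOLD = 0.80  # strictly below this: human review


def route_document(confidence: float) -> Route:
    """Map a model confidence score in [0, 1] to a processing route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > AUTO_THRESHOLD:
        return Route.AUTO
    if confidence >= REVIEW_THRESHOLD:
        return Route.AUTO_WITH_AUDIT
    return Route.HUMAN_REVIEW
```

Keeping the thresholds as named constants makes it easy to tune the bands per document type or per customer without touching the routing logic.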
Decision 4: Privacy-First Architecture
Financial documents contain sensitive data — bank account numbers, tax IDs, revenue figures. Our privacy architecture includes:
Encryption at rest and in transit: AES-256 for stored data and TLS 1.3 for data on the wire
Tenant isolation: one customer's data never touches another customer's infrastructure
Data residency controls: customers specify where their data is stored
Automatic PII detection and masking: applied to logs and analytics
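As a rough illustration of the log-masking item, here is a pattern-based sketch built on Python's standard `logging.Filter`. It is a simplification: the two regular expressions (a US-style tax ID and a run of 8 to 17 digits treated as an account number) and the names `mask_pii` and `PiiMaskingFilter` are assumptions for this example, and real PII detection needs far broader coverage than two patterns.

```python
import logging
import re

# Assumed patterns for illustration only.
TAX_ID = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ACCOUNT_NUMBER = re.compile(r"\b\d{8,17}\b")


def mask_pii(text: str) -> str:
    """Replace tax IDs with a tag and keep only the last 4 digits of account numbers."""
    text = TAX_ID.sub("[TAX_ID]", text)
    return ACCOUNT_NUMBER.sub(
        lambda match: "*" * (len(match.group()) - 4) + match.group()[-4:],
        text,
    )


class PiiMaskingFilter(logging.Filter):
    """Logging filter that masks the fully formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask_pii(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True
```

Attaching the filter to a handler with `handler.addFilter(PiiMaskingFilter())` means every record passing through that handler is masked, regardless of which module logged it.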
What We'd Do Differently
No system is perfect, and hindsight is valuable. If we were starting over:
1. We'd invest more in observability from Day 1. Our early monitoring was insufficient for debugging complex multi-model pipelines.
2. We'd adopt a more aggressive caching strategy for frequently accessed document templates.
3. We'd build our internal tooling earlier — the admin dashboard and model evaluation tools that make our engineering team more productive.
Building in public means being honest about these lessons. We hope sharing our journey helps other teams building in this space.