Private Beta · Validating models with 20 finance teams.
Security & Privacy · 8 min read

Your Invoices Have Bank Account Numbers in Them. Let's Talk About That.

Founder

Domain Architect, Finance B2B Operations · 2026-03-28

A few months ago, a friend who runs a 50-person logistics company asked if he could try ScribeArc. Before I could even show him the demo, his first question was: "Where does my data go?"

Not "how accurate is it?" Not "how fast is it?" Where does my data go.

I liked that question. It's the right question, and it's the one most vendors try to rush past with a compliance logo wall and a vague mention of "enterprise-grade security." So let me actually answer it.

What's in a financial document, really

When I worked in finance ops, I handled thousands of invoices, bank statements, and vendor contracts. After a while, you stop seeing them as documents and start seeing them as bundles of sensitive data:

Bank account numbers and routing information - enough to initiate fraudulent transfers

Tax IDs (GST numbers in India, VAT IDs in the EU, EINs in the US) - useful for identity fraud

Revenue figures, pricing, and contract terms - competitive intelligence gold

Vendor and customer relationships - who you buy from, who buys from you, at what prices

A breach of this data isn't just embarrassing. It's an operational and legal catastrophe. And the attack surface grows the moment you add AI, because now there's a model processing all of this data, a pipeline generating logs about it, and the possibility that the model retains patterns from what it has seen.

Our hard lines

Some of our security decisions are philosophical commitments, not just technical choices. Here are the ones we've drawn hard lines on:

We will never train models on customer data. Full stop. Our AI models are trained on anonymized, synthetic, and licensed datasets. Your invoices are used for inference - meaning the model reads them to extract data - and nothing else. This is in our Privacy Policy today, not a future plan.

Customer data is for your benefit, not ours. We don't aggregate customer data for analytics, benchmarking, or product improvement without explicit, per-customer consent. The fact that your AP department processes a lot of invoices from a particular vendor is your business information, not ours to monetize.

We default to less data, not more. Extracted data points get stored. Original documents get retained based on your retention policy, not ours. Logs are scrubbed of PII before they hit storage.
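To make the log-scrubbing point concrete, here is a minimal Python sketch of redacting sensitive-looking values before a log line is written. It is an illustration under stated assumptions, not the ScribeArc implementation: the `scrub` helper, the `ScrubFilter` class, and the two regular expressions are hypothetical, and real PII detection needs far more than two patterns.

```python
import logging
import re

# Hypothetical patterns for illustration only. Order matters: the EIN shape
# is replaced first so its digits are not picked up by the broader pattern.
PATTERNS = [
    (re.compile(r"\b\d{2}-\d{7}\b"), "[EIN]"),      # US EIN shape, e.g. 12-3456789
    (re.compile(r"\b\d{9,18}\b"), "[ACCOUNT]"),     # long digit runs (account or routing numbers)
]


def scrub(text: str) -> str:
    """Replace anything matching a known pattern with a placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class ScrubFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so values passed as arguments are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to a handler (`handler.addFilter(ScrubFilter())`) means the redaction happens before that handler writes anything, which is the property that matters: the raw value never reaches storage through that path.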

What we've built and what we're building

I want to be clear about what's in production versus what's on our roadmap, because I think the fintech industry has a bad habit of blurring that line.

In production now:

Encryption at rest (AES-256) and in transit (TLS 1.3) for all data

Model training exclusively on non-customer datasets

PII-scrubbed logging
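For readers who want to see what "TLS 1.3 in transit" can mean in practice, here is a small Python sketch of a client that refuses to negotiate anything older. It uses only the standard library `ssl` module; the function name `tls13_client_context` is made up for this example, and it says nothing about how our own services are configured.

```python
import ssl


def tls13_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects anything below TLS 1.3."""
    # create_default_context() turns on certificate verification
    # and hostname checking.
    ctx = ssl.create_default_context()
    # Refuse to negotiate TLS 1.2 or older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

A context like this can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=tls13_client_context())`. A connection to a server that only speaks older protocol versions then fails at the handshake instead of silently downgrading.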

On our roadmap (designed, not yet deployed):

Full tenant isolation - each customer's data in logically separate infrastructure, so cross-tenant access is architecturally impossible

Customer-managed encryption keys (CMEK) for enterprise accounts

Data residency controls - letting customers specify where their data is stored, which matters a lot for EU companies under GDPR and increasingly for APAC businesses too

Automatic PII detection and redaction in analytics and monitoring

Real-time anomaly detection on access patterns

Prompt injection protections as we integrate more LLM capabilities
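The tenant isolation item is the easiest one on this list to picture in code. The sketch below is a deliberately small Python illustration of the underlying idea, that every read and write is scoped to a tenant and there is no way to ask for "a document" without saying whose. It shows application-level scoping only, which is weaker than the infrastructure-level separation described above, and `TenantContext` and `TenantStore` are hypothetical names, not part of any ScribeArc API.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TenantContext:
    """Identifies the tenant on whose behalf a request runs (hypothetical)."""
    tenant_id: str


class TenantStore:
    """In-memory store where each tenant has its own partition."""

    def __init__(self) -> None:
        # tenant_id -> {doc_id: payload}
        self._partitions: Dict[str, Dict[str, Any]] = {}

    def put(self, ctx: TenantContext, doc_id: str, payload: Any) -> None:
        self._partitions.setdefault(ctx.tenant_id, {})[doc_id] = payload

    def get(self, ctx: TenantContext, doc_id: str) -> Any:
        # Lookups only ever touch the caller's own partition, so another
        # tenant's document is indistinguishable from a missing one.
        partition = self._partitions.get(ctx.tenant_id, {})
        if doc_id not in partition:
            raise KeyError(doc_id)
        return partition[doc_id]
```

The design choice worth noticing is that there is no method that takes a document ID alone. A request running as one tenant that asks for another tenant's document ID gets the same `KeyError` it would get for an ID that does not exist.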

On compliance

We're actively working toward a SOC 2 Type II report and GDPR alignment. We don't have the SOC 2 report yet - we're pre-launch, and a Type II audit assesses how controls operate over an observation period, which requires an operational track record. I'd rather be honest about that than slap a "SOC 2 Compliant" badge on our site prematurely.

What I will say: we're building the controls, audit trails, and documentation from Day 1 so that when we pursue certification, it's a validation of what we already do, not a retrofit.

The opinion nobody asked for

Here's my slightly tired take after fifteen years in this space: most fintech security pages are theater. Long lists of compliance certifications, stock photos of padlocks, and paragraphs about "military-grade encryption." Meanwhile, the actual question - "what happens to my data, who can see it, and how do I get it back if I leave?" - gets buried in a legal document nobody reads.

We're trying to do this differently. Not because we're more virtuous, but because the founders of this company have personally handled the kind of sensitive financial data that would be catastrophic if leaked. We know what's at stake because we've been on the other side of the table.

ScribeArc is in private beta. Security features noted as "on our roadmap" are designed but not yet deployed.