The Architecture Behind Qvault

Why your documents never leave your machine — a deep dive into local-first PII detection and redaction.

Contents

  1. The Local-First Architecture
  2. Dual-Layer PII Detection Engine
  3. Document Processing Pipeline
  4. Frontend Rendering Pipeline
  5. Performance Characteristics
  6. What Qvault Does NOT Do
  7. Cross-Platform Distribution
  8. Jurisdictional Coverage
  9. The Trust Model

At a glance: 0 bytes sent to the cloud, 5 jurisdictions covered, 18 IPC commands, and a typical scan time under 100 ms.

Legal professionals handle some of the most sensitive information in existence: Social Security numbers, financial records, personal addresses, medical data, and confidential business details. Every day, law firms process thousands of documents containing personally identifiable information (PII) that, if exposed, could harm clients and violate regulations like GDPR, CCPA, and LGPD.

Most document processing tools require uploading files to a cloud server. For a law firm, that means confidential client data, often protected by attorney-client privilege, leaves the machine and travels across the internet to be processed on someone else's infrastructure.

Qvault takes a fundamentally different approach. Every byte of every document stays on your computer. There is no cloud processing, no data upload, no server-side analysis. The entire PII detection and redaction pipeline runs locally, inside a native desktop application built with Rust.

The Local-First Architecture

Qvault is built on Tauri v2, a framework that pairs a Rust backend with a lightweight web frontend. Unlike Electron (which bundles an entire Chromium browser), Tauri uses the operating system's native webview — WebKit on macOS, WebView2 on Windows, WebKitGTK on Linux. The result is a binary that's a fraction of the size of an Electron app, with lower memory usage and faster startup times.

The frontend is React 18 with TypeScript and Tailwind CSS, rendered inside the native webview. The two layers communicate through Tauri's IPC bridge — a type-safe, serialized message-passing system that connects JavaScript function calls to Rust command handlers.

The IPC Bridge

Qvault exposes 18 IPC commands that the frontend can invoke. These are not REST endpoints or WebSocket messages — they're direct function calls serialized through Tauri's command system, with no network stack involved.

Every command runs in the same process as the application. There's no HTTP overhead, no serialization to JSON over a wire, no latency from network hops. A scan command completes in single-digit milliseconds locally.

Dual-Layer PII Detection Engine

The core of Qvault is its PII detection pipeline, which runs two complementary scanners in sequence.

Layer 1: Pattern-Based Regex Scanner

The first layer uses compiled regular expressions to detect structured PII — data that follows a known format. The scanner contains patterns organized across five jurisdictions:

Global patterns cover universally formatted data: email addresses, credit card numbers (Visa, Mastercard, Amex, and Diners Club, matched as Luhn-compatible digit groups), IBAN numbers, IPv4 addresses, URLs, monetary amounts (multi-currency: USD, EUR, GBP, BRL), dates, and phone numbers.
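The article does not say whether a credit-card candidate's checksum is verified after the regex match. The Luhn (mod 10) check is one cheap post-filter such a scanner could apply. The std-only sketch below is illustrative. `luhn_valid` is a hypothetical name, and the 13 to 19 digit length bounds are an assumption chosen to cover the brands listed above.

```rust
/// Returns true if `candidate` passes the Luhn (mod 10) checksum.
/// Spaces and hyphens (common card-number separators) are ignored;
/// any other non-digit character makes the candidate invalid.
pub fn luhn_valid(candidate: &str) -> bool {
    let mut digits: Vec<u32> = Vec::with_capacity(candidate.len());
    for c in candidate.chars() {
        if c == ' ' || c == '-' {
            continue;
        }
        match c.to_digit(10) {
            Some(d) => digits.push(d),
            None => return false,
        }
    }
    // Assumed length bounds covering Visa, Mastercard, Amex, and Diners Club.
    if digits.len() < 13 || digits.len() > 19 {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                // Double every second digit from the right; subtract 9 if the result exceeds 9.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

A random digit string passes the Luhn check only about one time in ten. A filter like this could therefore discard most order numbers or account references that merely look like card numbers.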

Regional patterns include US Social Security Numbers and EINs, EU VAT identification numbers, Brazilian CPF and CNPJ tax IDs, and German Steuernummer formats.
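Some of these identifiers carry their own check digits, so a format match can be confirmed arithmetically. A Brazilian CPF ends in two mod-11 check digits. The article does not say whether Qvault performs this check, so the std-only sketch below is only an illustration. `cpf_valid` and `check_digit` are hypothetical names.

```rust
/// Validates a Brazilian CPF (individual taxpayer ID) by its two mod-11 check digits.
/// Accepts formatted ("529.982.247-25") or bare ("52998224725") input.
pub fn cpf_valid(input: &str) -> bool {
    let mut digits: Vec<u32> = Vec::with_capacity(11);
    for c in input.chars() {
        match c {
            '.' | '-' => {} // formatting separators are skipped
            _ => match c.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false,
            },
        }
    }
    if digits.len() != 11 {
        return false;
    }
    // Repeated-digit sequences such as 111.111.111-11 satisfy the arithmetic
    // but are conventionally rejected as invalid.
    if digits.iter().all(|&d| d == digits[0]) {
        return false;
    }
    check_digit(&digits[..9]) == digits[9] && check_digit(&digits[..10]) == digits[10]
}

/// Computes one CPF check digit: weights count down from prefix.len() + 1 to 2.
fn check_digit(prefix: &[u32]) -> u32 {
    let n = prefix.len() as u32;
    let sum: u32 = prefix
        .iter()
        .enumerate()
        .map(|(i, &d)| d * (n + 1 - i as u32))
        .sum();
    let rem = sum % 11;
    if rem < 2 { 0 } else { 11 - rem }
}
```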

Every regex pattern is compiled once at initialization and reused. Rust's regex crate compiles patterns to finite automata and guarantees matching time linear in the size of the input, so catastrophic backtracking cannot occur. Each match is assigned a confidence score of 0.95.

The scanner implements overlap prevention: when a high-priority pattern matches a region of text, lower-priority patterns skip that region entirely.
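One simple way to implement this rule is a greedy pass over candidate matches sorted by priority. This assumes each pattern carries a numeric priority; the article does not say how priorities are assigned. The `Match` type and `resolve_overlaps` function are hypothetical.

```rust
/// A candidate PII match over byte offsets [start, end) in the scanned text.
#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub start: usize,
    pub end: usize,
    pub priority: u8, // hypothetical: higher value wins
    pub kind: &'static str,
}

/// Keeps higher-priority matches and drops any lower-priority match that
/// overlaps a region already claimed. Returns the survivors in text order.
pub fn resolve_overlaps(mut candidates: Vec<Match>) -> Vec<Match> {
    // Highest priority first; among equal priorities, the earlier match wins.
    candidates.sort_by(|a, b| b.priority.cmp(&a.priority).then(a.start.cmp(&b.start)));
    let mut kept: Vec<Match> = Vec::new();
    for m in candidates {
        // Half-open intervals overlap iff each starts before the other ends.
        let overlaps = kept.iter().any(|k| m.start < k.end && k.start < m.end);
        if !overlaps {
            kept.push(m);
        }
    }
    kept.sort_by_key(|m| m.start);
    kept
}
```

Scanning `kept` linearly makes this quadratic in the number of matches. That is acceptable for per-page match counts; an interval tree would scale better for very large inputs.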

Layer 2: Heuristic Context Scanner

Structured patterns only catch PII that follows a predictable format. But some of the most sensitive information in legal documents — people's names and company names — doesn't follow any fixed pattern.

The heuristic scanner runs six detection passes:

  1. Company names with legal suffixes. Capitalized word sequences followed by entity types (LLC, Inc, Corp, GmbH, SA, SARL, and 11 others).
  2. Names from distribution tables. Shareholder/ownership tables formatted as "Entity Name — XX.XXXX%".
  3. Names from entity descriptions. Context clues like "...of Meridian Partners, a Delaware limited liability company..."
  4. Person names with titles. Capitalized name sequences followed by roles (Manager, Director, Officer). Supports international name connectors: de, da, do, van, von, del, di, el, la, le.
  5. Person names in distribution contexts. Individual names in percentage-distribution tables.
  6. General capitalized sequences. Consecutive capitalized words evaluated with contextual signals (nearby terms like "Member", "Shareholder", "signed by").
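Pass 4 (person names with titles) can be approximated with plain token scanning. The std-only sketch below is an assumption about how such a pass might look, not Qvault's implementation. The role list is a hypothetical three-word subset, the connector list is the one given above, and the function names are illustrative.

```rust
/// Lowercase particles that may appear inside a person's name (the article's list).
const CONNECTORS: [&str; 10] = ["de", "da", "do", "van", "von", "del", "di", "el", "la", "le"];
/// Hypothetical subset of role words that mark the preceding words as a name.
const ROLES: [&str; 3] = ["Manager", "Director", "Officer"];

fn is_capitalized(word: &str) -> bool {
    word.chars().next().map_or(false, char::is_uppercase)
}

/// Finds names such as "Maria da Silva" in text like "Maria da Silva, Manager".
pub fn find_titled_names(text: &str) -> Vec<String> {
    // Tokenize on whitespace and strip surrounding punctuation such as commas.
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .collect();
    let mut names = Vec::new();
    for (i, word) in words.iter().enumerate() {
        if !ROLES.contains(word) {
            continue;
        }
        // Walk backwards over capitalized words and lowercase connectors.
        let mut start = i;
        while start > 0 {
            let prev = words[start - 1];
            if (is_capitalized(prev) && !ROLES.contains(&prev)) || CONNECTORS.contains(&prev) {
                start -= 1;
            } else {
                break;
            }
        }
        // A name must begin with a capitalized word, not a connector.
        while start < i && CONNECTORS.contains(&words[start]) {
            start += 1;
        }
        let name = &words[start..i];
        // Require at least two capitalized parts, e.g. a given name and a surname.
        if name.iter().filter(|w| is_capitalized(w)).count() >= 2 {
            names.push(name.join(" "));
        }
    }
    names
}
```

Requiring two capitalized parts keeps a lone sentence-initial word such as "The" in "The Director" from being reported as a name. The article does not state what threshold Qvault uses.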

To avoid false positives, the scanner maintains 25 stop phrases and 71 stop words covering common legal terminology. Confidence scores range from 0.75 to 0.92 depending on contextual evidence.
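Pass 1 (company names with legal suffixes) and the stop-word filter can be sketched in the same style. This is an illustration under stated assumptions, not Qvault's code. The suffix list is a six-entry subset of the 17 the article mentions, the four stop words are hypothetical stand-ins for the real list of 71, and `find_companies` is an illustrative name.

```rust
use std::collections::HashSet;

/// Subset of the legal-entity suffixes named in the article.
const SUFFIXES: [&str; 6] = ["LLC", "Inc", "Corp", "GmbH", "SA", "SARL"];
/// Hypothetical stop words: capitalized terms never absorbed into a company name.
const STOP_WORDS: [&str; 4] = ["The", "This", "Agreement", "Between"];

/// Finds "Capitalized Words + legal suffix" sequences, e.g. "Meridian Partners LLC".
pub fn find_companies(text: &str) -> Vec<String> {
    let suffixes: HashSet<&str> = SUFFIXES.iter().copied().collect();
    let stop: HashSet<&str> = STOP_WORDS.iter().copied().collect();
    // Strip surrounding punctuation so "LLC," and "Inc." match their suffixes.
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
        .collect();
    let mut found = Vec::new();
    for (i, &word) in words.iter().enumerate() {
        if !suffixes.contains(word) {
            continue;
        }
        // Walk backwards over capitalized words that are not stop words.
        let mut start = i;
        while start > 0 {
            let prev = words[start - 1];
            let capitalized = prev.chars().next().map_or(false, char::is_uppercase);
            if !capitalized || stop.contains(prev) {
                break;
            }
            start -= 1;
        }
        // A bare suffix ("the LLC") is not reported.
        if start < i {
            found.push(words[start..=i].join(" "));
        }
    }
    found
}
```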

Why two layers? Regex alone can't detect names. Machine learning is slow and often requires cloud inference. Qvault's dual-layer approach combines the precision of pattern matching with the flexibility of contextual analysis — all running locally, all in milliseconds.

Document Processing Pipeline

Upload → Store → Extract Text → Scan PII → Review → Export

Extraction

PDF extraction uses lopdf, a pure-Rust PDF library that parses the PDF's internal object tree and extracts text per page. On the frontend, PDF.js independently extracts text spans with precise coordinate data: position, width, and height for every text run. This dual extraction provides both raw text (for scanning) and spatial layout (for overlay rendering and export).
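The scanner reports character offsets, but the overlay needs boxes, so the two extractions have to be joined. A minimal sketch follows. It assumes the page text is the concatenation of the text spans in order, and that characters within a span have uniform width. That is an approximation, since real glyph widths vary. `TextSpan`, `Rect`, and `offsets_to_rects` are hypothetical names.

```rust
/// One text run with its viewport-space box (for example, derived from a
/// PDF.js text item's transform, width, and height).
#[derive(Debug, Clone)]
pub struct TextSpan {
    pub text: String,
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Maps a detection covering characters [start, end) of the page text to one
/// rectangle per overlapping span, assuming uniform character width per span.
pub fn offsets_to_rects(spans: &[TextSpan], start: usize, end: usize) -> Vec<Rect> {
    let mut rects = Vec::new();
    let mut span_start = 0; // character offset where the current span begins
    for span in spans {
        let len = span.text.chars().count();
        let span_end = span_start + len;
        // Intersect [start, end) with this span's [span_start, span_end).
        let lo = start.max(span_start);
        let hi = end.min(span_end);
        if lo < hi {
            let char_w = span.width / len as f64;
            rects.push(Rect {
                x: span.x + (lo - span_start) as f64 * char_w,
                y: span.y,
                width: (hi - lo) as f64 * char_w,
                height: span.height,
            });
        }
        span_start = span_end;
    }
    rects
}
```

A detection that crosses a span boundary yields one rectangle per span, which is how a single SSN split across two text runs would still be fully covered.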

DOCX extraction treats the file as what it is: a ZIP archive containing XML. Qvault opens the archive, locates word/document.xml, and parses it with quick-xml, collecting text from paragraph and run elements.

Storage

All metadata lives in a local SQLite database running in WAL (Write-Ahead Logging) mode for concurrent read/write performance. The schema tracks documents, redactions, entities, page text, audit logs, and credits.

The entities table is noteworthy: when Qvault detects a name in one document, it normalizes and stores that entity. When processing subsequent documents, the system has a growing knowledge base of known PII. This cross-document intelligence means detection accuracy improves over time without any cloud-based learning.
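The article does not describe how entities are normalized. As a hedged illustration, the sketch below lowercases, strips punctuation, and collapses whitespace. It keeps normalized entities in a `HashSet` as an in-memory stand-in for the SQLite entities table. `normalize_entity` and `KnownEntities` are hypothetical names.

```rust
use std::collections::HashSet;

/// Normalizes an entity for cross-document matching: drops punctuation,
/// lowercases (Unicode-aware), and collapses runs of whitespace.
/// These rules are an assumption, not Qvault's documented behavior.
pub fn normalize_entity(raw: &str) -> String {
    raw.split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric())
                .flat_map(char::to_lowercase)
                .collect::<String>()
        })
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

/// In-memory stand-in for the entities table.
#[derive(Default)]
pub struct KnownEntities {
    seen: HashSet<String>,
}

impl KnownEntities {
    /// Records an entity; returns true if it was not already known.
    pub fn insert(&mut self, raw: &str) -> bool {
        self.seen.insert(normalize_entity(raw))
    }

    /// Checks whether `raw` matches a previously stored entity.
    pub fn contains(&self, raw: &str) -> bool {
        self.seen.contains(&normalize_entity(raw))
    }
}
```

With normalization in place, "Meridian Partners, LLC" in one document and "MERIDIAN PARTNERS LLC" in another resolve to the same stored entity.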

Frontend Rendering Pipeline

Rendering a PDF with redaction overlays in real time is a non-trivial challenge. Qvault's DocumentViewer manages a multi-layer rendering stack:

  1. PDF canvas layer. PDF.js renders each page to an HTML5 canvas at 1.5x viewport scale for crisp text on high-DPI displays.
  2. Text span extraction. Every text fragment is extracted with position data in both PDF and viewport coordinate systems.
  3. Redaction overlay layer. PII detections are mapped from character offsets to spatial coordinates. Color-coded overlays indicate PII category and review status.
  4. Selection layer. In advanced mode, users can select arbitrary text to create manual redactions with a popup interface.
  5. Export coordinate mapping. Viewport coordinates are translated back to PDF space. The Rust backend injects black rectangle operators into the PDF's content streams — permanently redacting the selected regions.
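The article does not show the export transform, but for the common case it is a scale and a vertical flip. PDF user space has its origin at the bottom-left with y pointing up, while the viewport measures from the top-left with y pointing down. The sketch below assumes an unrotated page whose MediaBox starts at (0, 0), with the canvas scale of 1.5 described above. It emits standard PDF operators: `q`/`Q` save and restore graphics state, `rg` sets the fill color, `re` appends a rectangle, and `f` fills it. `ViewportRect`, `viewport_to_pdf`, and `black_box_ops` are hypothetical names. The sketch only paints the box. A painted rectangle does not by itself delete the text drawn underneath it, so a complete redaction step also has to remove or rewrite those text operators.

```rust
/// A rectangle in viewport space (origin top-left, y down, scaled pixels).
#[derive(Debug, Clone, Copy)]
pub struct ViewportRect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Converts a viewport rectangle to PDF user space (origin bottom-left, y up,
/// units of points). Returns (x, y, width, height) of the lower-left corner box.
/// Assumes an unrotated page whose MediaBox starts at (0, 0).
pub fn viewport_to_pdf(r: ViewportRect, scale: f64, page_height: f64) -> (f64, f64, f64, f64) {
    let x = r.x / scale;
    let w = r.width / scale;
    let h = r.height / scale;
    // Flip the y axis: the viewport measures down from the top edge,
    // and PDF rectangles are anchored at their lower-left corner.
    let y = page_height - r.y / scale - h;
    (x, y, w, h)
}

/// Emits content-stream operators that paint a filled black box.
pub fn black_box_ops(x: f64, y: f64, w: f64, h: f64) -> String {
    format!("q 0 0 0 rg {:.2} {:.2} {:.2} {:.2} re f Q\n", x, y, w, h)
}
```

For a US Letter page (792 points tall) rendered at 1.5x, a viewport box at (150, 300) that is 75 × 30 pixels maps to a 50 × 20 point box whose lower-left corner is at (100, 572).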

This entire pipeline runs locally. The PDF never leaves the app. The coordinates never leave the app. The redacted export is written directly to the local filesystem.

Performance Characteristics

Factor                 How it helps
Compiled regex         Finite automata: linear-time matching with no catastrophic backtracking. A 50–100 page document scans in under 100 ms.
Single-process IPC     Tauri commands are in-process function calls. Serialization overhead is microseconds, not milliseconds.
SQLite WAL mode        Readers proceed concurrently with a writer. The frontend queries redaction data while the backend processes new pages.
Streaming processing   Pages are scanned individually and results stored incrementally, so users see detections in real time.
Native webview         ~15–20 MB installed size (vs. 150+ MB for Electron). Sub-second startup.
Zero network latency   No upload, no download, no waiting. Processing is purely CPU-bound, measured in milliseconds.

What Qvault Does NOT Do

Understanding what Qvault avoids is as important as understanding what it does:

Cross-Platform Distribution

Qvault runs natively on macOS (Apple Silicon and Intel), Windows, Linux, iOS, and Android:

Platform   Formats           Notes
macOS      .dmg              Separate builds for Apple Silicon (ARM64) and Intel (x86_64)
Windows    .exe, .msi        NSIS installer with multi-language support (EN/ES/PT)
Linux      .deb, .AppImage   Debian/Ubuntu packages and universal AppImage
iOS        .ipa              iPhone and iPad; touch-optimized UI with bottom tab navigation
Android    .apk              Phones and tablets; responsive layout with document picker

All desktop builds are produced by a single GitHub Actions workflow using a matrix strategy. Mobile builds use Tauri v2's native iOS (Xcode) and Android (Gradle) toolchains, sharing the same Rust backend and React frontend.

Enterprise Integration

For organizations requiring enhanced PII detection beyond regex and heuristic analysis, Qvault offers enterprise integration options:

Enterprise integration is available through Santacroce SL. Contact info@santacroce.es to discuss your requirements.

Jurisdictional Coverage

Legal work is increasingly international. A firm in Madrid might handle documents containing US Social Security Numbers, Brazilian CPFs, and German tax IDs — all in the same case.

Jurisdiction     PII types detected
Global           Email, credit cards, IBAN, IP addresses, URLs, monetary amounts, dates, phone numbers
United States    Social Security Numbers (SSN), Employer Identification Numbers (EIN)
European Union   VAT identification numbers (multi-country format)
Brazil           CPF (individual tax ID), CNPJ (corporate tax ID)
Germany          Steuernummer (tax identification number)
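The article lists IBAN under the global patterns without saying how candidates are confirmed. IBANs carry a built-in checksum (ISO 13616, using the ISO 7064 mod-97 scheme), so one plausible post-filter is shown here as a std-only sketch. `iban_valid` is a hypothetical name, and country-specific length rules are deliberately omitted.

```rust
/// Validates an IBAN with the ISO 13616 mod-97 check.
/// Whitespace is ignored; per-country lengths are not checked here.
pub fn iban_valid(input: &str) -> bool {
    let iban: String = input.chars().filter(|c| !c.is_whitespace()).collect();
    // IBANs are 15 to 34 ASCII characters long.
    if iban.len() < 15 || iban.len() > 34 || !iban.is_ascii() {
        return false;
    }
    // Two-letter country code followed by two check digits.
    if !iban[..2].chars().all(|c| c.is_ascii_alphabetic())
        || !iban[2..4].chars().all(|c| c.is_ascii_digit())
    {
        return false;
    }
    // Move the first four characters (country + check digits) to the end.
    let rearranged = iban[4..].chars().chain(iban[..4].chars());
    let mut remainder: u32 = 0;
    for c in rearranged {
        // Digits map to 0-9 and letters to 10 (A) through 35 (Z).
        let value = match c.to_digit(36) {
            Some(v) => v,
            None => return false,
        };
        // Fold the number in piece by piece so it never overflows;
        // letter values always occupy two decimal digits.
        remainder = if value < 10 {
            (remainder * 10 + value) % 97
        } else {
            (remainder * 100 + value) % 97
        };
    }
    remainder == 1
}
```

Because 97 is prime and does not divide any power of ten, changing any single digit always changes the remainder. A mistyped or partially matched IBAN therefore fails the check.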

Combined with the heuristic scanner's ability to detect names and company entities in any language that capitalizes proper nouns, Qvault provides broad coverage for international legal practices.

The Trust Model

Qvault's security model is simple: trust the machine, distrust the network.

For legal professionals operating under strict confidentiality obligations, this model provides something cloud tools fundamentally cannot: certainty that client data never left the building.

Qvault is developed by Santacroce SL, Madrid, Spain.
Learn more at qvault.tech.