We Built a PDF Tool That Works Offline. Here's What We Learned.
Offline-first sounds obvious until you hit browser memory ceilings, worker orchestration edge cases, and PDF internals that were never designed for your happy path.
In simple terms
The web can absolutely run serious document processing now. But you only get good results if you treat architecture, UX, and failure handling as one system.
Lesson 1: 'No uploads' is an architecture commitment, not a feature
Early on, we thought "no uploads" was mostly a product decision. It turned out to be a deep architectural constraint touching everything: library choice, UI flow, memory strategy, error handling, and even copywriting. Once you promise local-first behavior, hidden server fallbacks are no longer acceptable.
That forced a cleaner system. Core operations had to succeed with the network unplugged after page load. If they did not, we treated that as a bug, not a nice-to-have.
Lesson 2: pdf-lib is powerful, but not magic
pdf-lib gave us a strong base for merge/split/reorder/write workflows, but complex real-world documents exposed edges quickly: unusual object graphs, large embedded assets, and metadata variations that needed explicit handling outside happy-path APIs.
We learned to treat PDF processing as defensive programming. Assume malformed inputs, weird producer chains, and mixed encoding behaviors. Build fallback logic early.
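The defensive pattern can be sketched as a generic batch wrapper: isolate each file's parse, recover with partial success, and return actionable errors instead of failing the whole job. The parser is injected so the same shape works for pdf-lib or any other engine; all names here are illustrative, not our real API.

```typescript
interface ParseResult<T> {
  ok: { name: string; doc: T }[];
  failed: { name: string; reason: string }[];
}

function parseAll<T>(
  files: { name: string; bytes: Uint8Array }[],
  parse: (bytes: Uint8Array) => T,
): ParseResult<T> {
  const result: ParseResult<T> = { ok: [], failed: [] };
  for (const file of files) {
    try {
      // Each parse is isolated: one malformed file cannot sink the batch.
      result.ok.push({ name: file.name, doc: parse(file.bytes) });
    } catch (err) {
      // Partial success: record why this file failed, keep going.
      result.failed.push({
        name: file.name,
        reason: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return result;
}
```

The UI can then render successes immediately and show per-file reasons for the rest, which is what "surface actionable errors" means in practice.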
// The pattern that scaled for us:
// 1) Parse defensively
// 2) Isolate risky operations
// 3) Recover with partial success where possible
// 4) Surface actionable errors to users
Lesson 3: workers are the difference between 'usable' and 'frozen'
Main-thread PDF work looks fine on tiny documents and collapses on large ones. The first time we pushed big multi-file jobs through the UI thread, interaction quality cratered. Buttons lagged. Progress indicators stuttered. Users thought the app crashed.
Moving heavy tasks into Web Workers fixed both responsiveness and user trust. People tolerate long operations if the interface stays alive, progress is honest, and cancel/resume behavior is predictable.
- Worker pool capped by hardware concurrency (with safety max).
- Per-file queue states: queued, processing, done, error.
- Progress events normalized so UI components stay simple.
- Cancellation modeled explicitly, not as a thrown exception.
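The queue states and explicit cancellation above can be sketched as a small state machine. Cancellation is a legal transition, not a thrown exception, so UI code renders it like any other status. State names match the bullets; everything else is illustrative.

```typescript
type JobState = "queued" | "processing" | "done" | "error" | "cancelled";

interface Job {
  id: string;
  state: JobState;
}

// Legal transitions per state; terminal states allow none.
const transitions: Record<JobState, JobState[]> = {
  queued: ["processing", "cancelled"],
  processing: ["done", "error", "cancelled"],
  done: [],
  error: [],
  cancelled: [],
};

function advance(job: Job, next: JobState): Job {
  if (!transitions[job.state].includes(next)) {
    throw new Error(`illegal transition ${job.state} -> ${next} for ${job.id}`);
  }
  return { ...job, state: next };
}

// Pool size capped by hardware concurrency, with a safety max.
function poolSize(hardwareConcurrency: number, safetyMax = 4): number {
  return Math.max(1, Math.min(hardwareConcurrency, safetyMax));
}
```

In the browser, `hardwareConcurrency` would come from `navigator.hardwareConcurrency`; the safety max keeps memory pressure sane on many-core machines.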
Lesson 4: memory is your real production budget
Browser memory limits are less forgiving than backend nodes, especially on mobile. Large PDFs can inflate in-memory representation far beyond source file size once decoded. Without discipline, you can run out of memory before users understand what happened.
We adopted chunked pipelines, aggressive object cleanup, and blob URL lifecycle controls to keep memory pressure manageable. We also made failures explicit: better to warn users early than crash after two minutes of fake progress.
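One concrete piece of that discipline, sketched under assumptions: a tracker that registers every object URL a job creates and revokes them as a group when the job ends, so decoded output does not outlive the operation. The revoke hook is injected for testability; in the browser it would be `URL.revokeObjectURL`, paired with `URL.createObjectURL`.

```typescript
class UrlLifecycle {
  private urls = new Set<string>();

  constructor(private revoke: (url: string) => void) {}

  // Register a URL so it is guaranteed to be released with the job.
  track(url: string): string {
    this.urls.add(url);
    return url;
  }

  // Release everything at once when the job finishes or is cancelled.
  // Returns how many URLs were revoked; safe to call more than once.
  releaseAll(): number {
    const count = this.urls.size;
    for (const url of this.urls) this.revoke(url);
    this.urls.clear();
    return count;
  }
}
```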
Lesson 5: offline does not mean anti-AI, it means explicit boundaries
We still wanted summarization and Q&A features. The compromise was strict separation: core file operations remain local by default; AI routes require explicit user consent and clear disclosure that extracted text is sent server-side for model processing.
This boundary became one of our best product decisions. Users understand exactly when they are in local mode versus assisted mode. Trust improves when transitions are explicit.
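The boundary can be sketched as a single gate: extracted text leaves the device through one function, and that function requires recorded consent. No silent fallback exists. The send hook is injected; names are illustrative, not our real API.

```typescript
interface ConsentState {
  aiConsentGiven: boolean;
}

function sendForSummarization(
  text: string,
  consent: ConsentState,
  send: (payload: string) => void,
): boolean {
  if (!consent.aiConsentGiven) {
    // Local mode: without explicit consent, nothing is transmitted.
    return false;
  }
  send(text);
  return true;
}
```

Funneling every server-side feature through one gate also makes the claim verifiable: there is exactly one call site to audit.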
Lesson 6: verification UX is product, not documentation
We expected only technical users to care about verification. Instead, compliance leads, legal teams, and procurement reviewers asked for it immediately. That pushed us to build /verify-claims as a first-class page, not a buried docs note.
When users can validate architecture claims themselves, sales friction drops and support becomes simpler. Verification is now one of our strongest onboarding features.
What we would do differently if starting again
We would implement worker-first from day one, design memory telemetry earlier, and standardize shared tool shell components sooner. Much of the complexity came from retrofitting these patterns after feature growth.
- Define local/remote boundaries before writing feature code.
- Build a consistent progress event contract across all engines.
- Ship synthetic large-file tests earlier in CI.
- Treat metadata and redaction controls as core, not add-ons.
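The progress event contract from the list above might look like this: every engine emits one shape, and the UI only ever computes a fraction from it. Field names are illustrative.

```typescript
interface ProgressEvent {
  jobId: string;
  completed: number; // units finished (pages, files, or bytes)
  total: number;     // same unit as completed
  phase: "parsing" | "processing" | "writing";
}

// Normalize to a 0..1 fraction the UI can render directly,
// guarding against zero totals and over-reporting engines.
function fraction(e: ProgressEvent): number {
  if (e.total <= 0) return 0;
  return Math.min(1, e.completed / e.total);
}
```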
If you are building privacy-first tooling, bias toward observable guarantees over abstract claims. Architecture that users can verify will outlast marketing copy every time.
Try the architecture, not just the article
The fastest way to evaluate these lessons is to run the tools directly: Batch Engine, Convert PDF, and Offline OCR. Keep DevTools open and test with real files.
If the web can process your documents locally and transparently, you should demand that baseline everywhere.