Why server-side PDF tools aren't privacy-first, and how client-side changes that

Most online PDF tools work the same way. You upload a file, a server processes it, the server hands back the result, and your file gets deleted after some window. That's a fine architecture. It's also, by construction, not private, and I think "privacy-first" gets stapled onto it far too casually.
I want to lay out the actual technical and legal difference between server-side and client-side, because it's more concrete than the marketing on either side lets on, and because the client-side version has quietly gotten boring enough to be the default.
The deletion window is the whole problem
Reputable server-side tools are honest about the flow: upload, process, delete. The deletion windows I've seen documented run from "immediately after processing" to 1 hour to 24 hours. Take the most generous reading, and assume every one of them deletes exactly when it says.
That still means your file existed, in plaintext, on someone else's hardware, for some window. And during that window it's exposed to a few things:
- Breach. Logs, temp directories, swap, backup snapshots, CDN and edge caches: all of them capture data the application layer thinks it already deleted. "We delete after 1 hour" is an application promise, not a storage guarantee. Anyone who's run an incident knows the file you "deleted" is usually still sitting in three other places.
- Subpoena and legal process. If a file is on a server when a preservation order lands, your deletion window is now whatever a court says it is. An operator can't be compelled to hand over data that never left the user's laptop. It can absolutely be compelled to hand over a 24-hour buffer.
- Insider and supply-chain access. Ops staff, a compromised dependency somewhere in the upload path, a misconfigured bucket. The attack surface is everything that touches the file between upload and delete.
None of this needs the operator to be a villain. The point is that the architecture creates a window where your file's safety rides on the operator's competence and the legal weather, and you control neither.
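To make the "application promise, not a storage guarantee" point concrete, here is a minimal sketch. Every name and number in it is a hypothetical of mine, not a measurement from any real service. It models a file the application deletes after a stated window, plus secondary copies (a backup snapshot, a log scrape) that each have their own retention. The effective exposure is set by the longest-lived copy, not by the advertised window.

```typescript
// Hypothetical model: all names and figures are illustrative assumptions.
interface SecondaryCopy {
  label: string;          // e.g. "backup snapshot", "edge cache"
  capturedAtMin: number;  // minutes after upload when the copy was made
  retentionMin: number;   // how long that copy is kept once made
}

// Returns how many minutes after upload the last copy of the file disappears.
function effectiveExposureMin(
  appDeletionWindowMin: number,
  copies: SecondaryCopy[],
): number {
  let longest = appDeletionWindowMin;
  for (const copy of copies) {
    // A copy can only contain the file if it was taken before deletion.
    if (copy.capturedAtMin <= appDeletionWindowMin) {
      longest = Math.max(longest, copy.capturedAtMin + copy.retentionMin);
    }
  }
  return longest;
}

// A 60-minute deletion policy, a snapshot taken at minute 45 and kept for
// 7 days, and a log scrape at minute 90 that misses the file entirely.
const exposure = effectiveExposureMin(60, [
  { label: "backup snapshot", capturedAtMin: 45, retentionMin: 7 * 24 * 60 },
  { label: "log scrape", capturedAtMin: 90, retentionMin: 30 * 24 * 60 },
]);
console.log(exposure); // 10125 minutes, roughly a week, not an hour
```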
What client-side actually changes
Run the processing in the browser and the file never goes over the wire. pdf.js parses, pdf-lib writes, WASM does the heavy lifting, and the result is handed to you as a local download. The server's job shrinks to "serve some static assets once," and after that it's a bystander.
So the deletion window doesn't get shorter. It's gone, because there's nothing to delete. On the operator's side, breach scope, subpoena scope, and insider scope for your file all drop to zero. The file stays in the one place it already was: your machine.
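The shape of that flow can be sketched in a few lines. This is a dependency-free illustration, not any particular tool's code: `processLocally`, `LocalFile`, and `Transform` are hypothetical names, and the transform is injected as a stand-in for whatever pdf-lib or WASM routine does the real work.

```typescript
// A transform is any function from input bytes to output bytes. In a real
// tool it might be built on pdf-lib or a WASM module; here it is injected
// so the sketch stays dependency-free.
type Transform = (input: Uint8Array) => Promise<Uint8Array>;

// The minimal part of a browser File that this sketch relies on.
interface LocalFile {
  name: string;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Reads the file into memory, runs the transform, and returns the result.
// Note what is absent: no fetch, no XMLHttpRequest, no form submission.
async function processLocally(
  file: LocalFile,
  transform: Transform,
): Promise<{ name: string; bytes: Uint8Array }> {
  const input = new Uint8Array(await file.arrayBuffer());
  const output = await transform(input);
  return { name: `processed-${file.name}`, bytes: output };
}
```

In a page, `file` would typically come from a file input, and the returned bytes would be wrapped in a `Blob` and offered through `URL.createObjectURL` on a link with a `download` attribute.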
The part I actually like: you can check it
This is the bit that turns it into a real argument instead of a trust-me. With server-side, "we delete your file" is unfalsifiable from where you're standing. You take it on faith and hope the policy matches the code.
Client-side, you can check in about ten seconds:
- Open DevTools, Network tab. No fancy throttling needed.
- Process a file.
- Watch for an upload. There isn't one. Static JS and WASM load up front, and your bytes never show up in a request body.
Or the cruder version: load the page, kill the Wi-Fi, process a file anyway. If it works offline, it physically could not have phoned home while it ran. Honestly, "turn off the internet and it still works" lands harder for me than any privacy policy I've ever skimmed, and I've skimmed a lot of them so you don't have to.
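If you want the DevTools check to be repeatable, the Network tab in both Chrome and Firefox can export a session as a HAR file, and that file can be scanned for request bodies. The sketch below assumes the HAR 1.2 request shape (`method`, `url`, `bodySize`, optional `postData`). The function name and the 1 KB threshold are my own choices, not part of any standard.

```typescript
// Subset of the HAR 1.2 request shape that this sketch reads.
interface HarRequest {
  method: string;
  url: string;
  bodySize: number; // -1 when the size is not available
  postData?: { mimeType: string; text?: string };
}
interface Har {
  log: { entries: { request: HarRequest }[] };
}

// Returns every request that carried a body larger than `thresholdBytes`.
// A tool that processes locally should yield an empty list for a session
// in which a file was processed.
function findSuspectUploads(har: Har, thresholdBytes = 1024): HarRequest[] {
  return har.log.entries
    .map((entry) => entry.request)
    .filter((req) => {
      const textSize = req.postData?.text?.length ?? 0;
      return Math.max(req.bodySize, textSize) > thresholdBytes;
    });
}

// Made-up sample data: one static asset load and one large POST.
const sample: Har = {
  log: {
    entries: [
      { request: { method: "GET", url: "https://example.com/app.wasm", bodySize: 0 } },
      { request: { method: "POST", url: "https://example.com/upload", bodySize: 482133 } },
    ],
  },
};
console.log(findSuspectUploads(sample).map((r) => r.url));
// [ 'https://example.com/upload' ]
```

This only inspects HTTP request bodies. It does not look at WebSocket or WebRTC traffic, so an empty result is evidence rather than proof, which is part of why the offline test is the stronger one.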
The honest tradeoffs, because there always are some
I'm not going to pretend client-side is free.
- Your device does the work. A 300-page merge or an OCR pass leans on the user's CPU and RAM. Phones are the real constraint here. Some operations that are trivial on a server are slow on a low-end phone, or quietly walk into a memory ceiling and fall over. We route per device (WebGPU where it exists, single-thread WASM on phones) and still lose to physics now and then.
- Cold start. The WASM and any model weights download once. First load is heavier than a server tool's first load, no way around it. After that it's cached and works offline.
- Some things genuinely need a server. Anything that needs a shared secret, server-held keys, or compute that won't fit in a tab. In-browser AI has a real ceiling too: the small models that run on a phone get confused easily on cluttered inputs, and in-browser OCR is plainly worse than a datacenter pipeline. Client-side isn't a moral high ground for every problem. It's the right default for "transform this file the user already has," and not much beyond that.
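The per-device routing mentioned in the first tradeoff can be sketched as a small decision function. The post only says "WebGPU where it exists, single-thread WASM on phones". The ordering below, the `wasm-threads` fallback for desktops without WebGPU, and all the names are assumptions of mine for illustration.

```typescript
// Hypothetical backends; only "webgpu" and "wasm-single" come from the text.
type Backend = "webgpu" | "wasm-threads" | "wasm-single";

interface DeviceCaps {
  hasWebGPU: boolean;  // in a browser, commonly checked with: "gpu" in navigator
  isPhone: boolean;    // however the app decides this
  hasThreads: boolean; // WASM threads need SharedArrayBuffer, which in turn
                       // needs a cross-origin isolated page
}

// Assumed priority: WebGPU first, then single-thread WASM on phones,
// then threaded WASM on desktops that support it.
function pickBackend(caps: DeviceCaps): Backend {
  if (caps.hasWebGPU) return "webgpu";
  if (caps.isPhone) return "wasm-single";
  return caps.hasThreads ? "wasm-threads" : "wasm-single";
}

console.log(pickBackend({ hasWebGPU: false, isPhone: true, hasThreads: true }));
// wasm-single
```

Taking the capabilities as a plain object rather than reading `navigator` directly keeps the decision testable outside a browser. The detection code then becomes a thin layer that fills in the object.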
Why this is suddenly normal
Five years ago the WASM toolchain wasn't there, and the honest answer to "can you do this in the browser" was usually "no, use a server." Now pdf.js, pdf-lib, qpdf-wasm, and the image/Canvas/WASM stack are mature enough that the browser is a perfectly good runtime for document work. The interesting shift isn't some clever trick. It's that the boring, verifiable architecture finally got practical, so "we don't upload your file" stopped being a constraint you apologize for and turned into a feature you can prove on the spot.
Full disclosure: I build one of these (privacy-first PDF and image tools, all in-browser), so I'm not a neutral party. But the architecture argument holds up no matter whose tool you use. If a file can be uploaded, deleted, breached, or subpoenaed, then "private" hinges on a window you don't control. If it never leaves the tab, the window doesn't exist.
The open question I keep chewing on: is the device-cost tradeoff worth it for the long tail of heavy operations, or is that exactly where server-side keeps winning?
The PDF tools are here if you want to run the kill-the-Wi-Fi test yourself.