Why Server-Side PDF Tools Leak Your Data
Server-side PDF tools leak data the moment something goes wrong on the operator's side. The November 2025 jsonformatter.org breach proved the failure mode is not theoretical. Every upload-based PDF tool runs the same risk.
November 2025: 5GB of credentials leaked from a server-side tool
Researchers at watchTowr Labs disclosed that jsonformatter.org and codebeautify.org had stored user submissions server-side for years. The exposed archive contained AWS keys, GitHub tokens, database passwords, and Stripe secret keys from banks, government agencies, and Fortune 500 companies. Source: The Hacker News, November 2025.
- What the jsonformatter breach proved
- Why the same risk applies to PDF tools
- What a server-side PDF tool sees when you upload
- The sensitive PDFs people upload anyway
- What the privacy policies actually disclose
- The structural fix: browser-local processing
- A 60-second audit for any PDF tool you use
- Frequently asked questions
What the jsonformatter breach proved
For years, the privacy critique of server-side dev tools was theoretical. People argued that pasting credentials into an online JSON formatter was risky because the operator could log it, or might be breached, or could be compromised by a supply-chain attack. The November 2025 disclosure ended the theoretical part of the argument.
Security firm watchTowr Labs found that jsonformatter.org and codebeautify.org had been retaining user-submitted text server-side for years. The exposed archive totalled 5GB across more than 80,000 files. The contents were not random JSON samples. They were AWS access keys pasted in for debugging, GitHub tokens embedded in webhook payloads, database connection strings copied from production configs, and Stripe secret keys from internal billing flows. Submissions came from banks, government agencies, and Fortune 500 companies. The full disclosure ran on The Hacker News in November 2025.
jsonformatter.org served about 2.4 million monthly visitors at the time, based on Similarweb's October 2025 traffic estimate. Even if only one in a thousand visitors pasted something sensitive, that is 2,400 credential exposures a month from a single tool. The actual exposure was higher.
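The estimate above is simple enough to reproduce. A minimal sketch, assuming the one-in-a-thousand rate is purely illustrative (it is not a measured figure) and using the traffic number quoted in the text; the function name is made up for this example:

```python
# Back-of-envelope estimate of monthly credential exposures.
# MONTHLY_VISITORS comes from the Similarweb figure quoted in the text;
# SENSITIVE_PASTE_RATE is an illustrative assumption, not measured data.
MONTHLY_VISITORS = 2_400_000
SENSITIVE_PASTE_RATE = 1 / 1000


def monthly_exposures(visitors: int, rate: float) -> int:
    """Visitors per month multiplied by the share who paste something sensitive."""
    return round(visitors * rate)


exposures = monthly_exposures(MONTHLY_VISITORS, SENSITIVE_PASTE_RATE)
print(exposures)  # 2400
```

Changing the assumed rate scales the result linearly, so the conclusion depends far more on that guess than on the traffic estimate.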
The mechanism was not exotic. The tool accepted text, logged the input for features and debugging, and the logs sat in storage long enough to be reachable by someone outside the company. No zero-day, no nation-state actor, no novel attack chain. The breach happened because the architecture treated user input as something safe to retain server-side. The full coverage of how it played out sits in our companion post on the jsonformatter API key leak.
Why the same risk applies to PDF tools
Replace "JSON" with "PDF" and the rest of the breach narrative reads identically. iLovePDF, Smallpdf, PDF24's online endpoints, Sejda Online, FreeConvert, and the long tail of ad-funded PDF sites all run the same primitive: accept an upload, process server-side, return a result. The pieces that broke at jsonformatter.org exist in every one of them.
Every server-side PDF site has an upload endpoint, a temporary storage layer (S3, EFS, local disk), a processing worker, queue logs, processing logs, access logs, and a delivery endpoint. Many also retain the converted output for re-download. Each of those components is a place where the contents of your PDF sit in plaintext or near-plaintext at some point. A single misconfigured S3 bucket, a single indexed log file, a single compromised npm package in the front-end build, and the contents are out.
The risk surface is broader than the breach surface, too. The operator can be served a court order. The operator can be acquired by a less careful company. The operator can ship a tracking pixel that fingerprints filenames. The operator can quietly start training an internal model on submitted content. None of these are hypothetical. iLovePDF's privacy policy, as of January 2026, reserves the right to "use aggregated data for service improvement," which is a phrase broad enough to cover model training in a way most users do not parse.
What a server-side PDF tool sees when you upload
Pick any upload-based PDF tool and walk through what hits the operator's infrastructure when you click "convert":
- The full file bytes. The PDF is uploaded in its entirety, decompressed, and decrypted (if it had a password). Every page of text, every image, every embedded font, every form field value, every digital signature, every metadata tag.
- The filename. Filenames are sticky metadata that survive in logs long after the file is deleted. "Statement_HDFC_Apr2026.pdf", "Aadhaar_Arun.pdf", "Salary_Slip_March_2026.pdf" — each of these is a category label that makes the underlying contents inferable even from log traces.
- Your IP, user-agent, and referer. Standard web access logs tie the upload to a network location and browser fingerprint. For corporate users, the IP often resolves to the company.
- Any account email if logged in. Premium tiers require login, which links every upload to a verified email.
- Conversion parameters. Page ranges, compression levels, output formats — all logged for analytics.
- Crash dumps and processing artefacts. When a conversion fails, the file often gets copied to a debugging bucket so engineers can reproduce. Those buckets have looser retention than the main pipeline.
The privacy policy might claim the file is deleted in 60 minutes. The filename, the IP, the email, the crash dump, and the access logs are not the file — and they often persist for the full standard log-retention window of 90 days to several years.
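To make the filename point concrete, here is a rough sketch of how little it takes to label a document from log traces alone. The log records, field names, and keyword map are all hypothetical (real access logs vary by operator), and the IP addresses come from a reserved documentation range:

```python
import re

# Hypothetical keyword map; anyone reading the logs could use far richer rules.
CATEGORY_KEYWORDS = {
    "bank statement": ("statement", "hdfc"),
    "national ID": ("aadhaar",),
    "payroll": ("salary", "payslip"),
}

# Hypothetical log records that outlive the uploaded file itself.
LOG_RECORDS = [
    {"ip": "203.0.113.10", "filename": "Statement_HDFC_Apr2026.pdf"},
    {"ip": "203.0.113.11", "filename": "Aadhaar_Arun.pdf"},
    {"ip": "203.0.113.12", "filename": "Salary_Slip_March_2026.pdf"},
]


def infer_category(filename: str) -> str:
    """Guess the document type from the filename tokens alone."""
    tokens = set(re.split(r"[^a-z0-9]+", filename.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & set(keywords):
            return category
    return "unknown"


for record in LOG_RECORDS:
    print(record["ip"], "->", infer_category(record["filename"]))
```

No file bytes are involved anywhere in this sketch. The filename and the IP are enough to tie a document category to a network location.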
The sensitive PDFs people upload anyway
Walk through the actual usage. A senior developer in Bangalore uploads a bank statement PDF to a compression tool because their loan officer asked for the file under 500KB. A finance analyst in Mumbai uploads a GST invoice PDF to a merge tool to bundle Q1 filings. A 23-year-old applying for a Schengen visa uploads three salary slips, an Aadhaar PDF, and an ITR acknowledgement to an online PDF-to-image converter so the consulate's upload portal accepts them.
None of those users were thinking about server-side data retention. They were thinking about the next task in their morning. The tool they used was the first result on Google, looked clean, said "secure" somewhere on the page, and worked in 8 seconds. The sensitive content of those files — bank balance, transaction history, Aadhaar number, signed payslip, taxpayer ID — sat on a server they had never heard of, in a region they could not locate, governed by a privacy policy they did not read.
For India specifically, the bank-statement-PDF flow is the highest-volume sensitive upload pattern. Loan applications, rental verifications, visa documentation, and credit-card limit increases all trigger one. The Reserve Bank of India's account aggregator framework was supposed to solve this with consent-based data sharing, but adoption is still partial, so the PDF-upload workaround keeps running. Every one of those uploads is an unforced credential and identity exposure.
What the privacy policies actually disclose
Read three of them side by side. iLovePDF's January 2026 policy states that files are "automatically deleted after a few hours," and that the company "may collect technical data and usage statistics." The deletion window is not specified more precisely. Smallpdf's policy promises deletion within an hour for free users and 30 minutes for paid users, but reserves the right to retain "metadata about your use of the service" indefinitely. PDF24's policy is the most permissive: it states that online tools "process files on our servers" and that data is "deleted automatically," with no specified window at all.
None of the three explicitly disclose how long server logs retain filenames, crash dumps, or processing metadata. None disclose whether engineering teams can access submitted files during a debugging session. None disclose the sub-processors that touch the file in flight — the CDN edge, the WAF, the anti-virus scanner, the cloud-storage provider. Each of those is a separate entity with its own retention policy.
This is not a knock on any single operator. It is the standard shape of a server-side SaaS privacy policy in 2026. The promises are narrow because the architecture cannot support broader ones. A tool that receives the file cannot credibly promise the file is never observed; it can only promise it is deleted soon.
The structural fix: browser-local processing
The fix is not a better privacy policy. The fix is to not upload the file at all. Modern browsers ship with a JavaScript engine and a WebAssembly runtime powerful enough to do every common PDF operation locally. PDF.js (Mozilla's open-source renderer) handles reading. pdf-lib handles editing, merging, and splitting. Tesseract.js handles OCR. Compression runs via pdf-lib or WebAssembly ports of Ghostscript. The File API gives the page read access to a file the user selects, but the bytes stay in browser memory. No fetch, no XHR, no upload.
Every tool on pdfmavericks.com/all-tools is built this way by design. Merge, split, compress, rotate, sign, watermark, unlock, redact, OCR, convert: all of them run inside the browser tab. You can verify it in under a minute: open the browser's Network panel before clicking the action button, run the conversion, and watch for outgoing requests. If no request carries a payload the size of your file, the file did not leave your machine.
The browser-local architecture has trade-offs. Multi-thousand-page batch jobs and gigabyte-scale OCR jobs are slower in a browser tab than on a server farm. For those rare cases, the correct alternative is your own server or a self-hosted tool like Stirling-PDF, not a third-party SaaS where your files sit next to everyone else's. For the everyday cases that make up most PDF workflows (under 200 pages, under 100MB, single-document), browser-local handles all of them in seconds.
The jsonformatter.org breach is the clearest current example of why this architectural distinction matters. The fix is structural. Pick tools where the architecture cannot leak the file, not tools where the policy promises not to. The promise is only as good as the next misconfigured bucket.
A 60-second audit for any PDF tool you use
Before you upload a bank statement, an Aadhaar PDF, or a signed contract to any online PDF tool, run this check. It takes about a minute and rules out the entire category of server-side risk.
- Open the browser network tab (DevTools, Network panel) before you load the file into the tool. Keep it open.
- Load the file and click the action. If you see a POST or PUT to any endpoint with a request payload size close to your file size, the file is being uploaded. That is a server-side tool, regardless of what the homepage says.
- Check the "Initiator" column. If a third-party domain (analytics, CDN, advert) initiated a request after the file load, your filename or processing parameters are leaving via that channel.
- Inspect the response. If the converted output comes back from a URL on the operator's domain, the operator generated it server-side and holds a copy long enough to serve it.
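The same check can be scripted against a HAR export from the Network panel ("Save all as HAR" in most browsers). What follows is a minimal sketch, not a complete auditor. It relies on the standard HAR fields `log.entries[].request.method`, `url`, and `bodySize`; the function names, the 10% tolerance, and the sample data are assumptions for illustration. It lists third-party hosts contacted rather than reading the Initiator column, and it will miss uploads split into small chunks.

```python
from urllib.parse import urlparse


def find_uploads(har: dict, file_size: int, tolerance: float = 0.1) -> list[dict]:
    """Return POST/PUT requests whose body is roughly as large as the file."""
    hits = []
    for entry in har["log"]["entries"]:
        req = entry["request"]
        body = req.get("bodySize", -1)  # HAR uses -1 when the size is unknown
        if req["method"] in ("POST", "PUT") and body >= file_size * (1 - tolerance):
            host = urlparse(req["url"]).hostname
            hits.append({"method": req["method"], "host": host, "bytes": body})
    return hits


def third_party_hosts(har: dict, first_party: str) -> set[str]:
    """Return every host contacted that is not the tool's own domain."""
    hosts = {urlparse(e["request"]["url"]).hostname for e in har["log"]["entries"]}
    return {h for h in hosts if h != first_party and not h.endswith("." + first_party)}


# Hypothetical HAR data standing in for json.load() on a real export.
sample_har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "https://pdf-tool.example/app.js", "bodySize": 0}},
    {"request": {"method": "POST", "url": "https://api.pdf-tool.example/upload", "bodySize": 2_100_000}},
    {"request": {"method": "POST", "url": "https://analytics.example.net/collect", "bodySize": 512}},
]}}

print(find_uploads(sample_har, file_size=2_000_000))
print(third_party_hosts(sample_har, "pdf-tool.example"))
```

For a real export, replace `sample_har` with the parsed file and pass the size of the PDF you loaded. An empty list from `find_uploads` is consistent with a browser-local tool, though it is not proof on its own.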
On every pdfmavericks.com tool, this audit returns zero bytes uploaded and zero third-party request initiators after the file is loaded. The converted output is produced inside the browser tab and offered as a local download via a blob URL. No server round-trip exists in the flow.
The same audit applied to iLovePDF, Smallpdf, Sejda Online, or PDF24's online endpoints returns a multi-megabyte POST to a backend the moment you click the action button. The file is on their server. From that point, the privacy outcome is governed by their controls — not yours.
Frequently asked questions
What is a server-side PDF tool, and how is it different from a browser-local one?
A server-side PDF tool uploads your file to the operator's infrastructure, runs the conversion or edit on their machines, and sends a result back. iLovePDF, Smallpdf, PDF24's online flows, Sejda Online, and most ad-funded PDF sites work this way. A browser-local tool does the same work inside your browser tab using JavaScript and WebAssembly libraries like PDF.js and pdf-lib, so the file never crosses the network. The architectural difference matters more than the marketing copy. Even if the privacy policy says files are deleted in an hour, the file still has to be received, decrypted, decompressed, written to disk, and then deleted, with every step exposed to compromise.
Is the jsonformatter.org breach really comparable to PDF tools?
The mechanism is identical. jsonformatter.org accepted user-submitted text on a server, logged it for product features and analytics, and the logs were exposed because access controls were inadequate. Every server-side PDF site has the same primitives: an upload endpoint, temporary storage, queue logs, processing logs, and access logs. Many also retain a cached copy of the converted output. If any of those pieces is misconfigured, indexed by Google, or breached by an attacker, the contents of your PDF are exposed the same way. The only thing that prevents that risk is keeping the file off the server in the first place.
Don't reputable PDF sites delete files after an hour?
Most of them say so in the privacy policy, but the deletion window is not the threat model. The threat windows that matter are the upload window (where TLS interception, supply-chain JavaScript, or a malicious browser extension can read the file in transit), the processing window (where logs, crash dumps, and temporary files exist), and the long-tail metadata window (where filenames, IP addresses, account emails, and conversion patterns can persist for years). The November 2025 jsonformatter.org disclosure showed the long-tail window in action — submissions from years earlier were still in storage. A one-hour deletion claim only addresses one of three windows.
What sensitive PDFs do people commonly upload without realizing the risk?
Bank statements requested for loan applications, salary slips for visa documentation, Aadhaar PDFs for KYC, GST invoices, ITR acknowledgements, medical reports for insurance claims, signed contracts, NDAs, legal notices, and board-deck PDFs. In India specifically, the bank-statement-PDF flow is the highest-volume sensitive upload — every loan application and rental verification triggers one. None of these documents should be on a third-party server with an unknown retention policy. The 6 to 12 seconds saved by clicking a familiar online tool is a poor trade for the credential, identity, or financial exposure that follows a breach.
How does a browser-local PDF tool actually work without uploading?
The browser ships with a JavaScript engine and WebAssembly runtime. PDF.js (Mozilla's open-source PDF renderer) and pdf-lib are JavaScript libraries, and Tesseract.js wraps a WebAssembly build of the Tesseract OCR engine; all of them run the conversion inside the tab. The File API gives the page read access to the file the user selected, but the bytes stay in browser memory. No fetch, no XHR, no upload. You can verify this in under a minute: open the browser's Network panel, run the conversion, and watch for outgoing requests. If no request carries a payload the size of your file, the file did not leave your machine. Every pdfmavericks.com tool is built this way by design.
What about the legitimate cases where I need server processing?
There are some PDF operations where browser-local processing genuinely struggles — multi-thousand-page batch jobs, OCR over scanned files larger than a few hundred MB, or workflows that need a queue and webhook. For those, the correct architecture is your own server or a self-hosted tool like Stirling-PDF, not a third-party SaaS that processes your files alongside everybody else's. The line is simple: if you would not put the file in someone else's S3 bucket, do not put it in their PDF tool. For the everyday cases — merge, compress, sign, split, convert, watermark, redact, unlock — browser-local handles all of them.
How do I move my workflow from a server-side tool to a browser-local one?
Bookmark the pdfmavericks.com tool you use most often and use it once. The interaction model is the same as iLovePDF or Smallpdf — drop the file, click the action, download the result — but the file stays local. For team workflows, share the tool URL in your wiki or onboarding doc instead of the upload-based competitor. There is no signup, no account, and no data retention to migrate. The switch is a one-day change for any team handling regulated documents like Aadhaar PDFs, GST filings, or bank statements.
Where can I read the original jsonformatter.org disclosure?
Security firm watchTowr Labs published the original disclosure in November 2025, and The Hacker News covered it in detail at thehackernews.com/2025/11/years-of-jsonformatter-and-codebeautify.html. The disclosure documented years of stored submissions across jsonformatter.org and codebeautify.org, totalling 5GB and more than 80,000 files. The exposed data included AWS access keys, GitHub tokens, database connection strings, Stripe secret keys, and internal API credentials from banks, government agencies, and large technology companies.