How the comparison works

You drop two PDFs into the panels. The tool extracts the text from each page using pdf.js (the PDF engine behind Firefox's built-in viewer) and feeds the per-page text into Google's diff-match-patch library. The output is a sequence of ops (equal, insert, delete) rendered as green-add and red-delete highlights in a side-by-side view. The comparison runs page by page, not on one giant string, so a single page break moving doesn't cascade as a hundred-page diff.
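The per-page flow can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: a plain longest-common-subsequence diff stands in for diff-match-patch, the page text is assumed to be already extracted into string arrays, and every name (Op, diffTokens, diffPages) is hypothetical.

```typescript
// Hypothetical names; a simple LCS diff stands in for diff-match-patch.
type OpType = "equal" | "insert" | "delete";
interface Op { type: OpType; text: string; }

// Diff two token arrays with a classic longest-common-subsequence table.
function diffTokens(a: string[], b: string[]): Op[] {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = [];
  for (let i = 0; i <= n; i++) {
    lcs.push(new Array<number>(m + 1).fill(0));
  }
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops: Op[] = [];
  // Append a token, merging it into the previous op when the type matches.
  const push = (type: OpType, text: string): void => {
    const last = ops[ops.length - 1];
    if (last && last.type === type) last.text += text;
    else ops.push({ type, text });
  };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { push("equal", a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push("delete", a[i]); i++; }
    else { push("insert", b[j]); j++; }
  }
  while (i < n) push("delete", a[i++]);
  while (j < m) push("insert", b[j++]);
  return ops;
}

// Compare page k of the original with page k of the revision: one diff per page.
// A page missing from one side is treated as empty text.
function diffPages(original: string[], revised: string[]): Op[][] {
  const pageCount = Math.max(original.length, revised.length);
  const result: Op[][] = [];
  for (let p = 0; p < pageCount; p++) {
    const a = (original[p] ?? "").split("");
    const b = (revised[p] ?? "").split("");
    result.push(diffTokens(a, b));
  }
  return result;
}
```

Calling diffPages(["cat"], ["cart"]) yields one page with three ops: equal "ca", insert "r", equal "t". The quadratic table is fine for a sketch but would be far too slow at real document sizes, which is one reason a production tool relies on an optimized library.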

Granularity is configurable. Character mode catches typos and one-letter edits. Word mode catches phrase-level rewrites without highlighting every adjacent character. Line mode is closest to a Git-style unified diff, useful for contract clauses where the unit of change is a sentence or paragraph.
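One common way to get coarser granularity out of a character-oriented diff is to split the text into tokens first and diff the token sequence. The sketch below shows only that splitting step. How the tool tokenizes internally is not documented here, so the function name and the exact regular expressions are assumptions.

```typescript
type Granularity = "character" | "word" | "line";

// Split text into the units the diff will compare. Joining the tokens
// always reproduces the input, so no text is lost at any granularity.
function tokenize(text: string, mode: Granularity): string[] {
  // Character mode: one token per UTF-16 code unit.
  if (mode === "character") return text.split("");
  // Word mode: runs of whitespace and runs of non-whitespace are separate tokens.
  // Line mode: each token is one line including its trailing newline, if any.
  const pattern = mode === "word" ? /\s+|\S+/g : /[^\n]*\n|[^\n]+/g;
  return text.match(pattern) ?? [];
}
```

For example, tokenize("fix typo", "word") gives ["fix", " ", "typo"], and tokenize("a\nb", "line") gives ["a\n", "b"]. Keeping whitespace as its own token means a reflowed line does not mark the neighbouring words as changed.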

The JSON change report uses the op-list structure common to text-diff libraries: an array of {type, text} ops per page. It's small enough to paste into an LLM, structured enough to feed into an automated redline pipeline, and stable enough to archive next to the contract revision it describes.
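As an illustration of consuming such a report in a pipeline, the sketch below parses a small sample and lists the pages that changed. The {type, text} op shape comes from the description above. The per-page wrapper with a "page" field, the sample data, and the function name are assumptions made for the example, so check them against a real exported report.

```typescript
type OpType = "equal" | "insert" | "delete";
interface Op { type: OpType; text: string; }
// Assumed wrapper: the real report may key pages differently.
interface PageReport { page: number; ops: Op[]; }
interface PageSummary { page: number; inserted: number; deleted: number; }

// Count inserted and deleted characters per page, skipping unchanged pages.
function changedPages(report: PageReport[]): PageSummary[] {
  const out: PageSummary[] = [];
  for (const { page, ops } of report) {
    let inserted = 0;
    let deleted = 0;
    for (const op of ops) {
      if (op.type === "insert") inserted += op.text.length;
      else if (op.type === "delete") deleted += op.text.length;
    }
    if (inserted > 0 || deleted > 0) out.push({ page, inserted, deleted });
  }
  return out;
}

// Made-up sample data, for illustration only.
const sampleJson = JSON.stringify([
  { page: 1, ops: [{ type: "equal", text: "This Agreement" }] },
  { page: 2, ops: [
    { type: "equal", text: "Net " },
    { type: "delete", text: "30" },
    { type: "insert", text: "45" },
    { type: "equal", text: " days" },
  ] },
]);

const report = JSON.parse(sampleJson) as PageReport[];
const summary = changedPages(report);
// summary → [{ page: 2, inserted: 2, deleted: 2 }]
```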

When to compare two PDFs

  • Contract revisions. Counterparty sends back a marked-up draft with track changes turned off. You need the exact word-for-word delta against your original — without uploading the contract to a third-party server.
  • Manuscript drafts. Editor returns version 4. Author wants to see what actually moved between version 3 and version 4 without re-reading the entire 60 pages.
  • Policy documents. Government regulator publishes a revised circular. Compliance team needs the deltas against last quarter's version to update internal procedures.
  • Generated reports. A nightly export changed unexpectedly. Diff today's PDF against yesterday's to find which numbers shifted.
  • Archive verification. The PDF you stored last year vs. the one a colleague just sent you: confirm their text is actually identical, not just that the files are similarly named.

Why this runs in your browser

The most common use case for a PDF compare tool is comparing two confidential documents — a contract, a draft, a regulatory filing. The entire reason someone reaches for a comparison tool is that the document content is sensitive enough to warrant careful review. Uploading it to a server to do that review is the exact failure mode the user is trying to avoid.

Most online PDF compare tools route your file through a backend. PDF24, Draftable, Adobe's online compare — all upload, run the diff server-side, and return a result. Their privacy pages will tell you the file is deleted "within an hour" or "after processing"; they will not tell you what happens to the bytes between upload and deletion. For contract negotiations, that's an unacceptable trust gap.

PDF Mavericks does it the other way. The text extraction (pdf.js), the diff (diff-match-patch), and the JSON report generation all run in your browser tab. No multipart upload, no presigned URL, no temporary S3 bucket. The Network tab in DevTools will confirm: zero outbound requests carrying the file content during the entire comparison.

For the same privacy story applied to text-only diffs, see our JSON diff compare tool. If you also need to scrub identifying content from a PDF before sending, the Redact PDF tool runs the same browser-local way.

What this tool doesn't do

Visual diffs. Two PDFs with identical text but different fonts, page sizes, or margins will show as "no change". The diff is computed on the extracted text content stream, not on the rendered pixels. If you need pixel-level visual comparison, this isn't the right tool — that workflow generally belongs on a desktop application that controls rendering deterministically.

Scanned PDFs without OCR. A pure scan has no text content — pdf.js will extract an empty string from every page, so the diff has nothing to work with. Run the file through OCR PDF first to add a searchable text layer, then compare.
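A check for this situation can be sketched in a few lines, assuming the per-page text has already been extracted into an array of strings. The function name is hypothetical, and the tool may detect scans differently.

```typescript
// Returns true when no page yields any non-whitespace text, which is what
// a scan without an OCR layer looks like after text extraction.
// An empty page list also returns true, since there is nothing to diff.
function hasNoTextLayer(pageTexts: string[]): boolean {
  for (const text of pageTexts) {
    if (text.trim().length > 0) return false;
  }
  return true;
}
```

hasNoTextLayer(["", "  \n"]) is true, while hasNoTextLayer(["", "Clause 1"]) is false, so a document with even one page of real text still goes through the normal comparison.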

Documents past 100 pages. diff-match-patch runs as single-threaded JavaScript on the page's main thread. Above ~500,000 characters of combined input (roughly 100 pages of normal-density prose), the browser starts hanging visibly, and in the worst case the tab crashes. The tool truncates at the cap and shows a banner. Split your document and compare chapter-by-chapter for full coverage.
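The truncation and the split-and-compare workaround might look something like this. It is a sketch under stated assumptions: the 100-page figure comes from the text above, the function and field names are hypothetical, and fixed-size chunks stand in for real chapter boundaries.

```typescript
interface CappedDocument {
  pages: string[];     // at most maxPages entries
  truncated: boolean;  // true when pages were dropped, i.e. show the banner
}

// Keep only the first maxPages pages and record whether anything was cut.
function capPages(pages: string[], maxPages: number = 100): CappedDocument {
  return {
    pages: pages.slice(0, maxPages),
    truncated: pages.length > maxPages,
  };
}

// Split a long document into consecutive chunks that each fit under the cap,
// so every chunk can be compared in its own run.
function chunkPages(pages: string[], chunkSize: number = 100): string[][] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const chunks: string[][] = [];
  for (let start = 0; start < pages.length; start += chunkSize) {
    chunks.push(pages.slice(start, start + chunkSize));
  }
  return chunks;
}
```

A 250-page document passed to chunkPages comes back as three chunks of 100, 100, and 50 pages. Both files should be split at the same logical points, otherwise chunk k of one file no longer lines up with chunk k of the other.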

Encrypted PDFs. Password-protected PDFs need to be unlocked before comparison. Use Unlock PDF first if pdf.js refuses to read your file.

Frequently asked questions

Does this compare visual layout, or just the text?

Just the text. The tool extracts the text content stream from each PDF page and diffs it at the granularity you selected (character, word, or line). Two PDFs that look visually different but contain the exact same text (same words, same punctuation, same order) will show as no change. Differences in fonts, margins, line spacing, image placement, page breaks, or color are NOT detected. If you need pixel-level visual comparison, that's a separate problem (and one that genuinely belongs on a desktop tool with stable rendering, not a browser).

Why is there a 100-page cap?

diff-match-patch's diff_main is O(N · D), where N is the combined length of the two inputs and D is the edit distance between them, and the algorithm runs entirely in the main JavaScript thread. Past roughly 500,000 characters of combined text (about 100 pages of normal-density prose), the browser starts blocking the UI for visibly long stretches, and on slower hardware the tab can be killed by the browser's hang detector. The cap is honest engineering: you'd rather get a partial diff with a banner than a frozen tab. For larger documents, split into chapters and compare each.

What's in the JSON change report, and what is it for?

The report is structured per-page: each page has an array of ops, where each op is either {type: 'equal', text: '...'}, {type: 'insert', text: '...'}, or {type: 'delete', text: '...'}. Insertions are present in the Revised PDF and absent in the Original; deletions are the opposite; equals are unchanged segments. Use cases: feeding the diff into an automated review pipeline (LLM redline summary, contract analysis, change-tracking dashboards), archiving the exact textual delta with a contract revision, or building a custom report from the raw ops without re-running the diff. The JSON is generated client-side and saved to your downloads — it never leaves the browser either.
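Because every op says which side its text belongs to, both versions of a page can be rebuilt from the ops alone. The sketch below shows that rule in code. The function name is hypothetical, and the op shape follows the description above.

```typescript
type OpType = "equal" | "insert" | "delete";
interface Op { type: OpType; text: string; }

// Rebuild both sides of a page from its ops:
// the Original is equal + delete text, the Revised is equal + insert text.
function reconstruct(ops: Op[]): { original: string; revised: string } {
  let original = "";
  let revised = "";
  for (const op of ops) {
    if (op.type !== "insert") original += op.text;
    if (op.type !== "delete") revised += op.text;
  }
  return { original, revised };
}

// Made-up ops for illustration.
const pageOps: Op[] = [
  { type: "equal", text: "Net " },
  { type: "delete", text: "30" },
  { type: "insert", text: "45" },
  { type: "equal", text: " days" },
];
const sides = reconstruct(pageOps);
// sides.original → "Net 30 days"
// sides.revised  → "Net 45 days"
```

This doubles as a sanity check on an archived report: if the rebuilt Original does not match the text of the stored file, the report belongs to a different revision.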

Is anything uploaded to a server?

No. Both PDFs are read into memory locally via pdf.js, the diff is computed by diff-match-patch in your browser, and the JSON report is built client-side. There is no server endpoint involved at any step. This matters specifically for contract revisions, manuscript drafts, and other documents whose content is the precise reason you wouldn't want to upload them to a third party.

Why is the diff comparing pages independently?

If both PDFs were concatenated into one big string, a single page break shifting on page 2 would propagate as a giant insert/delete chain through every subsequent page — even if every word were identical. Page-by-page comparison treats each page as its own diff, which is faster and produces sane output when the two files differ in pagination but not in actual content. The trade-off: if content moves across a page break, that motion shows as a delete on one page and an insert on the next.
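The trade-off is easy to reproduce with a toy diff. The sketch below uses a deliberately minimal stand-in that only trims the common prefix and suffix and reports whatever is left in the middle. It is not the tool's algorithm, and the names and sample pages are made up, but it is enough to show text that crosses a page break appearing as a delete on one page and an insert on the next.

```typescript
type Op = { type: "equal" | "insert" | "delete"; text: string };

// Toy diff: trim the common prefix and suffix, then report the remaining
// middle as at most one delete and one insert.
function trimDiff(a: string, b: string): Op[] {
  const max = Math.min(a.length, b.length);
  let start = 0;
  while (start < max && a[start] === b[start]) start++;
  let end = 0;
  while (end < max - start && a[a.length - 1 - end] === b[b.length - 1 - end]) end++;
  const ops: Op[] = [];
  if (start > 0) ops.push({ type: "equal", text: a.slice(0, start) });
  if (a.length - end > start) ops.push({ type: "delete", text: a.slice(start, a.length - end) });
  if (b.length - end > start) ops.push({ type: "insert", text: b.slice(start, b.length - end) });
  if (end > 0) ops.push({ type: "equal", text: a.slice(a.length - end) });
  return ops;
}

// The word "beta" moves from the end of page 1 to the start of page 2.
const originalPages = ["alpha beta", "gamma"];
const revisedPages = ["alpha", "beta gamma"];

const page1 = trimDiff(originalPages[0], revisedPages[0]);
// page1 → equal "alpha", delete " beta"
const page2 = trimDiff(originalPages[1], revisedPages[1]);
// page2 → insert "beta ", equal "gamma"
```

No words were added or removed overall, yet each page reports a change. When a delete at the bottom of one page is mirrored by a matching insert at the top of the next, the likely cause is repagination, not an edit.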

Can I compare scanned PDFs?

Only if the scan was OCR'd first. A pure scan stores each page as an image, so pdf.js extracts no text and both panels show as empty. Run the document through the OCR PDF tool (/ocr-pdf) first to add a searchable text layer, then compare. The diff works on the OCR'd output, with the usual caveat that OCR errors will show up as spurious changes.