Question 1

Does this compare visual layout, or just the text?

Accepted Answer

Just the text. The tool extracts the text content stream from each PDF page and runs a character-level diff. Two PDFs that look visually different but contain the exact same text — same words, same punctuation, same order — will show as no change. Differences in fonts, margins, line spacing, image placement, page breaks, or color are NOT detected. If you need pixel-level visual comparison, that's a separate problem (and one that genuinely belongs on a desktop tool with stable rendering, not a browser).
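To illustrate why styling never registers, here is a minimal sketch of flattening one page's text content into a plain string. It assumes the object shape pdf.js returns from page.getTextContent() (an items array whose entries carry str, fontName, and hasEOL). The helper name pageToText and the sample pages are hypothetical and not taken from the tool's source.

```javascript
// Hypothetical helper: flatten a pdf.js-style text content object into a string.
// Only `str` (plus `hasEOL` for line breaks) is read; font and position are ignored.
function pageToText(textContent) {
  return textContent.items
    .filter((item) => typeof item.str === "string") // skip entries without text
    .map((item) => item.str + (item.hasEOL ? "\n" : ""))
    .join("");
}

// The same sentence set in two different fonts (made-up sample data).
const serifPage = {
  items: [
    { str: "Payment is due ", fontName: "g_d0_f1", hasEOL: false },
    { str: "in 30 days.", fontName: "g_d0_f1", hasEOL: true },
  ],
};
const sansPage = {
  items: [
    { str: "Payment is due ", fontName: "g_d0_f2", hasEOL: false },
    { str: "in 30 days.", fontName: "g_d0_f2", hasEOL: true },
  ],
};

console.log(pageToText(serifPage) === pageToText(sansPage)); // true
```

Because the font name is discarded before the diff ever runs, the two pages are indistinguishable to a text comparison.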

Question 2

Why is there a 100-page cap?

Accepted Answer

diff-match-patch's diff_main is O(N · D), where N is the combined length of the two texts and D is the edit distance between them, and the algorithm runs entirely in the main JavaScript thread. Past roughly 500,000 characters of combined text (about 100 pages of normal-density prose) the browser starts blocking the UI for visibly long stretches, and on slower hardware the tab can be killed by the browser's hang detector. The cap is honest engineering: you'd rather get a partial diff with a banner than a frozen tab. For larger documents, split into chapters and compare each.
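A cap like this can be sketched as a small guard that runs before the diff. This is only an illustration under stated assumptions: the function name applyPageCap, the MAX_PAGES constant, and the returned shape are all hypothetical, and the tool's own check may work differently.

```javascript
// Assumed limit, taken from the figure quoted above; the real tool may differ.
const MAX_PAGES = 100;

// Hypothetical guard: keep at most `maxPages` pages per document and report
// whether anything was dropped, so the UI can show a "partial diff" banner.
function applyPageCap(originalPages, revisedPages, maxPages = MAX_PAGES) {
  const truncated =
    originalPages.length > maxPages || revisedPages.length > maxPages;
  return {
    original: originalPages.slice(0, maxPages),
    revised: revisedPages.slice(0, maxPages),
    truncated,
  };
}

// A 120-page original against a 90-page revision (placeholder page text).
const capped = applyPageCap(
  new Array(120).fill("text"),
  new Array(90).fill("text")
);
console.log(capped.original.length, capped.revised.length, capped.truncated);
// 100 90 true
```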

Question 3

What's in the JSON change report, and what is it for?

Accepted Answer

The report is structured per-page: each page has an array of ops, where each op is either {type: 'equal', text: '...'}, {type: 'insert', text: '...'}, or {type: 'delete', text: '...'}. Insertions are present in the Revised PDF and absent in the Original; deletions are the opposite; equals are unchanged segments. Use cases: feeding the diff into an automated review pipeline (LLM redline summary, contract analysis, change-tracking dashboards), archiving the exact textual delta with a contract revision, or building a custom report from the raw ops without re-running the diff. The JSON is generated client-side and saved to your downloads; like the PDFs themselves, it never leaves the browser.
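As one example of building a custom report from the raw ops, the sketch below counts inserted and deleted characters per page. Only the shape of each op comes from the description above. The top-level layout ({ pages: [{ page, ops }] }), the sample data, and the function name summarizeChanges are assumptions made for illustration.

```javascript
// Hypothetical report; the wrapper object and page numbers are assumed.
const report = {
  pages: [
    {
      page: 1,
      ops: [
        { type: "equal", text: "The term is " },
        { type: "delete", text: "12" },
        { type: "insert", text: "24" },
        { type: "equal", text: " months." },
      ],
    },
    { page: 2, ops: [{ type: "equal", text: "Signed by both parties." }] },
  ],
};

// Count inserted and deleted characters per page, skipping unchanged pages.
function summarizeChanges(report) {
  return report.pages
    .map(({ page, ops }) => {
      const count = (type) =>
        ops
          .filter((op) => op.type === type)
          .reduce((total, op) => total + op.text.length, 0);
      return { page, inserted: count("insert"), deleted: count("delete") };
    })
    .filter((row) => row.inserted > 0 || row.deleted > 0);
}

console.log(summarizeChanges(report));
// [ { page: 1, inserted: 2, deleted: 2 } ]
```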

Question 4

Is anything uploaded to a server?

Accepted Answer

No. Both PDFs are read into memory locally via pdf.js, the diff is computed by diff-match-patch in your browser, and the JSON report is built client-side. There is no server endpoint involved at any step. This matters specifically for contract revisions, manuscript drafts, and other documents whose content is the precise reason you wouldn't want to upload them to a third party.

Question 5

Why is the diff comparing pages independently?

Accepted Answer

If both PDFs were concatenated into one big string, a single page break shifting on page 2 would propagate as a giant insert/delete chain through every subsequent page — even if every word were identical. Page-by-page comparison treats each page as its own diff, which is faster and produces sane output when the two files differ in pagination but not in actual content. The trade-off: if content moves across a page break, that motion shows as a delete on one page and an insert on the next.
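The page-by-page approach can be sketched as a loop that diffs page i of the Original against page i of the Revised. The helper diffByPage and the stand-in naiveDiff are hypothetical. The only library detail relied on is that diff-match-patch reports each segment as an [op, text] pair with op equal to -1 (delete), 0 (equal), or 1 (insert).

```javascript
// Map diff-match-patch op codes to the op names used in the JSON report.
const OP_NAMES = { "-1": "delete", "0": "equal", "1": "insert" };

// Hypothetical helper: diff each page against its same-numbered counterpart.
// `diffFn(a, b)` must return diff_main-style [op, text] pairs.
function diffByPage(originalPages, revisedPages, diffFn) {
  const pageCount = Math.max(originalPages.length, revisedPages.length);
  const pages = [];
  for (let i = 0; i < pageCount; i++) {
    // A page missing on one side is treated as empty text.
    const pairs = diffFn(originalPages[i] ?? "", revisedPages[i] ?? "");
    pages.push({
      page: i + 1,
      ops: pairs.map((pair) => ({ type: OP_NAMES[pair[0]], text: pair[1] })),
    });
  }
  return pages;
}

// Stand-in differ so the sketch runs without the library: it treats any
// changed page as a whole-page delete plus insert.
function naiveDiff(a, b) {
  if (a === b) return a ? [[0, a]] : [];
  const pairs = [];
  if (a) pairs.push([-1, a]);
  if (b) pairs.push([1, b]);
  return pairs;
}

const result = diffByPage(
  ["Intro.", "Clause A."],
  ["Intro.", "Clause B.", "Appendix."],
  naiveDiff
);
console.log(JSON.stringify(result, null, 2));
```

With the real library loaded, the same helper could be called with (a, b) => new diff_match_patch().diff_main(a, b) in place of naiveDiff. The extra third page in the Revised document shows up as a single insert on page 3, which matches the trade-off described above.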

Question 6

Can I compare scanned PDFs?

Accepted Answer

Only if the scan was OCR'd first. A pure scan stores each page as an image, and pdf.js extracts no text — both panels will show as empty. Run the document through /ocr-pdf first to add a searchable text layer, then compare. The diff still works on the OCR'd output, with the usual caveat that OCR errors will show up as spurious changes.
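A check for the "pure scan" case can be as simple as testing whether any page produced non-whitespace text. The function name hasNoTextLayer is hypothetical, and this is a sketch of the idea rather than the tool's actual detection logic.

```javascript
// Hypothetical check: treat a document as image-only when no page yields
// any non-whitespace text after extraction.
function hasNoTextLayer(pageTexts) {
  return pageTexts.every((text) => text.trim().length === 0);
}

console.log(hasNoTextLayer(["", "  \n"])); // true
console.log(hasNoTextLayer(["", "Page 2 text"])); // false
```

A result of true would be the cue to send the file through OCR before comparing.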
