
Compare Two PDFs Online Free: Side-by-Side Diff, No Upload

Compare two PDFs online free in your browser — character-level text diff, JSON change report, page-by-page alignment. For contract drafts. No upload, no signup.

PDF Mavericks

Your client returned the 40-page master services agreement Friday evening with a one-line email: "Few minor edits, sign back today." The PDF looks identical to the version you sent Wednesday — same logo, same numbered sections, same exhibit count. You scroll both versions side by side in your PDF reader and your eyes glaze over by section 6. Three places in that document carry six-figure consequences if a clause has been quietly rewritten. You need a tool that will compare two PDFs online free, page by page, character by character, and tell you exactly what changed. You also need that tool to not upload the contract anywhere, because you are actively negotiating it. This guide is about how to do both.

The same workflow applies to researchers comparing manuscript drafts (the editor returned version 4, what actually moved between version 3 and version 4 across 60 pages?), engineers reviewing spec changes (the architecture document was updated overnight, which sections did the architect actually touch?), compliance teams tracking regulator circulars (the central bank revised the KYC framework in December — what is the delta?), and anyone who has ever had to verify that the archived PDF and the freshly-sent PDF are actually identical and not just similarly named. All of these have one thing in common: visual inspection does not scale, and uploading the document defeats the point of the comparison.

The contract-revision use case

Contract redlining is the canonical reason this tool exists. Track changes is the standard mechanism, but it depends on every party using Word and leaving track changes turned on through the entire revision chain. In practice three things break that workflow:

  • The counterparty exports to PDF. Most legal teams circulate final-but-not-yet-signed drafts as PDFs because a PDF is harder to edit in transit. Track changes does not survive the Word-to-PDF export — the file you receive is a clean PDF with no visible redlines.
  • Track changes was rejected silently. One reviewer accepted all changes "to clean up the document" before forwarding. Subsequent edits do not show against your last-seen version because that version no longer exists in the file's history.
  • The redline is stale. Your last-seen draft was three rounds ago. Tracking against that draft generates a redline full of changes you already approved. You need the delta against the most recent version you signed off on, which is a PDF.

In all three cases, the answer is the same: compare the PDF you sent against the PDF you received, character by character, and have a tool point at the actual changes. Manual review of a 40-page contract takes a senior associate roughly two hours. A textual diff takes two seconds and produces a JSON change report you can paste into the email response.

Why visual comparison fails on PDFs

The instinct is to open both PDFs side-by-side and scroll. This fails at any document length above a few pages, for three structural reasons.

Formatting noise dominates. A counterparty's law firm uses a different Word template — different default font, different line spacing, different first-line indent. Every paragraph looks slightly different, even when the words are identical. Your visual inspection wastes attention on cosmetic differences and misses the one clause that actually changed.

Page breaks shift. If the revised version added two sentences in section 4, every page after section 4 starts and ends in different places. The page-12 paragraph you remember from your last read is now half-on-page-12 and half-on-page-13. Comparing "same position on each page" no longer maps to "same content", and your eye has nothing stable to anchor against.

Human attention degrades fast. The literature on proofreading is brutal: a trained proofreader catches roughly 70% of errors on first pass and that drops below 50% past page 20. A two-hour senior-associate review of a 40-page contract is fundamentally unreliable as a way to verify nothing important changed. A textual diff is deterministic — every change shows up, every time, regardless of fatigue.

The fix is to extract the text from each PDF and run a character-level diff. The output is no longer a side-by-side rendering of pages; it is a list of changed segments with insert and delete highlights. The review takes the time it takes to read the changed segments, not the time it takes to scroll the entire document.
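
Concretely, the extract-then-diff step comes down to a couple of library calls. A minimal sketch using Google's diff-match-patch (the npm package of the same name); the contract strings are illustrative:

```ts
import { diff_match_patch } from 'diff-match-patch';

const dmp = new diff_match_patch();
const diffs = dmp.diff_main(
  'The parties shall deliver within 30 days.',
  'The parties may deliver within 45 days.'
);
dmp.diff_cleanupSemantic(diffs); // merge character-level noise into readable edits

// diffs is an array of [op, text] pairs — 0 = equal, -1 = delete, 1 = insert —
// e.g. [[0, 'The parties '], [-1, 'shall'], [1, 'may'], [0, ' deliver within '],
//       [-1, '30'], [1, '45'], [0, ' days.']]
```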

Page-by-page vs. single-string diff

Once you decide to extract text first, the next decision is how to structure the input. Two approaches.

Single-string diff. Concatenate all pages of each PDF into one big string and run the diff once. Simple to implement, and the alignment itself is robust: an inserted paragraph shows up as a single insert no matter where the page breaks land. The costs sit elsewhere. Worst-case runtime grows quadratically with the full document length, and the flat output has no page structure — once a single page break shifts, mapping the diff back onto a page-aligned side-by-side view puts every subsequent page of the revised document out of register with the original.

Page-by-page diff. Extract each page as its own text block, diff page N of the original against page N of the revised. Faster (many small diffs beat one huge diff when the cost is quadratic in input length), and the output is page-aligned by construction, which is exactly what a side-by-side view needs. The trade-off: if a paragraph moves across a page break, the move shows as a delete on one page and an insert on the next, and a large early insertion leaves a paragraph-sized insert/delete pair at the seam of every subsequent page. For most contract review this is acceptable — you would rather see the paragraph flagged as moved than miss it in the noise.
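
A minimal sketch of the page-by-page loop, assuming one extracted string per page (the pdf.js extraction step is sketched later in this guide) and diff-match-patch for the per-page comparison — diffByPage is a hypothetical name, not the tool's API:

```ts
import { diff_match_patch } from 'diff-match-patch';

function diffByPage(originalPages: string[], revisedPages: string[]) {
  const dmp = new diff_match_patch();
  const pageCount = Math.max(originalPages.length, revisedPages.length);
  const report: { page: number; diffs: [number, string][] }[] = [];
  for (let i = 0; i < pageCount; i++) {
    // A page missing on either side diffs against the empty string and
    // shows up as a whole-page insert or delete.
    const diffs = dmp.diff_main(originalPages[i] ?? '', revisedPages[i] ?? '');
    dmp.diff_cleanupSemantic(diffs); // merge fragments into readable edits
    report.push({ page: i + 1, diffs });
  }
  return report;
}
```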

The PDF Mavericks compare PDF tool uses page-by-page diff with character/word/line granularity options. The same architectural choice underlies the JSON-level approach we wrote up in JSON diff online — sister concept, different data model.

Three diff granularities: character, word, line

Once you commit to a textual diff, you still have to choose what counts as a unit of change. Three options, each with a distinct best fit.

Character mode

The diff operates one character at a time. A typo fix (recieve → receive) shows as the smallest possible edit. Best for: contract review where a single comma matters (the "serial comma" case, O'Connor v. Oakhurst Dairy, 851 F.3d 69 (1st Cir. 2017), turned on a missing comma and cost the dairy USD 5 million in unpaid overtime). Trade-off: character mode highlights every changed character of a long edit individually, which can be visually noisy in side-by-side rendering.

Word mode

The diff treats each whitespace-separated word as the unit of change. A clause rewritten from "the parties shall" to "the parties may" shows three changed words instead of seven changed characters. Best for: phrase-level rewrites, the most common kind of substantive edit in a contract revision. Word mode is the right default for most users.

Line mode

The diff treats each line (or each sentence, depending on the implementation) as the unit. Equivalent to a Git-style unified diff. Best for: contract clauses where the unit of legal meaning is a sentence or paragraph; also for spec documents where each requirement is a numbered line. The output is the most compact, the review is the fastest, and the false-positive rate (lines flagged as changed because of trivial whitespace) is the highest.

For a contract redline, start with word mode, drop to character mode on any flagged paragraph that needs precise verification, and use line mode only when the document is structured as numbered requirements (RFCs, ISO standards, regulatory circulars).
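
Granularity is a tokenization choice layered on top of the same character-level engine. One common trick — a sketch of the technique, not necessarily how the tool implements it — encodes each distinct word as a single character from the Unicode private-use area, diffs the encoded strings, and decodes back:

```ts
import { diff_match_patch } from 'diff-match-patch';

// Word-level diff on top of a character-level engine. Safe for documents
// with up to ~6,400 distinct tokens (the size of the private-use area).
function wordDiff(a: string, b: string): [number, string][] {
  const tokenize = (s: string) => s.match(/\S+\s*/g) ?? []; // word + trailing space
  const table = new Map<string, string>();
  const encode = (tokens: string[]) =>
    tokens
      .map((t) => {
        if (!table.has(t)) table.set(t, String.fromCharCode(0xe000 + table.size));
        return table.get(t)!;
      })
      .join('');
  const encodedA = encode(tokenize(a)); // both sides share one token table
  const encodedB = encode(tokenize(b));
  const decode = new Map(
    Array.from(table, ([token, ch]) => [ch, token] as [string, string])
  );
  const dmp = new diff_match_patch();
  return dmp
    .diff_main(encodedA, encodedB, false)
    .map(([op, chars]): [number, string] => [
      op,
      Array.from(chars, (c) => decode.get(c)!).join(''),
    ]);
}
```

Line mode works the same way with a line tokenizer (split on newlines); diff-match-patch ships helpers for that case in its documented line-mode recipe.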

The privacy wedge: never upload a contract you are negotiating

The most common use case for a PDF compare tool is comparing two confidential documents — a contract draft, a manuscript, a regulatory filing under embargo. The entire reason someone reaches for a comparison tool is that the document content is sensitive enough to warrant careful review. Uploading it to a server to do that review is the exact failure mode the user is trying to avoid.

Most online PDF compare tools route your file through a backend. PDF24, Draftable, Adobe's online compare — all upload, run the diff server-side, and return a rendered HTML result. Their privacy pages will tell you the file is deleted "within an hour" or "after processing"; they will not tell you what happens to the bytes between upload and deletion, who has access, where the backups live, or what the retention policy actually is on the logs that captured the upload event.

The November 2025 jsonformatter.org and codebeautify.org incident is the closest case study. Security firm watchTowr Labs disclosed that both sites had been storing user submissions for years; the exposed archive totalled about 5GB across 80,000+ files and included AWS keys, JWTs, internal endpoints, and customer records from banks and government agencies. Users believed they were using a stateless tool; they were not. PDFs of contract drafts uploaded to an online compare tool carry the same risk pattern, with worse stakes — a leaked contract draft can blow up an active negotiation, leak counterparty terms, or expose attorney-client work product.

The browser-local approach side-steps this. Both PDFs are read into memory via pdf.js (the same renderer Firefox ships with, Apache-2.0-licensed, audited by Mozilla's security team), the text extraction runs locally, the diff is computed by Google's diff-match-patch algorithm in the same JavaScript thread, and the JSON change report is built client-side. No multipart upload, no presigned URL, no temporary S3 bucket. The DevTools Network tab will confirm: zero outbound requests with the file content during the entire comparison.
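
That pipeline is small enough to sketch in full. A minimal version of the extraction step using the public pdfjs-dist API — the worker path is deployment-specific, and extractPages is a hypothetical name:

```ts
import * as pdfjsLib from 'pdfjs-dist';

// pdf.js parses in a worker; point this at wherever your bundler puts it.
pdfjsLib.GlobalWorkerOptions.workerSrc = '/pdf.worker.min.mjs';

async function extractPages(file: File): Promise<string[]> {
  const data = new Uint8Array(await file.arrayBuffer()); // read locally, never uploaded
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  const pages: string[] = [];
  for (let i = 1; i <= pdf.numPages; i++) { // pdf.js pages are 1-indexed
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    // Each item is a positioned text run; joining loses layout but keeps
    // every character, which is all a text diff needs.
    pages.push(content.items.map((it: any) => it.str ?? '').join(' '));
  }
  return pages;
}
```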

Step-by-step: compare two PDFs online free

  1. Open the tool. Visit pdfmavericks.com/compare-pdf. Two drop zones: Original on the left, Revised on the right. No account, no email gate.
  2. Drop the Original PDF in the left panel. The version you sent, signed off on, or last reviewed clean. The browser reads it into memory locally — the Network tab in DevTools will show no outbound traffic.
  3. Drop the Revised PDF in the right panel. The version that came back. Same browser-local read.
  4. Pick a granularity. Word mode is the right default for contract review. Switch to character mode for high-stakes clauses where a single comma changes meaning. Switch to line mode for numbered requirements or RFC-style documents.
  5. Click Compare. The text is extracted from each page using pdf.js, and the diff runs page-by-page using Google's diff-match-patch. For a 40-page contract this completes in under a second on a modern laptop.
  6. Review the side-by-side output. Insertions show in green on the right panel. Deletions show in red on the left. Equals are unchanged. The similarity percentage at the top is a quick triage signal — 95%+ usually means small edits, below 70% means a substantive revision.
  7. Download the JSON change report. Per-page array of {type, text} ops — the exact structure is described in the FAQ below, and the full browser-side pipeline is sketched just after this list. Paste it into your email response, archive it next to the contract revision, or pipe it into an LLM for a redline summary.
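
For the curious, steps 2 through 7 compress to a few lines of browser code. This reuses the hypothetical extractPages and diffByPage sketches from earlier; it is illustrative, not the tool's actual source:

```ts
// Read both files locally, diff page-by-page, download the JSON report —
// all client-side, with no network round-trip.
async function comparePdfs(original: File, revised: File): Promise<void> {
  const [a, b] = await Promise.all([
    extractPages(original),
    extractPages(revised),
  ]);
  const report = diffByPage(a, b);
  const blob = new Blob([JSON.stringify(report, null, 2)], {
    type: 'application/json',
  });
  const link = document.createElement('a'); // synthetic download link
  link.href = URL.createObjectURL(blob);
  link.download = 'change-report.json';
  link.click();
  URL.revokeObjectURL(link.href);
}
```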

The 100-page cap (and why it exists)

The tool truncates the comparison at roughly 100 pages of normal-density prose, equivalent to about 500,000 characters of combined text across both files. This is a deliberate engineering choice, not a paywall. Three forces drive the cap.

First, diff-match-patch is O(N · D), where N is input length and D is edit distance. The algorithm is the gold standard for text diff (Google uses it inside Google Docs; the same algorithm powers client-side change tracking in Notion and Linear), but the time complexity is unforgiving as inputs grow. A 100-page text diff with 5% edits takes about 200 ms on a 2024-vintage laptop; doubling to 200 pages doubles both N and D at a fixed edit rate, so the runtime quadruples rather than doubles.

Second, the algorithm runs single-threaded in the main JavaScript thread. Web Workers can move it off-thread, but the cost is shipping the inputs across the postMessage boundary, which is itself slow on megabyte-scale strings. Past about a second of compute, the browser starts blocking the UI, which makes the tool feel broken even when it is working.

Third, the worst case is genuinely unbounded. Two completely different documents of the same length produce O(N²) work because every character pair is a candidate for alignment. The browser hang detector kills tabs that block for 30+ seconds on slower hardware, which is the worst possible failure mode (the user loses their input). The cap is honest engineering: a partial diff with a visible banner is more useful than a frozen tab.
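
diff-match-patch ships a guard for exactly this situation: a per-call compute budget after which it returns a valid but possibly non-minimal diff instead of grinding on. A sketch of a bounded compare — the numbers are illustrative, not the tool's exact configuration:

```ts
import { diff_match_patch } from 'diff-match-patch';

const MAX_COMBINED_CHARS = 500_000; // ~100 pages of normal-density prose

function boundedDiff(a: string, b: string): [number, string][] {
  if (a.length + b.length > MAX_COMBINED_CHARS) {
    throw new Error('Inputs exceed the 100-page cap; split the PDFs first.');
  }
  const dmp = new diff_match_patch();
  dmp.Diff_Timeout = 1; // seconds: return a coarser diff rather than hang the tab
  return dmp.diff_main(a, b);
}
```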

For larger documents, the workflow is to split into chapters or sections (the split tool does this in the browser, same privacy story), compare each piece, and re-assemble the JSON change reports. For a 400-page regulator circular, this is roughly 4 chapter-sized diffs and a manual stitch — still faster than a server-uploaded compare and still without the file ever leaving your machine.

One adjacent caveat: the tool requires the PDFs to have an extractable text layer. A pure scan with no OCR will show as empty in both panels, and the diff will report no changes when in fact every word is different. Run the document through OCR PDF first to add a searchable text layer, then compare. Encrypted PDFs need to be unlocked first — pdf.js refuses to read a password-protected file until the password is supplied.

Both PDFs stay in your browser

The text extraction (pdf.js), the diff (diff-match-patch), and the JSON change report all run in your browser tab. Open DevTools and watch the Network tab — zero outbound requests carrying the contract.

Frequently asked questions

Does this compare PDF formatting changes too, or only the text?

Only the text. The tool extracts the text content stream from each PDF page using pdf.js and runs a character-level diff. Two PDFs that look visually different but contain the exact same words, punctuation, and order will show as no change. Differences in fonts, margins, line spacing, image placement, page breaks, or color are not detected. For pixel-level visual comparison you want a desktop tool with stable rendering — that is a separate problem the browser is not the right environment for.

What is in the JSON change report?

The report is structured per-page. Each page contains an array of ops, where each op is one of {type: 'equal', text}, {type: 'insert', text}, or {type: 'delete', text}. Insertions are present in the Revised PDF and absent in the Original; deletions are the opposite; equals are unchanged segments. The format is small enough to paste into an LLM for redline summarization, structured enough to feed into an automated review pipeline, and stable enough to archive next to the contract revision. The report is generated client-side and saved to your downloads — it never goes to a server.
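
Written out as TypeScript types — a sketch of the shape described above, not a published schema:

```ts
type DiffOp =
  | { type: 'equal'; text: string }
  | { type: 'insert'; text: string }  // present in Revised, absent in Original
  | { type: 'delete'; text: string }; // present in Original, absent in Revised

interface PageChanges {
  page: number;  // 1-indexed
  ops: DiffOp[]; // in document order: equal + delete texts concatenate to the
}                // Original page, equal + insert texts to the Revised page

type ChangeReport = PageChanges[];

// Converting diff-match-patch tuples into named ops:
const toOps = (diffs: [number, string][]): DiffOp[] =>
  diffs.map(
    ([op, text]) =>
      ({ type: op === 0 ? 'equal' : op > 0 ? 'insert' : 'delete', text } as DiffOp)
  );
```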

Can I compare scanned PDFs without OCR first?

No. A pure scan stores each page as an image, and pdf.js extracts no text — both panels will show as empty and the diff will report no changes when in fact every word is different. Run the document through /ocr-pdf first to add a searchable text layer, then compare. The diff still works on the OCR'd output, with the standard caveat that OCR errors will register as spurious changes (an 'l' read as a '1' shows up as an edit). For high-stakes comparison of scanned documents, eyeball the OCR output first or compare two cleanly-OCR'd versions.
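
A cheap pre-flight check catches the scanned-PDF case before you waste a comparison. The threshold is a heuristic, and extractPages is the hypothetical helper sketched earlier:

```ts
// If extraction yields almost no text per page, the PDF is probably a
// scan with no text layer and needs OCR before comparing.
function looksScanned(pages: string[]): boolean {
  const chars = pages.reduce((sum, p) => sum + p.trim().length, 0);
  return pages.length > 0 && chars < 20 * pages.length;
}
```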

How does the page-by-page alignment work?

Each page in the Original is diffed against the same-numbered page in the Revised. If the two files have different page counts, extra pages on either side show as full inserts or deletes. The trade-off: if content moved across a page break — say, a paragraph that was on page 5 in the original is now on page 6 in the revised — the move shows as a delete on page 5 and an insert on page 6. The alternative (concatenating both PDFs into one big string) aligns moved content cleanly, but its worst-case runtime is quadratic in the whole document rather than a single page, and it loses the page structure that the side-by-side view and the per-page report depend on.

What does the similarity percentage mean?

Similarity is the ratio of equal characters to total characters across the diff. A 1000-character document with 100 inserted, 50 deleted, and 850 unchanged scores 850 / (850+100+50) = 85% similar. Two identical documents score 100%. A document compared against an empty file scores 0%. The number is useful as a quick triage signal — 95%+ usually means a typo fix or a small clause change, 70-90% means a meaningful revision, below 50% suggests a different document version entirely. It does not weight by importance, so a single deleted clause and a hundred whitespace changes will affect it the same way.
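
The arithmetic, as a function over diff-match-patch's [op, text] tuples:

```ts
// Equal characters over all diffed characters, matching the definition above.
function similarity(diffs: [number, string][]): number {
  let equal = 0;
  let total = 0;
  for (const [op, text] of diffs) {
    total += text.length;
    if (op === 0) equal += text.length; // 0 = equal in diff-match-patch
  }
  return total === 0 ? 1 : equal / total; // two empty inputs count as identical
}
// 850 equal + 100 inserted + 50 deleted => 850 / 1000 = 0.85
```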

Why is there a 100-page cap on the comparison?

The diff-match-patch algorithm is O(N · D) where N is the input length and D is the edit distance, and it runs on the main JavaScript thread. Past roughly 500,000 characters of combined text (about 100 pages of normal-density prose) the browser starts blocking the UI for visibly long stretches, and on slower hardware the tab can be killed by the browser's hang detector. The cap is honest engineering: a partial diff with a banner is more useful than a frozen tab. For larger documents, split into chapters and compare each piece separately, then re-assemble the report.

Is the comparison actually private — nothing uploaded?

Yes. Both PDFs are read into browser memory via pdf.js (the same renderer Firefox ships with), the text extraction runs locally, the diff is computed by diff-match-patch in the same JavaScript thread, and the JSON change report is built client-side. No file content is sent to any server. You can verify this in your browser's DevTools by opening the Network tab and watching for the absence of any POST or PUT carrying the file bytes. This matters specifically because most online PDF comparison tools — PDF24, Draftable, Adobe online compare — do upload, process server-side, and return a result.

How does this differ from comparing JSON files?

PDF compare runs a character/word/line text diff on the extracted text streams. JSON diff parses both files into structured objects first, then walks the trees and emits change operations against specific paths — reordering keys does not register as a change because JSON objects are unordered by spec. PDF text diff cannot do that because PDF text has no semantic structure beyond the page boundary. If your underlying data is JSON and you exported it as a PDF for comparison, diff the JSON instead via /blog/json-diff-compare-tool-online — it is faster and produces a more meaningful diff.
