Remove PDF Metadata Online Free: Strip Author, Dates, and XMP

Remove PDF metadata online free in your browser — clear Info dictionary fields, delete the XMP stream, verify with exiftool. No upload, no signup, no leak.

PDF Mavericks

A reporter at a national daily filed a 12-page PDF to her editor in early 2024, sourced from an anonymous government insider. The story ran. Two weeks later, the source was identified, suspended, and walked out of the ministry building. The leak vector was not the content, which was careful. It was the Author field in the PDF's Info dictionary, which still carried the source's real name from the office workstation that had drafted the original memo. The reporter never opened the file properties; her editor never opened them; the recipient of the FOIA-style follow-up did. That is the leak pattern this guide is about, and it is the reason to strip PDF metadata before any document leaves your machine: the visible content can be perfectly clean while the invisible properties broadcast who, when, and where.

The same vector hits lawyers forwarding contract drafts (the Author or a custom Company field can reveal the drafting firm, and CreationDate the billable day), job applicants whose CV PDF carries the personal-machine username (arunoday-laptop as Author is a common screenshot on LinkedIn), and privacy-conscious users who learned the hard way that "remove personal info on save" is not the default in any major office suite. This guide walks through what PDF metadata is, the three things people get wrong about removing it, and how to do it correctly in your browser without the file ever leaving your laptop.

The journalist-and-source leak pattern

The pattern is documented across two decades of newsroom security training. The Tor Project's 2014 guidance to journalists explicitly calls out PDF metadata as a side-channel leak; SecureDrop's documentation has shipped a "strip metadata before publishing" checklist since 2015; the EFF's Surveillance Self-Defense module on file metadata has been live since 2017. The reason it keeps appearing in training is that it keeps producing real leaks: a 2003 UK government dossier on Iraq exposed the Word author chain through hidden metadata (the "dodgy dossier" episode); a 2017 leak from the National Security Agency to The Intercept reportedly helped identify the leaker through printer tracking dots visible in the scanned pages (a related case of invisible identifying data, although those dots sit in the page image rather than in the PDF properties, so stripping metadata alone would not have removed them); multiple recent litigation discovery battles have hinged on Producer / CreationDate fields revealing draft chronology counsel preferred to keep private.

The metadata fields are not bugs. They are working as designed: the ISO 32000 PDF specification defines both the Info dictionary (as an optional trailer entry) and the XMP metadata stream, and modern PDF producers write them because workflow systems (digital-asset-management tools, archival systems, redaction audit trails) need structured metadata. The leak is the gap between that specification-sanctioned visibility and the user's mental model of "the document is what I see on screen." Stripping metadata closes the gap.

What metadata actually lives in a PDF

Three layers, in roughly the order PDF readers and forensic tools inspect them.

1. The Info dictionary

The original PDF 1.0 metadata format. A dictionary referenced from the document trailer, with eight standard keys: /Title, /Author, /Subject, /Keywords, /Creator (the application that created the source content, e.g. "Microsoft Word for Office 365"), /Producer (the application that wrote the PDF, e.g. "Adobe PDF Library 17.0"), /CreationDate, and /ModDate. Later PDF versions also define /Trapped, which is rarely populated. Custom keys can be added by any producer: Word adds /Company, and LaTeX often adds /PTEX.Fullbanner with the exact TeXLive version. Most PDF readers display the Info dict under File > Properties.
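To make this layer concrete, here is a minimal Python sketch that scans raw PDF bytes for the eight standard keys. It is an illustration under stated assumptions, not a PDF parser: the function name scan_info_fields and the sample bytes are made up for this example, and the scan only sees values written as plain literal strings in uncompressed objects. Hex-encoded strings, escaped parentheses, and dictionaries packed into compressed object streams are missed, so an empty result does not prove a file is clean.

```python
import re

# The eight standard Info dictionary keys described above.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

# Matches "/Key (literal string)" where the string contains no nested or
# escaped parentheses. Assumption: real files may use other string forms.
_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()\\]*)\)"
)


def scan_info_fields(pdf_bytes: bytes) -> dict[str, list[str]]:
    """Return every plain-text Info-style value found, grouped by key."""
    found: dict[str, list[str]] = {}
    for key, raw in _PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), []).append(raw.decode("latin-1"))
    return found


# Hypothetical fragment, just enough bytes to exercise the scan.
sample = (
    b"1 0 obj\n<< /Title (Q3 memo) /Author (jdoe)\n"
    b"/Producer (Example PDF Library) /CreationDate (D:20240115093000Z) >>\n"
    b"endobj\n"
)

for key, values in scan_info_fields(sample).items():
    print(key, values)
```

A key that shows up with more than one value is worth a second look: it can mean the file holds several revisions, each with its own copy of the dictionary.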

2. The XMP metadata stream

Adobe's Extensible Metadata Platform, introduced in 2001 and standardized as ISO 16684. Stored as an XML stream attached to the document catalog under /Metadata. XMP can carry far more than the Info dict: full edit history (every save event, the application, the user, the timestamp), GPS coordinates (when a phone scan is converted to PDF), thumbnail images, language tags, rights metadata, and arbitrary application-specific extensions. A 2-page Word export typically has 2-4 KB of XMP; an InDesign export can carry tens of kilobytes. exiftool reads XMP comprehensively — most surprises come from this layer.
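A rough way to see whether a file carries this layer is to look for the xpacket wrapper that surrounds an XMP packet. The Python sketch below assumes the metadata stream is stored uncompressed, which is common but not guaranteed, and it only reads a handful of properties written in element form; properties serialized as XML attributes, or nested lists such as dc:creator, are not picked up. The function names and the sample bytes are hypothetical.

```python
import re

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
_XPACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)

# A few common properties, matched only when written as simple elements.
_TAG_VALUE = re.compile(
    rb"<(xmp:CreatorTool|pdf:Producer|xmp:CreateDate|xmp:ModifyDate)>(.*?)</\1>",
    re.DOTALL,
)


def find_xmp_packets(pdf_bytes: bytes) -> list[bytes]:
    """Return every uncompressed XMP packet found in the raw bytes."""
    return _XPACKET.findall(pdf_bytes)


def summarize_xmp(packet: bytes) -> dict[str, str]:
    """Pull the element-form properties listed above out of one packet."""
    return {
        name.decode("ascii"): value.decode("utf-8", "replace").strip()
        for name, value in _TAG_VALUE.findall(packet)
    }


# Hypothetical fragment, not a complete PDF.
sample = (
    b"%PDF-1.7\n5 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b"<x:xmpmeta xmlns:x='adobe:ns:meta/'><rdf:RDF><rdf:Description>\n"
    b"<xmp:CreatorTool>Example Writer 1.0</xmp:CreatorTool>\n"
    b"<xmp:CreateDate>2024-01-15T09:30:00Z</xmp:CreateDate>\n"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>\n"
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

for packet in find_xmp_packets(sample):
    print(len(packet), "bytes:", summarize_xmp(packet))
```

Treat this as a presence check only. exiftool remains the better tool for reading what the packet actually contains.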

3. Per-object metadata

Less commonly cleaned and harder to reach. Each font object can carry its own metadata stream (foundry, license, embedded charset). Each image object inside the PDF can carry the original EXIF block from the camera that shot the photo, including GPS, lens model, and exposure settings. Page-level metadata streams attach to individual pages. Browser-based tools (including this one) generally do not walk per-object streams; for hostile-counterparty scenarios, follow up on the command line with exiftool -all= and then a qpdf --object-streams=disable rewrite for a deeper sweep, and inspect the result, because these tools may not reach metadata stored inside embedded images.

Three things people get wrong about removing metadata

  1. "File > Properties > clear" in Acrobat is enough. It clears the visible Info dict fields shown in the dialog. It does not touch the XMP stream. Save, reopen with exiftool, and the Author / Producer / EditingTool fields from XMP still report the original values. The Acrobat sanitize-document workflow (Tools > Redact > Sanitize Document) does clear XMP, but it is behind a paid Acrobat Pro license and is two menu levels deeper than most users find.
  2. "Print to PDF" strips metadata. Print-to-PDF writes a fresh PDF from the rendered pages, which removes the original Info dict — but the printer driver populates a new Info dict with your machine's Author and Producer values. So the source's name is gone, but yours is now in. For a journalist forwarding a leaked document this is worse than the original leak because it points the audit trail at you instead of the source.
  3. "Online metadata remover" is automatically private. Most online tools upload the PDF to a server, process it there, and return a cleaned copy. The original — with all the metadata you wanted gone — sits on the provider's machine until they decide to delete it. The November 2025 jsonformatter.org and codebeautify.org incident exposed about 5GB of user-pasted secrets for exactly this reason: users assumed a free online tool was stateless. PDF metadata removal carries the same risk pattern. The only safe online tool is one where the bytes never leave the browser.

Three approaches: command-line, desktop, browser

Command-line: exiftool and qpdf

The reference workflow for security-conscious users. Two commands clear the document-level metadata layers. Order matters: exiftool edits a PDF by appending an incremental update, so the original values remain recoverable until another tool rewrites the file.

# Strip Info dict and XMP in one pass (reversible on its own)
exiftool -all= -overwrite_original yourfile.pdf

# Rewrite the file so the superseded metadata objects are dropped
qpdf --object-streams=disable yourfile.pdf cleaned.pdf

exiftool is free (written in Perl and distributed under the same terms as Perl itself) and runs offline. qpdf is free (Apache 2.0) and ships in every major Linux distribution. The trade-off is the install: exiftool needs a Perl runtime or a packaged build, qpdf needs a C++ build or a package install, and Windows users typically reach for scoop install exiftool or chocolatey rather than figure out the manual install. For one-off tasks the install friction is the blocker.

Desktop: Acrobat Pro and BatchPurifier

Adobe Acrobat Pro's Sanitize Document feature (Tools > Redact > Sanitize Document > Remove Hidden Information) clears Info dict, XMP, embedded thumbnails, JavaScript, hidden text, and form history in one pass. It is the most thorough commercial option. The cost is the Acrobat Pro license (USD 19.99/month at retail). For organizations that already have Acrobat Pro deployed this is the right tool; for everyone else the license is the blocker.

BatchPurifier LITE for PDF is a free Windows desktop tool that handles the Info dict and XMP for single files. It does not handle per-object streams and the UI is dated, but it works offline and costs nothing.

Browser: PDF Mavericks

The PDF Mavericks remove PDF metadata tool runs entirely in the browser tab using pdf-lib (MIT-licensed, JavaScript). The PDF is read into browser memory via the File API, the Info dictionary fields are cleared to empty strings, the XMP metadata stream object is deleted from the document catalog when present, and a fresh PDF is written using the same pdf-lib serializer that Vercel, Stripe, and Notion use for their server-side PDF generation. The difference here is that the serializer runs in your browser, not on a server. No upload, no signup, no account required.

Why running it in your browser matters

The case study every privacy-conscious developer should know is the November 2025 jsonformatter.org incident. Security firm watchTowr Labs disclosed that jsonformatter.org and codebeautify.org had been storing user submissions for years; the exposed archive totalled about 5GB across 80,000+ files and included AWS keys, JWTs, internal endpoints, and customer records from banks, government agencies, and Fortune 500 firms. Users believed they were using a stateless tool; they were not. The exposure surface extended to anyone who had ever pasted sensitive JSON in.

PDF metadata removal carries the same risk pattern with worse stakes, because the entire reason you are reaching for the tool is that the document content (not just the metadata) is sensitive. A contract draft, a leaked memo, a redacted court filing — these are exactly the documents you do not want sitting on a third-party server while a queue worker processes them. The promise that "files are deleted within an hour" is unverifiable; the bytes are out of your control from the moment of upload.

Browser-local processing fixes this. The PDF Mavericks tool reads the file via the browser's FileReader API, processes it with pdf-lib in the same JavaScript thread that runs the page, and writes the cleaned PDF to your downloads folder. Nothing is uploaded. You can verify this in DevTools by opening the Network tab and watching for the absence of any outbound request carrying the file content. The same architecture protects the JSON tools we built after that incident — see Never paste API keys into a JSON formatter for the longer write-up of why "processed locally" matters.

Step-by-step: remove PDF metadata online free

  1. Open the tool. Visit pdfmavericks.com/remove-pdf-metadata. The page loads in under a second on a normal connection — no signup, no email gate, no popup.
  2. Drop the PDF in. Drag the file onto the drop zone or click to pick it from your file dialog. The browser reads the bytes into memory locally; no upload happens.
  3. Review the metadata report. The tool inspects the file and lists what it found: Title, Author, Subject, Keywords, Producer, Creator, CreationDate, ModDate, and whether an XMP stream is present. This is the exact set of values that would be exposed to the recipient. Read it before clicking Remove — you will sometimes find values you did not know were there (a forgotten keyword from a template, a Subject string copied from a previous version).
  4. Click Remove metadata. Each Info dictionary field is cleared to an empty string. The XMP metadata stream object is deleted from the document catalog when present. Page content, form fields, and bookmarks are preserved. Existing digital signatures will no longer validate, because the file is rewritten (see the FAQ on signed PDFs below). The processed PDF stays in browser memory.
  5. Download the cleaned PDF. The download is a fresh file written by pdf-lib, ready to share. The original on your disk is untouched.
  6. Verify. Run the verification step below before forwarding the PDF to anyone whose chain of custody matters.

How to verify the metadata is gone

Trust nothing about a metadata-removal tool that you have not verified yourself. The verification step takes 15 seconds and uses tools your recipient will use anyway.

If you have exiftool installed:

exiftool -all -s yourfile-cleaned.pdf

# Expected output (Info dict and XMP cleared):
# ExifToolVersion : 12.85
# FileName        : yourfile-cleaned.pdf
# FileSize        : 124 KB
# FileType        : PDF
# PDFVersion      : 1.7
# Linearized      : No
# PageCount       : 12
# (Title, Author, Subject, Keywords, Producer, Creator,
#  CreateDate, ModifyDate should all be absent or empty)

If you don't have exiftool: open the PDF in any reader (Acrobat, Preview, the browser PDF viewer, Foxit) and check File > Properties > Document Properties. The Author, Title, Subject, and Keywords rows should all be blank. The Producer row may show "pdf-lib" — that is the writing library, not your name, and is acceptable to leak (it is the same string Notion and Stripe leave on every PDF they generate).
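If you know the exact value that must not survive (a name, a username, a hostname), you can also search the file for it directly. The Python sketch below is one hedged way to do that: it looks for the string in the raw bytes and inside any stream that zlib can inflate, in the encodings a PDF commonly uses. The helper names are hypothetical, streams that use other filters or encryption are skipped, and a short needle can match unrelated binary data, so a hit means "look closer" and a miss is not a guarantee.

```python
import re
import zlib


def _encodings(needle: str) -> list[bytes]:
    """Byte forms a text value commonly takes inside a PDF."""
    utf8 = needle.encode("utf-8")        # XMP packets, plain ASCII strings
    utf16 = needle.encode("utf-16-be")   # Info strings written with a BOM
    return [
        utf8, utf16,
        utf8.hex().encode(), utf8.hex().upper().encode(),    # hex strings
        utf16.hex().encode(), utf16.hex().upper().encode(),
    ]


_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _inflated_streams(pdf_bytes: bytes):
    """Yield the contents of every stream that zlib can decompress."""
    for match in _STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before endstream
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or uses another filter


def leaks_string(pdf_bytes: bytes, needle: str) -> bool:
    """True if needle appears in the raw file or in any Flate stream."""
    forms = _encodings(needle)
    haystacks = [pdf_bytes, *_inflated_streams(pdf_bytes)]
    return any(form in hay for hay in haystacks for form in forms)


# Hypothetical fragment: the author name is hidden in a compressed stream.
hidden = zlib.compress(b"<< /Author (jdoe) >>")
sample = (
    b"%PDF-1.7\n4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + hidden + b"\nendstream\nendobj\n%%EOF\n"
)

print(leaks_string(sample, "jdoe"))
print(leaks_string(sample, "asmith"))
```

In practice you would read the cleaned file with open(path, "rb").read() and pass in each value that appeared in the metadata report before removal.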

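If you want a deeper sweep: a post-processing command-line pass clears document-level tags the browser tool may have missed. Run exiftool first and qpdf second, because exiftool's PDF edits are incremental and stay reversible until the file is rewritten.

exiftool -all= -overwrite_original yourfile-cleaned.pdf
qpdf --object-streams=disable yourfile-cleaned.pdf rebuilt.pdf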

For a contract going to opposing counsel or a leaked document going to a publisher, run the deeper sweep. For a CV or a meeting handout, the browser tool alone is sufficient.

One more thing: if the document was already shared before metadata removal, treat that as a live leak. Generate a clean version, send it as a replacement with a brief acknowledgment, and audit your export workflow so the leak does not recur. The CV-with-personal-username case has been a documented vector for over twenty years and almost every major office suite ships a checkbox to default-strip metadata on save — none of them turn it on by default.

Your PDF never leaves your browser

PDF Mavericks reads, processes, and writes the cleaned PDF entirely in your browser tab using pdf-lib. No upload, no account, no logs. Open the Network tab in DevTools and confirm the bytes never go out.

Frequently asked questions

What is the difference between the PDF Info dictionary and XMP metadata?

The Info dictionary is the original PDF 1.0 metadata format — a small set of fixed keys (Title, Author, Subject, Keywords, Producer, Creator, CreationDate, ModDate) stored as a top-level dictionary in the document trailer. XMP (Extensible Metadata Platform) is the newer Adobe format introduced around 2001, stored as an XML stream attached to the document catalog. Modern producers like Acrobat, Word, and InDesign write both, and the values can drift apart. Removing only the Info dict leaves XMP behind, and an auditor running exiftool will still see your name. A complete removal clears both layers.

Can I remove metadata from a digitally signed PDF?

Not without invalidating the signature. A digital signature covers a specific byte range of the PDF, and rewriting any part of the document, including the metadata fields, invalidates it. If signature integrity matters for your workflow, the order has to be: strip metadata first, then sign. Stripping metadata from a signed PDF is technically possible, but the recipient's verifier will flag the signature as broken, which defeats the point of signing. The same rule applies to certifying signatures and timestamp tokens.
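The byte-range point can be illustrated with a toy Python example. It is a sketch only: real verification checks a CMS signature stored in the /Contents entry, not a bare hash, and the sample bytes and the function name signed_digest are invented for the illustration. The idea it shows is that the signature's digest is computed over the spans listed in /ByteRange, so changing even one metadata byte inside those spans changes the digest.

```python
import hashlib
import re
from typing import Optional

_BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def signed_digest(pdf_bytes: bytes) -> Optional[str]:
    """SHA-256 over the two spans named by the first /ByteRange, or None."""
    match = _BYTE_RANGE.search(pdf_bytes)
    if match is None:
        return None
    start1, len1, start2, len2 = (int(group) for group in match.groups())
    covered = (pdf_bytes[start1:start1 + len1]
               + pdf_bytes[start2:start2 + len2])
    return hashlib.sha256(covered).hexdigest()


# Toy bytes, not a valid signed PDF: the first span (offsets 0-19)
# happens to include the Author value.
toy = b"%PDF /Author (jdoe) /ByteRange [0 20 20 10] /Contents <00> %%EOF"

# Blank the author in place, keeping the file length identical.
stripped = toy.replace(b"jdoe", b"    ")

print(signed_digest(toy) == signed_digest(stripped))
```

The comparison prints False: the covered bytes differ, so any verifier recomputing the digest would reject the stored signature.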

Does this remove metadata from embedded fonts and images?

Best effort. The tool removes the document-level Info dictionary and the XMP stream attached to the document catalog. Per-object metadata streams attached to individual page objects, embedded fonts, or embedded images are not walked by pdf-lib and may persist. For most leak scenarios this does not matter, because auditors and recipients run exiftool against the document, not against every embedded object. If you have a hostile counterparty with forensic tools, treat the output as best-effort and follow up with Acrobat Pro's Sanitize Document or a command-line pass with exiftool and qpdf.

Why does some XMP metadata resist removal in some PDFs?

Some PDF producers write the XMP stream as an indirect object referenced from multiple places in the document tree, or use file structures that pdf-lib does not fully traverse. In those cases the tool removes what it can cleanly delete and surfaces a warning rather than silently leaving metadata behind. The fallback is exiftool from the command line: 'exiftool -all= yourfile.pdf' clears every metadata tag it can identify, including locations the browser tool cannot reach. Because exiftool edits PDFs through a reversible incremental update, follow it with a qpdf rewrite ('qpdf yourfile.pdf rebuilt.pdf') so the old values are not left in the file. For 95% of typical office PDFs this is not a problem.

Will removing metadata change the file size or visual appearance?

Visual appearance is preserved exactly — page content streams, fonts, images, layout, and form fields are not modified. File size shrinks slightly because the Info dict entries and the XMP XML are removed, but the savings are usually 1-10 KB. Page content streams are not recompressed, so on a 5 MB scanned PDF the percentage change is negligible. If you also need to shrink the file before sharing, run /compress after removing metadata.

How do I verify the metadata is actually gone?

Run exiftool against the cleaned file. The command 'exiftool -all -s yourfile.pdf' lists every metadata tag exiftool can find — after a successful strip, the Info dictionary fields should be empty and the XMP-specific tags should be absent. Alternatively 'pdfinfo yourfile.pdf' from the poppler toolkit shows the document properties; Title, Author, Subject, and Keywords should be blank. Both tools are free, both ship in standard Linux distributions, and both produce output your recipient can reproduce independently.

Is it safe to remove PDF metadata online?

Only if the tool runs locally in your browser. Most online metadata removers upload your PDF to a server, process it there, and send back the cleaned file — which means the original (with all the metadata you are trying to remove) sits on a third-party machine until they decide to delete it. The November 2025 jsonformatter.org incident showed how badly this can go: about 5GB of user-pasted secrets were exposed because users assumed a free tool was stateless when it was not. PDF Mavericks runs entirely in your browser tab, so the original PDF and the cleaned version both stay on your laptop.

What should I do if my PDF was already shared with metadata in it?

Three steps. First, generate a clean version with metadata stripped and send it as a replacement, ideally with a short note acknowledging the original carried identifying information. Second, if the metadata exposed a source whose identity is sensitive, treat the leak as live — assume the recipient may have already noted the Author field. Third, audit your PDF export workflow so the leak does not recur: most office software lets you set 'remove personal information from properties on save' as a default. The CV-with-personal-username case has been a documented leak vector for over twenty years.
