Remove PDF Author and Creation Date: Metadata Sanitization

How to remove the PDF author and creation date, plus GPS coordinates, software fingerprints, and revision history, before sharing. Browser-local sanitization for OPSEC, legal redaction, and source protection.

PDF Mavericks

What gets stored in PDF metadata

Every PDF file has a content layer and a metadata layer. The content is what the reader displays: pages, text, images. The metadata is what the file says about itself: who wrote it, when, with what software, and sometimes on what computer and at what coordinates. The metadata is invisible during normal reading but trivial to extract with any inspector, such as Adobe Acrobat's Document Properties dialog, the open-source ExifTool, or a basic hex viewer.

For most documents this metadata is harmless. For a subset of documents — anything shared outside a small trust boundary — the metadata is a leak vector. A PDF forwarded to opposing counsel that retains the drafting attorney's name. A whistleblower document posted publicly that retains the author's username from the source machine. An anonymous research publication that retains a phone photo with embedded GPS coordinates from the author's home. Each of these has happened, and each is preventable with a basic metadata strip before sharing.

The six categories of metadata that a typical PDF carries are: Info dictionary fields (author, dates, software), XMP packet (the same fields in XML form plus Adobe extensions), embedded image EXIF (GPS, camera, timestamp), embedded font subset names, document properties (sometimes including save paths), and revision history for incrementally-edited PDFs. Each needs handling separately; clearing one without the others leaves obvious holes.

Real metadata-leak incidents

Metadata leaks are a documented category of OPSEC failure. Three cases illustrate the pattern.

The 2003 Iraq Dossier. The UK government published a dossier titled "Iraq: Its Infrastructure of Concealment, Deception and Intimidation". The file was released as a Microsoft Word document, and its revision log named the civil servants who had edited it. Separately, large parts of the text turned out to have been copied from a graduate student's published research. The Guardian, the BBC, and the literature on document forensics cite this as the modern textbook case for metadata sanitization. Plenty of secondary references exist; one accessible starting point is the Wikipedia summary of the Dodgy Dossier incident, which cites the original news reporting.

McAfee's 2013 target identification. McAfee published a research PDF identifying a target of a security operation. The PDF retained the Author field naming the McAfee employee who wrote it. This is the canonical example used in digital-forensics training to motivate metadata stripping in adversarial publications, and the incident was widely covered in the security press at the time.

EFF's printer tracking dots research. Separately, the Electronic Frontier Foundation has documented at eff.org/issues/printers that colour laser printers embed nearly-invisible yellow dots in printed pages encoding the printer serial number and date. When those pages are scanned back to PDF, the tracking dots survive and provide an extra metadata-like trail beyond anything in the file structure. This is not a PDF metadata field, but it is part of the same OPSEC consideration: stripping file-level metadata is necessary but does not address physical-printer trails.

The pattern across these cases: the document content was reviewed before release, the metadata was not, and the metadata produced the story.

The two metadata layers: Info and XMP

PDFs carry metadata in two parallel formats inside the same file. Both need to be cleared.

The Info dictionary. This is a top-level PDF dictionary referenced from the trailer. The standard keys are /Author, /CreationDate, /ModDate, /Title, /Subject, /Keywords, /Producer (the software that wrote the file), and /Creator (the source application like Word or InDesign). The PDF specification ISO 32000-1 §14.3.3 defines the Info dictionary. Clearing it means either removing the dictionary entry from the trailer or setting each field to an empty string.
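To make the Info dictionary concrete, here is a minimal Python sketch of what a byte-level inspector might look for. It assumes the dictionary is stored uncompressed and uses literal (parenthesized) strings without nested parentheses; files that use hex strings, object streams, or encryption need a real PDF parser. The helper name find_info_fields and the sample bytes are illustrative, not part of any tool described here.

```python
import re

# Standard Info dictionary keys from ISO 32000-1 §14.3.3.
INFO_KEYS = ("Author", "CreationDate", "ModDate", "Title",
             "Subject", "Keywords", "Producer", "Creator")

# Matches "/Key (literal string)", tolerating backslash-escaped characters.
# Caveat: keys such as /Title also occur in other dictionaries (outlines),
# so a hit is a hint to investigate, not proof of an Info entry.
_FIELD = re.compile(
    rb"/(" + "|".join(INFO_KEYS).encode() + rb")\s*\(((?:\\.|[^\\()])*)\)"
)

def find_info_fields(pdf_bytes: bytes) -> dict:
    """Return Info-style key/value pairs that are visible as plain bytes."""
    found = {}
    for key, value in _FIELD.findall(pdf_bytes):
        found[key.decode()] = value.decode("latin-1")
    return found

# Hypothetical fragment of an unsanitized file.
sample_info = (b"1 0 obj\n<< /Author (J. Smith) /Creator (Microsoft Word) "
               b"/CreationDate (D:20240102030405Z) >>\nendobj\n"
               b"trailer\n<< /Info 1 0 R >>\n")
print(find_info_fields(sample_info))
```

An empty result from a scan like this does not prove the file is clean, because the fields may simply be stored in a form the pattern cannot see.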

The XMP packet. This is an XML metadata block, typically attached as a stream object in the file. XMP — Extensible Metadata Platform, an Adobe standard adopted as ISO 16684-1 — carries the same fields as the Info dictionary plus extended Dublin Core (dc:creator, dc:title, dc:description), Adobe-specific fields (xmp:CreatorTool, xmp:CreateDate, xmp:ModifyDate), and any custom schemas the source application added. Clearing the XMP packet means rewriting the XML to an empty or sanitized template. The XMP spec is at adobe.com/devnet/xmp.html.
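The XMP layer can be inspected in a similar way. The sketch below assumes the packet is stored uncompressed, so the x:xmpmeta element can be located by a plain byte search, and it reads only a handful of common properties. XMP allows a property to appear either as a child element or as an attribute on rdf:Description, so both forms are checked. The function names and the sample packet are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}
SIMPLE = (("xmp", "CreateDate"), ("xmp", "ModifyDate"),
          ("xmp", "CreatorTool"), ("pdf", "Producer"))

def extract_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the first <x:xmpmeta> element found as plain bytes, if any."""
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", data, re.DOTALL)
    return match.group(0) if match else None

def xmp_fields(packet: bytes) -> dict:
    """Collect a few identifying properties from an XMP packet."""
    root = ET.fromstring(packet)
    found = {}
    rdf_li = "{%s}li" % NS["rdf"]
    # dc:creator is an ordered list (rdf:Seq) of names.
    for creator in root.iter("{%s}creator" % NS["dc"]):
        names = [li.text for li in creator.iter(rdf_li) if li.text]
        if names:
            found["dc:creator"] = ", ".join(names)
    # Simple properties may be attributes or child elements.
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        for prefix, name in SIMPLE:
            qname = "{%s}%s" % (NS[prefix], name)
            child = desc.find(qname)
            value = desc.get(qname) or (child.text if child is not None else None)
            if value:
                found[prefix + ":" + name] = value
    return found

sample_xmp = b"""<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/'>
 <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <rdf:Description rdf:about=''
    xmlns:dc='http://purl.org/dc/elements/1.1/'
    xmlns:xmp='http://ns.adobe.com/xap/1.0/'
    xmp:CreatorTool='Microsoft Word'>
   <dc:creator><rdf:Seq><rdf:li>J. Smith</rdf:li></rdf:Seq></dc:creator>
   <xmp:CreateDate>2024-01-02T03:04:05Z</xmp:CreateDate>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>"""
print(xmp_fields(extract_xmp_packet(sample_xmp)))
```

Custom schemas added by the source application would not be reported by this sketch, which is one reason a sanitizer replaces the whole packet instead of blanking individual properties.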

The most common mistake — and the source of multiple real incidents — is to clear only one of the two layers. A user clears the Info dictionary using Acrobat's Document Properties dialog, ships the file, and the XMP packet still contains the original author and creation date. The pdfmavericks.com /remove-pdf-metadata tool clears both layers in a single operation and verifies the output with a built-in ExifTool-equivalent inspector.
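The one-layer mistake is easy to test for once both layers have been read into dictionaries. The following sketch is an illustration only: the crosswalk pairs each Info key with the XMP property that usually mirrors it, and half_cleared is a hypothetical helper that reports fields populated in one layer but empty in the other.

```python
# Info key -> XMP property that conventionally mirrors it.
CROSSWALK = {
    "Author": "dc:creator",
    "Title": "dc:title",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}

def half_cleared(info: dict, xmp: dict) -> list:
    """List (info_key, xmp_key) pairs where exactly one layer still has a value."""
    mismatches = []
    for info_key, xmp_key in CROSSWALK.items():
        in_info = bool(info.get(info_key, "").strip())
        in_xmp = bool(xmp.get(xmp_key, "").strip())
        if in_info != in_xmp:
            mismatches.append((info_key, xmp_key))
    return mismatches

# Info dictionary blanked, XMP packet left untouched.
print(half_cleared({"Author": ""},
                   {"dc:creator": "J. Smith", "xmp:CreateDate": "2024-01-02"}))
```

A non-empty result means the file was only partly sanitized; an empty result means the two layers agree, which still has to be combined with a check that both are actually empty.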

Embedded image EXIF (the missed step)

PDFs that contain embedded photos carry the photos' metadata as well. When a phone camera takes a photo, the JPEG file written to disk includes EXIF data: GPS latitude/longitude (if location services were on), camera make and model, capture timestamp, and sometimes the device's serial number. The ExifTool documentation enumerates every EXIF tag and how to inspect them.

When that photo is embedded in a PDF (a scanned document, a screenshot, an ID card image), the EXIF can ride along inside the image stream, depending on how the producing software embedded the JPEG. Stripping the PDF-level metadata does nothing to the image-level EXIF. A forensic inspector opens the PDF, extracts the embedded JPEG, runs ExifTool, and gets the GPS coordinates of where the photo was taken, often the author's home or office.

The /remove-pdf-metadata tool offers a Strip image EXIF option that walks every embedded image, re-encodes it without the EXIF block, and writes the cleaned image back into the PDF. For documents made from phone photos, this step is essential. For documents that contain only text and vector graphics (no embedded raster images), the step is a no-op but harmless.
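For intuition about where EXIF lives inside a JPEG, here is a small standard-library sketch. It is not the tool's method: the tool is described as re-encoding the image, whereas this sketch takes the simpler route of dropping the APP1 segment that carries the Exif payload and copying every other segment unchanged. It assumes a well-formed baseline JPEG with no padding bytes between segments; strip_exif is a hypothetical name.

```python
import struct

def strip_exif(jpeg: bytes) -> bytes:
    """Copy a JPEG, omitting APP1 segments whose payload starts with 'Exif'."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of Scan: everything after this is compressed image data.
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]  # scan data and EOI, copied verbatim
    return bytes(out)

# Synthetic file: SOI, a JFIF APP0 segment, an Exif APP1 segment, then scan data.
app0_payload = b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
app0 = b"\xff\xe0" + struct.pack(">H", len(app0_payload) + 2) + app0_payload
exif_payload = b"Exif\x00\x00" + b"GPSDATA"
app1 = b"\xff\xe1" + struct.pack(">H", len(exif_payload) + 2) + exif_payload
scan = b"\xff\xda\x00\x02" + b"\x12\x34" + b"\xff\xd9"
sample_jpeg = b"\xff\xd8" + app0 + app1 + scan

cleaned_jpeg = strip_exif(sample_jpeg)
print(len(sample_jpeg), len(cleaned_jpeg))
```

Dropping the segment leaves the pixel data untouched, so image quality is unchanged. Other metadata carriers (XMP in a separate APP1 segment, vendor APPn blocks) are left in place by this sketch, which is one argument for the re-encode approach.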

Step-by-step sanitization

  1. Open the metadata tool. Navigate to pdfmavericks.com/remove-pdf-metadata. The page loads in the browser. No account, no upload.
  2. Drop the PDF. Drag the PDF into the drop zone. The tool reads the file into browser memory and runs a metadata inspection. The current metadata is displayed: author, creation date, modification date, producer, creator, plus any XMP custom fields, plus a count of embedded images with EXIF.
  3. Pick the strip mode. Default mode clears all standard fields. Advanced mode lets you keep specific fields (sometimes the title or subject is worth keeping for archival) while clearing the rest.
  4. Decide on image EXIF. Toggle Strip image EXIF on if the PDF contains photos. If the PDF is text-only, leave it off (no effect either way).
  5. Run the strip. Click Sanitize. The tool rewrites the Info dictionary to empty strings, clears the XMP packet, optionally re-encodes embedded images, and writes a fresh PDF. The operation takes 1-5 seconds for typical documents.
  6. Verify the output. The result panel shows a fresh metadata inspection on the cleaned PDF. All fields should be empty. Image EXIF count should be zero if the strip mode was on.
  7. Save the sanitized PDF. Click Save. The cleaned file is delivered through the browser's Save dialog with a default filename like document-sanitized.pdf.
  8. External verification. For high-stakes sanitization, verify independently with ExifTool: exiftool document-sanitized.pdf. The output should show empty or absent fields for all the categories you intended to clear.
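The external verification in step 8 can be scripted. The sketch below assumes the exiftool binary is installed and on PATH, and uses its -j option for JSON output. The tag list is a starting point to adapt, and the function names are hypothetical. It checks document-level tags only; embedded images may still need to be extracted and inspected on their own.

```python
import json
import subprocess

# Document-level tags worth checking after a strip (adjust to your needs).
SENSITIVE_TAGS = ("Author", "Creator", "Producer", "CreateDate", "ModifyDate",
                  "Title", "Subject", "Keywords", "GPSLatitude", "GPSLongitude")

def residual_tags(exiftool_json: str, tags=SENSITIVE_TAGS) -> dict:
    """Return the sensitive tags that still carry a non-empty value."""
    record = json.loads(exiftool_json)[0]  # one JSON object per input file
    return {t: record[t] for t in tags if str(record.get(t, "")).strip()}

def inspect_file(path: str) -> dict:
    """Run exiftool on a file and report what is left (needs exiftool installed)."""
    result = subprocess.run(["exiftool", "-j", path],
                            capture_output=True, text=True, check=True)
    return residual_tags(result.stdout)

# Offline example using captured output instead of a live exiftool call.
sample_report = '[{"SourceFile": "document-sanitized.pdf", "Author": "", "Producer": "pdf-lib", "PageCount": 3}]'
print(residual_tags(sample_report))
```

Keeping the parsing separate from the subprocess call makes the check easy to run in a release pipeline and easy to test without the binary present.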

OPSEC scope: when strip is not enough

Metadata stripping handles the file-level signals. For full OPSEC — anonymous whistleblowing, source protection, adversarial publication — additional steps are often needed because there are signals in the document content itself that survive metadata stripping.

Typography. Rare or licensed fonts narrow down the source system. A document set in a corporate font that only one organization licenses identifies that organization. The mitigation is to use widely-available fonts (default system fonts, common Google Fonts) or to re-render the document as flat images.
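A quick way to see which fonts a file advertises is to scan for font name entries. The sketch below is a rough heuristic, assuming font dictionaries are stored uncompressed; it strips the six-letter subset prefix (for example ABCDEF+) that producers add to subset fonts. embedded_font_names and the sample bytes are hypothetical.

```python
import re

# A subset font is named with six uppercase letters, "+", then the font name.
_FONT_NAME = re.compile(
    rb"/(?:BaseFont|FontName)\s*/([A-Z]{6}\+)?([^\s/<>\[\]()]+)"
)

def embedded_font_names(pdf_bytes: bytes) -> set:
    """Return font names visible as plain bytes, without subset prefixes."""
    return {name.decode("latin-1")
            for _prefix, name in _FONT_NAME.findall(pdf_bytes)}

sample_fonts = (b"<< /Type /Font /BaseFont /ABCDEF+AcmeCorpSans-Bold >>\n"
                b"<< /Type /Font /BaseFont /Helvetica >>\n")
print(sorted(embedded_font_names(sample_fonts)))
```

Any name in the result that is not a common system or open font is worth a second look before release.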

Stylometric analysis. Author identification based on writing style — sentence length distribution, function-word frequencies, distinctive phrasings — is a documented forensic capability. The mitigation is to edit the document stylistically before release, often by a different person or with the help of an anonymizing rewrite. The SecureDrop project publishes guidance on stylometric-aware document handling for whistleblowers.

Printer tracking dots. Colour laser printers embed near-invisible yellow dots encoding the printer's serial number and the print date. If the sanitization workflow involves printing and re-scanning, the tracking dots survive. Document this risk in the workflow and use a printer model known not to embed dots, or use black-and-white printing if the dots are colour-only.

Re-render to images. The strongest sanitization for adversarial publication is to render every page of the cleaned PDF as a flat PNG image (effectively a screenshot of each page), then assemble those images into a fresh PDF. This destroys any embedded structure that survived the metadata strip: interactive form fields, JavaScript, layered content, and any incremental update history. The result is bigger and not searchable, but it is the closest a PDF gets to plain pictures.
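Before deciding whether a re-render is needed, a coarse byte scan can hint at leftover structure. This sketch is an assumption-heavy heuristic, not a parser: it counts end-of-file markers (each incremental update appends another one, although linearized files can also contain two) and looks for the names that introduce scripts, form fields, and optional-content layers. Compressed object streams can hide all of these, so a clean result is not a guarantee. The function name is hypothetical.

```python
def residual_structure(pdf_bytes: bytes) -> dict:
    """Coarse hints about structure that can survive a metadata strip."""
    return {
        # More than one %%EOF is a hint (not proof) of incremental updates.
        "eof_markers": pdf_bytes.count(b"%%EOF"),
        "has_javascript": b"/JavaScript" in pdf_bytes,
        "has_form_fields": b"/AcroForm" in pdf_bytes,
        "has_layers": b"/OCProperties" in pdf_bytes,
    }

sample_structure = (b"%PDF-1.7\n1 0 obj << /AcroForm 5 0 R >> endobj\n"
                    b"%%EOF\n"
                    b"7 0 obj << /Author (later edit) >> endobj\n"
                    b"%%EOF\n")
print(residual_structure(sample_structure))
```

If any of these hints fire on a document meant for adversarial publication, the image re-render described above is the conservative choice.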

Compliance framework alignment

Several formal frameworks require or recommend metadata sanitization for externally-shared documents.

NIST SP 800-88r1. The U.S. National Institute of Standards and Technology's Guidelines for Media Sanitization, published at nvlpubs.nist.gov, is written for storage media rather than document files. It is relevant here as the general sanitization reference for data leaving an organization's control boundary; it does not prescribe PDF metadata handling specifically.

EU GDPR. Articles 25 (data protection by design and by default) and 32 (security of processing) implicate metadata handling when documents containing personal data are shared externally. If the metadata contains personal data (author names, machine identifiers tied to individuals), GDPR's data-minimization principle (Article 5(1)(c)) supports stripping before release.

India's DPDP Act 2023. Section 8(5) requires reasonable security safeguards for personal data, which can extend to metadata stripping when documents containing personal data are shared with third parties. For Indian organizations under DPDP, automated sanitization as part of the document-release workflow is a typical control.

U.S. DoD declassification guidance. DoD 5220.22-M (the National Industrial Security Program Operating Manual, since replaced by 32 CFR Part 117) and related guidance address sanitization in release workflows, including the expectation that derivative classification markings and other internal-use metadata be removed before public release.

Why it runs in your browser

The whole point of metadata sanitization is to limit who sees the document. A sanitization tool that requires uploading the document to a third-party server defeats its own purpose — now the user has handed the original (with metadata intact) to a vendor whose data-handling policies are outside the user's control. For whistleblowing, OPSEC, and high-stakes legal work specifically, this failure mode is not acceptable.

The pdfmavericks.com /remove-pdf-metadata tool runs entirely in the browser using pdf-lib for PDF structure manipulation and a WebAssembly build of standard EXIF-strip libraries for image cleanup. The file bytes never leave the browser tab. You can verify this in DevTools Network tab: open it, enable Preserve log, run a strip, and confirm no POST or PUT request contains the file data.
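The DevTools check can also be done after the fact by exporting the Network log as a HAR file and scanning it. The sketch below assumes the standard HAR layout (log, entries, request with method, url, and bodySize) and flags any POST, PUT, or PATCH request whose body is at least as large as a chosen threshold, or whose size is unknown. The threshold, URLs, and function name are illustrative assumptions.

```python
import json

def upload_candidates(har_text: str, min_bytes: int = 1024) -> list:
    """Return (method, url, bodySize) for requests that could carry a file."""
    entries = json.loads(har_text)["log"]["entries"]
    flagged = []
    for entry in entries:
        request = entry["request"]
        size = request.get("bodySize", -1)  # -1 means the size was not recorded
        if request["method"] in ("POST", "PUT", "PATCH") and (size < 0 or size >= min_bytes):
            flagged.append((request["method"], request["url"], size))
    return flagged

# Hypothetical export: a page load, a small analytics beacon, and a large upload.
sample_har = json.dumps({"log": {"entries": [
    {"request": {"method": "GET", "url": "https://example.test/tool", "bodySize": 0}},
    {"request": {"method": "POST", "url": "https://example.test/beacon", "bodySize": 120}},
    {"request": {"method": "POST", "url": "https://example.test/upload", "bodySize": 250000}},
]}})
print(upload_candidates(sample_har))
```

Setting min_bytes a little below the size of the test PDF keeps small telemetry requests out of the result while still catching anything large enough to contain the file.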

For broader context on browser-local tools and verification, see the no-upload PDF tool overview and the GDPR-compliant redaction guide for the closely-related redaction problem.

Frequently asked questions

What metadata does a typical PDF contain that I should remove before sharing?

A typical PDF carries six categories of embedded metadata. The Info dictionary holds author name, creation date, modification date, title, subject, keywords, producer (the software that wrote the PDF), and creator (the source application). The XMP packet, a metadata stream referenced from the document catalog, holds the same fields in XML form plus extended Adobe fields. Embedded images carry their own EXIF: GPS coordinates if the photo came from a phone camera, the camera serial number, and the camera software version. Embedded fonts can carry subset names that reveal the original font and sometimes user information. Document properties may include the path the file was saved from. Revision history in PDFs that have been edited incrementally can contain prior versions of the content. The ExifTool documentation at exiftool.org enumerates every metadata field a PDF can carry.

How do I remove the PDF author and creation date specifically?

Two layers need attention because PDF metadata is duplicated in two formats inside the same file. First, the Info dictionary: a top-level PDF dictionary with keys /Author, /CreationDate, /ModDate, /Title, /Subject, /Keywords, /Producer, and /Creator. Setting these to empty strings (or removing them entirely from the dictionary) clears the legacy metadata. Second, the XMP packet: an XML metadata block that may contain xmp:CreateDate, xmp:ModifyDate, dc:creator, dc:title, and pdf:Producer. Clearing only the Info dictionary leaves the XMP packet intact, which is the most common mistake people make. The pdfmavericks.com remove-pdf-metadata tool at /remove-pdf-metadata clears both layers simultaneously and verifies that the output PDF returns empty values when queried by ExifTool. The spec reference is ISO 32000-1 §14.3 (Metadata), with §14.3.2 covering metadata streams and §14.3.3 covering the document information dictionary.

Why is PDF metadata a real privacy risk, not a theoretical one?

Three well-documented incidents make the case. In 2003, the UK government's Iraq dossier, published under Tony Blair, was found to contain Microsoft Word metadata revealing that the document had been edited by named civil servants; the Guardian and BBC covered this at the time, and the incident is documented in the academic literature on metadata leaks. In 2013, McAfee published a report on a target's identity and the PDF retained the author field naming the McAfee employee, a case covered in the digital-forensics community as a textbook example of OPSEC failure. The 2013 New York Times reporting on the Steubenville rape case noted that defense documents released as PDFs contained metadata revealing the law firm's internal author names. Each of these is a case where the document content was reviewed but the metadata was not, and the metadata produced a story.

Is the metadata strip enough to anonymize a document for whistleblowing?

It is necessary but not sufficient. Stripping the Info dictionary and XMP packet removes the obvious fields: author, creation date, and software fingerprint. What remains are subtler signals: the typography (rare fonts narrow down to specific systems), the writing style (stylometric analysis can fingerprint authors), the page geometry and printer-specific artifacts (yellow-dot printer tracking codes documented at eff.org/issues/printers), and any embedded images that were not also stripped of EXIF, which can still carry GPS coordinates. For high-stakes anonymous publication, such as leaking to a journalist or posting to a public forum, additional steps are needed: re-render the document as flat images, then re-create the PDF from those images so any embedded structure (fonts, layers, scripts) is destroyed. The SecureDrop project at securedrop.org publishes the operational guidance for whistleblower-grade document handling.

Does the online PDF metadata removal tool upload my file to a server?

No. The pdfmavericks.com /remove-pdf-metadata tool runs entirely in the browser. The file is read via the File API into a Uint8Array, the metadata is rewritten in memory using pdf-lib (documented at pdf-lib.js.org), and the cleaned PDF is delivered to the user's disk through the browser's Save dialog. There is no upload to pdfmavericks. You can verify this in DevTools Network tab by enabling Preserve log and running a metadata strip — there will be no POST or PUT request containing the file bytes. For documents the user is sanitizing precisely because they don't want third parties to see them (leaked drafts, whistleblower materials, redacted-but-not-flattened legal filings), uploading to any third-party server defeats the purpose.

What about images embedded inside the PDF — do they have metadata too?

Yes, and this is the most commonly missed step. Photos taken on a phone carry EXIF data: GPS latitude/longitude, camera make and model, capture timestamp, and sometimes the device's serial number. When those photos are embedded in a PDF, the EXIF can ride along inside the image stream. Stripping PDF-level metadata does not touch image-level EXIF. The /remove-pdf-metadata tool offers an optional Strip image EXIF mode that walks every embedded image, re-encodes it without the EXIF block, and writes the cleaned image back into the PDF. For documents containing phone photos (scanned documents, screenshots with location data, ID card scans), this step matters. ExifTool documentation at exiftool.org shows how to inspect embedded image metadata to verify the strip worked.

Will stripping metadata change how the PDF looks when opened?

No. Metadata is bookkeeping data about the file; it has no effect on rendering. The pages display identically before and after. The visible content — text, images, page layout, fonts — is untouched by the metadata strip. What changes is what a forensic tool like ExifTool or Adobe Acrobat's Document Properties dialog reports about the file. Before: full author/date/software fingerprint. After: empty fields. For workflows where the PDF will be visually reviewed (printing, sending to a counterparty, posting), the strip is invisible. For workflows where the PDF will be forensically examined (litigation discovery, security audit, OSINT analysis), the strip removes the trail.

What other PDF privacy steps should pair with the metadata strip?

Four pair naturally. First, flatten form fields: if the PDF has interactive form fields, their default values may carry user-entered data from prior fills, and flattening converts them to static content. The flatten-pdf tool at pdfmavericks.com/flatten-pdf handles this. Second, redact properly: if there are visual blocks over text that the user thinks are redactions, those blocks may be removable in Acrobat to reveal the text underneath. The redact-pdf tool replaces the actual text with blocks rather than just covering it. Third, remove annotations and comments, which often carry author identities. Fourth, re-scan if the source is uncertain: for highest stakes, print the cleaned PDF, scan it back to a fresh PDF, and ship the scan. The scan destroys any structural metadata that survived the strip, but a colour laser print can add printer tracking dots, so this step needs a printer that does not embed them.

Are there compliance frameworks that require PDF metadata sanitization?

Several, in varying degrees of specificity. NIST Special Publication 800-88 (Guidelines for Media Sanitization), published at nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-88r1.pdf, is the general sanitization reference for data released outside an organization, though it is written for storage media rather than document metadata. The U.S. Department of Defense DoD 5220.22-M handling guidance addresses sanitization in release workflows. EU GDPR Article 25 (data protection by design) and Article 32 (security of processing) implicate metadata handling when documents containing personal data are shared externally. India's DPDP Act §8(5) similarly requires reasonable security safeguards for personal-data handling. For organizations under any of these frameworks, automated metadata sanitization as part of the document-release workflow is a common control.

What if I just want a quick anonymized PDF without learning the details?

The default mode of the /remove-pdf-metadata tool does the right thing without configuration: it clears the Info dictionary, clears the XMP packet, strips image EXIF, removes the producer and creator fields, and rewrites the file with a fresh xref. The output PDF has no identifying metadata when inspected by ExifTool or any equivalent tool. For users who don't need the configurability — most users, most of the time — drop the PDF, click Strip Metadata, save the output. The advanced mode exposes per-field controls for users who want to keep some fields (e.g., the title for archival purposes) while clearing others. Both modes run in the browser without upload.
