Remove Sensitive Info From a PDF Before Sharing
A black box is not redaction. The text stays underneath. Here is how to truly remove sensitive info from a PDF before sharing — in your browser, with no upload.
You need to share a PDF, but one line is a salary figure, an account number, or a name that cannot go out. The instinct is to draw a black rectangle over it and send the file. That instinct leaks data. To remove sensitive info from a PDF safely, you have to delete the content, not cover it — and then clear the hidden data the page never showed you in the first place.
This guide explains the exact mechanism behind the most common redaction failures, the three layers where confidential data hides, and a browser-local process that leaves nothing recoverable.
Why a black box is not redaction
A PDF page is not a picture. It is a set of instructions: place this text here, draw this line there, paint this rectangle on top. When you draw a black box over a Social Security number, you add one more instruction — paint a black rectangle — on top of the instruction that still says “draw the digits 123-45-6789.” The digits are untouched. Anyone can select the area, copy it, and paste the hidden text into a notepad. Delete the rectangle in an editor and the number reappears.
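To make the "stack of instructions" idea concrete, here is a minimal sketch in Python. It assumes a toy, uncompressed content stream; real files usually compress their streams and may encode text differently. The sample stream and the `shown_text` helper are illustrative and not taken from any particular tool. `Tj` is the PDF operator that shows a text string, and `re` followed by `f` draws and fills a rectangle.

```python
import re

# Toy, uncompressed content stream with one text-showing instruction.
PAGE = b"BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"

# Overlay "redaction": set the fill colour to black, then draw a filled rectangle.
BLACK_BOX = b"0 0 0 rg 70 695 160 16 re f\n"


def shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator.

    Simplified: ignores escaped parentheses, hex strings, and TJ arrays.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\(([^)]*)\)\s*Tj", stream)]


covered = PAGE + BLACK_BOX  # the box is appended; nothing is removed

print(shown_text(PAGE))     # ['SSN: 123-45-6789']
print(shown_text(covered))  # ['SSN: 123-45-6789']
```

The rectangle only adds bytes to the stream. The text instruction in front of it is byte-for-byte identical before and after.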
This is not theoretical. In January 2019, a court filing in the Paul Manafort case leaked redacted passages because the black bars were cosmetic — readers copied the text straight out from underneath. The same mistake has exposed names in unsealed depositions and figures in government reports for years. The US National Security Agency published a guide, “Redacting with Confidence,” precisely because covering text in a PDF viewer does not remove it.
The lesson is blunt: if the original characters are still in the file, the file is not redacted. It only looks redacted on screen.
You can prove this to yourself in ten seconds. Open any PDF where someone drew a black box over text, click just before the box, and drag your cursor across it as if selecting a sentence. If the hidden words highlight, they are still there, and if they highlight for you, they highlight for whoever receives the file. Now paste into a plain text field and the supposedly removed content appears. That ten-second test is the difference between a document that looks safe and one that is safe.
The three things that leak
Sensitive data hides in three layers of a PDF, and a black box addresses none of them.
- Covered text and images. The content stream still holds every character and picture you painted over. This is the copy-paste leak above.
- Metadata. Most PDFs carry a document information dictionary, an XMP metadata stream, or both: author name, the software that produced the file, the title, and creation and modification timestamps. You can perfectly redact the body and still broadcast who wrote the file, with which software, and at what time.
- Hidden structure. Form fields keep their stored values even when they look blank. Annotations, comments, layers, and incrementally saved revisions can retain earlier text. A “saved over” edit is not always gone.
Removing sensitive info means closing all three: delete the visible content for real, flatten the structure so fields and layers collapse into static page content, and strip the metadata.
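As a rough illustration of the second and third layers, the sketch below scans raw PDF bytes for common metadata keys and structural markers. It is a heuristic under stated assumptions: it only sees uncompressed dictionaries with plain parenthesized strings, and a count of two `%%EOF` markers can also come from a linearized file rather than an incremental save. The function name, the sample bytes, and the "Hypothetical Writer" producer string are invented for the example.

```python
import re

INFO_KEYS = rb"Author|Creator|Producer|Title|Subject|Keywords|CreationDate|ModDate"


def scan_hidden_layers(data: bytes) -> dict:
    """Heuristic byte scan; misses anything inside compressed object streams."""
    info = {
        key.decode(): value.decode("latin-1")
        for key, value in re.findall(rb"/(" + INFO_KEYS + rb")\s*\(([^)]*)\)", data)
    }
    return {
        "info": info,                          # document information entries found
        "has_xmp": b"<x:xmpmeta" in data,      # XMP metadata packet present
        "has_form": b"/AcroForm" in data,      # interactive form dictionary present
        "has_annotations": b"/Annots" in data, # annotation arrays present
        "revisions": data.count(b"%%EOF"),     # more than 1 may mean incremental saves
    }


# Hypothetical sample: two saved revisions, an author, and a form.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Author (J. Doe) /Producer (Hypothetical Writer 1.0) >>\nendobj\n"
    b"2 0 obj\n<< /AcroForm 3 0 R >>\nendobj\n"
    b"%%EOF\n"
    b"4 0 obj\n<< /ModDate (D:20240101120000Z) >>\nendobj\n"
    b"%%EOF\n"
)

report = scan_hidden_layers(sample)
print(report["info"]["Author"], report["revisions"], report["has_form"])
```

A clean report from a scan like this is a hint, not proof. It is best used to catch obvious leftovers before sharing.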
First, know which kind of PDF you have
The right method depends entirely on one question: does your PDF have a real text layer, or is it a picture of a page? The answer decides whether a black box is enough.
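- Scanned or image-only PDF (a photo or scan of a document; you cannot select any text with your cursor). There is no text layer underneath, so a black box that is baked into the page, overwriting the pixels rather than floating above them as a separate object, is genuine redaction. There is nothing in the file to copy back out. Bank statements, Aadhaar copies, and ID scans are very often this type, which is good news: covering them works. One caveat: a scan that has been through OCR carries an invisible text layer, so if any text selects, treat it as a text PDF.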
- Digitally-generated text PDF (text is selectable). The characters live in the page’s content stream. A black box sits on top of them and the text stays recoverable. For this type, visual coverage alone is not safe.
To check: open the PDF and try to select a line of text with your mouse. If it highlights, you have a text PDF and must account for the text layer. If nothing selects, it is an image and coverage is enough.
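The same check can be approximated in code. The sketch below is a heuristic, not a parser: it assumes streams are either uncompressed or Flate-compressed, and it simply looks for a text block (`BT`) followed by a show-text operator (`Tj` or `TJ`). Binary data such as JPEG images could in principle trigger a false positive, and other stream filters are not handled. `looks_like_text_pdf` and `make_pdf` are names made up for this example.

```python
import re
import zlib

STREAM = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)
TEXT_OP = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.DOTALL)


def inflate(body: bytes) -> bytes:
    """Return Flate-decoded bytes, or the body unchanged if it is not zlib data."""
    try:
        return zlib.decompressobj().decompress(body)
    except zlib.error:
        return body


def looks_like_text_pdf(data: bytes) -> bool:
    """True if any stream appears to contain text-showing operators."""
    return any(TEXT_OP.search(inflate(m.group(1))) for m in STREAM.finditer(data))


def make_pdf(content: bytes) -> bytes:
    """Build a minimal PDF-like byte string with one Flate stream (demo only)."""
    return (b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
            + zlib.compress(content) + b"\nendstream\nendobj\n%%EOF\n")


text_pdf = make_pdf(b"BT /F1 12 Tf 72 700 Td (hello) Tj ET")
scan_pdf = make_pdf(b"q 612 0 0 792 0 0 cm /Im0 Do Q")  # only paints an image

print(looks_like_text_pdf(text_pdf))  # True
print(looks_like_text_pdf(scan_pdf))  # False
```

When the heuristic and the manual selection test disagree, the cautious choice is to treat the file as a text PDF.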
How to remove sensitive info, layer by layer
Confidential data hides in three layers, and each needs its own step, followed by a final verification pass. Everything below runs in your browser, so the original never travels to a server.
- Cover the sensitive regions. Open the Redact PDF tool, mark the areas that must go, and apply. It bakes black rectangles into the page content so they survive printing, screenshots, and every viewer. On a scanned PDF this is true redaction — there is no text layer to recover. On a text PDF this is visual coverage: the box is permanent on screen, but the underlying characters remain in the stream. For legal-grade redaction of a text PDF (court filings, leak-resistant work), use desktop software that performs content-stream removal, or first export the page to an image to drop the text layer, then cover.
- Remove stored field values. If the document is a filled form, run it through Flatten PDF. It burns AcroForm fields, signatures, and annotations into static page graphics, which removes the stored values a counterparty could otherwise read out of a field that merely looks blank. (Flattening collapses form data; it does not rasterize body text, so it is not a substitute for the covering step above on a text PDF.)
- Strip the metadata. Use Remove PDF Metadata to clear the author, title, producer software, and timestamps held in the document info block and XMP stream. This is the step most people forget, and it is how a leaked document gets traced back to a person and the software they used.
- Re-open and verify. Open the finished file fresh. Try to copy text from the covered areas (on a text PDF, this is your proof of whether the content is truly gone), check the document properties for leftover metadata, and click where form fields used to be. Trust the file only when these come up empty.
Order matters. Cover first, then flatten form data, then strip metadata last so no pass re-stamps producer information. The sequence takes under two minutes for a typical document.
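The verification step can be partly automated. The sketch below searches the raw file and every Flate-compressed stream for strings you know should be gone. It is a one-sided test under stated assumptions: a hit proves the value survived, but an empty result does not prove removal, because PDF text can be hex-encoded, split across several operators, or mapped through a custom font encoding. `find_leaks`, `wrap`, and the sample account number are hypothetical.

```python
import re
import zlib

STREAM = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def decoded_views(data: bytes):
    """Yield the raw file bytes, then each stream body inflated where possible."""
    yield data
    for m in STREAM.finditer(data):
        try:
            yield zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG); skipped in this sketch


def find_leaks(data: bytes, secrets: list[str]) -> list[str]:
    """Return the secrets that still appear somewhere in the file."""
    views = list(decoded_views(data))
    return [s for s in secrets if any(s.encode("latin-1") in v for v in views)]


def wrap(content: bytes) -> bytes:
    """Build a minimal PDF-like byte string with one Flate stream (demo only)."""
    return (b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
            + zlib.compress(content) + b"\nendstream\nendobj\n%%EOF\n")


covered = wrap(b"BT (Acct 12345678) Tj ET 0 0 0 rg 70 695 160 16 re f")
removed = wrap(b"0 0 0 rg 70 695 160 16 re f")

print(find_leaks(covered, ["12345678"]))  # ['12345678']
print(find_leaks(removed, ["12345678"]))  # []
```

Treat this as a complement to the manual checks, not a replacement for them.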
Where redaction quietly fails
The same mistake shows up across the documents people share most. Knowing the failure mode for each makes it easy to avoid.
- Bank statements. People highlight the balance and account number with a marker tool or a colored box, thinking it hides the value. The number is still selectable underneath. Worse, the statement’s metadata often names the bank’s statement-generation software and the exact issue date. Cover the figures (a scanned statement has no text layer, so coverage is real redaction; for a downloaded text statement, drop the text layer first), then strip metadata.
- Resumes and CVs. A resume saved from a word processor carries the author name and template source in its metadata. Candidates redact a previous employer’s confidential line but ship a file that names them in the document properties even when the visible header is anonymized.
- Signed contracts. Contracts are frequently filled as PDF forms. Blanking a field on screen does not clear its stored value, so a counterparty can read the “removed” figure by inspecting the form data. Flattening collapses the fields into fixed content and removes the stored values.
- Screenshots embedded in a PDF. A screenshot pasted into a document is an image. Drawing a box over part of it leaves the full image in the file; the covered pixels are recoverable. True redaction has to remove or re-rasterize the image region, not overlay it.
The thread connecting all four is the difference between what a viewer renders and what the file contains. A viewer paints the black box last and you see a clean page. The file still carries everything below it. That gap is the entire problem, and deleting content rather than layering over it is the entire fix.
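The gap between rendering and contents can be shown with a toy image. In this sketch a grayscale image is a list of pixel rows. An overlay stores the box next to the untouched image, while burning in overwrites the pixel values themselves. The data layout and the `burn_in` helper are illustrative assumptions, not how any specific PDF tool stores images.

```python
Box = tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def burn_in(pixels: list[list[int]], box: Box) -> list[list[int]]:
    """Return a copy of the image with the boxed pixels overwritten by black (0)."""
    left, top, right, bottom = box
    return [
        [0 if top <= y < bottom and left <= x < right else value
         for x, value in enumerate(row)]
        for y, row in enumerate(pixels)
    ]


image = [
    [9, 9, 9, 9],
    [9, 7, 3, 9],  # the values 7 and 3 stand in for the sensitive region
    [9, 9, 9, 9],
]
box = (1, 1, 3, 2)

overlay = {"image": image, "boxes": [box]}  # overlay: original pixels still stored
flattened = burn_in(image, box)             # burn-in: covered values are gone

print(overlay["image"][1])  # [9, 7, 3, 9]
print(flattened[1])         # [9, 0, 0, 9]
```

Both versions look identical once the box is painted, but only the second one has nothing left to recover.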
Why this has to happen on your device
There is a quiet contradiction in redacting a confidential file with a tool that uploads it. The moment you send a bank statement to a server to “redact” it, the unredacted original has already left your control — it sat on someone else’s infrastructure, possibly logged, possibly cached, before a single mark was applied. The exposure you were trying to prevent happened during the upload.
Browser-local processing removes that contradiction. The file is read into the page, edited in memory using WebAssembly, and written back out, all on your machine. No version of the document — redacted or not — is transmitted anywhere. For a routine flyer that distinction is academic. For the documents that actually need redaction, it is the whole game.
The before-you-share checklist
Run this quick pass on any PDF leaving your hands with confidential content in it:
- Can you select or copy any text in the redacted areas? It must be impossible.
- Do the document properties still show your name, your software, or edit dates? Clear them.
- Were there form fields? Confirm they are flattened and hold no stored values.
- Did the file go through any server to be processed? For sensitive documents, it should not have — a browser-local tool keeps the original on your device.
For anyone handling bank statements, KYC documents, or identity proofs, this matters twice over. Indian financial and government forms routinely combine an account number, a PAN, and an address on the same page, so a single missed redaction exposes a full identity set. If you regularly share statements, our guide on whether online PDF tools are safe for bank statements walks through the upload risk in detail, and the broader PDF security guide covers passwords and encryption alongside redaction.
The principle under all of it is simple. A document is only as private as the data it still contains. Cover-ups look done; deletions are done. Remove the content, flatten the structure, strip the metadata, and verify — then the PDF you share holds only what you meant to send.
Your files never leave your browser
PDF Mavericks processes everything locally using WebAssembly. The confidential original is never uploaded to any server, which is the whole point when the document is the reason you are redacting in the first place.
Frequently asked questions
Does drawing a black box over text in a PDF remove it?
No. A black rectangle is a separate object drawn on top of the text. The original characters stay in the file and can be copied, searched, or revealed by removing the box. This is the most common redaction failure: in 2019 a court filing in the Manafort case leaked redacted passages because readers could select and copy the text hidden under the black bars.
What is the right way to remove sensitive info from a PDF?
It depends on the file. For a scanned or image-only PDF, covering the sensitive area with a baked-in black box is genuine redaction because there is no text layer to recover. For a digitally-generated text PDF, the characters live in the content stream, so true redaction means removing them at the stream level (desktop content-stream tools), or dropping the text layer by exporting the page to an image before covering. In both cases, also flatten any form fields and strip the document metadata.
Does a PDF carry hidden data beyond what I can see?
Yes. PDFs store metadata (author, creation tool, dates), and can carry form-field values, layers, annotations, and earlier revisions saved incrementally. You can blank a visible field and still ship its stored value. Flattening and metadata removal close these gaps.
Is it safe to redact a confidential PDF with an online tool?
Only if the file stays on your device. Many online redactors upload your document to a server, which defeats the purpose when the file is a bank statement, a contract, or a medical record. PDF Mavericks redacts entirely in your browser using WebAssembly, so the original never leaves your machine.
How do I remove the author name and metadata from a PDF?
Use a metadata removal step after redacting the visible content. It clears the document information dictionary and XMP metadata that hold the author, title, producer software, and timestamps. Doing this is a separate action from redaction, and skipping it is a frequent leak of who made the file and when.
Can covered text be recovered from a text PDF?
Yes, if the text layer is still there. A black box drawn over selectable text leaves the characters in the content stream, and flattening form fields does not remove them. The only ways to make text unrecoverable are stream-level removal with desktop redaction software, or converting the page to an image so there is no text layer left. On a scanned PDF the question does not arise, because there was never a text layer to recover.