Question 1

Which image formats does the tool extract?

Accepted Answer

It extracts every embedded image stream pdf.js can decode — typically JPEG, PNG, and raw RGB / RGBA bitmaps. In 'Original' output mode, JPEG and PNG streams are written byte-for-byte into the ZIP without re-encoding, so you keep full quality. In PNG or JPG mode, every image is decoded into a canvas and re-encoded uniformly. JBIG2 and CCITT-encoded scans (common in old fax-style PDFs) decode when pdf.js's bundled decoder supports them, and are reported as skipped otherwise.
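The answer above does not say how the tool recognises a JPEG or PNG stream before copying it through. One common approach is to check the stream's leading magic bytes. The helper below is a minimal sketch under that assumption; `sniffImageFormat` and `SniffedFormat` are hypothetical names, not part of pdf.js or the tool.

```typescript
// Hypothetical helper: classify a raw image stream by its leading magic bytes.
type SniffedFormat = "jpeg" | "png" | "unknown";

// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function sniffImageFormat(bytes: Uint8Array): SniffedFormat {
  // JPEG data starts with the SOI marker FF D8, followed by another FF marker byte.
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  if (
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((expected, i) => bytes[i] === expected)
  ) {
    return "png";
  }
  // Anything else (raw bitmaps, JBIG2, CCITT, ...) would need decoding first.
  return "unknown";
}
```

A stream classified as "unknown" is the case where byte-for-byte copying is not an option and the image has to be decoded before it can be saved.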

Question 2

Why does my PDF show 0 images detected?

Accepted Answer

Three reasons cover most cases. First, the PDF is text-only — what looks like a scanned page might actually be vector text, not a raster image. Second, your minimum-size filter is excluding everything (try lowering it from the default 100px to 0). Third, the page content uses vector drawing operators rather than image XObjects — for example, charts in a financial report are often vector paths, not images, so there's nothing for an image extractor to pull. Vector graphics need a PDF → SVG converter, not an image extractor.
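To illustrate the second reason, here is a rough sketch of how a minimum-size filter can hide every image in a file. It assumes the filter requires both width and height to reach the threshold; the tool may apply the rule differently, and `ImageDims`, `passesMinSize`, and `filterBySize` are hypothetical names.

```typescript
interface ImageDims {
  width: number;  // pixels
  height: number; // pixels
}

// Assumption: an image is kept only when BOTH dimensions reach the minimum.
function passesMinSize(img: ImageDims, minPx: number): boolean {
  return img.width >= minPx && img.height >= minPx;
}

// Default of 100 mirrors the 100px default mentioned in the answer above.
function filterBySize<T extends ImageDims>(images: T[], minPx: number = 100): T[] {
  return images.filter((img) => passesMinSize(img, minPx));
}

// A PDF that contains only small icons: everything is filtered out at 100px.
const icons: ImageDims[] = [
  { width: 64, height: 64 },
  { width: 90, height: 300 },
  { width: 48, height: 48 },
];
const keptAtDefault = filterBySize(icons);  // empty list
const keptAtZero = filterBySize(icons, 0);  // all three icons
```

With the threshold at 0, every image passes, which is why lowering the filter is the quickest way to rule this cause out.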

Question 3

Why a single ZIP instead of individual file downloads?

Accepted Answer

Browsers throttle or block automatic multi-file downloads as a security measure — extracting a 30-image PDF would trigger a browser confirmation dialog for every file, or the downloads would silently fail after the first few. Packaging the lot into one ZIP sidesteps that, gives you a single click to save, and keeps related images together. Most operating systems unzip in place with a double-click, so there's no real friction added.

Question 4

Is the PDF uploaded anywhere?

Accepted Answer

No. The PDF is read directly into your browser's memory using pdf.js, decoded locally, and the ZIP is built locally with JSZip. No file data is sent to a server: there's no upload step, no temporary cloud copy, and no analytics on the file's contents. For our own diagnostics we log a tool-started and a tool-completed event with image counts and timing; the file itself, the images inside it, and the resulting ZIP never leave your device.

Question 5

Will the extracted images match the original quality?

Accepted Answer

In 'Original' output mode, yes — when an embedded JPEG or PNG stream is found, those bytes are dropped into the ZIP unchanged, with zero re-encoding loss. That's the recommended setting for medical records, archival, or anything where pixel-level fidelity matters. If you choose 'Convert all to PNG' or 'Convert all to JPG', every image goes through a one-time canvas re-encode. PNG re-encoding is lossless; JPG re-encoding uses quality 0.92 — visually indistinguishable for most content but technically a generation-loss step.
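The three output modes can be summarised as a small decision function. This is a sketch, not the tool's code: the type and function names are hypothetical, and the handling of raw bitmaps in 'Original' mode (encoding them as PNG, since there are no file bytes to copy) is an assumption the answer above does not confirm.

```typescript
type OutputMode = "original" | "png" | "jpg";
// "bitmap" stands for decoded RGB/RGBA pixels with no JPEG/PNG bytes behind them.
type SourceKind = "jpeg" | "png" | "bitmap";

type Plan =
  | { action: "passthrough"; extension: "jpg" | "png" }
  | { action: "reencode"; mimeType: "image/png" | "image/jpeg"; quality?: number; lossless: boolean };

const JPG_QUALITY = 0.92; // the quality value stated in the answer above

function planOutput(source: SourceKind, mode: OutputMode): Plan {
  // 'Original' mode: copy embedded JPEG/PNG bytes unchanged.
  if (mode === "original" && source !== "bitmap") {
    return { action: "passthrough", extension: source === "jpeg" ? "jpg" : "png" };
  }
  // 'Convert all to JPG': one lossy canvas re-encode.
  if (mode === "jpg") {
    return { action: "reencode", mimeType: "image/jpeg", quality: JPG_QUALITY, lossless: false };
  }
  // 'Convert all to PNG', plus (assumption) raw bitmaps in 'Original' mode.
  return { action: "reencode", mimeType: "image/png", lossless: true };
}
```

In a browser, the `mimeType` and `quality` fields map onto the second and third arguments of `canvas.toBlob(callback, type, quality)`.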

Question 6

What about the file size limit?

Accepted Answer

There's a soft limit of around 100 MB, set for browser memory reasons. Above that, the operator-list scan and image decoding can run slowly on older devices, especially mobile. The ZIP-build step itself (JSZip with DEFLATE level 6) is light. If a large PDF stalls during pre-scan, try a smaller test file first to confirm the tool works in your browser, then come back to the big one.

Question 7

Can I extract images from password-protected PDFs?

Accepted Answer

No — encrypted PDFs need to be decrypted before pdf.js can read the image streams. Use the unlock-pdf tool (also browser-local) to remove the password first, then bring the unlocked PDF here. We won't pretend to extract from a locked file because the underlying image data isn't accessible until the encryption is stripped.
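For readers scripting something similar: the sketch below shows one way a page could recognise the locked-file case, under the assumption that pdf.js rejects the document load with an error whose `name` is "PasswordException" when no password is supplied. `isPasswordError` is a hypothetical helper, not a pdf.js function.

```typescript
// Assumption: a failed load of an encrypted PDF surfaces an error object
// whose `name` property is "PasswordException".
function isPasswordError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "PasswordException"
  );
}
```

Wrapped around the document load in a try/catch, a check like this would let the page point the user to the unlock-pdf tool instead of reporting a generic failure.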

Question 8

Why do some images appear duplicated, page after page?

Accepted Answer

PDFs commonly reuse one image XObject across multiple pages: a logo in a header, a watermark, a signature stamp. pdf.js reports that as a separate paintImageXObject call on each page, so the extractor writes one copy per page that uses it. We dedupe within a single page, but keeping the cross-page copies is intentional: if your goal is to recover every visible image position, you want all of them. If you only want unique source images, deduplicate the ZIP afterward by file hash.
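Deduplicating afterward by file hash could look roughly like the sketch below. It assumes the ZIP entries have already been read into memory as byte arrays; the `ExtractedFile` shape, the function names, and the example file names are all hypothetical. It uses a simple 32-bit FNV-1a hash for bucketing and confirms matches byte by byte, so a hash collision never drops a distinct image. A SHA-256 digest from the Web Crypto API would be a sturdier choice in real use.

```typescript
interface ExtractedFile {
  name: string;
  bytes: Uint8Array;
}

// 32-bit FNV-1a, used only as a cheap bucketing hash (not cryptographic).
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime 16777619 via shifts, wrapping to unsigned 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Keeps the first file seen for each distinct byte content, in original order.
function dedupeByContent(files: ExtractedFile[]): ExtractedFile[] {
  const buckets: Record<number, ExtractedFile[]> = {};
  const unique: ExtractedFile[] = [];
  for (let k = 0; k < files.length; k++) {
    const file = files[k];
    const key = fnv1a(file.bytes);
    const bucket = buckets[key] || (buckets[key] = []);
    if (!bucket.some((seen) => sameBytes(seen.bytes, file.bytes))) {
      bucket.push(file);
      unique.push(file);
    }
  }
  return unique;
}

// Example: the same logo bytes extracted from two pages, plus one photo.
const extracted: ExtractedFile[] = [
  { name: "page1-logo.png", bytes: new Uint8Array([1, 2, 3]) },
  { name: "page1-photo.jpg", bytes: new Uint8Array([9, 9]) },
  { name: "page2-logo.png", bytes: new Uint8Array([1, 2, 3]) },
];
const uniqueFiles = dedupeByContent(extracted);
```

Here `uniqueFiles` keeps the page 1 logo and the photo, and drops the page 2 copy of the logo.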

When to extract images from a PDF

How the extraction actually works

Privacy posture

Comparison with upload-based competitors

Common failure modes and how we handle them

Frequently asked usage questions

Related tools on PDF Mavericks
