PDF to Markdown

Convert a text-based PDF to clean Markdown — headings, lists, and paragraphs detected automatically. Everything runs in your browser. Your file never leaves your device.

Single file · PDF supported

Images and complex layouts are simplified — review the output. Scanned (image-based) PDFs require OCR and are not supported in v1. OCR support is planned for v2.

What this tool does

PDF is a presentation format, not a document structure format. Converting it to Markdown means recovering the structure (headings, lists, paragraphs) from positional clues in the text layer. This tool uses PDF.js to extract the position and font size of each piece of text, then applies heuristics: text significantly larger than the median body font becomes a heading, lines starting with bullet characters become list items, and vertical gaps noticeably larger than the normal line spacing become paragraph breaks.
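The heuristics described above can be sketched roughly as follows. This is an illustrative sketch, not the tool's actual code: the `TextLine` shape, the `classify` function, and the two thresholds (1.2× the median font size for headings, 1.5× the median line spacing for paragraph breaks) are assumptions chosen for the example, and it assumes the text layer has already been grouped into visual lines.

```typescript
// Hypothetical input shape: one entry per visual line, top to bottom.
interface TextLine {
  text: string;
  fontSize: number; // in PDF points
  y: number;        // distance from the top of the page, in points
}

type Block = { kind: "heading" | "list" | "paragraph"; text: string };

// Assumed thresholds for illustration; a real converter would tune these.
const HEADING_RATIO = 1.2;
const GAP_RATIO = 1.5;
const BULLET = /^[•◦▪\-*]\s+/;

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function classify(lines: TextLine[]): Block[] {
  if (lines.length === 0) return [];
  const bodySize = median(lines.map((l) => l.fontSize));
  const gaps = lines.slice(1).map((l, i) => l.y - lines[i].y);
  const typicalGap = gaps.length ? median(gaps) : 0;
  const blocks: Block[] = [];

  lines.forEach((line, i) => {
    const text = line.text.trim();
    if (!text) return;
    if (line.fontSize >= bodySize * HEADING_RATIO) {
      // Noticeably larger than body text: treat as a heading.
      blocks.push({ kind: "heading", text });
    } else if (BULLET.test(text)) {
      // Starts with a bullet character: treat as a list item.
      blocks.push({ kind: "list", text: text.replace(BULLET, "") });
    } else {
      const prev: Block | undefined = blocks[blocks.length - 1];
      const gap = i > 0 ? line.y - lines[i - 1].y : Infinity;
      if (prev !== undefined && prev.kind === "paragraph" && gap <= typicalGap * GAP_RATIO) {
        // Normal line spacing: continue the current paragraph.
        prev.text += " " + text;
      } else {
        // Larger gap, or the previous block was not a paragraph.
        blocks.push({ kind: "paragraph", text });
      }
    }
  });
  return blocks;
}
```

Using the median rather than the mean keeps a handful of large headings from skewing the estimate of the body font size.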

The output is compatible with GitHub-flavored Markdown: paste it into any .md file, Notion, Obsidian, or a RAG pipeline. It works well for research papers, reports, and structured documents. Heavily designed marketing PDFs with multi-column layouts will produce messier output that needs review.

Why no server?

Your PDF never leaves your device. The entire conversion runs client-side using PDF.js in your browser. This matters for confidential documents such as salary slips, medical records, NDA drafts, and research data. A tool that processes PDFs server-side has to receive a copy of your file, even if it keeps that copy only briefly. We never see it.

Common questions

Does the PDF to Markdown converter preserve formatting?

It preserves text-based structure — headings are detected from font size changes, lists are identified by bullet or numbered prefixes, and paragraph breaks follow vertical-gap thresholds. Complex multi-column layouts and decorative elements are simplified to plain text. For a document with clean heading hierarchy and standard body text, output quality is high. Heavily styled or tabular PDFs may need manual cleanup.
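As a rough illustration of prefix-based list detection, the sketch below recognizes bullet and numbered prefixes on a single line. The function name `parseListItem`, the set of bullet characters, and the numbered pattern (one to three digits followed by `.` or `)`) are assumptions for the example, not the tool's actual rules.

```typescript
type ListItem = { ordered: boolean; text: string };

// Assumed prefix patterns for illustration.
const BULLET_PREFIX = /^[•◦▪‣·\-*]\s+(.*)$/;
const NUMBER_PREFIX = /^\d{1,3}[.)]\s+(.*)$/;

// Returns the list item a line represents, or null if it is ordinary text.
function parseListItem(line: string): ListItem | null {
  const trimmed = line.trim();
  const bullet = BULLET_PREFIX.exec(trimmed);
  if (bullet) return { ordered: false, text: bullet[1] };
  const numbered = NUMBER_PREFIX.exec(trimmed);
  if (numbered) return { ordered: true, text: numbered[1] };
  return null;
}
```

Requiring whitespace after the prefix keeps text such as "3.14 is pi" from being mistaken for a numbered item.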

Can I export to GitHub-flavored Markdown (GFM)?

Yes. The output uses standard ATX headings (# H1, ## H2, etc.), dashes for unordered list items, and numbered lists for ordered sequences — all of which are fully compatible with GitHub-flavored Markdown. Paste it directly into a .md file in any GitHub repo.
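To make the output conventions concrete, here is a minimal sketch of a serializer that emits ATX headings, dash bullets, and numbered items. The `Block` type and the `toMarkdown` function are hypothetical names for this example; the tool's internal representation is not documented here.

```typescript
// Hypothetical block model for illustration.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "bullet"; text: string }
  | { kind: "numbered"; text: string }
  | { kind: "paragraph"; text: string };

function toMarkdown(blocks: Block[]): string {
  const out: string[] = [];
  let counter = 0;
  let prevKind: Block["kind"] | null = null;

  for (const block of blocks) {
    const isList = block.kind === "bullet" || block.kind === "numbered";
    // Blank line between blocks, except between consecutive items of the same list.
    if (prevKind !== null && !(isList && block.kind === prevKind)) out.push("");
    // Numbering restarts whenever a numbered run is interrupted.
    if (block.kind !== "numbered") counter = 0;

    switch (block.kind) {
      case "heading": {
        // ATX headings only go from level 1 to level 6.
        const level = Math.min(6, Math.max(1, Math.round(block.level)));
        out.push("#".repeat(level) + " " + block.text);
        break;
      }
      case "bullet":
        out.push("- " + block.text);
        break;
      case "numbered":
        counter += 1;
        out.push(`${counter}. ${block.text}`);
        break;
      case "paragraph":
        out.push(block.text);
        break;
    }
    prevKind = block.kind;
  }
  return out.length ? out.join("\n") + "\n" : "";
}
```

Separating blocks with blank lines matters because Markdown would otherwise fold a paragraph into the preceding list item or heading line.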

Are images extracted from the PDF?

Not in this version. Image extraction requires significantly more processing and is planned for v2. The current tool focuses on text content. Images are skipped, and a note is shown in the output so you know what was omitted. OCR for scanned PDFs is also a v2 item.

Is the conversion done on a server?

No. Everything runs in your browser using the PDF.js library. Your PDF never leaves your device — there is no upload, no server-side processing, and no data retention. This makes it safe for confidential documents like research papers, financial reports, or internal documentation.

What about scanned PDFs?

Scanned PDFs are image-based and contain no machine-readable text layer. This tool works on text-based PDFs only — the kind you can select and copy text from in a PDF viewer. For scanned documents, you need OCR first. OCR support is on the roadmap as a separate tool.
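One simple way to tell the two kinds of PDF apart is to check how much text comes out of the text layer. The sketch below assumes the text has already been extracted into one array of strings per page; the function name `hasTextLayer` and the default threshold of 20 non-whitespace characters per page are assumptions for illustration, not the tool's actual check.

```typescript
// pages: for each page, the strings extracted from its text layer (hypothetical input shape).
// minCharsPerPage: assumed cutoff; scanned pages typically yield little or no text.
function hasTextLayer(pages: string[][], minCharsPerPage = 20): boolean {
  if (pages.length === 0) return false;
  const totalChars = pages.reduce(
    (sum, items) => sum + items.join("").replace(/\s+/g, "").length,
    0,
  );
  return totalChars / pages.length >= minCharsPerPage;
}
```

Averaging over all pages means a document with a few image-only pages, such as a scanned cover, can still count as text-based.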