Convert PDF to CSV: Bank Statements and GST Invoices (2026)

Convert PDF to CSV in your browser to extract bank statements and invoices. No upload, no signup. Ready for Tally, Zoho Books, and GST filings.

PDF Mavericks

Converting PDF to CSV is the bottleneck every accountant, founder, and freelancer in India hits at the end of the month. The bank sends statements as PDF, but the books need entries as CSV. Tally Prime imports CSV. Zoho Books imports CSV. ClearTax wants CSV for GSTR-2B reconciliation. Excel obviously eats CSV. The PDF is the last format anyone wants their numbers stuck inside.

The problem is that almost every "PDF to CSV" tool on the public web asks you to upload the statement first. A document that has your full name, account number, IFSC, last six months of transactions, and your employer's name on every salary credit row. To a server. In another country. So they can run a parser you could run in your browser tab.

This guide walks through the case for browser-local PDF-to-CSV extraction, the actual steps for an Indian bank statement (with examples for SBI, HDFC, and Axis), and the edge cases — scanned statements, multi-page tables, password-protected files — that break most converters.

Why most online PDF-to-CSV converters fail on bank statements

Three failure modes show up in the same order every time you evaluate a converter.

  1. The upload tax. Tools like Smallpdf, iLovePDF, and Sejda send your PDF to their server, run extraction there, and stream the CSV back. Their privacy policies say files get deleted after a few hours. That's a promise, not a guarantee, and the data was already in their logs the moment it hit their ingress. For a single internet recipe PDF, fine. For a six-month statement of your salary account, not fine.
  2. The paywall after the preview. A tool that lets you preview the first ten rows of a 200-row statement and asks for a Pro subscription before exporting the rest is not a converter. It is a sales funnel.
  3. The broken table detection. Bank statements use irregular layouts: SBI has a top-line summary block, then the transaction table, then a closing summary. HDFC inserts an empty row between months. Axis wraps long narration text across two physical lines. Generic table extractors collapse all three of those quirks into one mangled CSV with the wrong columns shifting halfway down the file.

All three failures share a root cause: the tool was built to process arbitrary PDFs, not bank statements. It does not know that "Tran Date" and "Value Date" are siblings, that "Withdrawal Amt." goes to the debit column, or that the closing balance is a single trailing row that should be dropped before import.

How browser-local PDF-to-CSV extraction actually works

Inside the browser tab, three layers do the work. The first is PDF.js, Mozilla's pure-JavaScript PDF parser. It reads the byte stream and exposes every text object on every page with its x and y coordinates intact. No upload, no native binary, no plugin.

The second layer is a table-detection step. It groups text objects that share a y-coordinate band into rows, then groups consecutive rows whose cells line up under the left edges of the header strings ("Date", "Description", "Debit", "Credit", "Balance") into tables. The heuristic is the same one Tabula and Camelot use, ported to TypeScript and compiled to WebAssembly so it runs at roughly 6,000 rows per second on a mid-range laptop.
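The row-grouping idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter's actual code: the `TextItem` shape, the 2-point tolerance, and the three-match header rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical, simplified text object: a string plus its position."""
    text: str
    x: float  # left edge, in PDF points
    y: float  # baseline, in PDF points


HEADER_WORDS = {"date", "description", "debit", "credit", "balance"}


def group_into_rows(items, tolerance=2.0):
    """Cluster items whose baselines sit within `tolerance` points of a row's first item."""
    rows = []
    # PDF y-coordinates grow upward, so sort descending to read top to bottom.
    for item in sorted(items, key=lambda i: -i.y):
        if rows and abs(rows[-1][0].y - item.y) <= tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    # Within each row, read left to right.
    return [sorted(row, key=lambda i: i.x) for row in rows]


def is_header_row(row, min_hits=3):
    """Treat a row as a table header when enough cells match known header strings."""
    hits = sum(1 for item in row if item.text.strip().lower() in HEADER_WORDS)
    return hits >= min_hits
```

A real detector also has to cope with skewed baselines and with headers such as "Withdrawal Amt." that do not match a fixed word list, which this sketch ignores.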

The third layer is column inference. Each detected row is split into cells by detecting whitespace gaps between text runs. For ambiguous cases — a long narration that overflows into the amount column, for instance — the converter lets you drag the column boundary in a preview before export.
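As a rough sketch of gap-based splitting, the Python below starts a new cell whenever the horizontal gap between two text runs exceeds a threshold. The `TextRun` shape and the 12-point `min_gap` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical text run: a string, its left edge, and its rendered width."""
    text: str
    x: float
    width: float


def split_into_cells(runs, min_gap=12.0):
    """Join runs separated by small gaps; start a new cell at each large gap."""
    cells = []
    prev_right = None
    for run in sorted(runs, key=lambda r: r.x):
        if prev_right is None or run.x - prev_right > min_gap:
            cells.append([run.text])      # large gap: new cell
        else:
            cells[-1].append(run.text)    # small gap: same cell
        prev_right = run.x + run.width
    return [" ".join(parts) for parts in cells]
```

Gap splitting alone cannot tell an empty Debit cell from an empty Credit cell, so a fuller version would also snap each cell to the nearest header column position. That is the kind of ambiguity the draggable column boundary is there to resolve.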

The whole pipeline ships as static JavaScript and a 380 KB WebAssembly module. After the first page load it is cached, so subsequent conversions are zero-network operations. You can verify this by opening the browser DevTools, switching to the Network panel, clearing it, and converting a file. You will see no outbound traffic for the conversion itself.

Step by step: PDF to CSV in three minutes

The flow below uses the /pdf-to-csv tool. Sample input: a six-month HDFC savings account statement with 184 transactions across 22 pages.

  1. Unlock first if needed. If the bank emailed you a password-protected PDF, run it through /unlock-pdf before anything else. The unlock page accepts the password, strips it, and hands back a clean PDF — also browser-local.
  2. Drop the file. Drag the unlocked PDF onto the drop zone or click to pick it. Parse time for a 184-row statement: about 2.4 seconds on a 2022 MacBook Air, about 4.1 seconds on a mid-range Windows laptop.
  3. Preview the detected tables. The page splits into a left-side raw view and a right-side parsed view. Header row at the top, transaction rows below, closing balance row at the bottom. If the header was misdetected (rare but happens on older statement templates), click any cell and pick "this is the header row".
  4. Adjust columns if needed. Rare on standard Indian bank templates; common on annual reports and creative invoice formats. Drag the vertical column dividers to resize. The preview re-renders in under 100 ms.
  5. Pick the output schema. Default schema is Date, Description, Debit, Credit, Balance. Presets include Tally Prime, Zoho Books, ClearTax GSTR-2B, and Generic Excel. The presets reorder columns and rename headers; they do not touch numeric values.
  6. Export. The CSV downloads to your machine directly from the browser. No round-trip. Open it in Excel, LibreOffice Calc, or pipe it to your accounting software.
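The preset behaviour described in step 5 (reorder columns, rename headers, leave values alone) can be sketched as follows. The preset shown is hypothetical and does not correspond to any of the named presets; the real header names may differ.

```python
import csv
import io

# Hypothetical preset: (source column, output header) pairs, in output order.
EXAMPLE_PRESET = [
    ("Date", "Txn Date"),
    ("Debit", "Withdrawal"),
    ("Credit", "Deposit"),
    ("Description", "Narration"),
]


def apply_preset(rows, preset):
    """Write rows as CSV using the preset's column order and header names.

    Cell values are copied through untouched; only order and headers change.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header for _, header in preset])
    for row in rows:
        writer.writerow([row[source] for source, _ in preset])
    return out.getvalue()
```

Using the `csv` module rather than string joins matters here because narration text and Indian-format amounts both contain commas, and the writer quotes those cells automatically.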

Total wall-clock time for the HDFC sample: 2 minutes 47 seconds including the unlock step.

India-specific: SBI, HDFC, GST returns, Tally Prime

Three Indian use cases drive the bulk of pdf-to-csv volume on this site. We call them out explicitly because each one has at least one quirk a generic converter misses.

Bank statement reconciliation

SBI, HDFC, and Axis are the three statements we see most. SBI uses Tran Date and Value Date as separate columns; the converter keeps both and labels them clearly. HDFC inserts an empty row at month boundaries; the parser drops empty rows by default. Axis wraps long narration over two lines; the parser joins those lines back into one cell when the second line has no value in the amount column. All three use Indian-format numerals (1,23,456.78), which the export either preserves or converts to plain ungrouped decimals (123456.78), depending on your downstream tool.
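The two numeral options can be illustrated with a short Python sketch. The function names are hypothetical, and the currency stripping assumes the cell holds only an amount.

```python
import re
from decimal import Decimal
from typing import Optional

# Currency markers to strip from amount cells before parsing.
_CURRENCY = re.compile(r"(₹|INR|Rs\.?)", re.IGNORECASE)


def parse_amount(cell: str) -> Optional[Decimal]:
    """Parse '1,23,456.78' or '123,456.78' into a Decimal; blank cells give None."""
    cleaned = _CURRENCY.sub("", cell).replace(",", "").strip()
    return Decimal(cleaned) if cleaned else None


def to_indian_grouping(amount: Decimal) -> str:
    """Format with lakh/crore grouping: last three digits, then groups of two."""
    sign = "-" if amount < 0 else ""
    whole, _, frac = f"{abs(amount):.2f}".partition(".")
    head, tail = whole[:-3], whole[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(groups + [tail]) + "." + frac
```

Removing every comma handles both grouping styles, and `Decimal` avoids the rounding drift that binary floats would introduce into balances.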

GST input reconciliation (GSTR-2B)

Vendor invoices arrive as PDFs. To reconcile them against your GSTR-2B in ClearTax or an equivalent tool, you need a CSV with GSTIN, invoice number, invoice date, taxable value, IGST, CGST, and SGST. The converter has a GSTR-2B preset that pulls those fields out of standard tax-invoice layouts (the ones that carry the fields GST rules require), and falls back to the generic table detector for non-standard layouts. The 28% IGST edge case for imports works correctly. The reverse-charge mechanism column, when present, is exported as is.
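Before reconciling, it can help to sanity-check the extracted rows. The Python sketch below is one possible post-export check, not part of the converter: the column names are illustrative, and the pattern tests only the 15-character shape of a GSTIN, not its check digit.

```python
import re

# Illustrative column names; the preset's real headers may differ.
REQUIRED_FIELDS = ["GSTIN", "Invoice Number", "Invoice Date", "Taxable Value"]
TAX_FIELDS = ["IGST", "CGST", "SGST"]

# Shape only: 2-digit state code, 10-character PAN, entity code, 'Z', check character.
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def looks_like_gstin(value: str) -> bool:
    """True when the value has the structural shape of a GSTIN."""
    return GSTIN_PATTERN.fullmatch(value.strip().upper()) is not None


def row_problems(row: dict) -> list:
    """Return human-readable flags for a row that deserves a second look."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not row.get(f, "").strip()]
    if row.get("GSTIN", "").strip() and not looks_like_gstin(row["GSTIN"]):
        problems.append("GSTIN has an unexpected shape")
    if not any(row.get(f, "").strip() for f in TAX_FIELDS):
        problems.append("no tax amount in IGST, CGST, or SGST")
    return problems
```

The flags are prompts for manual review rather than errors. An exempt supply, for example, can legitimately carry no tax amount.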

ITR-2 and ITR-3 capital gains

Zerodha and Groww send broker contract notes and capital-gain reports as PDFs every quarter. For ITR-2 and ITR-3 filing you need a CSV with date, scrip, quantity, buy price, sell price, and STCG/LTCG flag. The converter recognises the standard Zerodha and Groww layouts and exports the right schema. For other brokers (ICICI Direct, Kotak Securities, HDFC Sec) the generic extractor still works — you just rename the columns after export.

Tally Prime import in three clicks

Tally Prime accepts CSV bank-statement imports under Gateway of Tally → Banking → Bank Statement. The format it expects: Date in DD-MM-YYYY, Particulars (free text), Voucher Type (Receipt or Payment), Voucher Number (your reference), Debit amount, Credit amount. Pick the Tally Prime preset on the converter and the columns land in that order with the date format normalised. Tested against a 6-month HDFC statement and a 3-month SBI statement; both imported with zero manual edits on Tally Prime release 4.0 and later. The same preset feeds Tally ERP 9 if you flip the column order in Excel first (Voucher Number before Particulars).
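For readers who prefer to script the mapping themselves, here is a hedged Python sketch of turning a default-schema row into the column order listed above. The input date formats, the Receipt/Payment rule (credit means Receipt, otherwise Payment), and the caller-supplied voucher number are assumptions, not the preset's documented behaviour.

```python
from datetime import datetime

TALLY_COLUMNS = ["Date", "Particulars", "Voucher Type", "Voucher Number", "Debit", "Credit"]

# Assumed input formats; extend this list for other statement templates.
INPUT_DATE_FORMATS = ["%d/%m/%y", "%d/%m/%Y", "%d-%m-%Y"]


def normalise_date(raw: str) -> str:
    """Convert a statement date to DD-MM-YYYY, trying each known format in turn."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def to_tally_row(row: dict, voucher_number: str) -> list:
    """Map a Date/Description/Debit/Credit row onto TALLY_COLUMNS order."""
    debit = row.get("Debit", "").strip()
    credit = row.get("Credit", "").strip()
    voucher_type = "Receipt" if credit else "Payment"
    return [
        normalise_date(row["Date"]),
        row["Description"],
        voucher_type,
        voucher_number,
        debit,
        credit,
    ]
```

Raising on an unrecognised date is deliberate: a silently misparsed date is harder to spot after import than a failed export.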

Verifying nothing is uploaded

Trust-but-verify takes about 30 seconds. Open Chrome or Firefox DevTools, switch to the Network panel, and clear the log once the page has finished loading. Then drop your PDF on the converter. The only requests the tool needs are the ones from the initial page load (HTML, JS, WASM, CSS), so the cleared log should stay empty during and after parsing. If you see a request that contains your file in the payload, that's the bug to report. So far it has not happened on any browser version we test against.

Edge cases: scanned PDFs, merged cells, multi-page tables

Three failure modes worth flagging up front so you don't fight the tool when it can't do what you're asking.

Scanned PDFs (image-only). If the bank statement was printed and re-scanned, the file contains an image of text rather than text. PDF.js sees no text objects to extract. The fix is to OCR the file first; we cover this in our browser-local OCR guide. After OCR, save as a searchable PDF and run it through the converter. Total time penalty: about 8 seconds per page on a modern laptop.

Multi-line transactions. Some banks split a single transaction over two physical rows when the narration is long. The default heuristic merges these correctly when the second row has no value in the amount column. When the heuristic misses (for instance, when the second row contains a currency code), the preview shows the offending row with a warning indicator and a "merge with previous" button.
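The default merge rule (a row with no value in any amount column continues the previous row) can be sketched in Python as below. The column names follow the default schema; treating Balance as an amount column is an assumption.

```python
AMOUNT_COLUMNS = ("Debit", "Credit", "Balance")


def merge_continuations(rows):
    """Fold rows that carry no amounts into the previous row's description.

    Returns new dicts; the input rows are left unmodified.
    """
    merged = []
    for row in rows:
        has_amount = any(row.get(col, "").strip() for col in AMOUNT_COLUMNS)
        if merged and not has_amount:
            prev = merged[-1]
            extra = row.get("Description", "").strip()
            prev["Description"] = f'{prev["Description"]} {extra}'.strip()
        else:
            merged.append(dict(row))
    return merged
```

A side effect of this rule is that completely empty rows vanish, since they add nothing to the previous description. The failure case described above is visible too: a second row that puts a currency code in an amount column would count as having an amount and would not be merged.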

Multi-page tables with header drift. Annual reports often repeat the header on every page but with slightly different text ("Schedule III" on page 1, "Schedule III contd." on page 2). The converter clusters these and treats them as the same table. If you do want them separate, click the table header in the preview and pick "split this from previous".
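One way to picture the clustering is to normalise each page's table title before comparing it with the previous one. The Python below is a simplified sketch under that assumption; the continuation markers it strips and the `(title, rows)` table shape are hypothetical.

```python
import re

# Continuation markers to ignore when comparing titles, e.g. "contd." or "(continued)".
_CONTINUATION = re.compile(r"\b(contd|continued|cont)\b\.?", re.IGNORECASE)


def header_key(title: str) -> str:
    """Reduce a table title to a comparison key: no continuation marker, no punctuation."""
    stripped = _CONTINUATION.sub("", title)
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def stitch_tables(tables):
    """Merge consecutive (title, rows) tables whose titles share a header key."""
    stitched = []
    for title, rows in tables:
        if stitched and header_key(stitched[-1][0]) == header_key(title):
            stitched[-1][1].extend(rows)
        else:
            stitched.append((title, list(rows)))
    return stitched
```

Only consecutive tables are merged, so a schedule that reappears after an unrelated table stays separate. The stitched table keeps the first page's title.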

How browser-local compares: Tabula, Adobe, Sejda

Four converters, picked by intent.

  • Tabula. Open source. Local. Java required. Desktop install. Works well, but the install friction disqualifies it for casual one-off use. Closest spiritual match to browser-local. (tabula.technology)
  • Adobe Acrobat Pro DC. Best detection on creative layouts. Behind a paid subscription, with current pricing on Adobe's plans page. The web export uploads the file; the desktop app does the work locally but is a heavy install. (adobe.com pricing)
  • Sejda. Browser tool. Free tier with usage limits, then paywall; current limits are on their pricing page. Uploads the file by default; their separate desktop app keeps things local. (sejda.com)
  • PDF Mavericks /pdf-to-csv. Browser-local. No install, no upload, no signup, no per-file limit. Free by default. Indian-bank presets and Indian-numeral handling included.

For one-off conversions of sensitive documents, browser-local is the path with the lowest friction and the smallest data-exposure surface. For automated bulk extraction (10,000 statements a night for a fintech), use Tabula or Camelot in a server pipeline you control.

The takeaway

The PDF was never the problem. The upload was. Once the extraction step runs in your browser, every objection — privacy, paywall, signup, install size, India-specific layouts — falls away. The CSV ends up where it always needed to be: in your spreadsheet, ready for Tally, Zoho, or ClearTax, with the file that started it still on your disk and never anywhere else.

Your files never leave your browser

PDF Mavericks parses everything locally using PDF.js and WebAssembly. No file is uploaded to any server.

Frequently asked questions

Will my bank statement be uploaded to a server?

No. PDF Mavericks parses your PDF inside the browser tab using PDF.js and a WebAssembly table-detection step. The file never leaves your device. You can confirm this by opening DevTools, switching to the Network panel, and watching the conversion run with zero outbound requests.

Does it work on password-protected bank statements?

Run the file through /unlock-pdf first. Each bank builds the password from personal details such as your date of birth, your customer ID, or the first letters of your name, and the exact format varies by bank and statement type, so check the email the statement arrived with. Once the password layer is stripped you can drop the unlocked PDF into the converter.

Why do some pages export with merged or scrambled columns?

Two reasons. First, the PDF was scanned rather than digitally generated, so the cells contain images of text and not extractable characters. Run OCR first. Second, the bank used a multi-line layout where one transaction spans two physical rows. Switch to the per-row preview before exporting and merge those rows manually.

Can I import the CSV directly into Tally or Zoho Books?

Yes. Zoho Books and ClearTax both accept the standard date, narration, debit, credit, balance schema we export. Tally Prime expects its own column order: Date, Particulars, Voucher Type, Voucher Number, Debit, Credit. Either use the Tally-specific preset on /pdf-to-csv or reorder the columns in Excel before importing.

How does this compare to Tabula or Adobe Acrobat export?

Tabula is the closest open-source equivalent and runs locally too, but it needs Java and a desktop install. Adobe Acrobat Pro DC exports tables but ships behind a paid subscription (listed at 1,599 INR per month at the time of writing; see Adobe's plans page for current pricing) and uploads the file to Adobe servers when you use the web export. Sejda allows three free exports per hour at the time of writing and then paywalls. Browser-local extraction has no install, no paywall, and no upload.

Does the tool detect Indian number formats like the lakh and crore separators?

Yes. The parser handles 1,23,456.78 (Indian comma grouping) the same as 123,456.78 (Western). Currency symbols (₹, INR, Rs.) are stripped from numeric cells before export. If you spot a misread number — most often when the bank wraps an amount across two lines — the preview lets you click the cell and correct it before downloading.

Can I extract tables from a multi-page invoice or annual report?

Yes. Select all pages or a range; the converter stitches per-page tables that share the same header row into a single CSV. If a page has a different table structure (for example, a summary on the last page of an annual report), it appears as a separate table in the preview, and you can export each table independently.

What's the file size limit?

There is no hard limit because nothing is uploaded; the practical limit is your browser's memory. A 50 MB statement with 200 transaction pages parses in under three seconds on a mid-range laptop. Beyond about 500 pages, split the file at /split first and convert it in batches.
