Deskew PDF Online Free: Fix Tilted Bank Statements
Deskew PDF online free in your browser. Fix 0.5°-3° tilt on scanned bank statements, court orders, property docs before OCR. Hough-line detection, no upload.
- Why a 2° tilt destroys OCR accuracy
- Why ADF scanners always introduce skew
- How Hough-line skew detection works
- The vector-to-raster trade-off
- How to deskew PDF online free in your browser
- India-specific scenarios — bank statements, court orders, property
- The pre-OCR pipeline: deskew → OCR → search
- Frequently asked questions
Your scanned bank statement is tilted 2 degrees. OCR will be roughly 10 times worse on it than on a straight scan. Deskew first. The rest of this guide explains why that is — and gives you a browser-only path to deskew PDF online free on the kinds of documents Indian SMBs, lawyers, and accountants actually deal with: SBI and HDFC bank statements, sale deeds, court orders, GST filing scans, property tax bills, salary slips. The whole pipeline runs in your browser. No upload, no login, no third-party server seeing the account numbers.
The argument has three parts. First, why even a tiny skew matters: OCR engines tolerate roughly 0.5 degrees before accuracy starts dropping, and ADF scanners almost always introduce 0.5 to 3 degrees of tilt. Second, how Hough-line detection actually finds the dominant text-line angle on a page. Third, the practical workflow — drop the PDF, click Detect skew, review per-page angles, click Apply, download.
Why a 2° tilt destroys OCR accuracy
Tesseract and most OCR engines train on text rendered horizontally. They handle a tiny amount of skew — perhaps 0.5 degrees before you measure any accuracy drop. Past that, character segmentation starts to fail in predictable ways. Letters that should sit on a baseline get split across two scan-lines. Descenders disappear into the line below. The engine guesses "rn" where the original is "m," misreads "1" as "l," and ends up returning a paragraph that does not pass a basic spellcheck.
By 3 to 5 degrees of skew you are losing roughly 30 to 50 percent of words on a typical bank statement, and full lines on dense pages. ocrmypdf — the standard open-source OCR pipeline used in production document processing — runs deskew by default for exactly this reason. So does Adobe Acrobat's scan workflow, ABBYY FineReader, and every commercial document-processing pipeline that has to hit production accuracy. The pre-OCR deskew step is not optional infrastructure; it is how the rest of the pipeline gets its accuracy numbers.
The math behind this is simple geometry. A 1-degree skew with 10-point text means a character row drifts roughly 6 pixels over a 10 cm line. Six pixels is enough to break the OCR's row-projection step — the algorithm that finds horizontal bands of dark pixels and treats each band as one line of text. Once row projection fails, the engine tries to compensate with column projection, which works on rigid grids but not on justified or proportional text. By 2 degrees the drift is 12 pixels and the column projection also breaks. By 3 degrees the engine basically gives up.
Why ADF scanners always introduce skew
Indian small businesses, lawyers, CAs, and individuals run more PDF scans than almost any other audience globally. The reasons are familiar: bank account opening, GST registration, property mutation, court submissions, education board verifications, RTI filings, FRRO and visa paperwork, NRI document attestations. Each one demands scanned copies of original documents — usually fed through an ADF scanner at a Xerox shop, a small office multifunction printer, or a smartphone in a hurry.
ADF scanners feed paper through rollers. The mechanical reality is that paper enters the rollers slightly off-axis more than 80 percent of the time. Even high-end production scanners with auto-deskew compensation routinely leave 0.5 to 1 degree of residual tilt. Mid-range office multifunction printers sit at 1 to 3 degrees of skew on most pages. Cheap document scanners and Xerox-shop machines can produce 3 to 5 degrees consistently.
Smartphone scans (any of the post-CamScanner-ban India apps, or just a plain camera capture) skew worse: hand-held photography routinely produces 5 to 10 degrees of tilt depending on the user. Apps with edge-detection and perspective correction reduce this — that is what the /scan-to-pdf tool does on capture — but residual skew of 1 to 2 degrees is still common after capture. So the deskew step is useful even after a clean smartphone scan, especially before OCR.
The documents that need OCR after scanning are almost exactly the documents you would least like to upload to a third-party server: bank statements with account numbers and transaction history, sale deeds with PAN and Aadhaar references, divorce decrees, court orders, GST returns with revenue figures, property tax bills, school transcripts, employment letters with salary breakdowns. Sending those through an upload-based deskew or OCR service hands the data to whoever runs the service and whoever runs their cloud provider. Browser-local deskew sidesteps that whole class of risk.
How Hough-line skew detection works
The classical text-document deskew pipeline has been the same since the late 1990s. The implementation in our browser tool uses OpenCV.js (a WebAssembly build of the OpenCV computer vision library) and runs in roughly seven steps per page:
- Render the page to a canvas. pdf.js rasterises each page at 1.5× device-pixel-ratio scale. High enough to expose text-edge detail without exhausting browser memory on a 30-page document.
- Convert to grayscale and binarise. Single-channel grayscale, then an adaptive threshold (15-pixel block, mean adjustment 10) so uneven scan lighting does not wash out the text. Binarisation matters because Hough transforms work on binary edge maps, not full color images.
- Invert. Hough transforms find lines on bright pixels. Text on a binarised scan is dark on white, so we invert before running edge detection.
- Canny edge detection. Standard 50/150 thresholds. The output is an edge map — the boundary of every letter and every horizontal text underline.
- Probabilistic Hough lines. The OpenCV call is
cv.HoughLinesPwith rho=1, theta=π/180, threshold=100, minLineLength=100, maxGap=10. This finds straight line segments at all angles and returns each as a (x1, y1, x2, y2) tuple. - Filter to text-line range. For each detected line,
atan2(dy, dx)gives the angle. We keep only lines in [-15°, 15°] — the realistic range for text-line skew. Vertical lines (table rules, page borders), steep diagonals (underlines that cross at angles, signatures), and decorative elements get rejected. - Take the median. The median of the remaining angles is more robust than the mean against outliers. Even in tables and forms, the dominant horizontal direction usually wins. Median filtering is the standard choice in document-processing pipelines for this exact reason.
The detected median is reported per-page in the UI. You can override individual pages by unchecking them, or raise the threshold to skip near-zero detections that are likely noise. For pages with unusual content (large signatures, half-blank pages, decorative borders), the detector may report a wrong angle — uncheck those pages and they pass through unrotated.
The vector-to-raster trade-off
pdf-lib — the JavaScript library we use to write the output PDF — supports page rotation only at 90, 180, or 270 degrees via its setRotation() API. PDFs encode arbitrary-angle rotation as a transformation matrix on the page's content stream, and rewriting that matrix in pure JavaScript without breaking text positioning, font references, and annotation coordinates is non-trivial work that no in-browser library handles correctly today.
So the workable browser path is: render the page to a canvas, rotate the canvas using OpenCV's affine warp, embed the rotated canvas as a JPEG in the output PDF. The cost is rasterisation. Vector text becomes pixels. For scanned PDFs (where the input is already raster) this is invisible — you cannot lose vector quality you never had. For born-digital vector PDFs that happen to be slightly skewed, the deskewed output loses selectable text and crisp vector graphics.
That trade-off is usually fine for the actual use case. Born-digital PDFs are rarely skewed in the first place. They came out of Word, LaTeX, or a reporting tool with the text axis already aligned to the page. The PDFs that need deskewing are scanned ones, where the source is already a raster image. If your input is a vector PDF and you genuinely need arbitrary-angle rotation while preserving vector content, the right tool is Ghostscript on the desktop with a custom PostScript snippet — not a browser-only tool.
We embed the rasterised pages at JPEG quality 0.92 and at 2× device-pixel-ratio render scale — high enough to keep text crisp at normal viewing zoom on a Retina display. The output file is bigger than the input (a 5 MB scanned PDF might come out at 8 to 12 MB) because re-encoded JPEGs do not always hit the same compression as the original scanner output. If file size matters for an email or upload limit, run the deskewed file through /compress afterward.
How to deskew PDF online free in your browser
Four steps. The first run includes a one-time 8 MB OpenCV.js download, so plan for 5 to 10 seconds before processing actually starts. Subsequent runs in the same browser tab are instant.
- Open /deskew-pdf. The page loads its baseline JavaScript bundle. Drag the scanned PDF onto the drop zone, or click and pick the file. The PDF appears in the page list with the page count and file size.
- Click Detect skew. The page lazy-loads opencv.js (one-time fetch from docs.opencv.org, about 8 MB) and runs the seven-step Hough-line pipeline on every page. For a 10-page scanned bank statement, total processing time is usually 5 to 15 seconds. The page list updates with the detected angle for each page.
- Review per-page angles. Each page shows its detected angle and a checkbox. Pages with detection below the threshold (default 0.5°) are unchecked by default — they are too close to zero to bother rotating. Uncheck any page you do not want corrected, or raise the threshold to skip more near-zero angles.
- Click Apply deskew & download. Each checked page is re-rendered, rotated by the negation of its detected angle, and embedded as a JPEG in the output PDF. Unchecked pages pass through unchanged, preserving any vector content. The output downloads as
deskewed-document.pdf.
For password-protected bank statements (most Indian banks ship statements with a DDMM-PAN or similar password), use /unlock-pdf first to remove the password locally before deskew. The deskew tool cannot read encrypted page bytes; the unlock step is mandatory for processing.
India-specific scenarios — bank statements, court orders, property
Five concrete document categories where deskew matters before any downstream processing:
1. SBI, HDFC, Axis bank statement scans
Bank statements ship as native PDFs in most cases — straight export from the bank's core banking system, no skew, no need to deskew. The skewed-statement scenario is when a customer prints the statement, the printer feeds slightly off-axis, and the customer rescans the printout for an OCR-based reconciliation pipeline (matching transactions to ledger entries). The native digital export is always cleaner — get the password from the bank email, unlock it, and use the digital PDF directly. Deskew applies only to the printed-then-rescanned copy.
2. Property registration documents (sale deeds, mutation applications)
Sale deeds in India are typically registered as paper originals and scanned for any digital workflow afterward. The Sub-Registrar office scan quality varies wildly — some offices use proper flatbed scanners with auto-deskew, others use mobile-phone cameras held over the document by an assistant. The latter produces 3 to 8 degrees of skew. For property due diligence (title verification, encumbrance search), the deskewed scan feeds an OCR pipeline that extracts the parties' names, survey numbers, registration dates, and consideration amounts. Without deskew, the OCR misses survey numbers consistently — the long digit strings break across two pseudo-rows.
3. Court orders and case papers
District court orders are often hand-scanned at the court registry. The scan quality is usually adequate for human reading but not for OCR-based search across a case bundle. For a litigant working on an appeal or a cross-petition, being able to grep the case papers for a specific finding or citation matters. Deskew first, then run /ocr-pdf with both English and Hindi traineddata for the bilingual documents that show up in many state High Court proceedings.
4. GST filing scans (invoices, refund supporting documents)
For GST refund applications, supporting documents like shipping bills, BRC copies, and bank realisation certificates are uploaded as scanned PDFs. The GSTN portal accepts files up to 5 MB. Skewed scans not only fail OCR if you need to extract data later — they also waste pixels on the white margins outside the actual content, which inflates the file size. A deskewed scan is more compressible, which helps when you are butting against the 5 MB cap.
5. School and college transcripts for visa or employment
Education board transcripts, university mark sheets, and degree certificates are typically scanned at private Xerox shops on mid-range ADF scanners. The skew is consistent at 1 to 2 degrees per page. For employer verification systems and visa attestation portals (FRRO, US embassy DS-160 supporting documents), the validator software runs OCR on the scan to cross-check names and grades against the form. Deskew before uploading.
The pre-OCR pipeline: deskew → OCR → search
The full document-processing pipeline for an Indian SMB or professional working with scanned files looks like this:
- Capture. Either a fresh /scan-to-pdf session with the phone camera, or a flatbed/ADF scan from an office multifunction printer.
- Unlock if encrypted. /unlock-pdf for password-protected bank statements (most Indian banks ship statements encrypted with DDMM-PAN or similar date-based passwords).
- Deskew. This step. Hough-line detection, per-page review, output PDF with rotated raster pages.
- OCR. /ocr-pdf runs Tesseract.js with both English and Hindi traineddata and embeds the recognised text as an invisible layer on top of the page image. The output PDF is searchable — you can copy text, grep the file from the command line with
pdfgrep, or feed it to a RAG pipeline. - Compress if needed. /compress brings the file under email or portal upload caps without breaking the OCR layer.
- Submit. Upload to GSTN, MCA21, the relevant court e-filing portal, or whatever the destination is. Or just save it as a searchable archive.
Every step in that pipeline runs in your browser at pdfmavericks.com. The scanned bank statement, the property sale deed, the court order, the GST refund supporting document — none of them touch a third-party server. The only outbound network calls are the initial fetch of opencv.js and the pdf.js worker, both code coming down to your browser, not data going up.
Your scans never leave your browser
PDF Mavericks runs the deskew detection and rotation entirely in your browser using WebAssembly. No file is uploaded to any server. Bank statements, court orders, property documents, and salary slips stay on your device where they belong.
Frequently asked questions
Why is my bank statement skewed if it came from a scanner?
Auto-document-feeder (ADF) scanners pull paper through rollers, and the mechanical reality is that paper enters the rollers slightly off-axis more than 80 percent of the time. The skew that results is small — usually 0.5 to 3 degrees — but it is persistent across batches and visible to the eye on dense pages. Flatbed scanners avoid the issue if you align the page carefully against the corner; ADF scanners are convenient for multi-page documents but always introduce some skew. Smartphone camera captures (without an explicit scanner app) skew worse, often 5 to 10 degrees, because the user is hand-holding the phone over the document.
What is the right skew threshold for the deskew pdf online free tool?
The default 0.5 degrees works for most scanned documents and is the value we ship. Real ADF skew typically shows up at 0.5 to 3 degrees. Anything below 0.5 degrees is usually noise from the Hough transform — the algorithm finds spurious lines in halftone patterns, header logos, and signature areas. Setting the threshold below 0.5 tends to over-correct and rotate clean pages by accident, which costs vector quality without buying meaningful OCR improvement. If your file came from a smartphone camera (not a flatbed scanner), bump the threshold to 1 degree because phone scans have more random skew variance.
Why do I lose vector quality after deskewing?
pdf-lib (the in-browser library we use to write the output PDF) supports rotation only at 90, 180, or 270 degrees at the page level. Arbitrary-angle rotation must happen on the rendered image, not the PDF page object. So the pipeline is: render the page to a canvas via pdf.js, rotate the canvas using OpenCV's affine warp, embed the rotated canvas back as a JPEG in the output PDF. If the input was a scanned PDF (already raster), nothing visible is lost. If the input was a born-digital vector PDF (text + vector graphics) that happened to be slightly skewed, the deskewed output becomes a JPEG of that page — selectable text and crisp vector lines turn into pixels. Most PDFs that need deskewing are scans, so this trade-off is rarely visible in practice.
Can I deskew a single page only?
Yes. After running Detect skew, the page list shows the per-page detected angle and a checkbox for each page. Uncheck any page you do not want corrected — the checked pages are rotated, and the unchecked pages pass through unchanged. This is useful when most of the PDF is straight but a few pages came out tilted, or when the detector picked up a wrong angle on a page with unusual content (large signatures, decorative borders, half-blank pages). The output PDF preserves page order whether each individual page was deskewed or not.
What about handwritten documents — does the deskew detector work on those?
The Hough-line transform looks for straight horizontal text lines. Handwritten content rarely has those — handwriting baselines drift, characters slope, the ruled lines on the underlying paper are usually the strongest signal the detector picks up. So on a handwritten document, the detector usually finds the angle of the ruled paper itself rather than the actual writing. That is fine if you want the page edges to be horizontal — the ruled lines and the paper edges typically agree on the skew angle. For pure unruled handwritten content, the detector may report low confidence and skip the page; you can manually rotate the page in another tool if needed.
Why do some PDFs already have skew baked into the page rotation flag?
PDFs have a /Rotate entry in each page dictionary that takes 0, 90, 180, or 270 degrees. Some scanner software writes a 90-degree rotation flag instead of physically rotating the page — the page bytes are sideways but the viewer flips them on render. Adobe Reader and most modern viewers honor the flag transparently, so the page looks fine. But the page bytes themselves are still sideways, which confuses pdf.js and our deskew detector if we read the raw byte stream. We respect the /Rotate flag during rendering, so the detector sees the page as you see it on screen. If your PDF is sideways in your viewer (page rotation flag is 0 but the content is sideways), use a separate rotate tool first to fix the underlying orientation.
Does the deskew pdf online free tool upload my bank statements?
No. The PDF is read into your browser's memory, processed by JavaScript and WebAssembly running on this page, and offered as a download. It never crosses a network. Verify yourself — open browser dev tools, switch to the Network tab, run a deskew, and confirm there are zero PDF requests. The only network calls are a one-time fetch of opencv.js (about 8 MB) from docs.opencv.org and the pdf.js worker from a CDN. After those fetches, your PDF stays on your device. Bank statements with account numbers, transaction history, salary breakdowns, sale deeds with PAN and Aadhaar references, court orders — none of it touches our servers.
Will small detected angles like 0.05 or 0.1 degrees actually improve OCR?
No. OCR engines including Tesseract tolerate up to about 0.5 degrees of skew without measurable accuracy loss. Below that, the Hough transform itself has too much measurement noise to be reliable — what looks like 0.1 degree of detected skew is often an artifact of which edge pixels happened to round which way. The 0.5 degree threshold default exists for this reason. If you are chasing perfection on a clean scan, you are optimising past where it matters; spend the effort on the actual OCR step (using /ocr-pdf with both English and Hindi traineddata for India docs) instead of micro-tuning a deskew that adds no real recognition gain.