
PDF to PDF/A Converter Online: An Honest India Guide

PDF to PDF/A converter online — why browser-only conversion is impossible, what to do for MCA, GST, court e-filing instead. Free pre-flight checker, no upload.

PDF Mavericks

PDF/A conversion in your browser is impossible. We do not try. We built a free pre-flight checker that tells you exactly what would prevent your PDF from being PDF/A-compliant — so you can fix it locally with Ghostscript or Acrobat Pro before submitting to the MCA21 portal, the GSTN attachment slot, or a High Court e-filing system. Search results for "pdf to pdf/a converter online" return dozens of tools that claim to convert in the browser. Most ship output that fails veraPDF, the canonical ISO 19005 validator. For Indian regulatory filings, that is a trust-killing failure mode we refuse to ship.

This guide is the honest version of the search query. Section 1 explains why browser-only PDF/A conversion does not work. Section 2 explains when PDF/A actually matters in India (MCA21 annual returns, GSTN attachments, court e-filing), with citations. Section 3 walks through what ISO 19005-1 conformance level B (PDF/A-1b) requires, as a ten-point summary. Sections 4 through 6 cover the pre-flight checker, the real conversion tools (Ghostscript, Acrobat Pro, LibreOffice), and the end-to-end workflow.

Why this is a checker, not a converter

A real PDF/A-1b converter has to do four things that pdf-lib — the JavaScript PDF library that runs in the browser — cannot do well. The four are:

  1. Embed a complete ICC color profile. ISO 19005-1 §6.2.3 requires that any document using DeviceRGB, DeviceCMYK, or DeviceGray declare an OutputIntent dictionary pointing at an embedded ICC profile (typically sRGB IEC61966-2.1, a binary blob of about 3 KB). pdf-lib has no built-in API to embed an ICC stream as a /DestOutputProfile reference. Writing the dictionary by hand is possible but the resulting structure rarely passes validation on the first try.
  2. Embed every font, subsetting where appropriate. §6.3.4 mandates that every font referenced in the document have its program embedded in the PDF. PDF/A-1 permits subset embedding (§6.3.5) rather than requiring it, but subsetting is common practice because it keeps files small and eases licensing concerns. pdf-lib can embed a font you supply, but it cannot extract a font already referenced in an existing PDF, subset it to the glyphs used, and re-embed the subset. That is a font-engineering problem handled by dedicated tooling such as HarfBuzz and FreeType. It is not a pdf-lib feature, and it is unlikely to become a browser-only feature any time soon.
  3. Write a valid XMP packet with the pdfaid namespace. §6.7 mandates an XMP metadata stream in the document catalog. §6.7.11 mandates the packet include the pdfaid namespace (xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/") declaring the conformance part and level. pdf-lib has no XMP authoring API — you would write the XML by hand, including the rdf:Description machinery, byte order marks, and the trailing whitespace padding the spec recommends. Possible, but error-prone in a way that breaks validators.
  4. Flatten transparency without rasterising. §6.4 forbids transparency. If the source PDF uses transparency (modern Office exports often do), every transparent region needs to be flattened — split into opaque regions that approximate the visual result. Done correctly, this is what Adobe calls the Flattener Library; done badly, the document becomes a giant rasterised image that loses text searchability and fails PDF/A-1a tagged-content compliance. pdf-lib cannot flatten transparency at all.
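To make point 3 concrete, here is a minimal sketch of the identification part of an XMP packet and a reader for it. Python is used purely for illustration; it is not how pdf-lib or the pre-flight checker is implemented, and the function names build_pdfa_xmp and read_pdfaid are hypothetical. The packet covers only the pdfaid declaration. A conformant file also needs its Info dictionary entries mirrored in XMP and the packet attached as an unfiltered /Metadata stream, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_pdfa_xmp(part=1, conformance="B", padding_lines=20):
    """Return a minimal XMP packet declaring a PDF/A part and level."""
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    body = (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'  <rdf:Description rdf:about="" xmlns:pdfaid="{PDFAID_NS}">\n'
        f'   <pdfaid:part>{part}</pdfaid:part>\n'
        f'   <pdfaid:conformance>{conformance}</pdfaid:conformance>\n'
        '  </rdf:Description>\n'
        ' </rdf:RDF>\n'
        '</x:xmpmeta>\n'
    )
    # Whitespace padding lets other tools edit the packet in place.
    padding = (" " * 100 + "\n") * padding_lines
    trailer = '<?xpacket end="w"?>'
    return (header + body + padding + trailer).encode("utf-8")


def read_pdfaid(packet):
    """Extract (part, conformance) from a packet written in element form.

    Packets that store pdfaid values as attributes on rdf:Description
    are not handled by this sketch.
    """
    start = packet.index(b"<x:xmpmeta")
    end = packet.index(b"</x:xmpmeta>") + len(b"</x:xmpmeta>")
    root = ET.fromstring(packet[start:end])
    part = root.find(f".//{{{PDFAID_NS}}}part")
    level = root.find(f".//{{{PDFAID_NS}}}conformance")
    return part.text, level.text
```

Writing these few lines is the easy part, which is exactly why tools that only set this flag produce files that look converted and still fail validation on the other three requirements.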

Browser-based competitors that claim to convert to PDF/A typically skip all four problems and just set a metadata flag. The output renders fine in Adobe Reader. The marketing copy is satisfied. The file fails veraPDF. The regulator portal rejects it. The CS or CA who used the tool blames the portal, not the tool, because the tool said it converted successfully.

For a mass-market consumer audience, that gap might not matter. For an India regulatory audience filing annual returns, GST refunds, or court submissions, it matters a lot. The cost of a failed validation is concrete: a missed deadline, a paid extension, or a paper-print fallback at 11 PM. We built the /pdf-to-pdfa-precheck tool as a checker rather than a converter because the honest path is to tell you exactly what would fail before you waste another upload attempt.

MCA21, GSTN, and court e-filing — when PDF/A actually matters

Three regulatory contexts in India routinely demand PDF/A-conformant output. Each has its own validation behaviour and its own cost of rejection.

Ministry of Corporate Affairs (MCA21)

The MCA21 portal accepts annual returns (Form AOC-4, Form MGT-7), board resolutions attached to e-forms, and director KYC submissions (Form DIR-3 KYC). Several form types require PDF/A and the upload screen rejects non-conformant files with a generic "file not in PDF/A format" error. The MCA general circulars on e-filing reference ISO 19005 directly. The attachment validator runs at upload time, so you find out immediately — but a CS preparing an annual return on the day of the deadline does not have time for two or three rounds of re-conversion. Searching for "pdf to pdf/a converter online" near 5 PM on the last day is the most common reason this guide gets read.

GST Network (gst.gov.in)

GSTN attachment slots — refund claim supporting documents, advance ruling applications, notice replies, audit replies — sometimes require PDF/A archival format. The portal validation is less strict than MCA21 but still rejects encrypted PDFs, files containing JavaScript or embedded multimedia, and files larger than 5 MB. For invoice attachments under refund claims, the file format requirement is documented in the GST Refund Manual; non-conformant uploads stall the application. The common failure mode is encryption — Indian banks ship statements password-protected, and CAs sometimes attach the encrypted file directly. Run such files through /unlock-pdf first to remove the password before any PDF/A pre-check.

High Court and e-Courts e-filing

The e-Courts project and several individual High Court e-filing portals (Delhi HC, Bombay HC, Karnataka HC, Madras HC) accept case documents in PDF/A. The Delhi HC e-filing rules document explicitly references PDF/A-1b. Validation rules vary by court: some portals run veraPDF server-side, others do a lighter check. When a filing is due the same day, a rejected non-conformant PDF is the reason a litigant ends up paying a printer to certify a paper copy at midnight. In ongoing matters, an advocate or CS preparing affidavits, plaints, and replies routinely encounters PDF/A as the required format.

For all three contexts, the cost of a failed validation is concrete and time-sensitive. That is the audience this guide is written for. The pre-check tells you the truth in 5 to 15 seconds, in your browser, without ever uploading the file.

What ISO 19005-1b actually requires

ISO 19005-1:2005 defines PDF/A-1, the original archival profile of PDF. It has two conformance levels: A (accessible — adds structure tagging requirements) and B (basic — visual fidelity only). For Indian regulatory submissions, level B is the common target because the structural-tagging requirements of level A are harder to meet on documents originally authored in Word, LaTeX, or accounting software without explicit tag output.

The ten-point summary of conformance level B:

  1. No encryption. §6.1.3 forbids the /Encrypt dictionary. Password-protected PDFs are out by definition.
  2. No JavaScript or Launch actions. §6.6.1 and §6.6.2 forbid these action types (along with Sound, Movie, ResetForm, and ImportData actions) in any annotation, the catalog OpenAction, or any name tree. Ordinary URI link actions are permitted. PDF/A is for archival; executable content is the opposite of archival.
  3. No embedded files or attachments. §6.1.11 forbids the /EmbeddedFiles name tree and embedded-file entries in /Filespec dictionaries, and §6.5.2 forbids FileAttachment annotations. The PDF must be self-contained.
  4. No transparency. §6.4 forbids /SMask soft masks and any /CA or /ca alpha value less than 1.0 in any extended graphics state. This is the hardest clause to fix automatically because flattening transparency well is a non-trivial graphics problem.
  5. XMP metadata stream required. §6.7 mandates a /Metadata reference in the document catalog pointing at an XMP packet.
  6. PDF/A identifier in XMP. §6.7.11 mandates pdfaid:part and pdfaid:conformance elements in the XMP packet, declaring the part and the conformance level. For PDF/A-1b these are 1 and B; the other part numbers (2, 3) and level U belong to later parts of the standard.
  7. All fonts embedded. §6.3.4 requires every font program be embedded in the PDF; subset embedding is permitted. Type3 fonts must have charprocs. Composite fonts must have their CIDFont program embedded.
  8. OutputIntent with embedded ICC profile. §6.2.3 and §6.2.4 require an /OutputIntents array with at least one entry whose /DestOutputProfile points at an embedded ICC profile stream.
  9. No multimedia annotations. §6.5.2 forbids Movie and Sound annotation subtypes, and 3D annotations are excluded because PDF/A-1 permits only annotation types defined in PDF 1.4.
  10. AcroForm fields must have appearance streams. §6.9 requires that interactive form fields have their appearance streams baked in. The /NeedAppearances flag must be either absent or false. Form fields without appearance streams may render differently in different viewers, which violates the visual-fidelity guarantee.
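Several of the ten points reduce to "this name must not appear" or "this name must appear", which can be approximated with a raw byte scan. The following is a minimal Python sketch under stated assumptions: the check names and the scan_tokens function are hypothetical, and the scan is a heuristic rather than a parser. It cannot see objects stored inside compressed object streams, and it does not handle escaped names, so a PASS here is only a first indication.

```python
import re

# Name tokens whose presence suggests a violation.
FORBIDDEN = {
    "encryption": rb"/Encrypt\b",
    "javascript": rb"/(?:JavaScript|JS)\b",
    "launch_action": rb"/Launch\b",
    "embedded_files": rb"/EmbeddedFiles\b",
    # "/SMask /None" is allowed; any other /SMask value is a soft mask.
    "soft_mask": rb"/SMask\b(?!\s*/None)",
    "multimedia": rb"/Subtype\s*/(?:Movie|Sound|3D)\b",
}

# Name tokens that must be present somewhere in the file.
REQUIRED = {
    "xmp_metadata": rb"/Metadata\b",
    "output_intent": rb"/OutputIntents\b",
}


def scan_tokens(pdf_bytes):
    """Return {check_name: 'PASS' | 'FAIL'} from a raw byte scan."""
    results = {}
    for name, pattern in FORBIDDEN.items():
        results[name] = "FAIL" if re.search(pattern, pdf_bytes) else "PASS"
    for name, pattern in REQUIRED.items():
        results[name] = "PASS" if re.search(pattern, pdf_bytes) else "FAIL"
    return results
```

Font embedding and form appearance streams are deliberately left out of this sketch because they need real object traversal, not pattern matching.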

The full standard is ISO 19005-1:2005 (purchase required). The PDF Association maintains a free primer at pdfa.org that covers the practical implications of each clause without the ISO membership barrier.

What the pre-flight checker does

The /pdf-to-pdfa-precheck tool runs the ten checks above against your PDF, in your browser, and reports PASS, WARN, or FAIL per check with the exact remediation tool for each failure. The implementation uses a hybrid approach: pdf-lib parses the catalog and page tree for structural inspection, while raw byte-stream regex scanning catches objects and annotations that pdf-lib does not expose directly. Total runtime is 5 to 15 seconds depending on the document size.
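As an illustration of the byte-stream side of that hybrid approach, the sketch below looks for /CA or /ca alpha values below 1.0 and records where each one sits in the file. It is written in Python for readability and is not the tool's JavaScript implementation; the function name find_alpha_below_one and the shape of the returned records are assumptions. Like any raw scan, it only sees uncompressed dictionaries.

```python
import re

# Matches "/CA 0.5", "/ca .3", "/CA 1" and similar numeric alpha entries.
ALPHA = re.compile(rb"/(CA|ca)\s+([0-9]*\.?[0-9]+)")


def find_alpha_below_one(pdf_bytes):
    """Return one record per alpha value below 1.0, with its byte offset."""
    findings = []
    for match in ALPHA.finditer(pdf_bytes):
        value = float(match.group(2).decode("ascii"))
        if value < 1.0:
            findings.append({
                "key": match.group(1).decode("ascii"),
                "value": value,
                "offset": match.start(),  # position of the "/" in the file
            })
    return findings
```

An empty list corresponds to PASS for this one sub-check; any entry would be reported as FAIL together with its offset.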

Each FAIL row in the output expands to show what we found and the exact tool to fix it. Examples:

  • FAIL on encryption → /unlock-pdf to remove the password locally, then re-run the pre-check.
  • FAIL on AcroForm appearances → /flatten-pdf to bake form fields into the page content stream. After flattening, the form is no longer interactive but the visual layout is preserved and PDF/A clauses pass.
  • FAIL on missing OutputIntent or ICC profile → use Ghostscript with the PDF/A flag to add an sRGB OutputIntent. The exact command is in the next section.
  • FAIL on transparency or font embedding → these are not fixable by single-tool tweaks. The honest answer is to re-export the source from Word, LaTeX, or LibreOffice with PDF/A enabled, or run the existing PDF through Ghostscript or Acrobat Pro.
  • FAIL on embedded files or JavaScript → use the /redact-pdf workflow to remove the offending objects, or re-export from the source if the embedded content is part of the original document layout.

The checker also exports a JSON report with each finding, byte offset, and recommended fix — useful for CI gates that run on generated PDFs, or for compliance evidence packages where the pre-check output is part of the audit trail.
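The exact schema of the exported report is not reproduced here, so the following Python sketch uses assumed field names (check, status, offset, fix) and hypothetical helper names to show what such a report and a CI gate over it could look like. The fix strings paraphrase the FAIL examples listed above.

```python
import json

# Suggested fixes, paraphrased from the FAIL examples in this section.
REMEDIATION = {
    "encryption": "/unlock-pdf, then re-run the pre-check",
    "acroform_appearances": "/flatten-pdf",
    "output_intent": "Ghostscript with the PDF/A flag",
    "transparency": "re-export from source with PDF/A enabled, or Ghostscript / Acrobat Pro",
    "font_embedding": "re-export from source with PDF/A enabled, or Ghostscript / Acrobat Pro",
    "embedded_files": "/redact-pdf workflow, or re-export from source",
    "javascript": "/redact-pdf workflow, or re-export from source",
}


def build_report(filename, findings):
    """Serialise per-check findings into a JSON report string."""
    rows = []
    for finding in findings:
        row = {
            "check": finding["check"],
            "status": finding["status"],
            "offset": finding.get("offset"),  # None when not applicable
        }
        if finding["status"] == "FAIL":
            row["fix"] = REMEDIATION.get(finding["check"])
        rows.append(row)
    return json.dumps({"file": filename, "results": rows}, indent=2)


def ci_exit_code(report_json):
    """Return 1 if any check failed, else 0, for use as a CI gate."""
    results = json.loads(report_json)["results"]
    return 1 if any(r["status"] == "FAIL" for r in results) else 0
```

A CI job would call ci_exit_code on the saved report and pass the result to sys.exit, so that a generated PDF with any FAIL blocks the pipeline.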

Tools that actually produce valid PDF/A-1b

If the pre-check shows FAIL on your file, three tools produce veraPDF-clean output today. None is browser-only.

1. Ghostscript (free, open source)

One command, runs locally, no upload:

gs -dPDFA=1 -dPDFACompatibilityPolicy=1 \
   -sColorConversionStrategy=RGB \
   -sOutputICCProfile=sRGB.icc \
   -sDEVICE=pdfwrite \
   -o output.pdf input.pdf

sRGB.icc ships with most Ghostscript installs in the iccprofiles/ directory. On macOS install with brew install ghostscript. On Ubuntu use apt install ghostscript. On Windows download from ghostscript.com/releases. The output passes veraPDF on most office documents; the failure modes are usually edge cases with non-Latin scripts (font character encoding completeness) or content streams inside Form XObjects. If veraPDF reports a missing OutputIntent on the result, the Ghostscript documentation describes adding its PDFA_def.ps prolog file, edited to point at your ICC profile, to the command line.
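For firms that convert many files, the command above can be wrapped in a small script. This Python sketch simply rebuilds the same argument list; the function names are hypothetical, and it assumes the executable is called gs and is on PATH (on Windows the console binary is typically gswin64c, so adjust the first element). Passing arguments as a list avoids shell-quoting problems with file names that contain spaces.

```python
import shutil
import subprocess


def ghostscript_pdfa_args(src, dst, icc_profile="sRGB.icc"):
    """Build the argument list for the PDF/A-1b command shown above."""
    return [
        "gs",
        "-dPDFA=1",
        "-dPDFACompatibilityPolicy=1",
        "-sColorConversionStrategy=RGB",
        f"-sOutputICCProfile={icc_profile}",
        "-sDEVICE=pdfwrite",
        "-o", dst,
        src,
    ]


def convert_to_pdfa(src, dst):
    """Run Ghostscript locally; raises if gs is missing or exits non-zero."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(ghostscript_pdfa_args(src, dst), check=True)
```

A successful exit code only means Ghostscript finished; the output still needs the pre-check and veraPDF before it is submitted.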

2. Adobe Acrobat Pro (paid, ~USD 14.99/month)

File → Save As Other → PDF/A. Acrobat Pro is the most reliable conversion path because the same vendor wrote the original PDF specification and the reference implementation. The internal handling of ICC profiles, font subsetting, XMP authoring, and transparency flattening is the most complete in the market. The subscription cost is meaningful for individual filers; for CS firms and CA practices it is usually already in the budget.

3. LibreOffice (free, open source)

File → Export As → Export As PDF → General tab → check the PDF/A archive option. Older releases label it PDF/A-1a; newer releases offer a version list, where PDF/A-1b is the one that matches this guide. Works only when the source document is editable in LibreOffice (.docx, .odt, .xlsx, .pptx). The output is generally clean but has occasional font-embedding gaps for less common scripts, so run the pre-check on the result before submitting. For documents that originated in Word or Google Docs, exporting to PDF/A directly from the source application is usually cleaner than exporting to PDF and converting after.

The full workflow: precheck → fix → re-precheck → submit

For a CS preparing an annual return at 4 PM with a 5 PM deadline, the practical workflow is:

  1. Run the pre-check first. Drop the assembled PDF into /pdf-to-pdfa-precheck. In 5 to 15 seconds you get a per-clause PASS / WARN / FAIL report. If everything is PASS, skip to step 4.
  2. Fix each FAIL with the recommended tool. Encryption → unlock-pdf. AcroForm appearances → flatten-pdf. Missing OutputIntent → run the Ghostscript command above. Transparency → re-export from source with PDF/A enabled, or run through Acrobat Pro / Ghostscript. Each fix is local, with no upload.
  3. Re-run the pre-check. Confirm every clause now reports PASS. WARN findings are usually safe to submit but worth flagging in your filing notes if the document is large or unusual.
  4. Validate with veraPDF before the deadline. For high-stakes filings, the final gate is veraPDF itself. Download from verapdf.org/software, run java -jar verapdf-greenfield-X.Y.Z.jar --format text yourfile.pdf. If both the pre-check and veraPDF pass, the file should clear the regulator portal's format validation.
  5. Submit. Upload to MCA21, GSTN, or the relevant e-filing portal. Keep the JSON pre-check report and the veraPDF output in your records — useful evidence if the file is challenged later or if there is a dispute about upload time.
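The branching in steps 1 to 4 can be expressed as a small decision function, which is handy when the pre-check runs inside a script rather than by hand. This is a sketch in Python; the function name next_step and the step labels are hypothetical, and the input is assumed to be a mapping from check name to PASS, WARN, or FAIL.

```python
def next_step(statuses):
    """Decide what to do after a pre-check run.

    Returns (step, check_names): which step of the workflow comes next
    and which checks triggered that decision.
    """
    failing = sorted(name for name, s in statuses.items() if s == "FAIL")
    warnings = sorted(name for name, s in statuses.items() if s == "WARN")
    if failing:
        # Steps 2 and 3: fix each failure, then re-run the pre-check.
        return "fix-and-recheck", failing
    if warnings:
        # Step 4, but record the warnings in the filing notes.
        return "validate-with-verapdf-and-note-warnings", warnings
    # Everything passed: go straight to step 4.
    return "validate-with-verapdf", []
```

For example, a run with one FAIL on encryption returns ("fix-and-recheck", ["encryption"]), which maps to unlocking the file and running the pre-check again.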

Your PDF stays in your browser

The pre-flight checker reads your PDF into the page's memory, runs the ten clause checks with JavaScript and byte-pattern scanning, and renders the report locally. No upload — verify it yourself in the DevTools Network tab. PDFs that get pre-checked are disproportionately sensitive (annual returns, board minutes, court submissions); browser-local is the only honest posture.

Frequently asked questions

Can a browser tool truly convert PDF to PDF/A?

No, not in a way that passes veraPDF. A real PDF/A-1b converter has to embed a complete ICC color profile (typically sRGB IEC61966-2.1, a binary blob of about 3 KB), embed every font (ideally subset to the glyphs actually used), write a valid XMP packet declaring the pdfaid namespace and conformance level, and flatten any transparency without rasterising the page. pdf-lib, the JavaScript PDF library that runs in the browser, cannot do any of those four things correctly on an existing PDF. Anyone shipping a browser-only PDF/A converter is producing files that render fine and fail strict validation. We would rather build a checker that tells you the truth than a converter that lies.

Why did you build a checker instead of a converter?

For an India regulatory audience submitting to MCA21, the GSTN portal, or High Court e-filing, shipping a fake converter is a trust-killing event. The cost of a non-conformant submission near a deadline is concrete: a missed filing, a paid extension, or a paper-print fallback at midnight. A pre-flight checker that runs byte-level checks for the most common failure modes veraPDF reports, surfaces every issue it finds, and recommends a real fix tool for each one is more useful than a converter whose output looks fine and fails downstream. The pre-check runs in your browser in about 5 to 15 seconds; veraPDF takes one command and is free.

What is the relationship between this checker and veraPDF?

veraPDF (verapdf.org) is the canonical open-source PDF/A validator, developed by a consortium led by the Open Preservation Foundation and the PDF Association with funding from the EU PREFORMA project. It is the reference implementation that other PDF/A claims are measured against. Many archive systems and government portals use veraPDF or libraries that embed veraPDF rules to validate submissions. Our checker is a faster first pass: it covers the common 80 percent of failure modes, runs in your browser without a Java install, and finishes in seconds. veraPDF is the definitive verdict before you submit. Use the pre-check to catch issues fast; use veraPDF as the final gate.

Will MCA21 or GSTN really reject a non-compliant PDF?

Yes, and the rejection mechanism varies by portal. MCA21 uses a portal-side validator that returns a generic 'file not in PDF/A format' error and refuses to attach the file to the form — your filing cannot proceed until you upload a conformant PDF. GSTN is less strict on most attachment slots but still rejects encrypted PDFs, files with embedded multimedia, and files larger than 5 MB. High Court e-filing portals vary by court — Delhi HC explicitly references PDF/A-1b in its e-filing rules document; Bombay and Karnataka HCs have similar requirements with different cut-offs. The common pattern is rejection at upload, not later, so you can fix and resubmit if you have time before the deadline.

How do I validate locally with veraPDF?

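Download the veraPDF Greenfield release from verapdf.org/software (it is a ZIP containing a Java runner, ~80 MB). Extract it, then run java -jar verapdf-greenfield-X.Y.Z.jar --format text yourfile.pdf from the terminal, replacing X.Y.Z with the version number. If your download is the installer package instead, run the installer and use the verapdf launcher script it creates with the same options; adding --flavour 1b pins the check to PDF/A-1b. The output is a list of clauses checked, with PASS or FAIL per clause. For scripting, --format mrr produces veraPDF's machine-readable report, which is XML; run the tool with --help to see which formats your release supports. Java 11 or newer is required. On macOS install via brew install --cask temurin; on Ubuntu apt install default-jre; on Windows download from adoptium.net.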

What about Adobe Acrobat Pro for PDF/A conversion?

Adobe Acrobat Pro is the most reliable conversion path because the same vendor wrote the original PDF specification and the reference implementation. Open the PDF in Acrobat Pro, choose File → Save As Other → PDF/A, pick PDF/A-1b in the dialog, and save. Acrobat handles ICC profile embedding, font subsetting, XMP authoring, and transparency flattening internally. The subscription is around USD 14.99 per month at the time of writing. For a one-off filing this is sometimes cheaper than the cost of a missed deadline. For ongoing CS or CA workflow, the cost adds up — Ghostscript is the free alternative with comparable output quality on most documents.

What does the Ghostscript command actually do?

The command gs -dPDFA=1 -dPDFACompatibilityPolicy=1 -sColorConversionStrategy=RGB -sOutputICCProfile=sRGB.icc -sDEVICE=pdfwrite -o out.pdf in.pdf invokes Ghostscript's pdfwrite device with PDF/A mode set to part 1, conformance B. -dPDFACompatibilityPolicy=1 tells Ghostscript to flag and adjust elements that are not PDF/A compliant rather than failing outright. -sColorConversionStrategy=RGB converts all device colors to RGB. -sOutputICCProfile=sRGB.icc selects the standard sRGB color profile as the output profile. The output is usually veraPDF-clean for typical office documents, but confirm each file with veraPDF rather than assuming it. Run the command from a terminal; Ghostscript is available from ghostscript.com, or via brew install ghostscript on macOS and apt install ghostscript on Ubuntu.

Can I use the pre-flight checker for compliance evidence?

The checker exports a machine-readable JSON report with each check, its PASS/WARN/FAIL status, the byte offset where the issue was detected (where applicable), and the recommended fix. That JSON is suitable for inclusion in a compliance evidence package, audit log, or CI gate. For ISO 27001 or SOC 2 audits where you need to show that submitted PDFs went through validation, including the JSON report alongside the submitted PDF and the regulator acknowledgement is a reasonable evidence chain. For court submissions, the JSON shows that you ran a pre-check before filing — useful if a registry challenges the PDF format compliance later.
