Introducing SDF: A New Standard for Machine-Readable Business Documents

Today we're publishing the first public release of the SDF specification, the reference TypeScript implementation (@etapsky/sdf-kit), the CLI, and the Python SDK. Here's the problem we're solving and why we built it the way we did.

The problem: documents are a black hole for business data

Every B2B transaction (invoice, purchase order, delivery note, customs declaration) is accompanied by a document. That document travels as a PDF. The receiving system, whether it's an ERP, an accounting tool, or a government portal, can't read a PDF directly. So one of two things happens: the system runs OCR (error-prone, expensive), or someone re-keys the data by hand (slow, even more error-prone).

This is not a niche problem. It happens in every industry, at every scale, in every country. It's been "solved" dozens of times — EDI, XML invoices, ZUGFeRD, XRechnung, PEPPOL — and yet the PDF re-keying problem persists because these solutions are either industry-specific, XML-based (which developers hate), or require both parties to have adopted the same vertical standard.

The SDF approach: bundle the PDF and the data together

SDF is a ZIP archive with a .sdf extension. Inside:

invoice.sdf  (ZIP)
├── visual.pdf    ← Human layer — any PDF viewer opens it
├── data.json     ← Machine layer — zero OCR, zero re-keying
├── schema.json   ← Validation rules, bundled offline
└── meta.json     ← Identity and provenance

The key insight is that the PDF and the JSON serve different audiences. The PDF is for humans and existing PDF-based workflows — it opens in any viewer, it can be printed, it satisfies legal archival requirements. The JSON is for machines — it can be parsed directly into your ERP, your accounting system, your API.

If your counterparty supports SDF, you get structured data automatically. If they don't, they still get a perfectly readable PDF. Adoption is additive, never disruptive.
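To make the container layout concrete, here is a minimal sketch that packs and unpacks the four files with nothing but Python's standard library. It does not use the etapsky-sdf SDK, whose API is not shown in this post. The helper names (build_sdf, read_sdf) and the invoice fields are hypothetical, and the actual specification may impose further rules (required meta fields, signing) that this sketch ignores.

```python
import io
import json
import zipfile

# Member names taken from the layout above; everything else is illustrative.
SDF_MEMBERS = ("visual.pdf", "data.json", "schema.json", "meta.json")


def build_sdf(pdf_bytes: bytes, data: dict, schema: dict, meta: dict) -> bytes:
    """Pack the four layers into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("visual.pdf", pdf_bytes)
        zf.writestr("data.json", json.dumps(data))
        zf.writestr("schema.json", json.dumps(schema))
        zf.writestr("meta.json", json.dumps(meta))
    return buf.getvalue()


def read_sdf(sdf_bytes: bytes) -> dict:
    """Return the parsed JSON layers and the raw PDF bytes."""
    with zipfile.ZipFile(io.BytesIO(sdf_bytes)) as zf:
        missing = [name for name in SDF_MEMBERS if name not in zf.namelist()]
        if missing:
            raise ValueError(f"not a complete SDF archive, missing: {missing}")
        return {
            "pdf": zf.read("visual.pdf"),
            "data": json.loads(zf.read("data.json")),
            "schema": json.loads(zf.read("schema.json")),
            "meta": json.loads(zf.read("meta.json")),
        }


# Hypothetical invoice payload; the field names are not from the SDF spec.
archive = build_sdf(
    pdf_bytes=b"%PDF-1.7 placeholder",
    data={"invoice_number": "INV-001", "total": 125.5, "currency": "EUR"},
    schema={"type": "object"},
    meta={"document_type": "invoice"},
)
layers = read_sdf(archive)
print(layers["data"]["invoice_number"])
```

A receiver that knows nothing about SDF can still extract visual.pdf with any ZIP tool, which is the additive-adoption property in practice.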

Why JSON, not XML

ZUGFeRD and XRechnung, the dominant European structured invoice standards, are both built on XML: ZUGFeRD embeds an XML invoice as an attachment inside a PDF/A-3 file, while XRechnung is a pure XML format with no visual layer. They work, but the developer experience is painful. XML namespaces, XPath, XML Schema: these are unfriendly to modern application developers who think in JSON.

SDF uses JSON throughout. The data layer is JSON. The schema is JSON Schema Draft 2020-12 — the same format your TypeScript types might be generated from. Validation is via ajv, which most Node.js developers already have in their stack.

General purpose from day one

Unlike ZUGFeRD (invoices only), SDF makes no assumptions about document type. The format supports any business document: invoice, purchase order, nomination, government permit application, health report, HR document, contract. The only constraints are structural — the container, the meta fields, the signing algorithm.

The spec ships with 7 reference examples: invoice, nomination, purchase order, tax declaration, customs declaration, permit application, and health report. These cover B2B, B2G, and G2G use cases.

What we're releasing today

@etapsky/sdf-kit 0.2.2 — TypeScript producer, reader, validator, signer. Node.js 20/22, browser, Electron.
@etapsky/sdf-cli 0.3.2 — CLI: inspect, validate, sign, verify, keygen, wrap, convert, schema. Homebrew + binary releases.
@etapsky/sdf-schema-registry 0.1.1 — Schema management with diff and migration engine.
@etapsky/sdf-server-core 0.1.6 — Fastify 5 server: multi-tenant REST API, BullMQ queue, S3/MinIO, ERP connectors.
etapsky-sdf 0.1.1 (PyPI) — Python SDK: producer, reader, validator, signer.
SDF Format Specification — 17 sections, 1,200+ lines. CC-BY 4.0. The normative reference for any implementation.

What's next

The documentation site (docs.etapsky.com) is the next major deliverable. After that: the SaaS platform GA, subscription billing, and the ISO standardization process.

If you're building enterprise document workflows, we'd love to talk. Reach us at contact@etapsky.com or open an issue on GitHub.