Reader — parse_sdf()
The reader module unpacks .sdf archives and exposes their contents as Python objects. It enforces ZIP bomb protection, path traversal guards, and file size limits during extraction.
Import

    from sdf import parse_sdf, extract_json

parse_sdf(buffer)
Parses a .sdf file and returns an SDFResult containing all archive components.

    parse_sdf(buffer: bytes) -> SDFResult

| Parameter | Type | Description |
|---|---|---|
| buffer | bytes | Raw .sdf file contents |
Returns: SDFResult
Raises: SDFError with an appropriate error code on any of the following:
- File is not a valid ZIP archive (SDF_ERROR_NOT_ZIP)
- Required file is missing from the archive (SDF_ERROR_MISSING_FILE)
- Total uncompressed size exceeds 200 MB, or a single file exceeds 50 MB (SDF_ERROR_ARCHIVE_TOO_LARGE)
- A ZIP entry path contains .. components (SDF_ERROR_INVALID_ARCHIVE)
- meta.json fails schema validation (SDF_ERROR_INVALID_META)
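One way to act on the codes listed above is to sort failures into a few handling buckets. The sketch below is only an illustration and is not part of the sdf package: the helper name triage and the bucket names are hypothetical, and it assumes the error code is available as a plain string matching the names above.

```python
# Hypothetical triage helper; not part of the sdf package.
# Assumption: error codes are plain strings matching the names in this section.

# The upload itself is unusable or unsafe, so it can be rejected outright.
REJECT = {
    "SDF_ERROR_NOT_ZIP",
    "SDF_ERROR_INVALID_ARCHIVE",
    "SDF_ERROR_ARCHIVE_TOO_LARGE",
}
# The archive opened, but its contents are incomplete or invalid.
RETURN_TO_SENDER = {
    "SDF_ERROR_MISSING_FILE",
    "SDF_ERROR_INVALID_META",
}


def triage(code: str) -> str:
    """Map an SDF error code to a handling bucket (bucket names are illustrative)."""
    if code in REJECT:
        return "reject"
    if code in RETURN_TO_SENDER:
        return "return_to_sender"
    # Unknown or future codes go to manual review rather than being guessed at.
    return "manual_review"
```

Inside an except SDFError block, the caller would pass the caught error's code to triage and branch on the result.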
Basic usage

    from sdf import parse_sdf

    with open("invoice.sdf", "rb") as f:
        result = parse_sdf(f.read())

    # meta.json fields
    print(result.meta.document_id)    # UUID v4
    print(result.meta.document_type)  # "invoice"
    print(result.meta.schema_id)      # "invoice/v0.2"
    print(result.meta.issuer)         # "Acme Supplies GmbH"
    print(result.meta.issued_at)      # ISO 8601 string

    # data.json
    print(result.data["invoice_number"])  # "INV-2026-001"
    print(result.data["total"])           # {"amount": "1250.00", "currency": "EUR"}

    # schema.json
    print(result.schema["$schema"])  # "https://json-schema.org/draft/2020-12/schema"

    # visual.pdf bytes
    with open("invoice.pdf", "wb") as f:
        f.write(result.visual)

    # signature.sig (None if unsigned)
    if result.signature is not None:
        print(f"Document is signed ({len(result.signature)} bytes)")

SDFResult fields

    @dataclass
    class SDFResult:
        meta: SDFMeta            # Parsed meta.json as SDFMeta dataclass
        data: dict               # Parsed data.json as dict
        schema: dict             # Parsed schema.json as dict
        visual: bytes            # Raw PDF bytes from visual.pdf
        signature: bytes | None  # Raw signature.sig bytes, or None if absent

extract_json(buffer, filename)
Extracts and parses a single JSON file from an SDF archive without fully parsing the document. Useful when you only need one file, e.g. for routing based on document_type.

    extract_json(buffer: bytes, filename: str) -> dict

| Parameter | Type | Description |
|---|---|---|
| buffer | bytes | Raw .sdf file contents |
| filename | str | File to extract: 'meta.json', 'data.json', or 'schema.json' |
Returns: Parsed JSON as a Python dict.
Raises: SDFError if the file is missing or the JSON is malformed.
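To make the idea of reading one file without a full parse concrete, here is a minimal standard-library sketch. It is not the sdf package's implementation: the function name read_json_member is hypothetical, it performs none of the archive safety checks described on this page, and it raises KeyError, json.JSONDecodeError, or ValueError rather than SDFError.

```python
import io
import json
import zipfile


def read_json_member(buffer: bytes, filename: str) -> dict:
    """Parse a single JSON member of an in-memory ZIP archive (illustrative only)."""
    with zipfile.ZipFile(io.BytesIO(buffer)) as archive:
        # ZipFile.read decompresses only the named member;
        # it raises KeyError if the member is not in the archive.
        raw = archive.read(filename)
    # json.loads accepts UTF-8 bytes and raises json.JSONDecodeError on malformed input.
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError(f"{filename} does not contain a JSON object")
    return parsed
```

Only the requested member is decompressed, which is why this style of access suits cheap routing decisions on large archives.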

    from sdf import extract_json

    with open("invoice.sdf", "rb") as f:
        buffer = f.read()

    # Route by document type without a full parse
    meta = extract_json(buffer, "meta.json")
    document_type = meta["document_type"]

    # process_invoice and process_nomination stand for your own handlers
    if document_type == "invoice":
        process_invoice(buffer)
    elif document_type == "nomination":
        process_nomination(buffer)

Processing a batch of SDF files

    from pathlib import Path
    from sdf import parse_sdf
    from sdf.errors import SDFError

    sdf_dir = Path("./incoming")
    results = []
    errors = []

    for sdf_path in sdf_dir.glob("*.sdf"):
        try:
            with open(sdf_path, "rb") as f:
                result = parse_sdf(f.read())
            results.append({
                "file": sdf_path.name,
                "document_id": result.meta.document_id,
                "document_type": result.meta.document_type,
                "is_signed": result.signature is not None,
            })
        except SDFError as e:
            errors.append({"file": sdf_path.name, "error": e.code, "message": str(e)})

    print(f"Processed: {len(results)}, Errors: {len(errors)}")

Security guarantees
parse_sdf() enforces the following limits on all archives:
| Limit | Value | Error code |
|---|---|---|
| Max single file size | 50 MB | SDF_ERROR_ARCHIVE_TOO_LARGE |
| Max total uncompressed size | 200 MB | SDF_ERROR_ARCHIVE_TOO_LARGE |
| Path traversal | Blocked | SDF_ERROR_INVALID_ARCHIVE |
These limits are applied before any file content is read. An archive that exceeds any limit is rejected immediately.
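Checks of this kind can be made from ZIP metadata alone, before any entry is decompressed. The following is a rough sketch using Python's zipfile module, not the sdf package's actual code: the function name check_archive_limits is hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and plain ValueError stands in for the SDFError codes in the table.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: "MB" is taken as 1024 * 1024 bytes; the source does not specify.
MAX_FILE_BYTES = 50 * 1024 * 1024    # per-entry limit
MAX_TOTAL_BYTES = 200 * 1024 * 1024  # limit across all entries


def check_archive_limits(buffer: bytes) -> None:
    """Raise ValueError if declared sizes or entry paths violate the limits."""
    with zipfile.ZipFile(io.BytesIO(buffer)) as archive:
        total = 0
        for info in archive.infolist():
            # file_size is the uncompressed size declared in the archive metadata;
            # reading it does not decompress the entry.
            if info.file_size > MAX_FILE_BYTES:
                raise ValueError(f"{info.filename}: entry exceeds per-file limit")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds total uncompressed limit")
            # Reject absolute paths and any '..' component.
            path = PurePosixPath(info.filename.replace("\\", "/"))
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"{info.filename}: unsafe entry path")
```

Because declared sizes come from the archive itself, a stricter reader might additionally cap the number of bytes it actually decompresses.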
No network requests are made during parsing. Schema validation uses the schema.json embedded inside the archive — never an external URL.