Extraction

Apply a template to a spreadsheet file and get canonical JSON out.

extract

extract(
    path: str | Path,
    template: Template,
    *,
    engine: Engine | None = None,
) -> ExtractResult

Apply a template to a spreadsheet file and return canonical JSON.

The returned ExtractResult carries the canonical data and any structural / row-level extraction problems. It also carries a reference to the template, so projection methods (to_pydantic, to_pandas, iter, get) can find it.

Parameters:

Name Type Description Default
path str | Path

Path to the source file. Calamine (the default backend) reads .xls, .xlsx, .xlsb, and .ods; openpyxl (the fallback for templates that need cell-hidden metadata) reads .xlsx only.

required
template Template

A loaded crease.Template.

required
engine Engine | None

"calamine" or "openpyxl" to force a specific backend. Default (None) auto-selects: openpyxl if the template uses locate.skip_hidden_rows, calamine otherwise.

None

Returns:

Type Description
ExtractResult

An ExtractResult holding the canonical dict and any errors.

Raises:

Type Description
SourceFileError

If the file is missing, corrupt, encrypted, or in a format the chosen backend cannot read.
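The format support described for path can be checked before calling extract. The sketch below is illustrative only: backend_can_read and extract_checked are hypothetical helpers, not part of crease, and the extension table is copied from the parameter description above rather than queried from the library.

```python
from pathlib import Path

# Extensions per backend, copied from the `path` parameter description.
READABLE = {
    "calamine": {".xls", ".xlsx", ".xlsb", ".ods"},
    "openpyxl": {".xlsx"},
}


def backend_can_read(path, engine="calamine"):
    """Return True if `engine` is documented to read this file's extension."""
    return Path(path).suffix.lower() in READABLE[engine]


def extract_checked(path, template, *, engine=None):
    """Hypothetical wrapper: reject unsupported extensions, then call extract."""
    import crease  # imported lazily so backend_can_read works without crease

    # With engine=None the library auto-selects, so only a forced engine is checked.
    if engine is not None and not backend_can_read(path, engine):
        raise ValueError(f"{engine} is not documented to read {Path(path).suffix}")
    return crease.extract(path, template, engine=engine)
```

A check like this only covers the file extension; a missing, corrupt, or encrypted file would still surface as SourceFileError from extract itself.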

stream

stream(
    path: str | Path,
    template: Template,
    *,
    entity: str,
    model: Any | None = None,
    allow_partial: bool = False,
    engine: Engine | None = None,
) -> Iterator[Any]

Stream records of one entity from a spreadsheet file.

In v1, this delegates to extract and iterates the materialized result, so the whole file is extracted before the first record is yielded.

Parameters:

Name Type Description Default
path str | Path

Path to the source file.

required
template Template

A loaded crease.Template.

required
entity str

The entity to stream.

required
model Any | None

Optional Pydantic model to project each record into. When set, yields model instances instead of dicts.

None
allow_partial bool

If False (default), raises crease.ValidationError when extraction produced errors.

False
engine Engine | None

See extract.

None

Yields:

Type Description
Any

Dicts, or model instances if model was passed.
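The v1 behaviour described above (extract first, then iterate) can be pictured roughly as follows. This is a sketch of the documented behaviour, not the library's source: stream_sketch is a hypothetical name, and extract_fn stands in for crease.extract so the example runs without the library installed.

```python
def stream_sketch(extract_fn, path, template, *, entity, model=None,
                  allow_partial=False, engine=None):
    """Illustrative stand-in for stream: extract eagerly, then iterate."""
    # The whole file is extracted before the first record is yielded.
    result = extract_fn(path, template, engine=engine)
    # Delegate to ExtractResult.iter with the same projection options.
    yield from result.iter(entity, model=model, allow_partial=allow_partial)
```

Under this reading, stream offers the iterator interface but no memory savings over calling extract and then iter.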

get

get(
    path: str | Path,
    template: Template,
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
    engine: Engine | None = None,
) -> Any

Extract a single entity in one call. Convenience wrapper over extract.

Parameters:

Name Type Description Default
path str | Path

Path to the source file.

required
template Template

A loaded crease.Template.

required
entity str

The entity name (must have cardinality: one).

required
model Any | None

Optional Pydantic model to project into.

None
allow_partial bool

If False (default), raises crease.ValidationError when extraction produced errors.

False
engine Engine | None

See extract.

None

Returns:

Type Description
Any

A dict (or a model instance), or None if the entity wasn't found.

ExtractResult dataclass

ExtractResult(
    template_id: str,
    source_file: str,
    canonical: dict[str, Any] = dict(),
    errors: list[ExtractionError] = list(),
    row_errors: list[RowExtractError] = list(),
    template: Any = None,
    _cached_report: Any = None,
)

The output of crease.extract.

Holds the canonical JSON plus any structural / row-level problems encountered while applying the template. Use the projection methods (iter, get, to_pydantic, to_pandas) to consume the data in a typed shape. By default they raise if errors are present; pass allow_partial=True to opt in to partial recovery.

Attributes:

Name Type Description
template_id str

The id of the template used to produce this result.

source_file str

Filename of the source spreadsheet (without path).

canonical dict[str, Any]

{entity_name: list_or_dict} — the raw extracted data.

errors list[ExtractionError]

Structural extraction errors (missing tab, header mapping failed, etc.). Internal — lifted into self.report.errors() on access.

row_errors list[RowExtractError]

Per-row coercion errors. Internal — lifted into self.report.errors() on access.

template Any

The Template used to produce this result. Set by crease.extract; projection methods rely on it.

canonical class-attribute instance-attribute

canonical: dict[str, Any] = field(default_factory=dict)

report property

report

Lazily compute (and cache) the Report for this result.
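As a rough illustration of how the attributes and the report fit together, the sketch below summarizes a result. summarize is a hypothetical helper. It assumes that cardinality: many entities appear in canonical as lists and cardinality: one entities as dicts, and that report.errors() returns an iterable; neither detail is spelled out above.

```python
def summarize(result):
    """Count records per entity and report-level errors for an ExtractResult."""
    counts = {}
    for name, data in result.canonical.items():
        if isinstance(data, list):
            # Assumed shape for cardinality: many entities.
            counts[name] = len(data)
        else:
            # Assumed shape for cardinality: one: a dict, or None if absent.
            counts[name] = 0 if data is None else 1
    # Assumption: report.errors() returns an iterable of error objects.
    n_errors = len(list(result.report.errors()))
    return {"template_id": result.template_id, "records": counts, "errors": n_errors}
```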

get

get(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Any

Return a single record for a cardinality: one entity.

Parameters:

Name Type Description Default
entity str

The entity name (matches Entity.name in the template).

required
model Any | None

Optional Pydantic model to project into. If None and pydantic is available, the canonical dict is returned as-is.

None
allow_partial bool

If False (default), raises crease.ValidationError when the report has any errors for this entity. If True, returns whatever was extracted regardless.

False

Returns:

Type Description
Any

A dict (or a model instance if model was passed), or None if the entity wasn't found in canonical.

Raises:

Type Description
ValueError

If the entity is declared with cardinality: many — use iter or to_pandas instead.

ValidationError

If errors are present and allow_partial is False.
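Because get returns None when the entity is absent rather than raising, callers that need the record may want to turn that case into an error. A minimal sketch, with get_required as a hypothetical helper name:

```python
def get_required(result, entity, *, model=None):
    """Fetch a cardinality: one record, treating 'not found' as an error."""
    # With allow_partial left at False, get raises ValidationError itself
    # when the report has errors for this entity, so only the documented
    # None return needs handling here.
    record = result.get(entity, model=model)
    if record is None:
        raise LookupError(f"entity {entity!r} not found in {result.source_file}")
    return record
```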

iter

iter(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Iterator[Any]

Iterate over records of a cardinality: many entity.

Parameters:

Name Type Description Default
entity str

The entity name.

required
model Any | None

Optional Pydantic model to project each record into.

None
allow_partial bool

If False (default), raises crease.ValidationError before yielding anything if the report has any errors. If True, yields the rows that project successfully; rows that fail are skipped and surfaced via self.report.errors().

False

Yields:

Type Description
Any

Dicts, or model instances if model was passed.

Raises:

Type Description
ValueError

If the entity has cardinality: one.

ValidationError

If errors are present and allow_partial is False.
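With allow_partial=True, the usable rows and the recorded problems can be collected side by side. The helper below is a hypothetical sketch; it assumes report.errors() returns an iterable, which the text above implies but does not state.

```python
def rows_and_errors(result, entity, *, model=None):
    """Collect the rows that project, plus whatever the report flagged."""
    # allow_partial=True skips failing rows instead of raising.
    rows = list(result.iter(entity, model=model, allow_partial=True))
    # Read the report after iteration so projection failures are included.
    # Assumption: report.errors() returns an iterable of error objects.
    errors = list(result.report.errors())
    return rows, errors
```

Reading the report only after the iterator is exhausted is a deliberate choice here, since rows that fail to project are surfaced through the report.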

to_pydantic

to_pydantic(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Any

Project canonical records into Pydantic model instances.

Field matching is opportunistic by attribute name: fields the model doesn't declare are silently dropped (extra="ignore"); type mismatches raise.

Parameters:

Name Type Description Default
entity str

The entity name.

required
model Any | None

Pydantic BaseModel subclass to project into. If None, a model is auto-generated from the template's field declarations via Template.model(entity).

None
allow_partial bool

See iter.

False

Returns:

Type Description
Any

For cardinality: many entities, list[model]. For cardinality: one, a single model instance (or None if the entity wasn't found).

Raises:

Type Description
ValidationError

If errors are present and allow_partial is False, or if a row can't project into the model.
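Since to_pydantic returns a list, a single instance, or None depending on cardinality, calling code that handles both kinds of entity may find it convenient to normalize the result. A small sketch, with to_model_list as a hypothetical helper:

```python
def to_model_list(result, entity, *, model=None):
    """Normalize to_pydantic's three documented return shapes to a list."""
    projected = result.to_pydantic(entity, model=model)
    if projected is None:
        # cardinality: one, entity not found
        return []
    if isinstance(projected, list):
        # cardinality: many
        return projected
    # cardinality: one, single model instance
    return [projected]
```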

to_pandas

to_pandas(
    entity: str, *, allow_partial: bool = False
) -> Any

Project canonical records into a pandas DataFrame.

Requires the pandas extra (pip install crease[pandas]). Pandas is imported lazily inside this method, so callers who never use it don't pay the import cost.

Parameters:

Name Type Description Default
entity str

The entity name.

required
allow_partial bool

If False (default), raises crease.ValidationError when the report has any errors.

False

Returns:

Type Description
Any

A pandas DataFrame. For cardinality: one entities, a single-row DataFrame.
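Because pandas is an optional extra, code that must run with or without it can fall back to the raw canonical data. The sketch below uses a hypothetical helper name and assumes a missing pandas install surfaces as ImportError from the lazy import; the library may instead raise its own exception type.

```python
def frame_or_canonical(result, entity):
    """Return a DataFrame if pandas is installed, else the raw canonical data."""
    try:
        return result.to_pandas(entity)
    except ImportError:
        # Assumption: the lazy pandas import fails with ImportError when
        # the pandas extra is not installed.
        return result.canonical.get(entity)
```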