Extraction¶

Apply a template to a spreadsheet file and get canonical JSON out.

extract ¶

extract(
    path: str | Path,
    template: Template,
    *,
    engine: Engine | None = None,
) -> ExtractResult

Apply a template to a spreadsheet file and return canonical JSON.

The returned ExtractResult carries the canonical data and any structural / row-level extraction problems. It also carries a reference to the template, so projection methods (to_pydantic, to_pandas, iter, get) can find it.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the source file. Calamine (the default backend) reads `.xls`, `.xlsx`, `.xlsb`, and `.ods`; openpyxl (the fallback for templates that need cell-hidden metadata) reads `.xlsx` only.	required
`template`	`Template`	A loaded `crease.Template`.	required
`engine`	`Engine \| None`	`"calamine"` or `"openpyxl"` to force a specific backend. Default (`None`) auto-selects: openpyxl if the template uses `locate.skip_hidden_rows`, calamine otherwise.	`None`

Returns:

Type	Description
`ExtractResult`	An `ExtractResult` holding the canonical dict and any errors.

Raises:

Type	Description
`SourceFileError`	If the file is missing, corrupt, encrypted, or in a format the chosen backend cannot read.
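
The backend rules given for `path` and `engine` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not crease's implementation: `pick_engine`, `can_read`, and `SUPPORTED` are hypothetical names, and the `uses_skip_hidden_rows` flag stands in for whatever check the library runs against the template.

```python
from pathlib import Path
from typing import Optional

# Formats each backend reads, copied from the `path` parameter description.
SUPPORTED = {
    "calamine": {".xls", ".xlsx", ".xlsb", ".ods"},
    "openpyxl": {".xlsx"},
}


def pick_engine(engine: Optional[str], uses_skip_hidden_rows: bool) -> str:
    """Hypothetical helper mirroring the documented auto-selection rule."""
    if engine is not None:
        return engine  # an explicit choice always wins
    return "openpyxl" if uses_skip_hidden_rows else "calamine"


def can_read(path: str, engine: str) -> bool:
    """True if the file extension is one the chosen backend reads."""
    return Path(path).suffix.lower() in SUPPORTED[engine]


if __name__ == "__main__":
    chosen = pick_engine(None, uses_skip_hidden_rows=True)
    print(chosen, can_read("ledger.xls", chosen))
```

One consequence worth keeping in mind: a template that uses `locate.skip_hidden_rows` auto-selects openpyxl, which reads `.xlsx` only, so a `.xls` or `.ods` source would presumably fall under the `SourceFileError` case above.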

stream ¶

stream(
    path: str | Path,
    template: Template,
    *,
    entity: str,
    model: Any | None = None,
    allow_partial: bool = False,
    engine: Engine | None = None,
) -> Iterator[Any]

Stream records of one entity from a spreadsheet file.

For v1, this delegates to extract and iterates the materialized result, so the full extraction is built in memory before any record is yielded.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the source file.	required
`template`	`Template`	A loaded `crease.Template`.	required
`entity`	`str`	The entity to stream.	required
`model`	`Any \| None`	Optional Pydantic model to project each record into. When set, yields `model` instances instead of dicts.	`None`
`allow_partial`	`bool`	If `False` (default), raises `crease.ValidationError` when extraction produced errors.	`False`
`engine`	`Engine \| None`	See `extract`.	`None`

Yields:

Type	Description
`Any`	Dicts, or `model` instances if `model` was passed.
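
The v1 behaviour described above (extract everything, then iterate) can be pictured with a small stand-in. `stream_v1` and `load_all` are hypothetical names; `load_all` plays the role of a full extraction returning `{entity_name: records}`. Whether the real function is a generator is not stated here, so treat the laziness of the first call as an assumption of this sketch.

```python
from typing import Any, Callable, Iterator

Canonical = dict[str, list[dict[str, Any]]]


def stream_v1(load_all: Callable[[], Canonical], entity: str) -> Iterator[dict[str, Any]]:
    """Yield records of one entity after materializing the whole result."""
    canonical = load_all()  # the entire extraction happens here, in one go
    yield from canonical.get(entity, [])


if __name__ == "__main__":
    def fake_extract() -> Canonical:
        return {"orders": [{"id": 1}, {"id": 2}]}

    for record in stream_v1(fake_extract, "orders"):
        print(record)
```

The iterator interface is the same as a truly incremental reader would offer, but peak memory is that of the full result.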

get ¶

get(
    path: str | Path,
    template: Template,
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
    engine: Engine | None = None,
) -> Any

Extract a single entity in one call. Convenience wrapper over extract.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the source file.	required
`template`	`Template`	A loaded `crease.Template`.	required
`entity`	`str`	The entity name (must have `cardinality: one`).	required
`model`	`Any \| None`	Optional Pydantic model to project into.	`None`
`allow_partial`	`bool`	If `False` (default), raises `crease.ValidationError` when extraction produced errors.	`False`
`engine`	`Engine \| None`	See `extract`.	`None`

Returns:

Type	Description
`Any`	A dict (or a `model` instance), or `None` if the entity wasn't found.

ExtractResult `dataclass` ¶

ExtractResult(
    template_id: str,
    source_file: str,
    canonical: dict[str, Any] = dict(),
    errors: list[ExtractionError] = list(),
    row_errors: list[RowExtractError] = list(),
    template: Any = None,
    _cached_report: Any = None,
)

The output of crease.extract.

Holds the canonical JSON plus any structural / row-level problems encountered while applying the template. Use the projection methods (iter, get, to_pydantic, to_pandas) to consume the data in a typed shape. By default they raise if errors are present; pass allow_partial=True to opt in to partial recovery.

Attributes:

Name	Type	Description
`template_id`	`str`	The id of the template used to produce this result.
`source_file`	`str`	Filename of the source spreadsheet (without path).
`canonical`	`dict[str, Any]`	`{entity_name: list_or_dict}` — the raw extracted data.
`errors`	`list[ExtractionError]`	Structural extraction errors (missing tab, header mapping failed, etc.). Internal — lifted into `self.report.errors()` on access.
`row_errors`	`list[RowExtractError]`	Per-row coercion errors. Internal — lifted into `self.report.errors()` on access.
`template`	`Any`	The `Template` used to produce this result. Set by `crease.extract`; projection methods rely on it.

canonical `class-attribute` `instance-attribute` ¶

canonical: dict[str, Any] = field(default_factory=dict)

report `property` ¶

report

Lazily compute (and cache) the Report for this result.
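
A compute-once property backed by a private slot such as `_cached_report` typically looks like the sketch below. `ResultSketch` is a hypothetical stand-in, and the tuple it builds is only a placeholder for the real Report object, whose construction is not documented here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResultSketch:
    """Stand-in for ExtractResult, reduced to the fields the cache needs."""

    errors: list[str] = field(default_factory=list)
    row_errors: list[str] = field(default_factory=list)
    _cached_report: Any = None

    @property
    def report(self) -> Any:
        # Build the report on first access, then reuse the same object.
        if self._cached_report is None:
            self._cached_report = tuple(self.errors) + tuple(self.row_errors)
        return self._cached_report


if __name__ == "__main__":
    result = ResultSketch(errors=["missing tab"], row_errors=["row 4: bad date"])
    print(result.report)
    print(result.report is result.report)
```

In this sketch, errors appended after the first access are not reflected in the cached value; whether the same holds for the real property is not stated.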

get ¶

get(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Any

Return a single record for a cardinality: one entity.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity name (matches `Entity.name` in the template).	required
`model`	`Any \| None`	Optional Pydantic model to project into. If `None`, the canonical dict is returned as-is.	`None`
`allow_partial`	`bool`	If `False` (default), raises `crease.ValidationError` when the report has any errors for this entity. If `True`, returns whatever was extracted regardless.	`False`

Returns:

Type	Description
`Any`	A dict (or a `model` instance if `model` was passed), or `None` if the entity wasn't found in canonical.

Raises:

Type	Description
`ValueError`	If the entity is declared with `cardinality: many` — use `iter` or `to_pandas` instead.
`ValidationError`	If errors are present and `allow_partial` is `False`.
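
The three outcomes documented for this method (wrong cardinality, errors present, entity missing) can be laid out as a small decision function. Everything here is a hypothetical stand-in: `get_one`, `ValidationErrorSketch`, and the plain dicts used for cardinality and per-entity errors. The order of the two checks is an assumption.

```python
from typing import Any, Optional


class ValidationErrorSketch(Exception):
    """Stand-in for crease.ValidationError."""


def get_one(
    canonical: dict[str, Any],
    cardinality: dict[str, str],
    entity_errors: dict[str, list[str]],
    entity: str,
    *,
    allow_partial: bool = False,
) -> Optional[Any]:
    """Return the record for a cardinality: one entity, or None if absent."""
    if cardinality[entity] == "many":
        raise ValueError(f"{entity!r} has cardinality: many; use iter or to_pandas")
    if entity_errors.get(entity) and not allow_partial:
        raise ValidationErrorSketch(entity_errors[entity])
    return canonical.get(entity)  # None when the entity was not extracted


if __name__ == "__main__":
    canonical = {"header": {"invoice_no": "A-17"}}
    cardinality = {"header": "one", "lines": "many"}
    print(get_one(canonical, cardinality, {}, "header"))
```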

iter ¶

iter(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Iterator[Any]

Iterate over records of a cardinality: many entity.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity name.	required
`model`	`Any \| None`	Optional Pydantic model to project each record into.	`None`
`allow_partial`	`bool`	If `False` (default), raises `crease.ValidationError` if the report has any errors before yielding. If `True`, yields rows that successfully project; rows that fail are skipped and surfaced via `self.report.errors()`.	`False`

Yields:

Type	Description
`Any`	Dicts, or `model` instances if `model` was passed.

Raises:

Type	Description
`ValueError`	If the entity has `cardinality: one`.
`ValidationError`	If errors are present and `allow_partial` is `False`.
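
The `allow_partial=True` behaviour (yield rows that project, skip and record the rest) follows a common generator pattern, sketched below. `iter_partial`, `project`, and the `skipped` list are hypothetical; in crease the skipped rows are surfaced through `self.report.errors()` rather than a list passed in by the caller.

```python
from typing import Any, Callable, Iterable, Iterator


def iter_partial(
    rows: Iterable[dict[str, Any]],
    project: Callable[[dict[str, Any]], Any],
    skipped: list[tuple[int, str]],
) -> Iterator[Any]:
    """Yield projected rows; record the index and message of rows that fail."""
    for index, row in enumerate(rows):
        try:
            projected = project(row)
        except (TypeError, ValueError) as exc:
            skipped.append((index, str(exc)))
            continue
        yield projected


if __name__ == "__main__":
    rows = [{"qty": "3"}, {"qty": "n/a"}, {"qty": "5"}]
    skipped: list[tuple[int, str]] = []
    good = list(iter_partial(rows, lambda r: {"qty": int(r["qty"])}, skipped))
    print(good, [index for index, _ in skipped])
```

Because the failures are collected as a side effect of iteration, the list is only complete once the iterator has been fully consumed.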

to_pydantic ¶

to_pydantic(
    entity: str,
    *,
    model: Any | None = None,
    allow_partial: bool = False,
) -> Any

Project canonical records into Pydantic model instances.

Field matching is opportunistic by attribute name: fields the model doesn't declare are silently dropped (extra="ignore"); type mismatches raise.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity name.	required
`model`	`Any \| None`	Pydantic `BaseModel` subclass to project into. If `None`, a model is auto-generated from the template's field declarations via `Template.model(entity)`.	`None`
`allow_partial`	`bool`	See `iter`.	`False`

Returns:

Type	Description
`Any`	For `cardinality: many` entities, `list[model]`.
`Any`	For `cardinality: one`, a single `model` instance (or `None` if the entity wasn't found).

Raises:

Type	Description
`ValidationError`	If errors are present and `allow_partial` is `False`, or if a row can't project into the model.
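
The matching rule described above (undeclared fields dropped, type mismatches raise) can be illustrated without Pydantic. The sketch below swaps in a stdlib dataclass as the target, so it is an analogy rather than the library's mechanism: `Invoice` and `project` are hypothetical, and the `isinstance` check is stricter than Pydantic, which may coerce compatible values.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Invoice:
    number: str
    total: float


def project(record: dict[str, Any], model: type) -> Any:
    """Keep only fields the model declares; raise on a type mismatch."""
    declared = {f.name: f.type for f in fields(model)}
    kept = {name: value for name, value in record.items() if name in declared}
    for name, value in kept.items():
        if not isinstance(value, declared[name]):
            raise TypeError(f"{name}: expected {declared[name].__name__}, got {value!r}")
    return model(**kept)


if __name__ == "__main__":
    record = {"number": "A-17", "total": 12.5, "internal_note": "ignored"}
    print(project(record, Invoice))
```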

to_pandas ¶

to_pandas(
    entity: str, *, allow_partial: bool = False
) -> Any

Project canonical records into a pandas DataFrame.

Requires the pandas extra (pip install crease[pandas]). Pandas is imported lazily inside this method, so callers who never use it don't pay the import cost.

Parameters:

Name	Type	Description	Default
`entity`	`str`	The entity name.	required
`allow_partial`	`bool`	If `False` (default), raises `crease.ValidationError` when the report has any errors.	`False`

Returns:

Type	Description
`Any`	A pandas DataFrame. For `cardinality: one` entities, a single-row DataFrame.
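
Importing an optional dependency inside the method that needs it is a standard pattern; a minimal sketch follows. `require_extra` and `records_to_frame` are hypothetical helpers and the error message wording is invented for illustration; only the `pip install crease[pandas]` hint comes from the text above.

```python
import importlib
from types import ModuleType
from typing import Any


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import a module on first use, pointing at the matching extra on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; try: pip install crease[{extra}]"
        ) from exc


def records_to_frame(records: list[dict[str, Any]]) -> Any:
    """Build a DataFrame; pandas is imported only when this function runs."""
    pd = require_extra("pandas", "pandas")
    return pd.DataFrame(records)
```

Code paths that never call `records_to_frame` never trigger the pandas import, which is the cost saving the paragraph above refers to.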
