Extraction
Apply a template to a spreadsheet file and get canonical JSON out.
extract
Apply a template to a spreadsheet file and return canonical JSON.
The returned ExtractResult carries the canonical data and any
structural / row-level extraction problems. It also carries a reference
to the template, so projection methods (to_pydantic, to_pandas,
iter, get) can find it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the source file. Calamine (the default backend) reads | required |
| `template` | `Template` | A loaded | required |
| `engine` | `Engine \| None` | | `None` |
Returns:
| Type | Description |
|---|---|
| `ExtractResult` | An |
Raises:
| Type | Description |
|---|---|
| `SourceFileError` | If the file is missing, corrupt, encrypted, or in a format the chosen backend cannot read. |
stream
stream(
path: str | Path,
template: Template,
*,
entity: str,
model: Any | None = None,
allow_partial: bool = False,
engine: Engine | None = None,
) -> Iterator[Any]
Stream records of one entity from a spreadsheet file.
In v1, this delegates to extract and iterates over the materialized
result, so the full result is built before the first record is yielded.
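The delegation described above can be sketched as a generator wrapped around an eager extractor. This is only an illustration: `extract_stub` and `stream_stub` are hypothetical stand-ins, not crease functions, and the canonical data is hard-coded.

```python
from typing import Any, Iterator


def extract_stub(path: str) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical stand-in for extract: builds the whole result up front."""
    return {"orders": [{"id": 1}, {"id": 2}, {"id": 3}]}


def stream_stub(path: str, *, entity: str) -> Iterator[dict[str, Any]]:
    """Yield records of one entity by delegating to the eager extractor."""
    canonical = extract_stub(path)  # everything is materialized here
    yield from canonical[entity]    # then handed out one record at a time


records = list(stream_stub("orders.xlsx", entity="orders"))
```

The caller sees an iterator, but memory use is the same as calling the eager extractor directly, because nothing is yielded until the full result exists.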
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the source file. | required |
| `template` | `Template` | A loaded | required |
| `entity` | `str` | The entity to stream. | required |
| `model` | `Any \| None` | Optional Pydantic model to project each record into. When set, yields | `None` |
| `allow_partial` | `bool` | If | `False` |
| `engine` | `Engine \| None` | See | `None` |
Yields:
| Type | Description |
|---|---|
| `Any` | Dicts, or |
get
get(
path: str | Path,
template: Template,
entity: str,
*,
model: Any | None = None,
allow_partial: bool = False,
engine: Engine | None = None,
) -> Any
Extract a single entity in one call. Convenience wrapper over extract.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the source file. | required |
| `template` | `Template` | A loaded | required |
| `entity` | `str` | The entity name (must have | required |
| `model` | `Any \| None` | Optional Pydantic model to project into. | `None` |
| `allow_partial` | `bool` | If | `False` |
| `engine` | `Engine \| None` | See | `None` |
Returns:
| Type | Description |
|---|---|
| `Any` | A dict (or a |
ExtractResult dataclass
ExtractResult(
template_id: str,
source_file: str,
canonical: dict[str, Any] = dict(),
errors: list[ExtractionError] = list(),
row_errors: list[RowExtractError] = list(),
template: Any = None,
_cached_report: Any = None,
)
The output of crease.extract.
Holds the canonical JSON plus any structural / row-level problems
encountered while applying the template. Use the projection methods
(iter, get, to_pydantic, to_pandas) to consume the data in a
typed shape. They halt by default if errors are present; opt in to
partial recovery with allow_partial=True.
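The halt-by-default behaviour can be pictured with a small stand-in class. This is a sketch under assumptions, not the crease implementation: `ResultSketch`, `SketchValidationError`, and the `records` method are hypothetical names, and errors are plain strings here.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


class SketchValidationError(Exception):
    """Hypothetical error raised when problems exist and partial use is not allowed."""


@dataclass
class ResultSketch:
    # Mutable defaults are declared with default_factory in a dataclass.
    canonical: dict[str, Any] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)
    row_errors: list[str] = field(default_factory=list)

    def records(self, entity: str, *, allow_partial: bool = False) -> Iterator[Any]:
        # Halt unless the caller explicitly opts in to partial data.
        if (self.errors or self.row_errors) and not allow_partial:
            raise SketchValidationError("extraction problems present")
        return iter(self.canonical.get(entity, []))


result = ResultSketch(
    canonical={"orders": [{"id": 1}]},
    row_errors=["row 3: bad date"],
)

try:
    result.records("orders")
    halted = False
except SketchValidationError:
    halted = True

partial = list(result.records("orders", allow_partial=True))
```

Making the strict path the default means a caller cannot silently consume incomplete data; the opt-in flag documents the decision at the call site.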
Attributes:
| Name | Type | Description |
|---|---|---|
| `template_id` | `str` | The id of the template used to produce this result. |
| `source_file` | `str` | Filename of the source xlsx (without path). |
| `canonical` | `dict[str, Any]` | |
| `errors` | `list[ExtractionError]` | Structural extraction errors (missing tab, header mapping failed, etc.). Internal; lifted into |
| `row_errors` | `list[RowExtractError]` | Per-row coercion errors. Internal; lifted into |
| `template` | `Any` | The |
canonical class-attribute instance-attribute
get
Return a single record for a cardinality: one entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entity` | `str` | The entity name (matches | required |
| `model` | `Any \| None` | Optional Pydantic model to project into. If | `None` |
| `allow_partial` | `bool` | If | `False` |
Returns:
| Type | Description |
|---|---|
| `Any` | A dict (or a |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the entity is declared with |
| `ValidationError` | If errors are present and |
iter
Iterate over records of a cardinality: many entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entity` | `str` | The entity name. | required |
| `model` | `Any \| None` | Optional Pydantic model to project each record into. | `None` |
| `allow_partial` | `bool` | If | `False` |
Yields:
| Type | Description |
|---|---|
| `Any` | Dicts, or |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the entity has |
| `ValidationError` | If errors are present and |
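The split between get (for cardinality: one entities) and iter (for cardinality: many entities) can be sketched as a pair of guarded accessors. This assumes the ValueError is raised when an accessor is used on an entity of the other cardinality; the `CARDINALITY` mapping and the `get_one` / `iter_many` helpers are hypothetical and stand in for whatever the template declares.

```python
from typing import Any, Iterator

# Hypothetical cardinality declarations; assumed to come from the template.
CARDINALITY = {"invoice": "one", "line_items": "many"}


def get_one(canonical: dict[str, Any], entity: str) -> dict[str, Any]:
    """Return the single record of a cardinality: one entity."""
    if CARDINALITY[entity] != "one":
        raise ValueError(f"{entity!r} is not a cardinality: one entity")
    return canonical[entity]


def iter_many(canonical: dict[str, Any], entity: str) -> Iterator[dict[str, Any]]:
    """Iterate over the records of a cardinality: many entity."""
    if CARDINALITY[entity] != "many":
        raise ValueError(f"{entity!r} is not a cardinality: many entity")
    return iter(canonical[entity])


canonical = {
    "invoice": {"number": "INV-7"},
    "line_items": [{"sku": "A-1"}, {"sku": "B-2"}],
}
header = get_one(canonical, "invoice")
items = list(iter_many(canonical, "line_items"))
```

Failing fast on the wrong accessor keeps the return shape predictable: one accessor always gives a single record, the other always gives an iterator.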
to_pydantic
Project canonical records into Pydantic model instances.
Field matching is opportunistic by attribute name: fields the model
doesn't declare are silently dropped (extra="ignore"); type
mismatches raise.
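The matching rule above (undeclared fields dropped, type mismatches raised) can be illustrated without Pydantic by using a plain dataclass as the target. This is a stdlib stand-in only: `LineItem` and `project` are hypothetical, and the strict isinstance check is an assumption of the sketch. Pydantic itself may coerce compatible values instead of rejecting them.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class LineItem:
    sku: str
    quantity: int


def project(record: dict[str, Any], model: type) -> Any:
    """Build `model` from `record`, using only the fields the model declares."""
    kwargs = {}
    for f in fields(model):
        value = record[f.name]
        # Strict check for illustration; a mismatch is an error, not a drop.
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")
        kwargs[f.name] = value
    # Keys in `record` that the model does not declare are never read.
    return model(**kwargs)


item = project({"sku": "A-1", "quantity": 2, "notes": "fragile"}, LineItem)
```

Here `notes` is ignored because `LineItem` does not declare it, while a record such as `{"sku": "A-1", "quantity": "two"}` would raise.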
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entity` | `str` | The entity name. | required |
| `model` | `Any \| None` | Pydantic | `None` |
| `allow_partial` | `bool` | See | `False` |
Returns:
| Type | Description |
|---|---|
| `Any` | For |
| `Any` | For |
Raises:
| Type | Description |
|---|---|
| `ValidationError` | If errors are present and |
to_pandas
Project canonical records into a pandas DataFrame.
Requires the pandas extra (pip install crease[pandas]).
Pandas is imported lazily inside this method, so callers who never
use it don't pay the import cost.
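The lazy-import pattern described above can be sketched as follows. The helper `lazy_import` and the function `to_json_text` are hypothetical, and the stdlib `json` module stands in for pandas so the example runs without optional dependencies.

```python
import importlib
from types import ModuleType
from typing import Any


def lazy_import(module_name: str, extra: str) -> ModuleType:
    """Import a module at call time and explain how to get it if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; install the optional '{extra}' extra"
        ) from exc


def to_json_text(rows: list[dict[str, Any]]) -> str:
    # The import happens here, not at module load, so callers who never
    # reach this function never pay for it.
    json = lazy_import("json", extra="json")
    return json.dumps(rows)


text = to_json_text([{"a": 1}])
```

Python caches imported modules, so repeated calls only pay the import cost once.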
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entity` | `str` | The entity name. | required |
| `allow_partial` | `bool` | If | `False` |
Returns:
| Type | Description |
|---|---|
| `Any` | A pandas DataFrame. For |
| `Any` | single-row DataFrame. |