
Streaming large files

For files with hundreds of thousands of rows, stream records instead of materializing them all at once. Streaming takes the same model= and allow_partial= arguments as the materialized projections, so the two APIs stay symmetric.

# Yields dicts
for order in crease.stream("big.xlsx", template, entity="order"):
    pipeline.send(order)

# Yields validated Pydantic instances
for order in crease.stream("big.xlsx", template, entity="order", model=Order):
    pipeline.send(order)

Errors flow through the report, not the iterator

The iterator yields only the happy path. Problems are recorded on the session's report, which can be inspected once the iterator has been consumed:

with crease.open("big.xlsx", template) as session:
    for order in session.stream("order", model=Order, allow_partial=True):
        pipeline.send(order)

    # Check the report only after the loop has consumed the iterator.
    report = session.report()
    if not report.is_valid:
        log.warning(report.errors())
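
The split between iterator and report can be illustrated with a small stand-alone sketch. It does not use crease: `Report`, `stream_valid`, and `parse_order` are hypothetical stand-ins, and the validation rule is invented for the example. The point is only to show how a generator can yield valid records while problems accumulate on a side object that is read after the loop.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator


@dataclass
class Report:
    """Hypothetical stand-in for a session report (not crease's class)."""
    _errors: list = field(default_factory=list)

    def add(self, message: str) -> None:
        self._errors.append(message)

    @property
    def is_valid(self) -> bool:
        return not self._errors

    def errors(self) -> list:
        return list(self._errors)


def parse_order(row: dict) -> dict:
    """Hypothetical validator: raises ValueError for unusable rows."""
    qty = int(row["qty"])  # ValueError on non-numeric strings
    if qty <= 0:
        raise ValueError("qty must be positive")
    return {"id": row["id"], "qty": qty}


def stream_valid(
    rows: Iterable[dict],
    validate: Callable[[dict], Any],
    report: Report,
) -> Iterator[Any]:
    """Yield validated rows; record failures on the report instead of raising."""
    for index, row in enumerate(rows, start=1):
        try:
            record = validate(row)
        except ValueError as exc:
            report.add(f"row {index}: {exc}")
            continue
        yield record


report = Report()
rows = [
    {"id": "A1", "qty": "3"},
    {"id": "A2", "qty": "oops"},
    {"id": "A3", "qty": "0"},
]
orders = list(stream_valid(rows, parse_order, report))

# The report is complete only once the iterator has been exhausted.
if not report.is_valid:
    print(report.errors())
```

Because the generator is lazy, the report fills in as the loop advances, which is why it should be checked after consumption rather than before.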

Implementation note

In v1, crease.stream and Session.stream materialize the result internally and yield from it, so they do not yet reduce peak memory use. True row-by-row streaming via openpyxl's read-only mode is planned as a follow-on once the eager extraction path is proven.
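
To make the distinction concrete, the sketch below contrasts the two strategies over a generic row iterable. It is not crease's implementation; `eager_stream`, `lazy_stream`, `to_record`, and `counting_rows` are hypothetical names. With openpyxl, the row source would typically come from `load_workbook(path, read_only=True)` followed by `worksheet.iter_rows(values_only=True)`; a plain generator stands in here so the example runs without a workbook.

```python
from typing import Any, Iterable, Iterator

Row = tuple  # one spreadsheet row as a tuple of cell values

SAMPLE_ROWS = [("id", "qty"), ("A1", 3), ("A2", 5), ("A3", 7)]


def to_record(header: Row, row: Row) -> dict:
    """Pair header names with cell values."""
    return dict(zip(header, row))


def eager_stream(rows: Iterable[Row]) -> Iterator[dict]:
    """Materialize every record first, then yield from the list (v1-style)."""
    all_rows = list(rows)  # pulls the whole source into memory
    if not all_rows:
        return
    header, *body = all_rows
    records = [to_record(header, row) for row in body]
    yield from records


def lazy_stream(rows: Iterable[Row]) -> Iterator[dict]:
    """Yield one record at a time without holding the full source."""
    iterator = iter(rows)
    header = next(iterator, None)
    if header is None:
        return
    for row in iterator:
        yield to_record(header, row)


def counting_rows(log: list) -> Iterator[Row]:
    """Row source that records each row as it is pulled."""
    for row in SAMPLE_ROWS:
        log.append(row)
        yield row


# Ask each strategy for just the first record and see how much was read.
eager_log: list = []
first_eager = next(eager_stream(counting_rows(eager_log)))

lazy_log: list = []
first_lazy = next(lazy_stream(counting_rows(lazy_log)))

print(len(eager_log), len(lazy_log))
```

Both variants expose the same iterator interface, which is what lets the eager path ship first and be swapped for a lazy one later without changing calling code.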