I have an incremental transform to which we recently added a new column. An incremental lightweight transform consumes it as a snapshot input and reads it with polars in lazy mode. This worked fine until we added the new column; since then it fails with the error:
polars.exceptions.SchemaError: extra column in file outside of expected schema: {col name}, hint: specify this column in the schema, or pass extra_columns='ignore' in scan options.
extra_columns is not exposed in the API, so I am not sure how to fix it.
We cannot reprocess the data and run it as a snapshot.
Hey, we managed to solve it by adding the following code:
import polars as pl
from functools import wraps

_original_scan_parquet = pl.scan_parquet

@wraps(_original_scan_parquet)
def _patched_scan_parquet(*args, **kwargs):
    # Set default for extra_columns if not explicitly provided
    if "extra_columns" not in kwargs:
        kwargs["extra_columns"] = "ignore"
        kwargs["missing_columns"] = "insert"
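The snippet as posted stops before the wrapper calls the original function and before it is assigned back to pl.scan_parquet, and both steps are needed for the patch to take effect. Here is a minimal sketch of the same idea in a generic form. The names with_default_kwargs, fake_scan and patched_scan are hypothetical and only for illustration; fake_scan stands in for pl.scan_parquet so the example runs without polars. Unlike the snippet above, each option is defaulted independently here.

```python
from functools import wraps


def with_default_kwargs(func, **defaults):
    """Wrap func so each default is applied only when the caller omits it."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        for name, value in defaults.items():
            # An explicitly passed value always wins over the default.
            kwargs.setdefault(name, value)
        # Delegate to the original function; without this the wrapper returns None.
        return func(*args, **kwargs)

    return wrapper


def fake_scan(path, **options):
    # Hypothetical stand-in for pl.scan_parquet: echoes the options it received.
    return {"path": path, **options}


patched_scan = with_default_kwargs(
    fake_scan, extra_columns="ignore", missing_columns="insert"
)
```

With polars, the equivalent would presumably be pl.scan_parquet = with_default_kwargs(pl.scan_parquet, extra_columns="ignore", missing_columns="insert"), executed before the transform reads its input. This assumes the installed polars version accepts both keyword arguments in scan_parquet, which the error message suggests for extra_columns. Since this replaces a library function globally, it is probably best kept in one place and removed once the option is exposed by the API.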