I suspect this error occurs because select drops every column that is not explicitly listed, and it does so before the where filter is applied. The where clause then fails because the letter column no longer exists in the dataset.
In the foundry.transforms.Dataset API, operations like select and where are applied lazily.
Why This Happens
select removes unlisted columns: When you call select("column1"), the API retains only column1 in the dataset.
where depends on letter: The where(Column.get("letter") == "a") operation requires the letter column to exist in the dataset.
Order of operations matters: By chaining select before where, you’re effectively trying to filter rows based on a column that has already been removed by the select operation.
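The three points above can be reproduced with a small toy model. This is only a sketch: LazyTable is a hypothetical stand-in written for this answer, not the foundry.transforms API, and its where takes a column name and a value rather than a Column expression. It records steps lazily and runs them in chained order when collect is called.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated dataset (not the Foundry API)."""

    def __init__(self, rows, steps=()):
        self.rows = rows
        self.steps = tuple(steps)  # recorded here, executed only in collect()

    def select(self, *columns):
        return LazyTable(self.rows, self.steps + (("select", columns),))

    def where(self, column, value):
        return LazyTable(self.rows, self.steps + (("where", (column, value)),))

    def collect(self):
        rows = [dict(r) for r in self.rows]
        for kind, arg in self.steps:
            if kind == "select":
                rows = [{c: r[c] for c in arg} for r in rows]
            else:
                column, value = arg
                # Raises KeyError if an earlier select dropped the column.
                rows = [r for r in rows if r[column] == value]
        return rows


rows = [
    {"column1": 1, "letter": "a"},
    {"column1": 2, "letter": "b"},
]
table = LazyTable(rows)

# where before select: the filter still sees "letter".
filter_first = table.where("letter", "a").select("column1").collect()

# select before where: "letter" is gone by the time the filter runs.
try:
    table.select("column1").where("letter", "a").collect()
    select_first_error = None
except KeyError as exc:
    select_first_error = exc
```

In this model, filtering first succeeds and selecting first fails with a missing-column error, which matches the behaviour described in the question.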
Possible Solution
Retain Columns Required for Filtering
If you must select columns early for some reason, make sure the selection also includes the column required for filtering (letter in this case).
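A minimal sketch of that approach follows. It assumes select accepts several column names as positional arguments, which I have not verified against the foundry.transforms documentation. The function name and the "my_dataset" alias in the usage comment are hypothetical.

```python
def column1_where_letter_is_a(dataset, letter_is_a):
    """Select column1 plus the filter column, then apply the filter.

    `dataset` is any object exposing select(...) and where(...) as in the
    question; `letter_is_a` is the filter condition to pass to where.
    """
    # Keep "letter" in the selection so the later filter can still see it.
    return dataset.select("column1", "letter").where(letter_is_a)


# Hypothetical usage with the API from the question (alias is an assumption):
#   from foundry.transforms import Dataset, Column
#   ds = column1_where_letter_is_a(
#       Dataset.get("my_dataset"),
#       Column.get("letter") == "a",
#   )
```

The result still carries the letter column. If you only want column1 in the end, drop letter after reading the data into your dataframe library of choice.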
It would be nice to have predicate pushdown in the Dataset API so that it handles any order of operations and does the filtering itself, as Spark and the other tools listed here do: https://sites.google.com/view/raybellwaves/blog/what-data-processing-tool-should-i-use
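To make the wish concrete, here is a toy illustration of the reordering idea. It is not how Spark or the Dataset API work internally. The plan format and both function names are hypothetical, and the reordering is only safe for plans made of plain column selections and equality filters on source columns.

```python
def push_down_filters(plan):
    """Move every ("where", ...) step ahead of the other steps, keeping relative order."""
    filters = [step for step in plan if step[0] == "where"]
    others = [step for step in plan if step[0] != "where"]
    return filters + others


def run(plan, rows):
    """Execute a plan of ("select", columns) and ("where", (column, value)) steps."""
    for kind, arg in plan:
        if kind == "select":
            rows = [{c: r[c] for c in arg} for r in rows]
        else:
            column, value = arg
            rows = [r for r in rows if r[column] == value]
    return rows


rows = [
    {"column1": 1, "letter": "a"},
    {"column1": 2, "letter": "b"},
]

# The plan as the user chained it: select first, then filter on a dropped column.
plan = [("select", ("column1",)), ("where", ("letter", "a"))]

# After reordering, the filter runs while "letter" is still present.
optimized = push_down_filters(plan)
result = run(optimized, rows)
```

With the filter moved ahead of the selection, the plan runs without the missing-column error regardless of the order in which the calls were chained.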