Incremental Lightweight Transform Bug/Issues and Feature Request

Hello! I am working with Lightweight transforms in a Python code repository. I am using @incremental with two input datasets, both read as snapshot inputs, and I need to read the previous version of the output dataset for the transform I’m building. I have run into two issues and have one feature request:

1. There is a platform bug when the transform runs as a snapshot (that is, when it hasn’t been built before or the semantic_version has changed), which produces this error:
A 400 Client Error : Bad Request error occurred calling https://localhost:8188/foundry-data-sidecar/api/datasets/dataset_name/downloadTableAsFilesV2 . ErrorName: FoundryDataSidecar:ReadingFromOutputWithIncrementalTransformRunningAsSnapshot . Please review and contact support if the problem persists.

Once the dataset has been built and has an output, the error does not occur (I tested this multiple times). That makes sense, since downloadTableAsFilesV2 then has something to download. But the whole point of providing a schema when reading a previous output is to cover the case where the output has no data yet (that is, when the transform is running as a snapshot). This is a bug that needs to be fixed.
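Until this is fixed, one possible workaround is to skip the read entirely on snapshot runs. This is only a minimal sketch: previous_or_empty, read_previous, and make_empty are hypothetical names, and it assumes the transform has some way to tell whether it is running incrementally (for example a flag on the context object, which I have not verified for Lightweight transforms).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def previous_or_empty(
    is_incremental: bool,
    read_previous: Callable[[], T],
    make_empty: Callable[[], T],
) -> T:
    """Return the previous output on incremental runs, else an empty stand-in.

    All three parameters are hypothetical and supplied by the caller:
    - is_incremental: whether this build is running incrementally
    - read_previous:  a zero-argument callable that reads the previous output
    - make_empty:     a zero-argument callable that builds an empty frame
                      with the expected columns
    """
    if not is_incremental:
        # Snapshot run: never attempt the read that triggers the 400 error.
        return make_empty()
    return read_previous()
```

In a transform, read_previous could wrap the mode='previous' read, and make_empty could build an empty frame with the same columns as the schema.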

2. To read the previous version of a dataset, you need to provide a schema (as with incremental outputs in Spark transforms). It isn’t documented anywhere that the schema must be of the FoundryFieldSchema type, which is built from FoundryFieldColumn and FoundryFieldType, much like a Spark schema. The error message states: ValueError: User-provided schema is currently only supported if of FoundryFieldSchema type, but there is no documentation or other indication of how to import or use FoundryFieldSchema. I was thankfully able to figure this out myself, but it really needs to be documented for other users, as it was a real pain to find this import statement and work out how to use it:

from foundry_data_sidecar_api.foundrydatasidecar_api import (
    FoundryFieldColumn,
    FoundryFieldType,
    FoundryFieldSchema,
)

Here’s an example of how to build it:
schema = FoundryFieldSchema([
    FoundryFieldColumn("col1", FoundryFieldType.STRING),
    FoundryFieldColumn("col2", FoundryFieldType.DATE),
])
dataset.polars(lazy=True, mode='previous', schema=schema)

And here are all the possible data types:
FoundryFieldType.ARRAY
FoundryFieldType.DECIMAL
FoundryFieldType.MAP
FoundryFieldType.STRUCT
FoundryFieldType.LONG
FoundryFieldType.BINARY
FoundryFieldType.BOOLEAN
FoundryFieldType.BYTE
FoundryFieldType.DATE
FoundryFieldType.DOUBLE
FoundryFieldType.FLOAT
FoundryFieldType.INTEGER
FoundryFieldType.SHORT
FoundryFieldType.STRING
FoundryFieldType.TIMESTAMP
FoundryFieldType.UNKNOWN
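For wider tables it can help to build the schema from a plain mapping of column names to type names. The following is a rough sketch, not an official API: build_schema is a hypothetical helper, and it assumes the constructor shapes from my example above (a column takes a name and a type, and the schema takes a list of columns). The classes are passed in as arguments so the helper itself does not depend on the sidecar package.

```python
from typing import Any, Mapping


def build_schema(
    columns: Mapping[str, str],
    schema_cls: Any,
    column_cls: Any,
    type_enum: Any,
) -> Any:
    """Build a schema from {column name: type name}, e.g. {"col1": "STRING"}.

    Assumes column_cls(name, field_type) and schema_cls(list_of_columns),
    as in the example above. Type names are matched case-insensitively
    against the attributes of type_enum.
    """
    fields = []
    for name, type_name in columns.items():
        try:
            field_type = getattr(type_enum, type_name.upper())
        except AttributeError:
            raise ValueError(
                f"Unknown field type {type_name!r} for column {name!r}"
            ) from None
        fields.append(column_cls(name, field_type))
    return schema_cls(fields)


# Hypothetical usage with the imports shown earlier:
# schema = build_schema(
#     {"col1": "STRING", "col2": "DATE"},
#     FoundryFieldSchema, FoundryFieldColumn, FoundryFieldType,
# )
```

A misspelled type name then fails with a clear ValueError in the transform code rather than at read time.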

3. Lightweight transforms do not support previewing with incremental inputs/outputs, so the only way I can test code is to build it, which is time-consuming and wastes compute resources.


Agree on 1 and 2. On 3, incremental preview with Lightweight transforms is supported in VS Code when the sampleless preview is used; see the table in the docs:

https://www.palantir.com/docs/foundry/palantir-extension-for-visual-studio-code/transforms-preview/

This was recently added.


Hey, thanks for the report.

We are working on a fix to support reading from incremental outputs on the initial build. I will make sure that we track the documentation changes for user-specified schemas in tandem, as we were not aware of this documentation gap before. We will also expose the necessary schema types as part of the Lightweight APIs.

  • Ted