Hello all,
Is there a way to dynamically switch from a lightweight transform to a Spark one? For example, suppose we have an incremental transform that can run every day as a lightweight transform. If we want to snapshot the dataset and it is huge (~500 GB), the snapshot should run in Spark. Can the same code support both, switching from one to the other depending on the kind of build?
Thank you,
Soufiane
Not a direct answer: there is a concept of batched incremental builds, where only a limited number of transactions are processed in a given build, and the window slides forward build after build.
This directly addresses the “big snapshot” problem: https://www.palantir.com/docs/foundry/transforms-python-spark/incremental-transaction-limits
To my knowledge, this is only available in Spark for now, but it would be a good feature to have in lightweight transforms too.
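To make the sliding behaviour concrete, here is a toy model in plain Python of how a backlog of transactions would be split across successive builds. It is only an illustration of the idea, not the Foundry API: the function name `batched_builds`, the `transaction_limit` parameter as used here, and the `txn-N` identifiers are all made up for this sketch. The real configuration is described on the documentation page linked above.

```python
from typing import Iterator, List, Sequence


def batched_builds(
    transactions: Sequence[str], transaction_limit: int
) -> Iterator[List[str]]:
    """Yield, build after build, the transactions each build would read.

    Hypothetical helper: it models the sliding window only and does not
    call any Foundry code.
    """
    if transaction_limit < 1:
        raise ValueError("transaction_limit must be at least 1")

    cursor = 0  # index of the first transaction not yet processed
    while cursor < len(transactions):
        # Each build reads at most `transaction_limit` unprocessed transactions.
        batch = list(transactions[cursor:cursor + transaction_limit])
        yield batch
        # The next build starts where this one stopped.
        cursor += len(batch)


if __name__ == "__main__":
    # Seven pending transactions, processed at most three per build.
    backlog = [f"txn-{i}" for i in range(1, 8)]
    for build_number, batch in enumerate(
        batched_builds(backlog, transaction_limit=3), start=1
    ):
        print(f"build {build_number}: {batch}")
```

With a limit of 3 and 7 pending transactions, the backlog is cleared in three builds, and no single build has to read the whole dataset at once.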
Second partial answer: in Pipeline Builder, it is possible to switch from Spark to lightweight in one click.
I’m not aware of anything equivalent in Code Repositories.