Hi, I’m trying to combine incremental and non-incremental functionality in a code repo and am unsure how to approach it. I’d like advice on implementing a series of transforms inside a transform generator, where each input dataset gets its own new transformed output dataset. On top of that, I want a single lookup dataset that every transform appends to. The lookup dataset should behave incrementally: it starts as a new dataset at the beginning of the loop of transforms and is then appended to by each transform in turn.
Caveats:
- You can’t have multiple transforms writing to the same dataset, so even using incremental decorators won’t work here.
- Using **inputs to bring multiple inputs into a single transform and process all the files there would make it hard to avoid OOMs, as the datasets are billions of rows.
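
To make the shape of the problem concrete, here is a rough sketch in plain Python. It is not the real transforms API: `Registry`, `make_transforms`, `try_register`, and the dataset paths are all hypothetical stand-ins I made up to model the "one writer per dataset" rule. It only shows what the generator would produce and where the first caveat bites.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Registry:
    """Hypothetical stand-in for the build system: one writer per dataset."""
    writers: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, outputs: List[str]) -> None:
        # Reject the whole transform if any of its outputs already has a writer.
        for path in outputs:
            if path in self.writers:
                raise ValueError(
                    f"{name}: {path} is already written by {self.writers[path]}"
                )
        for path in outputs:
            self.writers[path] = name


def make_transforms(input_paths: List[str], lookup_path: str) -> List[Tuple[str, List[str]]]:
    """Generate one (name, outputs) spec per input dataset.

    Every spec declares its own transformed output plus the shared lookup.
    """
    specs = []
    for path in input_paths:
        name = f"transform_{path.rsplit('/', 1)[-1]}"
        specs.append((name, [f"{path}_transformed", lookup_path]))
    return specs


def try_register(specs: List[Tuple[str, List[str]]]) -> Tuple[Registry, List[str]]:
    """Register each spec in order, collecting conflicts instead of stopping."""
    registry = Registry()
    errors = []
    for name, outputs in specs:
        try:
            registry.register(name, outputs)
        except ValueError as exc:
            errors.append(str(exc))
    return registry, errors


# Hypothetical dataset paths, for illustration only.
inputs = ["/data/a", "/data/b", "/data/c"]
registry, errors = try_register(make_transforms(inputs, "/data/lookup"))
for message in errors:
    print(message)
```

In this toy model only the first generated transform registers successfully. Every later one is rejected because it also declares the shared lookup dataset as an output, which is the conflict I’m trying to design around.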