Multiple append outputs in Builder changes are not independent

rponnekanti · May 28, 2024, 6:19pm

Seeing some interesting behavior with output datasets that use append mode. We have a pipeline with 4 dataset outputs, all of them append type. On May 13, I made a schema change to 1 of the outputs and deployed only that output, which ran as a Snapshot because of the schema change. All 4 outputs are included in the same schedule, which builds once a day. The next time all 4 of them were built, the changed dataset ran as an Append as expected, but the other 3 ran as a Snapshot (erasing all the history that had been built up). The following day, when the schedule ran again, they all ran as Appends. It doesn’t seem like correct behavior for a change to one output to affect the other datasets in the pipeline.

rponnekanti · May 28, 2024, 6:20pm

This is currently expected behavior. There’s no way to independently replay outputs. Even deploying only a single output in a pipeline will still create snapshots for the rest.

My guess is that the best practice here is to have only one append transaction output per Builder pipeline.

taylor · May 31, 2024, 3:16pm

Were the output datasets all in the same job group, or had you made different job groups? My intuition would be that different job groups should allow one dataset to run as a snapshot and another as an append, but a single job group would have to run either as a snapshot transaction or incrementally.
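
To make the timeline in this thread easier to reason about, here is a toy Python model of the behavior described above. It is only a sketch and not Foundry's implementation or API: the `JobGroup` class, its methods, and the output names `a` to `d` are hypothetical, and the rule that a replay-forcing deploy marks every output in the same job group is an assumption drawn from the reports above. Whether separate job groups actually isolate outputs is the intuition raised in the last post, not something confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class JobGroup:
    """Hypothetical model: outputs in one job group share replay state."""
    outputs: list[str]
    pending_snapshot: set[str] = field(default_factory=set)

    def deploy_change(self) -> None:
        # Assumption: a deploy that forces a replay marks every output in the
        # group as needing a snapshot, not only the output that was changed.
        self.pending_snapshot = set(self.outputs)

    def build(self, targets: list[str]) -> dict[str, str]:
        # Each target snapshots on its first build after the deploy,
        # then goes back to appending.
        result = {}
        for name in targets:
            if name in self.pending_snapshot:
                result[name] = "SNAPSHOT"
                self.pending_snapshot.discard(name)
            else:
                result[name] = "APPEND"
        return result


# Case 1: all four append outputs in a single job group (the reported setup).
group = JobGroup(["a", "b", "c", "d"])
group.deploy_change()                          # schema change to "a"
deploy_run = group.build(["a"])                # only "a" is deployed and built
next_run = group.build(["a", "b", "c", "d"])   # next scheduled build
later_run = group.build(["a", "b", "c", "d"])  # build on the following day

# Case 2: the changed output in its own job group (unconfirmed intuition).
changed = JobGroup(["a"])
others = JobGroup(["b", "c", "d"])
changed.deploy_change()
isolated_run = {**changed.build(["a"]), **others.build(["b", "c", "d"])}
```

In this model, `next_run` reproduces the surprising result from the first post: `a` appends while `b`, `c`, and `d` snapshot. In the second case only `a` snapshots, which is what separate job groups would give if they really do keep replay state independent.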