Understanding how "snapshot only new rows" works

hsahni · January 13, 2025, 8:02pm

Hi! I have an incremental PB question. If I configure an output with the write mode “snapshot only new rows” and hit build twice (without the input data changing), shouldn’t I expect an empty dataset on the second run? I’m asking because I see the same dataset after runs 1 and 2, so I’m probably misunderstanding something.

My input data is updated via snapshot transactions, if it matters, and I set the anti-join primary key to be all columns.
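As a rough mental model (an assumption for illustration, not Pipeline Builder's documented implementation), “snapshot only new rows” behaves like a left anti-join: each build keeps only the computed rows whose key does not already appear among rows written by an earlier build. The plain-Python sketch below uses hypothetical names (`snapshot_only_new_rows`, `key_cols`) and shows why, with unchanged input and the key set to all columns, the second build should come out empty.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def key_of(row: Row, key_cols: Sequence[str]) -> tuple:
    """Project a row onto its key columns as a hashable tuple."""
    return tuple(row[c] for c in key_cols)


def snapshot_only_new_rows(
    computed: List[Row], previously_written: List[Row], key_cols: Sequence[str]
) -> List[Row]:
    """Hypothetical model: left anti-join of this build's rows against earlier output.

    Returns only rows whose key tuple is absent from every previously written row.
    """
    seen = {key_of(r, key_cols) for r in previously_written}
    return [r for r in computed if key_of(r, key_cols) not in seen]


# Two builds over unchanged input, with the key set to all columns.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
cols = ["id", "name"]
build1 = snapshot_only_new_rows(rows, [], cols)
build2 = snapshot_only_new_rows(rows, build1, cols)
print(len(build1), len(build2))  # 2 0
```

This matches the behavior expected in the question; a real engine would use a distributed hash join rather than an in-memory set.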

Thanks!

david · January 13, 2025, 8:09pm

A couple of questions: first, is there any nondeterminism in the pipeline (e.g., UUID creation, LLM calls)? Second, if you open the output dataset and look at the latest transaction, does it show as having completed successfully?
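Nondeterminism matters because, with the anti-join key set to all columns, any column whose value changes between builds makes every row look new. The sketch below (hypothetical helper names, same anti-join model as an assumption) simulates a per-build UUID column: the second build over identical source data still emits every row.

```python
import uuid
from typing import Dict, List, Sequence

Row = Dict[str, object]


def add_uuid(rows: List[Row]) -> List[Row]:
    """Simulate a non-deterministic step: a fresh UUID per row on every build."""
    return [{**r, "row_uuid": str(uuid.uuid4())} for r in rows]


def new_rows(computed: List[Row], previous: List[Row], key_cols: Sequence[str]) -> List[Row]:
    """Hypothetical anti-join: keep rows whose key tuple was not written before."""
    seen = {tuple(r[c] for c in key_cols) for r in previous}
    return [r for r in computed if tuple(r[c] for c in key_cols) not in seen]


source = [{"id": 1}, {"id": 2}]
cols = ["id", "row_uuid"]  # key = all output columns, including the UUID

build1 = new_rows(add_uuid(source), [], cols)
build2 = new_rows(add_uuid(source), build1, cols)
print(len(build1), len(build2))  # 2 2: every row looks new because its UUID changed
```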

hsahni · January 13, 2025, 9:11pm

No nondeterminism!
The latest transaction does appear to have completed successfully.

I just tried with notional data (2 rows of unchanging, snapshotted data) and it works as expected: I see 0 rows for the second build.

However, with the much larger dataset (174M rows, 133 columns), I do not get 0 rows; instead, I see all the rows again after the second build.
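One hypothesis worth checking (not confirmed anywhere in this thread): with 133 key columns, a single null in any key column might be enough to keep a row from ever matching. In SQL-style engines such as Spark, a plain equality join does not treat NULL = NULL as true, so an anti-join keyed on all columns would keep every row that contains a null, while a tiny null-free test dataset would behave as expected. The sketch below models that behavior; whether Pipeline Builder's anti-join uses null-safe equality is an assumption you would need to verify, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]


def sql_equal(a: object, b: object) -> bool:
    """SQL-style equality: any comparison involving NULL (None) is not true."""
    return a is not None and b is not None and a == b


def new_rows_sql(computed: List[Row], previous: List[Row], key_cols: Sequence[str]) -> List[Row]:
    """Hypothetical anti-join using SQL-style (not null-safe) key equality.

    A computed row is dropped only if some previous row matches it on every key column.
    """
    def matches(r: Row, p: Row) -> bool:
        return all(sql_equal(r[c], p[c]) for c in key_cols)

    return [r for r in computed if not any(matches(r, p) for p in previous)]


rows = [{"id": 1, "note": "ok"}, {"id": 2, "note": None}]
cols = ["id", "note"]

build1 = new_rows_sql(rows, [], cols)
build2 = new_rows_sql(rows, build1, cols)
print([r["id"] for r in build2])  # [2]: the row with a null key column never matches
```

If this is the cause, restricting the key to columns that are never null (or using null-safe comparison, where available) would be the thing to test.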

system · March 14, 2025, 9:12pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.