Hey Community, I have a schedule that triggers when dataset 1 is updated. I want it to trigger dataset 2 to build and then, once that build is complete, have dataset 3 build. Should I do this in two steps, or will one trigger followed by these two targets have the desired effect?
If the flow is Dataset1 → Dataset2 → Dataset3, you can create a single schedule for Dataset3 which triggers when Dataset1 updates, and then enable “include upstream resources”.
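As a rough mental model of what that single schedule does (plain Python, not a Foundry API; the `LINEAGE` dict and `build_order` helper are hypothetical names made up for illustration), you can think of it as walking upstream from the target until it reaches the trigger, then building whatever it found in dependency order:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "Dataset1": set(),
    "Dataset2": {"Dataset1"},
    "Dataset3": {"Dataset2"},
}

def build_order(target, trigger, lineage):
    """Return the datasets to build, in order, when `trigger` updates.

    Walks upstream from `target` and stops at `trigger`, which has
    just been updated and so is not rebuilt by this schedule.
    """
    needed = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node == trigger or node in needed:
            continue
        needed.add(node)
        stack.extend(lineage.get(node, ()))
    # Keep only the dependencies between datasets that need building.
    subgraph = {n: lineage.get(n, set()) & needed for n in needed}
    return list(TopologicalSorter(subgraph).static_order())

print(build_order("Dataset3", "Dataset1", LINEAGE))
# → ['Dataset2', 'Dataset3']
```

In this toy model Dataset2 is always built before Dataset3, which is the ordering asked for in the question. How the real scheduler resolves and orders builds is up to Foundry, so treat this only as an illustration of the idea.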
There are a few things to consider here:
- While most experienced FDEs would easily spot how this is connected, a more self-explanatory setup would be to create a separate schedule for each dataset.
- Which role does Dataset2 play? Is it feeding other transforms, or is it only used for Dataset3 – and if so, should these transforms be merged into one (Dataset1 → Dataset3)?
- If it feeds into other datasets, when do these need to be updated, and when do they need updated data?
- If these datasets are very large, it can make a lot of sense to try and coordinate scheduling and building, since you would otherwise end up spending a lot of compute.
Thank you @jakehop! Very helpful.