Synchronous scheduling for multiple targets

Hey Community, I have a schedule that triggers when dataset 1 is updated. I want to trigger dataset 2 to build, and then, after this is complete, have dataset 3 build. Should I do this in two steps, or will one trigger followed by these two targets have the desired effect?

If the flow is Dataset1 → Dataset2 → Dataset3, you can create a single schedule for Dataset3 which triggers when Dataset1 updates, and then enable “include upstream resources”.
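
To see why a single schedule can be enough, here is a toy sketch of the idea. It is not Foundry's scheduler or API: the `LINEAGE` mapping and the `build_order` helper are made up for illustration, and it assumes a build resolves its target plus upstream resources in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lineage: each dataset maps to the datasets it reads from.
LINEAGE = {
    "Dataset1": set(),
    "Dataset2": {"Dataset1"},
    "Dataset3": {"Dataset2"},
}


def build_order(target, trigger, lineage):
    """Return the datasets to build, in dependency order, once `trigger` has updated.

    Walks upstream from `target` and stops at `trigger`, which has already
    been updated and therefore is not rebuilt.
    """
    to_build = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name == trigger or name in to_build:
            continue
        parents = lineage[name]
        # Keep only the parents that still need building.
        to_build[name] = {p for p in parents if p != trigger}
        stack.extend(parents)
    # static_order() yields each dataset after all of its predecessors.
    return list(TopologicalSorter(to_build).static_order())


order = build_order("Dataset3", "Dataset1", LINEAGE)
print(order)
```

In this model, targeting only Dataset3 still places Dataset2 ahead of it, because the order follows from the lineage rather than from the order in which targets are listed.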

There are a few things to consider here:

  • While most experienced FDEs would easily spot how this is connected, a more autopoietic setup would be to create a separate schedule for each dataset.
  • Which role does Dataset2 play? Is it feeding other transforms, or is it only used for Dataset3 – and if so, should these transforms be merged into one (Dataset1 → Dataset3)?
  • If it feeds into other datasets, how often do those need to be rebuilt, and how fresh does their input data need to be?
  • If these datasets are very large, it can make a lot of sense to try and coordinate scheduling and building, since you would otherwise end up spending a lot of compute.

Thank you @jakehop! Very helpful.