Projections, Triggers and Schedule

Hello everyone,
I may have missed a crucial setup step, but as far as I can tell, there is a design flaw in how projections interact with schedules.

Let’s assume the following: I have a dataset with a region column. A set of downstream datasets use this dataset as input, each filtering on a subset of regions. So I have set up a projection on the source dataset with a filter on “region”, and I have put the projection on a schedule so that it builds as soon as the base dataset updates.

Now, all downstream datasets have a trigger on the original dataset, so they start building as soon as the input dataset has a committed transaction. All of these builds would benefit greatly from the projection. However, since the projection and the downstream datasets are triggered in parallel, the projection does not provide any benefit: it is still out of date when the downstream builds start.

Request: Could we tie the schedule “trigger” to completion of the projection build instead of the dataset transaction commit? That way we can ensure the downstream datasets are triggered only once the projection has been updated. This would make our pipelines much more efficient and put projections to much better use in schedules that are transaction-triggered rather than time-triggered.

Also, a smaller request: in the build of a dataset, could the UI show whether the build is hitting a projection or the dataset itself? That would make debugging and troubleshooting much more efficient. (This could be similar to how “incremental” vs. “non-incremental” build status are shown …


Hey @SimonH - thanks for your message!

A workaround is available for these cases: you can trigger one schedule when another schedule finishes successfully.

So in this case your setup would look like this:

dataset A → projection A’ → downstream target datasets

Schedule 1: Build projection A’ when dataset A updates

Schedules 2…: Build downstream target dataset X when Schedule 1 succeeds (this can be done for each of the downstream target datasets).

This should solve the parallel schedules problem you mention.
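
To make the chaining concrete, here is a toy Python sketch of the setup above. It is not the Foundry Scheduler API: `ToyScheduler`, the event names, and the schedule names are all made up for illustration, and it only models the ordering, assuming each schedule emits a "succeeded" event when it finishes.

```python
from collections import defaultdict


class ToyScheduler:
    """Toy model of event-triggered schedules (hypothetical, not a Foundry API)."""

    def __init__(self):
        self.listeners = defaultdict(list)  # event name -> schedules to run
        self.log = []                       # order in which schedules ran

    def on(self, event, schedule_name, emits):
        # Run `schedule_name` when `event` fires, then emit `emits`.
        self.listeners[event].append((schedule_name, emits))

    def fire(self, event):
        for schedule_name, emits in self.listeners[event]:
            self.log.append(schedule_name)  # "run" the schedule
            self.fire(emits)                # announce that it succeeded


scheduler = ToyScheduler()

# Schedule 1: build projection A' when dataset A updates.
scheduler.on("dataset_A_updated", "build_projection_A",
             emits="schedule_1_succeeded")

# Schedules 2...: build each downstream dataset when Schedule 1 succeeds.
for target in ("downstream_X", "downstream_Y"):
    scheduler.on("schedule_1_succeeded", f"build_{target}",
                 emits=f"{target}_built")

scheduler.fire("dataset_A_updated")
print(scheduler.log)
# ['build_projection_A', 'build_downstream_X', 'build_downstream_Y']
```

Because the downstream schedules listen for Schedule 1 succeeding rather than for the dataset A update, they can only start after the projection build has finished.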

For more information on event-based scheduled triggers, please refer to the Scheduler documentation here


Okay, I will give it a try. It sounds reasonable, though not really convenient …

No worries!

Another option worth considering is to use the Job succeeded event (instead of the Schedule ran successfully event) on the projection.

Documentation for this as well can be found in the link above.
