I have a project with multiple streaming pipelines. To ensure high availability, each pipeline has a schedule that builds every minute: when one streaming build inevitably fails after some number of days or weeks, another build is kicked off immediately. I would like the ability to start and stop (pause) all of these schedules at once, thereby starting or stopping the entire project.
I know that I can add a condition to a schedule using “When multiple time or event conditions are met”, but the extra conditions I can add all detect file changes rather than file contents. Ideally I’d like a build condition such as “every minute AND start.txt exists”, or maybe “every minute AND start.txt has the value ‘yes’”.
Has anyone done something like this before? Thank you
One avenue could be to rely on a common trigger dataset.
Consider three pipelines A, B, and C, each with its own schedule. You could create a simple dummy transform whose output can be anything; an empty output is fine (keep it efficient: a lightweight transform, ideally with no executors). You would then create a schedule D that force-builds that trigger dataset every minute. The dataset then becomes a trigger for A, B, and C, so that each runs when <existing conditions> AND <this new trigger dataset has updated>. The update happens every minute, so the condition is always satisfied unless you pause schedule D.
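For reference, the dummy transform can be tiny. This is a sketch assuming a Python transforms repository; the output path and function name are made up, and it only runs inside Foundry. The one requirement is that the output changes on every build so that downstream “dataset updated” conditions fire:

```python
from transforms.api import transform_df, Output
from pyspark.sql import functions as F

# Heartbeat dataset: schedule D force-builds this every minute, and its
# update becomes the shared trigger condition for schedules A, B, and C.
@transform_df(
    Output("/Project/orchestration/heartbeat"),  # hypothetical path
)
def heartbeat(ctx):
    # A single row stamped with the build time, so the dataset's contents
    # (and transaction) differ on every run.
    return ctx.spark_session.range(1).select(F.current_timestamp().alias("ts"))
```

Pausing schedule D then stops the heartbeat, which in turn stops A, B, and C from triggering, without touching their individual schedules.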
Thank you for the response! This works the way you described. I set up a “Scheduler Orchestrator” pipeline that builds every minute with a dummy transform, so its single output dataset updates every minute. I was then able to use that dataset in my downstream build schedules to say “build every minute AND the dummy dataset has updated”.
This exercise made me realize I actually need something stronger than pausing and unpausing schedules: the ability to stop and start running builds. We are working on an SDK-driven approach to do this.
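In case it helps anyone following along, here is a minimal sketch of what such an SDK-driven stop could look like. The endpoint path, the `cancel_build` helper, and the `FOUNDRY_URL` host are assumptions for illustration, not the documented API; check your stack's orchestration API reference for the real routes and auth requirements.

```python
import urllib.request

FOUNDRY_URL = "https://your-stack.palantirfoundry.com"  # hypothetical host

def cancel_url(base: str, build_rid: str) -> str:
    # Hypothetical route; substitute the real orchestration endpoint
    # from your stack's API documentation.
    return f"{base}/api/v2/orchestration/builds/{build_rid}/cancel"

def cancel_build(build_rid: str, token: str) -> bool:
    """Ask the platform to cancel one running build (hypothetical endpoint)."""
    req = urllib.request.Request(
        cancel_url(FOUNDRY_URL, build_rid),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return 200 <= resp.status < 300

def stop_all(build_rids: list[str], token: str) -> None:
    # "Stopping the project" = cancelling every running build it owns,
    # on top of pausing the heartbeat schedule so nothing restarts.
    for rid in build_rids:
        cancel_build(rid, token)
```

Pairing this with the heartbeat pause gives both halves: pausing prevents new builds, and cancelling tears down the ones already running.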