Hi all, there are cases where I'd rather not have Pipeline Builder run through unnecessary transforms when I can determine up front whether a condition is met. So far, I filter to the rows where condition x is met and the rows where it isn't, let each set branch off into its own path of transforms, and then union (or join, where applicable) at the end so I end up with the full set of data. But often the split is all-or-nothing (if condition x is met, there will be no rows for "is condition y met"), so I'd only need to compute one path: the one for which the condition is met.
I haven't found a way to achieve this in Pipeline Builder, although it's a fairly simple setup in PySpark code (I'd be glad to have it pointed out to me if it's there!). Does Pipeline Builder already handle this case efficiently (e.g. stop running a path once its source dataset has 0 rows), or is there a best practice here, or an upcoming feature I should keep an eye on?
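For reference, here is roughly the short-circuit I have in mind. This is only a minimal sketch in plain Python, with lists of dicts standing in for DataFrames so that it runs anywhere. The names `split_and_process`, `path_a`, and `path_b` are made up for illustration. In actual PySpark I'd expect the emptiness check to be something like `df.limit(1).count() == 0`, but treat that as an assumption about how one would write it, not as what Pipeline Builder does.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rows = List[Row]


def split_and_process(
    rows: Rows,
    condition: Callable[[Row], bool],
    when_met: Callable[[Rows], Rows],
    when_not_met: Callable[[Rows], Rows],
) -> Rows:
    """Split rows on a condition and run only the branches that have rows."""
    met = [r for r in rows if condition(r)]
    not_met = [r for r in rows if not condition(r)]

    out: Rows = []
    # Skip a branch entirely when it has no input rows,
    # so its (possibly expensive) transforms never run.
    if met:
        out.extend(when_met(met))
    if not_met:
        out.extend(when_not_met(not_met))
    return out  # equivalent to the union of both branches


# Hypothetical transform paths; `calls` records which ones actually ran.
calls: List[str] = []


def path_a(rows: Rows) -> Rows:
    calls.append("a")
    return [{**r, "path": "a"} for r in rows]


def path_b(rows: Rows) -> Rows:
    calls.append("b")
    return [{**r, "path": "b"} for r in rows]


data: Rows = [{"x": 1}, {"x": 2}]
result = split_and_process(data, lambda r: r["x"] > 0, path_a, path_b)
```

Here every row meets the condition, so only `path_a` runs and `path_b` is never called. That is the behaviour I'd like to get (or confirm I already get) in Pipeline Builder.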