Sneak peek into what’s next for Pipeline Builder!

The Pipeline Builder team has been hard at work building the next set of features for you, and we wanted to share a sneak peek of the ones we’re most excited about. These features will be released in the coming months, with opportunities to beta test along the way. Comment below with any thoughts or ideas for what we should do next!

First up, we are introducing histogram charts into your pipeline graph. We’ve seen that many of you hop over to Contour for pipeline debugging or data validation. Soon, you’ll be able to start this validation directly within Pipeline Builder.

Next up, we have reusable custom functions. Right now, you need to manually copy-paste logic across multiple pipelines to reuse it. With this new work, you will be able to save a custom function once and use it across all your pipelines! You will also be able to version your custom functions, so you can upgrade all your pipelines at once.


Finally, we are adding a new lightweight pipeline type for faster processing of smaller-scale datasets. In initial testing, we’ve seen 5x build speedups when moving from batch to lightweight. The technology uses Apache DataFusion in the backend to optimize pipeline processing for in-memory builds.

We look forward to hearing your thoughts and suggestions as you test out these new features!

15 Likes

Pipeline Builder is amazing and I’m very excited for these changes, particularly not needing to break out to a different tool for validation and having in-line histograms!

One thing that would be great to see in Pipeline Builder is improved JSON handling, whether that be expanded schema inference or partial schemas. When dealing with JSON data, I typically find myself breaking out into PySpark before heading back into Pipeline Builder for the rest of the integration.
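To make the "partial schema" idea concrete, here is a minimal sketch in plain Python. It is not Pipeline Builder's or PySpark's API; the field names, the `PARTIAL_SCHEMA` mapping, and both helper functions are hypothetical and only illustrate declaring the few fields you care about and ignoring the rest of the JSON.

```python
import json
from typing import Any, Optional

# Hypothetical partial schema: only the fields we care about, keyed by
# dotted path, with the Python type we expect at that path.
PARTIAL_SCHEMA = {"id": int, "name": str, "address.city": str}


def get_path(obj: Any, dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_partial(raw: str, schema: dict) -> dict:
    """Parse a JSON string and keep only the schema fields.

    Fields that are missing, or whose value has an unexpected type,
    come back as None. Unparseable input yields None for every field.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {path: None for path in schema}
    row = {}
    for path, expected_type in schema.items():
        value = get_path(parsed, path)
        row[path] = value if isinstance(value, expected_type) else None
    return row
```

Calling `extract_partial(raw_json, PARTIAL_SCHEMA)` on each record gives a flat row with just the declared fields, so any extra or deeply nested keys in the source never need a full schema.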

1 Like

Could we please get a timeline on media set outputs from Pipeline Builder? Any chance that converting datasets to media sets has made its way onto a timeline for planned future releases?

All really exciting enhancements! I’ve got two additional ideas to request:

  1. One of the recent changes that our team has really loved is the ability to toggle the transformations view to see pseudocode. This makes pipelines a lot easier to scan when reviewing a proposal, for instance. Something that could take this to the next level would be a way to view the entire pipeline as pseudocode, so we could review all of the transforms at a high level the same way we might do a code review. This would eliminate a lot of clicking around to view different nodes. It could be either a fully text-based rendering of the pipeline or a vertical graph with the pseudocode rendered inside each node (similar to the Quiver graph).

  2. In Pipeline Builder, we can currently see all of the pipeline outputs in the right sidebar, but it would be immensely helpful to have a similar view of all of the inputs to the pipeline. Especially during proposal review, it’s important to ensure that the inputs are all datasets that are being built frequently or that live in a production-type catalog we’re encouraging people to use. Currently, we either have to scan the graph for input nodes (which can be tedious for a large pipeline) or open the pipeline in Monocle to review the inputs there.

1 Like

Hi Joel,

Media set outputs are in progress! The team has been working hard to enable them, and we’re hoping they will be officially released within the next 2 months (hopefully sooner).

Converting datasets to media sets is also in the plans, but we don’t have concrete timelines yet.

Best,
Isy

2 Likes