Faster previews in Pipeline Builder

krandoing · June 25, 2024, 6:52am

Pipeline Builder previews are now significantly faster due to caching enhancements. Internal tests show computation times dropping from 40 seconds to 1 second, a 40x speedup for resource-intensive previews. This feature is now generally available on all enrollments.

Faster previews powered by caching improvements

Caching improvements now allow nodes in Pipeline Builder to use cached, or “stored,” results from upstream nodes that have already been previewed. This allows downstream nodes to skip recomputation and display your data previews quickly.

Pipeline Builder’s improved caching features include:

Decreased redundant computations: Previews in Pipeline Builder now only compute additional nodes when previewing downstream of cached nodes.
Efficient caching: Computationally expensive nodes, such as joins and Use LLM nodes, are now proactively cached, allowing downstream nodes to avoid repetitive and time-consuming computations.

Users will benefit from snappy previews and decreased processing costs, saving time and resources when working downstream of expensive nodes. A lightning bolt icon will appear on previews that use cached upstream preview results.

Improved node caching

Before this change, if you previewed a node twice without any logic changes, Pipeline Builder would cache the results from the first preview and reuse them for the second. However, if you then previewed a downstream node, that preview had no access to the cached results of other nodes, so every upstream node had to be recomputed from scratch.

Now, all node previews in Pipeline Builder can make use of cached results, so only additional downstream nodes need to be computed.

Take the following example pipeline: Dataset → A → B → C → D.

If you preview C, nodes Dataset → A → B → C are computed. Before this change, if you then preview D:

Nodes Dataset → A → B → C are recomputed, in addition to D.

After this change, if you preview D after C:

Only nodes C → D are computed, because node D can now use cached results from node C.
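The example above can be sketched as a toy model in Python. This is only an illustration of the general idea of reusing upstream preview results, not how Pipeline Builder is implemented: the `Node` and `PreviewSession` classes, the transforms, and the cache keyed by node name are all hypothetical. In this simplified model, the second preview reads node C from the cache and runs only the new downstream step.

```python
from typing import Callable, Dict, List, Optional

Rows = List[int]


class Node:
    """One step in a linear pipeline (hypothetical model, not a Pipeline Builder API)."""

    def __init__(self, name: str, transform: Callable[[Rows], Rows],
                 parent: Optional["Node"] = None) -> None:
        self.name = name
        self.transform = transform
        self.parent = parent


class PreviewSession:
    """Keeps preview results so downstream previews can reuse them."""

    def __init__(self) -> None:
        self.cache: Dict[str, Rows] = {}   # node name -> cached preview rows
        self.computed: List[str] = []      # nodes computed by the latest preview

    def preview(self, node: Node) -> List[str]:
        """Preview a node and return the names of the nodes that were computed."""
        self.computed = []
        self._evaluate(node)
        return self.computed

    def _evaluate(self, node: Node) -> Rows:
        # Reuse a cached result if this node was already previewed or computed.
        if node.name in self.cache:
            return self.cache[node.name]
        upstream = self._evaluate(node.parent) if node.parent else []
        rows = node.transform(upstream)
        self.cache[node.name] = rows
        self.computed.append(node.name)
        return rows


# Build the chain Dataset -> A -> B -> C -> D with arbitrary toy transforms.
dataset = Node("Dataset", lambda _: list(range(5)))
a = Node("A", lambda rows: [r + 1 for r in rows], dataset)
b = Node("B", lambda rows: [r * 2 for r in rows], a)
c = Node("C", lambda rows: [r for r in rows if r > 2], b)
d = Node("D", lambda rows: [r - 1 for r in rows], c)

session = PreviewSession()
first = session.preview(c)    # nothing cached yet, so the whole chain runs
second = session.preview(d)   # C is cached, so only D runs
print(first)   # ['Dataset', 'A', 'B', 'C']
print(second)  # ['D']
```

A real system would also need to invalidate cached results when a node's logic changes; this sketch leaves that out for brevity.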

Note that all node previews compute up to 500 rows. The following will not benefit from this feature: operations that change row counts, joins, aggregations, and changes in logic.

Learn more about Pipeline Builder previews.