Handling Incremental Data with Transformations in Pipeline Builder

Hi everyone,

I’m working on a project in Pipeline Builder where I’ve applied several transformations (filters, joins, aggregations, pivots, etc.) and then deployed the result as an output dataset.

The challenge I’m facing is with incremental input data.

  • Since the dataset is very large, I only want to build the pipeline once a week (instead of daily).

  • However, when I do this, the pipeline only applies my transformations to the new incremental rows from that week.

  • This leads to misleading results. For example:

    • Let’s say I initially calculate a yearly average of a column (using snapshot mode).

    • On the next weekly build, the average is computed only on the last 7 days of data, completely ignoring the previous year.

    • As a result, the average is no longer representative (see the sketch just after this list).
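
To make this concrete, here is a minimal PySpark sketch of the behavior I'm describing (the data, column names, and values are invented for illustration):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Toy "full history": a year of data reduced to three rows.
full_history = spark.createDataFrame(
    [("2024-01-01", 10.0), ("2024-06-01", 20.0), ("2024-12-20", 90.0)],
    ["date", "value"],
)

# What an incremental build actually sees: only the last week's rows.
weekly_increment = full_history.filter(F.col("date") >= "2024-12-14")

full_history.agg(F.avg("value")).show()      # 40.0 -- the intended yearly average
weekly_increment.agg(F.avg("value")).show()  # 90.0 -- what the incremental build computes
```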

I could potentially solve this for averages by using rolling calculations, but this problem extends to other transformations (like joins, pivots, and aggregations) where rolling approaches don’t work.

What I need:
Even though I want to build the pipeline only once a week and process only the new rows, I want my transformations to still take into account the entire historical dataset, not just the weekly increment.

Has anyone solved a similar issue, or is there a recommended approach in Palantir for handling this kind of incremental + historical transformation scenario?

Thanks in advance!

Could you split this into two datasets, where one is incremental and one is a snapshot? The snapshot would be the full historical dataset, but it would only compute the average (or other metrics that need the entire dataset). You could then join the two.
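
A rough PySpark sketch of that pattern, outside Pipeline Builder and with made-up names, might look like this:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

full_history = spark.createDataFrame(
    [("a", "2024-01-01", 10.0), ("a", "2024-12-20", 90.0)],
    ["customer_id", "date", "value"],
)
weekly_increment = full_history.filter(F.col("date") >= "2024-12-14")

# Snapshot branch: rebuilt over the FULL history each week, so the
# aggregate stays correct even though it runs less often.
yearly_avg = full_history.groupBy("customer_id").agg(
    F.avg("value").alias("yearly_avg")
)

# Incremental branch: only the new rows, with the historically
# correct aggregate joined back on.
enriched = weekly_increment.join(yearly_avg, on="customer_id", how="left")
enriched.show()  # the new row carries yearly_avg = 50.0
```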

Or: could you have a downstream dataset that looks at the averages produced over the incremental dataset and recalculates them into a single average across all rows?
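
As a hedged sketch of that idea (again with invented names): plain averages don't compose across batches without weights, but if each incremental build writes a partial sum and row count instead of a finished average, a small downstream dataset can combine the batches into the true all-row average:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# One row per incremental build: a partial sum and row count for
# that batch (sums and counts compose across batches; averages don't).
batch_aggregates = spark.createDataFrame(
    [("2024-W50", 300.0, 10), ("2024-W51", 90.0, 1)],
    ["batch", "value_sum", "row_count"],
)

# Downstream dataset: cheap to rebuild, since it scans the small
# batch summaries rather than the raw rows.
overall = batch_aggregates.agg(
    (F.sum("value_sum") / F.sum("row_count")).alias("overall_avg")
)
overall.show()  # (300 + 90) / (10 + 1) = ~35.45
```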
