Data from parallel branches disappears in Union/Combine after Deploy, but appears in Preview

Hello, community!

I’m facing a challenge with a data pipeline I’ve built and was hoping someone might have encountered a similar issue or could offer some insights.

Pipeline Context

I am building a pipeline that processes persona data, applies 9 different analytical models in parallel, and then joins all the results into a single final audience dataset.

The flow is essentially as follows:

  1. Input and Initial Transformation: I ingest the data and perform some initial transformations.

  2. Parallel Processing: The flow is split into 9 parallel branches, one for each model (from “Model 1” to “Model 9”). Each branch contains its own processing logic.

  3. Final Union: At the end, I use a “Combine datasets by name” node (a type of Union) to merge the results from all 9 branches into a final dataset.
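
For readers less familiar with this kind of node, here is a rough sketch of what a union by column name generally does. It is plain Python, not Pipeline Builder's implementation: the function `union_by_name` and the columns `persona_id`, `score`, and `model` are hypothetical names for illustration. Whether the real node pads missing columns with nulls or rejects mismatched schemas should be checked against the documentation.

```python
from typing import Any

Row = dict[str, Any]


def union_by_name(datasets: list[list[Row]]) -> list[Row]:
    """Stack rows from all datasets, matching columns by name (illustrative only)."""
    # Collect every column name, preserving first-seen order.
    columns: list[str] = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit each row with the full column set; absent columns become None.
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]


# Hypothetical branch outputs.
model_1 = [{"persona_id": "a", "score": 0.9, "model": "Model 1"}]
model_8 = [{"persona_id": "b", "model": "Model 8"}]  # no "score" column
model_9 = []  # a branch that produced no rows contributes nothing

combined = union_by_name([model_1, model_8, model_9])
```

The point of the sketch is that a union only stacks whatever rows each input actually delivers, so an input that is empty at build time simply vanishes from the result without raising an error.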

The Problem

The strange behavior occurs at the final union step.

  1. In Preview Mode: When I click on the “Combine datasets by name” node and run a preview of the data, the result looks correct. I can see the aggregated data from several of my models, as shown in the image below.

  2. After Deployment: However, after I deploy and run the full pipeline, the final generated dataset is incorrect. Specifically, the data from the “Model 8” and “Model 9” branches is completely missing. It’s simply not in the final output.

What I’ve Checked So Far

  • The configuration of the nodes for Models 8 and 9 appears identical to the other branches that are working.

  • There are no obvious filters in these branches that would eliminate all data during a full run.

  • I suspect the issue might be related to how the Union node behaves during a full deployment (with the entire dataset) versus in preview mode (which typically uses a data sample).

Hmm, can you try adding a checkpoint on the last green node in your screenshot? You can also materialize the dataset to help you debug, but I’m mainly curious whether rows from Model 8 and 9 are definitely present at that point.

https://www.palantir.com/docs/foundry/pipeline-builder/management-checkpoints/
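
Once the checkpointed or materialized dataset is available, a per-branch row count makes the check concrete. The sketch below is plain Python over exported rows and assumes each row carries a column identifying its source branch. The column name `model`, the helper `rows_per_model`, and the sample rows are all hypothetical, so adjust them to your actual schema.

```python
from collections import Counter


def rows_per_model(rows, expected_models, model_column="model"):
    """Count rows per expected branch label; branches with no rows get 0."""
    counts = Counter(row.get(model_column) for row in rows)
    return {name: counts.get(name, 0) for name in expected_models}


# The nine branch labels from the pipeline description.
expected = [f"Model {i}" for i in range(1, 10)]

# Hypothetical rows exported from the materialized dataset.
sample_output = [
    {"persona_id": "a", "model": "Model 1"},
    {"persona_id": "b", "model": "Model 1"},
    {"persona_id": "c", "model": "Model 7"},
]

counts = rows_per_model(sample_output, expected)
missing = [name for name, n in counts.items() if n == 0]
print(missing)
```

Running the same count on the preview sample and on the deployed output would show whether Models 8 and 9 are already empty before the union or only drop out after it.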