Query about incremental loads in Pipeline Builder

Debiprasad · July 22, 2024, 2:57pm

Dear Community, could you please advise how we can set up an incremental load that is processed on a daily basis in Pipeline Builder?

I have checked the incremental option along with the output dataset write modes (such as "Append only new rows", "Always append new rows", etc.), but they did not help much.

Any suggestions or links to a guide would be helpful.
Thanks in advance!

smiglani · July 22, 2024, 3:03pm

Hey @Debiprasad ,

Are you trying to create a schedule that processes an incremental pipeline every day?

This may help you - https://www.palantir.com/docs/foundry/building-pipelines/create-schedule/

Once you’ve set your pipeline to be incremental, you can create a schedule that builds your output dataset at whatever frequency you choose.

Debiprasad · July 23, 2024, 10:22am

Thanks for the link, but I just need to know how I can add each day’s incremental load to another dataset and keep adding the incremental data to it on a daily basis.

Any further guidance?
Thanks in advance.

helenq · July 23, 2024, 12:35pm

Hi @Debiprasad, did you set the input to be incremental as well? I would follow the steps here and let us know if you get stuck anywhere specific: https://www.palantir.com/docs/foundry/building-pipelines/create-incremental-pipeline-pb/

Debiprasad · July 23, 2024, 6:33pm

Thank you again for your support here.

To clarify: in the example, the flights dataset is getting new data appended on a daily basis. I want to know how I can keep adding new data to an existing dataset, i.e. what is happening to the flights dataset, not what is happening to filtered_flights.

I should mention that I am a bit new to this tool, so thanks in advance for continuing to explain this.

helenq · July 23, 2024, 7:03pm

Hi @Debiprasad, just to confirm, you want to do this in the Pipeline Builder tool, right? Do you have a screenshot of your pipeline and of what is currently happening that isn’t what you expect?

Debiprasad · July 24, 2024, 4:16pm

Dear @helenq,
I am not able to attach a screenshot; the tool is not allowing me to do so.

Let me try to explain the need. Suppose the TEST dataset is incremental (let’s assume new data is appended to it each day while the old records are kept). My objective is to transfer only the incremental data to TEST_SET, which I’ll use for further processing: each day, after the pipeline runs, TEST_SET should be completely refreshed with only the new data from the TEST dataset.

For that purpose, I also need to know which output dataset write mode I should select for TEST_SET so that it keeps only the new data arriving with each pipeline execution.
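The behavior described above can be sketched in plain Python, as a simulation only and not Foundry or Pipeline Builder code: TEST acts like an append-style output that accumulates every day's batch, while TEST_SET acts like a snapshot (replace) output that holds only the latest batch. The function name run_daily_build and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def run_daily_build(test: List[Row], new_batch: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Simulate one pipeline run (hypothetical helper, not a Foundry API).

    TEST behaves like an append-style output: the new batch is added to
    the rows it already holds. TEST_SET behaves like a snapshot/replace
    output: its previous contents are discarded and it holds only this
    run's new batch.
    """
    updated_test = test + new_batch      # append: old rows are kept
    updated_test_set = list(new_batch)   # replace: only the latest batch
    return updated_test, updated_test_set


if __name__ == "__main__":
    test: List[Row] = []
    test_set: List[Row] = []
    daily_batches = [
        [{"id": 1, "day": "2024-07-22"}, {"id": 2, "day": "2024-07-22"}],
        [{"id": 3, "day": "2024-07-23"}],
    ]
    for batch in daily_batches:
        test, test_set = run_daily_build(test, batch)
        # TEST grows run over run; TEST_SET only ever holds the latest batch
        print(len(test), len(test_set))
```

After the two simulated runs, TEST holds all three rows while TEST_SET holds only the single row from the second day.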

Thanks again for your continued support on this.

helenq · July 24, 2024, 7:58pm

Oh I see. If you want an output dataset that only has the newest rows from your latest build, the easiest thing to do would be to add another output off of TEST’s input and make it a snapshot build instead.

If you want it to be downstream of TEST, you can experiment with the "Snapshot replace" and "Snapshot replace and remove" output write modes in Pipeline Builder.

If none of those work, is there a timestamp column you could filter on to only get the rows associated with the latest timestamp?
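The timestamp filter suggested above amounts to finding the most recent timestamp and keeping only the rows that carry it. A minimal plain-Python sketch of that logic follows; the column name load_ts and the helper name are hypothetical, and in Pipeline Builder the same idea would be expressed with its own filter and aggregation steps.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, object]


def rows_at_latest_timestamp(rows: List[Row], ts_column: str = "load_ts") -> List[Row]:
    """Keep only the rows whose timestamp equals the most recent one.

    `load_ts` is a hypothetical column name; substitute whichever
    timestamp column your dataset actually has.
    """
    if not rows:
        return []
    latest = max(row[ts_column] for row in rows)
    return [row for row in rows if row[ts_column] == latest]


if __name__ == "__main__":
    rows = [
        {"id": 1, "load_ts": datetime(2024, 7, 23)},
        {"id": 2, "load_ts": datetime(2024, 7, 24)},
        {"id": 3, "load_ts": datetime(2024, 7, 24)},
    ]
    # Only the rows loaded on the latest date (ids 2 and 3) are kept
    print([row["id"] for row in rows_at_latest_timestamp(rows)])
```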

shay · May 13, 2025, 2:38pm

Hey @helenq, I’m trying to use "Append only new rows" as the output dataset write mode. I’m filtering records based on a timestamp and trying to append them to an existing dataset. For example, if my output dataset had 100 rows and a build filtered 20 new rows to be appended, the count should become 120, but it ends up at around 20.
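One way to reason about the counts above is to compare the observed row count with what each write behavior would predict: keeping the old rows gives previous + new (100 + 20 = 120), while replacing them gives just the new rows (20). The small sketch below, with a hypothetical helper name, only encodes that arithmetic; it does not explain why Pipeline Builder produced the result it did.

```python
def classify_write(previous_count: int, new_rows: int, observed_count: int) -> str:
    """Compare a post-build row count with what each write behavior predicts.

    append:  previous rows are kept, so the count is previous + new.
    replace: previous rows are dropped, so the count is just the new rows.
    Note: when previous_count is 0 both predictions coincide and
    "append" is returned.
    """
    if observed_count == previous_count + new_rows:
        return "append"
    if observed_count == new_rows:
        return "replace"
    return "unexpected"


if __name__ == "__main__":
    print(classify_write(100, 20, 120))  # matches an append
    print(classify_write(100, 20, 20))   # matches a replace, as in the report above
```

With the numbers from this post, an observed count of about 20 matches the "replace" prediction, which is why checking the row count after the first build is a useful next step.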

helenq · May 13, 2025, 3:25pm

Hey @shay, on the first build do you get 100 rows? And then afterwards you only see 20?