Query about incremental load in Pipeline Builder

Dear Community, could you please advise how we can enable an incremental load to be processed on a daily basis in Pipeline Builder?

I have checked the incremental option together with the output dataset options (such as "Append only new rows", "Always append new rows", etc.), but that has not been much help.

Any suggestion or guide link would be helpful.
Thanks in advance!

Hey @Debiprasad ,

Are you trying to create a schedule for processing an incremental pipeline every day?

This may help you - https://www.palantir.com/docs/foundry/building-pipelines/create-schedule/

Once you’ve set your pipeline to be incremental, this can then be used to create a schedule to build your output dataset at a frequency you determine.

Thanks for the link, but I just need to know how I can add each day's incremental load to another holding dataset and keep appending the incremental data on a daily basis.

Any further guidance?
Thanks in advance.

Hi @Debiprasad, did you set the input to be incremental as well? I would follow the steps here and let us know if you get stuck somewhere specific: https://www.palantir.com/docs/foundry/building-pipelines/create-incremental-pipeline-pb/

Thank you again for your support here :slight_smile:

Just to mention: in that example it is the flights dataset that gets new data appended on a daily basis. What I want to understand is how to keep adding new data to an existing dataset, i.e. what is happening for the flights dataset itself, not what is happening at filtered_flights.

Just mentioning, I am a bit new to this tool, so thanks in advance for continuing to explain this.

Hi @Debiprasad, just to confirm, you want to do this in the Pipeline Builder tool, right? Do you have a screenshot of your pipeline and of what is currently happening that isn't what you're expecting?

Dear @helenq,
I am not able to attach a screenshot; the tool is not allowing me to do so.

Trying to explain the need: suppose the TEST dataset is incremental (let's assume new data is appended to it each day while the old records are also kept). My objective is to have only that incremental data transferred to TEST_SET (each day, after the pipeline runs, TEST_SET should be completely refreshed with only the new data from the TEST dataset), which I'll then use for further processing.

For that purpose I also need to know which "Output dataset write mode" option to select for TEST_SET so that it keeps only the new data arriving after each pipeline execution.

Thanks again for your continued support on this.

Oh I see. If you want an output dataset that only has the newest rows from your latest build, the easiest thing to do would be to have another output off of TEST's input and make it a snapshot build instead.

If you want it to be downstream of TEST, then you can play around with the "Snapshot replace" and "Snapshot replace and remove" output write modes in Pipeline Builder.
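
If it helps to see the logic spelled out, here is a minimal sketch of the same "only the newly added rows, snapshot-replaced each run" behaviour written as a Python incremental transform in Code Repositories rather than Pipeline Builder; the dataset paths are placeholders, and this is just for illustration of what the write mode is doing for you:

```python
from transforms.api import transform, incremental, Input, Output


# Hypothetical dataset paths, for illustration only.
@incremental()
@transform(
    test_set=Output("/Project/datasets/TEST_SET"),
    test=Input("/Project/datasets/TEST"),
)
def compute(test, test_set):
    # 'added' returns only the rows appended to TEST since the last build.
    new_rows = test.dataframe("added")

    # 'replace' snapshots the output with just these new rows,
    # discarding whatever TEST_SET held from the previous run.
    test_set.set_mode("replace")
    test_set.write_dataframe(new_rows)
```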

If none of those work, is there a timestamp column you could filter on to only get the rows associated with the latest timestamp?
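
For reference, that filtering idea would look roughly like the PySpark sketch below (assuming a hypothetical timestamp column called `ingest_ts` on TEST); in Pipeline Builder you would express the same thing with a filter on the latest timestamp value:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the TEST dataset; 'ingest_ts' is an assumed column name.
test_df = spark.read.parquet("/tmp/TEST")

# Find the most recent ingestion timestamp, then keep only the rows that carry it.
latest_ts = test_df.agg(F.max("ingest_ts")).collect()[0][0]
test_set_df = test_df.filter(F.col("ingest_ts") == latest_ts)
```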