Thanks for the link, but I just need to know how I can add each day's incremental load to another dataset that holds it, and keep appending the incremental data on a daily basis.
Hi @Debiprasad, did you set the input to be incremental as well? I would follow the steps here and let us know if you get stuck somewhere specific: https://www.palantir.com/docs/foundry/building-pipelines/create-incremental-pipeline-pb/
Just to note: in the example, the flights dataset is getting new data appended on a daily basis. I want to know how I can keep adding new data to an existing dataset, i.e. what is happening to the flights dataset, not what is happening to filtered_flights.
Just mentioning that I am a bit new to this tool, so thanks in advance for continuing to explain this.
Hi @Debiprasad, just to confirm: you want to do this in the Pipeline Builder tool, right? Do you have a screenshot of your pipeline, and of what is currently happening that isn't what you're expecting?
Dear @helenq,
I am not able to attach a screenshot; the tool is not allowing me to do so.
Let me try to explain the need. Suppose the TEST dataset is incremental (let's assume that each day new data is appended to it while the old records are kept). My objective is to have only that incremental data transferred to TEST_SET, which I'll use for further processing: each day, after the pipeline runs, TEST_SET should be completely refreshed with only the new data from the TEST dataset.
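The behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the intended semantics, not Foundry or Pipeline Builder code: the function name `run_pipeline`, the row layout, and the sample dates are hypothetical. TEST accumulates every batch (append), while TEST_SET is rebuilt on each run from that run's batch alone (replace).

```python
# Hypothetical, plain-Python model of the two datasets described above.
# Nothing here uses Foundry or Pipeline Builder APIs; rows are plain dicts.
from typing import Any, Dict, List

Row = Dict[str, Any]


def run_pipeline(test: List[Row], new_batch: List[Row]) -> List[Row]:
    """Simulate one daily run.

    TEST keeps every row it has ever received (append semantics), while the
    returned TEST_SET holds only the rows that arrived in this run
    (replace semantics).
    """
    test.extend(new_batch)   # TEST: old rows kept, new rows appended
    return list(new_batch)   # TEST_SET: fully refreshed with this run's rows only


# Hypothetical sample batches for two consecutive days.
test_dataset: List[Row] = []
day1 = [{"id": 1, "day": "2024-01-01"}, {"id": 2, "day": "2024-01-01"}]
day2 = [{"id": 3, "day": "2024-01-02"}]

test_set = run_pipeline(test_dataset, day1)
test_set = run_pipeline(test_dataset, day2)

print(len(test_dataset))  # 3
print(test_set)           # [{'id': 3, 'day': '2024-01-02'}]
```

After the second run, TEST holds all three rows, while TEST_SET holds only the row from day 2, which is the split the question asks for.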
For that purpose, which output dataset write mode should I select for TEST_SET so that it keeps only the new data from each pipeline execution?
Oh I see. If you want an output dataset that only has the newest rows from your latest build, the easiest thing to do would be to add another output off of TEST's input and make it a snapshot build instead.
If you want it to be downstream of TEST, you can experiment with the "Snapshot replace" and "Snapshot replace and remove" output write modes in Pipeline Builder.
If none of those work, is there a timestamp column you could filter on to only get the rows associated with the latest timestamp?
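The timestamp approach works in two steps: find the maximum timestamp, then keep only the rows that carry it. Below is a minimal plain-Python sketch of that logic, assuming a hypothetical column named `ingested_at` that is set to the same value for every row loaded in a given run. It does not use any Foundry API.

```python
# Sketch of "keep only the rows with the latest timestamp".
# Assumes a hypothetical `ingested_at` column stamped once per load.
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def latest_rows(rows: List[Row], ts_col: str = "ingested_at") -> List[Row]:
    """Return only the rows whose timestamp equals the maximum timestamp."""
    if not rows:
        return []
    latest = max(row[ts_col] for row in rows)            # step 1: find the newest load
    return [row for row in rows if row[ts_col] == latest]  # step 2: keep its rows


# Hypothetical sample data: two daily loads.
rows = [
    {"id": 1, "ingested_at": datetime(2024, 1, 1, 6, 0)},
    {"id": 2, "ingested_at": datetime(2024, 1, 1, 6, 0)},
    {"id": 3, "ingested_at": datetime(2024, 1, 2, 6, 0)},
    {"id": 4, "ingested_at": datetime(2024, 1, 2, 6, 0)},
]

print([r["id"] for r in latest_rows(rows)])  # [3, 4]
```

This only isolates the latest load if every row from a single run shares one timestamp; if rows are stamped individually, you would need a per-run identifier or a date-level comparison instead.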