Using incremental transforms to shorten object indexing time in OSv2

CodeWizard · July 10, 2024, 12:26pm

Hello, I have a large object set, and after each Spark build, object indexing takes about an hour. Would making the Spark transform that produces the object's backing dataset incremental improve indexing performance, and if so, which incremental mode should I use? My guess is that in my case, modify mode would improve object indexing performance.

dherls · July 11, 2024, 4:58pm

Hello! Our documentation has a section that covers this topic in depth: https://www.palantir.com/docs/foundry/object-indexing/funnel-batch-pipelines/#incremental-and-full-reindexing

To summarize:

- Incremental append-only pipelines can speed up indexing in OSv2.
- If you append a row with the same primary key as an earlier row in the dataset, the newer row overwrites the older one when the dataset is synced to the Ontology.
- Each transaction must contain at most one row per primary key, or the sync will fail.
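To make the two rules above concrete, here is a minimal pure-Python model of how append-only transactions could map onto objects keyed by primary key. It is an illustrative sketch, not Foundry's transforms API: each transaction is assumed to be a list of dict rows, `id` is a hypothetical primary key, `ts` is a hypothetical timestamp column, and `dedupe_latest`, `sync_transactions`, and `DuplicatePrimaryKeyError` are made-up names. In a real Spark transform, the deduplication step could be done with a window function over the primary key before writing the output.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class DuplicatePrimaryKeyError(ValueError):
    """Hypothetical error: one transaction contains the same primary key twice."""


def dedupe_latest(rows: Iterable[Row], pk: str, order_by: str) -> List[Row]:
    """Keep only the row with the largest `order_by` value for each primary key."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        key = row[pk]
        if key not in latest or row[order_by] > latest[key][order_by]:
            latest[key] = row
    return list(latest.values())


def sync_transactions(transactions: Iterable[List[Row]], pk: str) -> Dict[Any, Row]:
    """Replay append-only transactions into a map of primary key -> object row.

    Models the behavior described in the thread: a later row overwrites an
    earlier row with the same primary key, and a transaction containing a
    duplicate primary key fails the sync.
    """
    objects: Dict[Any, Row] = {}
    for txn in transactions:
        keys = [row[pk] for row in txn]
        if len(keys) != len(set(keys)):
            raise DuplicatePrimaryKeyError("duplicate primary key within one transaction")
        for row in txn:
            objects[row[pk]] = row  # newer row replaces the older one
    return objects


# Usage: the second batch has two rows for id=1, so deduplicate before "syncing".
txn1 = [{"id": 1, "status": "open", "ts": 1}, {"id": 2, "status": "open", "ts": 1}]
txn2_raw = [{"id": 1, "status": "pending", "ts": 2}, {"id": 1, "status": "closed", "ts": 3}]
txn2 = dedupe_latest(txn2_raw, pk="id", order_by="ts")
state = sync_transactions([txn1, txn2], pk="id")
print(state[1]["status"])  # closed
print(len(state))  # 2
```

Passing `txn2_raw` directly to `sync_transactions` raises `DuplicatePrimaryKeyError`, mirroring the failed sync the thread warns about.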

CodeWizard · July 11, 2024, 8:46pm

Thank you for sharing.