Hello, I have a large Object Set, and after each Spark build object indexing takes about 1 hour. Would making the Spark transform that produces the object's backing dataset incremental improve indexing performance, and if so, which incremental mode should be used? My guess is that in my case modify mode should speed up object indexing.
Hello! We have a section in our documentation that covers this topic in depth: https://www.palantir.com/docs/foundry/object-indexing/funnel-batch-pipelines/#incremental-and-full-reindexing
To summarize:
- Incremental append-only pipelines can speed up your indexing in OSv2.
- If you append a row whose primary key matches an earlier row in your dataset, the newer row will overwrite the older row when it is synced to the Ontology.
- Each transaction must contain at most one row per primary key; otherwise your sync will fail.
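
To make the last two points concrete, here is a small plain-Python sketch of the behavior described above. It is only a toy model, not Foundry's API: `dedupe_latest`, `apply_transaction`, and the `id`, `status`, and `updated_at` columns are all hypothetical names chosen for illustration. It assumes you have some ordering column that tells you which row is newest. In a real Spark transform you would apply the same idea to the rows you are about to append, typically with a window over the primary key.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def dedupe_latest(rows: Iterable[Row], key: str, order_by: str) -> List[Row]:
    """Keep one row per primary key: the one with the greatest `order_by` value."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


def apply_transaction(index: Dict[Any, Row], rows: Iterable[Row], key: str) -> None:
    """Toy model of a sync (an assumption, not the real indexer):
    reject duplicate keys within one transaction, otherwise upsert."""
    batch = list(rows)
    seen = set()
    for row in batch:
        if row[key] in seen:
            raise ValueError(f"duplicate primary key in transaction: {row[key]!r}")
        seen.add(row[key])
    for row in batch:
        index[row[key]] = row  # a newer row overwrites the older one


index: Dict[Any, Row] = {}

# First transaction: a single row for id=1.
apply_transaction(index, [{"id": 1, "status": "open", "updated_at": 1}], key="id")

# Second transaction: two candidate rows for id=1, so deduplicate before appending.
batch = [
    {"id": 1, "status": "pending", "updated_at": 2},
    {"id": 1, "status": "closed", "updated_at": 3},
    {"id": 2, "status": "open", "updated_at": 3},
]
apply_transaction(index, dedupe_latest(batch, key="id", order_by="updated_at"), key="id")
```

After the second transaction, the entry for `id=1` holds the newest appended row, and passing `batch` to `apply_transaction` without deduplicating it first would raise an error, which mirrors the failed sync described above.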
Thank you for sharing.