Hello,
I am struggling a bit to understand the performance and compute costs for an ObjectType backed by a snapshot transaction type vs. an incremental transaction type.
Intuition tells me that the backing dataset would need to be incremental for the sync to be incremental as well, but I am seeing…
"* Significantly improved performance for Ontology data indexing through incremental object indexing (enabled by default) for all object types." in the documentation here, which leads me to think that it doesn't matter because some backend magic makes the sync incremental no matter what.
If anyone can speak to the details on this I would appreciate it!
Thanks in advance,
Paul Burns
You will find more information here: https://www.palantir.com/docs/foundry/object-indexing/funnel-batch-pipelines#incremental-and-full-reindexing
In short: if your backing dataset has a unique set of primary keys per transaction, then only those "new rows" will edit or change the objects currently in the Object Storage backend.
The sync from the pipeline to the ontology will therefore be less processing-intensive, and hence faster.
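To make that concrete, here is a minimal sketch of a Foundry Python transform using the `@incremental` decorator, so that each APPEND transaction carries only rows not yet processed (the dataset paths are placeholders, and this assumes upstream rows arrive with new primary keys):

```python
from transforms.api import transform, incremental, Input, Output

# Hypothetical dataset paths, for illustration only.
@incremental()
@transform(
    out=Output("/Project/datasets/clean_events"),
    source=Input("/Project/datasets/raw_events"),
)
def compute(source, out):
    # In incremental mode, source.dataframe() returns only the rows
    # added since the last build (the default "added" read mode).
    new_rows = source.dataframe()
    # write_dataframe appends these rows as an APPEND transaction, so
    # each transaction contains only new primary keys and the indexing
    # backend can process just those objects.
    out.write_dataframe(new_rows)
```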
However, as you can imagine, even with a snapshot backing dataset, the Object Storage backend will do smart things to avoid resyncing all the objects.
So all in all:
- You have an incremental pipeline => try to comply with the condition above so that object syncing benefits from those incremental transactions.
- You have a snapshot pipeline => don't "compute the diff" yourself. The ontology backend will do it for you, and likely better, with other tricks that keep object syncing efficient (see the sketch after this list).
- You have a snapshot pipeline but can easily make it incremental (without added computation, perhaps because of some structural change)? => try to move to incremental and respect the condition above to benefit from incremental transactions.
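For the snapshot case, the takeaway is that your transform should simply write the full output; a hand-rolled diff against the previous output adds compute without helping the indexing backend. A minimal sketch, again with placeholder paths:

```python
from transforms.api import transform_df, Input, Output

# Hypothetical dataset paths, for illustration only.
@transform_df(
    Output("/Project/datasets/clean_events"),
    source=Input("/Project/datasets/raw_events"),
)
def compute(source):
    # Write the full snapshot as-is. There is no need to join against
    # the previous output and emit only changed rows: the indexing
    # backend diffs the new snapshot against the indexed objects itself.
    return source
```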
This is helpful, thank you!