Time required for Ontology synchronization

Is there any way to reduce the time required to synchronize a dataset with the Ontology, other than reducing the amount of data? Also, what is the easiest way for users to know when the synchronization is complete?

Based on my experience, changing the “Interaction” properties in the Ontology settings can sometimes improve speed. I also understand that, even for the same object, the synchronization time differs significantly depending on whether or not the update involves a change in the schema structure. Is my understanding correct?

These are good questions, and I have answered them to the best of my ability below. In the future, please raise a separate Topic for each question; combining different questions in one Topic makes it harder for people to answer, since someone may know the answer to only a subset of them.

As documented at https://www.palantir.com/docs/foundry/object-link-types/metadata-render-hints, this is indeed true for Object Storage V1. The extent to which it also applies to Object Storage V2 is not currently documented and may be subject to change. A similar question was asked previously at https://community.palantir.com/t/render-hints-for-osv2/671, so I suggest watching that topic for any updates.

Yes; a schema change that causes live pipelines to fail means that data will not be updated in Object Storage V2 until the replacement pipeline completes, and replacement pipelines generally take longer to run than live pipelines. See https://www.palantir.com/docs/foundry/object-indexing/funnel-batch-pipelines/#live-and-replacement-funnel-pipelines for more information about live and replacement pipelines.
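
To make the distinction concrete, here is a minimal sketch of a schema-changing edit, assuming a PySpark transform written with Foundry's transforms API; the dataset paths and column names are hypothetical:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical paths and column names, for illustration only.
@transform_df(
    Output("/Example/datasets/orders_clean"),
    source=Input("/Example/datasets/orders_raw"),
)
def compute(source):
    # Data-only change: filtering rows or updating values keeps the output
    # schema stable, so Funnel can usually index the build via a live pipeline.
    df = source.filter(F.col("status").isNotNull())

    # Schema change: casting 'order_total' from string to double alters the
    # output schema; this kind of change typically breaks the live pipeline
    # and forces a slower replacement pipeline that re-indexes the object type.
    return df.withColumn("order_total", F.col("order_total").cast("double"))
```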

If you are able to architect your data pipeline as a streaming pipeline instead of a batch pipeline, you can then use a Funnel streaming pipeline for the Ontology sync. For batch pipelines, there is not much you can do, though if you can make your pipeline incremental, that will improve the end-to-end latency between raw data and the Ontology. Funnel has specific behavior for indexing incremental datasets that you can reference. It’s also worth sanity-checking that your pipeline logic does not include any nondeterministic operations that could cause many rows to have different values between dataset builds even though their corresponding input data did not change; any such behavior would dramatically increase the work that Funnel needs to do in its indexing pipeline.
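
For example, here is a minimal sketch of the nondeterminism pitfall, again assuming a PySpark transform; the paths and column names are hypothetical:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical paths and column names, for illustration only.
@transform_df(
    Output("/Example/datasets/customers_enriched"),
    source=Input("/Example/datasets/customers"),
)
def compute(source):
    # Nondeterministic: stamping a build-time value on every row makes all
    # rows look "changed" to Funnel on every build, even when the input data
    # did not change, which dramatically inflates indexing work.
    # source = source.withColumn("processed_at", F.current_timestamp())

    # Deterministic: derive values only from input columns, so unchanged
    # input rows produce identical output rows across builds.
    return source.withColumn(
        "full_name",
        F.concat_ws(" ", F.col("first_name"), F.col("last_name")),
    )
```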

It’s hard to answer this question without a better understanding of the use-case (who needs to know this, and what is the workflow that requires them to know it? Do they need to know how fresh the data is in the Ontology, or do they just need to know whether a given dataset build has been reflected?). At least for developer users, the standard approach is to manually check the job history in Job Tracker (filtering to the Object Type in question) or the object type’s Datasources tab. Just bear in mind that it may not be obvious which version (transaction) of a dataset was used as input to a given Funnel pipeline, especially if the dataset is built frequently. If you have further questions on this subject, I suggest creating a separate Topic with a more detailed description of the use-case.


Thank you very much for your thoughtful response!